« In sync | Main | Homeland insecurity »

Signaling intelligence

smithsam-ASIdemo-slides.pngLast month, the British Home Office announced that it had a tool that can automatically detect 94% of Daesh propaganda with 99.995% accuracy. Sophos summarizes the press release to say that only 50 out of 1 million videos would require human review.

"It works by spotting subtle patterns in the extremist videso that distinguish them from normal content..." Mark Werner, CEO of London-based ASI Data Science, the company that developed the classifier, told Buzzfeed.

Yesterday, ASI, which numbers Skype co-founder Jaan Tallinn among its investors, presented its latest demo day in front of a packed house. Most of the lightning presentations focused on various projects its Fellows have led using its tools in collaboration with outside organizations such as Rolls Royce and the Financial Conduct Authority. Warner gave a short presentation of the Home Office extremism project that included little more detail than the press reports a month ago, to which my first reaction was: it sounds impossible.

That reaction is partly due to the many problems with AI, machine learning, and big data that have surfaced over the last couple of years. Either there are hidden biases, or the media reports are badly flawed, or the system appears to be telling us only things we already know.

Plus, it's so easy - and so much fun! - to mock the flawed technology. This week, for example, neural network trainer Janelle Shane showed off the results of some of her pranks. After confusing image classifiers with sheep that don't exist, goats in trees (birds! or giraffes!) and sheep painted orange (flowers!), she concludes, "...even top-notch algorithms are relying on probability and luck." Even more than humans, it appears that automated classifiers decide what they see based on what they expect to see and apply probability. If a human is holding it, it's probably a cat or dog; if it's in a tree it's not going to be a goat. And so on. The experience leads Shane to surmise that surrealism might be the way to sneak something past a neural net.

Some of this approach appears to be what ASI's classifier probably also does (we were shown no details). As Sophos suggests, a lot of the signals ASI's algorithm is likely to use have nothing to do with the computer "seeing" or "interpreting" the images. Instead, it likely looks for known elements such as logos and facial images matched against known terrorism photos or videos. In addition it can assess the cluster of friends surrounding the account that's posted the video and look for profile information that shows the source is one that has been known to post such material in the past. And some will be based on analyzing the language used in the video. From what ASI was saying, it appears that the claim the company is making is fairly specific: the algorithm is supposed to be able to detect (specifically) Daesh videos, with a false positive rate of 0.005%, and 94% of true positives.

These numbers - assuming they're not artifacts of computerish misunderstanding about what it's looking for - of course represent tradeoffs, as Patrick Ball explained to us last year. Do we want the algorithm to block all possible Daesh videos? Or are we willing to allow some through in the interests of honoring the value of freedom of expression and not blocking masses of perfectly legal and innocent material? That policy decision is not ASI's job.

What was more confusing in the original reports is that the training dataset was said to have been "over 1,000 videos". That seems an incredibly small sample for testing a classifier that's going to be turned loose on a dataset of millions. At the demonstration, Warner's one new piece of information is that because that training set was indeed small, the project developed "synthetic data" to enlarge the training set to sufficient size. As gaming-the-system as that sounds, creating synthetic data to augment training data is a known technique. Without knowing more about the techniques ASI used to create its synthetic data it's hard to assess that work.

We would feel a lot more certain of all of these claims if the classifier had been through an independent peer review. The sensitivity of the material involved makes this tricky; and if there has been an outside review we haven't been told about it.

But beyond that, the project to remove this material rests on certain assumptions. As speakers noted at the first conference run by VOX-Pol, an academic research network studying violent online political extremism, the "lone wolf" theory posits that individuals can be radicalized at home by viewing material on the internet. The assumption that this is true underpins the UK's censorship efforts. Yet this theory is contested: humans are highly social animals. Radicalization seems unlikely to take place in a vacuum. What - if any - is the pathway from viewing Daesh videos to becoming a terrorist attacker?

All these questions are beyond ASI's purview to answer. They'd probably be the first to say: they're only a hill of technology beans being asked to solve a mountain of social problems.

Illustrations: Slides from the demonstration (Sam Smith).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.


TrackBack URL for this entry:

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)