
March 16, 2018

Homeland insecurity

"To the young people," a security practitioner said at a recent meeting, speaking of a group he'd been working with, "it's life lived on their phone."

He was referring to the tendency for adults to talk to kids about fake news, or sexting, or sexual abuse and recruitment, and so on as "online" dangers the adults want to protect them from. But, as this practitioner was trying to explain (and we have said here before), "online" isn't separate to them. Instead, all these issues are part of the context of pressures, relationships, economics, and competition that makes up their lives. This will become increasingly true as widely deployed sensors and hybrid cyber-physical systems and tracking become the norm.

This is a real generation gap. Older adults have taken on board each of these phenomena as we've added it into our existing understanding of the world. Watching each arrive singly over time allows the luxury of consideration and the mental space in which to plot a strategy. If you're 12, all of these things are arriving at once as pieces that are coalescing into your picture of the world. Even if you only just finally got your parents to let you have your own phone you've been watching videos on YouTube, FaceTiming your friends, and playing online games all your life.

An important part of "life lived on the phone" is bound up in the UK's data protection bill, which implements the General Data Protection Regulation and is now going through Parliament. The bill carves out some very broad exemptions. Most notably, and opposed by the Open Rights Group and the3million, the bill would remove a person's rights as a data subject in the interests of "effective immigration control". In other words, under this exemption the Home Office could make decisions about where and whether you were allowed to live but never have to tell you the basis for its decisions. Having just had *another* long argument with a different company about whether or not I've ever lived in Iowa, I understand the problem of being unable to authenticate yourself because of poor-quality data.

It's easy for people to overlook laws that "only" affect immigrants, but as Gracie Mae Bradley, an advocacy and policy officer, made clear at this week's The State of Data 2018 event, hosted by Jen Persson, one of the consequences is to move the border from Britain's ports into its hospitals, schools, and banks. The banks are now supposed to check once a quarter that their 70 million account holders are legitimate. NHS Digital is turning over confidential patient information to help the Home Office locate and deport undocumented individuals. Britain's schools are being pushed to collect nationality. And, as Persson noted, remarkably few parents even know the National Pupil Database exists, and yet it catalogues highly detailed records of every schoolchild.

"It's obviously not limited to immigrants," Bradley said of the GDPR exemption. "There is no limit on the processes that might apply this exemption". It used to be clear when you were approaching a national border; under these circumstances the border is effectively gummed to your shoe.

The data protection bill also has the usual broad exemptions for law enforcement and national security.

Both this discussion (implicitly) and the security conversation we began with (explicitly) converged on security as a felt, emotional state. Even a British citizen living in their native country in conditions of relative safety - a rich country with good health care, stable governance, relatively little violence, mostly reasonable weather - may feel insecure if they're constantly being required to prove the legitimacy of their existence. Conversely, people may live in objectively more dangerous conditions and yet feel more secure because they know the local government is not eying them suspiciously with a view to telling them to repatriate post-haste.

Put all these things together with other trends, and you have the potential for a very high level of social insecurity that extends far outwards from the enemy class du jour, "illegal immigrants". This in itself is a damaging outcome.

And the potential for social control is enormous. Transport for London is progressively eliminating both cash and its Oyster payment cards in favor of direct payment via credit or debit card. What happens to people who one quarter fail the bank's inspection? How do they pay the bus or tube fare to get to work?

Like gender, immigration status is not the straightforward state many people think. My mother, brought to the US when she was four, often talked about the horror of discovering in her 20s that she was stateless: marrying my American father hadn't, as she imagined, automatically made her an American, and Switzerland had revoked her citizenship because she had married a foreigner. In the 1930s, she was naturalized without question. Now...?

Trying to balance conflicting securities is not new. The data protection bill-in-progress offers the opportunity to redress a serious imbalance, which Persson called, rightly, a "disconnect between policy, legislation, technological change, and people". It is, as she and others said, crucial that the balance of power that data protection represents not be determined by a relatively small, relatively homogeneous group.

Illustrations: 2008 map of nationalities of UK residents (via Wikipedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.

March 9, 2018

Signaling intelligence

Last month, the British Home Office announced that it had a tool that can automatically detect 94% of Daesh propaganda with 99.995% accuracy. Sophos summarizes the press release to say that only 50 out of 1 million videos would require human review.

"It works by spotting subtle patterns in the extremist videos that distinguish them from normal content..." Mark Warner, CEO of London-based ASI Data Science, the company that developed the classifier, told Buzzfeed.

Yesterday, ASI, which numbers Skype co-founder Jaan Tallinn among its investors, presented its latest demo day in front of a packed house. Most of the lightning presentations focused on various projects its Fellows have led using its tools in collaboration with outside organizations such as Rolls Royce and the Financial Conduct Authority. Warner gave a short presentation of the Home Office extremism project that included little more detail than the press reports a month ago, to which my first reaction was: it sounds impossible.

That reaction is partly due to the many problems with AI, machine learning, and big data that have surfaced over the last couple of years. Either there are hidden biases, or the media reports are badly flawed, or the system appears to be telling us only things we already know.

Plus, it's so easy - and so much fun! - to mock the flawed technology. This week, for example, neural network trainer Janelle Shane showed off the results of some of her pranks. After confusing image classifiers with sheep that don't exist, goats in trees (birds! or giraffes!) and sheep painted orange (flowers!), she concludes, "...even top-notch algorithms are relying on probability and luck." Even more than humans, it appears that automated classifiers decide what they see based on what they expect to see and apply probability. If a human is holding it, it's probably a cat or dog; if it's in a tree it's not going to be a goat. And so on. The experience leads Shane to surmise that surrealism might be the way to sneak something past a neural net.

ASI's classifier probably does something similar (we were shown no details). As Sophos suggests, a lot of the signals ASI's algorithm is likely to use have nothing to do with the computer "seeing" or "interpreting" the images. Instead, it likely looks for known elements such as logos and facial images matched against known terrorism photos or videos. In addition it can assess the cluster of friends surrounding the account that's posted the video and look for profile information that shows the source is one that has been known to post such material in the past. And some signals will be based on analyzing the language used in the video. From what ASI was saying, it appears that the claim the company is making is fairly specific: the algorithm is supposed to be able to detect (specifically) Daesh videos, catching 94% of them (the true positive rate) with a false positive rate of 0.005%.
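To see what those figures would mean in practice, here is a minimal sketch of the arithmetic. The 94% and 0.005% rates are the ones quoted above; the function name `review_load` and the prevalence values (the share of uploads that really are Daesh videos) are assumptions for illustration, since no such figure was given.

```python
def review_load(total_videos, prevalence, recall=0.94, false_positive_rate=0.00005):
    """Estimate how many videos a classifier would flag.

    prevalence is the ASSUMED fraction of uploads that are Daesh videos;
    recall and false_positive_rate default to the rates quoted in the column.
    """
    target = total_videos * prevalence           # videos that really are Daesh material
    innocent = total_videos - target             # everything else
    true_positives = target * recall             # Daesh videos correctly flagged
    false_positives = innocent * false_positive_rate  # innocent videos wrongly flagged
    flagged = true_positives + false_positives
    # Precision: of everything flagged, what share is actually Daesh material?
    precision = true_positives / flagged if flagged else 0.0
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "missed": target - true_positives,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical prevalence values, chosen only to show the effect.
    for prevalence in (0.0, 0.0001, 0.001, 0.01):
        result = review_load(1_000_000, prevalence)
        print(prevalence, result)
```

The sketch suggests that the share of flagged videos that are genuinely Daesh material depends heavily on how rare such videos are among all uploads, which the quoted rates alone do not tell us.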

These numbers - assuming they're not artifacts of computerish misunderstanding about what it's looking for - of course represent tradeoffs, as Patrick Ball explained to us last year. Do we want the algorithm to block all possible Daesh videos? Or are we willing to allow some through in the interests of honoring the value of freedom of expression and not blocking masses of perfectly legal and innocent material? That policy decision is not ASI's job.
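The tradeoff can be illustrated with a toy example. Everything in it is invented: the scores, the labels, and the helper `rates_at_threshold` are hypothetical and say nothing about how ASI's classifier actually works. It assumes only that a classifier outputs a score per video and that a policy choice sets the cut-off above which a video is blocked.

```python
def rates_at_threshold(scored_videos, threshold):
    """Return (recall, false_positive_rate) when blocking scores >= threshold."""
    tp = fp = fn = tn = 0
    for score, is_target in scored_videos:
        flagged = score >= threshold
        if flagged and is_target:
            tp += 1      # target video, correctly blocked
        elif flagged:
            fp += 1      # innocent video, wrongly blocked
        elif is_target:
            fn += 1      # target video that slipped through
        else:
            tn += 1      # innocent video, correctly left alone
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return recall, false_positive_rate


# Invented toy data: (classifier score, whether the video is really a target).
SAMPLE = [
    (0.95, True), (0.80, True), (0.60, True), (0.40, True),
    (0.70, False), (0.30, False), (0.20, False), (0.10, False),
]

if __name__ == "__main__":
    for threshold in (0.25, 0.5, 0.75):
        recall, fpr = rates_at_threshold(SAMPLE, threshold)
        print(f"threshold={threshold}: recall={recall}, false positive rate={fpr}")
```

In this toy data, lowering the threshold blocks more of the target videos but also more innocent ones, and raising it does the reverse. Choosing the cut-off is the policy decision.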

What was more confusing in the original reports is that the training dataset was said to have been "over 1,000 videos". That seems an incredibly small sample for training a classifier that's going to be turned loose on a dataset of millions. At the demonstration, Warner's one new piece of information was that because that training set was indeed small, the project developed "synthetic data" to enlarge the training set to sufficient size. As gaming-the-system as that sounds, creating synthetic data to augment training data is a known technique. Without knowing more about the techniques ASI used to create its synthetic data it's hard to assess that work.

We would feel a lot more certain of all of these claims if the classifier had been through an independent peer review. The sensitivity of the material involved makes this tricky; and if there has been an outside review we haven't been told about it.

But beyond that, the project to remove this material rests on certain assumptions. As speakers noted at the first conference run by VOX-Pol, an academic research network studying violent online political extremism, the "lone wolf" theory posits that individuals can be radicalized at home by viewing material on the internet. The assumption that this is true underpins the UK's censorship efforts. Yet this theory is contested: humans are highly social animals. Radicalization seems unlikely to take place in a vacuum. What - if any - is the pathway from viewing Daesh videos to becoming a terrorist attacker?

All these questions are beyond ASI's purview to answer. They'd probably be the first to say: they're only a hill of technology beans being asked to solve a mountain of social problems.

Illustrations: Slides from the demonstration (Sam Smith).

March 2, 2018

In sync

Until Wednesday, I was not familiar with the use of "sync" to stand for a music synchronization license - that is, a license to use a piece of music in a visual setting such as a movie, video game, or commercial. The negotiations involved can be Byzantine and very, very slow, in part because the music's metadata is so often wrong or missing. In one such case, described at Music 4.5's seminar on developing new deals and business models for sync (Flash), it took ten years to get the wrong answer from a label to the apparently simple question: who owns the rights to this track on this compilation album?

The surprise: this portion of the music business is just as frustrated as activists with the state of online copyright enforcement. They don't love the Digital Millennium Copyright Act (1998) any more than we do. We worry about unfair takedowns of non-infringing material and bans on circumvention tools; they hate that the Act's Safe Harbor grants YouTube and Facebook protection from liability as long as they remove content when told it's infringing. Google's automated infringement detection software, ContentID, I heard Wednesday, enables the "value gap", which the music industry has been fretting about for several years now because the sites have no motivation to create licensing systems. There is some logic there.

However, where activists want to loosen copyright, enable fair use, and restore the public domain, they want to dump Safe Harbor, whether by developing a technological bypass, by changing the law, or by getting FaceTube to devise a fairer, more transparent revenue split. "Instagram," said one, "has never paid the music industry but is infringing copyright every day."

To most of us, "online music" means subscription-based streaming services like Spotify or download services like Amazon and iTunes. For many younger people, especially Americans though, YouTube is their jukebox. Pex estimates that 84% of YouTube videos contain at least ten seconds of music. Google says ContentID matches 99.5% of those, and then they are either removed or monetized. But, Pex argues, 65% of those videos remain unclaimed and therefore provide no revenue. Worse, as streaming grows, downloads are crashing. There's a detectable attitude that if they can fix licensing on YouTube they will have cracked it for all sites hosting "creator-generated content".

It's a fair complaint that ContentID was built to protect YouTube from liability, not to enable revenues to flow to rights holders. We can also all agree that the present system means millions of small-time creators are locked out of using most commercial music. The dancing baby case took eight years to decide that the background existence of a Prince song in a 29-second home video of a toddler dancing was fair use. But sync, too, was designed for businesses negotiating with businesses. Most creators might indeed be willing to pay to legally use commercial music if licensing were quick, simple, and cheap.

There is also a question of whether today's ad revenues are sustainable; a graphic I can't find showed that the payout per view is shrinking. Bloomberg finds that, increasingly, the winning YouTubers are taking all, with little left for the very long tail.

The twist in the tale is this. MP3 players unbundled albums into songs as separate marketable items. Many artists were frustrated by the loss of control inherent in enabling mix tapes at scale. Wednesday's discussion heralded the next step: unbundling the music itself, breaking it apart into individual beats, phrases and bars, each licensable.

One speaker suggested scenarios. The "content" you want to enjoy is 42 minutes long but your commute is only 38 minutes. You might trim some "unnecessary dialogue" and rearrange the rest so now it fits! My reaction: try saying "unnecessary dialogue" to Aaron Sorkin and let's see how that goes.

I have other doubts. I bet "rearranging" will take longer than watching the four minutes. Speeding up the player slightly achieves the same result, and you can do that *now* for free (try really blown it. More useful was the suggestion that hearing-impaired people could benefit from being able to tweak the mix to fade the background noise and music in a pub scene to make the actors easier to understand. But there, too, we actually already have closed captions. It's clear, however, that the scenarios may be wrong, but the unbundling probably isn't.

In this world, we won't be talking about music, but "music objects". Many will be very low-value...but the value of the total catalogue might rise. The BBC has an experiment up already: The Mermaid's Tears, an "object-based radio drama" in which you can choose to follow any one of the three characters to experience the story.

Smash these things together, and you see a very odd world coming at us. It's hard to see how fair use survives a system that aims to license "music objects" rather than "music". In 1990, Pamela Samuelson warned about copyright maximalism. That agenda does not appear to have gone away.

Illustrations: King David dancing before the Ark of the Covenant, 'Maciejowski Bible', Paris ca. 1240 (via Discarding Images).