" /> net.wars: July 2008 Archives

« June 2008 | Main | August 2008 »

July 25, 2008


A certain amount of government and practical policy is being made these days based on the idea that you can take large amounts of data and anonymize it so researchers and others can analyze it without invading anyone's privacy. Of particular sensitivity is the idea of giving medical researchers access to such anonymized data in the interests of helping along the search for cures and better treatments. It's hard to argue with that as a goal - just like it's hard to argue with the goal of controlling an epidemic - but both those public health interests collide with the principle of medical confidentiality.

The work of Latanya Sweeney was I think the first hint that anonymizing data might not be so straightforward; I've written before about her work. This week, at the Privacy Enhancing Technologies Symposium in Leuven, Belgium (which I regrettably missed) researchers Arvind Narayanan and Vitaly Shmatikov from the University of Texas at Austin won an award sponsored by Microsoft for taking reidentifying supposedly anonymized data a step further.

The pair took a database released by the online DVD rental company Netflix last year as part of the $1 million Netflix Prize, a project to improve upon the accuracy of the system's predictions. You know the kind of thing, since it's built into everything from Amazon to Tivos - you give the system an idea of your likes and dislikes by rating the movies you've rented and the system makes recommendations for movies you'll like based on those expressed preferences. To enable researchers to work on the problem of improving these recommendations, Netflix released a dataset containing more than 100 million movie ratings contributed by nearly 500,000 subscribers between December 1999 and December 2005 with, as the service stated in its FAQ, all customer identifying information removed.

Maybe in a world where researchers only had one source of information that would be a valid claim. But just as Sweeney showed in 1997 that it takes very little in the way of public records to re-identify a load of medical data supplied to researchers in the state of Massachusetts, Narayananan and Shamtikov's work reminds us that we don't live in a world like that. For one thing, people tend disproportionately to rate their unusual, quirky favorites. Rating movies takes time; why spend it on giving The Lord of the Rings another bump when what people really need is to know about the wonders of King of Hearts, All That Jazz, and The Tall Blond Man with One Black Shoe? The consequence is that the Netflix dataset is what they call "sparse" - that is, there few subscribers have very similar records.

So: how much does someone need to know about you to identify a particular user from the database? It turns out, not much. The is the public ratings and dates at the Internet Movies Database, which include dates and real names. Narayanan and Shmatikov concluded that 99 percent of records could be uniquely identified from only eight matching ratings (of which two could be wrong); for 68 percent of the records you only need two (and reidentifying the rest becomes easier). And of course, if you know a little bit about the particular person whose record you want to identify things get a lot easier - the three movies I've just listed would probably identify me and a few of my friends.

Even if you don't care if your tastes in movies are private - and both US law and the American Library Association's take on library loan records would protect you more than you yourself would - there are couple of notable things here. First of all, the compromise last week whereby Google agreed to hand Viacom anonymized data on YouTube users isn't as good a deal for users as they might think. A really dedicated searcher might well think it worth the effort to come up with a way to re-identify the data - and so far rightsholders have shown themselves to be very dedicated indeed.

Second of all, the Thomas-Walport review on data-sharing actually recommends requiring NHS patients to agree to sharing data with medical researchers. There is a blithe assumption running through all the government policies in this area that data can be anonymized, and that as long as they say our privacy is protected it will be. It's a perfect example of what someone this week called "policy-based evidence-making".

Third of all, most such policy in this area assumes it's the past that matters. What may be of greater significance, as Narayanan and Shmatikov point out, is the future: forward privacy. Once a virtual identity has been linked to a real-world identity, that linkage is permanent. Yes, you can create a new virtual identity, but any slip that links it to either your previous virtual or your real-world identity blows your cover.

The point is not that we should all rush to hide our movie ratings. The point is that we make optimistic assumptions every day that the information we post and create has little value and won't come back to bite us on the ass. We do not know what connections will be possible in the future.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series. Readers are welcome to post here, at net.wars home, at her personal blog, or by email to netwars@skeptic.demon.co.uk (but please turn off HTML).

July 18, 2008


This week the European Commission decided to ignore protests and economic evidence in favour of the record companies and adopted a proposal for term extension in sound recordings (PDF) to 95 years instead of the current 50 years.

There is one almost good thing in the proposal: that when a recording is due to enter the public domain the publisher has one year to use it or lose it - and losing it means the rights will revert to the performer. If the performer also doesn't use it, then it comes into the public domain. I say "almost good" because that reversion of rights needs to happen much earlier in the life of a recording; rights should revert, as they do in book publishing, when the company takes a recording out of commercial release.

The rest is a batch of justifications for giving the record companies what they want based on the very real and very terrible economics of most musical careers. These bad arguments are begetting wrongheaded debate.

Ars Technica, for example, has chosen to complain that the proposed extension smacks of grotesque entitlement. But this is wholly unfair: in general the push for term extension is not coming from musicians but from record companies. And that article's complaint that musicians should have made provision for their old age the way everyone else has to is undercut by the figures quoted a few paragraphs later. The couple of thousand pounds that represents the high end of what an "average" pensioner musician might receive according to McCreevy's estimates might help them buy a new condensing boiler one year when it gets cold. It's not going to make the difference between poverty and a comfortable life.

Even if we were talking about riches, though, that particular argument, if followed to its logical conclusion, would do away with copyright altogether: if it sounds like special pleading to ask for term extension to fund retirement, then surely the same must be true of the money received during the first 50 years of copyright.

PWJs - people with jobs - may not see why a musician recording a song should be different than a plumber installing a bathtub. Even some of our own don't. The late journalist John Diamond used to say you don't pay the plumber royalties every time you use a bathtub he's installed, not even if people buy tickets for the privilege of seeing it. But royalties are a trade-off; in return for shouldering the considerable risk of a creative career creators get the right to exploit their work. It is some incentive. But it's not why people try for artistic careers, since people do this - and fail at it - by the thousand. The odds are terrible: the proposal itself notes that only 5 percent of performers make a living from their profession. But over all the point all musicians would make is that if someone is going to be making money out of their work they feel they deserve some of it.

Royalties from sound recordings are part of what at least theoretically makes a full-time professional musical career possible. This is the bargain society made in allowing copyright in sound recordings in the first place. It is not about paying people pensions. The proposal estimates that term extension will mean continued payments to approximately 7,000 musicians in the larger EU countries; if we simply want to support retired musicians it would be cheaper to let the state give them a handout.

A more reasonable argument is to say that when today's 70-somethings went into the studio in 1958 - and when the record companies paid them to do so - they made a contract with society that after 50 years their recordings would go into the public domain. There certainly is now no question of incentive: if term is extended they can't retroaactively decide to have recorded more back then. Generations of musicians since have gone on recording and the record companies are not complaining that it's difficult to find people who will accept "only" a 50-year term of copyright. Whining that they don't like the terms of the social contract now should bear as little weight as someone in a divorce hearing claiming they signed the pre-nup without reading it.

But these are negative arguments. The more positive arguments have come from for example the Gowers report, which argued against term extension on economic grounds. These are being ignored. My favourite bit of the Commission's proposal is the completely backward bit that argues that there will be no difference to consumers because public domain recordings do not sell for less than copyrighted ones - and anyway it doesn't matter because there are plenty of alternative noises people can listen to.

The good news is that this is the last time the recording industry will be able to claim that it is lobbying for term extension to benefit artists. Unless the anti-aging folks get a miracle together, 40 years from now, when these recordings are nearing their new expiration date, all the artists will be dead. Trying to garner sympathy for their heirs will be a lot weaker argument, emotionally speaking.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series. Readers are welcome to post here, at net.wars home, at her personal blog, or by email to netwars@skeptic.demon.co.uk (but please turn off HTML).

July 11, 2008

Voters for sale

It must be hard to be the Direct Marketing Association. All individuals in the DMA must know that they themselves hate getting marketing calls during dinner, weeding the real post out from the junk mail, and constantly having to unsubscribe from email lists that they're only on because they had the misfortune to buy something from the sender. Collectively, the DMA remains firmly convinced that people want advertising really, it just has to be targeted right (at which point people no longer call it advertising). It must be very hard for everyone involved to maintain this level of cognitive dissonance.

And it leads them to do things as an organization that probably each individual would oppose if they were working for someone else. Today the DMA is opposing the withdrawal of the edited electoral register, a recommendation appearing in the Data-Sharing Review, published by the Ministry of Justice and written by Information Commissioner Richard Thomas and Dr Mark Walport. There's a lot of interesting stuff to digest; the electoral register issue is one of the simpler bits.

To recap: historically the UK, like the US, treated the electoral rolls as public information. In the UK every household gets sent a canvassing form once a year that comes with a stern warning that you are legally required to register.

Starting in the 1830s, the British electoral rolls have been available for public inspection and sale; what a godsend for direct marketers as their industry grew up. As of 2001, electoral registration officers are required to sell a copy of the register at a specified price to anyone who wants it under Regulation 48 of the Representation of the People (England and Wales) Regulations. Almost immediately there were objections on privacy grounds, most notably a complaint by Pontefract-based Brian Robertson, a retired accountant, against Wakefield City Council because there was no provision for him to prevent the sale of his information for commercial use. He refused to register, took them to court - and won.

The regulations were promptly amended to require councils to maintain two registers: the full public register and an edited version that could be sold to commercial organizations and others and to which voters would be added automatically - but with the right to opt out. The first edited registers appeared in 2002.

And there was a lot of confusion. The canvassing forms that first year didn't make it very clear what the edited register was, and it was easy to make the mistake of thinking that if you opted out you would not be able to vote. Subsequent years saw amended forms that made it more clear just what you were opting out of. And the results really shouldn't surprise anyone: in the latest rolls 40 percent of voters opted out, double the percentage in the first years. Given that, it's not entirely clear why the government needs to withdraw the register. If they just wait a few more years everyone of any value to marketers will have opted out, and the edited rolls will become useful again as a list of all the people who aren't worth marketing to. Anyone left presumably either didn't understand the form, so lonely they enjoy the attention, or so mentally afflicted that someone else filled out the form for them.

The full register is available - at least in theory - only to a select group of people and organizations: political parties for electoral purposes, credit reference agencies to check names and addresses when people apply for credit, and law enforcement. The main purchasers of the edited register, the Thomas-Walport report notes, are direct marketing companies and companies compiling directories.

Thomas and Walport disapprove of its existence on these grounds: "It sends a particularly poor message to the public that personal information collected for something as vital as participation in the democratic process can be sold to 'anyone for any purpose'."

A key data protection principle is that a chance of use in personal information requires the consent of the individual. If ever there were a more significant change of use than selling information collected to enable people to vote to third party companies for general marketing purposes, I don't know what it would be.

The DMA's objection to its withdrawal is that its members won't be able to clean their lists and keep them accurate and up-to-date. And it happily sees the direct mail envelope as more than half full: "Some householders have opted out, but around 60 petrcent have chosen to remain on the edited register." They don't believe the forms are all confusing. And the DMA plays the environmental card: targeting reduces the amount of waste paper the industry produces.

One issue neither group tackles is whether the register represents a significant source of income for councils. How much are we willing to pay for privacy. This warrants more research; a quick glance turns up figures from Bath and North East Somerset Counil. In 2005-2006, the council netted £1,553 and £380.50 for the sales of the full and edited registers respectively; in 2006-2007 those figures were £1558.50 and £681. If that's indicative of national trends, we can afford it, especially given the savings on administering the opt-out process.

"The edited register does serve a purpose," the DMA concludes, "and so should not be abolished." A purpose, yes. Just not our purpose.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series. Readers are welcome to post here, at net.wars home, at her personal blog, or by email to netwars@skeptic.demon.co.uk (but please turn off HTML).

July 4, 2008

The new normal

The (only) good thing about a war is you can tell when it's over.

The problem with the "War on Terror" is that terrorism is always with us, as Liberty's director, Shami Chakrabarti, said yesterday at the Homeland and Border Security 08 conference. "I do think the threat is very serious. But I don't think it can be addressed by a war." Because, "We, the people, will not be able to verify a discernible end."

The idea that "we are at war" has justified so much post 9/11 legislation, from the ID card (in the UK) and Real ID (US) to the continued expansion of police powers.

How long can you live in a state of emergency before emergency becomes the new normal? If there is no end, when do you withdraw the latitude wartime gives a government?

Several of yesterday's speakers talked about preserving "our way of life" while countering the threat with better security. But "our way of life" is a moving target.

For example, Baroness Pauline Neville-Jones, the shadow security minister, talked about the importance of controlling the UK's borders. "Perimeter security is absolutely basic." Her example: you can't go into a building without having your identity checked. But it's not so long ago - within the 18 years I've been living in London - that you could do exactly that, even sometimes in central London. In New York, of course, until 9/11, everything was wide open; these days midtown Manhattan makes you wait in front of barriers while you're photographed, checked, and treated with great suspicion if the person you're visiting doesn't answer the phone.

Only seven years ago, flying did not involve two hours of standing in line. Until January, tourists do not have to register three days before flying to the US for pre-screening.

It's not clear how much would change with a Conservative government. "There is a very great deal by this government we would continue," said Neville-Jones. But, she said, besides trackling threats, whether motivated (terrorists) or not (floods, earthquakes, "we are also at any given moment in the game of deciding what kind of society we want to have and what values we want to preserve." She wants "sustainable security, predicated on protecting people's freedom and ensuring they have more, not less, control over their lives." And, she said, "While we need protective mechanisms, the surveillance society is not the route down which we should go. It is absolutely fundamental that security and freedom lie together as an objective."

To be sure, Neville-Jones took issue with some of the present government's plans - the Conservatives would not, she said, go ahead with the National Identity Register, and they favour "a more coherent and wide-ranging border security force". The latter would mean bringing together many currently disparate agencies to create a single border strategy. The Conservatives also favour establishing a small "homeland command for the armed forces" within the UK because, "The qualities of the military and the resources they can bring to complex situations are important and useful." At the moment, she said, "We have to make do with whoever happens to be in the country."

OK. So take the four core elements of the national security strategy according to Admiral Lord Alan West, a Parliamentary under-secretary of state at the Home Office: pursue, protect, prepare, and prevent. "Prevent" is the one that all this is about. If we are in wartime, and we know that any measure that's brought in is only temporary, our tolerance for measures that violate the normal principles of democracy is higher.

Are the Olympics wartime? Security is already in the planning stages, although, as Tarique Ghaffur pointed out, the Games are one of several big events in 2012. And some events like sailing and Olympic football will be outside London, as will 600 training camps. Add in the torch relay, and it's national security.

And in that case, we should be watching very closely what gets brought in for the Olympics, because alongside the physical infrastructure that the Games always leave behind - the stadia and transport - may be a security infrastructure that we wouldn't necessarily have chosen for daily life.

As if the proposals in front of us aren't bad enough. Take for example, the clause of the counterterrorism bill (due for its second reading in the Lords next week) that would allow the authorities to detain suspects for up to 42 days without charge. Chakrabarti lamented the debate over this, which has turned into big media politics.

"The big frustration," she said, "is that alternatives created by sensible, proportionate means of early intervention are being ignored." Instead, she suggested, make the data legally collected by surveillance and interception admissible in fair criminal trials. Charge people with precursor terror offenses so they are properly remanded in custody and continue the investigation for the more serious plot. "That is a way of complying with ancient principles that you should know what you are accused of before being banged up, but it gives the police the time and powers they need."

Not being at war gives us the time to think. We should take it.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series. Readers are welcome to post here, at net.wars home, at her personal blog, or by email to netwars@skeptic.demon.co.uk (but please turn off HTML).