" /> net.wars: July 2011 Archives

« June 2011 | Main | August 2011 »

July 29, 2011

Name check

How do you clean a database? The traditional way - which I still experience from time to time from journalist directories - is that some poor schnook sits in an office and calls everyone on the list, checking each detail. It's an immensely tedious job, I'm sure, but it's a living.

The new, much cheaper method is to motivate the people in the database to do it themselves. A government can pass a law and pay benefits. Amazon expects the desire to receive the goods people have paid for to be sufficient. For a social network it's a little harder, yet Facebook has managed to get 750 million users to upload varying amounts of information. Google hopes people will do the same with Google+,

The emotional connections people make on social networks obscure their basic nature as databases. When you think of them in that light, and you remember that Google's chief source of income is advertising, suddenly Google's culturally dysfunctional decision to require real names on |Google+ makes some sense. For an advertising company,a fuller, cleaner database is more valuable and functional. Google's engineers most likely do not think in terms of improving the company's ability to serve tightly targeted ads - but I'd bet the company's accountants and strategists do. The justification - that online anonymity fosters bad behavior - is likely a relatively minor consideration.

Yet it's the one getting the attention, despite the fact that many people seem confused about the difference between pseudonymity, anonymity, and throwaway identity. In the reputation-based economy the Net thrives on, this difference matters.

The best-known form of pseudonymity is the stage name, essentially a form of branding for actors, musicians, writers, and artists, who may have any of a number of motives for keeping their professional lives separate from their personal lives: privacy for themselves, their work mates, or their families, or greater marketability. More subtly, if you have a part-time artistic career and a full-time day job you may not want the two to mix: will people take you seriously as an academic psychologist if they know you're also a folksinger? All of those reasons for choosing a pseudonym apply on the Net, where everything is a somewhat public performance. Given the harassment some female bloggers report, is it any wonder they might feel safer using a pseudonym?

The important characteristic of pseudonyms, which they share with "real names", is persistence. When you first encounter someone like GrrlScientist, you have no idea whether to trust her knowledge and expertise. But after more than ten years of blogging, that name is a known quantity. As GrrlScientist writes about Google's shutting down her account, it is her "real-enough" name by any reasonable standard. What's missing is the link to a portion of her identity - the name on her tax return, or the one her mother calls her. So what?

Anonymity has long been contentious on the Net; the EU has often considered whether and how to ban it. At the moment, the driving justification seems to be accountability, in the hope that we can stop people from behaving like malicious morons, the phenomenon I like to call the Benidorm syndrome.

There is no question that people write horrible things in blog and news site comments pages, conduct flame wars, and engage in cyber bullying and harassment. But that behaviour is not limited to venues where they communicate solely with strangers; every mailing list, even among workmates, has flame wars. Studies have shown that the cyber versions of bullying and harassment, like their offline counterparts, are most often perpetrated by people you know.

The more important downside of anonymity is that it enables people to hide, not their identity but their interests. Behind the shield, a company can trash its competitors and those whose work has been criticized can make their defense look more robust by pretending to be disinterested third parties.

Against that is the upside. Anonymity protects whistleblowers acting in the public interest, and protesters defying an authoritarian regime.

We have little data to balance these competing interests. One bit we do have comes from an experiment with anonymity conducted years ago on the WELL, which otherwise has insisted on verifying every subscriber throughout its history. The lesson they learned, its conferencing manager, Gail Williams, told me once, was that many people wanted anonymity for themselves - but opposed it for others. I suspect this principle has very wide applicability, and it's why the US might, say, oppose anonymity for Bradley Manning but welcome it for Egyptian protesters.

Google is already modifying the terms of what is after all still a trial service. But the underlying concern will not go away. Google has long had a way to link Gmail addresses to behavioral data collected from those using its search engine, docs, and other services. It has always had some ability to perform traffic analysis on Gmail users' communications; now it can see explicit links between those pools of data and, increasingly, tie them to offline identities. This is potentially far more powerful than anything Facebook can currently offer. And unlike government databases, it's nice and clean, and cheap to maintain.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series.

July 22, 2011

Face to face

When, six weeks or so back, Facebook implemented facial recognition without asking anyone much in advance, Tim O'Reilly expressed the opinion that it is impossible to turn back the clock and pretend that facial recognition doesn't exist or can be stopped. We need, he said, to stop trying to control the existence of these technologies and instead concentrate on controlling the uses to which collected data might be put.

Unless we're prepared to ban face recognition technology outright, having it available in consumer-facing services is a good way to get society to face up to the way we live now. Then the real work begins, to ask what new social norms we need to establish for the world as it is, rather than as it used to be.

This reminds me of the argument that we should be teaching creationism in schools in order to teach kids critical thinking: it's not the only, or even best, way to achieve the object. If the goal is public debate about technology and privacy, Facebook isn't a good choice to conduct it.

The problem with facial recognition, unlike a lot of other technologies, is that it's retroactive, like a compromised private cryptography key. Once the key is known you haven't just unlocked the few messages you're interested in but everything ever encrypted with that key. Suddenly deployed accurate facial recognition means the passers-by in holiday photographs, CCTV images, and old TV footage of demonstrations are all much more easily matched to today's tagged, identified social media sources. It's a step change, and it's happening very quickly after a long period of doesn't-work-as-hyped. So what was a low-to-moderate privacy risk five years ago is suddenly much higher risk - and one that can't be withdrawn with any confidence by deleting your account.

There's a second analogy here between what's happening with personal data and what's happening to small businesses with respect to hacking and financial crime. "That's where the money is," the bank robber Willie Sutton explained when asked why he robbed banks. But banks are well defended by large security departments. Much simpler to target weaker links, the small businesses whose money is actually being stolen. These folks do not have security departments and have not yet assimilated Benjamin Woolley's 1990s observation that cyberspace is where your money is. The democratization of financial crime has a more direct personal impact because the targets are closer to home: municipalities, local shops, churches, all more geared to protecting cash registers and collection plates than to securing computers, routers, and point-of-sale systems.

The analogy to personal data is that until relatively recently most discussions of privacy invasion similarly focused on celebrities. Today, most people can be studied as easily as famous, well-documented people if something happens to make them interesting: the democratization of celebrity. And there are real consequences. Canada, for example, is doing much more digging at the border, banning entry based on long-ago misdemeanors. We can warn today's teens that raiding a nearby school may someday limit their freedom to travel; but today's 40-somethings can't make an informed choice retroactively.

Changing this would require the US to decide at a national level to delete such data; we would have to trust them to do it; and other nations would have to agree to do the same. But the motivation is not there. Judith Rauhofer, at the online behavioral advertising workshop she organised a couple of weeks ago, addressed exactly this point when she noted that increasingly the mantra of governments bent on surveillance is, "This data exists. It would be silly not to use it."

The corollary, and the reason O'Reilly is not entirely wrong, is that governments will also say, "This *technology* exists. It would be silly not to use it." We can ban social networks from deploying new technologies, but we will still be stuck with it when it comes to governments and law enforcement. In this, govermment and business interests align perfectly.

So what, then? Do we stop posting anything online on the basis of the old spy motto "Never volunteer information", thereby ending our social participation? Do we ban the technology (which does nothing to stop the collection of the data)? Do we ban collecting the data (which does nothing to stop the technology)? Do we ban both and hope that all the actors are honest brokers rather than shifty folks trading our data behind our backs? What happens if thieves figure out how to use online photographs to break into systems protected by facial recognition?

One common suggestion is that social norms should change in the direction of greater tolerance. That may happen in some aspects, although Anders Sandberg has an interesting argument that transparency may in fact make people more judgmental. But if the problem of making people perfect were so easily solved we wouldn't have spent thousands of years on it with very little progress.

I don't like the answer "It's here, deal with it." I'm sure we can do better than that. But these are genuinely tough questions. The start, I think, has to be building as much user control into technology design (and its defaults) as we can. That's going to require a lot of education, especially in Silicon Valley.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series.

July 15, 2011

Dirty digging

The late, great Molly Ivins warns (in Molly Ivins Can't Say That, Can She?) about the risk to journalists of becoming "power groupies" who identify more with the people they cover than with their readers. In the culture being exposed by the escalating phone hacking scandals the opposite happened: politicians and police became "publicity groupies" who feared tabloid wrath to such an extent that they identified with the interests of press barons more than those of the constituents they are sworn to protect. I put the apparent inconsistency between politicians' former acquiescence and their current baying for blood down to Stockholm syndrome: this is what happens when you hold people hostage through fear and intimidation for a few decades. When they can break free, oh, do they want revenge.

The consequences are many and varied, and won't be entirely clear for a decade or two. But surely one casualty must have been the balanced view of copyright frequently argued for in this column. Murdoch's media interests are broad-ranging. What kind of copyright regime do you suppose he'd like?

But the desire for revenge is a really bad way to plan the future, as I said (briefly) on Monday at the Westminster Skeptics.

For one thing, it's clearly wrong to focus on News International as if Rupert Murdoch and his hired help were the only contaminating apple. In the 2006 report What price privacy now? the Information Commissioner listed 30 publications caught in the illegal trade in confidential information. News of the World was only fifth; number one, by a considerable way, was the Daily Mail (the Observer was number nine). The ICO wanted jail sentences for those convicted of trading in data illegally, and called on private investigators' professional bodies to revoke or refuse licenses to PIs who breach the rules. Five years later, these are still good proposals.

Changing the culture of the press is another matter.
When I first began visiting Britain in the late 1970s, I found the tabloid press absolutely staggering. I began asking the people I met how the papers could do it.

"That's because *we* have a free press," I was told in multiple locations around the country. "Unlike the US." This was only a few years after The Washington Post backed Bob Woodward and Carl Bernstein's investigation of Watergate, so it was doubly baffling.

Tom Stoppard's 1978 play Night and Day explained a lot. It dropped competing British journalists into an escalating conflict in a fictitious African country. Over the course of the play, Stoppard's characters both attack and defend the tabloid culture.

"Junk journalism is the evidence of a society that has got at least one thing right, that there should be nobody with power to dictate where responsible journalism begins," says the naïve and idealistic new journalist on the block.

"The populace and the popular press. What a grubby symbiosis it is," complains the play's only female character, whose second marriage - "sex, money, and a title, and the parrots didn't harm it, either" - had been tabloid fodder.

The standards of that time now seem almost quaint. In the movie Starsuckers, filmmaker Chris Atkins fed fabricated celebrity stories to a range of tabloids. All were published. That documentary also showed in action illegal methods of obtaining information. In 2009, right around the time The Press Complaints Commission was publishing a report concluding, "there is no evidence that the practice of phone message tapping is ongoing".

Someone on Monday asked why US newspapers are better behaved despite First Amendment protection and less constraint by onerous libel laws. My best guess is fear of lawsuits. Conversely, Time magazine argues that Britain's libel laws have encouraged illegal information gathering: publication requires indisputable evidence. I'm not completely convinced: the libel laws are not new, and economics and new media are forcing change on press culture.

A lot of dangers lurk in the calls for greater press regulation. Phone hacking is illegal. Breaking into other people's computers is illegal. Enforce those laws. Send those responsible to jail. That is likely to be a better deterrent than any regulator could manage.

It is extremely hard to devise press regulations that don't enable cover-ups. For example, on Wednesday's Newsnight, the MP Louise Mensch, head of the DCMS committee conducting the hearings, called for a requirement that politicians disclose all meetings with the press. I get it: expose too-cosy relationships. But whistleblowers depend on confidentiality, and the last thing we want is for politicians to become as difficult to access as tennis stars and have their contact with the press limited to formal press conferences.

Two other lessons can be derived from the last couple of weeks. The first is that you cannot assume that confidential data can be protected simply by access rules. The second is the importance of alternatives to commercial, corporate journalism. Tom Watson has criticized the BBC for not taking the phone hacking allegations seriously. But it's no accident that the trust-owned Guardian was the organization willing to take on the tabloids. There's a lesson there for the US, as the FBI and others prepare to investigate Murdoch and News Corp: keep funding PBS.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series.

July 8, 2011

The grey hour

There is a fundamental conundrum that goes like this. Users want free information services on the Web. Advertisers will support those services if users will pay in personal data rather than money. Are privacy advocates spoiling a happy agreement or expressing a widely held concern that just hasn't found expression yet? Is it paternalistic and patronizing to say that the man on the Clapham omnibus doesn't understand the value of what he's giving up? Is it an expression of faith in human nature to say that on the contrary, people on the street are smart, and should be trusted to make informed choices in an area where even the experts aren't sure what the choices mean? Or does allowing advertisers free rein mean the Internet will become a highly distorted, discriminatory, immersive space where the most valuable people get the best offers in everything from health to politics?

None of those questions are straw men. The middle two are the extreme end of the industry point of view as presented at the Online Behavioral Advertising Workshop sponsored by the University of Edinburgh this week. That extreme shouldn't be ignored; Kimon Zorbas from the Internet Advertising Bureau, who voiced those views, also genuinely believes that regulating behavioral advertising is a threat to European industry. Can you prove him wrong? If you're a politician intent on reelection, hear that pitch, and can't document harm, do you dare to risk it?

At the other extreme end are the views of Jeff Chester, from the Center for Digital Democracy, who laid out his view of the future both here and at CFP a few weeks ago. If you read the reports the advertising industry produces for its prospective customers, they're full of neuroscience and eyeball tracking. Eventually, these practices will lead, he argues, to a highly discriminatory society: the most "valuable" people will get the best offers - not just in free tickets to sporting events but the best access to financial and health services. Online advertising contributed to the subprime loan crisis and the obesity crisis, he said. You want harm?

It's hard to assess the reality of Chester's argument. I trust his research through the documents of what advertising companies tell their customers. What isn't clear is whether the neuroscience these companies claim actually works. Certainly, one participant here says real neuroscientists heap scorn on the whole idea - and I am old enough to remember the mythology surrounding subliminal advertising.

Accordingly, the discussion here seems to me less of a single spectrum and more like a triangle, with the defenders of online behavioural advertising at one point, Chester and his neuroscience at another, and perhaps Judith Rauhofer, the workshop's organizer, at a third, with a lot of messy confusion in the middle. Upcoming laws, such as the revision of the EU ePrivacy Directive and various other regulatory efforts, will have to create some consensual order out of this triangular chaos.

The fourth episode of Joss Whedon's TV series Dollhouse, "The Gray Hour", had that week's characters enclosed inside a vault. They have an hour to accomplish their mission of theft which is the time between the time it takes for the security system to reboot. Is this online behavioral advertising's grey hour? Their opportunity to get ahead before we realize what's going on?

A persistent issue is definitely technology design.

One of Rauhofer's main points is that the latest mantra is, "This data exists, it would be silly not to take advantage of it." This is her answer to one of those middle points, that we should not be regulating collection but simply the use of data. This view makes sense to me: no one can abuse data that has not been collected. What does a privacy policy mean when the company that is actually collecting the data and compiling profiles is completely hidden?
One help would be teaching computer science students ethics and responsible data practices. The science fiction writer Charlie Stross noted the other day that the average age of entrepreneurs in the US is roughly ten years younger than in the EU. The reason: health insurance. Isn't is possible that starting up at a more mature age leads to a different approach to the social impact of what you're selling?

No one approach will solve this problem within the time we have to solve it. On the technology side, defaults matter. The "software choice architect" of researcher Chris Soghoian is rarely the software developer, more usually the legal or marketing department. The three of the biggest browser manufacturers who are most funded by advertising not-so-mysteriously have the least privacy-friend default settings. Advertising is becoming an arms race: first cookies, then Flash cookies, now online behavioral advertising, browser fingerprinting, geolocation, comprehensive profiling.

The law also matters. Peter Hustinx, lecturing last night believes existing principles are right; they just need stronger enforcement and better application.

Consumer education would help - but for that to be effective we need far greater transparency from all these - largely American - companies.

What harm can you show has happened? Zorbas challenged. Rauhofer's reply: you do not have to prove harm when your house is bugged and constantly wiretapped. "That it's happening is the harm."

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series.

July 1, 2011

Free speech, not data

Congress shall make no law...abridging the freedom of speech...

Is data mining speech? This week, in issuing its ruling in the case of IMS Health v Sorrell, the Supreme Court of the United States took the view that it can be. The majority (6-3) opinion struck down a Vermont law that prohibited drug companies from mining physicians' prescription data for marketing purposes. While the ruling of course has no legal effect outside the US, the primary issue in the case - the use of aggregated patient data - is being considered in many countries, including the UK, and the key technical debate is relevant everywhere.

IMS Health is a new species of medical organization: it collects aggregated medical data and mines it for client pharmaceutical companies, who use the results to determine their strategies for marketing to doctors. Vermont's goal was to save money by encouraging doctors to prescribe lower-cost generic medications. The pharmaceutical companies know, however, that marketing to doctors is effective. IMS Health accordingly sued to get the law struck down, claiming that the law abrogated the company's free speech rights. NGOs from the digital - EFF and EPIC - to the not-so-digital - AARP, - along with a host of medical organizations, filed amicus briefs arguing that patient information is confidential data that has never before been considered to fall within "free speech". The medical groups were concerned about the threat to trust between doctors and patients; EPIC and EFF added the more technical objection that the deidentification measures taken by IMS Health are inadequate.

At first glance, the SCOTUS ruling is pretty shocking. Why can't a state protect its population's privacy by limiting access to prescription data? How do marketers have free speech?

The court's objection - or rather, the majority opinion - was that the Vermont law is selective: it prohibits the particular use of this data for marketing but not other uses. That, to the six-judge majority, made the law censorship. The three remaining judges dissented, partly on privacy grounds, but mostly on the well-established basis that commercial speech typically enjoys a lower level of First Amendment protection than non-commercial speech.

When you are talking about traditional speech, censorship means selectively banning a type or source of content. Let's take Usenet in the early 1990s as an example. When spam became a problem, a group of community-minded volunteers devised cancellation practices that took note of this principle and defined spam according to the behavior involved in posting it. Deciding a particular posting was spam requires no subjective judgments about who posted the message or whether it was a commercial ad. Instead, postings are scored against a bunch of published, objective criteria: x number of copies, posted to y number of newsgroups, over z amount of time., or off-topic for that particular newsgroup, or a binary file posted to a text-only newsgroup. In the Vermont case, if you can accept the argument that data mining is speech, as SCOTUS did, then the various uses of the data are content and therefore a law that bans only one of many possible uses or bans use by specified parties is censorship.

The decision still seems intuitively wrong to me, as it apparently also did to the three remaining judges, who wrote a dissenting opinion that instead viewed the Vermont law as an attempt to regulate commercial activity, something that has never been covered by the First Amendment.

But note this: the concern for patient privacy that animated much of the interest in this case was only a bystander (which must surely have pleased the plaintiffs).

Obscured by this case, however, is the technical question that should be at the heart of such disputes (several other states have passed Vermont-style laws): how effectively can data be deidentified? If it can be easily reidentified and linked to specific patients, making it available for data mining ends medical privacy. If it can be effectively anonymized, then the objections go away.

At this year's Computers, Freedom, and Privacy there was some discussion of this issue; an IMS Health representative and several of the experts EPIC cited in its brief were present and disagreeing. Khaled El Emam, from the University of Ottawa, filed a brief (PDF) opposing EPIC's analysis; Latanya Sweeney, who did the seminal work in this area in the early 2000s, followed with a rebuttal. From these, my non-expert conclusion is that just as you cannot trust today's secure cryptographic system to remain unbreakable for the future as computing power continues to increase in speed and decrease in price, you cannot trust today's deidentification to remain robust against the increasing masses of data available for matching to it.

But it seems the technical and privacy issues raised by the Vermont case are yet to be decided. Vermont is free to try again to frame a law that has the effect the state wants but takes a different approach. As for the future of free speech, it seems clear that it will encompass many technological artefacts still being invented - and that it will be quite a fight to keep it protecting individuals instead of, increasingly, commercial enterprises.


Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series.