Free speech, not data
Is data mining speech? This week, in issuing its ruling in the case of IMS Health v Sorrell, the Supreme Court of the United States took the view that it can be. The majority (6-3) opinion struck down a Vermont law that prohibited drug companies from mining physicians' prescription data for marketing purposes. While the ruling of course has no legal effect outside the US, the primary issue in the case - the use of aggregated patient data - is being considered in many countries, including the UK, and the key technical debate is relevant everywhere.
IMS Health is a new species of medical organization: it collects aggregated medical data and mines it for client pharmaceutical companies, who use the results to determine their strategies for marketing to doctors. Vermont's goal was to save money by encouraging doctors to prescribe lower-cost generic medications. The pharmaceutical companies know, however, that marketing to doctors is effective. IMS Health accordingly sued to get the law struck down, claiming that the law abrogated the company's free speech rights. NGOs from the digital - EFF and EPIC - to the not-so-digital - AARP, - along with a host of medical organizations, filed amicus briefs arguing that patient information is confidential data that has never before been considered to fall within "free speech". The medical groups were concerned about the threat to trust between doctors and patients; EPIC and EFF added the more technical objection that the deidentification measures taken by IMS Health are inadequate.
At first glance, the SCOTUS ruling is pretty shocking. Why can't a state protect its population's privacy by limiting access to prescription data? How do marketers have free speech?
The court's objection - or rather, the majority opinion - was that the Vermont law is selective: it prohibits the particular use of this data for marketing but not other uses. That, to the six-judge majority, made the law censorship. The three remaining judges dissented, partly on privacy grounds, but mostly on the well-established basis that commercial speech typically enjoys a lower level of First Amendment protection than non-commercial speech.
When you are talking about traditional speech, censorship means selectively banning a type or source of content. Let's take Usenet in the early 1990s as an example. When spam became a problem, a group of community-minded volunteers devised cancellation practices that took note of this principle and defined spam according to the behavior involved in posting it. Deciding a particular posting was spam requires no subjective judgments about who posted the message or whether it was a commercial ad. Instead, postings are scored against a bunch of published, objective criteria: x number of copies, posted to y number of newsgroups, over z amount of time., or off-topic for that particular newsgroup, or a binary file posted to a text-only newsgroup. In the Vermont case, if you can accept the argument that data mining is speech, as SCOTUS did, then the various uses of the data are content and therefore a law that bans only one of many possible uses or bans use by specified parties is censorship.
The decision still seems intuitively wrong to me, as it apparently also did to the three remaining judges, who wrote a dissenting opinion that instead viewed the Vermont law as an attempt to regulate commercial activity, something that has never been covered by the First Amendment.
But note this: the concern for patient privacy that animated much of the interest in this case was only a bystander (which must surely have pleased the plaintiffs).
Obscured by this case, however, is the technical question that should be at the heart of such disputes (several other states have passed Vermont-style laws): how effectively can data be deidentified? If it can be easily reidentified and linked to specific patients, making it available for data mining ends medical privacy. If it can be effectively anonymized, then the objections go away.
At this year's Computers, Freedom, and Privacy there was some discussion of this issue; an IMS Health representative and several of the experts EPIC cited in its brief were present and disagreeing. Khaled El Emam, from the University of Ottawa, filed a brief (PDF) opposing EPIC's analysis; Latanya Sweeney, who did the seminal work in this area in the early 2000s, followed with a rebuttal. From these, my non-expert conclusion is that just as you cannot trust today's secure cryptographic system to remain unbreakable for the future as computing power continues to increase in speed and decrease in price, you cannot trust today's deidentification to remain robust against the increasing masses of data available for matching to it.
But it seems the technical and privacy issues raised by the Vermont case are yet to be decided. Vermont is free to try again to frame a law that has the effect the state wants but takes a different approach. As for the future of free speech, it seems clear that it will encompass many technological artefacts still being invented - and that it will be quite a fight to keep it protecting individuals instead of, increasingly, commercial enterprises.