Runaway
For me, the scariest presentation of 2019 was a talk given by Cornell University professor Vitaly Shmatikov about computer models. It's partly a matter of reframing the familiar picture; for years, Bill Smart and Cindy Grimm have explained to attendees at We Robot that we don't necessarily really know what it is that neural nets are learning when they're deep learning.
In Smart's example, changing a few pixels in an image can change the machine learning algorithm's perception of it from "Abraham Lincoln" to "zebrafish". Misunderstanding what's important to an algorithm is the kind of thing research scientist Janelle Shane exploits when she pranks neural networks and asks them to generate new recipes or Christmas carols from a pile of known examples. In her book, You Look Like a Thing and I Love You, she presents the inner workings of many more examples.
All of this explains why researchers Kate Crawford and Trevor Paglen's ImageNet Roulette experiment tagged my my Twitter avatar as "the Dalai Lama". I didn't dare rerun it, because how can you beat that? The experiment over, would-be visitors are now redirected to Crawford's and Paglen's thoughtful examination of the problems they found in the tagging and classification system that's being used in training these algorithms.
Crawford and Paglen write persuasively about the world view captured by the inclusion of categories such as "Bad Person" and "Jezebel" - real categories in the Person classification subsystem. The aspect has gone largely unnoticed until now because conference papers focused on the non-human images in ten-year-old ImageNet and its fellow training databases. Then there is the *other* problem, that the people's pictures used to train the algorithm were appropriated from search engines, photo-sharing sites such as Flickr, and video of students walking their university campuses. Even if you would have approved the use of your forgotten Flickr feed to train image recognition algorithms, I'm betting you wouldn't have agreed to be literally tagged "loser" so the algorithm can apply that tag later to a child wearing sunglasses. Why is "gal" even a Person subcategory, still less the most-populated one? Crawford and Paglen conclude that datasets are "a political intervention". I'll take "Dalai Lama", gladly.
Again, though, all of this fits with and builds upon an already known problem: we don't really know which patterns machine learning algorithms identify as significant. In his recent talk to a group of security researchers at UCL, however, Shmatikov, whose previous work includes training an algorithm to recognize faces despite obfuscation, outlined a deeper problem: these algorithms "overlearn". How do we stop them from "learning" (and then applying) unwanted lessons? He says we can't.
"Organically, the model learns to recognize all sorts of things about the original data that were not intended." In his example, in training an algorithm to recognize gender using a dataset of facial images, alongside it will learn to infer race, including races not represented in the training dataset, and even identities. In another example, you can train a text classifier to infer sentiment - and the model also learns to infer authorship.
Options for counteraction are limited. Censoring unwanted features doesn't work because a) you don't know what to censor; b) you can't censor something that isn't represented in the training data; and c) that type of censoring damages the algorithm's accuracy on the original task. "Either you're doing face analysis or you're not." Shmatikov and Congzheng Song explain their work more formally in their paper Overlearning Reveals Sensitive Attributes.
"We can't really constrain what the model is learning," Shmatikov told a group of security researchers at UCL recently, "only how it is used. It is going to be very hard to prevent the model from learning things you don't want it to learn." This drives a huge hole through GDPR, which relies on a model of meaningful consent. How do you consent to something no one knows is going to happen?
What Shmatikov was saying, therefore, is that from a security and privacy point of view, the typical question we ask, "Did the model learn its task well?", is too limited. "Security and privacy people should also be asking: what else did the model learn?" Some possibilities: it could have memorized the training data; discovered orthogonal features; performed privacy-violating tasks; or incorporated a backdoor. None of these are captured in assessing the model's accuracy in performing the assigned task.
My first reaction was to wonder whether a data-mining company like Facebook could use Shmatikov's explanation as an excuse when it's accused of allowing its system to discriminate against people - for example, in digital redlinining. Shmatikov thought not, at least, not more than their work helps people find out what their models are really doing.
"How to force the model to discover the simplest possible representation is a separate problem worth invdstigating," he concluded.
So: we can't easily predict what computer models learn when we set them a task involving complex representations, and we can't easily get rid of these unexpected lessons while retaining the usefulness of the models. I was not the only person who found this scary. We are turning these things loose on the world and incorporating them into decision making without the slightest idea of what they're doing. Seriously?
Illustrations: Vitaly Shmatikov (via Cornell).
Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.