« False economy | Main | Well may the bogeyman come »

Mona Lisa smile

Mona Lisa - cropped for net.wars.jpgA few weeks ago, Zoom announced that it intends to add emotion detection technology to its platform. According to Mark DeGeurin at Gizmodo, in response, 27 human rights groups from across the world, led by Fight for the Future, have sent an open letter demanding that the company abandon this little plan, calling the software "invasive" and "inherently biased". On Twitter, I've seen it called "modern phrenology"; a deep insult for those who remember the pseudoscience of studying the bumps on people's heads to predict their personalities.

It's an insult, but it's not really wrong. In 2019, Angela Chen at MIT Technology Review highlighted a study showing that facial expressions on their own are a poor guide to what someone is feeling. Cultures, context, personal style all affect how we present ourselves, and the posed faces AI developers use as part of their training of machine learning systems are even worse indicators, since few of us really how our faces look under the influence of different emotions. In 2021, Kate Crawford, author of Atlas of AI, used the same study to argue in The Atlantic that the evidence that these systems work at all is "shaky".

Nonetheless, Crawford goes on to report, this technology is being deployed in hiring systems and added into facial recognition. A few weeks ago, Kate Kaye reported at Protocol that Intel and virtual school software provider Classroom Technologies are teaming up to offer a version that runs on top of Zoom.

Cue for a bit of nostalgia: I remember the first time I heard of someone proposing computer emotion detection over the Internet. It was the late 1990s, and the source - or the perpetrator, depending on your point of view, was Rosalind Picard at the MIT Media Lab. Her book on the subject, Affective Computing, came out in 1997.

Picard's main idea was that to be truly intelligent - or at least, seem that way to us - computers would have to learn to recognize emotions and produce appropriate responses. One of the potential applications I remember hearing about was online classrooms, where the software could monitor students' expressions for signs of boredom, confusion, or distress and alert the teacher - exactly what Intel and Classroom Technologies want to do now. I remember being dubious: shouldn't teachers be dialed in on that sort of thing? Shouldn't they know their students well enough to notice? OK, remote, over a screen, maybe dozens or hundreds of students at a time...not so easy.... (Of course, the expensive schools offer mass online education schemes to exploit their "brands", but they still keep the small, in-person classes that creates those "brands" by churning out prime ministers and Silicon Valley dropouts.)

That wasn't Picard's main point, of course. In a recent podcast interview, she explains her original groundbreaking insight: that computers need to have emotional intelligence in order to make them less frustrating for us to deal with. If computers can capture the facial expressions we choose to show, the changes in our vocal tones, our gestures and muscle tension, perhaps they can respond more appropriately - or help humans to do so. Twenty-five years later, the ideas in Picard's work are now in use in media companies, ad agencies, and call centers - places where computer-human communication happens.

It seems a doubtful proposition. Humans learn from birth to read faces, and even we have argued for centuries over the meaning of the expression on the face of the Mona Lisa.

In 1997, Picard did not foresee the creepiness and giant technology exploiters. It's hard to know whether to be more alarmed about the technology's inaccuracy or its potential improvement. While it's inaccurate and biased, the dangers are the consequences of mistakes in interpretation; a student marked "inattentive", for example, may be penalized in their grade. But improving and debiasing the technology opens the way for fine-tuned manipulation and far more pervasive and intimate surveillance as it becomes embedded in every company, every conference, every government agency, every doctor's office, all of law enforcement. Meanwhile, the technological imperative of improving the system will require the collection of more and more data: body movements, heart rates, muscle tension, posture, gestures, surroundings.

I'd like to think that by this time we are smarter about how technology can be abused. I'm sure many of Zoom's corporate clients want emotion recognition technology; as in so many other cases, we are pawns because we're largely not the ones paying the bills or making the choice of platform. There's an analogy here to Elon Musk's negotiations with Twitter shareholders; the millions who use the service every day and find it valuable have no say in what will happen to it. If Zoom adopts emotion recognition, how long before law enforcement starts asking for user data in order to feed it into predictive policing systems? One of this week's more startling revelations was Aaron Gordon's report at Vice that San Francisco police are using driverless cars as mobile surveillance cameras, taking advantage of the fact that they are continuously recording their surroundings.

Sometimes the only way to block abuse of technology is to retire the idea entirely. If you really want to know what I'm thinking and feeling, just ask. I promise I'll tell you.

Illustrations: The emotional enigma that is the Mona Lisa.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.


TrackBack URL for this entry:

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)