December 30, 2022

Lost for words

Privacy advocacy has the effect of making you hyper-conscious of the exponentially increasing supply of data. All sorts of activities that used to leave little or no trace behind now create a giant stream of data exhaust, from interactions with friends (now captured by social media companies), to TV viewing (now captured by streaming services and cable companies), where and when we travel (now captured by credit card companies, municipal smart card systems, and facial recognition-equipped cameras), and everything we buy (unless you use cash). And then there are the vast amounts of new forms of data being gathered by the sensors attached to Internet of Things devices and, increasingly, more intimate devices, such as medical implants.

And yet. In a recent paper (PDF) that Tammy Xu summarizes at MIT Technology Review, the EPOCH AI research and forecasting unit argues that we are at risk of running out of a particular kind of data: the stuff we use to train large language models. More precisely, the stock of data deemed suitable for use in language training datasets is growing more slowly than the size of the datasets these increasingly large and powerful models require for training. The explosion of privacy-invasive, mechanically captured data mentioned above doesn't help with this problem; it can't help train what today passes for "artificial intelligence" to improve its ability to generate content that reads like it could have been written by a sentient human.
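The mismatch described above, a stock that grows more slowly than the demand placed on it, can be made concrete with a toy calculation. This is only a sketch: the function name and every number in it are hypothetical illustrations, not figures or methods from the paper.

```python
def years_until_exhaustion(stock, demand, stock_growth, demand_growth, max_years=100):
    """Return the first whole year in which demand meets or exceeds stock.

    All inputs are hypothetical: `stock` and `demand` are in arbitrary units,
    and the growth rates are annual fractions (0.07 means 7% a year).
    Returns None if demand never catches up within `max_years`.
    """
    for year in range(max_years + 1):
        if demand >= stock:
            return year
        # Both quantities compound annually at their own rates.
        stock *= 1 + stock_growth
        demand *= 1 + demand_growth
    return None


# Made-up example: the stock starts 100 times larger than the demand,
# but grows at 7% a year while demand grows at 50% a year.
years = years_until_exhaustion(stock=1000.0, demand=10.0,
                               stock_growth=0.07, demand_growth=0.50)
print(years)
```

The point of the sketch is only that a large head start does not last long when the two growth rates differ this much; with equal growth rates the function returns None.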

So in this one sense the much-debunked saw that "data is the new oil" is truer than its proponents thought. Like those who draw water from aquifers or deplete oil reserves, data miners have been relying on capital resources that have taken eras to build up and that can only be replenished over similar time scales. We professional writers produce new "high-quality" texts too slowly.

As Xu explains, "high-quality" in this context generally means things like books, news articles, scientific papers, and Wikipedia pages - that is, the kind of prose researchers want their models to copy. Wikipedia's English language section makes up only 0.6% of GPT-3 training data. "Low-quality" is all the other stuff we all churn out: social media postings, blog postings, web board comments, and so on. There is of course vastly more of this (and some of it is, we hope, high-quality).

The paper's authors estimate that the high-quality text modelers prefer could be exhausted by 2026. Images, which are produced at higher rates, will take longer to exhaust - lasting to perhaps between 2030 and 2040. The paper considers three options for slowing exhaustion: broaden the standard for acceptable quality; find new sources; and develop more data-efficient solutions for training algorithms. Pursuing the fossil fuel analogy, I guess the equivalents might be: turning to techniques such as fracking to extract usable but less accessible fossil fuels, developing alternative sources such as renewables, and increasing energy efficiency. As in the energy sector, we may need to do all three.

I suppose paying the world's laid-off and struggling professional writers to produce text to feed the training models can't form part of the plan?

The first approach might have some good effects by increasing the diversity of training data. The same is true of the second, although using AI-generated text (synthetic data) to train the model seems as recursive as using an algorithm to highlight trends to tempt users. Is there anything real in there?

Regarding the third... It's worth remembering the 2020 paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (the paper over which Google apparently fired AI ethics team leader Timnit Gebru). In this paper (and a FAccT talk), Gebru, Emily M. Bender, Angelina McMillan-Major, and Shmargaret Shmitchell outlined the escalating environmental and social costs of increasingly large language models and argued that datasets needed to be carefully curated and documented, and tailored to the circumstances and context in which the model was eventually going to be used.

As Bender writes at Medium, there's a significant danger that humans reading the language generated by systems like GPT-3 may *believe* it's the product of a sentient mind. At IAI News, she and Chirag Shah call text generators like GPT-3 dangerous because they have no understanding of meaning even as they spit out coherent answers to user questions in natural language. That is, these models can spew out plausible-sounding nonsense at scale; in 2020, Renée DiResta predicted at The Atlantic that generative text will provide an infinite supply of disinformation and propaganda.

This is humans finding patterns even where they don't exist: all the language model does is make a probabilistic guess about the next word based on statistics derived from the data it's been trained on. It has no understanding of its own results. As Ben Dickson puts it at TechTalks as part of an analysis of the workings of the language model BERT, "Coherence is in the eye of the beholder." On Twitter, Bender quipped that a good new name would be PSEUDOSCI (for Pattern-matching by Syndicate Entities of Uncurated Data Objects, through Superfluous (energy) Consumption and Incentives).
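The "probabilistic guess about the next word" can be illustrated with a deliberately tiny sketch. It assumes a bigram model, which simply counts which word follows which in a made-up corpus; real systems like GPT-3 use large neural networks over tokens rather than count tables, so this is an analogy and not how those models are built. All names here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn raw follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()} if total else {}


def generate(counts, start, length, seed=0):
    """Emit up to `length` words by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        probs = next_word_probabilities(counts, output[-1])
        if not probs:  # the last word was never followed by anything
            break
        candidates, weights = zip(*probs.items())
        output.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(output)


# Made-up training text, chosen only for illustration.
corpus = "the parrot repeats the phrase and the parrot repeats the sound"
model = train_bigrams(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
print(generate(model, "the", 5))
```

Nothing in the table records what a parrot or a phrase is; the output can look fluent only because the statistics of the training text were fluent.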

If running out of training data means a halt on improving the human-like quality of language generators' empty phrases, that may not be such a bad thing.

Illustrations: Drunk parrot (taken by Simon Bisson).

December 23, 2022

An inherently adverse environment

Earlier this year, I wrote a short story/provocation for the recent book 22 Ideas About the Future. My story imagined a future in which the British central government had undermined local authorities by allowing local communities to opt out and contract for their own services. One of the consequences was to carve London up into tiny neighborhoods, each with its own rules and sponsorships, making it difficult to plot a joined-up route across town. Like an idiot, I entirely overlooked the role facial recognition would play in such a scenario. Community blocs like these, some openly set up to exclude unwanted diversity, would absolutely grab at facial recognition to repel - or charge - unwelcome outsiders.

Most discussion of facial recognition to date has focused on privacy: that it becomes impossible to move around public spaces without being identified and tracked. We haven't thought enough about the potential use of facial recognition to underpin a broad permission-based society in which our presence in any space can be detected and terminated at any time. In such a society, we are all migrants.

That particular unwanted dystopian future is upon us. This week, we learned that a New Jersey lawyer was blocked from attending the Radio City Music Hall Christmas show with her daughter because the venue's facial recognition system identified her as a member of a law firm involved in litigation against Radio City's owner, MSG Entertainment. Security denied her entry, despite her protests that she was not involved in the litigation. Whether she was or wasn't shouldn't really matter; she had committed no crime, she was causing no disturbance, she was granted no due process, and she had no opportunity for redress.

Soon after she told her story, a second instance emerged: a male lawyer was blocked from attending a New York Knicks basketball game at Madison Square Garden. Then, quickly, a third: a woman and her husband were removed from their seats at a Brandi Carlile concert, also at Madison Square Garden.

MSG later explained that litigation creates "an inherently adverse environment". I read that this way: the company has chosen to use developing technology in an abusive display of power. In other words, MSG is treating its venues as if they were the new-style airports Edward Hasbrouck has detailed, also covered here a few weeks back. In its original context, airport thinking is bad enough; expanded to the world's many privately-owned public venues, the potential is terrifying.

Early adopters of sharing data to exclude bad people talked about barring known shoplifters from chains of pubs or supermarkets, or catching and punishing criminals much more quickly. The MSG story means the mission has crept from "terrorist" to "don't like their employer" at unprecedented speed.

The right to navigate the world without interference is one privileged folks have taken for granted. With some exceptions: in England, the right to ramble all parts of the countryside took more than a century to codify into law. To an American, exclusion from a public venue *feels* like it should be a Constitutional issue - but of course it's not, since the affected venues are owned by a private company. In the reactions I've seen to the MSG stories, people have called for a ban on live facial recognition. By itself that's probably not going to be enough, now that this compost heap of worms has been opened; we are going to need legislation to underpin the right to assemble in privately-owned public spaces. Such a right sort of exists already in the conditions baked into many relevant local licensing laws that require venue operators to be the real-world equivalent of common carriers in telecommunications, who are not allowed to pick and choose whose data they will carry.

In a fourth MSG incident, a lawyer who is suing Madison Square Garden for barring him from entering tricked the cameras at the MSG-owned Beacon Theater by disguising himself with a beard and a baseball cap. He didn't exactly need to, as his company had won a restraining order requiring MSG to let its lawyers into its venues (the case continues).

In that case, MSG's lawyer told the court that barring opposition lawyers was essential to protect the company: "It's not feasible for any entertainment venue to operate any other way."

Since when? At the New York Times, Kashmir Hill explains that the company adopted this policy last summer and depends on the photos displayed on law firms' websites to feed into its facial recognition to look for matches. But really the answer can only be: since the technology became available to enforce such a ban. It is a clear case where the availability of a technology leads to worse behavior on the part of its owner.

In 1996, the software engineer turned essayist and novelist Ellen Ullman wrote about exactly this with respect to databases: they infect their owners with the desire to use their new capabilities. In one of her examples, a man suddenly realized he could monitor what his long-trusted secretary did all day. In another, a system to help ensure AIDS patients were getting all the benefits they were entitled to slowly morphed into a system for checking entitlement. In the case of facial recognition, its availability infinitely extends the British Tories' concept of the hostile environment.

Illustrations: The Rockettes performing in 2008 (via skividal at Wikimedia).

December 16, 2022

A garden of snakes

It's hard to properly enjoy I-told-you-so schadenfreude when you know, from Juan Vargas (D-CA)'s comments this week, that disproportionately the people most affected by the latest cryptocurrency collapse are those who can least afford it. What began as a cultish libertarian desire to bypass the global financial system became a vector for wild speculation, and is now the heart of a series of collapsing frauds.

From the beginning, I've called bitcoin and its sequels "the currency equivalent of being famous for being famous". Crypto(currency) fans like to claim that the world's fiat currencies don't have any underlying value either, but those are backed by the full faith and credit of governments and economies. Logically, crypto appeals most to those with the least reason to trust their governments: the very rich who resent paying taxes and those who think they have nothing to lose.

This week the US House and Senate both held hearings on the collapse of cryptocurrency exchange and hedge fund FTX and its deposed, arrested, and charged CEO Sam Bankman-Fried. The key lesson: we can understand the main issues surrounding FTX and its fellow cryptocurrency exchanges without understanding either the technical or financial intricacies.

A key question is whether the problem is FTX or the entire industry. Answers largely split along partisan lines. Republican members chose FTX, and tended to blame Securities and Exchange Commission chair Gary Gensler. Democrats were more likely to condemn the entire industry.

As Jesús G. "Chuy" García (D-IL) put it, "FTX is not an anomaly. It's not just one corrupt guy stealing money, it's an entire industry that refuses to comply with existing regulation that thinks it's above the law." Or, per Brad Sherman (D-CA), "My fear is that we'll view Sam Bankman-Fried as just one big snake in a crypto garden of Eden. The fact is, crypto is a garden of snakes."

When Sherrod Brown (D-OH) asked whether FTX-style fraud existed at other crypto firms, all four expert speakers said yes.

Related is the question of whether and how to regulate crypto, which begins with the problem of deciding whether crypto assets are securities under the decades-old Howey test. In its ongoing suit against Ripple, Gensler's SEC argues for regulation as securities. Lack of regulation has enabled crypto "innovation" - and let it recreate practices long banned in traditional financial markets. For an example see Ben McKenzie's and Jacob Silverman's analysis of leading crypto exchange Binance's endemic conflicts of interest and the extreme risks it allows customers to take that are barred under securities regulations.

Regulation could correct some of this. McKenzie gave the Senate committee numbers: fraudulent financier Bernie Madoff had 37,000 clients; FTX had 32 times that in the US alone. The collective lost funds of the hundreds of millions of victims worldwide could be ten times bigger than Madoff's.

But: would regulating crypto clean up the industry or lend it legitimacy it does not deserve? Skeptics ask this about alt-med practitioners.

Some background. As software engineer Stephen Diehl explains in his new book, Popping the Crypto Bubble, securities are roughly the opposite of money. What you want from money is stability; sudden changes in value spark cost-of-living crises and economic collapse. For investors, stability is the enemy: they want investments' value to go up. The countervailing risk is why the SEC requires companies offering securities to publish sufficient truthful information to enable investors to make a reasonable assessment.

In his book, Diehl compares crypto to previous bubbles: the Internet, tulips, the railways, the South Sea. Some, such as the Internet and the railways, cost early investors fortunes but leave behind valuable new infrastructure and technologies on which vast new industries are built. Others, like tulips, leave nothing of new value. Diehl, like other skeptics, believes cryptocurrencies are like tulips.

The idea of digital cash was certainly not new in 2008, when "Satoshi" published their seminal paper on bitcoin; the earliest work is usually attributed to David Chaum, whose 1982 dissertation contained the first known proposal for a blockchain protocol, and who proposed digital cash in a 1983 paper and set up a company to commercialize digital cash in 1990 - way too early. Crypto's ethos came from the cypherpunks mailing list, which was founded in 1992 and explored the idea of using cryptography to build a new global financial system.

Diehl connects the reception of Satoshi's paper to its timing, just after the 2007-2008 financial crisis. There's some logic there: many have never recovered.

For a few years in the mid-2010s, a common claim was that cryptocurrencies were bubbles but the blockchain would provide enduring value. Notably disagreeing was Michael Salmony, who startled the 2016 Tomorrow's Transactions Forum by saying the blockchain was a technology in search of a solution. Last week, IBM and Maersk announced they are shutting down their enterprise blockchain because, Dan Robinson writes at The Register, despite the apparently ideal use case, they couldn't attract industry collaboration.

More recently we've seen the speculative bubble around NFTs, but otherwise we've heard only about their wildly careening prices in US dollars and the amount of energy mining them consumes. Until this year, when escalating crashes and frauds are taking over. Distrust does not build value.

Illustrations: The Warner Brothers coyote, realizing he's standing on thin air.

December 9, 2022


Say you haven't moved (house) in 30 years without saying you haven't moved in 30 years. Or say you're over 50 without saying you're over 50. "I just pulled a big box of wires out of my house."

What began as a project to turn the attic (loft) into more usable space has metastasized all over the house (apartment), as every crowded corner gets reevaluated. Behind every piece of furniture, some being moved for the first time since 1991, lurk wires. Wires of all kinds. Speaker wire that ran to the amplifier down the hall. TV cables connecting various items - computer, DVD player, the VCR I can't throw out until all the tapes are gone. Ethernet cables, because wired connections are more stable. Telephone cables running to remote extensions that were replaced with DECT phones 15 years ago. A weird, extraordinarily thin wire for a device called a Rabbit that once connected the TV in my office to the cable box in the living room; an infrared sender even let you change channels. All of the cable box, the Rabbit, and the TV are long gone, but the wire lives on because it runs behind furniture that has settled too deeply into the carpet to move. Even now, I haven't got it all out. And, because this apartment (flat) has just one single electrical outlet per room, multi-way extension cords and plugs *everywhere*.

The phone, stereo system, and TV cabling went in first. Layered on top of all that was an ethernet network that accreted over time to serve various computers in odd locations. There was an extra wifi router in the living room because the original one's wifi didn't reach the kitchen. And so on. So the box of pulled wiring also includes three network switches, which still leaves two in place. This in a four-room flat!

I still haven't touched the Giant Rat of Sumatra's nest behind my desk.

This is the result of 30 years of adding bits that were needed at the time but never subtracting them when their original purpose has gone. If you move frequently this sort of thing doesn't happen because you tear it all down and build back only what you need each time. I know, because between the ages of 17 and 27 I moved nine times. I got really good at packing books and LPs. (Say you're over 60 without saying you're over 60.)

Were I a 30-something modern renter, my entire life would lift out of each successive abode leaving no trace and requiring few boxes. My books, audio, and video would be computer files or streaming subscriptions. All my telecommunications connections would be wireless. And, for best results, any furniture I had would be either on 30-day free trial or inflatable. It's like having a printer: modern people are app people. Wires need not apply. Wires are for old people. Wires...are a sign of privilege.

I now realize that accretion has led me to the equivalent of buying a tractor but continuing to feed and care for the Clydesdale horses it replaced without really noticing they're no longer doing anything useful. Or, in a higher-risk example, this sort of accretion leads older people into overly complex medication regimes as their doctors add new medications, often to control the side effects of the ones they're already on, without reconsidering the whole list; that situation is common enough to have bred a subspecialty of pharmacology to review and rationalize people's medications.

More technologically, there's the phenomenon consultants remark upon of finding ancient machines, even in banks, that are running mission-critical but ancient software no one dares touch because no one knows how it works. I suspect that as the time between computer replacements continues to lengthen, accretion of this type will be the fate of all computer systems. The reason is simple: adding things to patch localized problems without touching what's already in place will always feel safer than pulling an unlabeled plug and risking breaking the whole system because you didn't understand the complex dependencies. And there's little motivation. For the most part, everything works fine until one day the increasing complexity overwhelms the system and it all falls over - at which point tracing the fault is excruciatingly difficult, and fixing it will likely require a workaround that, like the one for the Y2K bug, has an expiration date when you'll have to trace and replace - or find another workaround.

There are lots of knock-on effects from accretion, most notably unnoticed security vulnerabilities. In her days running RISCS, Angela Sasse used to say that often important solutions to endemic cybersecurity problems are overlooked because they're not specifically technological fixes. Instead, she argued, reducing stress on employees by ensuring they're not overworked and have systems that make their work easier instead of harder, pays dividends in fewer mistakes. Similarly, upgrading and replacing old equipment with newer equipment with better security and usability built in can solve many seemingly intractable problems, over time costing less than continuing to patch the old system.

In my own case, there was a small but definite cost in wasted electricity (those extra switches) and, I imagine, a slightly higher risk of fire (all those extension cords). Life, as Gilbert and Sullivan observed, is a closely complicated tangle.

Illustrations: The box of wires, with more to come.

December 2, 2022

Hearing loss

Some technologies fail because they aren't worth the trouble (3D movies). Some fail because the necessary infrastructure and underlying technologies aren't good enough yet (AI in the 1980s, pen computing in the 1990s). Some fail because the world goes another, simpler, more readily available way (Open Systems Interconnection). Some fail because they are beset with fraud (the fate that appears to be unfolding with respect to cryptocurrencies). And some fail even though they work as advertised and people want them and use them because they make no money to sustain their development for their inventors and manufacturers.

The latter appears to be the situation with smart speakers, which in 2015 were going to take over the world, and today, in 2022, are installed in 75% of US homes. Despite this apparent success, they are losing money even for market leaders Amazon (third) and Google (second), as Business Insider reported this week. Amazon's Worldwide Digital division, which includes Prime Video as well as Echo smart speakers and Alexa voice technology, lost $3 billion in the first quarter of this year alone, primarily due to Alexa and other devices. The division will now be the biggest target for the layoffs the company announced last week.

The gist: they thought smart speakers would be like razors or inkjet printers, where you sell the hardware at or below cost and reap a steady income stream from selling razor blades or ink cartridges. Amazon thought people would buy their smart speakers, see something they liked, and order the speaker to put through the purchase. Instead, judging from the small sample I have observed personally, people use their smart speakers as timers, radios, and enhanced remote controls, and occasionally to get a quick answer from Wikipedia. And that's it. The friends I watched order their smart speaker to turn on the basement lights and manage their shopping list have, as far as I could tell on a recent visit, developed no new uses for their voice assistant in three years of being locked up at home with it.

The system has developed a new feature, though. It now routinely puts the shopping list items on the wrong shopping list. They don't know why.

In raising this topic at The Overspill, Charles Arthur referred back to a 2016 Wired article summarizing venture capitalist Mary Meeker's assessment in her annual Internet Trends report that voice was going to take over the world and the iPhone had peaked. In slides 115-133, Meeker outlined her argument: improving accuracy would be a game-changer.

Even without looking at recent figures, it's clear voice hasn't taken over. People do use speech when their hands are occupied, especially when driving or when the alternative is to type painfully into their smartphone - but keyboards still populate everyone's desks, and the only people I know who use speech for data entry are people for whom typing is exceptionally difficult.

One unforeseen deterrent may be that privacy emerged as a larger issue than early prognosticators may have expected. Repeated stories have raised awareness that the price of being able to use a voice assistant at will is that microphones in your home listen to everything you say waiting for their cue to send your speech to a distant server to parse. Rising consciousness of the power of the big technology companies has made more of us aware that smart speakers are designed more to fulfill their manufacturers' desires to intermediate and monetize our lives than to help us.

The notion that consumers would want to use Amazon's Echo for shopping appears seriously deluded with hindsight. Even the most dedicated voice users I know want to see what they're buying. Years ago, I thought that as TV and the Internet converged we'd see a form of interactive product placement in which it would be possible to click to buy a copy of the shirt a football player was wearing during a game or the bed you liked in a sitcom. Obviously, this hasn't happened; instead a lot of TV has moved to streaming services without ads, and interactive broadcast TV is not a thing. But in *that* integrated world voice-activated shopping would work quite well, as in "Buy me that bed at the lowest price you can find", or "Send my brother the closest copy you can find of Novak Djokovic's dark red sweatshirt, size large, as soon as possible, all cotton if possible."

But that is not our world, and in our world we have to make those links and look up the details for ourselves. So voice does not work for shopping beyond adding items to lists. And if that doesn't work, what other options are there? As Ron Amadeo writes at Ars Technica, the queries where Alexa is frequently used can't be monetized, and customers showed little interest in using Alexa to interact with other companies such as Uber or Domino's Pizza. And even Google, which is also cutting investment in its voice assistant, can't risk alienating consumers by using its smart speaker to play ads. Only Apple appears unaffected.

"If you build it, they will come," has been the driving motto of a lot of technological development over the last 30 years. In this case, they built it, they came, and almost everyone lost money. At what point do they turn the servers off?

Illustrations: Amazon Echo Dot.

