
January 21, 2022

Power plays

We are still catching up on updates and trends.

Two days before the self-imposed deadline, someone blinked in the game of financial chicken between Amazon UK and Visa. We don't know which one it was, but on January 17 Amazon said it wouldn't stop accepting Visa credit cards after all. Negotiations are reportedly ongoing.

Ostensibly, the dispute was about the size of Visa's transaction fees. At Quartz, Ananya Bhattacharya quotes Banked.com's Ben Goodall's alternative explanation: the dispute allowed Amazon to suck up a load of new data that will help it build "the super checkout for the future". For Visa, she concludes, resolving the dispute has relatively little value beyond PR: Amazon accounts for only 1% of its UK credit card volume. For the rest of us, it remains disturbing that our interests matter so little. If you want proof of market dominance, look no further.

In June 2021, the Federal Trade Commission tried to bring an antitrust suit against Facebook, and failed when the court ruled that in its complaint the FTC had failed to prove its most basic assumption: that Facebook had a dominant market position. Facebook was awarded the dismissal it requested. This week, however, the same judge ruled that the FTC's amended complaint, which was filed in August, will be allowed to go ahead, though he suggests in his opinion that the FTC will struggle to substantiate some of its claims. Essentially, the FTC accuses Facebook of a "buy or bury" policy when faced with a new and innovative competitor, and says it did so to make up for its own inability to adapt to the mobile world.

We will know if Facebook (or its newly-renamed holding company owner, Meta) is worried if it starts claiming that damaging the company is bad for America. This approach began as satire, Robert Heller explained in his 1994 book The Fate of IBM. Heller cites a 1990 PC Magazine column by William E. Zachmann, who used it as the last step in an escalating list of how the "IBMpire" would respond to antitrust allegations.

This week, Google came close to a real-life copy in a blog posting opposing an amendment to the antitrust bill currently going through the US Congress. The goal behind the bill is to make it easier for smaller companies to compete by prohibiting the major platforms from advantaging their own products and services. Google argues, however, that if the bill goes through Americans might get worse service from Google's products, American technology companies could be placed at a competitive disadvantage, and America's national security could be threatened. Instead of suggesting ways to improve the bill, however, Google concludes with the advice that Congress should delay the whole thing.

To be fair, Google isn't the only one that dislikes the bill. Apple argues its provisions might make it harder for users to opt out of unwanted monitoring. Free Press Action argues that it will make it harder to combat online misinformation and hate speech by banning the platforms from "discriminating" against "similarly situated businesses" (the bill's language), competitor or not. EFF, on the other hand, thinks copyright is a bigger competition issue. All better points than Google's.

A secondary concern is the fact that these US actions are likely to leave the technology companies untouched in the rest of the world. In Africa, Nesrine Malik writes at the Guardian, Facebook is indispensable and the only Internet most people know because its zero-rating allows its free use outside of (expensive) data plans. Most African Internet users are mobile-only, and most data users are on pay-as-you-go plans. So while Westerners deleting their accounts is a real threat to the company's future - not least because, as Frances Haugen testified, they produce the most revenue - the company owns the market in Africa. There, it is literally the only game in town for both businesses and individuals. Twenty-five years ago, we thought the Internet would be a vehicle for exporting the First Amendment. Instead...

Much of the discussion about online misinformation focuses on content moderation. In a new report the Royal Society asks how to create a better information environment. Despite the harm it does, the report comes down against simply removing scientific misinformation. Like Charles Arthur in his 2021 book Social Warming, the report's authors argue for slowing the spread by various methods - adding friction to social media sharing, reconfiguring algorithms, in a few cases de-platforming superspreaders. I like the scientists' conclusion that simple removal doesn't work; in science you must show your work, and deletion fuels conspiracy theories. During this pandemic, Twitter has been spectacular at making it possible to watch scientists grapple with uncertainty in real time.

The report also disputes some of our longstanding ideas about how online interaction works. A literature review finds that the filter bubbles and echo chambers Eli Pariser posited in 2011 are less important than we generally think. Instead most people have "relatively diverse media diets" and the minority who "inhabit politically partisan online news echo chambers" is about 6% to 8% of users.

Keeping it that way, however, depends on having choices, which leads back to these antitrust cases. The bigger and more powerful the platforms are, the less we - as both individuals and societies - matter to them.


Illustrations: The Thames at an unusually quiet moment, in January 2022.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.

December 9, 2021

"Crypto"

A few weeks ago, digital rights activist Amie Stepanovich was in the news for making a T-shirt objecting to the new abuse of "crypto" to mean "cryptocurrencies". As Stepanovich correctly says, "crypto" has meant "cryptography" for at least 30 years, and old-timers do not appreciate its appropriation. I am enough of an old-timer to agree with her, but fear she's fighting a losing battle. For decades "hackers" meant clever people who bent hardware and software systems to their will. Hackers built the first computers. Hackers made the Internet. "Hacker" was a term of honor, applied by others. And what happened circa the mid-1990s? It was repurposed for petty criminals running scripts to break into websites. Real hackers were furious. Did anyone respond sympathetically? They did not. Hackers are now criminals. So: "Crypto" is doomed. Exhibit A: Jeff John Roberts' 2020 history of Coinbase, Kings of Crypto.

This week, anti-monopolist author Matt Stoller unleashed a rant about "crypto", calling the whole shebang - which for him includes the non-fungible token (NFT) craze, cryptocurrencies, and the blockchain, as well as web3, which we tried to make sense of a couple of weeks ago - "a bunch of bullshit". The only use cases Stoller could find were speculation and money laundering; he dismissed the tools that exist as ones that "don't work". He attributes its anti-monopoly zeitgeist to cryptocurrencies' emergence "out of the financial crisis", adding on Twitter that they were "invented about the same time as the iPhone".

This is when I realized: this use of "crypto" is less evolving language, more loss of culture. We all think the world started when we discovered it.

So.

"Crypto", as in cryptography, is probably as old as humanity, basically because every time someone figures out how to protect a secret someone else tries to crack it. For that history read Simon Singh's The Code Book. The development of the specific type of cryptography the nascent Internet needed, public key cryptography, is thoroughly documented in Steven Levy's Crypto. For cryptography in military communications try David Kahn's The Codebreakers.

Cryptocurrencies, as a digital equivalent of cash, are usually traced to 1991, when David Chaum described ecash in Scientific American. In the mid-1990s, Chaum attempted to commercialize ecash via his company, DigiCash.

Nothing was ready. Commercial traffic on the Internet began in 1994, soon followed by the first ecommerce companies: eBay, Amazon, and Paypal. Graphical web browsers were slow and bare-bones. People were afraid to use *credit cards* online. Yet Chaum hoped they would opt to turn their familiar, hard-earned money into his incomprehensible mathematical thing and bet they could find somewhere to buy something with it. The web was too small, the user base was too small, and it was all so strange and clever, way too soon. Chaum was not the only one to discover this sad reality.

This timing was due to the unexpected democratization of cryptography, which began in 1976, when Martin Hellman and Whitfield Diffie published the basis of public key cryptography (later, it emerged that the UK spy agency GCHQ had already developed it, but the mathematicians couldn't tell anybody). Besides allowing strangers to communicate spontaneously in a trustworthy way, Diffie's and Hellman's work pulled cryptography out of the spy agencies into entirely new communities. By 1991, a single programmer in his home with a personal computer was able to write a piece of powerful encryption software that anyone could use to protect their data and communications, setting off 30 years of crypto debates. Phil Zimmermann's program, PGP, is still in use today, having withstood the tests cryptanalysts have thrown at it.
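The core trick of the 1976 paper is easy to show in miniature. Here is a toy Diffie-Hellman exchange of my own devising (the numbers are illustrative only; real deployments use vastly larger parameters): two strangers agree on a shared secret over a public channel without ever transmitting the secret itself.

```python
# Toy Diffie-Hellman key exchange (illustrative parameters only).
p, g = 23, 5          # public modulus and generator, known to everyone

a = 6                 # Alice's private key, never sent
b = 15                # Bob's private key, never sent

A = pow(g, a, p)      # Alice publishes g^a mod p = 8
B = pow(g, b, p)      # Bob publishes g^b mod p = 19

# Each side combines its own private key with the other's public value.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)

print(shared_alice, shared_bob)  # 2 2 -- same secret, never transmitted
```

An eavesdropper sees only p, g, A, and B; recovering the secret from those is the discrete logarithm problem, which is what makes the scheme work at realistic key sizes.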

These technical developments inspired the beginnings of the movement and the anti-government motivations that Stoller identifies. To many of this crowd, finding easier and more efficient ways to move money around was only part of its appeal. Many embraced the idea of being able to bypass banks, governments, tax collectors, and all the other trappings of the regulated world by using encryption to create untraceable forms of money. In her 1997 book, Close to the Machine, Ellen Ullman tells the story of her close encounters with one of the movement's 1990s leaders, and their inability to understand each other's worlds.

Throughout the 1990s these ideas were swapped back and forth on the Cypherpunks mailing list. You can get the gist from this CryptoInsider tribute to Timothy C. May or May's Cyphernomicon. At Computers, Freedom, and Privacy 1997, May outlined BlackNet, an anonymous market for everything from assassinations to government secrets, all enabled by untraceable digital cash. May's information market is so like early Wikileaks that at Wikileaks' inception I failed to take it seriously (Julian Assange has said he read the Cypherpunks list).

However: blockchain-based cryptocurrencies are not untraceable. The 1997 Internet was awash in libertarian predictions, too - and what got built and who's profiting? Sure, some cryptocurrency nuts want to bypass banks and play anti-regulatory games. But some of today's experimenters with cryptocurrencies are central banks, governments, and credit card companies, as fintech expert Dave Birch writes in his book The Cryptocurrency Cold War. If there are winners, they will be the ones claiming most of the spoils. Unless Web3 works out?


Illustrations: Dave Birch, trying to figure out how to play contactless Monopoly.


December 3, 2021

Trust and antitrust

Four years ago, Lina Khan, who in 2021 became the new Federal Trade Commission chair, made her name by writing an antitrust analysis of Amazon that made three main points: 1) Amazon is far more dangerously dominant than people realize; 2) antitrust law, which for the last 30 years has used consumer prices as its main criterion, needs reform; and 3) two inventors in a garage can no longer upend dominant companies because they'll either be bought or crushed. She also accused Amazon of leveraging the Marketplace sellers data it collects to develop and promote competing products.

For context, that was the year Amazon bought Whole Foods.

What made Khan's work so startling is that throughout its existence Amazon has been easy to love: unlike Microsoft (system crashes and privacy), Google (search spam and privacy), or Facebook (so many issues), Amazon sends us things we want when we want them. Amazon is the second-most trusted institution in America after the military, according to a 2018 study by Georgetown University and NYU. Rounding out the top five: Google, local police, and colleges and universities. The survey may need some updating.

And yet: recent stories suggest our trust is out of date.

This week, a study by the Institute for Local Self-Reliance claims that Amazon's 20-year-old Marketplace takes even higher commissions - 34% - than the 30% Apple and Google are being investigated for taking from their app stores. The study estimates that Amazon will earn $121 billion from these fees in 2021, double its 2019 takings, and that Amazon's 2020 operating profits from Marketplace will reach $24 billion. The company responded to TechCrunch that some of those fees are optional add-ons, while report author Stacy Mitchell counters that "add-ons" such as better keyword search placement and using Amazon's shipping and warehousing have become essential because of the way the company disadvantages sellers who don't "opt" for them. In August, Amazon passed Walmart as the world's largest retailer outside of China. It is the only source of income for 22% of its sellers and the single biggest sales channel for many more; 56% of items sold on Amazon are from third-party sellers.

I started buying from Amazon so long ago that I have an insulated mug they sent every customer as a Christmas gift. Sometime in the last year, I started noticing the frequency of unfamiliar brand names in search results for things like cables, USB sticks, or socks. Smartwool I recognize, but Yuedge, KOOOGEAR, and coskefy? I suddenly note a small - new? - tickbox on the left: "our brands". And now I see: "our brands" this time are ouhos, srclo, SuMade, and Sunew. Is it me, or are these names just plain weird?

Of course I knew Amazon owned Zappos, IMDB, Goodreads, and Abe Books, but this is different. Amazon now has hundreds of house brands, according to a study The Markup published in October. The main finding: Amazon promotes its own brands at others' expense, and being an Amazon brand or Amazon-exclusive is more important to your product's prominence than its star ratings or reviews. Amazon denies doing this. It's a classic antitrust conflict of interest: shoppers rarely look beyond the first five listed products, and the platform owner has full control over the order. The Markup used public records to identify more than 150 Amazon brands and developed a browser add-on that highlights them for you. Personally, I'm more inclined to just shop elsewhere.

Also often overlooked is Amazon's growing advertising business. Insider Intelligence estimates its digital ad revenues in 2021 at $24.47 billion - 55.5% higher than 2020, and representing 11.6% (and rising) of the (US) digital advertising market. In July, noting its rise, CNBC surmised that Amazon's first-party relationship with its customers relieves it of common technology-company privacy issues. This claim - perhaps again based on the unreasonable trust so many of us place in the company - has to be wrong. Amazon collects vast arrays of personal data from search and purchase records, Alexa recordings, home camera videos, and health data from fitness trackers. We provide it voluntarily, but we don't sign blank checks for its use. Based on confidential documents, Reuters reports that Amazon's extensive lobbying operation has "killed or undermined" more than three dozen privacy bills in 25 US states. (The company denies the story and says it has merely opposed poorly crafted privacy bills.)

Privacy may be the thing that really comes to bite the company. A couple of weeks ago, Will Evans reported at Reveal News, based on a lengthy study of leaked internal documents, that Amazon's retail operation has so much personal data that it has no idea what it has, where it's stored, or how many copies are scattered across its IT estate: "sprawling, fragmented, and promiscuously shared". The very long story is that prioritizing speed of customer service has its downside, in that the company became extraordinarily vulnerable to insider threats such as abuse of access.

Organizations inevitably change over time, particularly when they're as ambitious as this one. The systems and culture that are temporary in startup mode become entrenched and patched, but never fixed. If trust is the land mass we're running on, what happens is we run off the edge of a cliff like Wile E. Coyote without noticing that the ground we trust isn't there any more. Don't look down.


Illustrations: Wile E. Coyote runs off a cliff, while the roadrunner watches.


October 29, 2021

Majority report

How do democracy and algorithmic governance live together? This was the central question of a workshop this week on computational governance. This is only partly about the Internet; many new tools for governance are appearing all the time: smart contracts, for example, and AI-powered predictive systems. Many of these are being built with little idea of how they can go wrong.

The workshop asked three questions:

- What can technologists learn from other systems of governance?
- What advances in computer science would be required for computational systems to be useful in important affairs like human governance?
- Conversely, are there technologies that policy makers can use to improve existing systems?

Implied is this: who gets to decide? On the early Internet, for example, decisions were reached by consensus among engineers who all knew each other, funded by hopeful governments. Mass adoption, not legal mandate, helped the Internet's TCP/IP protocols dominate over many other 1990s networking systems: it was free, it worked well enough, and it was *there*. The same factors applied to other familiar protocols and applications: the web, email, communications between routers and other pieces of infrastructure. Proposals circulated as Requests for Comments, and those that found the greatest acceptance were adopted. In those early days, as I was told in a nostalgic moment at a conference in 1998, anyone pushing a proposal because it was good for their company would have been booed off the stage. It couldn't last; incoming new stakeholders demanded a voice.

If you're designing an automated governance system, the fundamental question is this: how do you deal with dissenting minorities? In some contexts - most obviously the US Supreme Court - dissenting views stay on the record alongside the majority opinion. In the long run of legal reasoning, it's important to know how judgments were reached and what issues were considered. You must show your work. In other contexts where only the consensus is recorded, minority dissent is disappeared - AI systems, for example, where the labelling that's adopted is the result of human votes we never see.

In one intriguing example, a panel of judges may rule a defendant is guilty or not guilty depending on whether you add up votes by premise - the defendant must have both committed the crime and possessed criminal intent - or by conclusion, in which each judge casts a final vote and only these are counted. In a small-scale human system the discrepancy is obvious. In a large-scale automated system, which type of aggregation do you choose, and what are the consequences, and for whom?
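The discrepancy is easy to see in miniature. Here is a toy sketch (my illustration, not an example from the workshop) with three judges and the two premises above, where guilt requires both premises to hold:

```python
# Each judge votes on two premises: did the defendant commit the
# act, and was there criminal intent? Guilt requires both.
judges = [
    {"act": True,  "intent": True},   # would vote guilty
    {"act": True,  "intent": False},  # would vote not guilty
    {"act": False, "intent": True},   # would vote not guilty
]

def majority(votes):
    return sum(votes) > len(votes) / 2

# Premise-based aggregation: take the majority on each premise
# separately, then apply the legal rule to the aggregated premises.
premise_based = (majority([j["act"] for j in judges]) and
                 majority([j["intent"] for j in judges]))

# Conclusion-based aggregation: each judge applies the rule
# individually, and only the final verdicts are counted.
conclusion_based = majority([j["act"] and j["intent"] for j in judges])

print(premise_based)     # True  -> guilty
print(conclusion_based)  # False -> not guilty
```

Same votes, opposite verdicts: each premise carries a 2-1 majority, yet only one judge would actually convict. An automated system has to pick one aggregation rule, and the choice decides the case.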

Decentralization poses a similarly knotty conundrum. We talk about the Internet's decentralized origins, but its design fundamentally does not prevent consolidation. Centralized layers such as the domain name system and anti-spam blocking lists are single points of control and potential failure. If decentralization is your goal, the Internet's design has proven to be fundamentally flawed. Lots of us have argued that we should redecentralize the Internet, but if you adopt a truly decentralized system, where do you seek redress? In a financial system running on blockchains and smart contracts, this is a crucial point.

Yet this fundamental flaw in the Internet's design means that over time we have increasingly become second-class citizens on the Internet, all without ever agreeing to any of it. Some US newspapers are still, three and a half years on, ghosting Europeans for fear of GDPR; videos posted to web forums may be geoblocked from playing in other regions. Deeper down the stack, design decisions have enabled surveillance and control by exposing routing metadata - who connects to whom. Efforts to superimpose security have led to a dysfunctional system of digital certificates that average users either don't know is there or don't know how to use to protect themselves. Efforts to cut down on attacks and network abuse have spawned a handful of gatekeepers like Google, Akamai, Cloudflare, and SORBS that get to decide what traffic gets to go where. Few realize how much Internet citizenship we've lost over the last 25 years; in many of our heads, the old cooperative Internet is just a few steps back. As if.

As Jon Crowcroft and I concluded in our paper on leaky networks for this year's Gikii, "leaky" designs can be useful to speed development early on even though they pose problems later, when issues like security become important. The Internet was built by people who trusted each other and did not sufficiently imagine it being used by people who didn't, shouldn't, and couldn't. You could say it this way: in the technology world, everything starts as an experiment and by the time there are problems it's lawless.

So this was the main point of the workshop: how do you structure automated governance to protect the rights of minorities? Opting to slow decisions in order to consider the minority report impedes action in emergencies. If you limit Internet metadata exposure, security people lose some ability to debug problems and trace attacks.

We considered possible role models: British corporate governance; smart contracts; and, presented by Miranda Mowbray, the wacky system by which Venice elected a new Doge. It could not work today: it's crazily complex, and impossible to scale. But you could certainly code it.


Illustrations: Monument to the Doge Giovanni Pesaro (via Didier Descouens at Wikimedia).


June 11, 2021

The fragility of strangers

This week, someone you've never met changed the configuration settings on their individual account with a company you've never heard of and knocked out 85% of that company's network. Dumb stuff like this probably happens all the time without attracting attention, but in this case the company, Fastly, is a cloud provider that also runs an intermediary content delivery network intended to speed up Internet connections. Result: people all over the world were unable to reach myriad major Internet sites such as Amazon, Twitter, Reddit, and the Guardian for about an hour.

The proximate cause of these outages, Fastly has now told the world, was a bug that was introduced (note lack of agency) into its software code in mid-May, which lay dormant until someone did something completely normal to trigger it.

In the early days, we all assumed that as more companies came onstream and admins built experience and expertise, this sort of thing would happen less and less. But as the mad complexity of our computer systems and networks continues to increase - Internet of Things! AI! - now it's more likely that stuff like this will also increase, will be harder to debug, and will cause far more ancillary damage - and that damage will not be limited to the virtual world. A single random human, accidentally or intentionally, is now capable of creating physical-world damage at scale.

Ransomware attacks earlier this month illustrate this. Attackers used a single leaked password for a disused VPN account to get into the systems that run the Colonial Pipeline, disrupting gasoline supplies across a large swathe of the US east coast. Near-simultaneously, a ransomware attack on the world's largest meatpacker, JBS, briefly halted production, threatening food security in North America and Australia. In December, an attack on network management software supplied by the previously little-known SolarWinds compromised more than 18,000 companies and government agencies. In all these cases, random strangers reached out across the world and affected millions of personal lives by leveraging a vulnerability inside a company that is not widely known but that provides crucial services to companies we do know and use every day.

An ordinary person just trying to live their life has no defense except to have backups of everything - not just data, but service providers and suppliers. Most people either can't afford that or don't have access to alternatives, which means that precarious lives are made even more so by hidden vulnerabilities they can't assess.

An earlier example: in 2012, journalist Matt Honan's data was entirely wiped out through an attack that leveraged quirks of two unrelated services - Apple and Amazon - against each other to seize control of his email address and delete all his data. Moral: data "in the cloud" is not a backup, even if the hosting company says they keep backups. Second moral: if there is a vulnerability, someone will find it, sometimes for motives you would never guess.

If memory serves, Akamai, founded in 1998, was the first CDN. The idea was that even though the Internet means the death of distance, physics matters. Michael Lewis captured this principle in detail in his book Flash Boys, in which a handful of Wall Street types pay extraordinary amounts to shave a few split-seconds off the time it takes to make a trade by using a ruler and map to send fiber optic cables along the shortest possible route between exchanges. Just so, CDNs cache frequently accessed content on mirror servers around the world. When you call up one of those pages, it, or frequently-used parts of it in the case of dynamically assembled pages, is served up from the nearest of those servers, rather than from the distant originator. By now, there are dozens of these networks and what they do has vastly increased in sophistication, just as the web itself has. A really major outlet like Amazon will have contracts with more than one, but apparently switching from one to the other isn't always easy, and because so many outages are very short it's often easier to wait it out. Not in this case!

At The Conversation, criminology professor David Wall also sees this outage as a sign of the future for the same reason I do: centralization and consolidation have shrunk, and continue to shrink, the number of single points of widespread failure. Yes, it is true that the Internet was built to withstand a bomb outage - but as we have been writing for 20 years now, this Internet is not that Internet. The path to today's Internet has led from the decentralized era of Usenet, IRC, and own-your-own mail server to web hosting farms to the walled gardens of Facebook, Google, and Apple, and the AI-dominating Big Nine. In 2013, Edward Snowden's revelations made plain how well that suits surveillance-hungry governments, and it's only gotten worse since, as companies seek to insert themselves into every aspect of our lives - intermediaries that bring us a raft of new insecurities that we have no time or ability to audit.

Increasing complexity, hidden intermediation, increasing numbers of interferers, and increasing scale all add up to a brittle and fragile Internet, onto which we continue to pile all our most critical services and activities. What could possibly go wrong?


Illustrations: Map of the Colonial Pipeline.


April 30, 2021

The tonsils of the Internet

Last week the US Supreme Court decided the ten-year-old Google v. Oracle copyright case. Unlike anyone in Jarndyce v. Jarndyce, which bankrupted all concerned, Google will benefit financially, and in other ways so will the rest of us.

Essentially, the case revolved around whether Google violated Oracle's copyright by copying about 11,500 lines (out of millions) of the code that makes up the Java platform's application programming interface. Google claimed fair use. Oracle disagreed.

Tangentially: Oracle owns Java because in 2010 it bought its developer, Sun Microsystems, which open-sourced the software in 2006. Google bought Android in 2005; it, too, is open source. If the antitrust authorities had blocked the Oracle acquisition, which they did consider, there would have been no case.

The history of disputes over copying and interoperability goes back to the 1996 case Lotus v. Borland, in which Borland successfully argued that copying the way Lotus organized its menus was copying function, not expression. By opening the way for software programs to copy functional elements (like menus and shortcut keys), the Borland case was hugely important. It paved the way for industry-wide interface standards and thereby improved overall usability and made it easier for users to switch from one program to another if they wanted to. This decision, similarly, should enable innovation in the wider market for apps and services.

Also last week, the US Congress conducted both the latest in the series of antitrust hearings and interrogated Lina Khan, who has been nominated for a position at the Federal Trade Commission. Biden's decision to appoint her, as well as Tim Wu to the National Economic Council, has been taken as a sign of increasing seriousness about reining in Big Tech.

The antitrust hearing focused on the tollbooths known as app stores; in his opening testimony, Mark Cooper, director of research at the Consumer Federation of America, noted that the practices described by the chair, Senator Amy Klobuchar (D-MN), were all found illegal in the Microsoft case, which was brought in 1998. A few minutes later, Horacio Gutierrez, Spotify's head of global affairs and chief legal officer, noted that "even" Microsoft never demanded a 30% commission from software developers to run on its platform.

Watching this brought home the extent to which the mobile web, with its culture of walled gardens and network operator control, has overwhelmed the open web we Old Net Curmudgeons are so nostalgic about. "They have taken the Internet and moved it into the app stores", Jared Sine told the committee, and that's exactly right. Opening the Internet back up requires opening up the app stores. Otherwise, the mobile web will be little different than CompuServe, circa 1991.

BuzzFeed technology reporter Ryan Mac posted on Twitter the anonymous account of a just-departed Accenture employee describing their two and a half years as a content analyst for Facebook. The main points: the work is a constant stream of trauma; there are insufficient breaks and mental health support; the NDAs they are forced to sign block them from turning to family and friends for help; and they need the chance to move to other jobs for longer periods of respite. "We are the tonsils of the Internet," they wrote. Medically, we now know that the tonsils doctors used to cheerfully remove play an important role in immune system response. Human moderation is essential if you want online spaces to be tolerably civil; machines simply aren't good enough, and likely never will be, and abuse appears to be endemic in online spaces above a certain size. But just as the exhausted health workers who have helped so many people survive this pandemic should be viewed as a rare and precious resource instead of as interchangeable parts whose distress the anti-lockdown, no-mask crowd is willing to overlook, the janitors of the worst and most unpleasant parts of the Internet need to be treated with appropriate care.

The power differential, the geographic spread, their arms-length subcontractor status, and the technology companies' apparent lack of interest combine to make that difficult. Exhibit B: Protocol reports that contract workers in Google's data centers are required to leave the company for six months every two years and reapply for their jobs, apparently just so they won't gain the rights of permanent employees.

You would think that one thing that could help these underpaid, traumatized content moderators - as well as the drivers, warehouse workers, and others who are kept at second-class arm's length by the technology companies that so diligently ensure they don't become full employees - is a union. In hopes of change, and because of the potential impact on the industry at large, many were closely watching the unionization vote at Amazon's Bessemer, Alabama warehouse - both the organizing efforts and Amazon's drive to oppose them. Now the results are in: 1,798 to 738 against.

Nonetheless, this isn't over. Moves toward unionizing have been growing for years in pockets all over the technology industry, and eventually the trend will be inescapable. We're used to thinking about technology companies' power in terms of industry consolidation and software licensing; workers are the ones who most directly feel its effects.


Illustrations: The chancellor (Ian Richardson), announcing the end of Jarndyce and Jarndyce in the BBC's 2005 adaptation of Bleak House.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.

December 4, 2020

Scraped

Somehow I had missed the hiQ Labs v. LinkedIn case until this week, when I struggled to explain on Twitter why condemning web scraping is a mistake. Over the years, similar arguments have been used to try to ban ordinary security tools and techniques because they, too, can be abused. The usual real-world analogy: we don't ban cars just because criminals can use them to escape.

The basics: hiQ, which styles itself as a "talent management company", used automated bots to scrape public LinkedIn profiles and analyzed them to build a service advising companies what training they should invest in or which employees might be on the verge of leaving. All together now: *so* creepy! LinkedIn objected that the practice violates its terms of service and harms its business. In return, hiQ accused LinkedIn of purely anti-competitive motives, claiming it only objected now because it was planning its own version.

LinkedIn wanted the court to rule that hiQ's scraping its profiles constitutes felony hacking under the Computer Fraud and Abuse Act (1986). Meanwhile, hiQ argued that because the profiles it scraped are public, no "hacking" was involved. EFF, along with DuckDuckGo and the Internet Archive, which both use web scraping as a basic tool, filed an amicus brief arguing correctly that web scraping is a technique in widespread use to support research, journalism, and legitimate business activities. Sure, hiQ's version is automated, but that doesn't make it different in kind.

There are two separate issues here. The first is web scraping itself, which, as EFF says, has many valid uses that don't involve social media or personal data. The TrainTimes site, for example, is vastly more accessible than the National Rail site it scrapes and re-presents. Over the last two decades, the same author, Matthew Somerville, has built numerous other such sites that avoid the heavy graphics and scripts that make so many information sites painful to use. He has indeed gotten into trouble for it sometimes; in one case, the Odeon movie theater chain objected to his making its schedules more accessible. (Query: what is anyone going to do with the Odeon movie schedule beyond choosing which ticket to buy?)
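For readers who have never seen it done, the re-presenting half of this kind of scraping is mundane: fetch a page, parse the markup, keep the data, discard the presentation. The sketch below, using only Python's standard-library HTML parser, extracts the rows of a hypothetical departures table; the markup, class name, and station names are invented for illustration, not taken from any real site.

```python
# A minimal sketch of the parsing step in web scraping: pulling
# structured data (here, an invented departures table) out of HTML.
# A real scraper like TrainTimes would pair this with an HTTP fetch
# and then re-present the rows in a lighter-weight page.
from html.parser import HTMLParser

class DepartureScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows of cell text
        self._row = None        # cells of the row currently open
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# Stand-in for a fetched page; a live scraper would download this.
sample = """
<table>
  <tr><td>10:04</td><td>Richmond</td><td>On time</td></tr>
  <tr><td>10:19</td><td>Kew Gardens</td><td>Delayed</td></tr>
</table>
"""

scraper = DepartureScraper()
scraper.feed(sample)
print(scraper.rows)
# -> [['10:04', 'Richmond', 'On time'], ['10:19', 'Kew Gardens', 'Delayed']]
```

Once the rows are plain lists, they can be served back as a stripped-down page with none of the original site's heavy graphics and scripts, which is the whole of the trick.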

As EFF writes in its summary of the case, web scraping has also been used by journalists to investigate racial discrimination on Airbnb and find discriminatory pricing on Amazon; in the early days of the web, civic-minded British geeks used web scraping to make information about Parliament and its debates more accessible. Web scraping should not be illegal!

However, that doesn't mean that all information that can be scraped should be scraped or that all information that can be scraped should be *legal* to scrape. Like so many other basic techniques, web scraping has both good and bad uses. This is where the tricky bit lies.

Intelligence agency personnel these days talk about OSINT - "open source intelligence". "Open source" in this context (not software!) means anything they can find and save, which includes anything posted publicly on social media. Journalists also tend to view anything posted publicly as fair game for quotation and reproduction - just look at the Guardian's live blog any day of the week. Academic ethics require greater care.

There is plenty of abuse-by-scraping. As Olivia Solon reported last year, IBM scraped Flickr users' innocently posted photographs and repurposed them into a database to train facial recognition algorithms, later used by Immigration and Customs Enforcement to identify people to deport. (In June, the protests after George Floyd's murder led IBM to pull back from selling facial recognition "for mass surveillance or racial profiling".) Clearview AI scraped billions of photographs off social media and collated them into a database service to sell to law enforcement. It's safe to say that no one posted their profile on LinkedIn with the intention of helping a third-party company get paid by their employer to spy on them.

Nonetheless, those abuse cases do not make web scraping "hacking" or a crime. They are difficult to rectify in the US because, as noted in last week's review of 30 years of data protection, the US lacks relevant privacy laws. Here in the UK, since the data Somerville was scraping was not personal, his complainants typically argued that he was violating their copyright. The hiQ case, if brought outside the US, would likely be based in data protection law.

In 2019, the Ninth Circuit ruled in favor of hiQ, saying it did not violate the CFAA because LinkedIn's servers were publicly accessible. In March, LinkedIn asked the Supreme Court to review the case. SCOTUS could now decide whether scraping publicly accessible data is (or is not) a CFAA violation.

What's wrong in this picture is the complete disregard for the users in the case. As the National Review says, a ruling for hiQ could deprive users of all control over their publicly posted information. So, call a spade a spade: at its heart this case is about whether LinkedIn has an exclusive right to abuse its users' data or whether it has to share that right with any passing company with a scraping bot. The profile data hiQ scraped is public, to be sure, but to claim that opens it up for any and all uses is no more valid than claiming that because this piece is posted publicly it is not copyrighted.


Illustrations: I simply couldn't think of one.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.