
Dependencies at scale

It's the complexity that's going to get us. (We're talking cyber system failures, not covid!)

In the 1990s and early 2000s Internet pundits used to have a fun game: what was going to kill the Internet? Or, what was going to kill the Internet *next*? The arrival of the web, which brought a much larger user base and comparatively data-hungry graphics (compared with text, that is; obviously much worse was to come), nearly did it for a bit, which is why a lot of us called it the "World Wide Wait".

Here's one example, a net.wars from 2002, based on a panel from the 1998 Computers, Freedom, and Privacy conference: 50 ways to crash the net. The then-recent crisis that had suggested the panel was a denial-of-service attack on the 13 root servers that form the heart of the domain name system. The idea was also partly suggested by a Wired article by Simson Garfinkel about how to crash the Internet, based on both the root server incident and another in which a construction crew in Virginia sliced through a crucial fiber optic cable. As early as that, Garfinkel blamed centralization and corporatization; the "Internet" that was built to withstand a bomb outage was the old military Internet, not the commercial one built on its bones.

But that's not what's going to get us. People learn! People fix things! In fact, experts tell me, the engineering that underlies the Internet is nothing like it was even ten years ago. "The Internet" as an engineer would talk about it is remarkably solid and robust. When the rest of us sloppily complain about "the Internet" what we mean is buggy software, underfunded open source projects that depend on one or a few overworked people but underpin software used by billions, human error, database leaks, sloppy security policies, corporate malfeasance, criminal attacks, failures of content moderation on Facebook, and power outages. When these factors come into play and connections break, "the Internet" is actually still fine. The average user, however, when unable to reach Netflix and finding that many other sites are also unreachable, interprets the situation as "the Internet is out". It's a mental model issue.

A few months ago, we noted the brittleness of today's "Internet" after an incident in which one person made a perfectly ordinary configuration change that should have done nothing more than alter the settings on their account and instead set off a cascade of effects that knocked out a load of other Internet services. Also right around then, a ransomware attack using a leaked password and a disused VPN account led to corporate anxiety that shut down the Colonial Pipeline, leading to gas shortages up and down the US east coast. These were not outages of "the Internet", but without the Internet they would not have happened.

This year is ending with more such issues. Last week, Amazon Web Services had an outage, or "service event", in which "unexpected behavior" created a feedback loop of increasing congestion that might as well have been a denial-of-service attack. What followed was an eight-hour lesson in service dependence. Blocked during that time: parts of Amazon's own retail and delivery operations, including Whole Foods; Disney+; Netflix; Internet of Things devices including Amazon Ring doorbells, Roomba vacuum cleaners, and connected cat litter boxes; and the teaching platform Canvas.

Separately but almost simultaneously, a vulnerability now dubbed Log4Shell was reported to the Apache Software Foundation, which notified the world at large on December 9. The vulnerability is one of a classic type in which a program - in this case the popular logging software Log4j - interprets an input data string as an instruction to execute. In this case, as Dan Goodin explains at Ars Technica, the upshot is that attackers can execute any Java code they like on the affected computer. The vulnerability, which has been present since 2013, is all over the place, embedded in systems that run...everything. Within a few days 44% of corporate networks had been probed and more than 60 exploit variants had been developed, with some attacks coming from state actors and criminal hacking groups. As Goodin explains, your best hope is that your bank, brokerage, and favorite online shops are patching their systems right now.
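For readers who want to see what "interprets an input data string as an instruction" looks like, here is a minimal sketch in Python. It is a toy and not Log4j's real code or API: the `${name:arg}` syntax, the `LOOKUPS` table, and the function names are all invented for illustration. The only point is the order of operations: the unsafe version merges user data into the line and then interprets the whole thing.

```python
import re

# Hypothetical lookup table for a toy logger. Nothing here is Log4j's real
# API; it only mimics the class of bug described above.
LOOKUPS = {
    "upper": lambda arg: arg.upper(),
    "secret": lambda arg: "hunter2",  # stands in for anything an attacker shouldn't reach
}

# Matches ${name:arg}
PATTERN = re.compile(r"\$\{(\w+):([^}]*)\}")

def expand(text):
    """Replace each ${name:arg} with the result of calling LOOKUPS[name](arg)."""
    def run(match):
        name, arg = match.group(1), match.group(2)
        handler = LOOKUPS.get(name)
        # Unknown names are left in the text untouched.
        return handler(arg) if handler else match.group(0)
    return PATTERN.sub(run, text)

def unsafe_log(template, user_input):
    # Flaw: user data is merged in first, then the whole line is interpreted.
    return expand(template.replace("{}", user_input))

def safe_log(template, user_input):
    # Only the developer-written template is interpreted; user data stays inert.
    return expand(template).replace("{}", user_input)

attack = "${secret:x}"
print(unsafe_log("login from {}", attack))  # the lookup runs
print(safe_log("login from {}", attack))    # the string is logged as plain text
```

In the real case the "lookup" could reach out over the network and load code, which is why a string typed into a form field was enough.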

The point about all this is that greater complexity breeds more errors, and errors that are harder to find and fix. Even many technical experts had never heard of Log4j until this bug appeared. Few would expect a bug in a logging utility to be so broadly dangerous, just as few could predict which major businesses would be taken out by an AWS outage. As Kurt Marko writes at Diginomica, the two incidents show the hidden and unexpected dependencies lurking on today's "Internet". The same permissionlessness that allowed large businesses to start with nothing and scale up means dependencies no one has found (yet). In 2014, shortly after Heartbleed reminded everyone of the dangers of infrastructure dependence on software maintained by one or two volunteers, Farhad Manjoo warned at the New York Times about the risks of just this complexity.
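To make "hidden dependencies" concrete, here is a small sketch, again in Python and again entirely hypothetical: the package names and the graph are made up, and real dependency trees are vastly larger. It walks a dependency graph to find everything that relies, at any depth, on one small component.

```python
from collections import deque

# Hypothetical dependency graph: each key depends directly on the listed names.
DEPENDS_ON = {
    "shop-frontend": ["web-framework", "payments-sdk"],
    "web-framework": ["http-lib", "logging-lib"],
    "payments-sdk": ["http-lib"],
    "http-lib": [],
    "logging-lib": [],
    "litter-box-app": ["cloud-sdk"],
    "cloud-sdk": ["logging-lib"],
}

def transitive_dependencies(package, graph):
    """Return every package reachable from `package`, direct or indirect."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen

def affected_by(component, graph):
    """Return the packages that depend, at any depth, on `component`."""
    return {pkg for pkg in graph if component in transitive_dependencies(pkg, graph)}

print(sorted(affected_by("logging-lib", DEPENDS_ON)))
```

Neither `shop-frontend` nor `litter-box-app` lists the logging library among its own dependencies, yet both turn up in the result. That is the sense in which these dependencies are hidden: they only appear when someone traces the whole chain.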

Complexity and size bring dependencies at scale - harder to predict than the weather, in part because software is forever. Humans are not good at understanding scale.

Illustrations: XKCD's classic cartoon, "Dependency".

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard - or follow on Twitter.

