October 5, 2012

The doors of probability

Mike Lynch has long been the most interesting UK technology entrepreneur. In 2000, he became Britain's first software billionaire. In 2011 he sold his company, Autonomy, to Hewlett-Packard for $11 billion. A few months ago, Hewlett-Packard let him escape back into the wild of Cambridge. We've been waiting ever since for hints of what he'll do next; on Monday, he showed up at NESTA to talk about his adventures with Wired UK editor David Rowan.

Lynch made his name and his company by understanding that the rule formulated in 1750 by the English Presbyterian minister and mathematician Thomas Bayes could be applied to getting machines to understand unstructured data. These days, Bayes is an accepted part of the field of statistics, but for a couple of centuries anyone who embraced his ideas would have been unwise to admit it. That began to change in the 1980s, as people came to realize the value of his ideas.

"The work [Bayes] did offered a bridge between two worlds," Lynch said on Monday: the post-Renaissance world of science, and the subjective reality of our daily lives. "It leads to some very strange ideas about the world and what meaning is."

As Sharon Bertsch McGrayne explains in The Theory That Would Not Die, Bayes was offering a solution to the inverse probability problem. You have a pile of encrypted code, or a crashed airplane, or a search query: all of these are effects; your problem is to find the most likely cause. (Yes, I know: to us the search query is the cause and the page of search results is the effect; but consider it from the computer's point of view.) Bayes' idea was to start with a 50/50 random guess and refine it as more data changes the probabilities in one direction or another. When you type "turkey" into a search engine it can't distinguish between the country and the bird; when you add "recipe" you increase the probability that the right answer is instructions on how to cook one.
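That updating step reduces to a few lines of Bayes' rule. The sketch below is a toy illustration of the "turkey" example; the likelihood numbers are invented purely for demonstration:

```python
def bayes_update(priors, likelihoods):
    """Return posterior P(cause | evidence) for each candidate cause."""
    unnorm = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    total = sum(unnorm.values())
    return {cause: p / total for cause, p in unnorm.items()}

# Start with the 50/50 random guess about what "turkey" means.
priors = {"country": 0.5, "bird": 0.5}

# Seeing the extra word "recipe" is far more likely if the user means the bird
# (these likelihoods are made up for the sketch).
likelihood_of_recipe = {"country": 0.05, "bird": 0.6}

posterior = bayes_update(priors, likelihood_of_recipe)
print(posterior["bird"])  # about 0.92: the evidence shifts the guess to cooking
```

Each extra word in the query is just another round of the same update, with the old posterior serving as the new prior.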

Note, however, that search engines work on structured data: tags, text content, keywords, and metadata all going into building an index they can run over to find the hits. What Lynch is talking about is the stuff that humans can understand - raw emails, instant messages, video, audio - that until now has stymied the smartest computers.

Most of us don't really like to think in probabilities. We assume every night that the sun will rise in the morning; we call a mug a mug and not "a round display of light and shadow with a hole in it" in case it's really a doughnut. We also don't go into much detail in making most decisions, no matter how much we justify them afterwards with reasoned explanations. Even decisions that are in fact probabilistic - such as those of the electronic line-calling device Hawk-Eye used in tennis and cricket - we prefer to display as though they were infallible. We could, as Cardiff professor Harry Collins argued, take the opportunity to educate people about probability: the on-screen virtual reality animation could include an estimate of the margin for error, or the probability that the system is right (much the way IBM did in displaying Watson's winning Jeopardy answers). But apparently it's more entertaining - and sparks fewer arguments from the players - to pretend there is no fuzz in the answer.

Lynch believes we are just at the beginning of the next phase of computing, in which extracting meaning from all this unstructured data will bring about profound change.

"We're into understanding analog," he said. "Fitting computers to us instead of us to them." In addition, like a lot of the papers and books on algorithms I've been reading recently, he believes we're moving away from the scientific tradition of understanding a process in order to get an outcome, and towards taking huge amounts of data about outcomes and extracting valid answers from it. In medicine, for example, that would mean a change from the doctor who examines a patient, asks questions, and tries to understand the cause of what's wrong in the interests of suggesting a cure. Instead, why not a black box that says, "Do these things", if following its instructions means a cured patient? "Many people think it's heresy, but if the treatment makes the patient better..."

At the beginning, Lynch said, the Autonomy founders thought the company could be worth £2 to £3 million. "That was our idea of massive back then."

Now, with his old Autonomy team, he is looking to invest in new technology companies. The goal, he said, is to find new companies built on fundamental technology whose founders are hungry and strongly believe that they are right - but are still able to listen and learn. The business must scale, requiring little or no human effort to service increased sales. With that recipe he hopes to find the germs of truly large companies - not the put-in-£10-million, sell-out-at-£80-million strategy he sees as most common, but multi-billion-pound companies. The key is finding that fundamental technology, something where it's possible to pick a winner.

Wendy M. Grossman's Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series.

February 18, 2011

What is hyperbole?

This seems to have been a week for over-excitement. IBM gets an onslaught of wonderful publicity because it built a very large computer that won at the archetypal American TV game, Jeopardy. And Eben Moglen proposes the Freedom box, a more-or-less pocket ("wall wart") computer you can plug in and that will come up, configure itself, and be your Web server/blog host/social network/whatever and will put you and your data beyond the reach of, well, everyone. "You get no spying for free!" he said in his talk outlining the idea for the New York Internet Society.

Now I don't mean to suggest that these are not both exciting ideas and that making them work is/would be an impressive and fine achievement. But seriously? Is "Jeopardy champion" what you thought artificial intelligence would look like? Is a small "wall wart" box what you thought freedom would look like?

To begin with Watson and its artificial buzzer thumb. The reactions display everything that makes us human. The New York Times seems to think AI is solved, although its editors focus on our ability to anthropomorphize an electronic screen with a smooth, synthesized voice and a swirling logo. (Like HAL, R2D2, and Eliza Doolittle, its status is defined by the reactions of the surrounding humans.)

The Atlantic and Forbes come across as defensive. The LA Times asks: how scared should we be? The San Francisco Chronicle congratulates IBM for suddenly becoming a cool place for the kids to work.

If, that is, they're not busy hacking up Freedom boxes. You could, if you wanted, see the past twenty years of net.wars as a recurring struggle between centralization and distribution. The Long Tail finds value in selling obscure products to meet the eccentric needs of previously ignored niche markets; eBay's value is in aggregating all those buyers and sellers so they can find each other. The Web's usefulness depends on the diversity of its sources and content; search engines aggregate it and us so we can be matched to the stuff we actually want. Web boards distributed us according to niche topics; social networks aggregated us. And so on. As Moglen correctly says, we pay for those aggregators - and for the convenience of closed, mobile gadgets - by allowing them to spy on us.

An early, largely forgotten net.skirmish came around 1991 over the asymmetric broadband design that today is everywhere: a paved highway going to people's homes and a dirt track coming back out. The objection that this design assumed that consumers would not also be creators and producers was largely overcome by the advent of Web hosting farms. But imagine instead that symmetric connections were the norm and everyone hosted their sites and email on their own machines with complete control over who saw what.

This is Moglen's proposal: to recreate the Internet as a decentralized peer-to-peer system. And I thought immediately how much it sounded like...Usenet.

For those who missed the 1990s: invented and implemented in 1979 by three students, Tom Truscott, Jim Ellis, and Steve Bellovin, the whole point of Usenet was that it was a low-cost, decentralized way of distributing news. Once the Internet was established, it became the medium of transmission, but in the beginning computers phoned each other and transferred news files. In the early 1990s, it was the biggest game in town: it was where Linus Torvalds and Tim Berners-Lee announced their inventions of Linux and the World Wide Web.

It always seemed to me that if "they" - whoever they were going to be - seized control of the Internet we could always start over by rebuilding Usenet as a town square. And this is to some extent what Moglen is proposing: to rebuild the Net as a decentralized network of equal peers. Not really Usenet; instead a decentralized Web like the one we gave up when we all (or almost all) put our Web sites on hosting farms whose owners could be DMCA'd into taking our sites down or subpoena'd into turning over their logs. Freedom boxes are Moglen's response to "free spying with everything".

I don't think there's much doubt that the box he has in mind can be built. The Pogoplug, which offers a personal cloud and a sort of hardware social network, is most of the way there already. And Moglen's argument has merit: if you control your Web server and the nexus of your social network, law enforcement can't just make a secret phone call; they'll need a search warrant to search your home if they want to inspect your data. (On the other hand, seizing your data is as simple as impounding or smashing your wall wart.)

I can see Freedom boxes being a good solution for some situations, but like many things before it they won't scale well to the mass market because they will (like Usenet) attract abuse. In cleaning out old papers this week, I found a 1994 copy of Esther Dyson's Release 1.0 in which she demands a return to the "paradise" of the "accountable Net"; 'twill be ever thus. The problem Watson is up against is similar: it will function well, even engagingly, within the domain it was designed for. Getting it to scale will be a whole 'nother, much more complex problem.


September 1, 2006

The elephant in the dark

Yesterday, August 31, was the actual 50th anniversary of the first artificial intelligence conference, held at Dartmouth in 1956 and recently celebrated with a kind of rerun. John McCarthy, who convened the original conference, spent yesterday giving a talk to a crowd of students at Imperial College, London, on challenges for machine learning, specifically recounting a bit of recent progress working with Stephen Muggleton and Ramon Otero on a puzzle he proposed in 1999.

Here is the puzzle, which expresses the problem of determining an underlying reality from an outward appearance. Most machine learning research, he noted, has concerned the classification of appearance. But this isn't enough for a robot – or a human – to function in the real world. "Robots will have to infer relations between reality and appearance."

One of his examples was John Dalton's work discovering atoms. "Computers need to be able to propose theories," he said – and later modify them according to new information. (Though I note that there are plenty of humans who are unable to do this and who will, despite all evidence and common sense to the contrary, cling desperately to their theory.)

Human common sense reasons in terms of the realities. Some research suggests, for example, that babies are born with some understanding of the permanence of objects – that is, that when an object is hidden by a screen and reappears it is the same object.

Take, as McCarthy did, the simple (for a human) problem of identifying objects without being able to see them; his example was reaching into your pocket and correctly identifying and pulling out your Swiss Army knife (assuming you live in a country where it's legal to carry one). Or identifying the coin you want from a collection of similar coins. You have some idea of what the knife looks and feels like, and you choose the item by its texture and what you can feel of the shape. McCarthy also cited an informal experiment in which people were asked to draw a statuette hidden in a paper bag – they could reach into the paper bag to feel the statue. People can do this almost as well as if they could see the object.

But, he said, "You never form an image of the contents of the pocket as a whole. You might form a list." He has, he said, been trying to get Stanford to make a robotic pickpocket.

You can, of course, have a long argument about whether there is such a thing as any kind of objective reality. I've been reading a lot of Philip K. Dick lately, and he had robots that were indistinguishable from humans, even to themselves; yet in Dick's work reality is a fluid, subjective concept that can be disrupted and turned back on itself at any time. You can't trust reality.

But even if you – or philosophers in general – reject the notion of "reality" as a fundamental concept, "You may still accept the notion of relative reality for the design and debugging of robots." Seems a practical approach.

But the more important aspect may be the amount of pre-existing knowledge. "The common view," he said, "is that a computer should solve everything from scratch." His own view is that it's best to provide computers with "suitably formalized" common sense concepts – and that formalizing context is a necessary step.

For example: when you reach into your pocket you have some idea of what the contents are likely to be. Partly, of course, because you put them there. But you could make a reasonable guess even about other people's pockets because you have some idea of the usual size of pockets and the kinds of things people are likely to put in them. We often call that "common sense", but a lot of common sense is experience. Other concepts have been built into human and most animal infants through evolution.

Although McCarthy never mentioned it, that puzzle and these other examples all remind me of the story of the elephant and the blind men, which I first came across in the writings of Idries Shah, who attributed it to the Persian poet Rumi. Depending on which piece of the elephant a blind man got hold of, he diagnosed the object as a fan (ear), pillar (leg), hose (trunk), or throne (back). It seems to me a useful analogy to explain why, 50 years on, human-level artificial intelligence still seems so far off. Computers don't have our physical advantages in interacting with the world.

An amusing sidelight seemed to reinforce that point. After the talk, there was some discussion of building the three-dimensional reality behind McCarthy's puzzle. The longer it went on, the more confused I got about what the others thought they were building; they insisted there was no difficulty in getting around the construction problem I had, which was how to make the underlying arcs turn one and only one stop in each direction. How do you make it stop? I asked. Turns out: they were building it mentally with Meccano. I was using cardboard circles with a hole and a fastener in the middle, and marking pens. When I was a kid, girls didn't have Meccano. Though, I tell you, I'm going to get some *now*.
