Sunday, August 26, 2007

What is Machine Learning

I'm often asked what it actually is that I do for a living. I'm not sure if you can understand this, but "we Germans" have a special relationship with anything that involves mathematics. There is a whole family of jokes about a mathematician who is on a date and is asked what his job is. They all end with the disclosure "I'm a mathematician", followed by "Oh... I was never good at math in school". And that is the end of it.

The next thing is that "Machine Learning" is somewhat close to "Artificial Intelligence", and, come on, who could hear that somebody is working on building "intelligent machines" and keep a straight face? Have you ever called a company and realized that they had replaced their already annoying menu-by-numbers scheme with an incredibly more annoying "speech recognition" system? "If you are calling to ask about your contract, say 'contract'; if you want to ask about your invoice, say 'invoice'..." - "CON-TRACT!!!" - "Sorry, I could not understand you. If you are calling to ask..." and repeat ad infinitum.

Where was I? Ah, so the question is what Machine Learning is about. In my Ph.D. thesis, I state that "machine learning is concerned with constructing algorithms which are able to learn from data". Well, this is certainly accurate, but it does not answer a few important questions: Why would you want to learn from data? And what would you want to learn?

I have lately come to realize that machine learning is nothing other than an extension of how we write programs to solve complex problems. In fact, problems so complex that you don't manage to come up with a formal specification of what the program should accomplish. Classically, programs have been written to address problems which could be formalized well: basic arithmetic (but make it really fast), how to compute the shortest path in a graph, how to optimize network flow, and so on. But there were always those problems whose solutions remained elusive: making computers see, understanding natural language, controlling robots which can interact with the real world. Most of these problems are "easily" solved by humans, but maybe only because evolution has outfitted us with the right hardware, er, wetware.
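
To see what "formalized well" means, take the shortest-path example: the specification is complete before a single line of code is written. Here is a compact Dijkstra sketch in Python (the toy graph is made up purely for illustration):

```python
import heapq

def dijkstra(graph, source):
    # graph: dict mapping node -> list of (neighbor, edge_weight) pairs.
    # The spec is crisp: return the length of the cheapest path from
    # source to every reachable node.
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, already found a shorter path
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Toy graph for illustration.
graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```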

For most of these problems, it is easy to find partial solutions. For example, for object recognition in images, it is clear that the size or position of an object in an image does not usually make a difference (well, maybe unless the relative position of objects in an image does matter. Or since when can elephants fly?). But things quickly become less clear, and the "old way" of solving such problems, by first understanding the problem fully and then devising a list of basic operations which always produce the right solution, plainly does not work.

Enter Machine Learning! ML algorithms learn a mapping from examples of input-output pairs, and state-of-the-art ML algorithms can deal with data sets containing several million examples. But wait, this does not mean that the "old way" is obsolete. As it turns out, just taking a few million images and throwing some vanilla ML algorithm at the data won't work all that well.
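
To make "learning a mapping from input-output pairs" concrete, here is about the smallest sketch I can think of: a one-nearest-neighbor classifier in plain Python/numpy, with made-up toy data. It "learns" by memorizing the examples and predicts the label of the closest one:

```python
import numpy as np

def fit(X, y):
    # "Training" is trivial here: just memorize the examples.
    return (X, y)

def predict(model, x):
    X, y = model
    # Return the label of the training example closest to x.
    distances = np.sum((X - x) ** 2, axis=1)
    return y[np.argmin(distances)]

# Made-up toy data: two clusters of 2-d points with labels 0 and 1.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

model = fit(X, y)
print(predict(model, np.array([0.05, 0.1])))  # -> 0
print(predict(model, np.array([0.95, 1.0])))  # -> 1
```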

As almost every practitioner of ML knows, it is all in the preprocessing. In principle, the methods might eventually be able to work on raw data, but they work so much better (read: require less data) if you perform some form of preprocessing. For object recognition, it might help if the object is nicely centered, for example. ML people are often a bit annoyed that finding the right preprocessing is so important. They want to build machines which can do everything by themselves.
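
As a toy illustration of such a centering step (my own sketch, not a recipe from any real pipeline): treat the nonzero pixels of a grayscale image as "the object" and shift its center of mass to the middle of the image:

```python
import numpy as np

def center_object(img):
    # Treat the nonzero pixels as "the object".
    rows, cols = np.nonzero(img)
    if len(rows) == 0:
        return img  # blank image, nothing to center
    # Shift the object's center of mass to the image center.
    row_shift = img.shape[0] // 2 - int(rows.mean())
    col_shift = img.shape[1] // 2 - int(cols.mean())
    return np.roll(np.roll(img, row_shift, axis=0), col_shift, axis=1)

img = np.zeros((5, 5))
img[0, 0] = 1.0            # the "object" sits in a corner...
print(center_object(img))  # ...and ends up in the middle
```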

But if you look at it from a different angle, you see that the preprocessing is actually the place where the "old way" and the "ML way" nicely meet. The preprocessing roughly amounts to solving the problem partially, taking all available information into account. The remaining part of the problem is then handed over to the machine, which solves it the way it works best: by brute force, squeezing the required information out of thousands to millions of examples.

So, the next time somebody asks me what I do for a living, I'll try to tell them nothing about intelligence or statistics, but about solving problems which are so hard that nobody has found a solution yet, using the sheer computational power of modern processors. Let's see if I manage to circumvent the inevitable "I was bad at math" reaction. Or at least delay it for five minutes.

Tuesday, August 21, 2007

Hackers & Painters

Yet another book I recently picked up again: "Hackers & Painters" by Paul Graham. Paul Graham is sort of a celebrity in the LISP community, but I fear I'm a bit too young for that. Don't tell anybody, but I got my basic training on the good ol' Commodore BASIC v3.5. Yes, I owned a C=16!

Anyway, from what I gather, Paul developed the software which would eventually power the Yahoo! Store back in the 90s, and got really rich by selling his startup company to Yahoo. Oh, and I forgot to mention that he wrote the whole thing in LISP.

So it seems that Paul Graham has since had an ample amount of time to work on new projects: Arc, the last LISP dialect you'd ever need to learn; showing people how to start their own company and get rich; and writing books.

I read the whole thing a while ago, and it contains a number of interesting little essays. This morning, on my way to the lab, I started to read the first few pages of the chapter "Hackers and Painters", and I had to smile at how accurate his descriptions were.

According to Mr. Graham, computer science is home to many different types of people. On the theoretical end, there are mathematicians who prove theorems and are basically doing, well, math. Then there is what he calls the "natural sciences" of computation: people who study algorithms and their behavior the way you would study an animal. You devise experiments to collect data and statistics about how one algorithm compares to another.
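
In that spirit, a throwaway sketch of such an "experiment" (entirely my own illustration, nothing from the book): treat a naive sorting algorithm and Python's built-in sort as two specimens and time them on inputs of growing size:

```python
import random
import time

def bubble_sort(a):
    # Deliberately naive O(n^2) sort, our "specimen".
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

for n in (100, 500, 2000):
    data = [random.random() for _ in range(n)]
    t0 = time.time()
    bubble_sort(data)
    t1 = time.time()
    sorted(data)
    t2 = time.time()
    print("n=%5d  bubble: %.4fs  builtin: %.4fs" % (n, t1 - t0, t2 - t1))
```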

The "true hacker", though, is again different because his goal is to create new software. It is not just engineering, but it is a mixture of architecture and engineering, and the hacker ideally combines both these professions in one person.

However, Paul argues that the hacker's endeavor is unscientific at heart, or at least does not fit in well with the usual scientific measures of quality. Often, writing a program is not about inventing something new, or solving really hard problems, but about combining existing techniques in a clever new fashion.

When I look at my own field, machine learning, I can easily identify each of these three schools of thought, sometimes even within one paper. For example, you first prove some theorems which nevertheless fail to give a complete answer, and then compare algorithms experimentally.

And sometimes, you end up doing work which you think is cool and valuable, but which just brings together existing techniques in a really nice fashion. People will object that it has all been done before, and you agree that they are right, but still, your specific mixture really adds something to, well, the mixture.

An example is our little programming-language pet project rhabarber. Well, it borrows a lot from LISP, maybe some Smalltalk, and a lot from Python and Ruby in terms of how you would want to work with a language. People instantly ask why we need a new programming language. And yet, I think that the specific mixture is something special, which will lead to a more useful tool than all existing languages.

But I digress. So if you've finished reading Steve Levy's "Hackers", I recommend reading Paul Graham's "Hackers & Painters" next ;) And if you don't feel like buying the book, Paul has many essays on his homepage.

Tuesday, August 14, 2007

Visual Timeline

I know this is quite old, but it is absolutely hilarious. Some of the "visuals" have the potential to become community memes of their own.