Monday, November 12, 2007

Steve Yegge

I recently stumbled (in the word's actual meaning, not via the website of that name) upon Steve Yegge's blog. Actually, it was via some page on Emacs Lisp which I'm unable to retrieve now. In any case, his Tour de Babel, a tour through programming languages, is quite funny, as are most of his other posts. If you have some time to kill, I can heartily recommend his blog. Also, don't miss his talk at OSCON 2007.

Tuesday, September 18, 2007

Why Ruby > Python

Lately I've been hacking around a bit with Ruby, and I must confess I've started to like it quite a lot. In particular, there are some things I like much better than in Python. And no, this won't all be about who enforces the nicer syntax.

So here are the things I consider a big win:

  • Extendable standard classes Is there some nice feature for strings that the Ruby developers missed? You can just add it to the String class itself. No need to derive your own special MyString class or anything (see the sketch after this list).
  • Blocks Okay, it took a while to get used to them, but once you understand how they work, blocks allow you to extend the Ruby language with new syntactic constructions (well, mostly loops, but you can also use them for resource management: just yield a handle to an object and take care of the proper cleanup afterwards; Ruby's open can be used like this). Also, I find blocks much cleaner than iterators, since the whole loop logic is encoded in a single place (instead of being split over two or three functions).
  • Function calls without parentheses Again, you can write new functions and really extend the Ruby syntax. Paired with introspection, you can write quite powerful class modifiers and call them with a clean, simple syntax. This is heavily used by Rails, for example.
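
To make these three points concrete, here is a small Ruby sketch of all of them in one place. The names shout and Track are made up for this example; File.open and attr_accessor are standard Ruby:

    # 1. Extending a standard class: add a method to String itself.
    class String
      def shout
        upcase + "!"
      end
    end

    # 2. Blocks for resource management: open yields the file handle
    #    and closes it when the block exits, however it exits.
    File.open("notes.txt", "w") do |f|
      f.puts "hello".shout            # writes "HELLO!"
    end                               # file is closed here automatically

    # 3. Parenthesis-free calls as class modifiers, Rails-style:
    #    attr_accessor is an ordinary method call that defines
    #    getters and setters via introspection.
    class Track
      attr_accessor :title, :artist
    end

    t = Track.new
    t.title = "Tour de Babel"
    puts t.title                      # => Tour de Babel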

Okay, for me the biggest problem with Ruby is the lack of a large numeric and matrix library. Python with its SciPy is clearly ahead in this respect. There exist Ruby bindings for the GNU Scientific Library, but they lack the LAPACK functions, which means that, for example, the eigenvalue routines are not really fast.

That, and Ruby is reported to be somewhat slower than Python. But maybe that will change with the next version, which will include a virtual machine...

Tuesday, September 11, 2007

SICP

Some time before I began studying computer science, I heard of the book "Structure and Interpretation of Computer Programs". As I was always on the lookout for new, life-changing computer languages (as someone once said: "A programming language which does not change the way you think about programming is not worth learning"), I was quite eager to get my hands on the book. However, it was out of print back then, or at least impossible to get in Germany for a reasonable price.

So I started to learn Scheme from the official standard document, which turned out to be rather painful. Anyway, years passed by: I learned C++, finally got back to C, learned Python, then Ruby, played around a bit with Haskell and Emacs Lisp. And suddenly I learn that the whole book is available online.

Well, it turns out that ten years later things are maybe not that exciting any more, but still, maybe this book can still change your life ;) Who knows what I'd be doing now if I had read this book back then...

Wednesday, September 05, 2007

Palm cancels Foleo

It seems that Palm has canceled the Foleo just a few weeks before its official release. This must have been a very hard decision after building up all that anticipation and being almost ready to ship the product.

I have personally owned Palm PDAs for the last few years. Let's say that overall it was a pleasing experience (including some killer apps like Kanji Gym), although in terms of stability Palm was always a bit of a nuisance. Maybe the most amazing thing is that I never lost data during the countless soft resets. Now that I think about it, syncing the Palm has become quite an issue lately...

Anyway, if they manage to focus all their efforts on finally releasing the next (hopefully more stable) PalmOS, this might eventually turn out to have been the right decision.

Sunday, August 26, 2007

What is Machine Learning?

I'm often asked what it actually is that I do for a living. I'm not sure if you can understand this, but "we Germans" have a special relationship with anything that involves mathematics. There is a whole family of jokes about a mathematician who is on a date and is asked what his job is. They all end with the disclosure "I'm a mathematician", followed by "Oh... I was never good at math in school". And that is the end of it.

The next thing is that "Machine Learning" is somewhat close to "Artificial Intelligence", and, come on, who could hear that somebody is working on building "intelligent machines" and keep a straight face? Have you ever had to call a company and realized that they replaced their already annoying menu-by-number scheme with incredibly more annoying "speech recognition" software? "If you are calling to ask about your contract, say 'contract'; if you want to ask about your invoice, say 'invoice'..." - "CON-TRACT!!!" - "Sorry, I could not understand you. If you are calling to ask..." and so on, ad infinitum.

Where was I? Ah, so the question is what Machine Learning is about. In my Ph.D. thesis, I state that "machine learning is concerned with constructing algorithms which are able to learn from data". Well, this is certainly accurate, but it does not answer a few important questions: Why would you want to learn from data? And learn what?

Lately I have come to realize that machine learning is nothing other than an extension of how to write programs that solve complex problems. In fact, problems so complex that you don't manage to come up with a formal specification of what the program should accomplish. Classically, programs have been written to address problems that could be formalized well: basic arithmetic (but make it really fast), how to compute the shortest path in a graph, how to optimize network flow, and so on. But there were always those problems whose solutions were elusive: making computers see, understanding natural language, controlling robots that can interact with the real world. Most of these problems are "easily" solved by humans, but maybe only because evolution has outfitted us with the right hardware, er, wetware.

For most of these problems, it is easy to find partial solutions. For example, for object recognition in images, it is clear that the size or position of an object in an image does not usually make a difference (well, maybe unless the relative position of objects in an image does. Or since when can elephants fly?). But things quickly become less clear, and the "old way" of solving such problems, by first understanding the problem fully and then devising a list of basic operations which always results in the right solution, plainly does not work.

Enter Machine Learning! ML algorithms learn a mapping from examples of input-output pairs, and state-of-the-art ML algorithms can deal with data sets containing up to several million examples. But wait, this does not mean that the "old way" is obsolete. As it turns out, just taking a few million images and throwing some vanilla ML algorithm at the data won't work all that well.
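
As an aside, the most naive form of "learning a mapping from input-output pairs" fits in a few lines of Ruby. This is a toy 1-nearest-neighbour sketch with made-up data, purely illustrative, not a serious ML method:

    # Toy 1-nearest-neighbour "learner": predict the label of the
    # training example whose input is closest to the query.
    def nearest_neighbor(examples, query)
      _, label = examples.min_by do |x, _|
        # squared Euclidean distance between x and the query
        x.zip(query).inject(0) { |s, (a, b)| s + (a - b)**2 }
      end
      label
    end

    # Made-up training set: [feature vector, label] pairs.
    examples = [
      [[1.0, 1.0], :cat],
      [[5.0, 4.0], :dog],
      [[0.8, 1.2], :cat],
    ]

    puts nearest_neighbor(examples, [0.9, 1.1])   # => cat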

As almost every practitioner of ML knows, it is all in the preprocessing. In principle, the methods might eventually be able to work without it, but they work so much better (read: require less data) if you perform some form of preprocessing. For object recognition, it might help if the object is nicely centered, for example. ML people are often a bit annoyed that finding the right preprocessing is so important; they want to build machines which can do everything by themselves.

But if you look at it from a different angle, you see that the preprocessing is actually the place where the "old way" and the "ML way" nicely meet. The preprocessing roughly amounts to solving the problem partially, taking all available information into account. The remaining part of the problem is then handed over to the machine, which solves it the way it works best: by brute force, squeezing the required information out of several thousand to millions of examples.

So, the next time somebody asks me what I do for a living, I'll try to tell them nothing about intelligence or statistics, but about solving problems which are so hard that nobody has found a solution yet, using the sheer computational power of modern processors. Let's see if I'll manage to circumvent the inevitable "I was bad at math" reaction. Or at least delay it for five minutes.

Tuesday, August 21, 2007

Hackers & Painters

Yet another book I recently picked up again: "Hackers & Painters" by Paul Graham. Paul Graham is something of a celebrity in the LISP community, but I fear I'm a bit too young for that. Don't tell anybody, but I got my basic training on the good ol' Commodore BASIC v3.5. Yes, I owned a C=16!

Anyway, from what I gather, Paul developed the software that would eventually power the Yahoo! Store back in the 90s, and got really rich by selling his startup company to Yahoo. Oh, and I forgot to mention that he wrote the whole thing in LISP.

So it seems that Paul Graham has since had ample time to work on new projects, like Arc (the last LISP dialect you'd ever need to learn), showing people how to start their own company and get rich, and writing books.

I read the whole thing a while ago, and it contains a number of interesting little essays. This morning, on my way to the lab, I started to reread the first few pages of the chapter "Hackers and Painters", and I had to smile at how accurate his descriptions were.

According to Mr. Graham, computer science is home to many different types of people. On the theoretical end, there are mathematicians who prove theorems and are basically doing, well, math. Then there is what he calls the "natural science" of computation: people who study algorithms and their behavior the way you would study an animal. You devise experiments to collect data and statistics about how one algorithm compares to another.

The "true hacker", though, is again different because his goal is to create new software. It is not just engineering, but it is a mixture of architecture and engineering, and the hacker ideally combines both these professions in one person.

However, Paul argues that the hacker's endeavor is unscientific at heart, or at least does not fit in well with the usual scientific measures of quality. Often, writing a program is not about inventing something new, or solving really hard problems, but about combining existing techniques in a clever new fashion.

When I look at my own field, machine learning, I can easily identify each of these three schools of thought, sometimes even within one paper: for example, you first prove some theorems which nevertheless fail to give a complete answer, and then compare algorithms experimentally.

And sometimes you end up doing work which you think is cool and valuable, but which just brings together existing techniques in a really nice fashion. People will object that it has all been done before, and you agree that they are right, but still, your specific mixture really adds something to, well, the mixture.

An example is our little programming-language pet project rhabarber. Well, it borrows a lot from LISP, maybe some Smalltalk, and a lot from Python and Ruby in terms of how you would want to work with a language. People instantly ask why we need a new computer language. And yet, I think that this specific mixture is something special, which will lead to a more useful tool than any existing language.

But I digress. So if you've finished reading Steven Levy's "Hackers", I recommend reading Paul Graham's "Hackers & Painters" next ;) And if you don't feel like buying the book, Paul has many essays on his homepage.

Tuesday, August 14, 2007

Visual Timeline

I know this is quite old, but it is absolutely hilarious. Some of the "visuals" have the potential to become community memes of their own.

Friday, June 29, 2007

Steven Levy's Hackers

I've recently started to read "Hackers" by Steven Levy. Actually, this is my second attempt. On my first try, I got stuck in all the talk about railroads in the first pages. This time, I went directly to page 30.

And I have to say (while this might reveal what an incurable computer nerd I actually am), reading this book repeatedly sent shivers down my spine. And although our computers are several orders of magnitude smaller and more powerful, the basic fascination is still the same.

It is pretty amazing to see how they came up with all these ideas of interactivity and time-sharing even then, simply because it seemed to be the right thing. Otherwise computers might still be huge machines sitting in basements, with access restricted by several layers of bureaucracy.

Anyway, if you want to know whether you're a computer nerd at heart, take a look at this book ;)

Thursday, April 26, 2007

Tool of the Week - Part II: sshfs

Recently I stumbled upon sshfs, and it is really the best thing since sliced bread. sshfs allows you to mount a remote directory over sftp (under Linux, of course). Since it relies on sftp, you can mount remote directories as soon as you have ssh access, without needing to set up further servers, a VPN, or anything else.

Being able to access remote files like normal files means that you can use any program to modify them. For example, I have a small personal webspace which allows access only via sftp. This made updating quite ugly, because I couldn't just rsync everything to the server. But with sshfs you just mount your webspace and do a "local" rsync between two directories - and you're done. Amazing!
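
For the record, here is a minimal sketch of that workflow. The host name, paths, and mount point are made up for the example; sshfs and fusermount come with your distribution's fuse/sshfs packages:

    # mount the remote webspace via sftp (made-up host and paths)
    mkdir -p ~/mnt/webspace
    sshfs user@example.com:/var/www ~/mnt/webspace

    # now updating the site is an ordinary "local" rsync
    rsync -av --delete ~/homepage/ ~/mnt/webspace/

    # unmount when done
    fusermount -u ~/mnt/webspace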

Thursday, March 22, 2007

Gadget Wishlist

How about a small, GPS-mouse-sized device which bridges from Bluetooth to wireless LAN? Think about it: almost every cell phone or PDA has built-in Bluetooth and a web browser, but very few come with support for wireless LAN.

Please, somebody build this baby and sell it for around 50€!

Monday, February 12, 2007

Tool of the Week: baobab

I'm amazed at what useful things you can find within GNOME. I accidentally discovered this one when I installed Ubuntu on my old laptop the other week. baobab is a tool which summarizes and visualizes filesystem usage. So when you're wondering where all those gigabytes have gone, try this tool. It seems that baobab has migrated into gnome-utils, although I find the current gnome-utils homepage less informative than the (obsolete) original one.