Tuesday, September 09, 2008
Why are people following me on twitter?
A few months ago I created an account on twitter. Part of me just wanted to try out the newest Web 2.0 thing everybody's crazy about, but I also convinced myself that I had an actual need for it: I was going to a wedding without my family and thought this would be a good way to keep them up to date on my whereabouts.
So basically, I posted a few tweets in German over one weekend, and that was more or less it.
The funny thing is that after that weekend I got 3 followers, most of whom didn't even speak German. By now, I have 10 followers, and I really don't think I deserve them. So why are people subscribing to my feed?
Well, one person I know personally, and a few seem to follow me because I stated in my profile that I work on machine learning, but that still leaves about 5 people, and frankly, I don't even know how they found my feed.
Anyway, I also haven't really understood yet what twitter could do for me. I'm not saying that it doesn't make sense at all. For example, I'm following Charles Nutter, one of the main guys working on JRuby, and I find his tweets a nice way to keep track of what he is working on.
In my case, however, it doesn't really work. I'm involved in so many things that people would get seriously confused if I wrote down every little bit (writing proposals/discussing with students/thinking about world domination (muahahah)/reviewing a paper/fixing cron jobs). I could tweet about my research, but I'm not even sure it would be wise to tell everybody what I'm working on: either it doesn't work out, and then it could be kinda embarrassing, or it actually works, and then I'm just giving other people ideas about what to look into.
Lately, I've had kind of an insight: the penalty for subscribing to a low-volume twitter account is quite small (apart from losing track of what the heck you're subscribed to). Some people have like ten thousand subscriptions. But if most of them don't post anything useful, everything's fine. And if you subscribe to somebody who posts a lot but you lose interest, you can get rid of them easily. So maybe everything's making sense.
Well, I'll be attending this year's NIPS conference in December. An excellent opportunity to try twitter again ;)
Monday, September 08, 2008
NIPS outcome and MLOSS Workshop
Well, the NIPS results are out. In case you don't know it, NIPS is one of the largest (maybe the largest) machine learning conferences, held each year in early December, and the notifications of which papers were accepted and which were not went out on Saturday.
Unfortunately, none of my papers made it, although one got quite close. On the other hand, I'm very glad to announce that our workshop on machine learning open source software has been accepted. This will be the second (actually third) installment: in 2005, the workshop was not included in the program, but many people found the topic important enough to come to Vancouver a day early and take part in a "Satellite Workshop".
In 2006 we were accepted and actually had a very nice day in Whistler. When I noticed that I was personally enjoying the workshop, I knew that we had managed to put together a nice program. Maybe the highlight was the final discussion session, in which Fernando Pereira argued that there is little incentive for researchers to work on software because there is no measurable merit in doing so. Eventually, this discussion led to a position paper and finally to a special track on machine learning software at the Journal of Machine Learning Research.
I'm looking forward to this year's workshop, and hope that it will be equally interesting and productive!
Labels: /Projects/mloss, /Research, machine learning
Thursday, September 04, 2008
New Paper Out
I'm very happy to announce that my latest paper just came out in the Journal of Machine Learning Research. Actually, it is the second half of work which developed out of my Ph.D. thesis. The other paper came out almost two years ago, which tells you a bit about how long reviews can take if you're unlucky.
Anyway, from a theoretical point of view, the first paper studies eigenvalues of the kernel matrix while the second one studies eigenvectors, and both derive approximation bounds between the finite-sample quantities and the ones you'd get as the number of training points tends to infinity. These bounds have the important property that they scale with the eigenvalue under consideration: you don't have one bound for all eigenvalues, but instead a bound which becomes smaller as the eigenvalue becomes smaller. This means that you won't have the same bound on eigenvalues on the order of 10^3 and 10^-6; the error on the smaller eigenvalue will actually be much smaller. If you run numerical simulations, you will see this immediately, but actually proving it was a bit harder.
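If you want to see this scaling for yourself, here is a small numerical sketch. This is my own toy setup (an RBF kernel on Gaussian data, with a much larger sample standing in for the asymptotic spectrum), not the setting from the paper:

import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def top_eigenvalues(X, k, gamma=1.0):
    # Leading k eigenvalues of (1/n) K, sorted in decreasing order
    K = rbf_kernel_matrix(X, gamma)
    vals = np.linalg.eigvalsh(K / len(X))
    return np.sort(vals)[::-1][:k]

rng = np.random.default_rng(0)
ev_small = top_eigenvalues(rng.normal(size=(200, 2)), 10)   # finite-sample spectrum
ev_large = top_eigenvalues(rng.normal(size=(2000, 2)), 10)  # proxy for the asymptotic spectrum

for i, (a, b) in enumerate(zip(ev_small, ev_large), start=1):
    print("eig %2d: n=200 %.3e   n=2000 %.3e   |diff| %.3e" % (i, a, b, abs(a - b)))

The absolute differences should drop off roughly together with the eigenvalues themselves, which is the qualitative behaviour the bounds capture.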
What this practically means is a different story, and I personally think it is actually quite interesting (of course :)): if you have some (supervised) learning problem, and you take a kernel and start to train your support vector machine, then these results tell you that even though the kernel feature space might be very high-dimensional, the important information is contained in a low-dimensional subspace, which also happens to be spanned by the leading kernel PCA components.
In other words, when you choose the right kernel, you can do a PCA in feature space and just consider the subspace spanned by the directions with the largest variance. In essence, if you choose the correct kernel, you are dealing with a learning problem in a finite-dimensional space, which also explains why these methods work so well.
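To illustrate that point, here is another rough sketch (again a toy example, using scikit-learn and the two-moons data set, nothing from the paper): project onto the leading kernel PCA components and train a plain linear SVM in that low-dimensional space.

from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=600, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_components in (1, 2, 5, 10):
    # Keep only the leading kernel PCA directions, i.e. the subspace
    # with the largest variance in feature space.
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=2.0, random_state=0)
    Z_train = kpca.fit_transform(X_train)
    Z_test = kpca.transform(X_test)
    clf = LinearSVC(max_iter=10000).fit(Z_train, y_train)
    print(n_components, "components -> test accuracy:", clf.score(Z_test, y_test))

If the kernel fits the problem, a handful of components is already enough to get essentially the full accuracy, which is exactly the point about the important information living in a low-dimensional subspace.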
The "old" story was that you're using hypothesis classes which have finite VC dimension, and then everything's fine. This is still true, of course, but these new results also show why you can expect low-complexity hypothesis classes to work at all: Because a good kernel transforms the data such that the important information is contained in a low-complexity subspace of the feature space.
So I hope I could make you a bit curious about the paper. I'm putting together a more detailed overview on my home page, so make sure to check that out as well in a week or so.
Wednesday, September 03, 2008
Google Chrome
Well, I decided not to blog about Google Chrome. We can't all be non-conformists, right?
I mean, It's Only A Browser™, after all.
Then again, I wonder how fast that V8 JavaScript engine really is...
If you're still interested in a not entirely glowing post, you might want to check out the one by JRuby's Charles Nutter.
And now excuse me while I go shopping.