Although I came home from last year's NIPS conference more than three weeks ago, I haven't yet found time to summarize my impressions. I've found that it's always like this, first there is the jet-lag, then there is Christmas, New Year.
But maybe it's not just the closeness to the holiday season, I think it's also that NIPS is so content-rich that you really need some time to digest all that information.
Banquet Talk and Mini-Symposia
This year, in particular, because they have managed to cram in even more session into the program. They used to have some invited high-level talk during the banquet in previous years, but this year the organizers have chosen to put two technical talks and virtually 20 poster spot lights during the banquet. Actually, I'm not that sure whether this decision was wise as I and most of my colleagues felt that dinner and technical talks don't go well together. Maybe it was also my jet-lag, as I arrived on Monday afternoon, not on Sunday like some people.
The second addition where the mini-symposia on Thursday afternoon, conflicting with the earlier buses to Whistler. I attended the computational photography mini-symposium and found it very entertaining. The organizers have managed to put together a nice mix of introductory and overview talks. For example, Steve Seitz from the University of Washington had a nice talk on how to reconstruct tourist sites in Rome from pictures collected from flickr. Based on these 3d reconstruction you could go on a virtual tour of famous monuments, or compute closest paths based on where pictures were taken.
So if I had anything to say, I'd suggest to keep the mini-symposia, but replace the technical talks during the banquet by the invited talk as in previous years.
With over 250 presentations, it's really hard to pick out the most interesting ones, and as the pre-proceedings aren't out yet (or at least not that I'm aware of), it's also hard to collect some pointers here.
There was an interesting invited talk by Pascal Van Hentenryck on Online Stochastic Combinatorial Optimization. The quintessence of his approach to optimization in stochastic environments was that often, the reaction of the environment does not depend on you the action you take, so you can build a pretty reliable model for the environment and the optimize against that.
Yves Grandvalet had a paper "Support Vector Machines with a Reject Option", which proposes a formulation of a support vector machine which can also opt to say "I don't know which class this is".
John Langford had a paper which was already a preprint at arxiv.org on sparse online learning which basically has the option to truncate certain weights if the become too small.
Every now and then there was an interesting poster with nobody attending it. For example, Patrice Bertail had a poster on "Bootstrapping the ROC Curve" which looked interesting and highly technical, but we could find nobody. At some point I started to discuss the poster with a colleague, but we had to move away from the poster after people started to cluster around us as if one of us were actually Patrice.
Michalis Titsias had an extension of Gaussian Processes in his paper "Efficient Sampling for Gaussian Process Inference using Control Variables" to the case where the model is not just additive random noise but actually depends non-linearly on the function where the Gaussian process is on. It looked pretty complicated, but it might be good to know that such a thing exists.
There were many more interesting papers, of course, but let me just list one more: "Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models" by Tong Zhang seemed like a simple method which combines feature addition with removal steps and comes with a proof (of course). I guess similar schemes exist in dozens, but this one seemed quite interesting to try out.
The question I always try to answer is whether there are some big new developments. A few years ago, everybody suddenly seemed to do Dirichlet processes and some variant of eating place. Last year (as I have been told), everybody was into deep networks. But often, I found it very hard, and this year was also one of those. Definitely no deep learning, maybe some multiple kernel learning. There were a few papers on which try to include some sort of feature extraction or construction into the learning process in a principled manner, but such approaches are (necessarily?) often quite application specific.
I also began to wonder whether a multi-track setup wouldn't be better for NIPS. This question has been discussed every now and then, always in favor of keeping the conference single-track. I think one should keep in mind that what unites machine learning as a community are new methods, because the applications are quite divers, and often very specific. For a bioinformatics guy, a talk on computer vision might not be very interesting, unless there is some generic method which is application-agnostic to a certain degree.
It seems that currently, most of the generic methods are sufficiently well researched, and people now start to think about how to incorporate automatic learning of features and preprocessing into their methods. As I said above, such methods are often a bit ad-hoc and application specific. I'm not saying that this is bad. I think one first has to try out some simple things before you can find more abstract principles which might be more widely applicable.
So maybe having a multi-track NIPS would mean that you can listen more selectively to talks which are relevant to your area of research and the list of talks wouldn't appear to be somewhat arbitrary. On the other hand, you might become even more unaware of what other people are doing. Of course, I don't know the answer, but my feeling was that NIPS is slowly approaching a size and density of presentations that something has to change to optimize the information flow between presenters and attendees.
I've heard that some people come to NIPS only for the workshops, and I have to admit that I really like them a lot, too. Sessions are more focused topic-wise, and the smaller size of the audience invites some real interaction. Whereas I sometimes get the impression that the main conference is mostly for big-shots to meet over coffee-breaks and during poster sessions, it's in the conferences where they participate in the discussion.
We had our own workshop on machine learning and open source software which I have summarized elsewhere.
I attended the multiple kernel learning workshop which really was very interesting, because most of the speakers concluded that in most cases, multiple kernel learning does not work significantly better than a uniform average of kernels. For example, William Stafford Noble reported that he had a paper with multiple kernel learning for the Bioinformatics journal, and only afterwards decided to check whether unoptimized weights would have worked as well. He was quite surprised when the differences where statistically insignificant and concluded that he wouldn't have written the paper in that way had he known the results before.
Francis Bach also gave a very entertaining talk where he presented Olivier Chapelle's work, who couldn't attend. He did a very good job, including comments like "So on the y-axis we have the relative duality gap - I have no idea what that is", and raising his hand after his talk to have the first question.
All in all, I think this workshop was quite interesting and exiting and also important for the whole field of multiple kernel learning, basically, to see that it doesn't just work, and to try to understand better when it doesn't give the improvements hoped for and why.
Finally, many workshops were taped by videolectures.net. I've collected the links here:
- Learning from Multiple Sources
- Kernel Learning - Automatic Selection of Optimal Kernels
- Algebraic and Combinatorial Methods in Machine Learning
- Structured Input - Structured Output
- Machine Learning Open Source Software
- Machine Learning in Computational Biology
- Causality: Objectives and Assessment
- New Challenges in Theoretical Machine Learning: Learning with Data-dependent Concept Spaces
- Optimization for Machine Learning
- Beyond Search: Computational Intelligence for the Web