Tuesday, November 25, 2008

Java Integration in JRuby

Update: See also the updated entry in the JRuby wiki here.

JRuby has some very nice features that make accessing Java easy, but unfortunately documentation about these features is a bit hard to come by. Existing documentation is sometimes outdated or doesn't cover all features. The ultimate documentation is of course the source code itself, in particular the rspec code found in spec/java_integration in the jruby-trunk, but reading source code is probably not everybody's first choice for learning a new feature.

Fortunately, Java integration in JRuby is actually pretty good and useful. JRuby does quite a few tricks to make Java classes feel more Ruby-like, for example, by translating between javaNamingSchemes and ruby_naming_schemes, or by automagically converting blocks to implementations of the appropriate Java interfaces, to the effect that you can pass a block to a Java method whenever it expects a Runnable, for example.

I personally think there is great potential in this kind of bilingual development approach, which uses Java for all the computationally expensive details and Ruby for the dynamic high-level plumbing. Compared with the Ruby/C alternative, probably based on tools like Swig for the plumbing, JRuby/Java is much easier and cleaner. For example, you can easily deal with Java exceptions on the Ruby side, and even if you don't, you get a call stack with Ruby and Java classes happily mixed when an error occurs.

So I hope that the following is both helpful and correct enough to be of use for others.

The Basics

The first thing to do is to require 'java'. I'm not sure if this is strictly necessary as many things also work without that, but I guess it's safe to do this for the future.

The first question is of course how to access Java classes.

There seem to be several different alternatives for naming a Java class. The most "ruby-esque" way looks like this: java.lang.System becomes Java::JavaLang::System. The prefix is always Java::, followed by the package name with the dots removed and each component capitalized in CamelCase fashion (java.lang becomes JavaLang), and finally the class itself.
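To make the naming rule concrete, here is a minimal sketch in plain Ruby that builds the Java::... form from a fully qualified Java class name. The helper ruby_style_name is hypothetical and is not part of JRuby; it only mirrors the rule as described above and ignores any special cases JRuby itself may handle.

```ruby
# Hypothetical helper (not a JRuby API): turns a fully qualified Java class
# name into the Java::PackageName::ClassName form described above.
def ruby_style_name(java_name)
  *packages, klass = java_name.split('.')
  # Upper-case the first letter of every package component, then join them.
  package_module = packages.map { |seg| seg[0].upcase + seg[1..-1] }.join
  ['Java', package_module, klass].reject(&:empty?).join('::')
end

puts ruby_style_name('java.lang.System')              # Java::JavaLang::System
puts ruby_style_name('edu.someuniversity.SomeClass')  # Java::EduSomeuniversity::SomeClass
```

Since this is pure string manipulation, it runs on any Ruby and does not touch the Java classes themselves.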

If the top-level package is java, javax, com, or org, you can also access it just like in Java, for example, by typing java.lang.System. If you want to have this functionality for your own packages as well, because they start (for example) with edu, you can use the following code taken from path-to-jruby-installation/site_ruby/1.8/builtin/javasupport/core_ext/kernel.rb:
def edu
  JavaUtilities.get_package_module_dot_format('edu')
end
Or, which is even simpler:
def edu
  Java::Edu
end
Java's primitive types can be found directly in the Java module: For example, long is Java::long (but not java.long). Actually, I cannot think of too much you could do with these objects. But you need them when you want to create primitive arrays. For example, Java::long[10].new creates a new array of long integer values.

Loading jar-files

If you want to access classes beyond Java's own runtime, you have to load the jar file in JRuby. One way to do this is to explicitly require the jar-file. You can supply a full path, or you can just pass the jar file which is then searched according to the RUBYLIB environment variable.

Alternatively, you can put the jar-file into your CLASSPATH. Then, you can directly access your classes without an explicit require.

Calling Java Methods

You can then access the methods just as in Java, for example
java.lang.System.currentTimeMillis
but JRuby also provides transformed method names to make them more similar to Ruby naming conventions. The typical CamelCase is converted to underscores. The call above can therefore also be written as
java.lang.System.current_time_millis
Moreover, JRuby maps the bean conventions as follows
obj.getSomething()     becomes   obj.something
obj.setSomething(x)    becomes   obj.something = x
obj.isSomething()      becomes   obj.something?
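The two renaming rules above (CamelCase to underscores, plus the bean shortcuts) can be sketched as a small plain-Ruby function. The helper ruby_aliases is hypothetical and not a JRuby API; it only illustrates which Ruby-side names the rules in this post would produce for a given Java method name.

```ruby
# Hypothetical helper (not a JRuby API): lists the Ruby-style names that the
# rules described above yield for a Java method name.
def ruby_aliases(java_method)
  # currentTimeMillis -> current_time_millis
  snake = java_method.gsub(/([a-z0-9])([A-Z])/, '\1_\2').downcase
  names = [java_method, snake]
  # Bean conventions: getter, setter, and boolean query.
  case snake
  when /\Aget_(.+)/ then names << $1
  when /\Aset_(.+)/ then names << "#{$1}="
  when /\Ais_(.+)/  then names << "#{$1}?"
  end
  names.uniq
end

p ruby_aliases('currentTimeMillis')  # ["currentTimeMillis", "current_time_millis"]
p ruby_aliases('getSomething')       # ["getSomething", "get_something", "something"]
p ruby_aliases('isSomething')        # ["isSomething", "is_something", "something?"]
```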

When accessing methods or constants, you have to follow the usual Ruby conventions. That is, constants have to be accessed as Class::Constant (for example, JFrame::EXIT_ON_CLOSE), and it seems you cannot access member variables of objects in Ruby outside of the object's methods, even if they are declared public in Java.

For each Java object there are a number of methods defined, but most of them aren't documented anywhere. I think the more useful ones are java_class (because the plain class only returns some JRuby-side proxy object) and java_kind_of?, which I assume is like instanceof for Java objects.

Conversion of Parameters

When calling a Java method, JRuby goes to some pains to map JRuby types to Java types. For example, if necessary, a Fixnum is also converted to a double, and so on. If this fails and JRuby does not find a corresponding method, it prints an error message along the lines of "no foo with arguments matching [class org.jruby.RubyObject] on object #<Java::...>". Actually, I think JRuby could be a bit more helpful here and tell you what class the Ruby object was. In any case, when this happens it means that the types you wanted to pass couldn't be converted.

JRuby does not automatically try to convert arrays to Java arrays. You have to use the to_java method for that. By default, this constructs Object[] arrays. If you want to construct arrays with a given primitive type, you pass an argument to to_java, which can either be a symbol like :long or :double, or a class. For example,
[1,2,3].to_java               becomes an Object[]
[1,2,3].to_java :long         becomes long[]
[1,2,3].to_java(Java::long)   same thing


Importing Classes and Including Packages

Typing the full name quickly gets old, so you can import classes into modules, and at the top-level.

You might run into slight difficulties with how to name a class. The import function actually expects a string, but you can also pass the Java-style fully qualified class name (that is, with package) if the package starts with java, javax, or one of the other default package names. However, currently (at least as of 1.1.5) it seems that you cannot use the Java::... notation if the top-level package isn't one of those for which you could have used the Java-style name anyway. So I guess it's safe to go with strings in those cases.

This means that you can do
import java.lang.String
but not
import Java::EduSomeuniversity::SomeClass
Instead, you should type
import 'edu.someuniversity.SomeClass'
or do the "def edu; Java::Edu; end" hack and then "import edu.someuniversity...".

While you can import classes into modules and at top-level, you can include whole packages into classes or modules. For example,
module E
  include_package javax.swing
end
makes all of Swing available in E and also as, for example, E::JFrame. At least it used to be like this; currently (in 1.1.5) E::JFrame gives an odd "wrong number of arguments (1 for 0)" error.

Currently, there is no way to import whole packages at the top-level (or at least none I'm aware of). Taking the short-cut through a module and then including the module at top-level currently also doesn't work. You have to access the class through the module first before you can use it at top-level.

Implementing Java Interfaces and Adding Syntactic Sugar to Java Classes in Ruby

Classes are always open in Ruby. This means that you can add methods later just by re-opening the class with a class Name ... end statement. This is a nice way to add, for example, syntactic sugar to your Java classes like operators, or other methods like each to make your classes behave more Ruby-like.

Of course, these additions are only on the Ruby side, and not visible from the Java side. You have to remember that Java is a statically typed language and these run-time additions aren't visible. Still I found this feature to be tremendously useful for making Java classes more usable on the JRuby side.
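As a rough illustration of the re-opening technique, the sketch below uses a plain-Ruby stand-in class so that it runs on any Ruby. The class Vec and its size/get accessors are hypothetical; under JRuby you would re-open the real Java class instead of the stand-in, and the added methods would likewise exist only on the Ruby side.

```ruby
# Stand-in for a Java class with size/get(i) accessors (hypothetical; under
# JRuby you would re-open the actual Java class instead).
class Vec
  def initialize(*values)
    @values = values
  end

  def size
    @values.size
  end

  def get(i)
    @values[i]
  end
end

# Re-open the class later and add Ruby-side sugar.
class Vec
  include Enumerable

  # Iteration built on the existing accessors; Enumerable then supplies
  # map, select, to_a, and friends.
  def each
    size.times { |i| yield get(i) }
  end

  # Element-wise addition as an operator.
  def +(other)
    Vec.new(*each_with_index.map { |v, i| v + other.get(i) })
  end
end

v = Vec.new(1, 2, 3) + Vec.new(10, 20, 30)
p v.to_a                 # [11, 22, 33]
p v.map { |x| x * 2 }    # [22, 44, 66]
```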

What is possible, though, is to implement a Java interface with a JRuby class and pass that new class back to Java. The preferred way to do this is by including the interface in the class definition:
class MyLittleThread
  include java.lang.Runnable

  def run
    10.times { puts "hello!"; sleep 1 }
  end
end
Then you can start this thread with
java.lang.Thread.new(MyLittleThread.new).start


If the interface has only a single method, you can even pass a block which then gets automatically converted:
java.lang.Thread.new { 10.times { puts "hello!"; sleep 1 } }.start


I think this was not possible in earlier versions, but you can also subclass a Java class and pass it back to Java code expecting the super class. This is actually pretty cool, as it means that you can freely intertwine Ruby and Java code.

Collections, Regexes, and so on

JRuby already adds some syntactic sugar on the JRuby side for Java collections, regular expressions and so on, to make them more Ruby-like. For example, Enumerable is mixed into java.util.Collection. In addition, JRuby also defines some additional methods like << or +. For all the details, have a look at the files in JRUBY_DIR/lib/ruby/site_ruby/1.8/builtin/java/.

Exceptions

There isn't much to say here, only that it's possible to catch Java exceptions using Ruby's begin ... rescue ... end construction.

Summary

We've seen that there are some very nice features in the JRuby/Java integration, namely coercion of some basic types, automatic conversion between Java and JRuby naming schemes, and automatic translation of Bean patterns. You can even implement Java interfaces in JRuby code and so provide callbacks to Java code.

Some things are maybe missing (or I haven't figured them out yet), like a way to import whole packages at the top-level. I sometimes wish that the conversion between Ruby and Java objects were more configurable. But given the enormous speed at which JRuby is progressing, I'm sure that these oddities will be smoothed out soon.

Monday, November 17, 2008

On Relevant Dimensions in Kernel Feature Spaces

I finally found some time to write a short overview article about our latest JMLR paper. It discusses some interesting insights into the dimensionality of your data in a kernel feature space when you only consider the information relevant to your supervised learning problem. The paper also presents a method for estimating this dimensionality for a given kernel and data set.

The overview also contains a few lines of MATLAB code with which you can have a look at the discussed effect yourself. It all boils down to pictures like these:



What you see here is the contribution of individual kernel PCA components to the Y samples, split into the smooth part (red) and the noise (blue), on a toy data set, of course. Kernel PCA components are sorted by decreasing variance. What you can see is that the smooth part is contained in the leading kernel PCA components, while the later components only contain information relevant for the noise. This means that even in infinite-dimensional feature spaces, the actual information is contained in a low-dimensional subspace.

If you are looking for more information, have a look at the overview article, the paper, or a video lecture I gave together with Klaus-Robert Müller in Eindhoven last year. There is also some software available.