Monday, August 25, 2008

Benchmarking javac vs. ecj on Array Access

One of the projects I'm currently working on is a fast linear algebra matrix library for Java. I know that some alternatives already exist, but I usually found that they were either no longer actively maintained or much slower than what, for example, MATLAB would give you.

So in principle, all I wanted was a matrix library which directly interfaces with some fast BLAS or LAPACK library, for example ATLAS, but hides all of the Fortran greatness behind a nice object-oriented wrapper. It turns out that something like that does not seem to exist, so we started to work on our own little library called jBLAS, which is going to be released soon.

Anyway, while working on that library, I got a bit worried about the performance of the stuff I'm doing in Java. These are mainly element-wise operations like multiplying all elements of a matrix with the corresponding elements of another matrix.

Since we're interfacing with C (actually Fortran) code a lot, all of the matrix data is stored in direct buffers, in particular java.nio.DoubleBuffer, and I wondered how their put and get access methods compared to array access and of course to C. So I wrote a little program in Eclipse which did just that and compared the outcome with what I got in C. To my surprise, Java was very much on par, and the performance difference between accessing arrays and direct buffers was very small.

But when I tried to repeat the results at home, I got very different numbers, both for arrays and for direct buffers, which suddenly took five times longer than arrays. At first I suspected that the different processor maker was the issue, but after some more tweaking I found out, again to my great surprise, that the issue was the compiler.

It turns out that the eclipse compiler ecj is much better at compiling my benchmark than Sun's own javac. To see this for yourself, here is my benchmark:
import java.nio.*;

class TicToc {
    private long savedTime;

    public void tic(String msg) {
        System.out.print(msg);
        System.out.flush();

        saveTime();
    }

    public void tic(String fmt, Object... args) {
        System.out.printf(fmt, args);

        saveTime();
    }

    private void saveTime() {
        savedTime = System.currentTimeMillis();
    }

    public void toc() {
        long elapsedTime = System.currentTimeMillis() - savedTime;

        System.out.printf(" (%.3fs)\n", elapsedTime / 1000.0);
    }
}


public class Main {
    private static void copyArray(int size, int iters) {
        double[] source = new double[size];
        double[] target = new double[size];
        for (int i = 0; i < iters; i++)
            for (int j = 0; j < size; j++)
                target[j] = source[j];
    }

    private static DoubleBuffer createBuffer(int size, ByteOrder bo) {
        return ByteBuffer.allocateDirect(Double.SIZE / Byte.SIZE * size)
            .order(bo).asDoubleBuffer();
    }

    private static void copyDoubleBuffer(int size, int iters, ByteOrder bo) {
        DoubleBuffer source = createBuffer(size, bo);
        DoubleBuffer target = createBuffer(size, bo);
        for (int i = 0; i < iters; i++)
            for (int j = 0; j < size; j++)
                target.put(j, source.get(j));
    }

    public static void main(String[] args) {
        TicToc t = new TicToc();

        final int SIZE = 1000;
        final int ITERS = 1000000;

        t.tic("copying array of size %d %d times...", SIZE, ITERS);
        copyArray(SIZE, ITERS);
        t.toc();

        t.tic("copying DoubleBuffer of size %d %d times... (LITTLE_ENDIAN)", SIZE, ITERS);
        copyDoubleBuffer(SIZE, ITERS, ByteOrder.LITTLE_ENDIAN);
        t.toc();
    }
}

So let's compare the run-times of these programs using the latest Java JDK 6 update 7 and the stand-alone eclipse compiler 3.4.

First, I compile the source with
javac Main.java
and run it with java -cp . Main:
copying array of size 1000 1000000 times... (2.978s)
copying DoubleBuffer of size 1000 1000000 times... (LITTLE_ENDIAN) (4.291s)
Now the same with the eclipse compiler: I compile it with
java -jar ..path_to_jar../ecj-3.4.jar -1.6 Main.java
and get the following results:
copying array of size 1000 1000000 times... (0.547s)
copying DoubleBuffer of size 1000 1000000 times... (LITTLE_ENDIAN) (0.633s)
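Note that the two class files differ only in their bytecode, but the eclipse compiler manages to produce code which is about 5.4 times faster for arrays (2.978s vs. 0.547s) and about 6.8 times faster for direct buffers (4.291s vs. 0.633s). With ecj, the direct buffer is also only slightly slower than the array (about 16%).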

I'll have to look into this quite some more, but if this is true, this is pretty amazing!

Thursday, August 21, 2008

My keychain Tool

I wrote a little script in ruby which manages passwords. It is a strictly command line tool and allows you to store keys, retrieve them, and even generate more or less pronounceable passwords of arbitrary length using a Markov chain. All the data is stored in a blowfish-encrypted file and should therefore be somewhat safe.
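
To give an idea of how a Markov chain can produce more or less pronounceable passwords, here is a minimal sketch. It is not the code from the keychain gem: the word list WORDS and the helpers build_model, pick, and generate_password are made up for illustration, and a real tool would presumably train on a much larger corpus.

```ruby
require 'securerandom'

# Hypothetical training words; any list of pronounceable words would do.
WORDS = %w[marginal interesting password keychain ruby benchmark library
           matrix buffer certificate template generator].freeze

# First-order Markov model: for every character, collect the characters
# that follow it somewhere in the training words.
def build_model(words)
  model = Hash.new { |hash, key| hash[key] = [] }
  words.each do |word|
    word.each_char.each_cons(2) { |a, b| model[a] << b }
  end
  model
end

# Pick a random element using a cryptographically secure generator.
def pick(list)
  list[SecureRandom.random_number(list.size)]
end

# Walk the chain until the password has the requested length.
def generate_password(model, length)
  return '' if length <= 0

  current = pick(model.keys)
  password = current.dup
  while password.length < length
    followers = model.fetch(current, [])
    # Dead end (character never followed by anything): restart anywhere.
    current = followers.empty? ? pick(model.keys) : pick(followers)
    password << current
  end
  password
end

puts generate_password(build_model(WORDS), 12)
```

Because frequent followers appear more often in each list, picking uniformly from the list already samples them in proportion to how often they occur in the training words.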

Since the script depends on two other ruby gems (namely crypt and highline), I've uploaded it to rubyforge. You can now simply install the script with
  gem install keychain

The interface is quite easy: you can either pass the command as the first argument, or invoke keychain without arguments, which then enters an interactive mode. Simply try
   keychain help
to see a list of commands.

Needless to say, I cannot guarantee that this tool will not one day forget all your valuable passwords, but at least I can assure you that I'm personally trusting my code. :)

Wednesday, August 20, 2008

Nokia E61i

I have owned a Nokia E61i since last December, and all in all I must say that it has been quite an agreeable experience. It is solidly built, has a small but usable keyboard, real multitasking, a large screen, and a really nice browser which even supports JavaScript. Recently I found out that it supports the SIP standard for Voice over IP, which integrates very nicely with many "open" commercial offerings. A piece of software called JoikuSpot even turns your cell phone into a WiFi hotspot, giving you almost config-free access to the Internet wherever you are (okay, only for http in the free version).

Maybe the weakest component (and incidentally one of the most important) is the messenger, and I'd like to elaborate a bit on the small weaknesses and nuisances of that piece of software. Nokia, if you're reading this, I politely ask you to consider fixing these things in the next update! Just do it. Pleeease!

Robustness

Generally, the phone is quite robust software-wise. It almost never locks up, and even if it does, you can usually still kill the program easily (How? Push the "Swirl" button for a few seconds, scroll down to the program and press the backspace key). However, if there is one program which hangs more often than others, it's the messenger.

Now, you would expect that having unstable connections is really part of the life of a cell phone. After all, it's a mobile device, which means that its position is constantly changing and reception may be quite unstable. Still, whenever the phone tries to connect but really can't, it decides that it's safer to turn off the automatic retrieval. I wonder if just waiting for a few minutes wouldn't be a more reasonable alternative.

And if you can connect but something goes wrong while connecting, the mail software also really likes to die in a number of less obvious ways. Sometimes it claims that it is still connected to the mailbox, but actually it isn't. If you start to read emails, they vanish. If you manually close the connection, it sometimes helps, but often it doesn't.

All in all, I know that IMAP is a complex protocol, but pleeease couldn't you try to make the error handling a bit more reasonable?

Certificates

The other big nuisance is that automatic retrieval only works when you have the certificate of the server. Now if you're working for a big company (which might be using Exchange anyway), which has paid enough money to get decently trusted certificates, everything's okay, but when you're working at a university, or try to sync emails from your own server, you're out of luck.

The situation might not be entirely bad if there were some way to install missing certificates easily, for example in the dialog for untrusted certificates. I wonder why they chose not to allow that...

So how do you get the certificate on the phone? First of all, you have to obtain the certificate. If you're lucky, there is some web page where you can download the certificate. Now you only have to pray that the web server is setting the correct mime type and everything's good.

If not, you have to extract the certificate yourself. After some trial-and-error I found that Internet Explorer (but not Firefox or Thunderbird) allows you to save a certificate to disk. So again, if you're lucky, the same certificate is used for secure web connections and you can save that certificate.

Well, so how do you get the certificate on the phone? Putting the certificate somewhere on the phone using the file explorer won't work, because the phone doesn't recognize certificates in its own file browser. So either you have to mail it to yourself (confirming one more time that yes, you do want to connect to that server), again hoping that the mime types will be set correctly, or you have to put it somewhere on the web from where you can then download it.

To cut a long story short: Nokia, please let us install certificates whenever we encounter an untrusted one.

Hyperlinks

Finally, one last nuisance is that (at least on the E61i, other versions of Symbian might have only one browser for Internet and WAP) if you click on a link in a message, the WAP browser is opened, not the Internet browser. Which really doesn't make any sense.

Nokia, will you help me out?

Update: If you're running Linux, you might try the following "one-liner" to get the certificate from a server
echo "quit" | openssl s_client -connect server-address:server-port |\
ruby -e 'flag=false; readlines.each {|x| flag ||= (x =~
/BEGIN CERT/); if flag then puts x end; flag &&= !(x =~ /END CERT/) }' |\
openssl x509 -outform DER >cert.der
Replacing "server-address:server-port" with your server information, of course.

Update: Even if you install the server certificate on the phone, it may still be that the signing certificate is unknown. Using Internet Explorer, you can save all those other certificates as well, and you have to install them all to make the phone accept the server certificate. I haven't yet figured out how to do this with openssl, though...
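
One possible route that avoids Internet Explorer, sketched in Ruby with the standard openssl library. It assumes that the server speaks TLS directly on the given port (as IMAP over SSL on port 993 usually does) and that it actually sends its signing certificates during the handshake, which not every server does. The names fetch_chain and save_chain, the file prefix, and the host in the usage comment are invented for this sketch, and whether the phone accepts the resulting files is untested.

```ruby
require 'openssl'
require 'socket'

# Connect to a TLS server and return the certificates it presents
# (server certificate first, then whatever signing certificates it sends).
# Verification is switched off on purpose: the point is to look at
# certificates that are not trusted yet.
def fetch_chain(host, port)
  tcp = TCPSocket.new(host, port)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = host # send the server name (SNI)
  ssl.connect
  ssl.peer_cert_chain
ensure
  ssl&.close
  tcp&.close
end

# Write every certificate of the chain as a DER file and return the
# file names, e.g. "mailserver-0.der", "mailserver-1.der", ...
def save_chain(chain, prefix)
  chain.each_with_index.map do |cert, index|
    name = "#{prefix}-#{index}.der"
    File.binwrite(name, cert.to_der)
    name
  end
end

# Usage (hypothetical host, needs network access):
#   save_chain(fetch_chain('mail.example.org', 993), 'mailserver')
```

Servers that only upgrade to TLS via STARTTLS would need the plain-text part of the protocol handled first, which this sketch does not attempt.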

Wednesday, August 06, 2008

Static MVC

Update Feb 4, 2009: I recently learned about Jekyll, which puts these ideas into a solid framework.

I know that web applications and content management systems are the hot thing right now (and maybe will be for some time), but sometimes you wonder whether a few static html pages wouldn't be sufficient anyway. The advantages are pretty clear: No database back-end, no tweaking of apache configurations, and of course, nothing beats static html pages in terms of speed and memory requirements.

On the other hand, editing html pages by hand seems so "old-school", and there are some real maintenance issues. Appearance can be nicely controlled by CSS, but if you actually want to move bits of information around, you have to do it yourself. And let's just hope that you haven't gotten into adding some nice JavaScript effects to your pages...

This approximately describes the situation I was in when I decided to restructure my own homepage.

It turns out that it is actually possible to write a page generator along the principles used in web frameworks such as rails and end up with something quite elegant. And it's fun, too!

Some of the power of a framework like rails comes from a clear separation between data, business logic, and presentation. This kind of separation is also called the MVC-Pattern (Model-View-Controller). All the data is typically stored in a database like MySQL, and the presentation is encoded in template files.

So how can we duplicate this kind of separation with a static page-generator? Luckily, ruby comes with all the tools you need. Let us walk through a little example, the inevitable blog (without commenting functionality, of course).

Let's start with the database. We will just store all the information into YAML files. YAML is a file format which puts special emphasis on being easy to read and maintain by a human. The format is pretty self-explanatory:
- date: August 4, 2008
  title: Same old, same old
  text: |
    Sometimes, you just wake up with that feeling that something
    great is happening today. I couldn't really lay my finger on
    it, but I was very sure that...

- date: August 6, 2008
  title: You wouldn't believe
  text: |
    You know when I said that something great would happen two
    days ago? And you know what...

This file contains two entries, encoded as hashes. The "|" indicates indented multi-line data.

Using the YAML library, you can easily load this data with
require 'yaml'

blog = YAML::load_file('blog.yaml')

And if you inspect blog you will find that it is an array of two hashes.
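
To look at that structure without creating a file first, the same two entries can be parsed from a string. This is only a small illustration: the variable name source is arbitrary and the text fields are shortened.

```ruby
require 'yaml'

# The two blog entries from above, inlined as a string (texts shortened).
source = <<~DOC
  - date: August 4, 2008
    title: Same old, same old
    text: |
      Sometimes, you just wake up with that feeling that something
      great is happening today.

  - date: August 6, 2008
    title: You wouldn't believe
    text: |
      You know when I said that something great would happen two
      days ago?
DOC

blog = YAML.load(source)

puts blog.class            # Array
puts blog.length           # 2
puts blog.first['title']   # Same old, same old
puts blog.last['text']     # the multi-line text, line breaks preserved
```

Each entry is a plain hash with string keys, which is why the template can simply write b['title'].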

Next, we need to render the result. We'll use the erb library, which you might know from rails. Here is a possible template file:
<html>
<head>
<title>My little static blog</title>
</head>
<body>
<% for b in blog %>
<h1><%= b['title'] %></h1>

<p><em><%= b['date'] %></em></p>

<p><%= b['text'] %></p>
<% end %>
</body>
</html>
(Almost looks like we're coding in rails, right? :))

Let's finally put all things together. We store the YAML file in blog.yaml and the template file in blog.rhtml. Then, the following script will generate blog.html:
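require 'yaml'
require 'erb'

# Load the posts; the template refers to this local variable as "blog".
blog = YAML.load_file('blog.yaml')

# Render the template with the current binding so that it can see "blog".
blog_template = File.read('blog.rhtml')
html = ERB.new(blog_template).result(binding)

File.open('blog.html', 'w') { |o| o.write html }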

And here is the result: (screenshot of the generated blog.html)

This way, you can easily add new posts by editing just the YAML file, and fiddle with the presentation in the template file. You're not even restricted to a fixed number of pages: the script could, for example, also generate a separate static page for each individual post.
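
Generating one page per post could look roughly like the following sketch. The per-post template POST_TEMPLATE, the helpers slug and render_posts, and the file naming scheme are assumptions made for this example, and the posts are given inline instead of being loaded from blog.yaml.

```ruby
require 'erb'

# Hypothetical template for a single post; "post" is one hash from the blog array.
POST_TEMPLATE = <<~HTML
  <html>
  <head>
  <title><%= ERB::Util.html_escape(post['title']) %></title>
  </head>
  <body>
  <h1><%= ERB::Util.html_escape(post['title']) %></h1>
  <p><em><%= ERB::Util.html_escape(post['date']) %></em></p>
  <p><%= ERB::Util.html_escape(post['text']) %></p>
  </body>
  </html>
HTML

# Turn a title into a file-name friendly string.
def slug(title)
  title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Render every post on its own and return a hash of file name => html.
def render_posts(blog, template)
  erb = ERB.new(template)
  blog.each_with_object({}) do |post, pages|
    # The block's binding makes the local variable "post" visible to the template.
    pages["#{slug(post['title'])}.html"] = erb.result(binding)
  end
end

blog = [
  { 'date' => 'August 4, 2008', 'title' => 'Same old, same old',
    'text' => 'Sometimes, you just wake up with that feeling...' },
  { 'date' => 'August 6, 2008', 'title' => "You wouldn't believe",
    'text' => 'You know when I said that something great would happen...' }
]

pages = render_posts(blog, POST_TEMPLATE)
pages.each_key { |name| puts name }
# To write the files: pages.each { |name, html| File.write(name, html) }
```

Unlike the template above, this one escapes the values with ERB::Util.html_escape, a precaution so that a stray "<" or "&" in a post does not break the page.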

And it doesn't stop here: by adding a lightweight markup engine like BlueCloth, you can even start using wiki-style markup in your texts.

So, it doesn't always have to be a full-fledged web application framework. You can have similar flexibility (minus the interactivity, of course) using a few readily available tools and very few lines of code.

Monday, August 04, 2008

Quo vadis?

Actually, I've been thinking a bit about what to do with this blog. So far, I've even kept it sort of semi-private by not linking from my work homepage to it, maybe because I didn't want to have to worry if it is official enough, or if it reflects my scientific interest well enough. Consequently (and maybe because I don't have anything interesting to say), almost nobody is reading my blog and even friends are sometimes surprised that I have a blog.

I've thought about what I could do to make my blog more relevant and also more interesting, you know, concentrate on a few topics to give people a better idea of what to expect, and maybe also to give them a reason to come back and actually follow the blog.

Maybe the most interesting revelation was that the topics I cover in my blog are quite different from my scientific work, which is often quite heavy on the theory, and less concerned with the things I apparently like to blog about: programming languages, nice tools, and the occasional insight into some technology twist.

So I guess I'll try to accept these different aspects of my interests, and refrain from attempts to streamline my web presence.

This may be somewhat unrelated, but I also chose to rename the blog from "Trunk of Heroes" (whatever that was supposed to mean) to "Marginally Interesting", which is such a nice long phrase and also conveys very little information. At least, now I can say funny things like "My blog is Marginally Interesting" :)

Anyhow, the semester is over, which means that I'll have more time for doing some research and - of course - learning some exciting new piece of technology.

JRuby 1.1.3 and Jython 2.5

Just for the record, jruby 1.1.3 has been released. Startup time is again down a bit, but not overly much so. All in all, I think they are doing a terrific job. On a related note, the jython project has also released an alpha version which is going to be compatible with python 2.5. The last bigger release of jython is already a bit old and was compatible only with python 2.2.

On the other hand, I find it harder and harder to choose between the two languages. Somehow, they seem to fill almost the same spot, and it is only a question of community whether you're more into ruby or python. In any case, if you're looking for some nice integration with java, you're going to have both alternatives soon, which is a good thing, I guess.