Thursday, December 21, 2006
Hi there! After visiting some project partners in Finland and attending this year's NIPS conference, I'm finally back.
I'd like to finish off this year (or at least the pre-Christmas period) with pointers to two pages which are not all that new, but still interesting:
Clay Shirky has some nice articles on the semantic web and the futility of building an ontology which fits all applications. As I said, these articles are not new, but they contain many nice examples illustrating why such a top-down approach might not be a good idea.
Then, being a scientist myself and therefore depending on getting my work published, I find the existing peer review process is always a topic worth discussing. It seems that the journal Nature has recently picked up on that topic.
Merry Christmas!
Friday, November 17, 2006
Christmas Wish List - Part I
Dear Santa,
please convince the powers that be to build a laptop which has hardware-supported suspend-to-disk, only that it doesn't use the hard disk but dedicated Flash-RAM. (C'mon, how much could that cost?)
And by the way, write an OS which does not behave as if it had actually been woken up from a deep sleep phase. ("Wha...? No, I'm fine... No, really... Sorry, what did you say?")
-Mikio
PS: Okay, ditch that... I fear Flash-RAM is just too slow... :(
Tuesday, November 14, 2006
Fast Matrix-matrix multiplication: Kazushige Goto
If you have ever dealt with anything related to linear algebra, you most probably know that matrix-matrix multiplication is the most basic operation of them all. Matrix-vector multiplication is just a special case, and almost everything else, from solving linear equations to computing eigenvalues and eigenvectors, depends on matrix-matrix multiplication as its basic building block.
Fortunately, matrix-matrix multiplication is quite easy to implement. In C, it's really just three for loops:
/* computes a += b * c, where b is n x l, c is l x m, and a is n x m */
void mmmult(double **a, double **b, double **c,
            int n, int m, int l) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < l; k++)
                a[i][j] += b[i][k] * c[k][j];
}
(If we omit the zeroing of the result matrix.)
Of course, the above code sucks: I haven't even used pointer arithmetic, nor declared any variables as register. I mean, we are talking about C here, not one of those shiny new programming languages which want to get between us and the machine!
Well, the truth is that all of your die-hard C optimizing won't help in any way: compared to implementations such as ATLAS, this code is slower by a factor of 10 or more.
The reason is that the above code is very suboptimal (to put it nicely) in terms of locality. If the matrices are too large for the L1 cache, or even the L2 cache, you constantly have to fetch your data from the (still) slow main memory.
So what any decent matrix-matrix multiplication implementation does is subdivide the matrices into blocks which fit into the caches and then handle these sub-matrices one after the other. The good news is that this is absolutely no problem for matrix-matrix multiplication: if you look at the formula, you just have to accumulate certain products to compute one entry of the result matrix, and the order in which you enumerate these products is completely up to you.
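Just to make the blocking idea concrete, here is a minimal sketch of a blocked version. This is not ATLAS's actual kernel: the name mmmult_blocked, the flat row-major array layout, and the block size BS are made up for illustration, and BS would have to be tuned so that three BS x BS blocks fit into your L1 cache.

#define BS 64  /* hypothetical block size, to be tuned to the L1 cache */

static int min(int a, int b) { return a < b ? a : b; }

/* computes a += b * c for n x n matrices stored row-major in flat arrays */
void mmmult_blocked(double *a, const double *b, const double *c, int n) {
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                /* multiply one pair of blocks; the data touched here fits in cache */
                for (int i = ii; i < min(ii + BS, n); i++)
                    for (int k = kk; k < min(kk + BS, n); k++) {
                        double bik = b[i * n + k];
                        for (int j = jj; j < min(jj + BS, n); j++)
                            a[i * n + j] += bik * c[k * n + j];
                    }
}

Every (i, j, k) triple is still visited exactly once, so the result is the same as with the naive loops; only the order of the additions changes.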
This is what ATLAS and every other serious linear algebra library do, and the results are quite impressive. They manage to keep the pipeline of the processor so full that they come close to the theoretical limit, which amounts to 1.5 to 2 double-precision floating point operations per clock cycle (!). The "AT" in ATLAS stands for "Automatically Tuned", which means that ATLAS experimentally estimates the size of the caches, and also the optimal ordering of the floating point operations.
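By the way, if you just want that speed rather than writing the kernel yourself, you can simply call the dgemm routine of a tuned BLAS library. A minimal sketch using the standard CBLAS interface (the wrapper name mmmult_blas is made up; link against ATLAS, GotoBLAS, or whatever BLAS you have installed):

#include <cblas.h>

/* computes a = b * c + a for row-major n x n matrices,
   i.e. the same update as the hand-written loops above */
void mmmult_blas(double *a, const double *b, const double *c, int n) {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, b, n,
                c, n,
                1.0, a, n);
}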
But caring for the L1 and L2 caches is not enough. Kazushige Goto of the University of Texas at Austin has written the currently fastest implementation of the basic linear algebra routines, and he has optimized them by hand.
He says that optimizing for the caches is nice, but you really have to take care of the Translation Lookaside Buffer (TLB). In case your course on processor technology was some time ago: the TLB is the cache for the page table entries needed to resolve virtual memory addresses.
Now that's what I call commitment ;)
Monday, October 30, 2006
Ctrl -> CapsLock
I have to admit it, I'm also one of those freaks who moved their Ctrl key! You see, the problem is that I used to operate the left control key with my thumb. I could show you photos of this, but trust me, you don't want to see those...
Usually, this is no problem at all (meaning I take only negligible physical damage), but every once in a while I have to type a lot (like when I was finishing my Ph.D. thesis), and then the situation just gets worse.
But there is a solution! Just move your control key up to the caps-lock key. This is cool for several reasons:
- You can hit the (new) control key with your little finger - with even less strain than the original left control key.
- On laptops, the control key is often moved to different positions, while the caps-lock key isn't.
- It makes me think of the good ol' C=64. (Although for that, I would have to move the control key up to the tab key...)
- You won't be able to use other computers any more.
The only question is how to do it. Well, fortunately, it is rather simple:
- Gnome/KDE: Just go to the keyboard preferences, xkb options. There you can turn your caps-lock key into an additional control key.
- Plain X (Xorg): setxkbmap -option "ctrl:nocaps".
- Win XP: Open up regedit. Then add to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout a binary value named Scancode Map and set it to
  00 00 00 00
  00 00 00 00
  02 00 00 00
  1d 00 3a 00
  00 00 00 00
Don't forget to reboot! Believe me, it's magic and it works!
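(If you would rather script the Windows tweak than click around in regedit, here is a minimal sketch in C using the Win32 registry API. It is just an illustration, not part of the original recipe: it writes the same Scancode Map value as above, has to be run with administrator rights, linked against advapi32, and still needs the reboot afterwards.)

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* same 20-byte Scancode Map as above: version, flags, entry count
       (02 = one remapping plus the null terminator), then the remapping
       that makes CapsLock (0x3a) produce LeftCtrl (0x1d), then the terminator */
    BYTE map[] = {
        0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00,
        0x02, 0x00, 0x00, 0x00,
        0x1d, 0x00, 0x3a, 0x00,
        0x00, 0x00, 0x00, 0x00
    };
    HKEY key;
    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                      "SYSTEM\\CurrentControlSet\\Control\\Keyboard Layout",
                      0, KEY_SET_VALUE, &key) != ERROR_SUCCESS) {
        fprintf(stderr, "could not open registry key\n");
        return 1;
    }
    if (RegSetValueExA(key, "Scancode Map", 0, REG_BINARY,
                       map, sizeof(map)) != ERROR_SUCCESS) {
        fprintf(stderr, "could not write Scancode Map\n");
        RegCloseKey(key);
        return 1;
    }
    RegCloseKey(key);
    printf("Scancode Map written; reboot for it to take effect.\n");
    return 0;
}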
Sunday, October 29, 2006
Shell Tool of the week: rlwrap
This week's "shell tool of the week" (and as such the first of its kind) is rlwrap. rlwrap calls another interactive program but channels all of its input through the readline library (the notorious piece of software which refuses to become LGPL, for a reason), giving you extensive command-line editing capabilities even for programs which just read their input via gets (for example, ftp).
If it hadn't been done already, I would have done it!
Addendum: I just found out that rlwrap actually stores a separate history for each program it is used with!