Wednesday, July 08, 2009

Threads, Exceptions, Net::HTTP, Timeouts and JRuby

I recently had some fun debugging a little JRuby application of mine that crawls the Twitter search API looking for specific tweets. The problem was that, every now and again, the crawler would hang even though I had set appropriate timeouts. The crawler consisted of two threads: one periodically issuing a search to Twitter, and another writing the results into the database.

The first surprise to me was that in Ruby, by default, an exception in a thread makes the thread die silently, without even printing an error message. Only when you join the thread do you get the error. Alternatively, you can set Thread.abort_on_exception = true, but that kills your whole application. This meant that what appeared to be missed timeouts could just as well be uncaught exceptions.

So when working with multithreaded applications, enclosing the whole thread body in a begin ... rescue Exception => e ... end is important if you want to be notified of errors at all.
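A minimal sketch of that pattern, assuming you simply want the error written to a log: guarded_thread is a made-up helper name, not something from my crawler, and the log defaults to standard error.

```ruby
# Hypothetical helper (not part of the original crawler): run a block in
# a thread and log any exception instead of letting the thread die unnoticed.
def guarded_thread(name, log = $stderr)
  Thread.new do
    begin
      yield
    rescue Exception => e
      log.puts "[#{name}] #{e.class}: #{e.message}"
    end
  end
end

# With the guard, the failure is logged as soon as it happens.
guarded_thread("crawler") { raise IOError, "search failed" }.join

# Without the guard, the exception only reaches the caller on join.
worker = Thread.new { raise ArgumentError, "lost until join" }
begin
  worker.join
rescue ArgumentError => e
  puts "join re-raised: #{e.message}"
end
```

Rescuing Exception is deliberately broad here; a real application would probably want to let things like Interrupt through.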

But still, the thread would mysteriously die without properly handling the timeout. Some deeper digging revealed an old bug report and an interesting article about Ruby and timeouts in general, which seemed to imply that timeouts might not always work, in particular if system calls are involved. It was unclear, though, whether the situation is the same for JRuby, which uses native threads instead of green threads (threads simulated by explicit time-slicing in the interpreter).

So I started to read the JRuby sources to understand where and how timeouts are implemented in Net::HTTP, which led to my next surprise: Net::HTTP handles timeouts entirely through the Timeout module, not at the socket level. It does not set timeouts on the reads or writes themselves but wraps all the significant portions of code in Timeout::timeout { ... } calls. I also found a nice old post by Charles Nutter (a.k.a. headius) explaining the implementation in depth.
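For reference, these are the timeout settings that Net::HTTP exposes. This is only a sketch: http_with_timeouts is a made-up helper, the URL and the values are arbitrary, and nothing is actually connected or requested here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build an unconnected Net::HTTP object with both
# of its timeout settings filled in.
def http_with_timeouts(url, open_secs, read_secs)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = open_secs  # limit for opening the connection
  http.read_timeout = read_secs  # limit for each read from the socket
  http
end

http = http_with_timeouts('http://localhost:9090/index.html', 2, 5)
puts "open=#{http.open_timeout}s read=#{http.read_timeout}s"
```

How these values are enforced internally depends on the Ruby or JRuby version you run; the description above is what I found in the sources I was reading.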

I guess that, based on that post, JRuby has since reimplemented Timeout in Java (source). Some first tests revealed that this timeout plays well with Net::HTTP, but my crawler was still hanging every now and again.

Finally, I found the last missing piece of information: from the sources, it seems that JRuby's implementation of the Timeout module raises Timeout::ExitException when the timeout fires. However, that is a Ruby 1.9 feature; in Ruby 1.8, the exception was named Timeout::Error. So basically, I was catching the wrong exception, which means not catching it at all, thereby (silently) killing the thread. Interestingly, some more testing showed that JRuby raises Timeout::Error if you use the Timeout module yourself, but raises Timeout::ExitException when you're using Net::HTTP.

In the end, I just enclosed the whole HTTP request section in a Timeout::timeout of my own, catching both Timeout::Error and Timeout::ExitException, and finally everything was running robustly.
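Roughly, the workaround looks like the following sketch. The names timeout_errors and with_timeout are made up for illustration, and since Timeout::ExitException is not defined in every Ruby version, the code only rescues it when it exists (that check is my assumption about how to keep the sketch portable).

```ruby
require 'timeout'

# Collect the timeout exception classes this interpreter actually defines.
# Timeout::ExitException may be missing, so it is only added if present.
def timeout_errors
  errors = [Timeout::Error]
  errors << Timeout::ExitException if defined?(Timeout::ExitException)
  errors
end

# Hypothetical helper: run the block under a timeout of our own and
# return nil, instead of raising, when the timeout fires.
def with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue *timeout_errors
  nil
end

# The sleep stands in for the HTTP request section of the crawler.
result = with_timeout(0.1) { sleep 2; :finished }
puts result.nil? ? "timed out" : "got #{result}"
```

In the crawler, the block would hold the Net::HTTP calls, and the nil return would simply mean "skip this round and try again later".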

So in summary

  • Uncaught exceptions silently kill threads in Ruby.
  • Net::HTTP does its own timeouts through the Timeout module.
  • Sometimes, JRuby raises Timeout::ExitException, not Timeout::Error. This will be fixed in JRuby 1.4 (see below).


I guess I should post a bug report on the latter...

Update: I submitted a bug report, and it's already fixed! Those JRuby people are really incredibly fast!

2 comments:

Anonymous said...

Hey there - would you mind posting a sample HTTP session, including the timeout code? I've been playing with JRuby 1.3 and haven't been able to get timeouts to work as expected.

I've been trying to do something like this, to no avail:

# First create a generic socket server listening on port 9090 by running
# the following in a terminal window:
# $ nc -l 9090


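require 'uri'
require 'net/http'
require 'timeout'

url = URI.parse('http://localhost:9090/index.html')
req = Net::HTTP::Get.new(url.path)

begin
  # Take the response from the block's return value so that res is
  # still in scope after the Timeout block ends.
  res = Timeout.timeout(2) do
    Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
  end
  puts res.body
rescue Exception => e
  puts "Got #{e.class}: #{e}"
end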

Mikio Braun said...

Your code seems fine, although on my computer (a Debian Linux box) it doesn't work with the "nc" command (I always get a "connection refused").

But if you try it, e.g., against Google with a very short timeout, you get a timeout just as you should:

require 'uri'
require 'net/http'
require 'timeout'

url = URI.parse('http://www.google.com/index.html')
req = Net::HTTP::Get.new(url.path)

begin
  # Take the response from the block's return value so that res is
  # still in scope after the Timeout block ends.
  res = Timeout.timeout(0.01) do
    Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
  end
  puts res.body
rescue Exception => e
  puts "Got #{e.class}: #{e}"
end