Threading in Ruby has been busted for a long time. Back in 1.8.7, I actually got segfaults in threaded code. Thankfully, those went away in 1.9, but now I’ve come across a new problem that is almost as bad, except that I can actually do something about it.
The problem happens when a thread locks a Mutex (or Monitor) inside a timeout block and the timeout expires before the thread can unlock it.
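require "thread"
require "timeout"

N = 10   # number of threads
M = 10   # lock attempts per thread
T = 0.1  # timeout in seconds

lock = Mutex.new

# Take the lock and sleep forever, so the timeout fires either while the
# lock is held or while a thread is still waiting for it.
func = Proc.new do
  begin
    Timeout.timeout(T){ lock.synchronize{ sleep } }
  rescue Timeout::Error
    nil
  end
end

threads = N.times.collect{ Thread.new{ M.times{ func.call } } }
threads.each{ |thread| thread.join }
puts "no deadlocks, yay!"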
The bug doesn’t happen very often; you may have to run that snippet a few times before you see it. But my threaded asynchronous task processing daemon has high enough throughput in our production environment that we were seeing the problem very frequently.
It wouldn’t be so bad if the mutex could get out of its bad state; what’s a failed task here and there out of thousands? Guess it depends on the tasks… ;) Unfortunately, once the mutex is in the bad state, all subsequent tasks run on that thread will fail.
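Judging by the error message the workaround checks for, the bad state looks like a thread that still owns the mutex after the timeout fired. That is my reading rather than a confirmed diagnosis, but the symptom is easy to reproduce on purpose. A minimal sketch, where the leftover lock is simulated with an explicit `lock` call:

```ruby
lock = Mutex.new

# Simulate the stuck state: this thread owns the lock and never released it.
lock.lock

# Any later attempt by the same thread to take the lock now fails.
error_message =
  begin
    lock.synchronize { :never_reached }
  rescue ThreadError => e
    e.message
  end

still_owned = lock.owned?

puts error_message  # the message the workaround matches on
puts still_owned    # the failed attempt did not release anything
```

Since nothing ever releases the lock, every later task on that thread hits the same error.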
There is hope though! This disgusting little hack completely alleviates the problem…
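require "thread"

class Mutex
  # Retry once if this thread is reported as already holding the lock.
  def lock_with_hack
    lock_without_hack
  rescue ThreadError => e
    if e.message != "deadlock; recursive locking"
      raise
    else
      # The thread still owns the mutex: release it, then lock again.
      unlock
      lock_without_hack
    end
  end

  alias_method :lock_without_hack, :lock
  alias_method :lock, :lock_with_hack
end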
Now before you go off and include that snippet in your code, be aware that it effectively turns Mutex into Monitor (i.e. allows recursive locking). In our use case (and I think most), that is fine though.
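If patching every Mutex in the process feels too broad, the same recovery logic can live in a subclass that you opt into. The sketch below is one way to do that: `ForgivingMutex` is a made-up name, not something Ruby ships. It calls `lock` and `unlock` directly, because `synchronize` may not dispatch through an overridden `lock` in every Ruby version. It also shows where the comparison with Monitor stops holding: a Monitor counts nested entries, while the hack does not.

```ruby
require "monitor"

# Hypothetical subclass: same recovery logic as the hack, but opt-in.
class ForgivingMutex < Mutex
  def lock
    super
  rescue ThreadError => e
    raise unless e.message == "deadlock; recursive locking"
    unlock  # drop the lock this thread still holds
    super   # then take it again
  end
end

forgiving = ForgivingMutex.new
forgiving.lock
forgiving.lock    # a plain Mutex would raise ThreadError here
forgiving.unlock  # nesting is not counted, so one unlock releases the lock
forgiving_locked = forgiving.locked?

monitor = Monitor.new
monitor.enter
monitor.enter
monitor.exit      # a Monitor counts entries, so one level is still held
monitor_locked = monitor.mon_locked?
monitor.exit

puts "ForgivingMutex still locked after one unlock? #{forgiving_locked}"
puts "Monitor still locked after one exit?          #{monitor_locked}"
```

So code that relies on properly nested locking is better served by a real Monitor; the hack only papers over the stale lock.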
Here is a little Ruby script you can run on the command line that demonstrates the bug and also shows how the hack fixes it. Run like so:
ruby deadlock.rb                    # demonstrates bug
ruby deadlock.rb --with-mutex-hack  # demonstrates fix
I’ve reported the bug on Ruby’s issue tracker, where it seems to be getting some attention.
