Ruby’s ThreadError: deadlock; recursive locking bug

Threading in Ruby has been busted for a long time. Back in 1.8.7, I actually got segfaults in threaded code. Thankfully, those went away in 1.9, but now I’ve come across a new problem that is almost as bad, except that this time I can actually do something about it.

The problem happens when a thread locks a Mutex (or Monitor) inside a timeout block and the timeout expires before the thread can unlock it.

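require "thread"
require "timeout"

N = 10  # number of threads
M = 10  # attempts per thread
T = 0.1 # timeout in seconds

lock = Mutex.new
func = Proc.new do
  begin
    # sleep never returns on its own, so every call ends with the timeout firing
    Timeout.timeout(T){ lock.synchronize{ sleep } }
  rescue Timeout::Error
    nil
  end
end

# N threads contend for the same lock, M times each
threads = N.times.collect{ Thread.new{ M.times{ func.call } } }
threads.each{ |thread| thread.join }

# only reached if no thread died with a ThreadError
puts "no deadlocks, yay!"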

The bug doesn’t happen very often; you may have to run that snippet a few times before you see it. But my threaded asynchronous task processing daemon has high enough throughput in our production environment that we were seeing the problem very frequently.

It wouldn’t be so bad if the mutex could get out of its bad state; what’s a failed task here and there out of thousands? Guess it depends on the tasks… ;) Unfortunately, once the mutex is in the bad state, all subsequent tasks run on that thread will fail.
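To make the "bad state" concrete, here is a minimal sketch that fakes it on purpose rather than waiting for the race: the thread takes the lock and never releases it, which is my assumption about what the mutex looks like after the bug hits. Every later attempt to synchronize on that thread then fails the same way.

```ruby
lock = Mutex.new

# Simulate the stuck state: this thread owns the lock and never unlocks it.
# (Assumption: this mirrors what the timeout race leaves behind.)
lock.lock

# Each "task" tries to take the lock again and fails with the same error.
failures = 3.times.map do
  begin
    lock.synchronize { :task_done }
  rescue ThreadError => e
    e.message
  end
end

puts failures.inspect
puts "still owned by this thread: #{lock.owned?}"
```

Nothing in the failing path ever releases the lock, which is why the thread never recovers by itself.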

There is hope though! This disgusting little hack completely alleviates the problem…

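require "thread"
class Mutex
  def lock_with_hack
    lock_without_hack
  rescue ThreadError => e
    # only handle the recursive locking error; re-raise anything else
    if e.message != "deadlock; recursive locking"
      raise
    else
      # this thread still owns the mutex, so release it and take it again
      unlock
      lock_without_hack
    end
  end
  # keep the original lock around, then route lock through the hack
  alias_method :lock_without_hack, :lock
  alias_method :lock, :lock_with_hack
end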

Now before you go off and include that snippet in your code, be aware that it effectively turns Mutex into Monitor (i.e. allows recursive locking). In our use case (and I think most), that is fine though.
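If the Mutex vs. Monitor difference isn’t familiar, this small sketch shows the stock behavior (without the hack loaded): a nested synchronize on a plain Mutex raises, while the same thing on a Monitor is allowed. The variable names are just for illustration.

```ruby
require "monitor"

mutex = Mutex.new
monitor = Monitor.new

# A plain Mutex refuses to be locked twice by the same thread.
mutex_result = begin
  mutex.synchronize { mutex.synchronize { :nested_ok } }
rescue ThreadError => e
  e.message
end

# A Monitor is reentrant, so the nested synchronize just works.
monitor_result = monitor.synchronize { monitor.synchronize { :nested_ok } }

puts "Mutex:   #{mutex_result}"
puts "Monitor: #{monitor_result}"
```

So if any of your code relies on that ThreadError to catch accidental recursive locking, the hack will hide it.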

Here is a little Ruby script you can run on the command line that demonstrates the bug and also shows how the hack fixes it. Run like so:

ruby deadlock.rb                   # demonstrates bug
ruby deadlock.rb --with-mutex-hack # demonstrates fix

I’ve reported the bug on Ruby’s issue tracker where it seems to be getting some attention.
