Sunday, July 20, 2008

eager loading + limit + order in named scope = ActiveRecord 2.1.0 bug

I found this bug while writing my pagination plugin that uses named scopes. It's caused by an oversight in construct_finder_sql_for_association_limiting and manifests when the following three conditions are met:

  • there is a has_many association being eager loaded
  • the query is being limited
  • there is order specified by a named scope

I fixed the bug by making modifications to construct_finder_sql_for_association_limiting. You can apply the fix by including the following code into your Rails project:

module ActiveRecord
  module Associations
    module ClassMethods

      def construct_finder_sql_for_association_limiting(options, join_dependency)
        scope       = scope(:find) || {}
        order       = [options[:order], scope[:order]].compact.join(', ')

        # Only join tables referenced in order or conditions since this is particularly slow on the pre-query.
        tables_from_conditions = conditions_tables(options)
        tables_from_order      = order_tables(options)
        all_tables             = tables_from_conditions + tables_from_order
        distinct_join_associations = all_tables.uniq.map{|table|
          join_dependency.joins_for_table_name(table)
        }.flatten.compact.uniq

        is_distinct = !options[:joins].blank? || include_eager_conditions?(options, tables_from_conditions) || include_eager_order?(options, tables_from_order)
        sql = "SELECT "
        if is_distinct
          sql << connection.distinct("#{connection.quote_table_name table_name}.#{primary_key}", order)
        else
          sql << primary_key
        end
        sql << " FROM #{connection.quote_table_name table_name} "

        if is_distinct
          sql << distinct_join_associations.collect(&:association_join).join
          add_joins!(sql, options, scope)
        end

        add_conditions!(sql, options[:conditions], scope)
        add_group!(sql, options[:group], scope)

        if order and is_distinct
          connection.add_order_by_for_association_limiting!(sql, :order => order)
        else
          add_order!(sql, options[:order], scope)
        end

        add_limit!(sql, options, scope)

        return sanitize_sql(sql)
      end
      
    end
  end
end

If you are curious about what exactly was changed, see this diff pastie.

Thursday, July 17, 2008

callback chains and metaclasses

For whatever reason (don't ask) our Rails app needed to be able to define callbacks on objects. As it stands now, you can only define callbacks on classes. Well, metaclasses are classes that pertain to a single specific object, so I figured I could add the callbacks there and everything will work.

Please excuse my extremely contrived example.

p = Post.find(...)
if p.contains_porn?
  class << p
    before_save :filter_content
  end
end
p.save # should trigger filter_content

Well it doesn't work.

After digging through the ActiveSupport source code, I saw that it only looks for callback chains in the object's class, not metaclass. That's not too hard to fix...

module ActiveSupport
  module Callbacks
    def run_callbacks(kind, options = {}, &block)
      callback_chain_method = "#{kind}_callback_chain"
      # Meta class inherits Class so we don't have to merge it in 1.9
      if RUBY_VERSION >= '1.9'
        metaclass.send(callback_chain_method).run(self, options, &block)
      else
        callbacks = self.class.send(callback_chain_method) | metaclass.send(callback_chain_method)
        callbacks.run(self, options, &block)
      end
    end
  end
end

Credit to Josh Peek for the Ruby 1.9 fix

Ok, so that got it working... with one major caveat: you cannot serialize objects that use callbacks. Ruby cannot serialize objects that have metaclasses and the run_callbacks method creates a metaclass whether you use it or not.

That's fine with me, I don't really like serializing objects anyways, but apparently it's a major problem for other people...

http://github.com/rails/rails/commit/e0846c8417093853f4f7f62732983e990c28d669
http://rails.lighthouseapp.com/projects/8994/tickets/575-callbacks-don-t-work-from-extended-modules

Josh Peek suggested a solution where we store the callback chains on the objects themselves, so object specific callbacks would be done like this...

p = Post.find(...)
p.before_save :filter_content if p.contains_porn?
p.save # should trigger filter_content

That seems like a very simple, clean and rational solution. I'll look into coding it up later.

Sunday, July 6, 2008

named scopes pwns will_paginate

will_paginate has its place. It's good for paginating when your queries are simple. My queries usually aren't simple, are yours? I noticed recently that one of my pages was taking a long time to display, around 1.9 seconds. I tracked it down to the will_paginate.

The problem is that will_paginate takes a single hash of ActiveRecord::Base#find arguments. That means if your query is really complex, then it's going to use that same complex query for both the result set and the count.

This is more or less what I was doing with will_paginate that it was choking on.

class Post < ActiveRecord::Base
  STATUS_OK = 1
  cattr_accessor :per_page
  per_page = 15
  has_many :watchers, ...
end

class PostsController < ApplicationController
  
  def index
    page_no = (params[:page] && params[:page].to_i) || 1
    @posts = Post.paginate :conditions => ["posts.status_id = ?", Post::STATUS_OK],
                           :order => 'group_id DESC, checksum', # don't ask, it's complicated
                           :include => :watchers,
                           :page => page_no
  end
  
end

The counting sql it makes out of that query does an outer join on watchers and selects distinct posts.id. Ouch. So the obvious solution is to simply do the count and result set queries ourselves. No biggie, but let's use named scopes to pretty things up.

class Post < ActiveRecord::Base
  STATUS_OK = 1
  cattr_accessor :per_page
  per_page = 15
  has_many :watchers, ...
  
  named_scope :ok, :conditions => {"posts.status_id" => Post::STATUS_OK}
  named_scope :recent, :order => "group_id DESC, checksum"
  named_scope :paginate, lambda { |page_no| {:offset => {(page_no-1)*per_page}, :limit => per_page} }
end

class PostsController < ApplicationController
  
  def index
    page_no = (params[:page] && params[:page].to_i) || 1
    @posts = Post.ok.recent.paginate(page_no).all(:include => :watchers)
    @post_count = Post.ok.count
  end
  
end

Pretty slick... and since we're doing the pagination "explicitly", we know that the counting is as simple as it needs to be. Let's take a look at the benchmarks, before and after.

# The SQL calls generated by will_paginate
Post Load (0.341523)
SELECT * FROM "posts" WHERE (post.status_id = 1) ORDER BY group_id DESC, checksum LIMIT 15 OFFSET 0
Watcher Load (0.005799)
SELECT "watchers".* FROM "watchers" WHERE ("watchers".page_id IN (...)) ORDER BY id
SQL (1.335042)
SELECT count(DISTINCT "posts".id) AS count_all FROM "posts" LEFT OUTER JOIN "watchers" ON watchers.post_id = posts.id WHERE (posts.status_id = 1)

# The SQL calls generated when we "manually paginate".
Post Load (0.350336)
SELECT * FROM "posts" WHERE ("posts"."status_id" = 1) ORDER BY group_id DESC, checksum LIMIT 15 OFFSET 0
Watcher Load (0.004582)
SELECT "watchers".* FROM "watchers" WHERE ("watchers".post_id IN (...)) ORDER BY id
SQL (0.023346)
SELECT count(*) AS count_all FROM "posts" WHERE ("posts"."status_id" = 1)

We save roughly 1.3 seconds by intelligently doing the counting. I really don't think using will_paginate buys you any cleaner code over using named scopes. Now all you have to do is implement a method similar to the will_paginate view helper and you can remove will_paginate from your project.