ActiveRecord Callbacks and the Single Responsibility Principle

Persistence models aren't domain models

I've often toyed with the idea of writing a post titled "ActiveRecord callbacks considered harmful", but always catch myself when I consider that the ActiveRecord default that sets the created_at and updated_at fields on a row before saving it is incredibly useful. It fascinates me that the same tool that allows us to intuitively trust that our timestamps are correct can also cause some of the most frustrating and time wasting debugging sessions a Ruby developer will ever encounter.

Part of the problem with demonstrating why this can be a frustration is that it requires a non-trivial example. A trivial example with one or two callbacks typically doesn't truly highlight the complexity introduced by callbacks that reach outside the scope of the observed object.

Martin Fowler describes an Active Record as "An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data."

The definition of the pattern directly violates the Single Responsibility Principle. A single object combines database access and domain logic in the same place, usually without any inversion of control. The pattern works very well when you have a simple domain (as we see in the 15 minute blog exercise) but leaves many important questions unanswered. For example, how should we deal with the entire set of rows in a table? What about a specific subset such as 'activated users'? What about projections where we only deal with reading, but not writing data? Ignore these questions at your own peril when you're trying to express a complex problem domain in terms of a relational database.

Fundamentally we have at least two distinct responsibilities here. The responsibility of managing persistence of values to a data store and the responsibility of the domain logic that applies to the data held by the object are completely orthogonal to each other - even if that domain logic results in database reads or writes. Herein lies my gripe with callbacks in Ruby's ActiveRecord library: they provide a convenient place to do domain-logic processing at the point that data persistence is happening. Good object oriented design principles suggest that these responsibilities should be separated.

So why does the built-in hook that sets the created_at and updated_at fields on a record on save work so well? I see those columns as stores for metadata specifically related to persistence. The only time these columns should be set is when creating or updating a record in the database and therefore the callback is the perfect place for the logic to sit that avoids duplication.

Let's consider something a bit more concrete where callbacks seem to be an appropriate solution. In this blog's Rails code I use a before_save callback to generate the slug for a post and to precompile the HTML of the post's body from Markdown.

class Post < ActiveRecord::Base
  before_save :compile_markdown
  before_save :generate_slug, if: -> { slug.blank? }

  # ...

  private
  def compile_markdown
    compiler = MarkdownWithPygmentsCompiler.new
    self.processed_body = compiler.compile(body)
  end

  def generate_slug
    self.slug = SlugGenerator.new.generate(title)
  end
end

This allows me to write my blog posts in Markdown and have them cached as HTML in the database every time I save the record. Likewise I don't need to set a specific slug on a post when I create it, as it will automatically be set based on the title if one hasn't already been set. This is a convenient way of handling this logic but it has a few quirks I'd like to explore.

Firstly I can't change the compiled HTML body of a post as any change will always be overwritten by the compile_markdown call. I have a tight coupling to Markdown that can't easily be changed to something like Textile without causing compilation errors when I update an old post (even if I don't change the body at all!).

Secondly I could never create a post without a slug (perhaps to fall back on a database ID), or use a different algorithm to generate the slug without either overriding the generate_slug private method or creating the Post and setting the slug from outside of the object instance.

These two cases may seem trivial and in my case have worked fairly well, but tradeoffs must be considered. The full Post model is barely 30 lines long, neither of the callbacks have a side-effect that can affect the other callback. When a callback only looks at the object being saved without any reference to global state or values held by related objects, it seems to be manageable.

CQS suggests that we should separate methods that change the state of a system from methods that observe a system and return a result. Using these before_save callbacks has introduced a side-effect to calling Post#save that modifies the internal state of the object in place and isn't immediately obvious in the context of the caller.

A more maintainable alternative

While I'm happy to live with these callbacks for now, I think this could be better factored if we separate the domain logic from the persistence logic. The Post ActiveRecord model should ideally be devoid of any domain logic.

Ivar Jacobsen in his book Object Oriented Software Engineering: A Use Case Driven Approach suggests that we can model all user interactions with the system in terms of Use Cases. I've come to call them interactions or interactors. An interaction will accept input in the form of a form object, handle validations and transformations, and then use simple persistence models to make changes to the database.

Here's how I'd restructure the creation of a post from Markdown (using the awesome Virtus gem to handle form objects).

class Forms::CreatePost
  include Virtus.model
  include ActiveModel::Validations

  values do
    attribute :title
    attribute :body
    attribute :slug
  end

  validate :title, :html_body, presence: true
end
class Interactions::CreatePost
  def initialize(form)
    @form = form
  end

  def run
    raise(InvalidFormError, form) if !form.valid?
    post = Post.new(form.merge(html_body: form.body))
    yield post if block_given?
    post.save!
  end

  private
  attr_reader :form
end

One way of compiling from Markdown could be to inherit this Interaction and pass a block to the run method on Interactions::CreatePost

class Interactions::CreatePostFromMarkdown < Interactions::CreatePost
  def run
    super do |post|
      post.html_body = compile_markdown(form.body)
    end
  end

  private
  def compile_markdown(markdown)
    compiler = MarkdownWithPygmentsCompiler.new
    compiler.compile(markdown)
  end
end

There's definitely more code involved in doing it this way, but there are several advantages too. The domain logic is now separate from the ActiveRecord model, which can now be reduced to two lines of code:

class Post < ActiveRecord::Base

end

If I wanted to use Textile instead of Markdown I could simply implement a new Interaction and call into that for new posts. Both Markdown and Textile logic could safely coexist in the same code base without interfering with each other.

In terms of object design we've moved the responsibility of Markdown compilation out of the Post class and given it a name in our domain. This has enabled us to extend the post creation to support other forms of input such as Textile without modifying the Post class. We can easily add support for as many formats as we'd like to now and the code that consumes Post objects will require no modification to support them.

Don't over-engineer a solution

There's a good reason the golden path in Rails is called the golden path. If you're going to stray from it you need a good reason for it. I haven't implemented things this way in my blog because it would have been completely over-engineered. But when trying to model a more complex domain than a simple blog post, where your presentation might not exclusively be on the web and creating an entity might involve more than just creating one row in a database (for example opening a new account with default settings), then keeping your persistence models as light as possible has clear advantages to me by minimising visible side-effects and following the SOLID principles of OO design more closely.