Enumerators in Ruby 1.9

18 August 2013

Lazy evaluation of enumerables is one of the most exciting new features in Ruby 2.0’s core library. Making an enumeration pipeline yield item by item, instead of building a full intermediate array at every step, is as easy as calling lazy at the start of the enumeration chain.
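Here is a minimal sketch of the Ruby 2.0 behaviour. It needs Ruby 2.0 or later, and the trace array is an illustrative addition that only records the order in which each step runs. Because lazy is called first, each number goes through map and select before the next one is generated. first(2) then stops the otherwise infinite range once two results have passed the filter:

```ruby
# Requires Ruby 2.0+, where Enumerable#lazy returns an Enumerator::Lazy.
# trace is illustrative only: it records the order in which each step runs.
trace = []

first_two = (1..Float::INFINITY).lazy
  .map    { |n| trace << "map #{n}";    n * 10 }
  .select { |n| trace << "select #{n}"; n > 10 }
  .first(2)

p first_two # => [20, 30]
p trace     # => ["map 1", "select 10", "map 2", "select 20", "map 3", "select 30"]
```

Without lazy, map would try to build an array from the infinite range and never return.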

This type of lazy evaluation is the standard when working with IEnumerable<T> in the .NET space. It lets you build a pipeline that projects from one data structure into another without evaluating an entire collection of objects at each step. This is especially useful for ETL tasks, because you can work with one entry at a time instead of projecting an array of all entries at every step of the process. The savings are considerable when you read hundreds of thousands of entries in at one end of the pipeline, apply a few map/reduce transformations and save the results at the other end, because only the entry currently in flight needs to be held in memory.
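To make the ETL case concrete, here is a small sketch using Ruby 2.0's lazy. It assumes a hypothetical Row struct and a "name,age" line format. A StringIO and an array stand in for a real input file and output store:

```ruby
require 'stringio'

# Requires Ruby 2.0+ for Enumerable#lazy.
# Row and the "name,age" input format are hypothetical, for illustration only.
Row = Struct.new(:name, :age)

# Stand-ins: a real job would read lines from a File and write to a
# database or output file instead of these in-memory objects.
source = StringIO.new("alice,30\nbob,17\ncarol,45\n")
sink   = []

source.each_line.lazy
  .map    { |line| name, age = line.chomp.split(','); Row.new(name, age.to_i) }
  .select { |row| row.age >= 18 }
  .each   { |row| sink << row } # each surviving row is saved before the next line is read

p sink.map(&:name) # => ["alice", "carol"]
```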

While Ruby 2.0’s Enumerator::Lazy really brings Ruby up to that level of efficiency, you can get similar behaviour in Ruby 1.9 with the Enumerator class.

Consider this example:

puts RUBY_VERSION

en = Enumerator.new do |e|
  puts "yielding a"
  e.yield 'a'
  puts "yielding b"
  e.yield 'b'
  puts "yielding c"
  e.yield 'c'
end

en.each do |e|
  puts "received #{e}"
end
# >> 1.9.3
# >> yielding a
# >> received a
# >> yielding b
# >> received b
# >> yielding c
# >> received c

Yielding from the Enumerator hands execution to the consuming code for each entry, whereas if you project the enumerator into an array first, you get a different execution order:

puts RUBY_VERSION

en = Enumerator.new do |e|
  puts "yielding a"
  e.yield 'a'
  puts "yielding b"
  e.yield 'b'
  puts "yielding c"
  e.yield 'c'
end

en.to_a.each do |e|
  puts "received #{e}"
end
# >> 1.9.3
# >> yielding a
# >> yielding b
# >> yielding c
# >> received a
# >> received b
# >> received c

It’s a subtle difference, but it yields extreme power (pun intended ;)).
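The same Enumerator.new technique can be stacked to chain transformations in Ruby 1.9 without building intermediate arrays. The sketch below assumes two hypothetical helpers, lazy_map and lazy_select, which are not part of Ruby's API. Each one wraps its source in a new Enumerator. This illustrates the idea and is not a replacement for Ruby 2.0's Enumerator::Lazy:

```ruby
# lazy_map and lazy_select are hypothetical helpers, not part of Ruby's API.
# Each wraps its source in a new Enumerator that pulls a single item,
# transforms or filters it, and passes it on before pulling the next one.
def lazy_map(source, &transform)
  Enumerator.new do |y|
    source.each { |item| y << transform.call(item) }
  end
end

def lazy_select(source, &predicate)
  Enumerator.new do |y|
    source.each { |item| y << item if predicate.call(item) }
  end
end

trace = []

numbers = Enumerator.new do |y|
  [1, 2, 3].each do |n|
    trace << "yielding #{n}"
    y << n
  end
end

doubled = lazy_map(numbers) { |n| n * 2 }
big     = lazy_select(doubled) { |n| n > 2 }

big.each { |n| trace << "received #{n}" }
p trace
# => ["yielding 1", "yielding 2", "received 4", "yielding 3", "received 6"]

# Nothing is materialised, so the helpers also work on an infinite
# source as long as the consumer stops asking for items.
naturals = Enumerator.new do |y|
  n = 0
  loop { y << (n += 1) }
end

squares = lazy_map(naturals) { |n| n * n }.first(3)
p squares # => [1, 4, 9]
```

Each helper returns a plain Enumerator, so the steps compose by nesting calls rather than by method chaining. Making them chainable would mean extending Enumerator, which is left out here for brevity.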