New Features in Ruby 2.4

By John Backus on July 20, 2016

Faster regular expressions with Regexp#match?

Ruby 2.4 adds a new #match? method for regular expressions which is about three times faster than any of the matching methods available in Ruby 2.3:

Regexp#match?:  2630002.5 i/s
  Regexp#===:   872217.5 i/s - 3.02x slower
   Regexp#=~:   859713.0 i/s - 3.06x slower
Regexp#match:   539361.3 i/s - 4.88x slower

When you call Regexp#===, Regexp#=~, or Regexp#match, Ruby sets the $~ global variable with the resulting MatchData:

/^foo (\w+)$/ =~ 'foo bar'      # => 0
$~                              # => #<MatchData "foo bar" 1:"bar">

/^foo (\w+)$/.match('foo baz')  # => #<MatchData "foo baz" 1:"baz">
$~                              # => #<MatchData "foo baz" 1:"baz">

/^foo (\w+)$/ === 'foo qux'     # => true
$~                              # => #<MatchData "foo qux" 1:"qux">

Regexp#match? returns a boolean and avoids building a MatchData object or updating global state:

/^foo (\w+)$/.match?('foo wow') # => true
$~                              # => nil

By skipping the global variable, Ruby avoids the work of allocating a MatchData object.
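
A typical place for #match? is filtering, where you only need a yes/no answer. The sketch below uses a made-up list of request log lines (the data and the "ends in a 5xx status" pattern are illustrative assumptions, not from a real application) and keeps the lines that report a server error:

```ruby
# Hypothetical request log lines, used only for illustration
lines = ['GET /users 200', 'POST /login 500', 'GET /health 200']

# match? answers "does it match?" without building MatchData or touching $~
server_errors = lines.select { |line| /\s5\d\d\z/.match?(line) }

server_errors # => ["POST /login 500"]
```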

New #sum method for Enumerable

You can now call #sum on any Enumerable object:

[1, 1, 2, 3, 5, 8, 13, 21].sum # => 54

The #sum method takes an optional parameter, which defaults to 0. This value is the starting value of the summation, which is why [].sum returns 0.

If you are calling #sum on a collection of non-numeric objects, you need to provide your own initial value:

class ShoppingList
  attr_reader :items

  def initialize(*items)
    @items = items
  end

  def +(other)
    ShoppingList.new(*items, *other.items)
  end
end

eggs   = ShoppingList.new('eggs')          # => #<ShoppingList:0x007f952282e7b8 @items=["eggs"]>
milk   = ShoppingList.new('milks')         # => #<ShoppingList:0x007f952282ce68 @items=["milks"]>
cheese = ShoppingList.new('cheese')        # => #<ShoppingList:0x007f95228271e8 @items=["cheese"]>

eggs + milk + cheese                       # => #<ShoppingList:0x007f95228261d0 @items=["eggs", "milks", "cheese"]>
[eggs, milk, cheese].sum                   # => #<TypeError: ShoppingList can't be coerced into Integer>
[eggs, milk, cheese].sum(ShoppingList.new) # => #<ShoppingList:0x007f9522824cb8 @items=["eggs", "milks", "cheese"]>

On the last line an empty shopping list (ShoppingList.new) is supplied as the initial value.
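
The same rule applies to built-in types. A few more examples, assuming Ruby 2.4 or later: strings and arrays need an empty string or empty array as the starting value, and #sum also accepts a block that maps each element before it is added (the word lists are arbitrary):

```ruby
# Strings: start from an empty string instead of the default 0
words = %w[foo bar baz].sum('')              # => "foobarbaz"

# Arrays: start from an empty array to concatenate
flat = [[1, 2], [3], [4, 5]].sum([])         # => [1, 2, 3, 4, 5]

# With a block, each element is mapped before it is added
total_length = %w[apple kiwi fig].sum(&:length) # => 12
```

For plain string concatenation, Array#join is usually the more idiomatic choice; the string case is shown only to illustrate the initial value.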

New methods for testing if directories or files are empty

In Ruby 2.4 you can test whether directories and files are empty using the Dir and File classes:

Dir.empty?('empty_directory')      # => true
Dir.empty?('directory_with_files') # => false

File.empty?('contains_text.txt')   # => false
File.empty?('empty.txt')           # => true

The File.empty? method is equivalent to File.zero?, which is already available in all supported Ruby versions:

File.zero?('contains_text.txt')  # => false
File.zero?('empty.txt')          # => true
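
Here is a self-contained sketch that exercises both methods against a throwaway directory created with Dir.mktmpdir from Ruby's tmpdir library (the notes.txt file name is an arbitrary choice for the example):

```ruby
require 'tmpdir'

# mktmpdir creates a fresh directory, yields it, and deletes it afterwards
results = Dir.mktmpdir do |dir|
  path = File.join(dir, 'notes.txt') # arbitrary example file name

  before = Dir.empty?(dir)           # nothing has been created yet
  File.write(path, '')               # create a zero-byte file
  empty_file = File.empty?(path)
  after = Dir.empty?(dir)            # the directory now contains notes.txt

  File.write(path, 'hello')
  [before, empty_file, after, File.empty?(path)]
end

results # => [true, true, false, false]
```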

Unfortunately these methods are not available for Pathname yet.

Extract named captures from Regexp match results

In Ruby 2.4 you can call #named_captures on a MatchData object and get a hash containing your named capture groups and the data they extracted:

pattern  = /(?<first_name>John) (?<last_name>\w+)/
pattern.match('John Backus').named_captures # => { "first_name" => "John", "last_name" => "Backus" }

Ruby 2.4 also lets MatchData#values_at accept capture group names, so you can extract just the named captures you care about:

pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
pattern.match('2016-02-01').values_at(:year, :month) # => ["2016", "02"]

The #values_at method continues to work with positional capture group indexes:

pattern = /(\d{4})-(\d{2})-(\d{2})$/
pattern.match('2016-07-18').values_at(1, 3) # => ["2016", "18"]
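
Named captures combine nicely with the rest of the standard library. The sketch below turns a matched date string into a Date object; DATE_PATTERN and parse_date are hypothetical names chosen for illustration, not part of Ruby:

```ruby
require 'date'

# Hypothetical pattern and helper, for illustration only
DATE_PATTERN = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

def parse_date(text)
  match = DATE_PATTERN.match(text)
  return nil unless match

  # values_at returns the captures in the order the names are given
  year, month, day = match.values_at(:year, :month, :day).map(&:to_i)
  Date.new(year, month, day)
end

parse_date('Released on 2016-07-18').to_s # => "2016-07-18"
parse_date('No date here')                # => nil
```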

New Integer#digits method

If you want to access a digit in a certain position within an integer (from right to left) then you can use Integer#digits:

123.digits                  # => [3, 2, 1]
123.digits[0]               # => 3

# Equivalent behavior in Ruby 2.3:
123.to_s.chars.map(&:to_i).reverse # => [3, 2, 1]

If you want positional digit information in a non-decimal base, you can pass in a different radix. For example, to look up the digits of an integer in hexadecimal, pass in 16:

0x7b.digits(16)                                # => [11, 7]
0x7b.digits(16).map { |digit| digit.to_s(16) } # => ["b", "7"]
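
Because #digits lists the least significant digit first, it suits checksums that are defined from the right. As an illustration (the Luhn algorithm is our own example, not something the Ruby release mentions), here is a sketch of a Luhn check; luhn_valid? is a hypothetical helper name:

```ruby
# Hypothetical helper (not part of Ruby): checks a number against the Luhn checksum
def luhn_valid?(number)
  contributions = number.digits.each_with_index.map do |digit, index|
    # digits starts at the rightmost digit, so odd indexes are every second digit
    next digit if index.even?

    doubled = digit * 2
    doubled > 9 ? doubled - 9 : doubled
  end

  (contributions.sum % 10).zero?
end

luhn_valid?(79927398713) # => true
luhn_valid?(79927398710) # => false
```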

Improvements to the Logger interface

The Logger library in Ruby 2.3 can be a bit cumbersome to set up:

require 'logger'

logger1 = Logger.new(STDOUT)
logger1.level    = :info
logger1.progname = 'LOG1'

logger1.debug('This is ignored')
logger1.info('This is logged')

# >> I, [2016-07-17T23:45:30.571508 #19837]  INFO -- LOG1: This is logged

Ruby 2.4 lets you pass this configuration directly to Logger’s constructor:

require 'logger'

logger2 = Logger.new(STDOUT, level: :info, progname: 'LOG2')

logger2.debug('This is ignored')
logger2.info('This is logged')

# >> I, [2016-07-17T23:45:30.571556 #19837]  INFO -- LOG2: This is logged
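
To the best of our knowledge, the Ruby 2.4 constructor also accepts a formatter: keyword. The sketch below logs into a StringIO so the output can be inspected; the LOG3 program name and the format produced by the proc are arbitrary examples:

```ruby
require 'logger'
require 'stringio'

output = StringIO.new

# Level, program name, and formatter all configured in the constructor
logger3 = Logger.new(
  output,
  level:     :warn,
  progname:  'LOG3',
  formatter: proc { |severity, _time, progname, message| "#{severity} #{progname}: #{message}\n" }
)

logger3.info('This is ignored')
logger3.warn('This is logged')

output.string # => "WARN LOG3: This is logged\n"
```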

Parse CLI options into a Hash

Parsing command line flags with OptionParser often involves a lot of boilerplate in order to compile the options down into a hash:

require 'optparse'
require 'optparse/date'
require 'optparse/uri'

config = {}

cli =
  OptionParser.new do |options|
    options.define('--from=DATE', Date) do |from|
      config[:from] = from
    end

    options.define('--url=ENDPOINT', URI) do |url|
      config[:url] = url
    end

    options.define('--names=LIST', Array) do |names|
      config[:names] = names
    end
  end

In Ruby 2.4 you can drop those blocks and instead provide a hash via the :into keyword argument when parsing arguments:

require 'optparse'
require 'optparse/date'
require 'optparse/uri'

cli =
  OptionParser.new do |options|
    options.define '--from=DATE',    Date
    options.define '--url=ENDPOINT', URI
    options.define '--names=LIST',   Array
  end

config = {}

args = %w[
  --from  2016-02-03
  --url   https://blog.blockscore.com/
  --names John,Daniel,Delmer
]

cli.parse(args, into: config)

config.keys    # => [:from, :url, :names]
config[:from]  # => #<Date: 2016-02-03 ((2457422j,0s,0n),+0s,2299161j)>
config[:url]   # => #<URI::HTTPS https://blog.blockscore.com/>
config[:names] # => ["John", "Daniel", "Delmer"]
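
Flags without an argument work the same way: with into:, they are stored as true. A small sketch with made-up options (--verbose and --count are illustrative, not part of the example above); note that #parse returns the arguments it did not consume:

```ruby
require 'optparse'

config = {}

cli =
  OptionParser.new do |options|
    options.define '-v', '--verbose'    # flag without an argument, stored as true
    options.define '--count=N', Integer # required argument, converted to Integer
  end

# Non-option arguments such as input.txt are returned rather than stored
rest = cli.parse(%w[--verbose --count 3 input.txt], into: config)

config[:verbose] # => true
config[:count]   # => 3
rest             # => ["input.txt"]
```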

Faster Array#min and Array#max

In Ruby 2.4 the Array class defines its own #min and #max instance methods instead of relying on the generic Enumerable versions. In the benchmark below this makes Array#min about 1.6 times faster:

     Array#min:       35.1 i/s
Enumerable#min:       21.8 i/s - 1.61x slower
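
You can check which implementation you are getting, and roughly reproduce the comparison, with the standard benchmark library. This is a sketch, not the post's original benchmark: it binds Enumerable#min to an array so both versions run on the same data, and the timings it prints will vary by machine:

```ruby
require 'benchmark'

numbers = Array.new(100_000) { rand(1_000_000) }

# On Ruby 2.4+, Array has its own #min instead of inheriting Enumerable#min
puts Array.instance_method(:min).owner

# Bind Enumerable's generic implementation to the same array for comparison
enumerable_min = Enumerable.instance_method(:min).bind(numbers)

array_time      = Benchmark.realtime { 100.times { numbers.min } }
enumerable_time = Benchmark.realtime { 100.times { enumerable_min.call } }

puts format('Array#min:      %.4fs', array_time)
puts format('Enumerable#min: %.4fs', enumerable_time)
```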

Simplified integers

Until Ruby 2.4 you had to manage many numeric types:

# Find classes which subclass the base "Numeric" class:
numerics = ObjectSpace.each_object(Module).select { |mod| mod < Numeric }

# In Ruby 2.3:
numerics # => [Complex, Rational, Bignum, Float, Fixnum, Integer, BigDecimal]

# In Ruby 2.4:
numerics # => [Complex, Rational, Float, Integer, BigDecimal]

Now Fixnum and Bignum are implementation details that Ruby manages for you. This should help avoid subtle bugs like this:

def categorize_number(num)
  case num
  when Fixnum then 'fixed number!'
  when Float  then 'floating point!'
  end
end

# In Ruby 2.3:
categorize_number(2)        # => "fixed number!"
categorize_number(2.0)      # => "floating point!"
categorize_number(2 ** 500) # => nil

# In Ruby 2.4:
categorize_number(2)        # => "fixed number!"
categorize_number(2.0)      # => "floating point!"
categorize_number(2 ** 500) # => "fixed number!"

If you have Bignum or Fixnum hardcoded in your source code, it will keep working, although Ruby 2.4 prints a deprecation warning when these constants are referenced. Both now point to Integer:

Fixnum  # => Integer
Bignum  # => Integer
Integer # => Integer
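
The portable fix for code like categorize_number is to match on Integer, which covers both small and very large values on Ruby 2.3 (where Fixnum and Bignum are subclasses of Integer) as well as on 2.4 and later. A sketch of the rewritten method:

```ruby
# Matching on Integer works for every integer size on every Ruby version
def categorize_number(num)
  case num
  when Integer then 'integer!'
  when Float   then 'floating point!'
  end
end

categorize_number(2)        # => "integer!"
categorize_number(2 ** 500) # => "integer!"
categorize_number(2.0)      # => "floating point!"
```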

New precision argument for rounding methods

#ceil, #floor, and #truncate now accept a precision argument, just like #round already did:

4.55.ceil(1)     # => 4.6
4.55.floor(1)    # => 4.5
4.55.truncate(1) # => 4.5
4.55.round(1)    # => 4.6

Integer accepts the same argument. With a positive precision an integer has no fractional digits to adjust, so it is returned unchanged:

4.ceil(1)        # => 4
4.floor(1)       # => 4
4.truncate(1)    # => 4
4.round(1)       # => 4
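
The precision can also be negative, which rounds to tens, hundreds, and so on. A few more cases, assuming Ruby 2.4 or later; note how #floor and #truncate differ for negative numbers:

```ruby
1234.ceil(-2)        # => 1300
1234.floor(-2)       # => 1200
-1234.floor(-2)      # => -1300 (floor rounds toward negative infinity)
-1234.truncate(-2)   # => -1200 (truncate rounds toward zero)

-3.14159.floor(2)    # => -3.15
-3.14159.truncate(2) # => -3.14
```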

Case sensitivity for Unicode characters

Consider the following sentence:

My name is JOHN. That is spelled Ｊ-Ο-Ｈ-Ｎ

Calling #downcase on this string in Ruby 2.3 produces this output:

my name is john. that is spelled Ｊ-Ο-Ｈ-Ｎ

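This is because “Ｊ-Ο-Ｈ-Ｎ” in the string above is written with non-ASCII Unicode characters (fullwidth Latin letters and a Greek capital omicron), which Ruby 2.3’s casing methods leave untouched.

Ruby’s letter casing methods now handle Unicode properly: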

sentence = "\uff2a-\u039f-\uff28-\uff2e"
sentence                              # => "Ｊ-Ο-Ｈ-Ｎ"
sentence.downcase                     # => "ｊ-ο-ｈ-ｎ"
sentence.downcase.capitalize          # => "Ｊ-ο-ｈ-ｎ"
sentence.downcase.capitalize.swapcase # => "ｊ-Ο-Ｈ-Ｎ"
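
Ruby 2.4 also lets you restrict or tune the new Unicode behavior by passing options to the case methods. A short sketch (the strings are arbitrary examples):

```ruby
'MÜNCHEN'.downcase         # => "münchen" (full Unicode case mapping)
'MÜNCHEN'.downcase(:ascii) # => "mÜnchen" (only ASCII letters change, as in Ruby 2.3)
'I'.downcase(:turkic)      # => "ı" (Turkic dotless i)
```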

New option to specify size of a new string

When creating a string with String.new you can now pass a :capacity option, which tells Ruby how much memory to allocate for the string up front. This can help performance because Ruby can avoid reallocating as the string grows:

   With capacity:    37225.1 i/s
Without capacity:    16031.3 i/s - 2.32x slower
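
A sketch of the pattern the benchmark measures: preallocate a buffer, then append to it. The 10,000-byte capacity is an arbitrary guess at the final size; capacity only affects memory allocation, not the string's length or contents. The encoding: option is passed because String.new without a source string otherwise returns a binary (ASCII-8BIT) string:

```ruby
# Preallocate room for roughly 10 KB; the string itself starts out empty
buffer = String.new(capacity: 10_000, encoding: Encoding::UTF_8)

1_000.times { |i| buffer << "line #{i}\n" }

buffer.bytesize # => 8890
buffer.encoding # => #<Encoding:UTF-8>
```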

Fixed matching behavior for symbols

Ruby 2.3’s Symbol#match returned the match position even though String#match returns MatchData. This inconsistency is fixed in Ruby 2.4:

# Ruby 2.3 behavior:

'foo bar'.match(/^foo (\w+)$/)  # => #<MatchData "foo bar" 1:"bar">
:'foo bar'.match(/^foo (\w+)$/) # => 0

# Ruby 2.4 behavior:

'foo bar'.match(/^foo (\w+)$/)  # => #<MatchData "foo bar" 1:"bar">
:'foo bar'.match(/^foo (\w+)$/) # => #<MatchData "foo bar" 1:"bar">
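
With #match returning MatchData, symbols and strings can be handled by the same code. Ruby 2.4 also adds #match? to String and Symbol, mirroring Regexp#match?. A short sketch with arbitrary example symbols:

```ruby
# MatchData from a symbol, indexed like any other match
:'foo bar'.match(/^foo (\w+)$/)[1] # => "bar"

# Boolean check without creating MatchData
:user_id.match?(/_id\z/)           # => true
:username.match?(/_id\z/)          # => false
```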

Multiple assignment inside of conditionals

You can now assign multiple variables within a conditional:

branch1 =
  if (foo, bar = %w[foo bar])
    'truthy'
  else
    'falsey'
  end

branch2 =
  if (foo, bar = nil)
    'truthy'
  else
    'falsey'
  end

branch1 # => "truthy"
branch2 # => "falsey"

You probably shouldn’t do that though.

Exception reporting improvements for threading

If an exception is raised inside a thread, Ruby by default silently swallows it (the error only resurfaces if another thread later calls #join or #value on that thread):

puts 'Starting some parallel work'

thread =
  Thread.new do
    sleep 1

    fail 'something very bad happened!'
  end

sleep 2

puts 'Done!'

Running the script shows no sign of the error:

$ ruby parallel-work.rb
Starting some parallel work
Done!

If you want to fail the entire process when an exception happens within a thread then you can use Thread.abort_on_exception = true. Adding this to the parallel-work.rb script above would change the output to:

$ ruby parallel-work.rb
Starting some parallel work
parallel-work.rb:9:in 'block in <main>': something very bad happened! (RuntimeError)

In Ruby 2.4 you now have a middle ground between errors being silently ignored and aborting your entire program. Instead of abort_on_exception you can set Thread.report_on_exception = true:

$ ruby parallel-work.rb
Starting some parallel work
#<Thread:0x007ffa628a62b8@parallel-work.rb:6 run> terminated with exception:
parallel-work.rb:9:in 'block in <main>': something very bad happened! (RuntimeError)
Done!
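
Both settings can also be applied to an individual thread, and an exception is never truly lost if you join the thread: Thread#join re-raises it in the calling thread. A sketch that silences reporting for a single worker using Thread#report_on_exception= (the per-thread counterpart added in Ruby 2.4) and handles the error explicitly:

```ruby
failure = nil

worker =
  Thread.new do
    # Silence the automatic report for this thread only
    Thread.current.report_on_exception = false
    fail 'something very bad happened!'
  end

begin
  # join waits for the thread and re-raises the exception that ended it
  worker.join
rescue RuntimeError => error
  failure = error.message
end

puts "Worker failed: #{failure}"
```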