Will Cannings

Ruby provides four equality methods on objects: ==, ===, eql?, and equal?. By default (as defined on BasicObject and Object), all four behave the same way and compare object identity, but it's common to redefine their semantics in subclasses to support hashes (including sets, which rely on the same lookup mechanism) and Enumerable methods such as uniq (which actually uses a hash in the MRI implementation). In Impromptu (a lazy loading and component dependency system I'm developing, similar to the autoloading features in ActiveSupport) I have a class that represents folders containing source files. I use a set to track all folder instances so that each folder is included only once:

component 'core' do
    folder 'src'
end
component 'extensions' do
    folder 'src'
end
Impromptu::ComponentSet.folders.count => 1
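Before looking at Folder, it may help to see the default semantics described earlier in action. The following is a small standalone sketch using only core classes; the variable names are illustrative and nothing here is Impromptu-specific. It contrasts plain objects (identity for all four methods) with String and Integer, which redefine some of them:

```ruby
# Plain objects: all four methods fall back to object identity.
o1 = Object.new
o2 = Object.new
p o1 == o2        # false: two different objects
p o1.eql?(o2)     # false
p o1.equal?(o2)   # false
p o1 === o2       # false (Object#=== behaves like ==)
p o1 == o1        # true: same object

# String redefines == and eql? to compare contents,
# but equal? still checks identity.
s1 = "src"
s2 = "src".dup
p s1 == s2        # true
p s1.eql?(s2)     # true
p s1.equal?(s2)   # false: two distinct String objects

# Numerics show why eql? exists separately: == allows
# cross-type comparison, eql? does not.
p 1 == 1.0        # true
p 1.eql?(1.0)     # false
p({ 1 => :a }[1.0]) # nil, because Hash matches keys with eql?
```

The last line is the important one for what follows: a Hash (and so a Set) only treats two keys as the same when eql? says so.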

Each folder object has a path attribute, and two folders with the same path should be considered the same folder. Ruby hashes call the hash method on an object to produce an integer used for table lookups. It might seem enough to override only the hash method for folders, but we run into problems:

require 'set'

# override the hash method on folders to use the hash of the path
class Folder
    attr_accessor :path

    def hash
        self.path.hash
    end
end

# create two folders with the same path
f1 = Folder.new
f1.path = "src"
f2 = Folder.new
f2.path = "src"

# attempt to add both folders
s = Set.new
s << f1
=> #<Set: {#<Folder:0x... @path="src">}>
s << f2
=> #<Set: {#<Folder:0x... @path="src">, #<Folder:0x... @path="src">}>

This clearly doesn't work. Both folders produce the same hash value (4270666928775410514 in one run; String#hash is seeded randomly per process, so the number will differ on your machine). But when a hash performs a lookup and finds an existing key with the same hash value, it then checks whether the two keys are actually equal. If they aren't, it assumes the objects just happen to share a hash value and (if we're performing an insertion) stores the second object as a separate key. That check is necessary because two objects can legitimately have the same hash value without being equal, but here it isn't the behaviour we want. Hash uses the eql? method to test equality, so we need to override that too:

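class Folder
    # only compare paths when the other object is also a Folder;
    # a String like "src" has the same hash but no path method
    def eql?(other)
        other.is_a?(Folder) && self.path == other.path
    end
end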

s = Set.new
s << f1
=> #<Set: {#<Folder:0x... @path="src">}>
s << f2
=> #<Set: {#<Folder:0x... @path="src">}>

That's the behaviour we were after. You could also override == and ===, but never override equal?, as Ruby uses it to test object identity. It's also important not to base an object's hash on a mutable value unless you're sure that's what you want. The hash of an array is computed from the values it contains, which causes problems if those values change:

# the hashes of arrays with the same values are equivalent
a1 = [1]
a2 = [1]
a1.hash == a2.hash
=> true

# add an array to the hash
h = {}
h[a1] = 1
h[a1]
=> 1

# problems arise when the set of values contained in an array change
a1 << 2
h[a1]
=> nil

# even though the inspect string of the hash suggests everything is fine
h
=> {[1, 2]=>1} 

# this happens because the hash value of a1 has changed, so lookups on
# a1 occur in the wrong place. if we revert a1 to what it was before,
# the hash table can find a1 again
a1.delete(2)
h[a1]
=> 1
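If a key has already been mutated, Ruby's Hash#rehash rebuilds the table from each key's current hash value, so the key can be found again. The sketch below shows that first, then one possible way (an assumption for illustration, not necessarily how Impromptu does it) to keep Folder itself out of this trap: a hypothetical variant whose path is set once in the constructor and frozen, so its hash cannot change while it sits in a set.

```ruby
require 'set'

# Part 1: Hash#rehash recomputes each key's position from its current hash.
a1 = [1]
h = { a1 => 1 }
a1 << 2
p h[a1]   # usually prints nil: a1 now hashes differently than when stored
h.rehash
p h[a1]   # prints 1: the table was rebuilt with a1's new hash value

# Part 2: a hypothetical Folder variant with an immutable path, so the
# value its hash is derived from cannot change after construction.
class Folder
  attr_reader :path # no writer: path is fixed after initialize

  def initialize(path)
    @path = path.dup.freeze # copy and freeze so callers can't mutate it later
  end

  def hash
    path.hash
  end

  def eql?(other)
    other.is_a?(Folder) && path == other.path
  end

  # optional: make == agree with eql? for this class
  alias_method :==, :eql?
end

folders = Set.new
folders << Folder.new("src")
folders << Folder.new("src")
p folders.size # prints 1
```

Aliasing == to eql? is a design choice rather than a requirement; it keeps `folder == other` consistent with what hashes and sets consider equal.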