Friday, August 27, 2004

Object.equals()

Like many languages Java provides two equality methods to test objects for equivalence. The first is accessed via the '==' operator and does a simple heap reference comparison. The other is a standard method on Object, equals() which is intended to be overridden by the programmer to perform a value-based comparison. Now the equivalence operator is defined by the language spec, and ultimately implemented by the JVM developers, hence not our concern. Unfortunately the java community has failed to properly document the subtleties involved in writing a correct equals method, and so I continue to come across incorrect implementations and worse, incorrect recommendations.

But first we have to understand what we are dealing with. Fundamentally when we are defining an equals method on a class A, what we are defining is an Equivalence Relation over the set of all objects of class A. This is important because it highlights that we can define our equals method to return anything we want provided we meet three rules.

  • Reflexivity: a.equals(a) -> true
  • Symmetry: a.equals(b) <-> b.equals(a)
  • Transitivity: a.equals(b) && b.equals(c) -> a.equals(c)
If we break any of these rules someone will suffer subtle and devious bugs.

Node: To save typing I'm going to leave out the hashCode method in each of these examples, but be aware if you override equals() you *must* override hashCode() to match.

One consequence of the requirements is that we must ensure that any test we use in our equals implementation meets these requirements as well. So lets consider the standard mistake:

class A {
  int n1;
  public boolean equals(Object o) {
    if (!(o instanceof A)) {
      return false;
    } else {
      A rhs = (A)o;
      return this.n1 == rhs.n1;
    }
  }
}
This is a very common implementation of equals, it has even made its way into a number of Java books. Worse, the bug only manifests in the presence of implementation inheritance of concrete classes.
class B extends A { 
  int n2;
  public boolean equals(Object o) {
    if (!(o instanceof B)) {
      return false;
    } else {
      B rhs = (B)o;
      return this.n1 == rhs.n1 && this.n2 == rhs.n2;
    }
  }
}
To demonstrate the bug lets define a simple function on A's:
static printEqual(A a1, A a2) {
  if (a1.equals(a2)) {
    System.out.println("EQUAL");
  } else {
    System.out.println("NOT-EQUAL");
  }
}
And test it:
  A test1 = new A();
  A test2 = new B();
  printEqual(test1, test2);
      => Prints "EQUAL"
  printEqual(test2, test1);
      => Prints "NOT-EQUAL" !!!!
And we've broken symmetry. This will bite you. There is no situation when an asymmetrical equality is not a bug. Unfortunately it is scary how many places on the net this is presented as the exemplar equals() implementation.

So what does a correct equals method look like. Well there are two. The first is very straight forward, and should be the default choice.

class A {
  int n1;
  public boolean equals(Object o) {
    if (o == this) {
      return true;
    } else if (o == null || !getClass().equals(o.getClass())) {
      return false;
    } else {
      A rhs = (A)o;
      return this.n1 == rhs.n1;
    }
  }
}

class B extends A { 
  int n2;
  public boolean equals(Object o) {
    if (!super.equals(o)) {
      return false;
    } else {
      B rhs = (B)o;
      return this.n2 == rhs.n2;


    }
  }
}
Now test1 != test2 AND test2 != test1. Also, by delegating from B to A at the start of B::equals() we ensure any inaccessible state in A is handled appropriately, and more importantly, that the equality code for A isn't duplicated in B. If you can't delegate from B->A like this, then it suggests that your B isn't actually an A, and you should check your class design (you probably should be using delegation rather than inheritance).

There are times however when you would really like an B to compare equal to an A. The classic example is the circle/ellipse or square/rectangle shape examples. It would be really nice if an ellipse with equal axes compared equal to an equivalent instance of a circle sub-class. The way we can achieve this is to recognise that the core requirement of a correct equals implementation is that the operations we use to implement meet the same reflexivity/symmetry/transitivity requirements as the method itself. Next note that instanceof is only asymmetric in the subclass; and in fact any operation in the subclass will lead to asymmetry! So if we do need subclassed equality, we can't permit a subclass to override the equals() method.

abstract class Shape {
  public boolean equals(Object o) {
    if (o == this) {
      return true;
    } else if (o instanceof Shape) {
      // !! Note: instanceof is safe here as Shape is abstract.
      Shape rhs = (Shape)o;
      return [shape specific comparison --- maybe position];
    } else {
      return false;
    }
  }
}

class Ellipse extends Shape { 
  int minor, major;

  public int getMinor() { return minor; }
  public int getMajor() { return major; }

  // !!! NOTE THE USE OF *final* to prevent asymmetry!
  // Also note the exclusive use of accessors.
  public final boolean equals(Object o) {
    if (!super.equals(o)) {
      return false;
    } else if (getClass().equals(o.getClass())) {
      Eclipse rhs = (Eclipse)o;
      return this.getMajor == rhs.getMajor()
          && this.getMinor == rhs.getMinor();
    } else {
      return false;
    }
  }
}

public Circle extends Ellipse {
  public Circle(int radius) { major = minor = radius; }
  public int getRadius()    { return major; }
}
Note that this is probably a rather clumbersome hierachy. A better approach would be:
public abstract class Ellipse {
  public abstract int getMajor();
  public abstract int getMinor();
  public final boolean equals(Object o) {
    if (!super.equals(o)) {
      return false;
    } else {
      Eclipse rhs = (Eclipse)o;
      return this.getMajor == rhs.getMajor()
          && this.getMinor == rhs.getMinor();
    }
  }
}

public class EllipseImpl {
  int major, minor;
  ...
}

public class CircleImpl {
  int radius;
  public int getMajor() { return radius; }
  ...
}
Note that an CircleImpl will still compare equal with an appropriate EllipseImpl.

Please. If you are programming Java, believe me, these really are the only two options open to you. I really am very tired of tracking down the subtle bugs that result when this advice is ignored.

5 comments:

Anonymous said...

Nice write up! Thanks. I am dropping the following code on the default solution however:

if (o == this) {
return true;
}

I'll just let the "else" portion figure that one out (but please correct me if I missed something). I just like less lines of code, and on average this may run faster as the case of comparing an object to itself is statistically less in the code I write.

Ehren said...

In the first example, if you remove o == this I'm pretty sure equals will return false if both objects are null.

Andrae Muys said...

By definition, both objects cannot be null. An object has to be non-null for equals to be called on it.

Steven Shaw said...

Andrae, I noticed that there are a couple of bugs here too. The first solution (with class A and B) is fine.

The first problem is in the second solution in class Ellipse:

} else if (getClass().equals(o.getClass())) {

This means that Circles cannot be compared against Ellipses. The on Github here. I've got my own solution called Equals1Alternative.java that is based from your solution 1 but allows for equals between classes in the same hierarchy. I've got to say that there was a bug in my attempt at that too.

Implementing equals is hard. I have tended to use an IDE plugin to generate it. I much prefer Scala's case classes when I get to use them. It looks to me that a better programming language "solution" would be not to have a magic superclass such as Object with methods like equals, hashCode and toString. Now I really just read Odersky's article on the subject!

Steven Shaw said...

Wow, thanks blogger. Totally munged me reply :(