I'm working on an application that uses Spring caching and replicates cache entries across a cluster of hosts. When one host adds a new entry to a cache, it serializes that cache entry and sends it to the other hosts for them to add too.
I noticed an issue where, if an enum type is part of the cache key, the cache does not replicate properly. Each host winds up with duplicate entries in the cache for identical-looking keys.
I traced the issue to the SimpleKey's hashCode field. It gets initialized in the SimpleKey's constructor and is based on the hash codes of all of the SimpleKey's parameters. The problem is that enum hash codes are based on object identity: they aren't guaranteed to be consistent from one host to another and can even change with an application restart. If a SimpleKey instance was deserialized from another host and contains an enum parameter, then its hashCode field can have a different value than a SimpleKey with the same parameters created by the current host.
This means that even though the replication is working - each host is adding cache entries sent from other hosts to their own caches - no host can ever get a cache hit on an entry that was sent to it from another. The different hash codes make the cache's underlying map lookups fail to find the already-existing keys. The ultimate consequence is that these caches effectively have no synchronization across hosts and they're bloated with up to one duplicate entry per host for common keys.
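To make the failure concrete, here is a minimal single-JVM sketch. It does not use Spring's SimpleKey; `CachedHashKey` and `Region` are hypothetical stand-ins, and the "remote" key is simulated by handing the constructor a stored hash that differs from the locally computed one, which is what a key deserialized from another host would carry.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class StaleHashDemo {

    enum Region { EAST, WEST }

    // Hypothetical stand-in for SimpleKey: equality compares the parameters,
    // while hashCode() returns whatever value was stored in the field.
    static final class CachedHashKey {
        private final Object[] params;
        private final int hashCode;

        CachedHashKey(int storedHash, Object... params) {
            this.params = params.clone();
            this.hashCode = storedHash;
        }

        // A key built on this host: the hash comes from this JVM's parameter hashes.
        static CachedHashKey local(Object... params) {
            return new CachedHashKey(Arrays.deepHashCode(params), params);
        }

        @Override
        public boolean equals(Object other) {
            return this == other || (other instanceof CachedHashKey
                    && Arrays.deepEquals(this.params, ((CachedHashKey) other).params));
        }

        @Override
        public int hashCode() {
            return this.hashCode;
        }
    }

    // Enum.hashCode() is final and delegates to Object.hashCode().
    static boolean enumHashIsIdentityBased() {
        return Region.EAST.hashCode() == System.identityHashCode(Region.EAST);
    }

    static boolean keysAreEqual() {
        CachedHashKey local = CachedHashKey.local("user", Region.EAST);
        CachedHashKey remote = new CachedHashKey(local.hashCode() + 1, "user", Region.EAST);
        return local.equals(remote);
    }

    // Stores an entry under the simulated remote key, then looks it up with the local key.
    static boolean lookupHits() {
        CachedHashKey local = CachedHashKey.local("user", Region.EAST);
        CachedHashKey remote = new CachedHashKey(local.hashCode() + 1, "user", Region.EAST);
        Map<CachedHashKey, String> cache = new HashMap<>();
        cache.put(remote, "value");
        return cache.get(local) != null;
    }

    public static void main(String[] args) {
        System.out.println("enum hash is identity-based: " + enumHashIsIdentityBased());
        System.out.println("keys are equal: " + keysAreEqual());
        System.out.println("lookup hits: " + lookupHits());
    }
}
```

The two keys compare equal, but HashMap checks the stored hash before it ever calls equals, so the lookup misses and a second entry would be added for the same logical key.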
This pull request fixes the issue by making the SimpleKey's hashCode transient so it isn't shared with other runtimes when serialized, and by making sure its hashCode gets calculated by its own runtime even when the object was deserialized instead of constructed.
I think this is a reasonable change because identity-based hash codes are a real possibility in Java (it's the default Object.hashCode implementation), and since SimpleKey derives its hash code from a set of parameters of unknown types, you can't assume the value is safe to share when serializing.
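One way the transient approach described above could look is sketched below. This is not the code from the pull request: `LazyHashKey`, `Region`, and the `roundTrip` helper are hypothetical, and computing the hash lazily inside `hashCode()` is an assumption about how "calculated by its own runtime" might be implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class LazyHashKeyDemo {

    enum Region { EAST, WEST }

    // Hypothetical key class, not Spring's SimpleKey.
    static final class LazyHashKey implements Serializable {
        private static final long serialVersionUID = 1L;

        private final Object[] params;

        // transient: never written to the stream, so it is 0 after deserialization.
        private transient int hash;

        LazyHashKey(Object... params) {
            this.params = params.clone();
        }

        @Override
        public boolean equals(Object other) {
            return this == other || (other instanceof LazyHashKey
                    && Arrays.deepEquals(this.params, ((LazyHashKey) other).params));
        }

        @Override
        public int hashCode() {
            int h = this.hash;
            if (h == 0) {
                // Computed by the current runtime on first use; if the real hash
                // happens to be 0 it is simply recomputed on each call.
                h = Arrays.deepHashCode(this.params);
                this.hash = h;
            }
            return h;
        }
    }

    // Serializes and deserializes a value in memory, wrapping checked exceptions.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    static boolean hashSurvivesRoundTrip() {
        LazyHashKey original = new LazyHashKey("user", Region.EAST);
        LazyHashKey copy = roundTrip(original);
        return copy.equals(original) && copy.hashCode() == original.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("hash matches after round trip: " + hashSurvivesRoundTrip());
    }
}
```

Within a single JVM this only shows that the hash is rebuilt from the parameters rather than read from the stream; the cross-host benefit follows because each runtime uses its own enum hash codes when it rebuilds the value.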
Comment From: pivotal-issuemaster
@ZikFat Please sign the Contributor License Agreement!
Comment From: pivotal-issuemaster
@ZikFat Thank you for signing the Contributor License Agreement!
Comment From: quaff
It makes sense.
Comment From: jhoeller
SimpleKey wasn't really designed for cluster use, but given that it does declare Serializable, we should indeed be defensive about hashcode assumptions in case of deserialization on a different machine. Omitting the hashcode from the serialized representation and recalculating it for every deserialized instance seems sensible, reducing the size of the serialized representation as well.
That said, I've addressed the change somewhat differently: A custom readObject implementation allows us to re-calculate the hashcode immediately on deserialization, considering the field effectively final for the regular lifetime of a SimpleKey instance. I'll push that commit in a moment, hopefully addressing your concern so far. Thanks for the PR, in any case!
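A rough sketch of the readObject variant, for comparison. It is an illustration under assumptions rather than the actual commit: `RehashingKey`, `Region`, and `roundTrip` are hypothetical names, and only the standard `java.io` serialization hooks are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class ReadObjectKeyDemo {

    enum Region { EAST, WEST }

    // Hypothetical key class, not Spring's SimpleKey.
    static final class RehashingKey implements Serializable {
        private static final long serialVersionUID = 1L;

        private final Object[] params;

        // Excluded from the stream. It cannot be declared final because readObject
        // assigns it, but it never changes after construction or deserialization.
        private transient int hashCode;

        RehashingKey(Object... params) {
            this.params = params.clone();
            this.hashCode = Arrays.deepHashCode(this.params);
        }

        @Override
        public boolean equals(Object other) {
            return this == other || (other instanceof RehashingKey
                    && Arrays.deepEquals(this.params, ((RehashingKey) other).params));
        }

        @Override
        public int hashCode() {
            return this.hashCode;
        }

        // Serialization hook: restore the non-transient fields, then recompute
        // the hash in the runtime that is doing the deserializing.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.hashCode = Arrays.deepHashCode(this.params);
        }
    }

    // Serializes and deserializes a value in memory, wrapping checked exceptions.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Stores an entry under a deserialized key, then looks it up with a locally built key.
    static boolean lookupSucceedsAfterRoundTrip() {
        RehashingKey original = new RehashingKey("user", Region.EAST);
        Map<RehashingKey, String> cache = new HashMap<>();
        cache.put(roundTrip(original), "value");
        return "value".equals(cache.get(original));
    }

    public static void main(String[] args) {
        System.out.println("lookup succeeds: " + lookupSucceedsAfterRoundTrip());
    }
}
```

Compared with a lazily computed hash, this keeps `hashCode()` a plain field read and confines the recalculation to the deserialization path.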
Note that we generally only support deserialization from the same version of the framework, except for very specific cases where we explicitly declare an older serialVersionUID. This means that SimpleKey deserialization in 5.2.3 will only work with the entire cluster being on 5.2.3.
Comment From: ZikFat
Thanks for the fix! I agree that your readObject version is better.