Affects: 6.0.11 (current HEAD) and probably earlier versions
The ConcurrentReferenceHashMap
used by Spring has a race condition which can cause it to incorrectly return null for a get operation where they key is in the map.
In (inner class) Segment.getReference(...)
, line 500 (method starts at line 493), we see:
// Use a local copy to protect against other threads writing
Reference<K, V>[] references = this.references;
int index = getIndex(hash, references);
Reference<K, V> head = references[index];
return findInChain(head, key, hash);
(This is called from ConcurrentWeakHashMap.getEntry(...)
, itself called by various methods including get(...)). At this point no lock is held. The comment is the clue: it takes a local reference to the existing references
array (this.references
) in order to protect against issues caused by other threads re-assigning this field.
Taking a local reference however doesn't protect against other threads writing to the same array. That is exactly what can happen, in certain (rare) circumstances, within the Segment.restructure(...)
method (line 605+ below, method starts at line 580):
// Either create a new table or reuse the existing one
Entry<K, V>[] restructured =
(resizing ? createEntryArray(restructureSize) : this.entries); // <=== same array
// Restructure
for (int i = 0; i < this.entries.length; i++) {
entry = this.entries[i];
if (!resizing) {
restructured[i] = null; // <=== write HERE!
Notice that the restructure uses the original array if it is not resizing (line marked with "same array" comment) and that it temporarily assigns each entry in turn with null ("write HERE!"). The entry is restored later on in the loop, but this moment when it is null is the problem, since it may cause Segment.getReference(...)
(first method posted above), if being executed concurrently in another thread, to retrieve the null value when looking up the entry:
Reference<K, V> head = references[index]; // <=== write affects this read!
This causes it to return null, as if the map does not contain an entry with the given key, when it fact it may do so.
In practice the issue may be very difficult to reproduce and I'm sure it is uncommon "in the wild" but I was able to reproduce the issue by inserting some sleep calls at key points within the ConcurrentReferenceHashMap code and executing operations in parallel from two different threads, while also ensuring that a key in the map became garbage collected before the call to the 2nd operation so that it would perform a restructure
after the first (get) operation had taken its reference to this.references
but before it retrieved the entry. This was based on an already slightly modified version of the class which we are using internally but I have inspected the version in Spring carefully and I'm fairly sure the same issue is present.
Comment From: bclozel
Sorry this got overlooked @davmac314
I agree with your assessment. There are other possible cases of race conditions, for example o.s.util.ConcurrentReferenceHashMap.Segment#doTask
also updates array indices when adding entries. In that case getReference
could return null, while the entry is being added.
I think this is unfortunate but in line with the general approach here: references come and go depending on restructure operations, memory being GC'd, etc. I've checked here that no NullPointerException
should be thrown in this case and this should only return null
when the entry is there. At worst this makes the cache less efficient. The alternative would be to add a guard and lock the read operation but this comes at a significant performance cost.
I'm closing this issue a result, let me know if you can think of a way to solve this particular issue without breaking the runtime model for this class.
Thanks for your contribution!
Comment From: davmac314
Hi @bclozel ,
Having reported the issue I feel I've done my civic duty and if you choose to close the issue that is in your hands. I would suggest that at least the documentation (javadoc for the class) should be updated to reflect this known limitation, lest the class be used by others (either withing the Spring project our outside it) with the expectation that values that are definitely in the map will be able to looked up successfully, but that is up to you.
We fixed this in our modified version by, in the case of a restructure using the same array, simply modifying the linked list of entries in each array element "in place" rather than nulling the list first. There is no need to lock for the read operation. Unfortunately I cannot share the code which was written as part of my employment.
I looked at the doTask
method which you suggested also had a race, but I do not see the problem there. Unless you mean that another thread could look up an item that is being added "right now" and not see it (null being returned); I don't think this qualifies as a problem race since there is always a window where the object "is currently being added" rather than "has successfully been added". In comparison, the bug I pointed out means that a single thread can add a k,v
pair, look up k
and get null as a result (due to interaction in another thread not involving k
or v
), then look up k
a second time but on this occasion get v
as a result.
I understand that for a cache this may not matter (other than being a minor performance issue) but once again I would strongly suggest that the class be documented as only being suitable for use as a cache if the issue isn't going to be fixed.
Comment From: bclozel
Thanks for your feedback @davmac314
This is an interesting tradeoff for the project:
- we can attempt to fix a problem that nobody experienced in production and possible cause regressions (functional or performance). A PR would have helped since you have experience running with the fix in production, but apparently you're not allowed to contribute here.
- Leave things as they are entirely, as documenting this very specific case is not really actionable for most people and we don't intend such classes to be used outside of the main use cases in Spring Framework (we don't intend to be a general caching library)
I can give it a shot still, would you be opened to reviewing my changes?
Comment From: davmac314
Hi @bclozel, I would certainly be happy to review any changes relevant to this issue. It is unfortunate (and frustrating for me) that I can't simply give you the code that has already been written. Fortunately the changes required are small.
Comment From: bclozel
Hey @davmac314 - what do you think about this change: https://github.com/bclozel/spring-framework/tree/gh-31008
Thanks for your input!
Comment From: davmac314
@bclozel looks good to me!