I have a small miss-understanding about SimpleIdGenerator
, that is a fairly trivial class:
public class SimpleIdGenerator implements IdGenerator {
private final AtomicLong mostSigBits = new AtomicLong(0);
private final AtomicLong leastSigBits = new AtomicLong(0);
@Override
public UUID generateId() {
long leastSigBits = this.leastSigBits.incrementAndGet();
if (leastSigBits == 0) {
this.mostSigBits.incrementAndGet();
}
return new UUID(this.mostSigBits.get(), leastSigBits);
}
}
The presence of AtomicLong
hints into the fact that this is a thread-safe continuous incremented UUID, but it's not the case:
long leastSigBits = this.leastSigBits.incrementAndGet();
if (leastSigBits == 0) {
For the sake of the discussion let's suppose that currently leastSigBits
holds a -1
(it has been incremented quite a lot, yes).
ThreadA
does long leastSigBits = this.leastSigBits.incrementAndGet();
, so it puts the value into 0
(-1 + 1 = 0
); but before it does the check if (leastSigBits == 0)
, ThreadB
did long leastSigBits = this.leastSigBits.incrementAndGet();
too, now on a value that is 0
, so it put the value in 1
. ThreadA
does the check and sees a value of 1
, that if
statement is not entered and a such a duplicate UUID
.
This is very far stretched and I have doubts it has ever impacted any users as for this to happen they would need to generate all the long
range of IDs, which is highly highly improbable. Still, this code is wrong.
If this is suppose to provide thread-safe variant :
- document it as such
- fix the code
if this isn't supposed to be thread safe, simply dropping the un-necessary AtomicLong
(with it's volatile
overhead) is going to be a big performance gain.
Either way, I would be more than glad to fix this, if someone tells me the path I should be taking. Thank you.
Comment From: abhishek-abhi
Can I fix this .
Comment From: rstoyanchev
ThreadA does the check and sees a value of 1
Why would that be? The value of incrementAndGet is saved in a local variable.
Comment From: wind57
@rstoyanchev I should have had my coffee before posting this. you're right and I am wrong.
Either way there is a race here.
ThreadA
does this:
long leastSigBits = this.leastSigBits.incrementAndGet();
if (leastSigBits == 0) {
this.mostSigBits.incrementAndGet();
}
it sets long leastSigBits
as 0
, enters the if
statement, but does not do this.mostSigBits.incrementAndGet();
just yet. ThreadB
acts at the same time; it sets its local long leastSigBits
to 1
, does not enter the if statement
and does return new UUID(this.mostSigBits.get(), leastSigBits);
. this.mostSigBits.get() == 0
and leastSigBits == 1
; as such it returns an UUID that was already returned.
Comment From: quaff
ThreadA does the check and sees a value of 1
Why would that be? The value of incrementAndGet is saved in a local variable.
The result of this.mostSigBits.incrementAndGet()
should be saved in a local variable too and instead of this.mostSigBits.get()
later.
Comment From: quaff
@rstoyanchev I should have had my coffee before posting this. you're right and I am wrong.
Either way there is a race here.
ThreadA
does this:
long leastSigBits = this.leastSigBits.incrementAndGet(); if (leastSigBits == 0) { this.mostSigBits.incrementAndGet(); }
it sets
long leastSigBits
as0
, enters theif
statement, but does not dothis.mostSigBits.incrementAndGet();
just yet.ThreadB
acts at the same time; it sets its locallong leastSigBits
to1
, does not enter theif statement
and doesreturn new UUID(this.mostSigBits.get(), leastSigBits);
.this.mostSigBits.get() == 0
andleastSigBits == 1
; as such it returns an UUID that was already returned.
@wind57 I agree.
Comment From: b4nnertail
@rstoyanchev I should have had my coffee before posting this. you're right and I am wrong. Either way there is a race here.
ThreadA
does this:long leastSigBits = this.leastSigBits.incrementAndGet(); if (leastSigBits == 0) { this.mostSigBits.incrementAndGet(); }
it sets
long leastSigBits
as0
, enters theif
statement, but does not dothis.mostSigBits.incrementAndGet();
just yet.ThreadB
acts at the same time; it sets its locallong leastSigBits
to1
, does not enter theif statement
and doesreturn new UUID(this.mostSigBits.get(), leastSigBits);
.this.mostSigBits.get() == 0
andleastSigBits == 1
; as such it returns an UUID that was already returned.@wind57 I agree.
I prepared a unit test to show the issue:
class SimpleIdGeneratorTest {
private final AtomicLong mostSigBits = new AtomicLong(0);
private final AtomicLong leastSigBits = new AtomicLong(-1);
Set<UUID> ids = new HashSet<>();
@Test
void generateIdTest() throws InterruptedException {
ids.add(new UUID(0, 1));
ExecutorService executorService = Executors.newFixedThreadPool(2);
executorService.execute(() -> ids.add(generateId()));
executorService.execute(() -> ids.add(generateId()));
Thread.sleep(200);
assertTrue(ids.contains(new UUID(0, 1)));
assertTrue(ids.contains(new UUID(1, 0)));
// This UUID is missing, instead UUID(mostSigBits = 0, leastSigBits = 1) got created twice
assertFalse(ids.contains(new UUID(1, 1)));
}
public UUID generateId() {
long leastSigBits = this.leastSigBits.incrementAndGet();
if (leastSigBits == 0) {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
e.printStackTrace();
}
this.mostSigBits.incrementAndGet();
}
return new UUID(this.mostSigBits.get(), leastSigBits);
}
}
I guess, to be truly thread safe, we would need to introduce a synchronized block as shown below. Please correct me if I'm wrong.
class SimpleIdGeneratorTest {
private final AtomicLong mostSigBits = new AtomicLong(0);
private final AtomicLong leastSigBits = new AtomicLong(-1);
Set<UUID> ids = new HashSet<>();
@Test
void generateIdTest() throws InterruptedException {
ids.add(new UUID(0, 1));
ExecutorService executorService = Executors.newFixedThreadPool(2);
executorService.execute(() -> ids.add(generateId()));
executorService.execute(() -> ids.add(generateId()));
Thread.sleep(200);
assertTrue(ids.contains(new UUID(0, 1)));
assertTrue(ids.contains(new UUID(1, 0)));
assertTrue(ids.contains(new UUID(1, 1)));
}
public UUID generateId() {
long leastSigBits = this.leastSigBits.incrementAndGet();
synchronized (this) {
if (leastSigBits == 0) {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
e.printStackTrace();
}
this.mostSigBits.incrementAndGet();
}
return new UUID(this.mostSigBits.get(), leastSigBits);
}
}
}
Comment From: wind57
@puhlerblet usually a unit test proves very little in case of thread safety, though, I admit, in this case it is fairly trivial to prove the point. Otherwise, you could also look into this tool, that I usually use. It does instruction re-ordering that can prove things, see more here for example.
Also both synchronized
and AtomicLong
? Under contention, an intrinsic lock is "inflated" and when a GC needs to happen - it needs to deflate that, which is a process that takes time (and easily provable with JMH
). I would suggest a ReentrantLock
, instead.
But the problem is a bit more subtle, it seems to me. Keep in mind that for this problem to be observed, we need to go through the entire range of positive and negative values of a long. Let's take the easy case:
Long.MAX_VALUE == 9223372036854775807
So there are 9223372036854775807
positive possibilities in a long. We now look of how big is an UUID
:
java.util.UUID object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 2c 19 17 00 (00101100 00011001 00010111 00000000) (1513772)
12 4 (alignment/padding gap)
16 8 long UUID.mostSigBits 0
24 8 long UUID.leastSigBits 0
Instance size: 32 bytes
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
take 32 bytes
multiply by 9223372036854775807
and you will need that much space, just for the UUID
, just for positive part of the least significant bits, if you want to cover the entire range.
And if I now look at the documentation of the class itself:
A simple IdGenerator that starts at 1 and increments by 1 with each call.
I could say that there is no need for two AtomicLong
as, realistically, no one will ever hit so many IDs.
Comment From: b4nnertail
@wind57 since i am quite inexperienced in dealing with concurrency I really appreciate your input. Your assumption, that no one will ever hit so many IDs sounds reasonable.
Comment From: wind57
@rstoyanchev what do you think about this one?
Comment From: rstoyanchev
@wind57 indeed every time leastSigBits
rolls over, some id's may be based on the current rather than on the next mostSigBits
value which in practice means that some id's may be issued more than once, but that would only happen once in a very long while and the duplicates would have been last issued a very long time ago
I think it's simply not worth optimizing and we can just update the docs to say that uniqueness is not fully guaranteed and that on occasion, after every Long.MAX_VALUE
number of id's are issued, there may be repeated ones for a short period. If that's an issue for some reason then don't use this generator.
Thoughts? /cc @garyrussell @artembilan
Comment From: rstoyanchev
This is by the way one alternative I thought of but again I'm not sure it's necessary:
public class SimpleIdGenerator implements IdGenerator {
private final AtomicLong mostSigBits = new AtomicLong(0);
private final AtomicLong leastSigBits = new AtomicLong(0);
private final AtomicLong lastMostSigBits = new AtomicLong(-1);
@Override
public UUID generateId() {
long leastSigBits = this.leastSigBits.incrementAndGet();
if (leastSigBits == 0) {
this.mostSigBits.incrementAndGet();
}
else if (leastSigBits == Long.MAX_VALUE - 10000) {
this.lastMostSigBits.incrementAndGet();
}
else if (leastSigBits < 10000) {
while (true) {
if (!this.mostSigBits.equals(this.lastMostSigBits)) {
break;
}
}
}
return new UUID(this.mostSigBits.get(), leastSigBits);
}
}
Comment From: garyrussell
Bear in mind that this generator is pretty much useless in any environment that persists the ID in a DB and/or a multi-instance application.
Not only does the problem (potentially) occur after Long.MAX_VALUE
allocations, but those allocations must have occurred in the same JVM.
This generator was provided long ago for a very specific use case where the IDs were not persisted and UUID.randomUUID()
was found to be too expensive for the ID allocation rate.
So, I would say documentation should be enough.
If performance is really an issue, consider using the AlternativeJdkIdGenerator
instead.
Comment From: rstoyanchev
Thanks @garyrussell those will be good clarifications. For same JVM, not persisted, and reasonably close to unique. If those trade-offs are okay you'll probably get better performance than AlternativeJDkIdGenerator
.
Comment From: wind57
-
I agree on the part about documentation.
-
I disagree on the implementation prototype, though. Why not simply use a single
AtomicLong
and specify in the documentation that it supports unique values up toLong.MAX_VALUE
and then it will simply overflow?
Comment From: rstoyanchev
and specify in the documentation that it supports unique values up to Long.MAX_VALUE and then it will simply overflow
That's an option I guess, make it more official that after Long.MAX_VALUE it won't be fully unique. If that is an issue to begin with then it's the wrong generator to use one way or another. @garyrussell you agree with that?
Comment From: garyrussell
Yes, I agree; users are always able to define their own implementation if they are not happy with this one.