Generating unique random numbers using the following logic, but it looks like it's failing

What is the probability of the code below generating the same last 7 digits when called successively from one thread?
try {
Thread.sleep(1000);
} catch (InterruptedException e) {}
String temp = String.valueOf(System.currentTimeMillis());
return new BigInteger(temp).add(BigInteger.valueOf(new Long(activityIdCounter.incrementAndGet())));
activityIdCounter is an AtomicInteger, and the snippet above is the body of a static synchronized method.
I am asking because my test cases have failed intermittently due to an existing id being reused, and this code is part of how the unique ids are generated.
For now I have switched to System.nanoTime(), but I still do not understand:
how duplicate numbers can be generated, and
how a newly generated id can match a 6-day-old id already present in the DB.
Let me explain briefly how this generateId() method is called.
Test classes call it twice in sequence for each DB call, generating two different ids, and both ids are persisted. Any other call or test method does the same. For the DB operation to succeed, the ids are first looked up to see whether they already exist, as in ...where id1=:id1 and id2=:id2.
Now here is my TestNG configuration:
<test verbose="2" name="FullTestSuite" annotations="JDK" preserve-order="true" parallel="classes" thread-count="10">
Edit: id creation logic
id1 = new BigInteger(<12-13 digits - constant>).add(<generated id>).toString();
id2 = "someString:" + id1 + ":" + generatedId; //call again for those 7 digits

If you only use 7 digits you have 10 000 000 possible different values. Assuming you picked numbers completely random over that range, then after picking 3723 numbers, the odds of generating a duplicate are greater than 50%. And after 6785 picks, odds for generating a duplicate are over 90%.
Of course you don't pick them randomly, you try to make them increase. But if you only pick the last seven digits, then purely time based, you have an overflow after 10 000 000 milliseconds, or once every 2 hours, 46 minutes, and 40 seconds. And since you add an ever increasing number to the millis, that period will get shorter and shorter.
Also note that since you use System.currentTimeMillis(), you are exposed to changes in the system clock by other processes, most notably processes that do NTP sync. This means that two successive calls to System.currentTimeMillis() may see a lower value on the second call.
Conclusion : to safely generate unique ids, you'll have to use way more digits. In fact don't reinvent the wheel, use UUID.randomUUID().
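The birthday-problem figures above can be checked with a few lines of code. This is a minimal sketch, assuming ids are drawn uniformly at random from a space of 10 000 000 values; the class and method names are made up for illustration.

```java
final class BirthdayBound {
    // Probability that at least two of `picks` uniformly random draws
    // from `space` distinct values are equal (exact product, computed in log space).
    static double collisionProbability(long space, int picks) {
        double logNoCollision = 0.0;
        for (int i = 1; i < picks; i++) {
            // The (i+1)-th pick must avoid the i values already taken.
            logNoCollision += Math.log1p(-(double) i / space);
        }
        return 1.0 - Math.exp(logNoCollision);
    }

    public static void main(String[] args) {
        long space = 10_000_000L; // 7 decimal digits
        for (int picks : new int[] {1000, 3000, 4000, 7000}) {
            System.out.printf("%d picks -> %.4f%n", picks, collisionProbability(space, picks));
        }
    }
}
```

The probability crosses 50% somewhere between 3000 and 4000 picks and 90% just below 7000, which is in line with the numbers quoted above.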

Related

How to generate fixed length random number without conflict?

I'm working on an application where I've to generate code like Google classroom. When a user creates a class I generate code using following functions
private String codeGenerator(){
StringBuilder stringBuilder=new StringBuilder();
String chars="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
int characterLength=chars.length();
for(int i=0;i<5;i++){
stringBuilder.append(chars.charAt((int)Math.floor(Math.random()*characterLength)));
}
return stringBuilder.toString();
}
As I have 62 different characters and 5 positions, I can generate 62^5 (about 916 million) codes in total, which is quite large. I can generate this code on the server or on the user's device. So my question is: which one is the better approach? How likely is it that a generated code will conflict with another code?

From a comment, it seems that you are generating group codes for your own application.
For the purposes and scale of your app, 5-character codes may be appropriate. But there are several points you should know:
Random number generators are not designed to generate unique numbers. You can generate a random code as you're doing now, but you should check that code for uniqueness (e.g., check it against a table that stores group codes already generated) before you treat that code as unique.
If users are expected to type in a group code, you should include a way to check whether a group code is valid, to avoid users accidentally joining a different group than intended. This is often done by adding a so-called "checksum digit" to the end of the group code. See also this answer.
It seems that you're trying to generate codes that should be hard to guess. In that case, Math.random() is far from suitable (as is java.util.Random) — especially because the group codes are so short. Use a secure random generator instead, such as java.security.SecureRandom (fortunately for you, its security issues were addressed in Android 4.4, which, as I can tell from a comment of yours, is the minimum Android version your application supports; see also this question). Also, if possible, make group codes longer, such as 8 or 12 characters long.
For more information, see Unique Random Identifiers.
Also, there is another concern. There is a serious security issue if the 5-character group code is the only thing that grants access to that group. Ideally, there should be other forms of authorization, such as allowing only logged-in users or certain logged-in users—
to access the group via that group code, or
to accept invitations to join the group via that group code (e.g., in Google Classroom, the PERMISSION_DENIED error code can be raised when a user tries to accept an invitation to join a class).
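The first and third points (check for uniqueness, use a secure generator) could be combined roughly as follows. This is only a sketch: the class name GroupCodeGenerator is hypothetical, and the in-memory HashSet is an assumed stand-in for the table of already issued group codes that a real application would keep in its database.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

final class GroupCodeGenerator {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private final SecureRandom random = new SecureRandom();
    // Assumption: stands in for a persistent table of issued codes.
    private final Set<String> issued = new HashSet<>();
    private final int length;

    GroupCodeGenerator(int length) {
        this.length = length;
    }

    // Draws random codes until one is found that has not been issued before.
    synchronized String nextCode() {
        while (true) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String code = sb.toString();
            if (issued.add(code)) { // add() returns false for a duplicate
                return code;
            }
        }
    }
}
```

Because the uniqueness check needs the full list of issued codes, generating on the server (next to that list) is the simpler option.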

The only way to avoid duplicates in your scheme is to keep a copy of the ones that you have already generated, and avoid "generating" anything that would result in a duplicate. Note that 5 positions with 62 characters each give 62^5 (about 916 million) codes, not 5^62, so you can simply store the issued codes in a table if using a database, or in a HashSet if everything is in memory and there is only one instance of the application (remember to save the list of generated IDs to disk every time you create a new one, and to re-read it at startup).
The chances of a collision are not negligible: you only need to generate around 62^(5/2) ~= 30 000 truly random codes before a collision becomes likely, and at roughly 36 000 codes it is more likely than not (see birthday paradox). Storing and checking that many identifiers is cheap, so the duplicate check should be considered mandatory.

Que: A sack contains a blue ball and a red ball. I draw one ball from the sack. What are the chances it is a red ball?
Ans: 1/2
Que: I have a collection of 62^5 unique codes. I choose one code from the collection. What are the chances that it is "ABCDE"?
Ans: 1/(62^5)
NOTE: Random number generators are not actually random.

Well, in case you need a unique generator, what about the following? It is definitely not random, but it is guaranteed to be unique within one instance (until the counter runs past five digits).
import java.util.function.Supplier;
public final class UniqueCodeGenerator implements Supplier<String> {
private int code;
@Override
public synchronized String get() {
return String.format("%05d", code++);
}
public static void main(String... args) {
Supplier<String> generator = new UniqueCodeGenerator();
for (int i = 0; i < 10; i++)
System.out.println(generator.get());
}
}

Java8 - Mapping with Streams without Collecting for Performance

Exponentially Growing Stream
I have a Stream that grows exponentially for creating permutations. So each call to addWeeks increases the number of elements in the Stream.
Stream<SeasonBuilder> sbStream = sbSet.stream();
for (int i = 1; i <= someCutOff; i++) {
sbStream = sbStream.map(sb -> sb.addWeeks(possibleWeeks))
.flatMap(Collection::stream);
}
// Collect SeasonBuilders into a Set
return sbStream.collect(Collectors.toSet()); // size > 750 000
Problems
Each call to addWeeks returns a Set<SeasonBuilder> and collecting everything into a Set takes a while.
addWeeks is not static and needs to be called on each SeasonBuilder in the stream, each time through the loop
public Set<SeasonBuilder> addWeeks(
final Set<Set<ImmutablePair<Integer, Integer>>> possibleWeeks) {
return possibleWeeks.stream()
.filter(containsMatchup()) // Finds the weeks to add
.map(this::addWeek) // Create new SeasonBuilders with the new week
.collect(Collectors.toSet());
}
I get an out-of-memory error when possibleWeeks has size 15.
Questions
Should I be using a method chain other than map followed by flatMap?
How can I modify addWeeks so that I don't have to collect everything into a Set?
Should I return a Stream<SeasonBuilder>? Can I flatMap a Stream?
Update:
Thanks for the help everyone!
I have put the code for the methods in a gist
Thanks to @Holger and @lexicore for suggesting returning a Stream<SeasonBuilder> in addWeeks. Minor performance increase, as was predicted by @lexicore
I tried using parallelStream() and there was no significant change in performance
Context
I am creating all possible permutations of a Fantasy Football season, which will be used elsewhere for stats analysis. In a 4-team, 14-week season, for any given week, there could be three different possibilities
(1 vs 2) , (3 vs 4)
(1 vs 3) , (2 vs 4)
(1 vs 4) , (2 vs 3)
To solve the problem, plug in the permutations, and we have all our possible seasons. Done! But wait... what if Team 1 only ever plays Team 2. Then the other teams would be sad. So there are some constraints on the permutations.
Every team must play each other roughly the same amount of times (i.e. Team 1 cannot play against Team 3 ten times in a single season). In this example - 4-teams, 14 weeks - each team is capped at playing another team 5 times. So some sort of filtering has to happen when creating permutations, to avoid non-valid seasons.
Where this gets more interesting is:
6 Team League -- 15 possible weeks
8 Team League -- 105 possible weeks
10 Team League -- 945 possible weeks
I am trying to optimize performance where possible, because there are a lot of permutations to create. A 4-team, 14-week season creates 756 756 (= 3 × 14!/(5!5!4!), since any one of the three matchups can be the one played only 4 times) possible seasons, given the constraints. 6-team or 8-team seasons just get crazier.

Your whole construction is very suspicious to begin with. If you're interested in performance it is unlikely that generating all permutations explicitly is a good approach.
I also don't believe that collecting to set and streaming again is the performance problem.
But nevertheless, to answer your question: why don't you return Stream<SeasonBuilder> from addWeeks directly instead of collecting it to a set first? Return the stream directly, without collecting:
public Stream<SeasonBuilder> addWeeks(
final Set<Set<ImmutablePair<Integer, Integer>>> possibleWeeks) {
return possibleWeeks.stream()
.filter(containsMatchup()) // Finds the weeks to add
.map(this::addWeek); // Create new SeasonBuilders with the new week
}
You won't need map/flatMap then, just one flatMap:
sbStream = sbStream.flatMap(sb -> sb.addWeeks(possibleWeeks));
But this won't help your performance much anyway.
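To see the single-flatMap pattern in isolation, here is a small self-contained sketch. It uses plain strings instead of SeasonBuilder, and the names FlatMapExpansion, extend and expand are invented for the example; the filter is only a stand-in for containsMatchup().

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FlatMapExpansion {
    // Stand-in for addWeeks: returns a lazy Stream instead of a collected Set.
    static Stream<String> extend(String prefix, List<String> options) {
        return options.stream()
                .filter(o -> !prefix.endsWith(o)) // stand-in constraint: no immediate repeat
                .map(o -> prefix + o);
    }

    // Applies extend() `rounds` times; nothing is materialized until the final collect.
    static List<String> expand(List<String> seeds, List<String> options, int rounds) {
        Stream<String> stream = seeds.stream();
        for (int i = 0; i < rounds; i++) {
            stream = stream.flatMap(p -> extend(p, options));
        }
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = expand(Arrays.asList(""), Arrays.asList("A", "B", "C"), 3);
        System.out.println(result.size() + " results, first = " + result.get(0));
    }
}
```

The intermediate collections disappear, but the final result still has to hold every element, so the memory needed for the full set of permutations does not go away.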

Generate ID fast and with high probability of uniqueness

I want to generate an ID for each event that occurs in my application.
The event frequency depends on the user load, so events might occur hundreds to thousands of times per second.
I can't afford to use UUID.randomUUID() because it might cause performance problems - Take a look at this.
I thought of generating ID as follows:
System.currentTimeMillis() + ";" + Long.toString(_random.nextLong())
where _random is a static java.util.Random held by my class.
My questions are:
Do you think the distribution of this combination will be good enough for my needs?
Is Java's Random implementation related to the current time, and is combining the two therefore dangerous?

I would use the following.
final AtomicLong counter = new AtomicLong(System.currentTimeMillis() * 1000);
and
long l = counter.getAndIncrement(); // takes less than 10 nano-seconds most of the time.
This will be unique within your system and across restarts provided you average less than one million per second.
Even at this rate, the number will not overflow for some time.
class Main {
public static void main(String[] args) {
System.out.println(new java.util.Date(Long.MAX_VALUE/1000));
}
}
prints
Sun Jan 10 04:00:54 GMT 294247
EDIT: In the last 8 years I have switched to using nanosecond wall clock and memory-mapped files to ensure uniqueness across processes on the same machine. The code is available here. https://github.com/OpenHFT/Chronicle-Bytes/blob/ea/src/main/java/net/openhft/chronicle/bytes/MappedUniqueTimeProvider.java

To prevent possible collisions, I would suggest integrating the users' unique ids into the generated id. You can do this either by adding the user id directly to the generated id
System.currentTimeMillis() + ";" + Long.toString(_random.nextLong()) + userId
or you can use separate _random for each user that uses the user's id as its seed.

UUID uuid = UUID.randomUUID(); is less than 8 times slower, after warming up, 0.015 ms versus 0.0021 ms on my PC. That would be a positive argument for UUID - for me.
One could shift the random long a bit to the right, so time is more normative, sequencing.
No, there is a pseudo random distribution involved.

I can't afford using UUID.randomUUID() because it might be problematic
And it might not. Currently, you're solving a problem that might not exist. I suggest to use an interface so you can easily swap out the generated ID but stick to this generator on which many smart people have spent a lot of time to make it right.
Your own solution might work in many cases but the corner cases are important and you will only see those after a few years of experience.
That said, combining the current time + Random should give pretty unique IDs. But they are easy to guess and insecure.
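The "use an interface" suggestion might look like the sketch below. All type names here (EventIdGenerator and the two implementations) are hypothetical; the second implementation is just one possible cheaper alternative, a counter seeded from the clock, that could be swapped in later if profiling shows UUID generation to be a real bottleneck.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Callers depend only on this interface, so the strategy can be replaced later.
interface EventIdGenerator {
    String nextId();
}

// Default choice: rely on the JDK's well-tested random UUIDs.
final class UuidEventIdGenerator implements EventIdGenerator {
    @Override
    public String nextId() {
        return UUID.randomUUID().toString();
    }
}

// Possible replacement: a counter seeded from the clock (unique within one JVM only).
final class CounterEventIdGenerator implements EventIdGenerator {
    private final AtomicLong counter = new AtomicLong(System.currentTimeMillis() * 1000);

    @Override
    public String nextId() {
        return Long.toString(counter.getAndIncrement());
    }
}
```

Switching implementations is then a one-line change at the place where the generator is constructed.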

I would use a library to avoid reinventing the wheel.
For example, JUG (https://github.com/cowtowncoder/java-uuid-generator) can generate 5 million time-based UUIDs per second (https://github.com/cowtowncoder/java-uuid-generator/blob/master/release-notes/FAQ):
<dependency>
<groupId>com.fasterxml.uuid</groupId>
<artifactId>java-uuid-generator</artifactId>
<version>4.0.1</version>
</dependency>
UUID uuid = Generators.timeBasedGenerator().generate();

How to generate incremental identifier in java

I have a requirement in which I continuously receive messages that need to be written to files. Every time a new message is received, it needs to be written to a separate file. What I want is to generate a unique identifier to be used as the file name. I also want to preserve the order of the messages. By this I mean that the identifier generated as a file name should always be incremental.
I was using UUID.randomUUID() to generate file names, but the problem with this approach is that a UUID only assures randomness of the identifier and is not incremental. As a result I am losing the ordering of the files (I want the file generated first to appear first in the list).
Approaches known
1. I can use System.currentTimeMillis(), but I can receive multiple messages with the same timestamp.
2. Another approach could be to keep a static long value, increment it whenever a file is to be created, and use the long value as the file name. But I am not sure about this approach, and it doesn't seem to be a proper solution to my problem. I think there could be far better solutions than this one.
If someone could suggest a better solution to this problem, it would be highly appreciated.

If you want your id value to uniformly rise even between server restarts, then you must either base it on the system time or have some elaborately robust logic that persists the last ID used. Note that achieving robustness on its own is not hard, but achieving it in a performant and scalable way is.
If you additionally need the id to be unique across multiple nodes in a redundant server cluster, then you need even more elaborate logic, which definitely involves a persistent store to which all the boxes synchronize access. Making this performant is, of course, even harder.
The best option I can see is to have a quite long ID so there's room for these parts:
System.currentTimeMillis for long-term uniqueness (across restarts);
System.nanoTime for finer granularity;
a unique id of each server node (determined in a platform-specific way).
The method will still have to remember the last value generated and retry in case of a duplicate. It won't have to retry too many times, though, just until the next nanoTime clock tick—it could even busy-wait for it.
Sketch of code without point 3 (single-node implementation):
private static long lastNanos;
public static synchronized String uniqueId() {
for (;/*ever*/;) {
final long n = System.nanoTime();
if (n == lastNanos) continue;
lastNanos = n;
return "" + System.currentTimeMillis() + n;
}
}

Ok, my hands up. My last answer was fairly flaky and I've deleted it.
Keeping with the spirit of the site, I thought I'd try a different tack.
If you say you are keeping these messages in a single file, then you could try something like creating a unique id out of the size of the file?
Before you write the message to the file, its id could be the current size of the file.
You could add the filename + size as the id if these messages need to be unique across a number of files.
I'll leave the hot potato of synchronization to another day. But you could wrap all of this up in a synchronized object that keeps track of things.
Also, I am assuming that any messages written to the file will not be removed in the future.
ADDITIONAL NOTE:
You could create a message processing object that opens the file on construction (or via a create method).
This object will get the initial size of the file and this will be used as the unique id.
As each message is added (in a synchronized manner), the id is incremented by the size of the message.
This would address the performance issues. Will not work if more than one JVM/Node accesses the same file.
Skeletal Idea:
public class MessageSink {
private long id = 0;
public MessageSink(String filename) {
id = new java.io.File(filename).length(); // start from the current file size
}
public synchronized void addMessage(Message msg) {
msg.setId(id);
// write to file + flush,
// or add to a stack of messages that need to be written to the file
// at a later stage
id = id + msg.getSize();
}
public void flushMessages() {
// open file
// for each message in the stack: write it
// flush and close file
}
}

I had the same requirement and found a suitable solution. Twitter Snowflake uses a simple algorithm to generate sortable 64-bit (long) ids. Snowflake is written in Scala, but the approach is simple and can easily be used in Java code.
id is composed of:
timestamp - 41 bits (millisecond precision w/ a custom epoch gives us 69 years);
machine id - 10 bits (MAC address could be used as a hardware id);
sequence number - 12 bits - rolls over every 4096 per machine (with protection to avoid rollover in the same ms)
Formula looks like: ((timestamp - customEpoch) << timestampShift) | (machineId << machineIdShift) | sequenceNumber;
The shift for each component depends on its bit position in the ID.
Detailed description and source code could be found at github:
Twitter Snowflake
Basic Java implementation of the Snowflake algorithm
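The layout described above can be sketched in Java as follows. This is an illustration of the bit layout only, not Twitter's code: the class name, the custom epoch value (2020-01-01 UTC) and the way the machine id is supplied are assumptions. One deliberate simplification is labeled in the comments: if the clock steps backwards, this sketch keeps using the last timestamp instead of rejecting the request.

```java
final class SnowflakeSketch {
    private static final long CUSTOM_EPOCH = 1_577_836_800_000L; // assumed: 2020-01-01T00:00:00Z
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1; // 1023
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must fit in " + MACHINE_BITS + " bits");
        }
        this.machineId = machineId;
    }

    synchronized long nextId() {
        // Simplification: never go below the last timestamp, even if the clock does.
        long now = Math.max(System.currentTimeMillis(), lastTimestamp);
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence numbers used in this millisecond: wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Ids from one instance are strictly increasing, and ids from different machines stay distinct as long as each machine is given a different machineId.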

Generating a globally unique identifier in Java

Summary: I'm developing a persistent Java web application, and I need to make sure that all resources I persist have globally unique identifiers to prevent duplicates.
The Fine Print:
I'm not using an RDBMS, so I don't have any fancy sequence generators (such as the one provided by Oracle)
I'd like it to be fast, preferably all in memory - I'd rather not have to open up a file and increment some value
It needs to be thread safe (I'm anticipating that only one JVM at a time will need to generate IDs)
There needs to be consistency across instantiations of the JVM. If the server shuts down and starts up, the ID generator shouldn't re-generate the same IDs it generated in previous instantiations (or at least the chance has to be really, really slim - I anticipate many millions of persisted resources)
I have seen the examples in the EJB unique ID pattern article. They won't work for me (I'd rather not rely solely on System.currentTimeMillis() because we'll be persisting multiple resources per millisecond).
I have looked at the answers proposed in this question. My concern about them is, what is the chance that I will get a duplicate ID over time? I'm intrigued by the suggestion to use java.util.UUID for a UUID, but again, the chances of a duplicate need to be infinitesimally small.
I'm using JDK6

Pretty sure UUIDs are "good enough". There are 340,282,366,920,938,463,463,374,607,431,770,000,000 UUIDs available.
http://www.wilybeagle.com/guid_store/guid_explain.htm
"To put these numbers into perspective, one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion, that means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs"
http://en.wikipedia.org/wiki/Universally_Unique_Identifier

public class UniqueID {
private static long startTime = System.currentTimeMillis();
private static long id;
public static synchronized String getUniqueID() {
return "id." + startTime + "." + id++;
}
}

If it needs to be unique per PC: you could probably use (System.currentTimeMillis() << 4) | (staticCounter++ & 15) or something like that.
That would allow you to generate 16 per ms. If you need more, shift by 5 and AND it with 31...
If it needs to be unique across multiple PCs, you should also combine in your primary network card's MAC address.
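edit: to clarify
private static int staticCounter = 0;
private static final int nBits = 4;
public static synchronized long getUnique() {
// In Java, ^ is XOR rather than exponentiation, so the mask is built with a shift.
return (System.currentTimeMillis() << nBits) | (staticCounter++ & ((1 << nBits) - 1));
}
and change nBits to the base-2 logarithm (rounded up) of the largest number of ids you need to generate per ms.
It will eventually roll over, but with nBits at 4 a signed long still leaves 59 bits for the millisecond timestamp, which is on the order of 18 million years.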

From memory, the RMI remote packages contain a UUID generator. I don't know whether that's worth looking into.
When I've had to generate them, I typically use an MD5 hash of the current date and time, the user name and the IP address of the computer. Basically the idea is to take everything that you can find out about the computer/person and then generate an MD5 hash of this information.
It works really well and is incredibly fast (once you've initialised the MessageDigest for the first time).
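A minimal sketch of that hashing idea, using the JDK's MessageDigest. The class name HashedId, the "|" separator and the choice to pass the user name and address in as parameters are assumptions made for the example. Note that identical inputs always produce the identical id, so two events for the same user on the same machine within the same millisecond would collide unless something else (for example a counter) is mixed in.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedId {
    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Combines the pieces of context mentioned above into a single id.
    static String idFor(long timestampMillis, String userName, String hostAddress) {
        return md5Hex(timestampMillis + "|" + userName + "|" + hostAddress);
    }

    public static void main(String[] args) {
        System.out.println(idFor(System.currentTimeMillis(), "alice", "192.0.2.1"));
    }
}
```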

why not do like this
String id = Long.toString(System.currentTimeMillis()) +
(new Random()).nextInt(1000) +
(new Random()).nextInt(1000);

If you want to use a shorter and faster implementation than Java's UUID, take a look at:
https://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/main/java/org/spf4j/concurrent/UIDGenerator.java
see the implementation choices and limitations in the javadoc.
here is a unit test on how to use:
https://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/test/java/org/spf4j/concurrent/UIDGeneratorTest.java
