I have a requirement in which I continuously receive messages that need to be written to files. Every time a new message is received, it needs to be written to a separate file. I want to generate a unique identifier to use as the file name. I also want to preserve the order of the messages, meaning the identifier used as a file name should always be increasing.
I was using UUID.randomUUID() to generate file names, but the problem with this approach is that a UUID only guarantees randomness; it is not incremental. As a result I lose the ordering of the files (I want the file generated first to appear first in the list).
Approaches known:
1. I could use System.currentTimeMillis(), but I can receive multiple messages with the same timestamp.
2. Another approach would be to keep a static long value, increment it whenever a file is to be created, and use that value as the file name. But I am not sure about this approach; it doesn't seem to be a proper solution to my problem, and I think there could be far better solutions.
If someone could suggest a better solution to this problem, it would be highly appreciated.
If you want your id value to uniformly rise even between server restarts, then you must either base it on the system time or have some elaborately robust logic that persists the last ID used. Note that achieving robustness on its own is not hard, but achieving it in a performant and scalable way is.
If you additionally need the id to be unique across multiple nodes in a redundant server cluster, then you need even more elaborate logic, which definitely involves a persistent store to which all the boxes synchronize access. Making this performant is, of course, even harder.
The best option I can see is to have a quite long ID so there's room for these parts:
System.currentTimeMillis for long-term uniqueness (across restarts);
System.nanoTime for finer granularity;
a unique id of each server node (determined in a platform-specific way).
The method will still have to remember the last value generated and retry in case of a duplicate. It won't have to retry too many times, though, just until the next nanoTime clock tick—it could even busy-wait for it.
Sketch of code without point 3 (single-node implementation):
private static long lastNanos;
public static synchronized String uniqueId() {
for (;/*ever*/;) {
final long n = System.nanoTime();
if (n == lastNanos) continue;
lastNanos = n;
return "" + System.currentTimeMillis() + n;
}
}
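If the file names also have to sort correctly as plain strings, fixed-width fields help, because the concatenation above can vary in length. The following is only a sketch under that assumption; the class name `OrderedFileNames` and the `millis-sequence` layout are made up for illustration, and it covers a single JVM only.

```java
/** Sketch: file names that sort lexicographically in creation order (single JVM). */
final class OrderedFileNames {
    private long lastMillis = -1;
    private int sequence;

    /** Returns a zero-padded "millis-sequence" name that is greater than the previous one. */
    synchronized String next() {
        // Never go backwards, even if the wall clock is adjusted while running.
        long now = Math.max(System.currentTimeMillis(), lastMillis);
        if (now == lastMillis) {
            sequence++;      // same millisecond: bump the per-millisecond counter
        } else {
            lastMillis = now;
            sequence = 0;
        }
        // 13 digits hold millisecond timestamps until the year 2286.
        // Assumption: fewer than 1,000,000 names per millisecond, so 6 digits suffice.
        return String.format("%013d-%06d", now, sequence);
    }
}
```

Across restarts the order still depends on the wall clock not having been set back, which is the same caveat as for any time-based scheme.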
Ok, my hands up. My last answer was fairly flaky and I've deleted it.
Keeping with the spirit of the site, I thought I'd try a different tack.
If you say you are keeping these messages in a single file, then you could try something like creating a unique ID out of the size of the file?
Before you write the message to the file, its ID could be the current size of the file.
You could add the filename + size as the id if these messages need to be unique across a number of files.
I'll leave the hot potato of synchronization to another day. But you could wrap all of this up in a synchronized object that keeps track of things.
Also, I am assuming that any messages written to the file will not be removed in the future.
ADDITIONAL NOTE:
You could create a message processing object that opens the file on construction (or via a create method).
This object will get the initial size of the file and this will be used as the unique id.
As each message is added (in a synchronized manner), the id is incremented by the size of the message.
This would address the performance issues. Will not work if more than one JVM/Node accesses the same file.
Skeletal Idea:
public class MessageSink {
private long id = 0;
public MessageSink(String filename) {
id = ... get file size ..
}
public synchronized void addMessage(Message msg) {
msg.setId(id);
.. write to file + flush ..
.. or add to stack of messages that need to be written to file
.. at a later stage.
id = id + msg.getSize();
}
public void flushMessages() {
.. open file
.. for each message in stack write ...
.. flush and close file
}
}
I had the same requirement and found a suitable solution. Twitter Snowflake uses a simple algorithm to generate sortable 64-bit (long) IDs. Snowflake is written in Scala, but the approach is simple and can easily be used in Java code.
id is composed of:
timestamp - 41 bits (millisecond precision w/ a custom epoch gives us 69 years);
machine id - 10 bits (MAC address could be used as a hardware id);
sequence number - 12 bits - rolls over every 4096 per machine (with protection to avoid rollover in the same ms)
Formula looks like: ((timestamp - customEpoch) << timestampShift) | (machineId << machineIdShift) | sequenceNumber;
The shift for each component depends on its bit position in the ID.
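The layout above can be sketched in Java roughly as follows. This is not Twitter's code, just an illustration of the formula; the class name `SnowflakeSketch`, the custom epoch (2020-01-01 UTC) and the way the machine ID is passed in are assumptions made for the example.

```java
/** Sketch of a Snowflake-style generator: 41-bit timestamp, 10-bit machine ID, 12-bit sequence. */
final class SnowflakeSketch {
    private static final long CUSTOM_EPOCH = 1577836800000L; // assumption: 2020-01-01T00:00:00Z
    private static final int MACHINE_ID_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final int MACHINE_ID_SHIFT = SEQUENCE_BITS;
    private static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_ID_BITS;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence;

    SnowflakeSketch(long machineId) {
        if (machineId < 0 || machineId >= (1L << MACHINE_ID_BITS)) {
            throw new IllegalArgumentException("machineId must fit in 10 bits");
        }
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long timestamp = System.currentTimeMillis();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 values of this millisecond are used: busy-wait for the next one.
                while ((timestamp = System.currentTimeMillis()) <= lastTimestamp) {
                    // spin
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = timestamp;
        return ((timestamp - CUSTOM_EPOCH) << TIMESTAMP_SHIFT)
                | (machineId << MACHINE_ID_SHIFT)
                | sequence;
    }
}
```

Because the timestamp occupies the most significant bits, IDs from one generator compare in creation order, and IDs from different machines are roughly time-ordered.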
A detailed description and source code can be found on GitHub:
Twitter Snowflake
Basic Java implementation of the Snowflake algorithm
I'm working on an application where I have to generate codes like Google Classroom does. When a user creates a class, I generate a code using the following function:
private String codeGenerator(){
StringBuilder stringBuilder=new StringBuilder();
String chars="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
int characterLength=chars.length();
for(int i=0;i<5;i++){
stringBuilder.append(chars.charAt((int)Math.floor(Math.random()*characterLength)));
}
return stringBuilder.toString();
}
I have 62 different characters, so I can generate 62^5 (about 916 million) codes in total, which is quite large. I can generate the code on the server or on the user's device. Which is the better approach? How likely is it that a generated code will conflict with another code?
From a comment, it seems that you are generating group codes for your own application.
For the purposes and scale of your app, 5-character codes may be appropriate. But there are several points you should know:
Random number generators are not designed to generate unique numbers. You can generate a random code as you're doing now, but you should check that code for uniqueness (e.g., check it against a table that stores group codes already generated) before you treat that code as unique.
If users are expected to type in a group code, you should include a way to check whether a group code is valid, to avoid users accidentally joining a different group than intended. This is often done by adding a so-called "checksum digit" to the end of the group code. See also this answer.
It seems that you're trying to generate codes that should be hard to guess. In that case, Math.random() is far from suitable (as is java.util.Random) — especially because the group codes are so short. Use a secure random generator instead, such as java.security.SecureRandom (fortunately for you, its security issues were addressed in Android 4.4, which, as I can tell from a comment of yours, is the minimum Android version your application supports; see also this question). Also, if possible, make group codes longer, such as 8 or 12 characters long.
For more information, see Unique Random Identifiers.
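As a rough sketch of the first and third points, the generator below draws characters with java.security.SecureRandom and retries when a code was already issued. The class name `GroupCodes` is made up, and the in-memory HashSet is only a stand-in for whatever table actually stores the issued codes.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

/** Sketch: random codes from a secure generator, checked for uniqueness before use. */
final class GroupCodes {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private final SecureRandom random = new SecureRandom();
    // Assumption: stands in for a database table with a unique constraint.
    private final Set<String> issued = new HashSet<>();
    private final int length;

    GroupCodes(int length) {
        this.length = length;
    }

    synchronized String newCode() {
        while (true) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String code = sb.toString();
            // add() returns false if the code was issued before, in which case we retry.
            if (issued.add(code)) {
                return code;
            }
        }
    }
}
```

Calling `new GroupCodes(8).newCode()` would give an 8-character code; the length is a constructor argument so that longer codes are a one-line change.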
Also, there is another concern. There is a serious security issue if the 5-character group code is the only thing that grants access to that group. Ideally, there should be other forms of authorization, such as allowing only logged-in users or certain logged-in users—
to access the group via that group code, or
to accept invitations to join the group via that group code (e.g., in Google Classroom, the PERMISSION_DENIED error code can be raised when a user tries to accept an invitation to join a class).
The only way to avoid duplicates in your scheme is to keep a copy of the codes that you have already generated, and to reject any newly generated code that is already in that set. Since there are at most 62^5 (about 916 million) codes, you could simply store them in a table if you are using a database, or in a HashSet if everything is in memory and there is only one instance of the application (remember to save the set of generated IDs to disk every time you create a new one, and to re-read it at startup).
The chance that any single new code collides is low, but do not skip the check: with 62^5 ~= 9.2E8 possible codes, you only need to generate on the order of sqrt(62^5) ~= 30,000 truly random codes for a collision to be more likely than not (see birthday paradox). Fortunately, storing that many identifiers and checking them for duplicates takes very little space.
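For the scheme in the question (5 characters drawn from 62, so 62^5 = 916,132,832 possible codes), the birthday-paradox estimate can be checked numerically. The snippet below is a small sketch that uses the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)); the class and method names are made up for illustration.

```java
/** Sketch: birthday-paradox estimate for 5-character codes over a 62-character alphabet. */
final class CollisionOdds {
    /** Approximate probability of at least one collision among n uniformly random codes. */
    static double collisionProbability(double n, double space) {
        // Birthday approximation: p ≈ 1 - exp(-n(n-1) / (2 * space))
        return 1.0 - Math.exp(-n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double space = 62L * 62 * 62 * 62 * 62; // 916,132,832 possible codes
        for (int n : new int[] {1_000, 10_000, 36_000, 100_000}) {
            System.out.printf("%,d codes -> %.4f%n", n, collisionProbability(n, space));
        }
    }
}
```

With this approximation the probability passes 50% at roughly 36,000 generated codes, which is why the uniqueness check matters even though the code space looks large.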
Que: A sack contains a blue ball and a red ball. I draw one ball from the sack. What are the chances it is a red ball?
Ans: 1/2
Que: I have a collection of 62^5 unique codes. I choose one code from the collection. What are the chances that it is "ABCDE"?
Ans: 1/(62^5)
NOTE: Pseudo-random number generators such as Math.random() are deterministic; they are not actually random.
Well, in case you need a unique generator, what about the following? It is definitely not random, but it is definitely unique within one instance.
import java.util.function.Supplier;

public final class UniqueCodeGenerator implements Supplier<String> {
private int code;
@Override
public synchronized String get() {
return String.format("%05d", code++);
}
public static void main(String... args) {
Supplier<String> generator = new UniqueCodeGenerator();
for (int i = 0; i < 10; i++)
System.out.println(generator.get());
}
}
In a scenario of re-solving a previously solved problem (with some new data, of course), it's typically impossible to re-assign a vehicle's very-first assignment once it was given. The driver is already on its way, and any new solution has to take into account that:
the job must remain his (can't be assigned to another vehicle)
the activity that's been assigned to him as the very-first, must remain so in future solutions
For the sake of simplicity, I'm using a single vehicle scenario, and only trying to impose the second bullet (i.e. ensure that a certain activity will be the first in the solution).
This is how I defined the constraint:
new HardActivityConstraint()
{
@Override
public ConstraintsStatus fulfilled(JobInsertionContext iFacts, TourActivity prevAct, TourActivity newAct, TourActivity nextAct,
double prevActDepTime)
{
String locationId = newAct.getLocation().getId();
// we want to make sure that any solution will have "C1" as its first activity
boolean activityShouldBeFirst = locationId.equals("C1");
boolean attemptingToInsertFirst = (prevAct instanceof Start);
if (activityShouldBeFirst && !attemptingToInsertFirst)
return ConstraintsStatus.NOT_FULFILLED_BREAK;
if (!activityShouldBeFirst && attemptingToInsertFirst)
return ConstraintsStatus.NOT_FULFILLED;
return ConstraintsStatus.FULFILLED;
}
}
This is how I build the algorithm:
VehicleRoutingAlgorithmBuilder vraBuilder;
vraBuilder = new VehicleRoutingAlgorithmBuilder(vrpProblem, "schrimpf.xml");
vraBuilder.addCoreConstraints();
vraBuilder.addDefaultCostCalculators();
StateManager stateManager = new StateManager(vrpProblem);
ConstraintManager constraintManager = new ConstraintManager(vrpProblem, stateManager);
constraintManager.addConstraint(new HardActivityConstraint() { ... }, Priority.HIGH);
vraBuilder.setStateAndConstraintManager(stateManager, constraintManager);
VehicleRoutingAlgorithm algorithm = vraBuilder.build();
The results are not good. I'm only getting solutions with a single job assigned (the one with the required activity). In debug it's clear that the job insertion iterations consider many viable options that appear to solve the problem entirely, but at the bottom line, the best solution returned by the algorithm doesn't include the other jobs.
UPDATE: even more surprising, is that when I use the constraint in scenarios with over 5 vehicles, it works fine (worst results are with 1 vehicle).
I'll gladly attach more information if needed.
Thanks
Zach
First, you can use initial routes to ensure that certain jobs need to be assigned to specific vehicles right from the beginning (see example).
Second, to ensure that no activity will be inserted between start and your initial job (location) (e.g. "C1" in your example), you need to prohibit it the way you defined your HardActivityConstraint; just modify it so that a newAct can never be between prevAct=Start and nextAct=act(C1).
Third, with regard to your update, just keep in mind that the essence of the algorithm is to ruin part of the solution (remove a number of jobs) and recreate the solution again (insert the unassigned jobs). Currently, the schrimpf algorithm ruins a number of jobs relative to the total number of jobs, i.e. noJobs = 0.5 * totalNoJobs for the random ruin and 0.3 * totalNoJobs for the radial ruin. If your problem is very small, the share of jobs to be removed might not be sufficient. This is going to change with the next release, where you can use an algorithm out of the box which defines an absolute minimum of jobs that need to be removed. For the time being, modify the shares in your algorithmConfig.xml.
Having one puzzling requirement.
Basically I need to create a unique ID that meets these criteria:
a 9-digit number, unique for the day (it's OK if the number appears again the next day)
generated in real time, in Java only (no sequence number generation from a database; actually no database access at all)
the number is generated to populate a requestID, and around 1,000,000 IDs will be generated per day
UUID or UID should not be used (more than 9 digits)
Here are my considerations:
Using a sequence number sounds good, but if the JVM restarts, the requestId might be re-generated.
Using the time as HHmmssSSS (hour, minute, second, milliseconds) has 2 issues:
a. The system hour might be changed by a server admin.
b. It can cause issues if 2 requests arrive in the same millisecond.
Any ideas?
no sequence number generation from database
I hate silly requirements like that. I say you cheat and use an embedded database like H2 or HSQLDB and generate the identifier through a sequence.
Edit: Let me expand a bit on why I propose this "cheat": My understanding of the "no database" requirement is that either no database software should be installed to handle this requirement or that the existing database schema cannot be changed. Using an embedded database is the same thing as adding a new jar file to your project. Why should you not do this? Why implement something yourself when relational databases have already solved this problem for you?
Nine digits to handle 1,000,000 IDs gives us three digits to play with (we need the other six for the 0-999999 for the ID).
I assume you have a multi-server setup. Assign each server a three-digit server ID, and then you can allocate unique ID values within each server without worrying about overlap between them. It can just be an ever-increasing value in memory, except to survive JVM restarts, we need to echo the most recently allocated value to disk (well, to anywhere you want to store it — local disk, memcache, whatever).
To ensure you don't hit the overhead of file/whatever I/O on each request, you allocate the IDs in blocks, echoing the endpoint of the block back to the storage.
So it ends up being:
Give each server an ID
Have storage on the server which stores the last allocated value for the day (a file, for instance)
Have the ID allocator work in blocks (10 IDs at a time, 100, whatever)
To allocate a block:
Read the file, write back a number increased by your blocksize
Use IDs from the block
The ID would be, e.g., 012000027 (zero-padded to nine digits) for the 28th ID allocated by server #12
When the day changes (e.g., midnight), throw away your current block and allocate a new one for the new day
In pseudocode:
class IDAllocator {
int serverId; // server ID, pre-multiplied by 1,000,000
Storage storage;
int nextId;
int idCount;
int blockSize;
long lastIdTime;
/**
* Creates an IDAllocator with the given server ID, backing storage,
* and block size.
*
* @param serverId the ID of the server (e.g., 12)
* @param s the backing storage to use
* @param size the block size to use
* @throws SomeException if something goes wrong
*/
IDAllocator(int serverId, Storage s, int size)
throws SomeException {
// Remember our info
this.serverId = serverId * 1000000; // Multiply by a million so the server ID fills the top three digits
this.storage = s;
this.nextId = 0;
this.idCount = 0;
this.blockSize = size;
this.lastIdTime = this.getDayMilliseconds();
// Get the first block. If you like and depending on
// what container this code is running in, you could
// spin this out to a separate thread.
this.getBlock();
}
public synchronized int getNextId()
throws SomeException {
int id;
// If we're out of IDs, or if the day has changed, get a new block
// (the storage is expected to restart its counter for the new day)
long today = this.getDayMilliseconds();
if (idCount == 0 || this.lastIdTime < today) {
this.lastIdTime = today;
this.getBlock();
}
// Alloc from the block
id = this.nextId;
--this.idCount;
++this.nextId;
// If you wanted (and depending on what container this
// code is running in), you could proactively retrieve
// the next block here if you were getting low...
// Return the ID
return id + this.serverId;
}
// Returns the timestamp of the start of the current (UTC) day,
// so the value only changes when the day changes
protected long getDayMilliseconds() {
long now = System.currentTimeMillis();
return now - (now % 86400000L);
}
protected void getBlock()
throws SomeException {
int id;
synchronized (this) {
synchronized (this.storage.syncRoot()) {
id = this.storage.readIntFromStorage();
this.storage.writeIntToStorage(id + this.blockSize);
}
this.nextId = id;
this.idCount = this.blockSize;
}
}
}
...but again, that's pseudocode, and you might want to throw some proactive stuff in there so you never block on I/O waiting for an ID when you need one.
The above is written assuming you already have some kind of application-wide singleton, and the IDAllocator instance would just be a data member in that single instance. If not, you could readily make the above a singleton instead, by giving it the classic getInstance method and having it read its configuration from the environment rather than receiving it as arguments to the constructor.
What about counting from 1 to 999,999,999 on server 1,
and counting from -999,999,999 to -1 on server 2?
I guess that with load balancing the split would be about 50:50, so you get the same ID range for each server. In addition, you store the last generated ID on your filesystem. For performance reasons, store only every 1000th value (or every 10000th, it doesn't really matter). After restarting your application, read the last stored value and add 1000. I guess that would work.
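A minimal sketch of that checkpointing idea for one server follows. The class name `CheckpointedCounter`, the file format (a single number as text) and the step of 1000 are assumptions for illustration. Instead of adding 1000 after a restart, this variant writes the upper bound of each block of 1000 before handing out any ID from it, which has the same effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: an in-memory counter that survives restarts by persisting a reserved upper bound. */
final class CheckpointedCounter {
    private static final long STEP = 1000; // touch the disk only once per 1000 IDs
    private final Path file;
    private long next;          // next ID to hand out
    private long reservedUpTo;  // highest ID covered by the value on disk

    CheckpointedCounter(Path file) {
        this.file = file;
        long saved = 0;
        try {
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                saved = Long.parseLong(text.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Everything up to `saved` may have been handed out before the restart, so skip it.
        this.next = saved + 1;
        reserve();
    }

    synchronized long nextId() {
        if (next > reservedUpTo) {
            reserve();
        }
        return next++;
    }

    /** Reserves the next STEP IDs by writing their upper bound to disk first. */
    private void reserve() {
        reservedUpTo = next + STEP - 1;
        try {
            Files.write(file, Long.toString(reservedUpTo).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A restart wastes at most 999 IDs, which is the price of not writing to disk on every request.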
You could maybe try out Apache's RandomStringUtils String random(int count, boolean letters, boolean numbers) or try and use the Java TRNG Client library which in turn makes use of RANDOM.ORG:
This library provides a SecureRandom service, integrated with the Java Security API, for accessing random.org and random.irb.hr (true random number generators that generate randomness via atmospheric noise or photonic emission).
I think that if you get one of those and combine it with a time stamp, you should get what you are after.
String s = UUID.randomUUID().toString();
return s.substring(0,8) + s.substring(9,13) + s.substring(14,18) +
s.substring(19,23) + s.substring(24);
I use JDK 1.5's UUID, but it takes too much time when I connect to or disconnect from the network.
I think UUID may be trying to access the network.
Can anybody help me?
UUID generation is done locally and doesn't require a live network connection.
Quoting the API doc:
public static UUID randomUUID()
Static factory to retrieve a type 4 (pseudo randomly generated) UUID.
The UUID is generated using a cryptographically strong pseudo random number generator.
Your delay is probably being caused by the initialization of the cryptographically strong RNG. Those take some time, and might even depend on the presence of a network connection as a source of entropy. However, this should happen only once during the runtime of the JVM. I don't see a way around this problem, though.
The javadoc for UUID (http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html) has some good information on how UUIDs are generated. Time-based (version 1) UUIDs use the time and a clock sequence, whereas randomUUID() produces a random (version 4) one. Like sharptooth says, no network interface is required. Is there possibly some other concurrent process running that could be causing this problem?
What's the purpose of those s.substring calls? As far as I can tell, they only strip the four hyphens from the original string.
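Reading the indices in the question's code, the five substring calls skip positions 8, 13, 18 and 23, which is where UUID.toString() puts its hyphens. The sketch below (class and method names are made up) puts that version next to a shorter equivalent for comparison.

```java
import java.util.UUID;

/** Sketch: two ways to drop the hyphens from a UUID string. */
final class UuidHex {
    /** The substring version from the question. */
    static String viaSubstrings(String s) {
        return s.substring(0, 8) + s.substring(9, 13) + s.substring(14, 18)
                + s.substring(19, 23) + s.substring(24);
    }

    /** Shorter equivalent: remove every hyphen. */
    static String viaReplace(String s) {
        return s.replace("-", "");
    }

    public static void main(String[] args) {
        String s = UUID.randomUUID().toString();
        System.out.println(s);
        System.out.println(viaSubstrings(s));
        System.out.println(viaReplace(s));
    }
}
```

Either way the cost is a handful of small string copies, so this is unlikely to be the source of a noticeable delay.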
If you're appending 5 Strings together, over a large set of data, that could be the issue. Try to use StringBuffer. It's amazing the difference that can make when concatenating more than 1-2 Strings together, especially for larger datasets
For older versions of Java (6 and earlier maybe?), there's a bug in Random that causes it to iterate over the entire temp directory. We've seen seed generation take 10 minutes on some egregiously bad build machines at NVIDIA. You might want to check the size of your temp dir.
Compare: http://www.docjar.com/html/api/sun/security/provider/SeedGenerator.java.html
To: http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Modules/j2me/sun/security/provider/SeedGenerator.java.htm
Summary: I'm developing a persistent Java web application, and I need to make sure that all resources I persist have globally unique identifiers to prevent duplicates.
The Fine Print:
I'm not using an RDBMS, so I don't have any fancy sequence generators (such as the one provided by Oracle)
I'd like it to be fast, preferably all in memory - I'd rather not have to open up a file and increment some value
It needs to be thread safe (I'm anticipating that only one JVM at a time will need to generate IDs)
There needs to be consistency across instantiations of the JVM. If the server shuts down and starts up, the ID generator shouldn't re-generate the same IDs it generated in previous instantiations (or at least the chance has to be really, really slim - I anticipate many millions of persisted resources)
I have seen the examples in the EJB unique ID pattern article. They won't work for me (I'd rather not rely solely on System.currentTimeMillis() because we'll be persisting multiple resources per millisecond).
I have looked at the answers proposed in this question. My concern about them is, what is the chance that I will get a duplicate ID over time? I'm intrigued by the suggestion to use java.util.UUID for a UUID, but again, the chances of a duplicate need to be infinitesimally small.
I'm using JDK6
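Pretty sure UUIDs are "good enough". There are 2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 possible 128-bit values, and a random (version 4) UUID still has 122 random bits, or about 5.3 × 10^36 possibilities.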
http://www.wilybeagle.com/guid_store/guid_explain.htm
"To put these numbers into perspective, one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion, that means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs"
http://en.wikipedia.org/wiki/Universally_Unique_Identifier
public class UniqueID {
private static long startTime = System.currentTimeMillis();
private static long id;
public static synchronized String getUniqueID() {
return "id." + startTime + "." + id++;
}
}
If it needs to be unique per PC: you could probably use (System.currentTimeMillis() << 4) | (staticCounter++ & 15) or something like that.
That would allow you to generate 16 per ms. If you need more, shift by 5 and AND it with 31...
If it needs to be unique across multiple PCs, you should also combine in your primary network card's MAC address.
edit: to clarify
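private static int staticCounter = 0;
private static final int nBits = 4;
public static synchronized long getUnique() {
// (1 << nBits) - 1 is the bit mask; note that ^ is XOR in Java, not exponentiation
return (System.currentTimeMillis() << nBits) | (staticCounter++ & ((1 << nBits) - 1));
}
and change nBits to the base-2 logarithm (rounded up) of the largest number of IDs you need to generate per ms.
It will eventually roll over, but with nBits at 4 a signed 64-bit long still leaves 59 bits for the millisecond timestamp, which lasts for millions of years.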
From memory, the RMI remote packages contain a unique ID generator (java.rmi.server.UID). I don't know whether that's worth looking into.
When I've had to generate them, I typically use an MD5 hash of the current date and time, the user name and the IP address of the computer. Basically the idea is to take everything that you can find out about the computer/person and then generate an MD5 hash of this information.
It works really well and is incredibly fast (once you've initialised the MessageDigest for the first time).
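A minimal sketch of that idea with java.security.MessageDigest is shown below. The class name `HashedId`, the `|` separator and the choice of inputs are assumptions for illustration; the user name and IP address are passed in rather than looked up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: derive an ID by hashing the time, user name and IP address with MD5. */
final class HashedId {
    /** Returns the MD5 digest of the input as 32 lowercase hex characters. */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Hashes the current time together with whatever identifies the caller. */
    static String newId(String userName, String ipAddress) {
        return md5Hex(System.currentTimeMillis() + "|" + userName + "|" + ipAddress);
    }
}
```

Note that two calls with the same user name and IP address within one millisecond hash the same input and so return the same ID; mixing in a counter would be one way around that.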
Why not do it like this?
String id = Long.toString(System.currentTimeMillis()) +
(new Random()).nextInt(1000) +
(new Random()).nextInt(1000);
If you want to use a shorter and faster implementation than Java's UUID, take a look at:
https://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/main/java/org/spf4j/concurrent/UIDGenerator.java
See the implementation choices and limitations in the javadoc.
Here is a unit test that shows how to use it:
https://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/test/java/org/spf4j/concurrent/UIDGeneratorTest.java