I am developing a JBoss Java EE application that needs to send messages through a messaging system (JMS or, optionally, AMQP). There will be roughly 10k to 15k messages per second. The requirement is to generate a unique ID for each outgoing message that has never been used before, even across application restarts; that is, the ID must not repeat during the application's lifetime (from day 1 of use until it is decommissioned).
I would prefer solutions based on:
Numeric value only (what data type?)
String
The auto-generation of the id should be atomic.
Java provides a method for generating Universally Unique Identifiers in the UUID class
Wikipedia has an explanation why the probability that these generate a message with the same ID is negligible.
I tend to like UUIDs, especially since you can easily create them from disparate sources. But you could also just use a long (64 bit) integer. At 15k messages per second, you would get approximately 39 million years worth of unique numbers (half that if you want them to be greater than zero).
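To make the arithmetic concrete, here is a minimal sketch of both options. The class name `MessageIds` and its methods are made up for illustration, and the counter lives in memory only: surviving a restart would need the last issued value to be persisted somewhere, which is assumed to be handled elsewhere.

```java
import java.math.BigInteger;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of any library.
class MessageIds {
    // Starts at 0 here; a real application would have to restore the last
    // issued value after a restart (assumption: handled elsewhere).
    private static final AtomicLong COUNTER = new AtomicLong();

    // Atomic, thread-safe numeric ID backed by a 64-bit long.
    static long nextNumericId() {
        return COUNTER.incrementAndGet();
    }

    // Random (version 4) UUID rendered as a String ID.
    static String nextStringId() {
        return UUID.randomUUID().toString();
    }

    // Whole years until 2^bits values are used up at the given rate.
    static long yearsUntilExhausted(int bits, long messagesPerSecond) {
        BigInteger values = BigInteger.ONE.shiftLeft(bits);
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        return values.divide(BigInteger.valueOf(messagesPerSecond))
                     .divide(secondsPerYear)
                     .longValueExact();
    }
}
```

Calling `MessageIds.yearsUntilExhausted(64, 15_000)` gives a figure just under 39 million, and passing 63 bits (positive values only) gives about half of that.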
If you are looking for something fast and simple, take a look at UIDGenerator.java.
You can customize it (unique to process only, or world), it is easy to use and fast:
private static final UIDGenerator SCA_GEN = new UIDGenerator(new ScalableSequence(0, 100));
.......
SCA_GEN.next();
see my benchmarking results at:
http://zoltran.com/roller/zoltran/entry/generating_a_unique_id
or run them yourself.
I have a requirement as follows
There are multiple devices producing data based on the device configuration. e.g., There are two devices producing data at their own intervals let’s say d1 producing for every 15 min and d2 producing for every 30 min
All this data will be sent to Kafka
I need to consume the data and perform calculations for each device, based on the values produced in the current hour and the first value produced in the next hour. For example, if d1 produces data every 15 min from 12:00 AM to 1:00 AM, the calculation is based on the values produced in that hour and the first value produced between 1:00 AM and 2:00 AM. If no value is produced between 1:00 AM and 2:00 AM, then I need to consider only the data from 12:00 AM to 1:00 AM and save the result to the data repository (time series).
Like this there will be ‘n’ number of devices and each device has its own configuration. In the above scenario device d1 and d2 are producing data for every 1 hr. There might be other devices which will be producing data for every 3 hr, 6 hr.
Currently this requirement is implemented in Java. Since the number of devices is increasing, and with it the computations, I would like to know whether Spark/Spark Streaming can be applied to this scenario. Any articles on this kind of requirement would be a great help.
If, and this is a big if, the computations are going to be device-wise, you can make use of topic partitions and scale the number of partitions with the number of devices. Messages are delivered in order per partition; this is the most powerful idea that you need to understand.
However, some words of caution:
The number of partitions can only be increased; if you want to decrease it, you may need to purge the topic and start again.
In order to ensure that the devices are uniformly distributed across partitions, you may consider assigning a GUID to each device.
If the calculations do not involve some sort of machine learning libraries and can be done in plain java, it may be a good idea to use plain old consumers (or Streams) for this, instead of abstracting them via Spark-Streaming. The lower the level the greater the flexibility.
You can check this. https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
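The per-device ordering idea can be sketched without any Kafka dependency. The class `DevicePartitioning` below is hypothetical and uses a plain `hashCode` to pick a partition; Kafka's own default partitioner hashes the key bytes differently, so treat this only as an illustration of "same key, same partition, same order".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: this is not Kafka's partitioner, just the key -> partition idea.
class DevicePartitioning {
    // The same device ID always maps to the same partition.
    static int partitionFor(String deviceId, int numPartitions) {
        return Math.floorMod(deviceId.hashCode(), numPartitions);
    }

    // Groups {deviceId, value} readings by partition, preserving arrival order
    // inside each partition.
    static Map<Integer, List<String>> route(List<String[]> readings, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String[] reading : readings) {
            int partition = partitionFor(reading[0], numPartitions);
            byPartition.computeIfAbsent(partition, k -> new ArrayList<>())
                       .add(reading[0] + "=" + reading[1]);
        }
        return byPartition;
    }
}
```

Because every reading of a device lands in one partition, a single consumer sees that device's values in the order they were produced, which is what the hourly calculation needs.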
I wrote a class that, given a seed and difficulty, will return a playing field to my game. The generation is consistent (no matter what, the same seed & difficulty level will always result in the same play field). As far as I know all android devices use Java 1.6 so here goes my question(s):
Is it safe to send only the seed and difficulty to other devices in a multiplayer environment?
Do I need to worry about when Google updates the Java version level from 1.6? Or will they likely update all Android devices to that version level (I am assuming the Random class will have been changed)? And if not, what would be a good way to detect whether the Random class is different?
Rephrased, what precautionary measures should be in place to ensure that the class java.util.Random, which my field generation class uses heavily, will result in the same play field for every device? Or, alternatively, would it be more wise to consider sending all play field data to the non-hosting device(s)?
I could probably accomplish the latter with a reliable message with size of:
byte[ROWS * COLUMNS]
In advance, I appreciate any guidance/suggestions in this matter. This is a difficult issue to search for so some links for future views may be appropriate.
There are a few options here, but I guess I was hoping for some magic JVM property defining the java.util.Random class revision version.
First option is to check the java version and compare it against the other device's version. If they are the same it is safe (as far as I know) to assume that the Random class is the same and thus the seed and difficulty can be sent. If, however, they are different you either send all the data or check the documentation/version release notes yourself to see when the Random class was changed and then determine if all the data should be sent based on previously acquired java version identifier.
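One way to detect a mismatch without relying on version strings is to have both devices generate the field from the seed and compare a checksum. This is a swapped-in technique, not something the platform provides for this purpose. A minimal sketch, assuming a hypothetical `generateField` that stands in for the real generator:

```java
import java.util.Random;
import java.util.zip.CRC32;

class FieldCheck {
    // Hypothetical stand-in for the real field generator; difficulty must be >= 0.
    static byte[] generateField(long seed, int difficulty, int rows, int cols) {
        Random rng = new Random(seed);
        byte[] field = new byte[rows * cols];
        for (int i = 0; i < field.length; i++) {
            field[i] = (byte) rng.nextInt(difficulty + 1);
        }
        return field;
    }

    // Small fingerprint of the field that is cheap to send to the other device.
    static long checksum(byte[] field) {
        CRC32 crc = new CRC32();
        crc.update(field);
        return crc.getValue();
    }
}
```

If the checksums match, sending only the seed and difficulty is enough for that session; if they differ, fall back to sending the whole field.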
The second option is to simply always send all the data. Which is what I will personally be doing.
If you're not as lucky as I and your data exceeds the value of Multiplayer.MAX_RELIABLE_MESSAGE_LEN (in bytes) you may have to break the data into multiple messages which could get ugly but is entirely doable.
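Splitting the field into several messages could look roughly like this. `FieldChunker` is a made-up name, `maxLen` stands in for `Multiplayer.MAX_RELIABLE_MESSAGE_LEN`, and the sketch assumes the chunks arrive in order; a real protocol would probably add a small header with the chunk index and count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FieldChunker {
    // Splits data into pieces no longer than maxLen bytes.
    static List<byte[]> split(byte[] data, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += maxLen) {
            int to = Math.min(from + maxLen, data.length);
            chunks.add(Arrays.copyOfRange(data, from, to));
        }
        return chunks;
    }

    // Reassembles chunks that were received in their original order.
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, out, pos, chunk.length);
            pos += chunk.length;
        }
        return out;
    }
}
```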
I'm using the below function to generate UUID
UUID.randomUUID().toString()
In production we have 50+ servers (application server - each is a JVM on its own) and for requests that land in these servers, as a first step we generate a UUID which essentially uniquely identifies a transaction.
What we are observing is that in Server 6 and Server 11, the UUIDs generated are matching at least for 10 to 15 messages per day which is strange because given the load i.e. about 1 million transactions a day, these UUIDs being duplicate within the same day is very odd.
This is what we have done so far
Verified the application logs - we didn't find anything fishy in there, all logs are as normal
Tried replicating this issue in the test environment with similar load in production and with 50+ servers - but this didn't happen in the test environment
Checked the application logic - this doesn't seem to be an issue because all other 48 servers except 6 and 11 which have a copy of the same code base is working perfectly fine and they are generating unique UUIDs per transaction.
So far we haven't been able to trace the issue, my question is basically if there is something at JVM level we are missing or UUID parameter that we need to set for this one off kind of an issue?
Given time, I'm sure you'll find the culprit. In the meantime, there was a comment that I think deserves to be promoted to answer:
You are generating pseudo random UUIDs at multiple locations. If you don't find other bugs, consider either generating all the pseudo random UUIDs at one location, or generate real random UUIDs
So create a UUID server. It is just a process that churns out blocks of UUIDs. Each block consists maybe 10,000 (or whatever is appropriate) UUIDs. The process writes each block to disk after the process verifies the block contains no duplicates.
Create another process to distribute the blocks of UUIDs. Maybe it is just a web service that returns an unused block when it gets a request. The transaction server makes a request for a block and then consumes those UUIDs as it creates transactions. When the server has used most of its assigned UUIDs, it requests another block.
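A rough in-process sketch of the two pieces follows. The names `UuidBlocks` and `BlockClient` are invented here, the disk write and the web service are left out, and for simplicity the client fetches a new block only when the current one is fully used rather than when it is mostly used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

// "Server" side: produces a block that is verified to contain no duplicates.
class UuidBlocks {
    static List<UUID> newBlock(int blockSize) {
        Set<UUID> block = new LinkedHashSet<>();
        while (block.size() < blockSize) {
            block.add(UUID.randomUUID()); // the Set rejects a duplicate, so we just retry
        }
        return new ArrayList<>(block);
    }
}

// "Transaction server" side: hands out UUIDs from its current block.
class BlockClient {
    private final Supplier<List<UUID>> blockSource; // stands in for the web service call
    private Iterator<UUID> current = Collections.emptyIterator();

    BlockClient(Supplier<List<UUID>> blockSource) {
        this.blockSource = blockSource;
    }

    synchronized UUID next() {
        if (!current.hasNext()) {
            current = blockSource.get().iterator(); // request another block
        }
        return current.next();
    }
}
```

Note that this only checks for duplicates inside one block; checking across blocks would need the distributor to remember what it has already handed out.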
I wouldn't waste time wondering how UUID.randomUUID() is generating a few duplicate UUIDs per day. The odds of that happening by chance are infinitesimal. (Generating a whole series of duplicates is possible—if the underlying RNG state is duplicated, but that doesn't seem to be the case.)
Instead, look for places where a UUID stored by one server could be clobbering one stored by another. Why does this only happen between 2 servers out of 50? That has something to do with the details of your environment and system that haven't been shared.
As stated above, the chances of a legit collision are impossibly small. A more likely possibility is that the values are being transferred between objects in an improper way.
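Java passes object references by value, so two objects can end up holding a reference to the very same UUID instance. Consider the following scenario:
saveObject1.setUUID(initObj.getUUID());
initObj.setUUID(UUID.randomUUID()); // what if this is skipped, or runs too late?
saveObject2.setUUID(initObj.getUUID());
When these lines run in this order on one thread, saveObject1 and saveObject2 get different values. But if the middle line does not run before the third one, saveObject1 and saveObject2 will have the same value, because both copy the same reference from initObj.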
An issue like this seems more likely than the actual UUIDs being a collision, esp if you can reproduce it. Naturally if it doesn't happen all the time it's probably something more complex, like a rare race condition where initObj doesn't get reinitialized in time, causing saveObject1 & 2 to share the same object reference.
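Here is a small sketch of that failure mode, with a made-up `Holder` class standing in for the real objects. The `reinitialize` flag simulates whether initObj got its fresh UUID in time.

```java
import java.util.UUID;

class SharedUuidDemo {
    // Hypothetical stand-in for initObj / saveObject1 / saveObject2.
    static class Holder {
        private UUID uuid;
        UUID getUUID() { return uuid; }
        void setUUID(UUID uuid) { this.uuid = uuid; }
    }

    // Returns true when both saved objects end up with the same UUID.
    static boolean savedIdsCollide(boolean reinitialize) {
        Holder initObj = new Holder();
        initObj.setUUID(UUID.randomUUID());
        Holder saveObject1 = new Holder();
        Holder saveObject2 = new Holder();

        saveObject1.setUUID(initObj.getUUID());
        if (reinitialize) {
            initObj.setUUID(UUID.randomUUID()); // fresh UUID for the next transaction
        }
        saveObject2.setUUID(initObj.getUUID()); // copies whatever initObj holds right now
        return saveObject1.getUUID().equals(saveObject2.getUUID());
    }
}
```

With `reinitialize` set to false the two saved objects share one UUID; with true they differ. In a multi-threaded server the same effect can come from a shared initObj being read by two requests before it is refreshed.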
I use DB2 9.7.5 64Bits. The server has enough memory but no clustering.
I need to make huge computations : compute several (roughly 20) ratios in my db. Some of them can take as long as 25 seconds.
The results are stored in a result table.
Now I have several solutions (As a policy, we exclude Stored Proc).
I call each ratio, one at a time from a java client OR
I call several ratios in a multi threaded java client.
My assumption is that it is useless to call from a multi threaded since my db is the bottleneck. But I'm not wholly sure that the db engine really gives 100% of the cpu for 1 query. I think that the engine must probably be able to share its cpu power between several queries.
I am currently reading the IBM Data manual but would like to have your feedback.
Many thanks.
I need to make huge computations : compute several (roughly 20) ratios in my db. Some of them can take as long as 25 seconds.
25 seconds is not necessarily a bad thing. Maybe it's a wonderful result; it depends on what you compute.
Now I have several solutions (As a policy, we exclude Stored Proc).
Stored procedures are not evil; you just need to know how to use them safely.
My assumption is that it is useless to call from a multi threaded since my db is the bottleneck. But I'm not wholly sure that the db engine really gives 100% of the cpu for 1 query. I think that the engine must probably be able to share its cpu power between several queries.
Multithreading in Java never hurts (as long as you keep the threads safe), and it is especially useful in your case, where you are doing a lot of calculations.
I don't use DB2, so I don't know how well it handles multithreading, but if it is single-threaded I doubt that it will ever reach 100% CPU usage. You should check your DB2 configuration files and tweak them a little.
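On the Java side, running the ratios in parallel could be sketched as below. `RatioRunner` is a hypothetical name and each `Callable<Double>` stands in for one ratio query; the actual JDBC calls and the write to the result table are not shown. Whether this is faster than running them one at a time depends on how the database schedules concurrent queries, so it is worth measuring with a small thread count first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RatioRunner {
    // Runs every ratio query on a fixed-size pool and returns the results
    // in the same order as the input list.
    static List<Double> runAll(List<Callable<Double>> ratioQueries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : pool.invokeAll(ratioQueries)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for ratios", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a ratio query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```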
Also read the article about IBM DB2 clustering
I also suggest using a data warehouse tool to analyze your script performance against DB2.
Good luck
Take a look at Materialized Query Tables. If what you are working with is reporting, and especially doesn't require absolutely up-to-date information, you can set up MQTs that will contain the parts that are heavy to calculate with for instance hourly versions.
I am trying to generate an ID that will be unique to a particular computer. The ID will not be generated randomly. It will be calculation based, such that the ID generated for computer A will be fixed and unique to computer A. Every time the program is executed on computer A, it will generate the same ID, and when executed on another computer, it will generate another ID unique to that computer. This is to ensure that two computers don't have the same ID.
My Challenge: For my program to be able to generate an ID unique to a computer, it needs to perform the calculation based on a seed unique to the computer executing it.
My Question: How can I get a value unique to a computer, so that I can use the value as a seed in the ID generation program?
Is it possible to get a value from a computer's hardware(eg motherboard) that is unique to that computer? That way, the value is most likely not to change as long as the computer's motherboard is not replaced.
MAC address? That's (for practical purposes) unique to every NIC, so it guarantees reproducibility even if the user is dual booting. Sure, there are rare cases of people trading cards, but coupled with other metrics (don't rely on this alone, since network cards can be changed), it is still a workable approach.
How would you get it?
public static byte[] getMACAddress() throws SocketException, UnknownHostException {
InetAddress address = InetAddress.getLocalHost();
NetworkInterface networkInterface = NetworkInterface.getByInetAddress(address);
return networkInterface.getHardwareAddress();
}
If you want a String representation, do this
byte[] macAddress = getMACAddress();
for (int byteIndex = 0; byteIndex < macAddress.length; byteIndex++) {
    // Print each byte as two hex digits, separated by dashes.
    System.out.format("%02X%s", macAddress[byteIndex], (byteIndex < macAddress.length - 1) ? "-" : "");
}
(thanks to http://www.kodejava.org/examples/250.html)
Note: As mentioned in the comments, MAC addresses can be spoofed. But you're talking about a small part of the population doing this, and unless you're using this for anti-piracy stuff, it's unique enough.
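A slightly more defensive variant is sketched below, since the lookup by local host address can come back empty on some machines. The class name `MachineId` is made up, and turning the MAC bytes into an ID with `UUID.nameUUIDFromBytes` is just one possible choice for a stable, calculation-based ID.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;
import java.util.UUID;

class MachineId {
    // Returns the first non-loopback hardware address, or null if none is available.
    static byte[] firstHardwareAddress() throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return null;
        }
        while (nics.hasMoreElements()) {
            NetworkInterface nic = nics.nextElement();
            byte[] mac = nic.getHardwareAddress();
            if (!nic.isLoopback() && mac != null && mac.length > 0) {
                return mac;
            }
        }
        return null;
    }

    // Formats the bytes as dash-separated upper-case hex, e.g. 00-1A-FF.
    static String format(byte[] mac) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mac.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(String.format("%02X", mac[i]));
        }
        return sb.toString();
    }

    // The same input bytes always give the same (name-based) UUID.
    static UUID idFrom(byte[] mac) {
        return UUID.nameUUIDFromBytes(mac);
    }
}
```

This picks whichever suitable interface is enumerated first, which may be a virtual adapter, so some filtering on the interface name may still be needed.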
Win32 generates a computer SID, that is supposed to be unique for each installation that you can get via WMI or Active Directory, but that is pretty platform specific. You can also use the MAC address, as everyone else has mentioned, just make sure that it is a physical network adapter, as virtual adapters tend to share the same MAC address across computers.
However, UUID's (or GUID's) are 128 bit numbers that are supposed to be guaranteed unique, and were actually created for the purpose of solving the problem of generating unique identifiers across multiple, random machines. According to Wikipedia:
To put these numbers into perspective, one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion,[25] that means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
The total number of possible combinations is 2^128 (or 3 x 10^38), so I tend to believe it. Also, most modern UUID generators don't use the V1 algorithm anymore (i.e. the one based off the MAC address), since it is considered a security issue due to the fact that one can tell when the GUID was generated, and who generated it. In the Win32 world, a security patch circa Win2K or NT 4 changed to use the V4 version of the algorithm, which is based off of a pseudo-random number instead of the MAC, and the JVM has always used the V3/V4 version.
EDIT: The method used to generate UUID's in Java is via the java.util.UUID class.
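For reference, a tiny sketch of the two factory methods on `java.util.UUID` and the version each one reports (the wrapper class `UuidVersions` is only for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidVersions {
    // Random-based UUID; version() reports 4.
    static UUID random() {
        return UUID.randomUUID();
    }

    // Name-based UUID; version() reports 3, and the same name gives the same UUID.
    static UUID fromName(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

The random variant gives a different value on every run, so for an ID that stays fixed per machine only the name-based variant, fed with some machine-specific value, fits the question.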
An easy way to do this is to read the ethernet hardware, or "mac" address.
http://download.oracle.com/javase/6/docs/api/java/net/NetworkInterface.html#getHardwareAddress()
Mac addresses are not quite as unique as people think, as they do get reused as time goes on. But the odds of one application or network having two identical ones are quite low.
The MAC address is unique enough for what you need. See http://en.wikipedia.org/wiki/MAC_address
You didn't specify which language you are using. It may be easier in some languages than others. Here is how to do it in Java: http://www.kodejava.org/examples/250.html. Google around for your language.
Your best option is to base the ID on the MAC address of the primary network adaptor.
This could change at some point, but so could any single hardware component.
FYI, version 1 GUIDs are calculated using the MAC address.
Have you access to any information described in this article? Windows-only
http://msdn.microsoft.com/en-us/library/aa394587.aspx
Serial number, asset tag
Another option, if and only if you're using Intel chips, is the processor serial number, assuming you can ensure the feature is enabled. See Intel Serial # Note for more info.