JavaEE application create a unique key using java.util.UUID - java

I want to build a Java EE application where users can register and then confirm their email address by clicking a link that is emailed to them after they fill in the registration form (name, email, ...).
To do that I am going to generate a long, unique key with java.util.UUID, store it in a database, and then send the user an email with that key as part of the URL (example: www.mysite.com/account.xhtml?id=KEY). When the user clicks the link, I extract the key from the URL and check whether it is stored in the DB. If it is, the registration is completed.
My question is: when creating that key with java.util.UUID, how can I know that it is unique? Should I check whether an equal key already exists in the DB and, if so, generate a new one until the key is unique?

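What's the chance that a randomly generated 128-bit value will be equal to another randomly generated one? (UUID.randomUUID() produces a version 4 UUID with 122 random bits, so a collision is vanishingly unlikely.)
If you just need peace of mind, make the key column a primary key (or give it a unique constraint); if the insert fails because of a key collision, generate a new UUID and retry the insert.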
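The insert-and-retry idea can be sketched as follows. This is only an illustration: the class name ConfirmationKeys and its methods are made up here, and a HashSet stands in for a table whose key column has a primary key or unique constraint. With a real database the "duplicate" signal would be the constraint violation raised by the insert.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical helper; the Set is a stand-in for a table with a unique key column.
final class ConfirmationKeys {
    private final Set<String> storedKeys = new HashSet<>();

    /** Stores a fresh key, retrying when the "insert" reports a duplicate. */
    String insertNewKey(Supplier<String> generator, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String key = generator.get();
            if (storedKeys.add(key)) { // add() returns false if the key already exists
                return key;
            }
        }
        throw new IllegalStateException("no unique key after " + maxAttempts + " attempts");
    }

    /** Convenience overload using random (version 4) UUIDs. */
    String insertNewKey() {
        return insertNewKey(() -> UUID.randomUUID().toString(), 3);
    }
}
```

In practice the retry branch should essentially never run; it is there for peace of mind, not because collisions are expected.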

There are a couple of ways to generate UUIDs in Java.
From Java 5 onwards, the better practice is to use java.util.UUID. Its string form is 36 characters long. This link gives a simple example.
This discussion answers your question. The uniqueness is very strong; I have never come across anyone complaining about it.
But if you are storing the key in a DB or sending it over the network, size may matter. Converting it to another base (Base64, Base85, etc.) is a good solution. Please check this discussion here. You can use the Apache library class org.apache.commons.codec.binary.Base64. Base85 is not safe for URLs.
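As a rough sketch of the size reduction described above, the 16 bytes of a UUID can be encoded with URL-safe Base64, which yields 22 characters instead of 36. Note that this example uses java.util.Base64 (available from Java 8) rather than the Apache Commons Codec class mentioned above, and the class name CompactUuid is invented for the illustration.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper: shortens a UUID to 22 URL-safe characters and back.
final class CompactUuid {
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        // URL-safe alphabet ('-' and '_'), no '=' padding
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    static UUID decode(String text) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(text));
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }
}
```

The encoded form can be placed directly in a query parameter such as ?id=KEY without further escaping.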
My recommendation: if you have many application/session beans/web services (many interconnections with other applications, data transfers, etc.) creating UUIDs, I prefer to add a unique application-name prefix as well, such as APP1, APP2, etc., and then encode the result in another base. If the UUID is 6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8, the key becomes APP1-6fcb514b-b878-4c9d-95b7-8dc3a7ce6fd8, and so on.
Though it is off topic here: when you use a URL like www.mysite.com/account.xhtml?id=KEY, beware of SQL injection attacks. Pass the key to the query as a bound parameter (for example with a PreparedStatement) rather than concatenating it into the SQL string.

Related

Implementing UUID for better performance

Requirement:
We need to assign a random, sufficiently long number/string to each merchant so that merchants are uniquely identifiable (we do not want anyone to be able to guess the identifier). This is required because we print this number/string as a QR code and give it to merchants, so that our users can scan the QR code and get information about the merchant.
Current implementation:
We cannot print IDs because they are sequential, so we introduced a new field (externalId) to store a unique ID generated by the TimeBasedGenerator of the JUG UUID generator. As I researched more, I found that there are performance issues with UUIDs. All the articles talk about UUIDs as primary keys; I am not using the UUID as a primary key but as a secondary identifier. I am not worried about inserts and updates, as there will be far fewer of them than searches. The user will send us the UUID read from the printed QR code and we will need to look up the merchant by this UUID field, so there will be a huge impact on performance.
Proposed solution:
Instead of the UUID alone, we will hand out the combination ID:UUID, so that when we get a request we can split it, fetch the merchant by ID, and check whether the externalId (UUID) stored in our DB matches the one provided. If it matches we return the merchant, otherwise not.
Will this solution improve performance significantly, or will the time taken by the string operations cancel out the gain?
Is there a better approach?
Thanks
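The ID:UUID idea from the question can be sketched like this. It is a minimal illustration under assumptions: the class MerchantLookup and its methods are invented, and a HashMap stands in for a merchant table looked up by its numeric primary key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch; the Map stands in for "SELECT externalId FROM merchant WHERE id = ?".
final class MerchantLookup {
    private final Map<Long, UUID> externalIdByMerchantId = new HashMap<>();

    void register(long id, UUID externalId) {
        externalIdByMerchantId.put(id, externalId);
    }

    /** Builds the string that would be printed as a QR code. */
    static String toQrPayload(long id, UUID externalId) {
        return id + ":" + externalId;
    }

    /** Returns the merchant ID only if the UUID part matches the stored externalId. */
    Optional<Long> resolve(String payload) {
        int sep = payload.indexOf(':');
        if (sep <= 0) {
            return Optional.empty();
        }
        try {
            long id = Long.parseLong(payload.substring(0, sep));
            UUID claimed = UUID.fromString(payload.substring(sep + 1));
            UUID stored = externalIdByMerchantId.get(id);
            return claimed.equals(stored) ? Optional.of(id) : Optional.empty();
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return Optional.empty();
        }
    }
}
```

The split and parse are a handful of in-memory operations, so they are unlikely to matter next to a database round trip; whether the lookup itself gets faster depends on how the UUID column was indexed before.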
UUIDs are generated in various flavors. As you are perhaps already aware, RFC 4122 outlines a variant (i.e. UUID schema) that defines five different versions. I suggest that you look into version 1 of RFC 4122.
Version 1 uses the MAC address of the machine generating the UUID as the last 12 hex digits of the generated UUID.
Assume you go to a second-hand computer recycler and purchase an obsolete Ethernet adaptor; power it up and get the MAC address of that adaptor; and then destroy the adaptor (shredder, shot-gun, acid, etc.). You can now be assured that no other computer in the universe will -ever- generate a V1 UUID using that MAC address.
Now you could write a UUID (RFC 4122, V1) generator that uses the harvested MAC address.
You could harvest a separate MAC address (from additional cards) for each merchant, and then you would be able to identify UUIDs generated for each.
BONUS: You might experiment with V1 UUIDs using Mahonri Moriancumer's UUID and GUID Generator and Forensics.
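To see the node field described above, java.util.UUID can read it back from a version 1 UUID. This is a small sketch; the class name UuidNode is invented and any UUID passed to it is just sample input.

```java
import java.util.UUID;

// Hypothetical helper: extracts the 48-bit node (MAC) field of a version 1 UUID.
final class UuidNode {
    static String nodeHex(UUID uuid) {
        if (uuid.version() != 1) {
            throw new IllegalArgumentException("not a version 1 UUID: " + uuid);
        }
        // node() returns the 48-bit node value; print it as 12 hex digits
        return String.format("%012x", uuid.node());
    }
}
```

For a version 1 UUID the returned string equals the last 12 hex digits of its text form, which is what makes it possible to group UUIDs by the generator (or, in this answer's scheme, by merchant).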
The best implementation depends on your UUIDs' character distribution, but what I did when faced with a similar situation was to give the end user only the UUID. In the database I added a column filled with the first 2 characters of the UUID and indexed that column.
Either this or, as you proposed, the ID would serve the same purpose. The benefits of indexing a partial UUID are that
you can tweak how many characters to use to improve performance
the ID is not exposed externally, where it could potentially cause issues.
On the other hand, an integer index probably uses less memory and needs no tweaking.
Performance-wise, in my implementation, searches for one row among a million were taking 7-8 seconds. With this solution they dropped to a few milliseconds.

Validate the unique string generated in the system

I'm generating a unique ID for each request in a web application, and this ID can be presented back as part of another request to the system. (The ID is generated by the framework we use, so I have to use this ID as-is, and the framework has no API to check whether an ID was generated by it. Thanks to RichieHH for pointing this out.) Currently I store the generated IDs in the database, and for every request a DB query checks whether the ID already exists; this is how validation is done today. Now, if I have to validate that the ID sent in a request was generated by the application without using persistent storage, which approach should I follow?
My initial approach is to generate IDs that add up to a particular sum after hashing, but this could be identified by someone studying the patterns.
It would be great if someone could help me with an approach that lets the application validate the unique IDs it generated. Thanks.
Use a UUID, which is a pretty standard solution for this task. You don't need to validate a UUID; you can assume that it is always unique.
You can use ServerName+Timestamp+some extra. This can be more convenient for debugging but is less secure.

Designing Unique Keys(Primary Keys) for a heavily denormalized NoSQL database

I am working on a web application related to Discussion forums using Java and Cassandra database.
I need to construct keys for the rows storing the users' details and for another set of rows storing the content posted by users.
One option is to use the randomly generated UUIDs provided by Java, but these are 16 bytes long, and since a NoSQL database involves heavy denormalization, I am concerned that I would be wasting lots of disk space, RAM and other resources if the keys could be generated in a smaller size.
I need to generate two types of keys, one for users and the other for content posted by users.
For the content posted by users, would timestamp+userId be a good key, where timestamp is the server time at which the content was posted and userId refers to the key of the user row?
Any suggestions or comments are appreciated.
Thanks
Marcos
Is this a distributed application?
If not, you could use a simple synchronized counter and initialize it on startup with the next available ID.
On the other hand, a database should be able to handle UUIDs as generated by Java.
They are a standard choice for things like session IDs, which need to be unique.
Your problem is somewhat similar, since a session in your context would represent a set of user input.

Short code generator for long text in java

I have a long text that identifies a few things in my application.
For example, my code: U2Cd3c7a781856c69559539a78e9492e9772dfe1b67.2.nrg
As I am sharing this key publicly, it is a bit long, and I would like to shorten it the way a URL shortener does, so that the public form is short while internally it maps back to the long text, which includes information such as the encrypted record ID, user ID, etc.
I am looking for Java code that does the above. I don't mind using my database for storage if a short code generator needs one.
Thank you
Rams
You will have to store in a database, and it should be as simple as adding the file name to a table with an autoincrement ID column, and using the ID column to build the URL. Make sure to put a cache in there somewhere. You don't want to hit the database every time you need to render a link.
Marcelo's answer is good if the links are of a temporary nature. If the links are long-lived, I'd add another column holding a short but dense randomly generated key (such as a 10-digit base 36 number, A-Z0-9) and use that for the URL. The reason is that if you need to do any kind of table maintenance (such as merging test and QA data, for example), you can do so without worrying too much about conflicts resulting from the same autokey value referring to two different URLs.
Where I worked previously, they thought nothing about hard-coding PK values for status and code tables. This meant that these tables in prod, QA, Test, and Dev had to be identical to the PK. What a pain!
Thus I don't like to give my PKs to users...
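A short random key of the kind suggested above (10 characters from A-Z0-9) could be generated roughly like this. The class name ShortKeys is invented for the sketch, and the choice of SecureRandom is an assumption made so that keys are hard to guess.

```java
import java.security.SecureRandom;

// Hypothetical helper: produces random keys over the 36-character alphabet A-Z0-9.
final class ShortKeys {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
    private static final SecureRandom RANDOM = new SecureRandom();

    static String next(int length) {
        char[] out = new char[length];
        for (int i = 0; i < length; i++) {
            out[i] = ALPHABET[RANDOM.nextInt(ALPHABET.length)];
        }
        return new String(out);
    }
}
```

Ten such characters give 36^10 (about 3.7 x 10^15) possible keys, far fewer than a UUID, so the column should carry a unique constraint and the insert should be retried with a new key on the rare collision.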

Generate Primary key without using Database

I recently came across a question about "generating a primary key in a clustered environment of 5 app servers [OAS version 10] without using the database".
Usually we generate a PK from a DB sequence, or by storing the values in a database table and using a stored procedure to generate the new PK value. However, the current requirement is to generate the primary key for my application without referencing the database, using JDK 1.4.
I need expert help to arrive at better ways to handle this.
Thanks,
Use a UUID as your primary key and generate it client-side.
Edit:
Since your comment I felt I should expand on why this is a good way to do things.
Although sequential primary keys are the most common in databases, using a randomly generated primary key is frequently the best choice for distributed databases or (particularly) databases that support a "disconnected" user interface, i.e. a UI where the user is not continuously connected to the database.
UUIDs are the best form of randomly generated key since they are, for practical purposes, guaranteed to be unique; the likelihood of the same UUID being generated twice is so extremely low as to be almost completely impossible. UUIDs are also ubiquitous; nearly every platform has built-in support for generating them, and for those that don't there's almost always a third-party library to take up the slack.
The biggest benefit to using a randomly generated primary key is that you can build many complex data relationships (with primary and foreign keys) on the client side and (when you're ready to save, for example) simply dump everything to the database in a single bulk insert without having to rely on post-insert steps to obtain the key for later relationship inserts.
On the con side, UUIDs are 16 bytes rather than a standard 4-byte int -- 4 times the space. Is that really an issue these days? I'd say not, but I know some who would argue otherwise. The only real performance concern when it comes to UUIDs is indexing, specifically clustered indexing. I'm going to wander into the SQL Server world, since I don't develop against Oracle all that often and that's my current comfort zone, and talk about the fact that SQL Server will by default create a clustered index across all fields of the primary key of a table. This works fairly well in the auto-increment int world, and provides some good performance for key-based lookups. Any DBA worth his salt, however, will cluster differently, but folks who don't pay attention to that clustering and who also use UUIDs (GUIDs in the Microsoft world) tend to get some nasty slowdowns on insert-heavy databases, because the clustered index has to be maintained on every insert and if it's clustered against a UUID, which could put the new key in the middle of the clustered sequence, a lot of data could potentially need to be rearranged to maintain the clustered index. This may or may not be an issue in the Oracle world -- I just don't know if Oracle PKs are clustered by default like they are in SQL Server.
If that run-on sentence was too hard to follow, just remember this: if you use a UUID as your primary key, do not cluster on that key!
You may find it helpful to look up UUID generation.
In the simple case, with one program running one thread on each machine, you can do something such as
MAC address + time in nanoseconds since 1970.
If you cannot use a database at all, GUID/UUID is the only reliable way to go. However, if you can use the database occasionally, try the HiLo algorithm.
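A minimal sketch of the HiLo idea, under assumptions: the class HiLoGenerator is invented here, and the LongSupplier stands in for the occasional database call that hands out the next "hi" block number (for example from a sequence). It uses Java 8 types, so it would need adapting for JDK 1.4.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: one "database" call per blockSize generated IDs.
final class HiLoGenerator {
    private final LongSupplier nextHi; // stand-in for fetching the next block number from the DB
    private final int blockSize;
    private long hi;
    private int lo;

    HiLoGenerator(LongSupplier nextHi, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a block fetch on first use
    }

    synchronized long next() {
        if (lo == blockSize) { // current block exhausted: reserve a new one
            hi = nextHi.getAsLong();
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}
```

Each app server reserves a whole block at a time, so IDs stay unique across the cluster as long as the block numbers themselves are handed out uniquely.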
You should consider using IDs in the form of UUIDs. Java 5 has a class for representing them (and must also have a factory to generate them). Using that class as a reference, you can backport the code to your antiquated Java 1.4 in order to have the identifiers you require.
Take a look at these strategies used by Hibernate (section 5.1.5 in the link). You will surely find them useful.
It explains several methods and their pros and cons, also stating whether they are safe in a clustered environment.
Best of all, there is code available that already implements them for you :)
If it fits your application, you can use a larger string key coupled with a UUID() function or SHA1(of random data).
For sequential int's, I'll leave that to another poster.
You can generate a key based on a combination of the following three things:
The IP address or MAC address of the machine
The current time
An incremental counter on each instance (to ensure the same key does not get generated twice on one machine, since the time may appear identical for two immediately consecutive key creations because of the underlying time precision)
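The three ingredients listed above can be combined along these lines. This is only a sketch: the class NodeKeyGenerator and the "-" separator are invented, and the node identifier (IP or MAC address) is passed in by the caller rather than looked up. It deliberately avoids post-1.4 language features.

```java
// Hypothetical sketch: key = node identifier + current time + per-instance counter.
final class NodeKeyGenerator {
    private final String nodeId; // e.g. the machine's IP or MAC address, supplied by the caller
    private long counter = 0;

    NodeKeyGenerator(String nodeId) {
        this.nodeId = nodeId;
    }

    // synchronized so that two threads never observe the same counter value
    synchronized String next() {
        return nodeId + "-" + System.currentTimeMillis() + "-" + (counter++);
    }
}
```

The counter distinguishes keys created within the same millisecond; uniqueness across restarts relies on the clock having moved forward, which is an assumption worth checking for your environment.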
By using a Statement object, you can call statement.getGeneratedKeys() to retrieve the auto-generated key(s) produced by executing that Statement.
Java doc
Here is how it's done in MongoDB: http://www.mongodb.org/display/DOCS/Object+IDs
They include a timestamp.
But you can also install Oracle Express and use sequences; you can select values in bulk:
SQL> select mysequence.nextval from dual connect by level <= 20;
NEXTVAL
1
2
3
4
5
..
20
Why are you not allowed to use the database? Money (Oracle express is free) or single point of failure? Or do you want to support other databases than Oracle in the future?
It's shipped out of the box in many Spring-based applications such as Hybris.
The typeCode is the name of your table, like User, Address, etc.
private PK generatePkForCode(final String typeCode)
{
    final TypeInfoMap persistenceInfo = Registry.getCurrentTenant().getPersistenceManager().getPersistenceInfo(typeCode);
    return PK.createCounterPK(persistenceInfo.getItemTypeCode());
}
