Short code generator for long text in Java

I have a long text that identifies a few things in my application.
For example, my code: U2Cd3c7a781856c69559539a78e9492e9772dfe1b67.2.nrg
I share this key publicly, and it is a bit long. I would like to shorten it the way a short-URL service does, so that the public version is shorter, and map the short code back to the long text internally, since the long text carries information such as an encrypted record ID, a user ID, and so on.
I am looking for Java code that does this. I don't mind using my database if the short code generator needs one.
Thank you
Rams

You will have to store in a database, and it should be as simple as adding the file name to a table with an autoincrement ID column, and using the ID column to build the URL. Make sure to put a cache in there somewhere. You don't want to hit the database every time you need to render a link.
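A minimal sketch of turning the autoincrement ID into a short code, assuming the ID is a non-negative long and that lowercase base 36 is acceptable in the URL. The class name ShortCode is made up for illustration; the table lookup and the cache are left out.

```java
// Hypothetical helper: converts between a numeric row ID and a short code.
final class ShortCode {

    // Encode a non-negative database ID as a compact base-36 string (0-9, a-z).
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return Long.toString(id, 36);
    }

    // Decode the short code back into the numeric ID used for the table lookup.
    static long decode(String code) {
        return Long.parseLong(code, 36);
    }
}
```

With this, ID 123456789 becomes the six-character code 21i3v9, and decode gives the ID back for the database (or cache) lookup.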

Marcelo's answer is good if the links are of temporary nature. If the links are long-lived, I'd add another column that used a short but dense randomly generated key (such as a 10-digit base 36 number A-Z0-9) and use that for the URL. The reason is that if you needed to do any kind of table maintenance (such as merging test and QA data, for example), you could do so without worrying too much about conflicts resulting from the same autokey value referring to two different URLs.
Where I worked previously, they thought nothing about hard-coding PK values for status and code tables. This meant that these tables in prod, QA, Test, and Dev had to be identical to the PK. What a pain!
Thus I don't like to give my PKs to users...
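A rough sketch of generating such a random key, assuming 10 characters from A-Z0-9 and java.security.SecureRandom as the random source. The class name RandomKey is hypothetical. Random keys can still collide, so the column would need a unique constraint and the insert a retry on conflict; that part is not shown.

```java
import java.security.SecureRandom;

// Hypothetical generator for short random keys drawn from A-Z and 0-9.
final class RandomKey {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
    private static final SecureRandom RANDOM = new SecureRandom();

    // Build a key of the requested length, one random character at a time.
    static String next(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
        }
        return sb.toString();
    }
}
```

Calling RandomKey.next(10) gives one candidate key; the database's unique constraint decides whether it is accepted.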

Related

Creating empty objects in db or saving them later

I have the following table in my db:
CREATE TABLE document (
    id INT PRIMARY KEY AUTOINCREMENT,
    productModelId INT NOT NULL,
    comment VARCHAR(50),
    CONSTRAINT FK_product_model FOREIGN KEY (productModelId) REFERENCES product_model(id)
);
Of course, the real table is much more complicated, but this is enough to understand the problem.
Our users want to see the number of the document when they click the "new" button. To do that, we have to create the object in the db and send it to the client. But there is a problem: we need to know productModelId before we save the object in the db. Otherwise we get an SQL exception.
I see two possible variants (both are ugly, really):
Show the user a modal list of product models and then create the object in the database with the productModelId the user chose.
Create a temporary number and save the object in the db only when the user finishes editing the document and saves it. We would also need to remove the NOT NULL constraint and validate this somewhere in code.
The first way is bad because we have too many modals in our application. Our UI is too heavy with them.
The second variant is ugly because our database is not consistent without all the checks.
What can you suggest we do? Any new solutions? What do you do in your apps? Maybe some UI tips? We are using the first variant at the moment.
In theory, the id you use in your database should not carry meaningful information, so the user should not see it, except perhaps tucked away in a URL or similar. You should not display it to the user, and the problem you have is one confirmation of that theory.
Your current solution is partially correct: it satisfies the technical requirements, but it is still bad, because if the user doesn't complete the insert you end up with empty records in the DB (the ID and foreign key are fine, but all other fields are empty or hold useless default values), so you are basically circumventing the database validations.
There are two better solutions, but both require you to review your database.
The first is not to use the id as something to display to the user. Use another column holding another "id", declare it unique in the database, generate it in the application, display it to the user, and then use this other "id" wherever needed (if it's unique, it is effectively an id).
The second is often used because it does not require a central database or other authority to check the uniqueness of ids, so it scales better in distributed environments.
Drop the usual "id int" column, auto-incremented or not, and use UUIDs. Your id will be a varchar or a binary column. A UUID implementation (such as java.util.UUID; other languages have equivalents) generates a unique id by itself whenever and wherever you need it, even on the client, and you then supply this id when saving.
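A short sketch of the UUID option in Java, using java.util.UUID. The class name DocumentIds is invented for the example; the byte conversion is only needed if the column is a 16-byte binary rather than a varchar.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper around java.util.UUID for use as a primary key.
final class DocumentIds {

    // Generate a random (version 4) UUID; no database round trip is needed.
    static UUID newId() {
        return UUID.randomUUID();
    }

    // Pack the UUID into 16 bytes for a binary column.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from the 16 bytes read back from the column.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }
}
```

For a varchar column, newId().toString() gives the usual 36-character form.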
We did it the following way.
We created a table id_requests with the fields issue_type_id and lastId. We need this to avoid the situation where two users hit the 'new' button and get the same id.
And of course we added a field innerNum to all the tables where we use this feature.
Thank you!
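To illustrate why the counter has to be updated atomically, here is an in-memory stand-in for the id_requests table. This is only a sketch: the class DocumentNumbers and the map are assumptions, and in the real setup the same step would be a row update on id_requests inside a transaction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for id_requests (issue_type_id -> lastId); hypothetical.
final class DocumentNumbers {
    private final ConcurrentMap<Integer, Long> lastIdByType = new ConcurrentHashMap<>();

    // Atomically increment and return the last number for this issue type,
    // so two concurrent callers can never receive the same value.
    long next(int issueTypeId) {
        return lastIdByType.merge(issueTypeId, 1L, Long::sum);
    }
}
```

The value returned here would be what gets stored in innerNum and shown to the user.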

Store meta information as a simple Enum or in database?

I need to store some meta information. This information does not affect the system in any way; it is purely informational.
For example, suppose I have several applications (iOS app, Android app, mobile web, desktop web) where a logged-in user can create content, and the content is displayed across all of them. I was thinking it may be useful to store where the content was created.
So if I have in the database:
- USER table (user_id, username, password, email)
- CONTENT table (content_id, user_id, content)
I want to add the information about where the content came from, so I'll modify the content table as follows:
- CONTENT table (content_id, user_id, content, source)
How should I store the source?
Should it be just an enum class (I am using Java)
public enum Source{ IOS_APP, ANDROID_APP, MOBILE_WEB, DESKTOP_WEB }
and then simply store it as a String (varchar) in the database?
Or, should I actually create an extra table and use foreign key relationship
SOURCE table (source_id, source_description)
CONTENT table (content_id, user_id, content, source_id)
Which approach would be more desirable? Pros / cons?
The information here really does not affect the application in any way. It's just for statistical purposes, so that if we look back out of curiosity we can answer the question "where did most of the content come from?"
IMO, you should not choose one over the other, but instead have both.
The enum will help keep your Java code clean, and the table will help keep that data organized.
It would be good to have a separate (master) table for that type of information, which the other tables can reference through a foreign key. That gives you a central location for the possible values, so you don't have to go looking for them everywhere.
You can create an enum representing that (master) table, and use it as the field type if you create entities for the other tables. You can see this for an example. Also (optionally) you could validate the enum against the table content at application start, to make sure the enum stays in sync with the table in case new values are added or existing ones are updated.
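The startup check could look roughly like this. It is a sketch under assumptions: SourceCheck is a made-up class, and the set passed in stands for the names read from the SOURCE table, however they are loaded.

```java
import java.util.Set;
import java.util.TreeSet;

enum Source { IOS_APP, ANDROID_APP, MOBILE_WEB, DESKTOP_WEB }

// Hypothetical startup check comparing the enum with the rows of the SOURCE table.
final class SourceCheck {

    // Returns the names found in only one of the two places; empty means in sync.
    static Set<String> mismatches(Set<String> namesFromTable) {
        Set<String> enumNames = new TreeSet<>();
        for (Source s : Source.values()) {
            enumNames.add(s.name());
        }
        Set<String> common = new TreeSet<>(enumNames);
        common.retainAll(namesFromTable);   // names present in both

        Set<String> diff = new TreeSet<>(enumNames);
        diff.addAll(namesFromTable);        // names present in either
        diff.removeAll(common);             // keep only the one-sided names
        return diff;
    }
}
```

If the returned set is not empty at startup, the application can log it or refuse to start, depending on how strict you want to be.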
I usually have enums in the code and a table in the database with the data represented by the enum. I also make the IDs in the table match the values of the enum, so that I have easy access to well-known enum values.
Another approach is not to have an enum, but a class with ID + Name (maybe the enum exists anyway, to match the IDs with the enum values and give easy access to well-known values). If you have code like "if myEnum = MyEnum.SomeValue then" (that is, you have to make a decision based on an enum value), maybe it's better to get polymorphic behavior using the class.
PS: I'm not a Java developer, I work with C#. I think enums in Java are actually real classes that can have behaviour, aren't they?
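For what it's worth, Java enums are classes and can carry fields and methods, so the ID matching and the behaviour can live on the enum itself. A small sketch, where the numeric IDs and the isNativeApp flag are invented for illustration and the enum is named ContentSource to keep it apart from the question's Source:

```java
// Hypothetical enum whose constants carry the ID stored in the SOURCE table.
enum ContentSource {
    IOS_APP(1, true),
    ANDROID_APP(2, true),
    MOBILE_WEB(3, false),
    DESKTOP_WEB(4, false);

    private final int id;             // assumed to match source_id in the table
    private final boolean nativeApp;  // example of behaviour kept on the enum

    ContentSource(int id, boolean nativeApp) {
        this.id = id;
        this.nativeApp = nativeApp;
    }

    int id() {
        return id;
    }

    boolean isNativeApp() {
        return nativeApp;
    }

    // Map a source_id read from the database back to the enum constant.
    static ContentSource fromId(int id) {
        for (ContentSource s : values()) {
            if (s.id == id) {
                return s;
            }
        }
        throw new IllegalArgumentException("Unknown source id: " + id);
    }
}
```

Code that would otherwise compare against specific constants can then ask the value directly, for example source.isNativeApp().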

Is it advisable to store some information (meta-data) about a content in the id (or key) of that content?

Is it advisable to store some information (meta-data) about a piece of content in the Id (or key) of that content?
In other words, I am using time-based UUIDs as the Ids (or keys) for some content stored in the database. My application first reads the list of all such Ids from the database and then reads the corresponding content. My idea is to store some extra information about the content in the Ids themselves, so that my software can get at this meta-content without reading the entire content from the database again.
My application context is a website using Java technology and Cassandra database.
So my questions are:
Should I do this? I am concerned that a lot of processing may be required (at the time the data is presented to the user) to retrieve the meta-data from the Ids of the content. It may therefore be better to retrieve it from the database than to get it by processing the Id.
If it is advisable, how should I implement it efficiently? I was thinking of the following:
Id of a content = 'Timebased UUID' + 'UserId'
where 'timebasedUUID' is the Id generated from the timestamp at which the content was added by a user, and 'userId' is the Id of the user who added that content.
So my example Id would look something like this: e4c0b9c0-a633-15a0-ac78-001b38952a49(TimeUUID) -- ff7405dacd2b(UserId)
How should I extract the userId from the above Id in the most efficient manner?
Is there a better approach to storing meta information in the Ids?
I hate to say it since you seem to have put a lot of thought into this but I would say this is not advisable. Storing data like this sounds like a good idea at first but ends up causing problems because you will have many unexpected issues reading and saving the data. It's best to keep separate data as separate variables and columns.
If you are really interested in accessing the meta-content without the main content, I would make two column families. One family holds the meta-content and the other the larger main content, and both share the same ID key. I don't know much about Cassandra, but this seems to be the recommended way to do this sort of thing.
I should note that I don't think any of this will be necessary. Unless the users are storing very large amounts of information, the size of the content should be trivial and your retrievals should remain quick.
I agree with AmaDaden. Mixing IDs and data is the first step on a path that leads to a world of suffering. In particular, you will eventually find a situation where the business logic requires the data part to change and the database logic requires the ID not to change. Off the cuff, in your example, there might suddenly be a requirement for a user to be able to merge two accounts to a single user id. If user id is just data, this should be a trivial update. If it's part of the ID, you need to find and update all references to that id.

How to identify duplicate items gathered from multiple feeds and link to them in a Database

I have a database storing details of products which are taken from many sites and gathered through the individual sites' APIs. When I call a feed, the details are stored in a database table.
The problem I'm having is that because the exact same product is listed on many sites by the seller I end up having duplicate items in my database, and then when I display them on a web page there are many duplicates.
The problem is that the item doesn't have any obvious unique identifier, it has specific details of the item (of which there could be many), and then a description of the item from the seller.
What I would like is for the item to show up once, and then give the user details of where else the item is listed.
How would I identify the duplicates that have come in, without slowing down the entire database? How would I then pick one advert from all the duplicates, and store which other sites the advert is displayed on?
Thanks for any help.
The problem is two-fold, and both parts are on your side. Once you figure out how to deal with them, writing the code into a program (Java or SQL) will be easy. I'll name them first and then identify the solutions.
For some unknown reason, you have assumed that collecting product descriptions from multiple sites will not collect the same product twice.
You are used to the common and nonsensical Id column, which is fine when you are working with spreadsheets or prototyping functionality, but it is nowhere near what is required for a database or for Development-level functionality. Your users (or boss) have naturally expected database capability from the database, and you did not provide any. (And no, it does not require fuzzy string logic or magic of any kind.)
Solution
This is a condensed version of the IDEF1X Standard for modelling Relational Databases; the portion re Identifiers.
You need to think in database terms, and think about the database tables you need to perform your function, which means you are not allowed to use an auto-increment Id column. That column gives a spreadsheet a RowId, but it does not imply anything about the content of the table, or the columns that identify a product.
And you cannot simply rip data off another website, you need to think about what your website requires for products. What does your company understand a product to be, and how does it identify a product ?
Identify all the columns and datatypes for the columns.
Identify which columns are mandatory and which are optional.
Identify which are strong Identifiers. E.g. Manufacturer and Model; the short Product Name, not the long Description (or maybe, for your company, the long description is an Identifier). Work with your users, and work that out.
You will find you actually have a small cluster of tables around Product, such as Manufacturer, ProductType, perhaps Vendor, etc.
Organise those tables, and Normalise them, so that you are not duplicating data.
Make sure you treat those Identifiers with a bit of respect. Choose which will be unique. Those are Candidate Keys. You need at least one per table, and there will be more than one in Product. All the Identifiers that will be searched on will need to be indexed (Unique or not). Note that the columns of a Candidate Key cannot be nullable, so you cannot choose an optional column.
What makes a single Unique Identifier for Product may not be a single column. That's ok, we can evaluate multiple columns for keys in databases; they are called Compound Keys.
Take the best, most stable (one which will not change) Unique Identifier, one of the Candidate Keys, and make that the Primary Key.
If, and only if, the Unique Identifier, the Primary Key, which may be a Compound Key, is very long, and therefore unsuitable for a Primary Key, which is migrated to the child tables, then add a Surrogate Key. That will be the Id column. Note that that is an additional column and additional Index. It is not a substitute for the Identifiers of Product, the Candidate Keys; they cannot be removed.
So far we have a Product database on your company's side of the web that is meaningful to it. Now we are in a position to evaluate products from the other side of the web; and when we do, we have a strong framework on our side against which we can measure the rubbish that we get from the other side.
Feeds
You need a WebSite table to manage the feeds.
There will be an Associative table (many-to-many) between Product and WebSite. Let's call it ProductSite. It will contain only our ProductId and the WebSiteCode. It may contain Price. The contents are valid for a single feed cycle.
Load each feed into a staging database or schema, an incoming ProductIn table, maybe one per source website. This is just the flat file from the external source. Add a column IsValid and set the Default to true.
Then write some SQL that compares that ProductIn table, with its loose and floppy contents, with our Product table with its strong Identifiers.
The way I would do it is in several waves of separate checks, each marking the rows that fail by setting IsValid to false. At the end, Insert the rows that are still IsValid into our ProductSite.
You might be lucky, and get away with an optimistic approach. That is, as long as you find a match on a few important columns, the match is valid (reverse the Default and the update of the IsValid boolean).
This is the proc that will require some back-and-forth work until it settles down. That is why you need to work with your users re the Identifiers. The goal is to exclude no external products, but your starting point will exclude many. The work will include going back to our Product table and improving the content (values in the rows) of the Identifiers, and of other relevant columns that you use to identify matching rows.
Repeat for each WebSite.
Now populate our website from our Product table, using information that we are confident about, and show which sites have the product for sale from ProductSite.
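The matching step above would normally be SQL against the staging table, but the idea can be sketched in Java. Everything here is an assumption for illustration: the class FeedMatcher, the choice of Manufacturer and Model as the compound key, the normalisation rule, and the row layout {siteCode, manufacturer, model}.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical matcher: compares incoming feed rows with our known product keys.
final class FeedMatcher {

    // Compound key from the identifiers assumed here: manufacturer and model.
    static List<String> key(String manufacturer, String model) {
        return Arrays.asList(normalize(manufacturer), normalize(model));
    }

    // Trim, collapse whitespace and lower-case so trivial differences do not matter.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Each feed row is {siteCode, manufacturer, model}. Returns, per known product,
    // the sites that list it; rows that match no known product are skipped.
    static Map<List<String>, Set<String>> sitesByProduct(
            Set<List<String>> knownProducts, List<String[]> feedRows) {
        Map<List<String>, Set<String>> result = new LinkedHashMap<>();
        for (String[] row : feedRows) {
            List<String> k = key(row[1], row[2]);
            if (!knownProducts.contains(k)) {
                continue; // fails the check, so it stays out of ProductSite
            }
            result.computeIfAbsent(k, unused -> new TreeSet<>()).add(row[0]);
        }
        return result;
    }
}
```

Each entry of the result corresponds to one product shown once on the page, with the set of sites standing in for its ProductSite rows.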
I don't think this is a code or database problem (yet). You say:
The problem is that the item doesn't have any obvious unique identifier
You need to work out what that uniqueness is before you can ask a computer to find it for you. It sounds like you need some sort of fuzzy string-similarity algorithm.
Some examples of data that you consider duplicates might help.

Designing Unique Keys(Primary Keys) for a heavily denormalized NoSQL database

I am working on a web application related to Discussion forums using Java and Cassandra database.
I need to construct 'keys' for the rows storing the users' details and for another set of rows storing the content posted by the users.
One option is to use the randomly generated UUIDs provided by Java, but these are 16 bytes long, and since a NoSQL database involves heavy denormalization, I am concerned that I would be wasting a lot of disk space, RAM and other resources if the keys could be generated in a smaller size.
I need to generate two types of keys, one for the Users and the other for the Content Posted by Users.
For the content posted by users, would timestamp+userId be a good key, where timestamp is the server time at which the content was posted and userId refers to the key of the user row?
Any suggestions, comments appreciated ..
Thanks
Marcos
Is this a distributed application?
Then you could use a simple synchronized counter and initialize it on startup with the next available id.
On the other hand, a database should be able to handle the UUIDs generated by Java.
This is a standard for creating things like sessionIds, that need to be unique.
Your problem is somewhat similar since a session in your context would represent a set of user input.
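A sketch of the counter idea, assuming a single JVM hands out all the ids and that the next free id has already been read from the database at startup (that query is not shown). The class name IdCounter is made up, and AtomicLong is used here in place of a synchronized block.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-process id source; only safe when one JVM issues all ids.
final class IdCounter {
    private final AtomicLong next;

    // firstFreeId is the next available id, looked up once at startup.
    IdCounter(long firstFreeId) {
        this.next = new AtomicLong(firstFreeId);
    }

    // Hand out the current value and advance; safe to call from many threads.
    long nextId() {
        return next.getAndIncrement();
    }
}
```

With several application servers this no longer gives unique ids on its own, which is where UUIDs come back in.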
