Convert string to specific int - java

I am creating a foreground notification with an ID like so:
startForeground(1, notification)
When initialising the service, I send it a string (e.g. "Hello"). I want the service and the notification to be bound to this string, so I want to use it as my ID. How can I convert a string to a unique ID? For example, the word "Hello" should always generate 123 and the word "Bye" should always generate 456.

That sounds like you want a "hash code": a value derived from some other information that is (hopefully, but not always) unique.
There are many algorithms for this, and if you search for "hash code" you will find lots of them (especially in the security domain: SHA, MD5, etc.).
However, it sounds like you may not need anything that complex (some of the more secure and "unique" hash algorithms can be slow to compute).
Is there any reason why you can't use the string itself?
String comparison may be slow, but likely not as slow as computing a good hash. You might also be able to use a hash table (e.g. a HashMap) if you need faster lookup.
Anyway, if you really do need a hash code from a string, a quick search found this (which looks reasonable): Sam Clarke, "Kotlin Hash Strings".
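In Java specifically, String.hashCode() already gives a deterministic int: its formula is fixed by the String Javadoc, so "Hello" yields the same value on every run and every JVM. As noted above, it is not unique ("Aa" and "BB" both give 2112). A minimal sketch with a hypothetical helper name, idFor; the remapping of 0 assumes Android's documented rule that the startForeground ID must not be 0:

```java
import java.util.Objects;

// Hypothetical helper: derives a stable notification ID from a string key.
final class NotificationIds {
    private NotificationIds() {}

    // String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1],
    // so the same string always gives the same int on every run and JVM.
    // It is NOT unique: different strings can collide (e.g. "Aa" and "BB").
    static int idFor(String key) {
        int id = Objects.requireNonNull(key, "key").hashCode();
        // startForeground() requires a non-zero ID, so remap that one value.
        return id == 0 ? 1 : id;
    }

    public static void main(String[] args) {
        System.out.println(idFor("Hello"));             // 69609650
        System.out.println(idFor("Bye"));               // 67278
        System.out.println(idFor("Aa") == idFor("BB")); // true: a collision
    }
}
```

Calling startForeground(NotificationIds.idFor(text), notification) would then reuse the same ID for the same string; if two strings ever collide, their notifications would share an ID.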

Related

Is it possible to retrieve the original value of hashed value by offering the partial search mechanism?

Info: using SHA-512 with a salt.
I am hashing some sensitive values to support a search mechanism, since decrypting the values at run time would be time-consuming.
For now, things look good. I have the hashes of the original values stored in the DB, hashed with a defined salt. Whenever a user searches, the input value is hashed with the same salt and I simply compare the two hashes. That is sufficient for my needs.
But now I want to offer partial search. So, if I have the hash values of "Hello", "Hi" and "Howdy" stored and the user enters "H", all three values should be matched and retrieved.
Is it possible to obtain this functionality?
Any help would be appreciated.
Thanks

If I understand you correctly, that is not possible.
Let me clarify: you want to know whether, from entering "H" in the search field, you can tell that its hash is related to the hashes of "Hi", "Howdy" and "Hello". Right?
If that is the case, it is not possible, because a characteristic of a secure hash function is that changing even a single bit of the input significantly changes the entire resulting hash value.
This property is called the avalanche effect. Combined with pre-image resistance (you cannot work back from a hash to its input), it means the stored hashes carry no usable information about shared prefixes.
More information on hashes can be found, for example, here.
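A small sketch that makes the point concrete, assuming the salt is simply fed to the digest before the UTF-8 bytes of the value (the question does not say how the salt is applied, and the salt below is made up). Exact matching works because the same input and salt always give the same digest, but nothing in the digest of "H" relates to the digests of "Hello", "Hi" or "Howdy":

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: salted SHA-512 as described in the question (salt handling is assumed).
final class SaltedHashDemo {
    private SaltedHashDemo() {}

    static String saltedSha512Hex(String value, byte[] salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            md.update(salt);                                          // salt first...
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8)); // ...then the value
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();                                    // 128 hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = "example-salt".getBytes(StandardCharsets.UTF_8); // hypothetical salt
        // Exact match works: same input + same salt -> same hash.
        System.out.println(saltedSha512Hex("Hello", salt).equals(saltedSha512Hex("Hello", salt)));
        // Prefix search does not: compare the leading characters of each hash.
        for (String s : new String[] {"H", "Hello", "Hi", "Howdy"}) {
            System.out.println(s + " -> " + saltedSha512Hex(s, salt).substring(0, 16));
        }
    }
}
```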

Generate password reset token through UUID

I am looking for a mechanism to generate a random, unique alphanumeric key for resetting a user's password.
I've googled a lot in this direction, but it looks like this is not an obvious thing.
I've tried something like this:
new String(encodeBase64URLSafe(UUID.randomUUID()));
But after reading the following article, "Is UUID.randomUUID() suitable for use as a one-time password?", it looks like this approach is not fully correct.
I would really appreciate answers to the following questions:
What is a secure way to generate such a token using UUID?
Do we need to convert the UUID string to Base64 in order to have safe URLs, or is it enough to remove the dashes from the generated string?
Would it be correct to use the mechanism from this link, "How to generate a random alpha-numeric string?", for such purposes, and why?

Using a UUID is safe and secure. The linked article just says that a UUID may be a little too much for this kind of security. But well... if you are "too" secure, no one will blame you.
A UUID consists only of hexadecimal digits and dashes. So if you need to put it in a query string or a URL, there is nothing to escape. You can remove the dashes to save some space if you want, but it is not required.
The mechanism from that link is secure too. Both (UUID and that one) will work.
For this kind of security, all you have to do is ensure that your token is randomly generated (at least in part) so that it cannot be guessed.
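A sketch of the options discussed, using java.util.Base64 (Java 8+) instead of the Commons Codec encodeBase64URLSafe from the question; the class and method names are illustrative. Note that the question's snippet passes the UUID object itself, whereas Base64 encoders take a byte[]:

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.UUID;

// Sketch of three ways to build a password-reset token (names are illustrative).
final class ResetTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    private ResetTokens() {}

    // UUID.randomUUID() uses a cryptographically strong pseudo-random generator;
    // removing the dashes leaves 32 lowercase hex characters.
    static String uuidHex() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    // The question's idea, fixed: encode the UUID's 16 raw bytes (not the UUID object)
    // with the URL-safe Base64 alphabet and no '=' padding -> 22 characters.
    static String uuidBase64Url() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Alternative without UUID: 32 bytes (256 bits) straight from SecureRandom -> 43 characters.
    static String randomToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

All three outputs contain only letters, digits, '-' and '_', so they can go into a URL without escaping.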

Using an MD5 Hash as an index

I am writing a MongoDB collection that contains a specific set of data, and I want to run comparisons against that data by taking an MD5 (or maybe SHA-256) hash of the data and basing the comparisons on that.
I was wondering whether using a fixed-length string of hex digits is the right way to do this. Is there a better data type to hold the values, such as a "blob" or even a 64-bit long integer? (This may require a hash function that produces longs; I don't know of one, except maybe overriding Java's .hashCode() method with Eclipse?)
If there is a better way entirely, advice on best practice would be appreciated!

Storing MD5 Hashes in MongoDB
You have to use String or Binary (half the size) if you decide to store an MD5 hash (see here).
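To make the size trade-off concrete, here are the three forms the question mentions, computed with the standard MessageDigest API. The class and method names are made up, and how the byte[] is handed to MongoDB depends on the driver, which is not shown:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: the same MD5 digest as raw bytes, as hex text, and truncated to a long.
final class Md5Forms {
    private Md5Forms() {}

    // Raw digest: 16 bytes, the form a Binary field would hold.
    static byte[] md5(String data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hex string: 32 characters, i.e. twice the size of the raw bytes.
    static String md5Hex(String data) {
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md5(data)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // A 64-bit long from the first 8 bytes (big-endian). Compact, but it keeps
    // only half of the digest, so collisions become more likely than with 128 bits.
    static long md5Prefix64(String data) {
        return ByteBuffer.wrap(md5(data)).getLong();
    }
}
```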
Best Hash Function
This is tough to answer, since it depends heavily on the kind of data in your collection. Personally, I think MD5 hashes are a good way, but again it depends on the use case. If you want to customize/optimize your hash, this post and this post might get you started; they cover some simple recipes for writing a custom hash function.

Generating Unique Hash for URL crawled by crawler

I am implementing a crawler and want to generate a unique hash code for every URL crawled by my system. This will help me check for duplicate URLs, since matching complete URLs can be expensive. The crawler will crawl millions of pages daily, so the output of this hash function should be unique.

Unless you know every address ahead of time, and there happens to be a perfect hash for said set of addresses, this task is theoretically impossible.
By the pigeonhole principle, there must exist at least two strings that map to the same Integer value no matter what conversion technique you use, since Integers have a finite range and strings do not. While addresses in reality are not infinitely long, you are still going to get multiple addresses that map to the same hash value. In theory, there are infinitely many strings that map to the same Integer value.
So, in conclusion, you should probably just use a standard HashMap.
Additionally, you need to worry about the following:
www.stackoverflow.com
http://www.stackoverflow.com
http://stackoverflow.com
stackoverflow.com
...
which are all equivalent, so you would need to normalize first, then hash. While there are some algorithms that, given the set up front, will generate a perfect hash, I doubt that is necessary for your purposes.
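A sketch of "normalize first, then hash", assuming a deliberately simple, hypothetical normalize() that only handles the cases listed above. Letting a HashSet do the hashing means collisions are resolved by equals(), so two distinct URLs are never mistaken for each other:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch: normalize first, then let a HashSet handle hashing and collisions.
final class VisitedUrls {
    private final Set<String> seen = new HashSet<>();

    // Hypothetical, deliberately simple normalization: drops the fragment,
    // an http:// or https:// scheme, a leading "www." and a trailing "/",
    // and lowercases only the host (paths can be case-sensitive).
    static String normalize(String url) {
        String u = url.trim();
        int fragment = u.indexOf('#');
        if (fragment >= 0) u = u.substring(0, fragment);
        String lower = u.toLowerCase(Locale.ROOT);
        if (lower.startsWith("https://")) u = u.substring("https://".length());
        else if (lower.startsWith("http://")) u = u.substring("http://".length());
        if (u.toLowerCase(Locale.ROOT).startsWith("www.")) u = u.substring("www.".length());
        int slash = u.indexOf('/');
        String host = (slash >= 0 ? u.substring(0, slash) : u).toLowerCase(Locale.ROOT);
        String rest = slash >= 0 ? u.substring(slash) : "";
        if (rest.endsWith("/")) rest = rest.substring(0, rest.length() - 1);
        return host + rest;
    }

    // Returns true the first time a (normalized) URL is seen, false for duplicates.
    // HashSet compares with equals() after hashing, so URLs whose hash codes
    // happen to collide are still kept apart.
    boolean markVisited(String url) {
        return seen.add(normalize(url));
    }
}
```

For millions of pages a day, keeping every URL string in memory may become costly; that is the situation the Bloom filter suggestion below is aimed at.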

I think the solution is to normalize URLs first by removing prefixes like http:// or http://www. from the beginning and suffixes like /, ?... or #... from the end.
After this cleaning, you should have a clean domain URL, and you can hash it.
But the best solution is to use a Bloom filter (a probabilistic data structure), which can tell you whether the URL was probably visited or definitely not visited.
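A minimal Bloom filter sketch. It assumes k bit positions derived by double hashing (h1 + i*h2) from two 32-bit chunks of a SHA-256 digest, which is one common construction rather than anything the answer specifies; the class name is hypothetical and the sizes are illustrative (in practice they are chosen from the expected number of URLs and the acceptable false-positive rate). URLs should be normalized before being added:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Sketch: a fixed-size Bloom filter over (already normalized) URL strings.
final class UrlBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UrlBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive numHashes bit positions from one SHA-256 digest via double hashing.
    private int[] indexes(String url) {
        byte[] d;
        try {
            d = MessageDigest.getInstance("SHA-256").digest(url.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        ByteBuffer buf = ByteBuffer.wrap(d);
        int h1 = buf.getInt();
        int h2 = buf.getInt();
        int[] result = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            result[i] = Math.floorMod(h1 + i * h2, numBits); // floorMod keeps it non-negative
        }
        return result;
    }

    void add(String url) {
        for (int idx : indexes(url)) bits.set(idx);
    }

    // false = definitely never added; true = probably added (false positives possible).
    boolean mightContain(String url) {
        for (int idx : indexes(url)) {
            if (!bits.get(idx)) return false;
        }
        return true;
    }
}
```

mightContain() never returns false for a URL that was added; a true result only means "probably visited". Libraries such as Guava also ship a ready-made BloomFilter.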

How to best represent Constants (Enums) in the Database (INT vs VARCHAR)?

What is the best solution, in terms of performance and readability/good coding style, for representing a (Java) enumeration (a fixed set of constants) on the DB layer: an integer (or any numeric data type in general) or a string representation?
Caveat: some database systems support enums directly, but this would require keeping the database enum definition in sync with the business-layer implementation. Furthermore, this kind of data type might not be available on all database systems and might differ in syntax, so I am looking for a simple solution that is easy to manage and available on all database systems. (So my question only addresses the number vs. string representation.)
The number representation of a constant seems very efficient to store (for example, it consumes only two bytes as a small integer) and is most likely very fast in terms of indexing, but it is hard to read ("0" vs. "1" etc.).
The string representation is more readable (storing "enabled" and "disabled" rather than "0" and "1"), but consumes much more storage space and is most likely also slower in terms of indexing.
My question is: did I miss some important aspects? What would you suggest for representing an enum on the database layer?
Thank you very much!

In most cases, I prefer to use a short alphanumeric code and then have a lookup table with the expanded text. When necessary, I build the enum table in the program dynamically from the database table.
For example, suppose we have a field that is supposed to contain, say, transaction type, and the possible values are Sale, Return, Service, and Layaway. I'd create a transaction type table with code and description, make the codes maybe "SA", "RE", "SV", and "LY", and use the code field as the primary key. Then in each transaction record I'd post that code. This takes less space than an integer key in the record itself and in the index. Exactly how it is processed depends on the database engine, but it shouldn't be dramatically less efficient than an integer key. And because it's mnemonic, it's very easy to use. You can dump a record and easily see what the values are and likely remember which is which. You can display the codes without translation in user output and the users can make sense of them. Indeed, this can give you a performance gain over integer keys: in many cases the abbreviation is good for the users (they often want abbreviations to keep displays compact and avoid scrolling), so you don't need to join on the transaction type table to get a translation.
I would definitely NOT store a long text value in every record. In this example, I would not want to dispense with the transaction type table and store "Layaway". Not only is this inefficient, but it is quite possible that someday the users will say they want it changed to "Layaway sale", or even some subtle difference like "Lay-away". Then you not only have to update every record in the database, but you also have to search through the program for every place this text occurs and change it. Also, the longer the text, the more likely that somewhere along the line a programmer will misspell it and create obscure bugs.
Also, having a transaction type table provides a convenient place to store additional information about the transaction type. Never ever ever write code that says "if whatevercode='A' or whatevercode='C' or whatevercode='X' then ..." Whatever it is that makes those three codes different from all other codes, put a field for it in the transaction type table and test that field. If you say, "Well, those are all the tax-related codes" or whatever, then fine: create a field called "tax_related" and set it to true or false for each code value as appropriate. Otherwise, when someone creates a new transaction type, they have to look through all those if/or lists and figure out which ones this type should be added to and which it shouldn't. I've read plenty of baffling programs where I had to figure out why some logic applied to these three code values but not others, and when you think a fourth value ought to be included in the list, it's very hard to tell whether it is missing because it really is different in some way or because the programmer made a mistake.
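A sketch of the same idea on the Java side, assuming the lookup table's rows are mirrored in an enum (hard-coded here only to keep it self-contained; the approach above builds them from the database table). The codes come from the example; the tax_related values are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-program mirror of the transaction type lookup table:
// code (primary key), description, plus one attribute column.
enum TransactionType {
    SALE("SA", "Sale", true),
    RETURN("RE", "Return", true),
    SERVICE("SV", "Service", false),
    LAYAWAY("LY", "Layaway", false);

    // Index by code, filled once after all constants exist.
    private static final Map<String, TransactionType> BY_CODE = new HashMap<>();
    static {
        for (TransactionType t : values()) BY_CODE.put(t.code, t);
    }

    private final String code;
    private final String description;
    private final boolean taxRelated; // illustrative flag; values are made up

    TransactionType(String code, String description, boolean taxRelated) {
        this.code = code;
        this.description = description;
        this.taxRelated = taxRelated;
    }

    String code() { return code; }
    String description() { return description; }
    boolean isTaxRelated() { return taxRelated; }

    // Translate a stored code back to its type; unknown codes fail loudly.
    static TransactionType fromCode(String code) {
        TransactionType t = BY_CODE.get(code);
        if (t == null) throw new IllegalArgumentException("Unknown transaction type code: " + code);
        return t;
    }
}
```

Logic then asks type.isTaxRelated() instead of comparing codes in an if/or chain, so a new type only needs its own row filled in.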
The only time I don't create a translation table is when the list is very short, there is no additional data to keep, and it is clear from the nature of the universe that it is unlikely ever to change, so the values can be safely hard-coded: true/false, positive/negative/zero, or male/female. (And hey, even that last one, obvious as it seems, there are people insisting we now include "transgendered" and the like.)
Some people dogmatically insist that every table have an auto-generated sequential integer key. Such keys are an excellent choice in many cases, but for code lists, I prefer the short alpha key for the reasons stated above.

I would store the string representation, as it is easy to correlate back to the enum and much more stable. Using ordinal() would be bad because the ordinal changes if you add a new constant in the middle of the enum, so you would have to implement your own numbering system.
In terms of performance, it all depends on what the enums are used for, but it is most likely premature optimization to develop a whole separate representation with conversion rather than just use the natural String representation.
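A minimal sketch of storing the name rather than the ordinal (the enum is hypothetical). name() and valueOf() are available on every Java enum:

```java
// Sketch: persist an enum by its name() rather than its ordinal().
enum AccountStatus {
    ENABLED,
    DISABLED;

    // Value written to a VARCHAR column.
    String toDbValue() {
        return name();
    }

    // Value read back from the column; valueOf() throws
    // IllegalArgumentException for a string that matches no constant.
    static AccountStatus fromDbValue(String value) {
        return valueOf(value);
    }
}
```

If PENDING were later inserted between the two constants, DISABLED.ordinal() would change from 1 to 2 while its name stayed the same. The trade-off is that renaming a constant now breaks stored rows.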
