How to generate strings that share the same hashcode in Java?

An existing system written in Java uses the hashcode of a string as its routing strategy for load balancing.
Now, I cannot modify the system but need to generate strings that share the same hashcode to test the worst condition.
I provide those strings from commandline and hope the system will route all these strings into the same destination.
Is it possible to generate a large number of strings that share the same hashcode?
To make this question clear:
String[] getStringsInSameHashCode(int number){
//return an array in length "number"
//Every element of the array share the same hashcode.
//The element should be different from each other
}
Remarks: Any hashCode value is acceptable. There is no constraint on what the string is. But they should be different from each other.
EDIT:
Overriding methods of the String class is not acceptable because I feed those strings in from the command line.
Instrumentation is also not acceptable because it would affect the system.

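See the test method below. Basically, two 2-character strings collide as long as
a1*31 + b1 = a2*31 + b2, which means (a1 - a2)*31 = b2 - b1. (The hash(...) helper is not shown; judging by the output, it appears to apply a supplemental hash on top of hashCode(), like the one older HashMap implementations used.)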
public void testHash()
{
System.out.println("A:" + ((int)'A'));
System.out.println("B:" + ((int)'B'));
System.out.println("a:" + ((int)'a'));
System.out.println(hash("Aa".hashCode()));
System.out.println(hash("BB".hashCode()));
System.out.println(hash("Aa".hashCode()));
System.out.println(hash("BB".hashCode()));
System.out.println(hash("AaAa".hashCode()));
System.out.println(hash("BBBB".hashCode()));
System.out.println(hash("AaBB".hashCode()));
System.out.println(hash("BBAa".hashCode()));
}
you will get
A:65
B:66
a:97
2260
2260
2260
2260
2019172
2019172
2019172
2019172
Edit: someone said this is not straightforward enough, so I added the part below.
@Test
public void testN() throws Exception {
List<String> l = HashCUtil.generateN(3);
for(int i = 0; i < l.size(); ++i){
System.out.println(l.get(i) + "---" + l.get(i).hashCode());
}
}
AaAaAa---1952508096
AaAaBB---1952508096
AaBBAa---1952508096
AaBBBB---1952508096
BBAaAa---1952508096
BBAaBB---1952508096
BBBBAa---1952508096
BBBBBB---1952508096
Below is the source code. It may not be efficient, but it works:
public class HashCUtil {
private static String[] base = new String[] {"Aa", "BB"};
public static List<String> generateN(int n)
{
if(n <= 0)
{
return null;
}
List<String> list = generateOne(null);
for(int i = 1; i < n; ++i)
{
list = generateOne(list);
}
return list;
}
public static List<String> generateOne(List<String> strList)
{
if((null == strList) || (0 == strList.size()))
{
strList = new ArrayList<String>();
for(int i = 0; i < base.length; ++i)
{
strList.add(base[i]);
}
return strList;
}
List<String> result = new ArrayList<String>();
for(int i = 0; i < base.length; ++i)
{
for(String str: strList)
{
result.add(base[i] + str);
}
}
return result;
}
}
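The question asks for an arbitrary count of strings rather than a power of two. A minimal sketch of that, assuming the same "Aa"/"BB" building blocks as above: string number i is built from the bits of i, emitting "BB" for a 1 bit and "Aa" for a 0 bit. The class name is hypothetical; the method name is borrowed from the question.

```java
final class SameHashStrings {
    // Returns `number` distinct strings that all share one hashCode.
    static String[] getStringsInSameHashCode(int number) {
        if (number <= 0) {
            return new String[0];
        }
        // Number of two-character blocks needed so that 2^blocks >= number.
        int blocks = Math.max(1, 32 - Integer.numberOfLeadingZeros(number - 1));
        String[] result = new String[number];
        for (int i = 0; i < number; i++) {
            StringBuilder sb = new StringBuilder(2 * blocks);
            // Read the bits of i from most to least significant.
            for (int bit = blocks - 1; bit >= 0; bit--) {
                sb.append(((i >> bit) & 1) == 0 ? "Aa" : "BB");
            }
            result[i] = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : getStringsInSameHashCode(5)) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

Every result has the same length, so swapping "Aa" for "BB" in any block leaves the hash unchanged, and distinct values of i give distinct strings.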
look at String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}

I think finding an equal-hash string for a long string is too hard, but it is easy for a short string (2 or 3 characters).
Look at the equation below. (Sorry, I can't post an image because I'm a new member.)
Notice that "FB" and "Ea" have the same hashcode, and any two strings like s1+"FB"+s2 and s1+"Ea"+s2 will have the same hashcode.
So the easy solution is to find any 2-char substring of the existing string and replace it with a 2-char substring that has the same hashcode.
Example: we have the string "helloworld".
Take the 2-char substring "he": hashcode("he") = 'h'*31 + 'e' = ('h'*31 + 31) + ('e' - 31) = ('h'+1)*31 + 'F' = 'i'*31 + 'F' = hashcode("iF")
so the desired string is "iFlloworld".
We have increased 'h' by 1; we can also increase it by 2, or 3, etc. (but the result will be wrong if it overflows the char range).
The code below works for small levels; it goes wrong when the level is large enough to push a char value out of range. I will fix it later if you want. (This code changes the first 2 chars, but I will edit it to change the last 2 chars, because the first 2 chars are multiplied by the largest powers of 31.)
public static String samehash(String s, int level) {
if (s.length() < 2)
return s;
String sub2 = s.substring(0, 2);
char c0 = sub2.charAt(0);
char c1 = sub2.charAt(1);
c0 = (char) (c0 + level);
c1 = (char) (c1 - 31 * level);
String newsub2 = new String(new char[] { c0, c1 });
String re = newsub2 + s.substring(2);
return re;
}
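The overflow problem mentioned above can be guarded against explicitly. This is a sketch, not the author's planned fix: a hypothetical `shiftPair` method that applies the same substitution to the pair at any index and rejects levels that would leave the char range.

```java
final class PairShift {
    // Replaces the chars at index and index + 1 with a pair that contributes
    // the same amount to String.hashCode(): (c0 + level, c1 - 31 * level).
    static String shiftPair(String s, int index, int level) {
        if (index < 0 || index + 1 >= s.length()) {
            throw new IllegalArgumentException("need two chars at index " + index);
        }
        // Use long arithmetic so a huge level cannot wrap back into range.
        long c0 = (long) s.charAt(index) + level;
        long c1 = s.charAt(index + 1) - 31L * level;
        if (c0 < Character.MIN_VALUE || c0 > Character.MAX_VALUE
                || c1 < Character.MIN_VALUE || c1 > Character.MAX_VALUE) {
            throw new IllegalArgumentException("level " + level + " leaves the char range");
        }
        return s.substring(0, index) + (char) c0 + (char) c1 + s.substring(index + 2);
    }

    public static void main(String[] args) {
        String original = "helloworld";
        String shifted = shiftPair(original, 0, 1);
        System.out.println(shifted + " " + (shifted.hashCode() == original.hashCode()));
    }
}
```

Note that the resulting characters are not guaranteed to be printable, which may matter when the strings are passed on a command line.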

I was wondering if there was a "universal" solution; e.g. some constant string XYZ, such that
s.hashCode() == (s + XYZ).hashCode()
for any string s. Finding such a string involves solving a fairly complicated equation ... which was beyond my rusty mathematical ability. But then it dawned on me that h == 31*h + ch is always true when h and ch are both zero!
Based on that insight, the following method should create a different String with the same hashcode as its argument:
public String collider(String s) {
return "\0" + s;
}
If NUL characters are problematic for you, prepending any string whose hashcode is zero would work too ... albeit that the colliding strings would be longer than if you used zero.

Given String X, then String Y = "\u0096\0\0ɪ\0ˬ" + X will share same hashcode with X.
Explanation:
String.hashCode() returns an int, and int arithmetic in Java wraps around modulo 2^32, so for every int X, X + 2 * (Integer.MAX_VALUE + 1) evaluates to X again. Here, Integer.MAX_VALUE = 2^31 - 1.
So we only need to find a String M whose unwrapped hashcode satisfies hashcode % (2 * (Integer.MAX_VALUE + 1)) = 0, i.e. is a multiple of 2^32.
I found "\u0096\0\0ɪ\0ˬ": the char value of \u0096 is 150, of \0 is 0, of ɪ is 618, and of ˬ is 748, so its hashcode is 150 * 31^5 + 618 * 31^2 + 748 = 2^32, which wraps to 0.
Which string you use is up to you; I picked this one.
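The arithmetic can be checked directly. The sketch below recomputes the hash polynomial in a long so the value before wrap-around is visible; the class and method names are hypothetical, and the prefix is written with Unicode escapes (U+026A for ɪ, U+02EC for ˬ).

```java
final class ZeroHashPrefix {
    // The six chars from the answer: 150, 0, 0, 618, 0, 748.
    static final String PREFIX = "\u0096\u0000\u0000\u026A\u0000\u02EC";

    // Same polynomial as String.hashCode(), but accumulated in a long so it
    // does not wrap for a string this short.
    static long unwrappedHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(unwrappedHash(PREFIX));   // 4294967296, i.e. 2^32
        System.out.println(PREFIX.hashCode());       // 0
        System.out.println((PREFIX + "abc").hashCode() == "abc".hashCode()); // true
    }
}
```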

You can instrument the java.lang.String class so its method hashCode() will always return the same number.
I suppose Javassist is the easiest way to do such an instrumentation.
In short:
obtain an instance of java.lang.instrument.Instrumentation by using a Java-agent (see package java.lang.instrument documentation for details)
redefine java.lang.String class by using Instrumentation.redefineClasses(ClassDefinition[]) method
The code will look like (roughly):
ClassPool classPool = new ClassPool(true);
CtClass stringClass = classPool.get("java.lang.String");
CtMethod hashCodeMethod = stringClass.getDeclaredMethod("hashCode", null);
hashCodeMethod.setBody("{return 0;}");
byte[] bytes = stringClass.toBytecode();
ClassDefinition[] classDefinitions = new ClassDefinition[] {new ClassDefinition(String.class, bytes)};
instrumentation.redefineClasses(classDefinitions);// this instrumentation can be obtained via Java-agent
Also don't forget that agent manifest file must specify Can-Redefine-Classes: true to be able to use redefineClasses(ClassDefinition[]) method.

String s = "Some String";
for (int i = 0; i < SOME_VERY_BIG_NUMBER; ++i) {
String copy = new String(s);
// Do something with copy.
}
Will this work for you? It just creates a lot of copies of the same String literal that you can then use in your testing.

Related

How to efficiently remove consecutive same characters in a string

I wrote a method to reduce a sequence of the same characters to a single character, as follows. Its logic seems correct, but according to my tutor there is room for improvement in terms of performance. Could anyone shed some light on this?
Comments on aspects other than performance are also much appreciated.
public class RemoveRepetitions {
public static String remove(String input) {
String ret = "";
String last = "";
String[] stringArray = input.split("");
for(int j=0; j < stringArray.length; j++) {
if (! last.equals(stringArray[j]) ) {
ret += stringArray[j];
}
last = stringArray[j];
}
return ret;
}
public static void main(String[] args) {
System.out.println(RemoveRepetitions.remove("foobaarrbuzz"));
}
}

We can improve the performance by using StringBuilder instead of String concatenation, since each += on a String copies the whole string built so far. The split call is also not required (and it makes the program slower as well).
Here is a way to solve this:
public static String remove(String input)
{
StringBuilder answer = new StringBuilder("");
int N = input.length();
int i = 0;
while (i < N)
{
char c = input.charAt(i);
answer.append( c );
while (i<N && input.charAt(i)==c)
++i;
}
return answer.toString();
}
The idea is to iterate over all characters of the input string and keep appending every new character to the answer and skip all the same consecutive characters.

Possible change which you could think of in your code is:
Time Complexity: Your code is achieving output in O(n) time complexity, which might be the best possible way.
Space Complexity: Your code is using extra memory space which arises due to splitting.
Question to ask: Can you achieve this output, without using the extra space for character array that you get after splitting the string? (as character by character traversal is possible directly on string).
I could provide the code here, but it would be great if you tried it on your own first. Once you are done with your attempts,
you can look up a solution here (you are almost there):
https://www.geeksforgeeks.org/remove-consecutive-duplicates-string/
Good luck!

As mentioned before, it is much better to access the characters in the string using method String::charAt or at least by iterating a char array retrieved with String::toCharArray instead of splitting the input string into String array.
However, Java strings may contain characters outside the Basic Multilingual Plane of Unicode (e.g. emojis 😂😍😊, some rarer Chinese or Japanese characters, etc.), and in that case String::codePointAt should be used. Correspondingly, Character.charCount should be used to calculate the appropriate offset while iterating over the input string.
Also the input string should be checked if it's null or empty, so the resulting code may look like this:
public static String dedup(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int prev = -1;
int n = str.length();
System.out.println("length = " + n + " of [" + str + "], real length: " + str.codePointCount(0, n));
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ) {
int cp = str.codePointAt(i);
if (i == 0 || cp != prev) {
sb.appendCodePoint(cp);
}
prev = cp;
i += Character.charCount(cp); // for emojis it returns 2
}
return sb.toString();
}
A version with String::charAt may look like this:
public static String dedup2(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int n = str.length();
StringBuilder sb = new StringBuilder(n);
sb.append(str.charAt(0));
for (int i = 1; i < n; i++) {
if (str.charAt(i) != str.charAt(i - 1)) {
sb.append(str.charAt(i));
}
}
return sb.toString();
}
The following test proves that charAt fails to deduplicate repeated emojis:
System.out.println("codePoint: " + dedup ("😂😂😍😍😊😊😂 hello"));
System.out.println("charAt: " + dedup2("😂😂😍😍😊😊😂 hello"));
Output:
length = 20 of [😂😂😍😍😊😊😂 hello], real length: 13
codePoint: 😂😍😊😂 helo
charAt: 😂😂😍😍😊😊😂 helo

Get all possible combinations of n booleans? [duplicate]

I tried to simplify the task as much as possible, so I could apply it to my algorithm.
And here is the challenge for mathematicians and programmers:
I need to create a method where I pass parameter int n:
public void optionality_generator(int n){
//some kind of loops, or recursions...to make it workable
System.out.println("current combination: ...");
}
The output should show all possible combinations of true's and false's.
Here are examples for N=1 through N=5, where x=false and 0=true. Please note that the empty lines are only there to make the patterns easier to recognise. Hopefully I have included all possible combinations:
Combination of 1:
0
x
Combination of 2:
00
x0
0x
xx
Combination of 3:
000
X00
0X0
00X
XX0
X0X
0XX
XXX
Combination of 4:
0000
X000
0X00
00X0
000X
XX00
X0X0
X00X
0XX0
0X0X
00XX
XXX0
XX0X
X0XX
0XXX
XXXX
Combination of 5:
00000
X0000
0X000
00X00
000X0
0000X
XX000
X0X00
X00X0
X000X
0XX00
0X0X0
0X00X
00XX0
00X0X
000XX
XXX00
XX0X0
XX00X
X0XX0
X0X0X
X00XX
0XXX0
0XX0X
0X0XX
00XXX
XXXX0
XXX0X
XX0XX
X0XXX
0XXXX
XXXXX
Also, looking at the output, here is a pattern I recognized: the list is mirrored around its midpoint (e.g. the first combination is 00000 and the last one is XXXXX; the second one is X0000 and the second to last is 0XXXX, etc.). Maybe this pattern will help make the whole algorithm more efficient; I'm not sure.
Thank you in advance!

Here is a really basic way using only Java APIs:
final int n = 3;
for (int i = 0; i < Math.pow(2, n); i++) {
String bin = Integer.toBinaryString(i);
while (bin.length() < n)
bin = "0" + bin;
System.out.println(bin);
}
Result:
000
001
010
011
100
101
110
111
Of course, you can set n to whatever you like. And, with this result, you can pick the nth character from the string as true/false.
If you only need to check if a bit is true, you don't need to convert it to a string. This is just to illustrate the output values.

Just a clue, but think about the bits that are set in a number with at most n bits. If you count from 0 up to the largest n-bit number (n = 3 in this case), the bit patterns are 000, 001, 010, 011, 100, 101, 110, 111. The maximum number that fits in n bits is 2^n - 1, which in Java can be written as (1 << n) - 1.
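As a rough illustration of that hint, the sketch below counts from 0 to 2^n - 1 and reads bit j of the counter as the value of boolean j. The class and method names are made up for this example, and n is capped at 30 so the count fits in an int.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BooleanCombinations {
    // Returns all 2^n assignments of n booleans; bit j of the counter
    // decides the value at position j.
    static List<boolean[]> combinations(int n) {
        if (n < 0 || n > 30) {
            throw new IllegalArgumentException("n must be between 0 and 30");
        }
        int count = 1 << n; // 2^n combinations
        List<boolean[]> result = new ArrayList<>(count);
        for (int mask = 0; mask < count; mask++) {
            boolean[] combo = new boolean[n];
            for (int j = 0; j < n; j++) {
                combo[j] = ((mask >> j) & 1) == 1;
            }
            result.add(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean[] combo : combinations(3)) {
            System.out.println(Arrays.toString(combo));
        }
    }
}
```

No strings are needed unless you want to print the combinations; the bit test alone tells you whether a given boolean is set.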

This should do the trick
int cols = 3;
int rows = (int) Math.pow(2, cols);
for (int row = 0; row < rows; row++)
System.out.println(String.format("%" + cols + "s",
Integer.toBinaryString(row)).replace(' ', '0').replace('1', 'X'));
out:
000
00X
0X0
0XX
X00
X0X
XX0
XXX

Using recursion is not as easy as using the Java Integer.toBinaryString() API for generating binary strings. But the code below gives you the flexibility to generate any base representation, e.g. base 3:
"000"
"001"
"002"
"010"
"011"
"012"
For base 2 (i.e. binary) strings, you call it like this:
getBinaryStrings(2, 3);
For base 3 strings, you call it like this:
getBinaryStrings(3, 3);
Here is the code:
public static List<String> getBinaryStrings(int base, int n){
ArrayList<String> result = new ArrayList<>();
getBinaryStringsCore(base, n, "", result);
return result;
}
private static void getBinaryStringsCore(int base, int n, String tempString, List<String> result){
if (tempString.length() == n) {
result.add(tempString);
return;
}
for (int i = 0; i < base; i++) {
tempString += i;
getBinaryStringsCore(base, n, tempString, result);
tempString = tempString.substring(0, tempString.length() - 1);
}
}

Here's a simple version implemented using recursion
public void optionality_generator(int n){
    ArrayList<String> strings = generatorHelper(n);
    for(String s : strings){
        System.out.println(s);
    }
}
private ArrayList<String> generatorHelper(int n){
    ArrayList<String> returnVal = new ArrayList<String>();
    if(n <= 0){
        return returnVal; // no booleans, no combinations
    }
    if(n == 1){
        returnVal.add("0");
        returnVal.add("X");
        return returnVal;
    }
    // Strings are immutable, so "s += ..." inside a for-each loop would not
    // change the list; add the extended strings to a new list instead.
    ArrayList<String> shorter = generatorHelper(n-1);
    for(String s : shorter){
        returnVal.add(s + "0");
    }
    for(String s : shorter){
        returnVal.add(s + "X");
    }
    return returnVal;
}

Here's a test-driven version:
import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;
public class OptionalityTest {
@Test
public void testOptionality0() throws Exception {
assertEquals("[]", optionality(0).toString());
}
@Test
public void testOptionality1() throws Exception {
assertEquals("[0, x]", optionality(1).toString());
}
@Test
public void testOptionality2() throws Exception {
assertEquals("[00, x0, 0x, xx]", optionality(2).toString());
}
@Test
public void testOptionality3() throws Exception {
assertEquals("[000, x00, 0x0, xx0, 00x, x0x, 0xx, xxx]", optionality(3).toString());
}
private List<String> optionality(int i) {
final ArrayList<String> list = new ArrayList<String>();
if (i == 1) {
list.add("0");
list.add("x");
}
if (i > 1) {
List<String> sublist = optionality(i - 1);
for (String s : sublist) {
list.add("0" + s);
list.add("x" + s);
}
}
return list;
}
}

Here is a modification of Eric's code above that uses C# and accepts any number of boolean variable names. It outputs all possible combinations as C# code, ready to insert into an if statement. Just edit the first line of code with your variable names, then run it in LINQPad to get text output.
Output example...
!VariableNameA && !VariableNameB && !VariableNameC
!VariableNameA && !VariableNameB && VariableNameC
!VariableNameA && VariableNameB && !VariableNameC
!VariableNameA && VariableNameB && VariableNameC
VariableNameA && !VariableNameB && !VariableNameC
VariableNameA && !VariableNameB && VariableNameC
VariableNameA && VariableNameB && !VariableNameC
VariableNameA && VariableNameB && VariableNameC
//To setup edit var names below
string[] varNames = { "VariableNameA", "VariableNameB", "VariableNameC" };
int n = varNames.Count();
for (int i = 0; i < Math.Pow(2, n); i++) {
String bin = Convert.ToString(i, 2);
while (bin.Length < n) {
bin = "0" + bin;
}
string and = " && ";
string f = "!";
string t = " ";
var currentNot = bin[0] == '0' ? f : t;
//string visual = bin[0].ToString();
string visual = currentNot + varNames[0];
for (var j = 1; j < n; j++) {
currentNot = bin[j] == '0' ? f : t;
//visual = visual + and + bin[j].ToString();
visual = visual + and + currentNot + varNames[j];
}
Console.WriteLine(visual);
}

How to extract the left most common characters in a string list?

Assume I have the following list of string objects:
ABC1, ABC2, ABC_Whatever
What's the most efficient way to extract the left most common characters from this list ? So I'd get ABC in my case.

StringUtils.getCommonPrefix(String... strs) from Apache Commons Lang.

This will work for you
public static void main(String args[]) {
String commonInFirstTwo=greatestCommon("ABC1","ABC2");
String commonInLastTwo=greatestCommon("ABC2","ABC_Whatever");
System.out.println(greatestCommon(commonInFirstTwo,commonInLastTwo));
}
public static String greatestCommon(String a, String b) {
int minLength = Math.min(a.length(), b.length());
for (int i = 0; i < minLength; i++) {
if (a.charAt(i) != b.charAt(i)) {
return a.substring(0, i);
}
}
return a.substring(0, minLength);
}

You hash all the substrings of the words in the given list and keep track of those substrings. The one with the maximum occurrences is the one you want. Here is a sample implementation. It returns the most common substring
static String mostCommon(List<String> list) {
Map<String, Integer> word2Freq = new HashMap<String, Integer>();
String maxFreqWord = null;
int maxFreq = 0;
for (String word : list) {
for (int i = 0; i < word.length(); ++i) {
String sub = word.substring(0, i + 1);
Integer f = word2Freq.get(sub);
if (f == null) {
f = 0;
}
word2Freq.put(sub, f + 1);
if (f + 1 > maxFreq) {
if (maxFreqWord == null || maxFreqWord.length() < sub.length()) {
maxFreq = f + 1;
maxFreqWord = sub;
}
}
}
}
return maxFreqWord;
}
The above implementation may not suffice if you have more than one common substring. In that case, use the map it builds.
System.out.println(mostCommon(Arrays.asList("ABC1", "ABC2", "ABC_Whatever")));
System.out.println(mostCommon(Arrays.asList("ABCDEFG1", "ABGG2", "ABC11_Whatever")));
Returns
ABC
AB

Your problem is just a rephrase of the standard problem of finding the longest common prefix

If you know what the common characters are, then you could check if the other strings contain those characters by using the .contains() method.

If you're willing to use a third party library, then the following using jOOλ generates that prefix for you:
String prefix = Seq.of("ABC1", "ABC2", "ABC_Whatever").commonPrefix();
Disclaimer: I work for the company behind jOOλ

If there are N strings and the minimum length among them is M characters, then the most efficient (correct) answer takes N * M character comparisons in the worst case (when all strings are the same).
outer loop - each character of the first string, one at a time
inner loop - each of the strings
test - the character of the string in the inner
loop against the character in the outer loop.
The performance can be tuned to (N-1) * M if we do not test against the first string in the inner loop.
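The loop structure described above might look like the following sketch. The class and method names are hypothetical, and the inner loop starts at the second string, which is the (N-1) * M variant.

```java
final class CommonPrefix {
    // Returns the longest prefix shared by all given strings.
    static String commonPrefix(String... strings) {
        if (strings.length == 0) {
            return "";
        }
        String first = strings[0];
        // Outer loop: one character of the first string at a time.
        for (int i = 0; i < first.length(); i++) {
            char c = first.charAt(i);
            // Inner loop: every other string (the first one is skipped).
            for (int k = 1; k < strings.length; k++) {
                if (i >= strings[k].length() || strings[k].charAt(i) != c) {
                    return first.substring(0, i); // first mismatch ends the prefix
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(commonPrefix("ABC1", "ABC2", "ABC_Whatever"));
    }
}
```

The method returns as soon as one string runs out or disagrees, so no more than M positions are ever inspected.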

A short as possible unique ID

I'm making a tool for optimizing scripts, and now I want to compress all names in them to the minimum length.
I have started on a function for it, but it is somehow buggy and stops working once the length exceeds 2.
Is there an easier way to do this? I just need a pattern that generates Strings going from a -> z, then aa -> az, ba -> bz, and so on.
public String getToken() {
String result = ""; int i = 0;
while(i < length){
result = result + charmap.substring(positions[i], positions[i]+1);
positions[length]++;
if (positions[current] >= charmap.length()){
positions[current] = 0;
if ( current < 1 ) {
current++;length++;
}else{
int i2 = current-1;
while( i2 > -1 ){
positions[i2]++;
if(positions[i2] < charmap.length()){
break;
}else if( i2 > 0 ){
positions[i2] = 0;
}else{
positions[i2] = 0;
length++;current++;
}
i2--;
}
}
}
i++;
}
return result;
}
UNLIKE THE OTHER QUESTIONS!! I don't just want to increase an integer; the length increases too much.

Here's one I used
public class AsciiID {
private static final String alphabet=
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private int currentId;
public String nextId() {
int id = currentId++;
StringBuilder b = new StringBuilder();
do {
b.append(alphabet.charAt(id % alphabet.length()));
} while((id /=alphabet.length()) != 0);
return b.toString();
}
}

I would use a base 36 or base 64 (depending on case sensitivity) library and run it with an integer and before you output, convert the integer to a base 36/64 number. You can think in terms of sequence, which is easier, and the output value is handled by a trusted library.

You can use:
Integer.toString(i++, Character.MAX_RADIX)
It's base 36. The result is not as compact as Base64, but you get a one-line implementation.

You could search for some library that operates numbers of any radix, say 27, 37 or more. Then you output that number as alphanumeric string (like HEX, but with a-zA-Z0-9).

Well, let's assume we can only output ASCII (for Unicode this problem gets... complicated). A quick look shows that its printable characters are in the range [32,126], which is 95 characters. So to get the most compact representation we have to encode a given integer in base 95, so to speak, and add 32 to each generated digit to get the char.
How do you do that? Look up how Sun does it in Integer.toString() and adapt it accordingly. Well, that is probably more complex than necessary; just think about how you convert a number into radix 2 and adapt that. In its simplest form that's basically a loop with one division and one modulo.
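A minimal sketch of that division-and-modulo loop, assuming the full printable range [32,126] (95 characters) is acceptable as an alphabet. The class and method names are made up for this example.

```java
final class PrintableAsciiId {
    private static final int FIRST = 32;            // ' ' is the first printable char
    private static final int BASE = 126 - 32 + 1;   // 95 printable characters

    // Encodes a non-negative id using the printable ASCII chars as digits.
    static String encode(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append((char) (FIRST + id % BASE)); // least significant digit first
            id /= BASE;
        } while (id != 0);
        return sb.reverse().toString();            // most significant digit first
    }

    public static void main(String[] args) {
        System.out.println("[" + encode(33) + "]");
        System.out.println("[" + encode(95) + "]");
    }
}
```

For names inside a script you would most likely restrict the alphabet to characters that are legal in identifiers; the loop stays the same and only the digit lookup changes.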

In your tool you need to create a dictionary, which will contain an unique integer id for each unique string and the string itself. When adding strings to the dictionary you increment given id for each newly added unique string. Once dictionary is completed, you can simply convert ids to String using something like this:
static final String CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static final int CHARS_LENGTH = CHARS.length();
public String convert(int id) {
StringBuilder sb = new StringBuilder();
do {
sb.append(CHARS.charAt(id % CHARS_LENGTH));
id = id / CHARS_LENGTH;
} while(id != 0);
return sb.toString();
}

This function generates the Nth bijective number (except the zeroth), which gives the shortest possible strings for a given alphabet. (The zeroth would be the empty string.)
If there were 10 possible characters, 0-9, it generates, in order:
10 strings of length 1, from "0" to "9"
10*10 strings of length 2, from "00" to "99"
10*10*10 strings of length 3, from "000" to "999"
etc.
The example uses 93 characters, because I just happened to need those for Json.
private static final char[] ALLOWED_CHARS =
" !#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"
.toCharArray();
private static final AtomicInteger uniqueIdCounter = new AtomicInteger();
public static String getToken() {
int id = uniqueIdCounter.getAndIncrement();
return toBijectiveNumber(id, ALLOWED_CHARS);
}
public static String toBijectiveNumber(int id, char[] allowedChars) {
assert id >= 0;
StringBuilder sb = new StringBuilder(8);
int divisor = 1;
int length = 1;
while (id >= divisor * allowedChars.length) {
divisor *= allowedChars.length;
length++;
id -= divisor;
}
for (int i = 0; i < length; i++) {
sb.append(allowedChars[(id / divisor) % allowedChars.length]);
divisor /= allowedChars.length;
}
return sb.toString();
}

Efficient way to compare version strings in Java [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do you compare two version Strings in Java?
I have 2 strings which contain version information, as shown below:
str1 = "1.2"
str2 = "1.1.2"
Can anyone tell me an efficient way to compare these version strings in Java and return 0 if they're equal, -1 if str1 < str2, and 1 if str1 > str2?

Requires commons-lang3-3.8.1.jar for string operations.
/**
* Compares two version strings.
*
* Use this instead of String.compareTo() for a non-lexicographical
* comparison that works for version strings. e.g. "1.10".compareTo("1.6").
*
* @param v1 a string of alpha numerals separated by decimal points.
* @param v2 a string of alpha numerals separated by decimal points.
* @return The result is 1 if v1 is greater than v2.
* The result is 2 if v2 is greater than v1.
* The result is -1 if the version format is unrecognized.
* The result is zero if the strings are equal.
*/
public int VersionCompare(String v1,String v2)
{
int v1Len=StringUtils.countMatches(v1,".");
int v2Len=StringUtils.countMatches(v2,".");
if(v1Len!=v2Len)
{
int count=Math.abs(v1Len-v2Len);
if(v1Len>v2Len)
for(int i=1;i<=count;i++)
v2+=".0";
else
for(int i=1;i<=count;i++)
v1+=".0";
}
if(v1.equals(v2))
return 0;
String[] v1Str=StringUtils.split(v1, ".");
String[] v2Str=StringUtils.split(v2, ".");
for(int i=0;i<v1Str.length;i++)
{
String str1="",str2="";
for (char c : v1Str[i].toCharArray()) {
if(Character.isLetter(c))
{
int u=c-'a'+1;
if(u<10)
str1+=String.valueOf("0"+u);
else
str1+=String.valueOf(u);
}
else
str1+=String.valueOf(c);
}
for (char c : v2Str[i].toCharArray()) {
if(Character.isLetter(c))
{
int u=c-'a'+1;
if(u<10)
str2+=String.valueOf("0"+u);
else
str2+=String.valueOf(u);
}
else
str2+=String.valueOf(c);
}
v1Str[i]="1"+str1;
v2Str[i]="1"+str2;
int num1=Integer.parseInt(v1Str[i]);
int num2=Integer.parseInt(v2Str[i]);
if(num1!=num2)
{
if(num1>num2)
return 1;
else
return 2;
}
}
return -1;
}

As others have pointed out, String.split() is a very easy way to do the comparison you want, and Mike Deck makes the excellent point that with such (likely) short strings, it probably won't matter much, but what the hey! If you want to make the comparison without manually parsing the string, and have the option of quitting early, you could try the java.util.Scanner class.
public static int versionCompare(String str1, String str2) {
try ( Scanner s1 = new Scanner(str1);
Scanner s2 = new Scanner(str2);) {
s1.useDelimiter("\\.");
s2.useDelimiter("\\.");
while (s1.hasNextInt() && s2.hasNextInt()) {
int v1 = s1.nextInt();
int v2 = s2.nextInt();
if (v1 < v2) {
return -1;
} else if (v1 > v2) {
return 1;
}
}
while (s1.hasNextInt())
if (s1.nextInt() != 0) return 1; //str1 has an additional non-zero lower-level version number
while (s2.hasNextInt())
if (s2.nextInt() != 0) return -1; //str2 has an additional non-zero lower-level version number
return 0;
} // end of try-with-resources
}

This is almost certainly not the most efficient way to do it, but given that version number strings will almost always be only a few characters long I don't think it's worth optimizing further:
public static int compareVersions(String v1, String v2) {
String[] components1 = v1.split("\\.");
String[] components2 = v2.split("\\.");
int length = Math.min(components1.length, components2.length);
for(int i = 0; i < length; i++) {
int result = Integer.compare(Integer.parseInt(components1[i]), Integer.parseInt(components2[i]));
if(result != 0) {
return result;
}
}
return Integer.compare(components1.length, components2.length);
}

I was looking to do this myself and I see three different approaches to doing this, and so far pretty much everyone is splitting the version strings. I do not see doing that as being efficient, though code size wise it reads well and looks good.
Approaches:
1. Assume an upper limit on the number of sections (ordinals) in a version string, as well as a limit on the value represented in each. Often 4 dots max, and 999 maximum for any ordinal. You can see where this is going: transform the version to fit into a string like "1.0" => "001000000000", using string format or some other way to pad each ordinal. Then do a string compare.
2. Split the strings on the ordinal separator ('.'), iterate over the parts, and compare the parsed values. This is the approach demonstrated well by Alex Gitelman.
3. Compare the ordinals as you parse them out of the version strings in question. If all strings were really just pointers to arrays of characters, as in C, then this would be the clear approach (where you'd replace a '.' with a null terminator as it's found and move some 2 or 4 pointers around).
Thoughts on the three approaches:
1. There was a blog post linked that showed how to go with approach 1. The limitations are in version string length, number of sections, and maximum value of a section. I don't think it's crazy to have such a string that breaks 10,000 at one point. Additionally, most implementations still end up splitting the string.
2. Splitting the strings in advance is clear to read and think about, but we are going through each string about twice to do this. I'd like to compare how it times against the next approach.
3. Comparing the string as you split it gives you the advantage of being able to stop splitting very early in a comparison of "2.1001.100101.9999998" to "1.0.0.0.0.0.1.0.0.0.1". If this were C and not Java, the advantages could go on to limit the amount of memory allocated for new strings for each section of each version, but it is not.
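For reference, the first approach (padding each ordinal and comparing the resulting strings) could be sketched as follows. The limits of 4 sections and 3 digits per section are the assumptions stated above, and the class and method names are hypothetical. Like most implementations of this idea, it still splits the string.

```java
import java.util.Locale;

final class PaddedVersionKey {
    static final int MAX_SECTIONS = 4; // assumed upper limit on sections
    static final int WIDTH = 3;        // assumed digits per section (0..999)

    // Turns "1.0" into "001000000000" so that plain string order matches version order.
    static String toKey(String version) {
        String[] parts = version.split("\\.");
        if (parts.length > MAX_SECTIONS) {
            throw new IllegalArgumentException("too many sections: " + version);
        }
        StringBuilder key = new StringBuilder(MAX_SECTIONS * WIDTH);
        for (int i = 0; i < MAX_SECTIONS; i++) {
            // Missing trailing sections count as zero.
            int value = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (value < 0 || value > 999) {
                throw new IllegalArgumentException("section out of range: " + version);
            }
            key.append(String.format(Locale.ROOT, "%0" + WIDTH + "d", value));
        }
        return key.toString();
    }

    static int compare(String left, String right) {
        return Integer.signum(toKey(left).compareTo(toKey(right)));
    }

    public static void main(String[] args) {
        System.out.println(toKey("1.0"));
        System.out.println(compare("1.2", "1.1.2"));
    }
}
```

The keys are only worth building if they can be cached or stored; for a one-off comparison the padding work outweighs the cheap string compare.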
I didn't see anyone giving an example of this third approach, so I'd like to add it here as an answer going for efficiency.
public class VersionHelper {
/**
* Compares one version string to another version string by dotted ordinals.
* eg. "1.0" > "0.09" ; "0.9.5" < "0.10",
* also "1.0" == "1.0.0" and "1.0" == "01.00"
*
* @param left the left hand version string
* @param right the right hand version string
* @return 0 if equal, -1 if left < right and 1 otherwise.
*/
public static int compare(@NotNull String left, @NotNull String right) {
if (left.equals(right)) {
return 0;
}
int leftStart = 0, rightStart = 0, result;
do {
int leftEnd = left.indexOf('.', leftStart);
int rightEnd = right.indexOf('.', rightStart);
Integer leftValue = Integer.parseInt(leftEnd < 0
? left.substring(leftStart)
: left.substring(leftStart, leftEnd));
Integer rightValue = Integer.parseInt(rightEnd < 0
? right.substring(rightStart)
: right.substring(rightStart, rightEnd));
result = leftValue.compareTo(rightValue);
leftStart = leftEnd + 1;
rightStart = rightEnd + 1;
} while (result == 0 && leftStart > 0 && rightStart > 0);
if (result == 0) {
if (leftStart > rightStart) {
return containsNonZeroValue(left, leftStart) ? 1 : 0;
}
if (leftStart < rightStart) {
return containsNonZeroValue(right, rightStart) ? -1 : 0;
}
}
return result;
}
private static boolean containsNonZeroValue(String str, int beginIndex) {
for (int i = beginIndex; i < str.length(); i++) {
char c = str.charAt(i);
if (c != '0' && c != '.') {
return true;
}
}
return false;
}
}
Unit test demonstrating expected output.
public class VersionHelperTest {
@Test
public void testCompare() throws Exception {
assertEquals(1, VersionHelper.compare("1", "0.9"));
assertEquals(1, VersionHelper.compare("0.0.0.2", "0.0.0.1"));
assertEquals(1, VersionHelper.compare("1.0", "0.9"));
assertEquals(1, VersionHelper.compare("2.0.1", "2.0.0"));
assertEquals(1, VersionHelper.compare("2.0.1", "2.0"));
assertEquals(1, VersionHelper.compare("2.0.1", "2"));
assertEquals(1, VersionHelper.compare("0.9.1", "0.9.0"));
assertEquals(1, VersionHelper.compare("0.9.2", "0.9.1"));
assertEquals(1, VersionHelper.compare("0.9.11", "0.9.2"));
assertEquals(1, VersionHelper.compare("0.9.12", "0.9.11"));
assertEquals(1, VersionHelper.compare("0.10", "0.9"));
assertEquals(0, VersionHelper.compare("0.10", "0.10"));
assertEquals(-1, VersionHelper.compare("2.10", "2.10.1"));
assertEquals(-1, VersionHelper.compare("0.0.0.2", "0.1"));
assertEquals(1, VersionHelper.compare("1.0", "0.9.2"));
assertEquals(1, VersionHelper.compare("1.10", "1.6"));
assertEquals(0, VersionHelper.compare("1.10", "1.10.0.0.0.0"));
assertEquals(1, VersionHelper.compare("1.10.0.0.0.1", "1.10"));
assertEquals(0, VersionHelper.compare("1.10.0.0.0.0", "1.10"));
assertEquals(1, VersionHelper.compare("1.10.0.0.0.1", "1.10"));
}
}

Split the String on "." (or whatever your delimiter is), then parse each of those tokens to its Integer value and compare.
int compareStringIntegerValue(String s1, String s2, String delimiter)
{
    // String.split treats the delimiter as a regex, so pass "\\." for a literal dot
    String[] s1Tokens = s1.split(delimiter);
    String[] s2Tokens = s2.split(delimiter);
    int returnValue = 0;
    if(s1Tokens.length > s2Tokens.length)
    {
        // s2 is shorter, so only walk the positions both strings have
        for(int i = 0; i<s2Tokens.length; i++)
        {
            int s1Value = Integer.parseInt(s1Tokens[i]);
            int s2Value = Integer.parseInt(s2Tokens[i]);
            returnValue = Integer.compare(s1Value, s2Value);
            if( 0 == returnValue)
            {
                continue;
            }
            return returnValue; //end execution
        }
        return returnValue; //all shared tokens are equal
    }
    // the s1Tokens.length <= s2Tokens.length case is not handled here
    return returnValue;
}
I will leave the other if statement as an exercise.

Comparing version strings can be a mess; you're getting unhelpful answers because the only way to make this work is to be very specific about what your ordering convention is. I've seen one relatively short and complete version comparison function in a blog post, with the code placed in the public domain. It isn't in Java, but it should be simple to see how to adapt it.

Adapted from Alex Gitelman's answer.
int compareVersions( String str1, String str2 ){
if( str1.equals(str2) ) return 0; // Short circuit when you shoot for efficiency
String[] vals1 = str1.split("\\.");
String[] vals2 = str2.split("\\.");
int i=0;
// Most efficient way to skip past equal version subparts
while( i<vals1.length && i<vals2.length && vals1[i].equals(vals2[i]) ) i++;
// If we didn't reach the end,
if( i<vals1.length && i<vals2.length )
// have to use integer comparison to avoid the "10"<"1" problem
return Integer.valueOf(vals1[i]).compareTo( Integer.valueOf(vals2[i]) );
if( i<vals1.length ){ // end of str2, check if str1 is all 0's
boolean allZeros = true;
for( int j = i; allZeros & (j < vals1.length); j++ )
allZeros &= ( Integer.parseInt( vals1[j] ) == 0 );
return allZeros ? 0 : 1; // a non-zero extra part makes str1 the greater version
}
if( i<vals2.length ){ // end of str1, check if str2 is all 0's
boolean allZeros = true;
for( int j = i; allZeros & (j < vals2.length); j++ )
allZeros &= ( Integer.parseInt( vals2[j] ) == 0 );
return allZeros ? 0 : -1; // a non-zero extra part makes str2 the greater version
}
return 0; // Should never happen (identical strings.)
}
So as you can see, it's not so trivial. This also fails when you allow leading 0's, but I've never seen a version like "1.04.5" or similar. You would need to use integer comparison in the while loop to fix that. This gets even more complex when you mix letters with numbers in the version strings.
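A minimal sketch of the integer-comparison fix mentioned above, assuming the parts are purely numeric and dot-separated, and that a missing trailing part counts as 0. The class and method names are made up for illustration and are not part of the adapted answer.

```java
public class NumericVersionCompare {
    // Compares dotted numeric versions part by part as integers,
    // so leading zeros ("04" vs "4") no longer affect the result.
    // Assumption: a missing trailing part counts as 0, so "1.0" equals "1.0.0".
    static int compareVersions(String str1, String str2) {
        String[] vals1 = str1.split("\\.");
        String[] vals2 = str2.split("\\.");
        int length = Math.max(vals1.length, vals2.length);
        for (int i = 0; i < length; i++) {
            int part1 = i < vals1.length ? Integer.parseInt(vals1[i]) : 0;
            int part2 = i < vals2.length ? Integer.parseInt(vals2[i]) : 0;
            if (part1 != part2) {
                // negative if str1 is the smaller version, positive if it is greater
                return Integer.compare(part1, part2);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareVersions("1.04.5", "1.4.5"));
        System.out.println(compareVersions("1.04.5", "1.4.6"));
        System.out.println(compareVersions("1.10", "1.9"));
    }
}
```

The trade-off is that every part gets parsed, so the cheap string-equality shortcut from the loop above is lost.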

Split them into arrays and then compare.
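// Inside a method returning int; v1 and v2 (assumed names) are the two version strings.
// check if two strings are equal. If they are return 0;
if (v1.equals(v2)) return 0;
String[] a1 = v1.split("\\.");
String[] a2 = v2.split("\\.");
int i = 0;
while (true) {
    if (i == a1.length && i == a2.length) return 0;
    if (i == a1.length) return -1;
    if (i == a2.length) return 1;
    if (a1[i].equals(a2[i])) {
        i++;
        continue;
    }
    // compare numerically; String.compareTo would put "10" before "9"
    return Integer.compare(Integer.parseInt(a1[i]), Integer.parseInt(a2[i]));
}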

I would divide the problem in two: formatting and comparing. If you can assume that the format is correct, then comparing numeric-only versions is very simple:
final int versionA = Integer.parseInt( "01.02.00".replaceAll( "\\.", "" ) );
final int versionB = Integer.parseInt( "01.12.00".replaceAll( "\\.", "" ) );
Then both versions can be compared as integers. So the "big problem" is the format, and that can have many rules. In my case I just pad each part to a minimum of two digits, so the format is always "99.99.99", and then I do the above conversion; the program logic is in the formatting, not in the version comparison. Now, if you are doing something very specific and can trust the origin of the version string, maybe you can just check the length of the version string and then do the int conversion, but I think it's best practice to make sure the format is as expected.
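The formatting step described above could look roughly like this. It is only a sketch: the class and method names are hypothetical, and the limits (at most three parts, each between 0 and 99) are assumptions drawn from the "99.99.99" format rather than rules stated in the answer.

```java
public class FixedFormatVersion {
    // Gives the same result as padding every part to two digits, dropping
    // the dots and parsing the digits as an int, e.g. "1.2" -> "01.02.00" -> 10200.
    // Assumptions: at most three parts, each in the range 0..99.
    static int toComparableInt(String version) {
        String[] parts = version.split("\\.");
        if (parts.length > 3) {
            throw new IllegalArgumentException("Too many parts: " + version);
        }
        int value = 0;
        for (int i = 0; i < 3; i++) {
            // a missing part is treated as 0
            int part = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (part < 0 || part > 99) {
                throw new IllegalArgumentException("Part out of range: " + version);
            }
            value = value * 100 + part;
        }
        return value;
    }

    public static void main(String[] args) {
        int versionA = toComparableInt("01.02.00");
        int versionB = toComparableInt("1.12");
        System.out.println(versionA < versionB);
    }
}
```

Rejecting out-of-range parts matters here: a part of 100 or more would spill into the next position and silently corrupt the ordering.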

Step 1: Use StringTokenizer in Java with dot as the delimiter:
StringTokenizer(String str, String delimiters)
Or you can use String.split() or Pattern.split() to split on the dot, then convert each String to an int using Integer.parseInt(String str).
Step 2: Compare the integers from left to right.
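The two steps can be sketched with StringTokenizer as follows. The class and method names are invented for the example, and padding the shorter version with 0 is an assumption, not something the steps specify. Note that StringTokenizer skips empty tokens, so "1..2" is read as the two parts 1 and 2.

```java
import java.util.StringTokenizer;

public class TokenizerVersionCompare {
    // Step 1: tokenize both versions on the dot.
    // Step 2: compare the integers from left to right.
    // Assumption: a version that runs out of tokens is padded with 0.
    static int compare(String v1, String v2) {
        StringTokenizer t1 = new StringTokenizer(v1, ".");
        StringTokenizer t2 = new StringTokenizer(v2, ".");
        while (t1.hasMoreTokens() || t2.hasMoreTokens()) {
            int n1 = t1.hasMoreTokens() ? Integer.parseInt(t1.nextToken()) : 0;
            int n2 = t2.hasMoreTokens() ? Integer.parseInt(t2.nextToken()) : 0;
            if (n1 != n2) {
                // the first differing part decides; later tokens are never read
                return n1 < n2 ? -1 : 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compare("0.10", "0.9"));
        System.out.println(compare("1.10", "1.10.0.0"));
        System.out.println(compare("2.10", "2.10.1"));
    }
}
```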

String s = "Some String";
for (int i = 0; i < SOME_VERY_BIG_NUMBER; ++i) {
    String copy = new String(s);
    // Do something with copy.
}
Will this work for you? It just creates a lot of copies of the same String literal that you can then use in your testing.
