Convert hashcode to limited set of string - java

I know it's a one-way function, but I want to convert a hash code back to a string drawn from a limited character set (chars 32 to 126). Is there an efficient way to do this?

It's not only feasible, it's actually pretty simple, given the definition of String.hashCode. You can build a string of "base 31" digits, using an arbitrary starting character ('A' below) so that every character stays in the right range, and subtract an offset (the hash code of the all-'A' string) to compensate for that starting point.
This isn't necessarily the shortest string with the given hash code, but 7 characters is pretty short :)
public class Test {
    public static void main(String[] args) {
        int hash = 100000;
        String sample = getStringForHashCode(hash);
        System.out.println(sample);            // ASD^TYQ
        System.out.println(sample.hashCode()); // 100000
    }
    private static final int OFFSET = "AAAAAAA".hashCode();
    private static String getStringForHashCode(int hash) {
        hash -= OFFSET;
        // Treat it as an unsigned long, for simplicity.
        // This avoids having to worry about negative numbers anywhere.
        long longHash = (long) hash & 0xFFFFFFFFL;
        char[] c = new char[7];
        for (int i = 0; i < 7; i++) {
            // Each base-31 digit becomes one character between 'A' and '_'
            c[6 - i] = (char) ('A' + (longHash % 31));
            longHash /= 31;
        }
        return new String(c);
    }
}
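Why this works: for a seven-character string s, s.hashCode() is s[0]·31^6 + s[1]·31^5 + … + s[6], computed with int overflow (that is, mod 2^32). Writing hash - OFFSET in base 31 and adding 'A' to each digit therefore gives back exactly hash. Since 31^7 (about 2.75 × 10^10) is larger than 2^32, seven digits cover every unsigned 32-bit value, and every digit maps to a character from 'A' (65) to '_' (95), inside the requested 32 to 126 range. A small sketch that checks this on some edge values; it repeats the method above without the debug output so it runs on its own, and the class name HashInverseCheck is hypothetical:

```java
public class HashInverseCheck {
    // Same construction as getStringForHashCode above
    private static final int OFFSET = "AAAAAAA".hashCode();

    public static String getStringForHashCode(int hash) {
        // Unsigned 32-bit difference from the hash code of "AAAAAAA"
        long longHash = (long) (hash - OFFSET) & 0xFFFFFFFFL;
        char[] c = new char[7];
        for (int i = 0; i < 7; i++) {
            // Each base-31 digit becomes one character in the range 'A'..'_'
            c[6 - i] = (char) ('A' + (longHash % 31));
            longHash /= 31;
        }
        return new String(c);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 100000, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int hash : samples) {
            String s = getStringForHashCode(hash);
            // Every character should lie in the printable ASCII range 32..126
            boolean printable = s.chars().allMatch(ch -> ch >= 32 && ch <= 126);
            System.out.println(hash + " -> " + s + " (hashCode " + s.hashCode()
                    + ", printable: " + printable + ")");
        }
    }
}
```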

Actually, I only need one string from that hash code. I want to make a Minecraft seed shortener.
The simplest way to turn an int value into a short string is to use
String s = Integer.toString(n, 36); // uses base 36.
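A round trip in base 36 might look like this sketch (SeedShortener and its method names are hypothetical). Integer.parseInt with the same radix undoes Integer.toString, and negative values simply keep a leading '-'. If the seed is held in a long, Long.toString(n, 36) and Long.parseLong(s, 36) work the same way.

```java
public class SeedShortener {
    // Shortens an int to a base-36 string (digits 0-9, then a-z)
    static String shorten(int n) {
        return Integer.toString(n, 36);
    }

    // Reverses shorten(int): parses the base-36 text back into the original int
    static int expand(String s) {
        return Integer.parseInt(s, 36);
    }

    // The same idea for 64-bit values
    static String shorten(long n) {
        return Long.toString(n, 36);
    }

    static long expandLong(String s) {
        return Long.parseLong(s, 36);
    }

    public static void main(String[] args) {
        String s = shorten(100000);
        System.out.println(s + " -> " + expand(s)); // prints "255s -> 100000"
    }
}
```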

Related

Reverse long array to string algorithm

I need to reverse the following algorithm, which converts a long array into a string:
import java.io.UnsupportedEncodingException;
import java.util.Random;
public final class LongConverter {
    private final long[] l;
    public LongConverter(long[] paramArrayOfLong) {
        this.l = paramArrayOfLong;
    }
    // Writes up to 8 bytes of paramLong into paramArrayOfByte starting at paramInt, lowest byte first
    private void convertLong(long paramLong, byte[] paramArrayOfByte, int paramInt) {
        int i = Math.min(paramArrayOfByte.length, paramInt + 8);
        while (paramInt < i) {
            paramArrayOfByte[paramInt] = ((byte) (int) paramLong);
            paramLong >>= 8;
            paramInt++;
        }
    }
    public final String toString() {
        int i = this.l.length;
        byte[] arrayOfByte = new byte[8 * (i - 1)];
        // l[0] seeds the generator; every following long is XORed with one nextLong()
        long l1 = this.l[0];
        Random localRandom = new Random(l1);
        for (int j = 1; j < i; j++) {
            long l2 = localRandom.nextLong();
            convertLong(this.l[j] ^ l2, arrayOfByte, 8 * (j - 1));
        }
        String str;
        try {
            str = new String(arrayOfByte, "UTF8");
        } catch (UnsupportedEncodingException localUnsupportedEncodingException) {
            throw new AssertionError(localUnsupportedEncodingException);
        }
        // The result ends at the first NUL character, if there is one
        int k = str.indexOf(0);
        if (-1 == k) {
            return str;
        }
        return str.substring(0, k);
    }
}
So when I do the following call
System.out.println(new LongConverter(new long[]{-6567892116040843544L, 3433539276790523832L}).toString());
it prints 400.
It would be great if anyone could tell me what algorithm this is or how I could reverse it.
Thanks for your help.
This is not solvable as stated, because:
l[0] is used only as the seed for Random, so it could be anything: for whatever seed you pick, there are values of the remaining longs that produce the same string.
Even if you only look at seeds, the number of solutions is always a multiple of 65,536 (N << 16). The seed passed to Random is 64-bit, but only its low 48 bits are used internally, so if there is any solution, there are at least 64K seeds that produce it.
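To see why there is no unique answer, note that the bytes are simply XORed with the output of new Random(l[0]). So any seed works: pick one, generate the same random stream, and XOR it with the string's bytes to get the remaining longs. A minimal sketch; encode and decode are hypothetical names, decode repeats the question's toString() logic so the block compiles on its own, and it assumes the text contains no NUL character (decoding stops at the first NUL).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

public class LongConverterInverse {
    // Builds a long[] that decodes to `text` for ANY chosen seed
    static long[] encode(String text, long seed) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int chunks = (bytes.length + 7) / 8;
        long[] result = new long[chunks + 1];
        result[0] = seed;
        Random random = new Random(seed);
        for (int j = 1; j <= chunks; j++) {
            long packed = 0;
            // convertLong writes the lowest byte first, so pack bytes the same way;
            // unused bytes at the end stay 0 and act as the NUL terminator
            for (int b = 0; b < 8; b++) {
                int idx = 8 * (j - 1) + b;
                if (idx < bytes.length) {
                    packed |= (bytes[idx] & 0xFFL) << (8 * b);
                }
            }
            // XOR is its own inverse, so XOR with the same random stream
            result[j] = packed ^ random.nextLong();
        }
        return result;
    }

    // Same logic as the question's toString()
    static String decode(long[] l) {
        byte[] out = new byte[8 * (l.length - 1)];
        Random random = new Random(l[0]);
        for (int j = 1; j < l.length; j++) {
            long v = l[j] ^ random.nextLong();
            for (int b = 0; b < 8; b++) {
                out[8 * (j - 1) + b] = (byte) v;
                v >>= 8;
            }
        }
        String s = new String(out, StandardCharsets.UTF_8);
        int k = s.indexOf(0);
        return k == -1 ? s : s.substring(0, k);
    }

    public static void main(String[] args) {
        // Different seeds give different arrays, all decoding to the same string
        for (long seed : new long[] {0L, 42L, -6567892116040843544L}) {
            long[] l = encode("400", seed);
            System.out.println(Arrays.toString(l) + " -> " + decode(l));
        }
    }
}
```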
What you can do is:
Find the smallest seed that generates the string, using brute force. For short strings this won't take long; with 5-6 characters it will take a while, and for 7+ characters there might not be a solution.
Instead of generating 8-bit characters where all 8-bit values are equally likely, restrict the range to, say, space, A-Z, a-z and 0-9. That gives about 6 bits of randomness per character, which means shorter seeds and slightly longer Strings.
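One possible reading of these two suggestions, as a sketch: a seed stands for the string produced by drawing symbols from a restricted alphabet with new Random(seed) until a terminator value comes up, and the search tries seeds 0, 1, 2, … until one reproduces the target. The alphabet, the terminator convention and the names generate and findSeed are assumptions for illustration; this is not the LongConverter format from the question. Each extra character multiplies the expected search time by roughly 64, which is why longer strings quickly become impractical.

```java
import java.util.Random;

public class SeedSearch {
    // Hypothetical 63-symbol alphabet: space, A-Z, a-z, 0-9 (about 6 bits per character)
    static final String ALPHABET =
            " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    // One extra value (0) marks the end of the string
    static final int BOUND = ALPHABET.length() + 1;

    // The string a seed stands for: draw symbols until the terminator appears
    static String generate(long seed) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder();
        for (int k = random.nextInt(BOUND); k != 0; k = random.nextInt(BOUND)) {
            sb.append(ALPHABET.charAt(k - 1));
        }
        return sb.toString();
    }

    // Brute force: smallest non-negative seed whose generated string is `target`.
    // Assumes every character of `target` is in ALPHABET, otherwise it never returns.
    static long findSeed(String target) {
        for (long seed = 0; ; seed++) {
            Random random = new Random(seed);
            boolean match = true;
            for (int i = 0; i < target.length() && match; i++) {
                int k = random.nextInt(BOUND);
                match = k != 0 && ALPHABET.charAt(k - 1) == target.charAt(i);
            }
            // The generated string must also end exactly here
            if (match && random.nextInt(BOUND) == 0) {
                return seed;
            }
        }
    }

    public static void main(String[] args) {
        long seed = findSeed("Hi");
        System.out.println(seed + " -> \"" + generate(seed) + "\"");
    }
}
```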
BTW, you might find this post interesting; in it I use contrived random seeds to generate specific sequences: http://vanillajava.blogspot.co.uk/2011/10/randomly-no-so-random.html
If you want a process that ensures you can always re-create the original longs from a String or a byte[], I suggest using encryption. You can encrypt a UTF-8 encoded String, or a byte[], into another byte[] which can then be Base64-encoded to be readable as text. (Or you could skip the encryption and use Base64 alone.)
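The Base64-only route needs nothing beyond the standard library (java.util.Base64, Java 8+). A sketch with hypothetical names that packs each long into 8 bytes with ByteBuffer and uses the URL-safe alphabet without padding. It only encodes, so it hides nothing, but the round trip back to the exact longs is guaranteed.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;

public class LongArrayText {
    // Packs the longs big-endian (8 bytes each) and Base64-encodes them as URL-safe text
    static String toText(long[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(8 * values.length);
        for (long v : values) {
            buffer.putLong(v);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buffer.array());
    }

    // Exact inverse of toText: decodes the text and reads the longs back in order
    static long[] fromText(String text) {
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getUrlDecoder().decode(text));
        long[] values = new long[buffer.remaining() / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    public static void main(String[] args) {
        long[] original = {-6567892116040843544L, 3433539276790523832L};
        String text = toText(original);
        System.out.println(text + " -> " + Arrays.toString(fromText(text)));
    }
}
```

Two longs (16 bytes) become 22 characters of text.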

Fastest way to permute bits in a Java array

What is the fastest way to randomly (but repeatedly) permute all the bits within a Java byte array? I've tried successfully doing it with a BitSet, but is there a faster way? Clearly the for-loop consumes the majority of the CPU time.
I've just done some profiling in my IDE and the for-loop constitutes 64% of the CPU time within the entire permute() method.
To clarify, the array (preRound) contains an existing array of numbers going into the procedure. I want the individual set bits of that array to be mixed up in a random manner. This is the reason for P[]. It contains a random list of bit positions. So for example, if bit 13 of preRound is set, it is transferred to place P[13] of postRound. This might be at position 20555 of postRound. The whole thing is part of a substitution-permutation network, and I'm looking for the fastest way to permute the incoming bits.
My code so far...
private byte[] permute(byte[] preRound) {
BitSet beforeBits = BitSet.valueOf(preRound);
BitSet afterBits = new BitSet(blockSize * 8);
for (int i = 0; i < blockSize * 8; i++) {
assert i != P[i];
if (beforeBits.get(i)) {
afterBits.set(P[i]);
}
}
byte[] postRound = afterBits.toByteArray();
postRound = Arrays.copyOf(postRound, blockSize); // Pad with 0s to the specified length
assert postRound.length == blockSize;
return postRound;
}
FYI, blockSize is about 60,000 and P is a random lookup table.
I didn't perform any performance tests, but you may want to consider the following:
To avoid the call to Arrays.copyOf (which copies the copy of the internally used long[], which is kind of annoying), just set the last bit if it wasn't set before, and clear it again afterwards.
Furthermore, there is a nice idiom to iterate over the set bits in the input permutation.
private byte[] permute(final byte[] preRound) {
final BitSet beforeBits = BitSet.valueOf(preRound);
final BitSet afterBits = new BitSet(blockSize*8);
for (int i = beforeBits.nextSetBit(0); i >= 0; i =
beforeBits.nextSetBit(i + 1)) {
final int to = P[i];
assert i != to;
afterBits.set(to);
}
final int lastIndex = blockSize*8-1;
if (afterBits.get(lastIndex)) {
return afterBits.toByteArray();
}
afterBits.set(lastIndex);
final byte[] postRound = afterBits.toByteArray();
postRound[blockSize - 1] &= 0x7F;
return postRound;
}
If that doesn't cut it and you use the same P for lots of iterations, it may be worthwhile to transform the permutation into cycle notation and perform the transformation in place.
This way you can iterate over P linearly, which may let you exploit caching better (P is 32 times as large as the byte array, assuming it's an int array).
However, you lose the advantage of only having to look at the 1s, and you end up moving every single bit in the byte array, set or not.
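A sketch of the cycle-notation idea, assuming P is an int[] where P[i] is the destination of bit i, and the same bit order as BitSet.valueOf (bit i lives in byte i / 8 at position i % 8). toCycles, applyInPlace and the bit helpers are hypothetical names. The cycles are computed once per P and reused for every block; whether this beats the BitSet version depends on blockSize and how often P is reused, so it is worth measuring.

```java
import java.util.ArrayList;
import java.util.List;

public class BitCyclePermuter {
    // Decomposes P (bit i moves to P[i]) into cycles once; reusable for every block
    static int[][] toCycles(int[] p) {
        boolean[] seen = new boolean[p.length];
        List<int[]> cycles = new ArrayList<>();
        for (int start = 0; start < p.length; start++) {
            if (seen[start] || p[start] == start) {
                continue; // already handled, or a fixed point that never moves
            }
            List<Integer> cycle = new ArrayList<>();
            for (int i = start; !seen[i]; i = p[i]) {
                seen[i] = true;
                cycle.add(i);
            }
            cycles.add(cycle.stream().mapToInt(Integer::intValue).toArray());
        }
        return cycles.toArray(new int[0][]);
    }

    static boolean getBit(byte[] data, int i) {
        return (data[i >>> 3] & (1 << (i & 7))) != 0;
    }

    static void setBit(byte[] data, int i, boolean value) {
        if (value) {
            data[i >>> 3] |= (1 << (i & 7));
        } else {
            data[i >>> 3] &= ~(1 << (i & 7));
        }
    }

    // For a cycle c0 -> c1 -> ... -> c(k-1) -> c0 every bit moves one step forward,
    // so walk the cycle backwards and carry the last bit around to c0
    static void applyInPlace(byte[] data, int[][] cycles) {
        for (int[] c : cycles) {
            boolean carried = getBit(data, c[c.length - 1]);
            for (int j = c.length - 1; j > 0; j--) {
                setBit(data, c[j], getBit(data, c[j - 1]));
            }
            setBit(data, c[0], carried);
        }
    }
}
```

Usage: compute int[][] cycles = toCycles(P) once, then call applyInPlace(block, cycles) for each block.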
If you want to avoid using the BitSet, you can just do it by hand:
private byte[] permute(final byte[] preRound) {
final byte[] result = new byte[blockSize];
for (int i = 0; i < blockSize; i++) {
final byte b = preRound[i];
// if 1s are sparse, you may want to use this:
// if ((byte) 0 == b) continue;
for (int j = 0; j < 8; ++j) {
if (0 != (b & (1 << j))) {
final int loc = P[i * 8 + j];
result[loc / 8] |= (1 << (loc % 8));
}
}
}
return result;
}

Converting from binary to decimal in Java

I need to write a program that can convert bits into decimal. Whenever I enter a bit, it only outputs 0.0. I cannot figure out why. I know it's incredibly simple but I am just not seeing it. Any help would be appreciated.
import java.lang.Math;
import java.util.Scanner;
public class Lab1 {
static double number = 0;
public static double toDec(String num) {
char[] charArray = num.toCharArray();
for(int i = 0; i<charArray.length;i++) {
if(charArray[i] == 1) {
number = Math.pow(2, charArray.length-i);
}
}
return number;
}
public static void main(String[] args) {
Scanner keyboard = new Scanner(System.in);
int bit;
String bitString;
System.out.println("Please enter a bit");
bit = keyboard.nextInt();
bitString = Integer.toString(bit);
System.out.println(toDec(bitString));
}
}
You have compared charArray[i] to 1, but you're comparing apples to oranges, specifically, a char to an int.
Compare to the char '1' instead.
if(charArray[i] == '1') {
Also, you can make number a local variable in toDec; it doesn't need to exist outside that method.
In addition, the exponent is off by one: the rightmost digit (i == charArray.length - 1) should be worth 2^0, so use Math.pow(2, charArray.length - 1 - i). And right now this only handles a single set bit; if you want it to work with multiple bits, another change is needed:
You overwrite number each time the condition is true. You will probably want to add to number with += instead of overwriting the previous value with =.
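Putting the fixes together (compare with the character '1', keep number local, accumulate with +=, and use the exponent charArray.length - 1 - i so the rightmost digit is worth 2^0) gives something like this sketch; Lab1Fixed is a hypothetical class name.

```java
public class Lab1Fixed {
    // Converts a binary string such as "110011" to its value as a double
    public static double toDec(String num) {
        double number = 0; // local, so repeated calls don't accumulate
        char[] charArray = num.toCharArray();
        for (int i = 0; i < charArray.length; i++) {
            if (charArray[i] == '1') { // the character '1', not the int 1
                // the rightmost digit (i == length - 1) is worth 2^0
                number += Math.pow(2, charArray.length - 1 - i);
            }
        }
        return number;
    }

    public static void main(String[] args) {
        System.out.println(toDec("110011")); // prints 51.0
    }
}
```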
Integer#parseInt(String str, int radix) does the job:
public static int toDec(String num) {
    return Integer.parseInt(num, 2);
}
Say you want to take the String "110011", which is 51. For big-endian order you have to determine how many bits to process: if the string is 6 digits long, then you know the first bit has to be shifted 5 places (length - 1) to the left.
char[] charArray = "110011".toCharArray();
int l = charArray.length;
long value = 0;
for (int i = 0; i < l; i++) {
    int bit = (charArray[i] == '1') ? 1 : 0; // compare with the char '1', not the String "1"
    value = value + ((long) bit << (l - 1 - i));
}
// value is now 51
For float you would basically have to build an interpreter to decode the bits based on however the float is represented in binary.
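A shortcut for one common case, as a sketch: if the bits follow the IEEE 754 single-precision layout that Java's float uses, Float.intBitsToFloat can do the decoding instead of a hand-written interpreter. It assumes exactly 32 binary digits; BitsToFloat is a hypothetical name.

```java
public class BitsToFloat {
    // Interprets a 32-character binary string as IEEE 754 single-precision bits
    static float toFloat(String bits) {
        // parseUnsignedInt (Java 8+) accepts patterns with the sign bit set
        int raw = Integer.parseUnsignedInt(bits, 2);
        return Float.intBitsToFloat(raw);
    }

    public static void main(String[] args) {
        System.out.println(toFloat("00111111100000000000000000000000")); // prints 1.0
        System.out.println(toFloat("11000000000000000000000000000000")); // prints -2.0
    }
}
```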

Adaptation of LCS algorithm

New programmer here. I watched a video which displayed a recursive algorithm for LCS (longest common substring). The program only returned an int, the length of the LCS of the two strings. As an exercise, I decided to adapt the algorithm to return the string itself. Here is what I came up with. It seems to be right, but I need to ask more experienced people whether there are any bugs:
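#include <algorithm> // reverse
#include <cstring>   // memset
#include <string>
using namespace std;
const int mAX=1001; //max size for the two strings to be compared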
string soFar[mAX][mAX]; //keeps results of strings generated along way to solution
bool Get[mAX][mAX]; //marks what has been seen before(pairs of indexes)
class LCS{ //recursive version,use of global arrays not STL maps
private:
public:
string _getLCS(string s0,int k0, string s1,int k1){
if(k0<=0 || k1<=0){//base case
return "";
}
if(!Get[k0][k1]){ //checking bool memo to see if pair of indexes has been seen before
Get[k0][k1]=true; //mark seen pair of string indexs
if(s0[k0-1]==s1[k1-1]){
soFar[k0][k1]=s0[k0-1]+_getLCS(s0,k0-1,s1,k1-1);//if the char in positions k0 and k1 are equal add common char and move on
}
else{
string a=_getLCS(s0,k0-1,s1,k1);//this string is the result from keeping the k1 position the same and decrementing the k0 position
string b=_getLCS(s0,k0,s1,k1-1);//this string is the result from decrementing the k1 position keeping k0 the same
if(a.length()> b.length())soFar[k0][k1]=a;//the longer string is the one we are interested in
else
soFar[k0][k1]=b;
}
}
return soFar[k0][k1];
}
string LCSnum(string s0,string s1){
memset(Get,0,sizeof(Get));//memset works fine for zero, so no complaints please
string a=_getLCS(s0,s0.length(),s1,s1.length());
reverse(a.begin(),a.end());//because I start from the end of the strings, the result need to be reversed
return a;
}
};
I have only been programming for 6 months, so I can't really tell if there are bugs or cases where this algorithm will not work. It seems to work for two strings of up to 1001 chars each.
What are the bugs and would the equivalent dynamic programming solution be faster or use less memory for the same result?
Thanks
Your program is not correct. What does it return for LCSnum("aba", "abba")?
string soFar[mAX][mAX] should be a hint that this is not a great solution. A simple dynamic programming solution (which has logic that you almost follow) has an array of size_t which is m*n in size, and no bool Get[mAX][mAX] either. (A better dynamic programming algorithm only has an array of 2*min(m, n).)
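A sketch of that simple table-based version, written in Java like the code further down and, like that code, treating LCS as the longest common substring. It needs no recursion and no Get flags, and the table holds m*n ints instead of m*n strings. It returns one longest run; the class and method names are hypothetical.

```java
public class SubstringTable {
    // dp[i][j] = length of the common run ending at a[i-1] and b[j-1]
    static String longestCommonSubstring(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        int bestLen = 0;
        int bestEnd = 0; // end index (exclusive) of the best run in a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > bestLen) {
                        bestLen = dp[i][j];
                        bestEnd = i;
                    }
                } // otherwise dp[i][j] stays 0: a mismatch breaks the run
            }
        }
        return a.substring(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("aba", "abba")); // prints ab
    }
}
```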
Edit: by the way, here is the space-efficient dynamic programming solution in Java. Complexity: time is O(m*n), space is O(min(m, n)), where m and n are the lengths of the strings. The result set is given in alphabetical order.
import java.util.Set;
import java.util.TreeSet;
class LCS {
public static void main(String... args) {
System.out.println(lcs(args[0], args[1]));
}
static Set<String> lcs(String s1, String s2) {
final Set<String> result = new TreeSet<String>();
final String shorter, longer;
if (s1.length() <= s2.length()) {
shorter = s1;
longer = s2;
}else{
shorter = s2;
longer = s1;
}
final int[][] table = new int[2][shorter.length()];
int maxLen = 0;
for (int i = 0; i < longer.length(); i++) {
int[] last = table[i % 2]; // alternate
int[] current = table[(i + 1) % 2];
for (int j = 0; j < shorter.length(); j++) {
if (longer.charAt(i) == shorter.charAt(j)) {
current[j] = (j > 0? last[j - 1] : 0) + 1;
if (current[j] > maxLen) {
maxLen = current[j];
result.clear();
}
if (current[j] == maxLen) {
result.add(shorter.substring(j + 1 - maxLen, j + 1));
}
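} else {
current[j] = 0; // a mismatch ends the run (and clears the value left over from two rows back)
}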
}
}
return result;
}
}

Performance intensive string splitting and manipulation in java

What is the most efficient way to split a string by a very simple separator?
Some background:
I am porting a function I wrote in C with a bunch of pointer arithmetic to Java, and it is incredibly slow (even after some optimisation it is still five times slower).
Having profiled it, it turns out a lot of that overhead is in String.split.
The function in question takes a host name or ip address and makes it generic:
123.123.123.123->*.123.123.123
a.b.c.example.com->*.example.com
This can be run over several million items on a regular basis, so performance is an issue.
Edit: the rules for converting are thus:
If it's an ip address, replace the first part
Otherwise, find the main domain name, and make the preceding part generic.
foo.bar.com-> *.bar.com
foo.bar.co.uk-> *.bar.co.uk
I have now rewritten using lastIndexOf and substring to work myself in from the back and the performance has improved by leaps and bounds.
I'll leave the question open for another 24 hours before settling on the best answer, for future reference.
Here's what I've come up with now (the IP case is handled by an insignificant check before calling this function):
private static String hostConvert(String in) {
final String [] subs = { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };
int dotPos = in.lastIndexOf('.');
if(dotPos == -1)
return in;
int prevDotPos = in.lastIndexOf('.', dotPos-1);
if(prevDotPos == -1)
return in;
CharSequence cs = in.subSequence(prevDotPos+1, dotPos);
for(String cur : subs) {
if(cur.contentEquals(cs)) {
int start = in.lastIndexOf('.', prevDotPos-1);
if(start == -1 || start == 0)
return in;
return "*" + in.substring(start);
}
}
return "*" + in.substring(prevDotPos);
}
If there's any space for further improvement it would be good to hear.
Something like this is about as fast as you can make it:
static String starOutFirst(String s) {
final int K = s.indexOf('.');
return "*" + s.substring(K);
}
static String starOutButLastTwo(String s) {
final int K = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
return "*" + s.substring(K);
}
Then you can do:
System.out.println(starOutFirst("123.123.123.123"));
// prints "*.123.123.123"
System.out.println(starOutButLastTwo("a.b.c.example.com"));
// prints "*.example.com"
You may need to use regex to see which of the two methods is applicable to any given string.
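If regex turns out to be a bottleneck too, a plain character scan can pick the method instead. A sketch with hypothetical names that treats anything made only of digits and dots as an IP address (it does not check octet ranges) and, unlike the question's hostConvert, does not implement the co.uk-style rule. It also guards against strings without enough dots, where the methods above would pass -1 to substring.

```java
public class HostGeneralizer {
    // Cheap check without regex: only digits and dots (octet ranges are not validated)
    static boolean looksLikeIpv4(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.' && (c < '0' || c > '9')) {
                return false;
            }
        }
        return true;
    }

    // Star out the first part of an IP, or everything before the last two labels of a name
    static String generalize(String s) {
        if (looksLikeIpv4(s)) {
            int k = s.indexOf('.');
            return k < 0 ? s : "*" + s.substring(k);
        }
        int last = s.lastIndexOf('.');
        if (last <= 0) {
            return s; // no dot, or the dot is the first character: leave it alone
        }
        int k = s.lastIndexOf('.', last - 1);
        return k < 0 ? s : "*" + s.substring(k);
    }
}
```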
I'd try using .indexOf(".") and .substring(index).
You didn't elaborate on the exact pattern you wanted to match but if you can avoid split(), it should cut down on the number of new strings it allocates (1 instead of several).
It's unclear from your question exactly what the code is supposed to do. Does it find the first '.' and replace everything up to it with a '*'? Or is there some fancier logic behind it? Maybe everything up to the nth '.' gets replaced by '*'?
If you're trying to find an instance of a particular string, use something like the Boyer-Moore algorithm. It should be able to find the match for you and you can then replace what you want.
Keep in mind that String in Java is immutable. It might be faster to change the sequence in-place. Check out other CharSequence implementations to see what you can do, e.g. StringBuffer and CharBuffer. If concurrency is not needed, StringBuilder might be an option.
By using a mutable CharSequence instead of the methods on String, you avoid a bunch of object churn. If all you're doing is replacing some slice of the underlying character array with a shorter array (i.e. {'*'}), this is likely to yield a speedup since such array copies are fairly optimized. You'll still be doing an array copy at the end of the day, but it may be faster than new String allocations.
UPDATE
All the above is pretty much hogwash. Sure, maybe you can implement your own CharSequence that gives you better slicing and lazily resizes the array (aka doesn't actually truncate anything until it absolutely must), returning Strings based on offsets and whatnot. But StringBuffer and StringBuilder, at least directly, do not perform as well as the solution poly posted. CharBuffer is entirely inapplicable; I didn't realize it was an nio class earlier: it's meant for other things entirely.
There are some interesting things about poly's code, which I wonder whether he/she knew before posting it, namely that changing the "*" on the final lines of the methods to a '*' results in a significant slowdown.
Nevertheless, here is my benchmark. I found one small optimization: declaring the '.' and "*" expressions as constants gives a bit of a speedup, as does using a locally scoped StringBuilder instead of the binary infix string concatenation operator.
I know System.gc() is at best advisory and at worst a no-op, but I figured adding it with a bit of sleep time might let the VM do some cleanup after creating 1M Strings. Someone may correct me if this is totally naïve.
Simple Benchmark
import java.util.ArrayList;
import java.util.Arrays;
public class StringSplitters {
private static final String PREFIX = "*";
private static final char DOT = '.';
public static String starOutFirst(String s) {
final int k = s.indexOf(DOT);
return PREFIX + s.substring(k);
}
public static String starOutFirstSb(String s) {
StringBuilder sb = new StringBuilder();
final int k = s.indexOf(DOT);
return sb.append(PREFIX).append(s.substring(k)).toString();
}
public static void main(String[] args) throws InterruptedException {
double[] firstRates = new double[10];
double[] firstSbRates = new double[10];
double firstAvg = 0;
double firstSbAvg = 0;
double firstMin = Double.POSITIVE_INFINITY;
double firstMax = Double.NEGATIVE_INFINITY;
double firstSbMin = Double.POSITIVE_INFINITY;
double firstSbMax = Double.NEGATIVE_INFINITY;
for (int i = 0; i < 10; i++) {
firstRates[i] = testFirst();
firstAvg += firstRates[i];
if (firstRates[i] < firstMin)
firstMin = firstRates[i];
if (firstRates[i] > firstMax)
firstMax = firstRates[i];
Thread.sleep(100);
System.gc();
Thread.sleep(100);
}
firstAvg /= 10.0d;
for (int i = 0; i < 10; i++) {
firstSbRates[i] = testFirstSb();
firstSbAvg += firstSbRates[i];
if (firstSbRates[i] < firstSbMin)
firstSbMin = firstSbRates[i];
if (firstSbRates[i] > firstSbMax)
firstSbMax = firstSbRates[i];
Thread.sleep(100);
System.gc();
Thread.sleep(100);
}
firstSbAvg /= 10.0d;
System.out.printf("First:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstMin, firstMax,
firstAvg, Arrays.toString(firstRates));
System.out.printf("FirstSb:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstSbMin,
firstSbMax, firstSbAvg, Arrays.toString(firstSbRates));
}
private static double testFirst() {
ArrayList<String> strings = new ArrayList<String>(1000000);
for (int i = 0; i < 1000000; i++) {
int first = (int) (Math.random() * 128);
int second = (int) (Math.random() * 128);
int third = (int) (Math.random() * 128);
int fourth = (int) (Math.random() * 128);
strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
}
long before = System.currentTimeMillis();
for (String s : strings) {
starOutFirst(s);
}
long after = System.currentTimeMillis();
return 1000000000.0d / (after - before);
}
private static double testFirstSb() {
ArrayList<String> strings = new ArrayList<String>(1000000);
for (int i = 0; i < 1000000; i++) {
int first = (int) (Math.random() * 128);
int second = (int) (Math.random() * 128);
int third = (int) (Math.random() * 128);
int fourth = (int) (Math.random() * 128);
strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
}
long before = System.currentTimeMillis();
for (String s : strings) {
starOutFirstSb(s);
}
long after = System.currentTimeMillis();
return 1000000000.0d / (after - before);
}
}
Output
First:
Min: 3802281.369 Max: 5434782.609 Avg: 5185796.131
Rates: [3802281.3688212926, 5181347.150259067, 5291005.291005291, 5376344.086021505, 5291005.291005291, 5235602.094240838, 5434782.608695652, 5405405.405405405, 5434782.608695652, 5405405.405405405]
FirstSb:
Min: 4587155.963 Max: 5747126.437 Avg: 5462087.511
Rates: [4587155.963302752, 5747126.436781609, 5617977.528089887, 5208333.333333333, 5681818.181818182, 5586592.17877095, 5586592.17877095, 5524861.878453039, 5524861.878453039, 5555555.555555556]
