Reverse long array to string algorithm - java

I need to reverse the following algorithm, which converts a long array into a string:
import java.io.UnsupportedEncodingException;
import java.util.Random;
public final class LongConverter {
private final long[] l;
public LongConverter(long[] paramArrayOfLong) {
this.l = paramArrayOfLong;
}
private void convertLong(long paramLong, byte[] paramArrayOfByte, int paramInt) {
int i = Math.min(paramArrayOfByte.length, paramInt + 8);
while (paramInt < i) {
paramArrayOfByte[paramInt] = ((byte) (int) paramLong);
paramLong >>= 8;
paramInt++;
}
}
public final String toString() {
int i = this.l.length;
byte[] arrayOfByte = new byte[8 * (i - 1)];
long l1 = this.l[0];
Random localRandom = new Random(l1);
for (int j = 1; j < i; j++) {
long l2 = localRandom.nextLong();
convertLong(this.l[j] ^ l2, arrayOfByte, 8 * (j - 1));
}
String str;
try {
str = new String(arrayOfByte, "UTF8");
} catch (UnsupportedEncodingException localUnsupportedEncodingException) {
throw new AssertionError(localUnsupportedEncodingException);
}
int k = str.indexOf(0);
if (-1 == k) {
return str;
}
return str.substring(0, k);
}
}
So when I do the following call
System.out.println(new LongConverter(new long[]{-6567892116040843544L, 3433539276790523832L}).toString());
it prints 400 as result.
It would be great if anyone could say what algorithm this is or how I could reverse it.
Thanks for your help

This is not a solvable problem as stated because
you only use l[0] so any additional long values could be anything.
there are guaranteed to be at least 1 << 16 solutions to this problem. Although the seed passed to Random is 64-bit, the value used internally is only 48-bit. This means that if there is any solution, there are at least 65,536 (64K) solutions for a long seed.
What you can do is:
find the smallest seed which would generate the string using brute force. For short strings this won't take long; however, if you have 5-6 characters this will take a while, and for 7+ characters there might not be a solution.
instead of generating 8-bit characters where all 8-bit values are equally likely, you could restrict the range to, say, space, A-Z, a-z and 0-9. This means you need only ~6 bits of randomness per character, which allows shorter seeds and slightly longer Strings.
BTW You might find this post interesting where I use contrived random seeds to generate specific sequences. http://vanillajava.blogspot.co.uk/2011/10/randomly-no-so-random.html
If you want a process which ensures you can always re-create the original longs from a String or a byte[], I suggest using encryption. You can encrypt a String which has been UTF-8 encoded or a byte[] into another byte[] which can be base64 encoded to be readable as text. (Or you could skip the encryption and use base64 alone)
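Because the decoder XORs each word with the output of a Random seeded by l[0], one way to produce a matching long[] is to pick any seed and XOR the packed bytes with the same sequence. The following is a minimal sketch under that assumption. The class name LongEncoder and the methods encode and decode are hypothetical; decode simply mirrors the toString logic from the question so the round trip can be checked.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

final class LongEncoder {
    // Packs text into a long[] that the question's decoding logic turns back into text.
    // Assumption: text contains no '\0', since the decoder stops at the first zero char.
    static long[] encode(String text, long seed) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int words = bytes.length / 8 + 1; // always leaves at least one zero byte as terminator
        long[] out = new long[words + 1];
        out[0] = seed;
        Random random = new Random(seed);
        for (int j = 1; j <= words; j++) {
            long packed = 0;
            // Little-endian packing: the first byte ends up in the lowest 8 bits.
            for (int b = 7; b >= 0; b--) {
                int idx = 8 * (j - 1) + b;
                int value = idx < bytes.length ? (bytes[idx] & 0xFF) : 0;
                packed = (packed << 8) | value;
            }
            out[j] = packed ^ random.nextLong();
        }
        return out;
    }

    // Mirror of the question's toString(), kept here only to verify the round trip.
    static String decode(long[] l) {
        byte[] bytes = new byte[8 * (l.length - 1)];
        Random random = new Random(l[0]);
        for (int j = 1; j < l.length; j++) {
            long v = l[j] ^ random.nextLong();
            for (int b = 0; b < 8; b++) {
                bytes[8 * (j - 1) + b] = (byte) v;
                v >>= 8;
            }
        }
        String s = new String(bytes, StandardCharsets.UTF_8);
        int end = s.indexOf(0);
        return end < 0 ? s : s.substring(0, end);
    }
}
```

A different seed yields a different but equally valid array for the same text, which is consistent with there being many solutions.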

Related

Caesar Cipher decoded wrongly in Java

I have implemented the Caesar Cipher algorithm in Java 8.
The Problem
Heute ist Freitag.
results in this encoded text using 22 as a key:
^{
{6
6{
w}D
Decoding this again gets me this output:
Heu
e is
Frei
ag.
The Code and description
It should be noted that my algorithm doesn't care about characters like '\n', meaning some characters might be translated to escape sequences or spaces etc.
This is also totally what I want to happen, though it doesn't work.
public String encode(String txt, int key) {
if(key <= 0)
return txt;
String result = "";
for (int i = 0; i < txt.length(); i++) {
int x = (txt.charAt(i) + key) % 128;
result += (char) x;
}
System.out.println(result);
return result;
}
public String decipherM(String txt, int key) {
if(key <= 0)
return txt;
String result = "";
for (int i = 0; i < txt.length(); i++) {
int x = (txt.charAt(i) - key) % 128;
if(x < 0)
x += 128;
result += (char) x;
}
System.out.println(result);
return result;
}
The question
I would really like to know why it doesn't work with escape sequences or other non alphabetic characters.
Control characters have a defined meaning and text processing tools may retain the meaning or even remove those control characters not having a valid meaning, rather than retaining the exact byte representation.
Note that when you go beyond ASCII, this may even happen with ordinary characters. For example, since you used a German sample text, you have to be aware that the two Unicode codepoint sequences \u00E4 and \u0061\u0308 are semantically equivalent, both referring to the character ä, and you cannot rely on text processing tools to retain both forms.
After all, there is a reason why encodings like Base 64 have been invented for lossless transfer of byte sequences through text processing tools.
For an encoding as simple as yours, it might be best to simply forbid control characters in the source string and rotate only through the ASCII non-control character range:
public String encodeRotation(String txt, int distance) {
int first = ' ', last = 128, range = last - first;
while(distance<0) distance+=range;
if(distance == 0) return txt;
char[] buffer = txt.toCharArray();
for (int i = 0; i < txt.length(); i++) {
char c = buffer[i];
if(c<first || c>=last)
throw new IllegalArgumentException("unsupported character "+c);
buffer[i] = (char) ((c - first + distance) % range + first);
}
return String.valueOf(buffer);
}
public String decodeRotation(String txt, int key) {
return encodeRotation(txt, -key);
}
System.out.println(encodeRotation("Heute ist Freitag.", 22));
^{+*{6)*6\({*w}D
System.out.println(decodeRotation("^{+*{6)*6\\({*w}D", 22));
Heute ist Freitag.
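If the original mod-128 cipher has to stay as it is, the Base 64 route mentioned above could look roughly like the sketch below. It uses java.util.Base64 from the JDK; the class name Base64RoundTrip and the methods toText and fromText are made up for illustration, and ISO-8859-1 is chosen on the assumption that every cipher character is below 256 and therefore maps to exactly one byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64RoundTrip {
    // Turns a string that may contain control characters into printable text.
    static String toText(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Recovers the exact original characters, control characters included.
    static String fromText(String text) {
        byte[] bytes = Base64.getDecoder().decode(text);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

The encoded form contains only letters, digits, '+', '/' and '=', so it can pass through consoles and editors without being altered.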

Fastest way to permute bits in a Java array

What is the fastest way to randomly (but repeatedly) permute all the bits within a Java byte array? I've tried successfully doing it with a BitSet, but is there a faster way? Clearly the for-loop consumes the majority of the cpu time.
I've just done some profiling in my IDE and the for-loop constitutes 64% of the cpu time within the entire permute() method.
To clarify, the array (preRound) contains an existing array of numbers going into the procedure. I want the individual set bits of that array to be mixed up in a random manner. This is the reason for P[]. It contains a random list of bit positions. So for example, if bit 13 of preRound is set, it is transferred to place P[13] of postRound. This might be at position 20555 of postRound. The whole thing is part of a substitution-permutation network, and I'm looking for the fastest way to permute the incoming bits.
My code so far...
private byte[] permute(byte[] preRound) {
BitSet beforeBits = BitSet.valueOf(preRound);
BitSet afterBits = new BitSet(blockSize * 8);
for (int i = 0; i < blockSize * 8; i++) {
assert i != P[i];
if (beforeBits.get(i)) {
afterBits.set(P[i]);
}
}
byte[] postRound = afterBits.toByteArray();
postRound = Arrays.copyOf(postRound, blockSize); // Pad with 0s to the specified length
assert postRound.length == blockSize;
return postRound;
}
FYI, blockSize is about 60,000 and P is a random lookup table.
I didn't perform any performance tests, but you may want to consider the following:
To omit the call to Arrays.copyOf (which copies the copy of the long[] used internally, which is kind of annoying), just set the last bit in case it wasn't set before and unset it afterwards.
Furthermore, there is a nice idiom to iterate over the set bits in the input permutation.
private byte[] permute(final byte[] preRound) {
final BitSet beforeBits = BitSet.valueOf(preRound);
final BitSet afterBits = new BitSet(blockSize*8);
for (int i = beforeBits.nextSetBit(0); i >= 0; i =
beforeBits.nextSetBit(i + 1)) {
final int to = P[i];
assert i != to;
afterBits.set(to);
}
final int lastIndex = blockSize*8-1;
if (afterBits.get(lastIndex)) {
return afterBits.toByteArray();
}
afterBits.set(lastIndex);
final byte[] postRound = afterBits.toByteArray();
postRound[blockSize - 1] &= 0x7F;
return postRound;
}
If that doesn't cut it, in case you use the same P for lots of iterations, it may be worthwhile to consider transforming the permutation into cycle notation and perform the transformation in-place.
This way you can linearly iterate over P, which may enable you to better exploit caching (P is 32 times as large as the byte array, assuming it's an int array).
Yet, you will lose the advantage that you only have to look at 1s and end up shifting around every single bit in the byte array, set or not.
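As a rough illustration of the in-place idea, the sketch below follows each cycle of P and carries one bit along it. It is a simplified variant: it walks P directly with a visited array instead of precomputing a cycle-notation layout, so it does not yet give the linear access pattern described above. The class name BitPermuter and its methods are hypothetical, and the bit numbering (bit i lives in byte i / 8 under mask 1 << (i % 8)) is assumed to match the hand-written version further down.

```java
final class BitPermuter {
    private static boolean get(byte[] data, int bit) {
        return (data[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    private static void set(byte[] data, int bit, boolean value) {
        if (value) {
            data[bit >> 3] |= 1 << (bit & 7);
        } else {
            data[bit >> 3] &= ~(1 << (bit & 7));
        }
    }

    // Moves bit i to position p[i] for every i, without allocating a second byte array.
    // Assumption: p is a permutation of 0 .. data.length * 8 - 1.
    static void permuteInPlace(byte[] data, int[] p) {
        boolean[] seen = new boolean[p.length];
        for (int start = 0; start < p.length; start++) {
            if (seen[start]) {
                continue; // already handled as part of an earlier cycle
            }
            boolean carry = get(data, start);
            int pos = start;
            do {
                int next = p[pos];
                boolean displaced = get(data, next); // remember the bit we overwrite
                set(data, next, carry);
                carry = displaced;
                seen[pos] = true;
                pos = next;
            } while (pos != start);
        }
    }
}
```

Whether this beats the BitSet version depends on how sparse the set bits are, so it would need to be measured on real data.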
If you want to avoid using the BitSet, you can just do it by hand:
private byte[] permute(final byte[] preRound) {
final byte[] result = new byte[blockSize];
for (int i = 0; i < blockSize; i++) {
final byte b = preRound[i];
// if 1s are sparse, you may want to use this:
// if ((byte) 0 == b) continue;
for (int j = 0; j < 8; ++j) {
if (0 != (b & (1 << j))) {
final int loc = P[i * 8 + j];
result[loc / 8] |= (1 << (loc % 8));
}
}
}
return result;
}

Reading and writing huge files in java

My idea is to make a little program that reads a file (which can't be read "naturally", but it contains some images), turns its data into hex, looks for the PNG chunks (markers that sit at the beginning and end of a .png file), and saves the resulting data in different files (after getting it back from hex). I am doing this in Java, using code like this:
// out is where to show the result and file is the source
public static void hexDump(PrintStream out, File file) throws IOException {
InputStream is = new FileInputStream(file);
StringBuffer Buffer = new StringBuffer();
while (is.available() > 0) {
StringBuilder sb1 = new StringBuilder();
for (int j = 0; j < 16; j++) {
if (is.available() > 0) {
int value = (int) is.read();
// transform the current data into hex
sb1.append(String.format("%02X ", value));
}
}
Buffer.append(sb1);
// Should I look for the PNG here? I'm not sure
}
is.close();
// Print the result in out (that may be the console or a file)
out.print(Buffer);
}
I'm sure there are other ways to do this using fewer "machine resources" while opening huge files. If you have any ideas, please tell me. Thanks!
This is the first time I post, so if there is any error, please help me to correct it.
As Erwin Bolwidt says in the comments, the first thing is: don't convert to hex. If for some reason you must convert to hex, stop appending the content to two buffers, and always use StringBuilder, not StringBuffer. StringBuilder can be as much as 3x faster than StringBuffer.
Also, buffer your file reads with BufferedInputStream (BufferedReader is meant for character data, not binary data). Reading one byte at a time with FileInputStream.read() is very slow.
A very simple way to do this, which is probably quite fast, is to read the entire file into memory (as binary data, not as a hex dump) and then search for the markers.
This has two limitations:
it only handles files up to 2 GiB in length (max size of Java arrays)
it requires large chunks of memory - it is possible to optimize this by reading smaller chunks, but that makes the algorithm more complex
The basic code to do that is like this:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
public class Png {
static final String PNG_MARKER_HEX = "abcdef0123456789"; // TODO: replace with real marker
static final byte[] PNG_MARKER = hexStringToByteArray(PNG_MARKER_HEX);
public void splitPngChunks(File file) throws IOException {
byte[] bytes = Files.readAllBytes(file.toPath());
int offset = KMPMatch.indexOf(bytes, 0, PNG_MARKER);
while (offset >= 0) {
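// Search after the current marker; restarting at 0 would find the same marker forever.
int nextOffset = KMPMatch.indexOf(bytes, offset + PNG_MARKER.length, PNG_MARKER);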
if (nextOffset < 0) {
writePngChunk(bytes, offset, bytes.length - offset);
} else {
writePngChunk(bytes, offset, nextOffset - offset);
}
offset = nextOffset;
}
}
public void writePngChunk(byte[] bytes, int offset, int length) {
// TODO: implement - where do you want to write the chunks?
}
}
I'm not sure how these PNG chunk markers work exactly, I'm assuming above that they start the section of the data that you're interested in, and that the next marker starts the next section of the data.
There are two things missing in standard Java: code to convert a hex string to a byte array and code to search for a byte array inside another byte array.
Both can be found in various apache-commons libraries, but I'll include the answers that people posted to earlier questions on StackOverflow. You can copy these verbatim into the Png class to make the above code work.
Convert a string representation of a hex dump to a byte array using Java?
public static byte[] hexStringToByteArray(String s) {
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4) + Character.digit(s.charAt(i + 1), 16));
}
return data;
}
Searching for a sequence of Bytes in a Binary File with Java
/**
* Knuth-Morris-Pratt Algorithm for Pattern Matching
*/
static class KMPMatch {
/**
* Finds the first occurrence of the pattern in the text.
*/
public static int indexOf(byte[] data, int offset, byte[] pattern) {
int[] failure = computeFailure(pattern);
int j = 0;
if (data.length - offset <= 0)
return -1;
for (int i = offset; i < data.length; i++) {
while (j > 0 && pattern[j] != data[i]) {
j = failure[j - 1];
}
if (pattern[j] == data[i]) {
j++;
}
if (j == pattern.length) {
return i - pattern.length + 1;
}
}
return -1;
}
/**
* Computes the failure function using a boot-strapping process, where the pattern is matched against itself.
*/
private static int[] computeFailure(byte[] pattern) {
int[] failure = new int[pattern.length];
int j = 0;
for (int i = 1; i < pattern.length; i++) {
while (j > 0 && pattern[j] != pattern[i]) {
j = failure[j - 1];
}
if (pattern[j] == pattern[i]) {
j++;
}
failure[i] = j;
}
return failure;
}
}
I modified this last piece of code to make it possible to start the search at an offset other than zero.
Reading the file a byte at a time would be taking substantial time here. You can improve that by orders of magnitude. You should be using a DataInputStream around a BufferedInputStream around the FileInputStream, and reading 16 bytes at a time with readFully.
And then processing them, without conversion to and from hex, which is quite unnecessary here, and writing them to the output(s) as you go, via a BufferedOutputStream around the FileOutputStream, rather than concatenating the entire file into memory and having to write it all out in one go. Of course that takes time, but that's because it does, not because you have to do it that way.
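A streaming search along those lines could be sketched as follows. It keeps the Knuth-Morris-Pratt match state across reads, so the file never has to fit in memory. This is only an illustration: the class name StreamSearch and the method findAll are hypothetical, and it reads in 8 KiB blocks through InputStream.read(byte[]) rather than 16 bytes at a time with readFully, which is a substitution for simplicity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class StreamSearch {
    // Standard KMP failure table for the pattern.
    private static int[] failure(byte[] pattern) {
        int[] f = new int[pattern.length];
        int j = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = f[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            f[i] = j;
        }
        return f;
    }

    // Returns the stream offsets at which pattern starts. Assumes pattern is non-empty.
    static List<Long> findAll(InputStream in, byte[] pattern) throws IOException {
        int[] f = failure(pattern);
        List<Long> hits = new ArrayList<>();
        byte[] buf = new byte[8192];
        int j = 0;    // number of pattern bytes currently matched
        long pos = 0; // absolute offset of the byte being examined
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int k = 0; k < n; k++, pos++) {
                byte b = buf[k];
                while (j > 0 && pattern[j] != b) {
                    j = f[j - 1];
                }
                if (pattern[j] == b) {
                    j++;
                }
                if (j == pattern.length) {
                    hits.add(pos - pattern.length + 1);
                    j = f[j - 1]; // keep going so later matches are found too
                }
            }
        }
        return hits;
    }
}
```

Called with a FileInputStream and the marker bytes, this yields the offsets where each section starts; the sections themselves could then be copied out in a second pass or written as they are read.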

Convert hashcode to limited set of string

I know it's a one-way function, but I want to convert a hash code back to a string drawn from a limited character set (chars between 32 and 126). Is there an efficient way to do this?
It's not only feasible - it's actually pretty simple, given the definition for String.hashCode. You can just create a string of "base 31" characters with some arbitrary starting point to keep everything in the right range, subtracting an offset based on that starting point.
This isn't necessarily the shortest string with the given hash code, but 7 characters is pretty short :)
public class Test {
public static void main(String[] args) {
int hash = 100000;
String sample = getStringForHashCode(hash);
System.out.println(sample); // ASD^TYQ
System.out.println(sample.hashCode()); // 100000
}
private static final int OFFSET = "AAAAAAA".hashCode();
private static String getStringForHashCode(int hash) {
hash -= OFFSET;
// Treat it as an unsigned long, for simplicity.
// This avoids having to worry about negative numbers anywhere.
long longHash = (long) hash & 0xFFFFFFFFL;
System.out.println(longHash);
char[] c = new char[7];
for (int i = 0; i < 7; i++)
{
c[6 - i] = (char) ('A' + (longHash % 31));
longHash /= 31;
}
return new String(c);
}
}
Actually, I only need one string from that hashcode. I want to make Minecraft seed shortener.
The simplest way to turn an int value into a short string is to use
String s = Integer.toString(n, 36); // uses base 36.
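The reverse direction is Integer.parseInt with the same radix. A small sketch, where the class name SeedShortener and the method names are placeholders:

```java
final class SeedShortener {
    // Base 36 uses the digits 0-9 and a-z, the largest radix Integer.toString supports.
    static String shorten(int n) {
        return Integer.toString(n, 36);
    }

    // Inverse of shorten; negative values keep their leading '-'.
    static int restore(String s) {
        return Integer.parseInt(s, 36);
    }
}
```

For example, shorten(100000) gives "255s", and restore("255s") gives 100000 again.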

Performance intensive string splitting and manipulation in java

What is the most efficient way to split a string by a very simple separator?
Some background:
I am porting a function I wrote in C with a bunch of pointer arithmetic to Java, and it is incredibly slow (after some optimisation it is still 5x slower).
Having profiled it, it turns out a lot of that overhead is in String.split.
The function in question takes a host name or ip address and makes it generic:
123.123.123.123->*.123.123.123
a.b.c.example.com->*.example.com
This can be run over several million items on a regular basis, so performance is an issue.
Edit: the rules for converting are thus:
If it's an ip address, replace the first part
Otherwise, find the main domain name, and make the preceding part generic.
foo.bar.com-> *.bar.com
foo.bar.co.uk-> *.bar.co.uk
I have now rewritten using lastIndexOf and substring to work myself in from the back and the performance has improved by leaps and bounds.
I'll leave the question open for another 24 hours before settling on the best answer for future reference
Here's what I've come up with now (the IP part is an insignificant check before calling this function):
private static String hostConvert(String in) {
final String [] subs = { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };
int dotPos = in.lastIndexOf('.');
if(dotPos == -1)
return in;
int prevDotPos = in.lastIndexOf('.', dotPos-1);
if(prevDotPos == -1)
return in;
CharSequence cs = in.subSequence(prevDotPos+1, dotPos);
for(String cur : subs) {
if(cur.contentEquals(cs)) {
int start = in.lastIndexOf('.', prevDotPos-1);
if(start == -1 || start == 0)
return in;
return "*" + in.substring(start);
}
}
return "*" + in.substring(prevDotPos);
}
If there's any space for further improvement it would be good to hear.
Something like this is about as fast as you can make it:
static String starOutFirst(String s) {
final int K = s.indexOf('.');
return "*" + s.substring(K);
}
static String starOutButLastTwo(String s) {
final int K = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
return "*" + s.substring(K);
}
Then you can do:
System.out.println(starOutFirst("123.123.123.123"));
// prints "*.123.123.123"
System.out.println(starOutButLastTwo("a.b.c.example.com"));
// prints "*.example.com"
You may need to use a regex to see which of the two methods is applicable for any given string.
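One possible way to wire the two together is sketched below. It assumes that anything shaped like four dot-separated groups of digits counts as an IP address. The class name HostGeneralizer, the method generalize, and the pattern itself are illustrative choices, and the two helpers gain a guard for inputs with too few dots, which the versions above do not have.

```java
import java.util.regex.Pattern;

final class HostGeneralizer {
    // Compiled once, because the check runs for every one of several million inputs.
    private static final Pattern IPV4_LIKE = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static String starOutFirst(String s) {
        int k = s.indexOf('.');
        return k < 0 ? s : "*" + s.substring(k);
    }

    static String starOutButLastTwo(String s) {
        int k = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
        return k < 0 ? s : "*" + s.substring(k); // fewer than two dots: nothing to generalise
    }

    static String generalize(String host) {
        return IPV4_LIKE.matcher(host).matches()
                ? starOutFirst(host)
                : starOutButLastTwo(host);
    }
}
```

This sketch does not handle suffixes such as co.uk; that would still need the list of second-level labels from the question.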
I'd try using .indexOf("."), and .substring(index)
You didn't elaborate on the exact pattern you wanted to match but if you can avoid split(), it should cut down on the number of new strings it allocates (1 instead of several).
It's unclear from your question exactly what the code is supposed to do. Does it find the first '.' and replace everything up to it with a '*'? Or is there some fancier logic behind it? Maybe everything up to the nth '.' gets replaced by '*'?
If you're trying to find an instance of a particular string, use something like the Boyer-Moore algorithm. It should be able to find the match for you and you can then replace what you want.
Keep in mind that String in Java is immutable. It might be faster to change the sequence in-place. Check out other CharSequence implementations to see what you can do, e.g. StringBuffer and CharBuffer. If concurrency is not needed, StringBuilder might be an option.
By using a mutable CharSequence instead of the methods on String, you avoid a bunch of object churn. If all you're doing is replacing some slice of the underlying character array with a shorter array (i.e. {'*'}), this is likely to yield a speedup since such array copies are fairly optimized. You'll still be doing an array copy at the end of the day, but it may be faster than new String allocations.
UPDATE
All the above is pretty much hogwash. Sure, maybe you can implement your own CharSequence that gives you better slicing and lazily resizes the array (aka doesn't actually truncate anything until it absolutely must), returning Strings based on offsets and whatnot. But StringBuffer and StringBuilder, at least directly, do not perform as well as the solution poly posted. CharBuffer is entirely inapplicable; I didn't realize it was an nio class earlier: it's meant for other things entirely.
There are some interesting things about poly's code, which I wonder whether he/she knew before posting it, namely that changing the "*" on the final lines of the methods to a '*' results in a significant slowdown.
Nevertheless, here is my benchmark. I found one small optimization: declaring the '.' and "*" expressions as constants adds a bit of a speedup as well as using a locally-scoped StringBuilder instead of the binary infix string concatenation operator.
I know the gc() is at best advisory and at worst a no-op, but I figured adding it with a bit of sleep time might let the VM do some cleanup after creating 1M Strings. Someone may correct me if this is totally naïve.
Simple Benchmark
import java.util.ArrayList;
import java.util.Arrays;
public class StringSplitters {
private static final String PREFIX = "*";
private static final char DOT = '.';
public static String starOutFirst(String s) {
final int k = s.indexOf(DOT);
return PREFIX + s.substring(k);
}
public static String starOutFirstSb(String s) {
StringBuilder sb = new StringBuilder();
final int k = s.indexOf(DOT);
return sb.append(PREFIX).append(s.substring(k)).toString();
}
public static void main(String[] args) throws InterruptedException {
double[] firstRates = new double[10];
double[] firstSbRates = new double[10];
double firstAvg = 0;
double firstSbAvg = 0;
double firstMin = Double.POSITIVE_INFINITY;
double firstMax = Double.NEGATIVE_INFINITY;
double firstSbMin = Double.POSITIVE_INFINITY;
double firstSbMax = Double.NEGATIVE_INFINITY;
for (int i = 0; i < 10; i++) {
firstRates[i] = testFirst();
firstAvg += firstRates[i];
if (firstRates[i] < firstMin)
firstMin = firstRates[i];
if (firstRates[i] > firstMax)
firstMax = firstRates[i];
Thread.sleep(100);
System.gc();
Thread.sleep(100);
}
firstAvg /= 10.0d;
for (int i = 0; i < 10; i++) {
firstSbRates[i] = testFirstSb();
firstSbAvg += firstSbRates[i];
if (firstSbRates[i] < firstSbMin)
firstSbMin = firstSbRates[i];
if (firstSbRates[i] > firstSbMax)
firstSbMax = firstSbRates[i];
Thread.sleep(100);
System.gc();
Thread.sleep(100);
}
firstSbAvg /= 10.0d;
System.out.printf("First:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstMin, firstMax,
firstAvg, Arrays.toString(firstRates));
System.out.printf("FirstSb:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstSbMin,
firstSbMax, firstSbAvg, Arrays.toString(firstSbRates));
}
private static double testFirst() {
ArrayList<String> strings = new ArrayList<String>(1000000);
for (int i = 0; i < 1000000; i++) {
int first = (int) (Math.random() * 128);
int second = (int) (Math.random() * 128);
int third = (int) (Math.random() * 128);
int fourth = (int) (Math.random() * 128);
strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
}
long before = System.currentTimeMillis();
for (String s : strings) {
starOutFirst(s);
}
long after = System.currentTimeMillis();
return 1000000000.0d / (after - before);
}
private static double testFirstSb() {
ArrayList<String> strings = new ArrayList<String>(1000000);
for (int i = 0; i < 1000000; i++) {
int first = (int) (Math.random() * 128);
int second = (int) (Math.random() * 128);
int third = (int) (Math.random() * 128);
int fourth = (int) (Math.random() * 128);
strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
}
long before = System.currentTimeMillis();
for (String s : strings) {
starOutFirstSb(s);
}
long after = System.currentTimeMillis();
return 1000000000.0d / (after - before);
}
}
Output
First:
Min: 3802281.369 Max: 5434782.609 Avg: 5185796.131
Rates: [3802281.3688212926, 5181347.150259067, 5291005.291005291, 5376344.086021505, 5291005.291005291, 5235602.094240838, 5434782.608695652, 5405405.405405405, 5434782.608695652, 5405405.405405405]
FirstSb:
Min: 4587155.963 Max: 5747126.437 Avg: 5462087.511
Rates: [4587155.963302752, 5747126.436781609, 5617977.528089887, 5208333.333333333, 5681818.181818182, 5586592.17877095, 5586592.17877095, 5524861.878453039, 5524861.878453039, 5555555.555555556]
