String to Byte[] and Byte to String - java

Given the following example:
String f="FF00000000000000";
byte[] bytes = DatatypeConverter.parseHexBinary(f);
String f2= new String (bytes);
I want the output to be FF00000000000000 but it's not working with this method.

You're currently trying to interpret the bytes as if they were text encoded using the platform default encoding (UTF-8, ISO-8859-1 or whatever). That's not what you actually want to do at all - you want to convert it back to hex.
For that, just look at the converter you're using for the parsing step, and look for similar methods which work in the opposite direction. In this case, you want printHexBinary:
String f2 = DatatypeConverter.printHexBinary(bytes);
The approach of "look for reverse operations near the original operation" is a useful one in general... but be aware that sometimes you need to look at a parallel type, e.g. DataInputStream / DataOutputStream. When you find yourself using completely different types for inverse operations, that's usually a bit of a warning sign. (It's not always wrong, it's just worth investigating other options.)
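To make the round trip concrete, here is a minimal, dependency-free sketch. The class HexRoundTrip and its helpers parseHex and printHex are made up for this example; they only mimic what parseHexBinary and printHexBinary do. A hand-rolled version may also be handy because javax.xml.bind is no longer bundled with JDK 11 and later.

```java
class HexRoundTrip {
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper: turns "FF00" into {(byte) 0xFF, 0x00}.
    static byte[] parseHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string needs an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit in: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Hypothetical helper: the inverse operation, two hex digits per byte.
    static String printHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX_DIGITS[(b >> 4) & 0x0F]).append(HEX_DIGITS[b & 0x0F]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = parseHex("FF00000000000000");
        System.out.println(printHex(bytes)); // FF00000000000000
    }
}
```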

Related

Avoiding line breaks in encrypted and encoded URL string

I am trying to implement a simple string encoder to obfuscate some parts of a URL string (to prevent them from getting mucked with by a user). I'm using code nearly identical to the sample in the JCA guide, except:
using DES (assuming it's a little faster than AES, and requires a smaller key) and
Base64 en/decoding the string to make sure it stays safe for a URL.
For reasons I can't understand, the output string ends up with line breaks, which I presume won't work. I can't figure out what's causing this. Suggestions on something similar that's easier or pointers to some other resources to read? I'm finding all the cryptography references a bit over my head (and overkill), but a simple ROT13 implementation won't work since I want to deal with a larger character set (and don't want to waste time implementing something likely to have issues with obscure characters I didn't think of).
Sample input (no line break):
http://maps.google.com/maps?q=kansas&hl=en&sll=42.358431,-71.059773&sspn=0.415552,0.718918&hnear=Kansas&t=m&z=7
Sample Output (line breaks as shown below):
GstikIiULcJSGEU2NWNTpyucSWUFENptYk4m5lD8RJl8l1CuspiuXiE9a07fUEAGM/tC7h0Vzus+
jAH6cT4Wtz2RUlBdGf8WtQxVDKZVOzKwi84eQh2kZT9T3KomlnPOu2owJ/2RAEvG+QuGem5UGw==
my encode snippet:
final Key key = new SecretKeySpec(seed.getBytes(), "DES");
final Cipher c = Cipher.getInstance("DES");
c.init(Cipher.ENCRYPT_MODE, key);
final byte[] encVal = c.doFinal(s.getBytes());
return new BASE64Encoder().encode(encVal);
Simply perform base64Str = base64Str.replaceAll("(?:\\r\\n|\\n\\r|\\n|\\r)", "")
on the encoded string.
It works fine when you try to decode it back to bytes. I tested it several times with randomly generated byte arrays. Evidently the decoding process just ignores the newlines, whether they are present or not.
I tested this "confirmed working" by using com.sun.org.apache.xml.internal.security.utils.Base64
Other encoders not tested.
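For what it's worth, the same check can be sketched with java.util.Base64 (Java 8+), which is not the encoder tested above. The class name StripNewlines and the helper stripLineBreaks are made up for this example. The MIME encoder wraps its output with CRLF, the basic decoder accepts the stripped string, and the MIME decoder accepts the wrapped one as-is.

```java
import java.util.Arrays;
import java.util.Base64;

class StripNewlines {
    // Same idea as the replaceAll above: drop every CR and LF.
    static String stripLineBreaks(String base64) {
        return base64.replaceAll("[\\r\\n]", "");
    }

    public static void main(String[] args) {
        byte[] original = new byte[100];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        // The MIME encoder inserts CRLF line separators into long output.
        String wrapped = Base64.getMimeEncoder().encodeToString(original);
        String singleLine = stripLineBreaks(wrapped);

        byte[] fromStripped = Base64.getDecoder().decode(singleLine);
        byte[] fromWrapped = Base64.getMimeDecoder().decode(wrapped);
        System.out.println(Arrays.equals(original, fromStripped)); // true
        System.out.println(Arrays.equals(original, fromWrapped));  // true
    }
}
```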
Base64 encoders usually impose some maximum line (chunk) length and add newlines when necessary. You can normally configure that, but it depends on the particular encoder implementation.
For example, the class from Apache Commons has a linelength attribute, setting it to zero (or negative) disables the line separation.
BTW: I agree with the other answer in that DES is hardly advisable today. Further, are you just "obfuscating" or really encrypting? Who has the key? The whole thing does not smell very well to me.
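As one example of an encoder that can be told not to wrap, here is a small sketch using java.util.Base64 from Java 8+ (a different class from the Apache Commons one mentioned above). Its URL-safe encoder never inserts line separators and swaps '+' and '/' for '-' and '_', which also avoids characters that need escaping in a URL. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.Base64;

class UrlSafeBase64 {
    // URL-safe alphabet, no line breaks, no trailing '=' padding.
    static String encodeForUrl(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // The URL decoder does not require the padding characters.
    static byte[] decodeFromUrl(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xfb, (byte) 0xff, (byte) 0xfe};
        String encoded = encodeForUrl(data);
        System.out.println(encoded); // -__-
        System.out.println(Arrays.equals(data, decodeFromUrl(encoded))); // true
    }
}
```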
import android.util.Base64;
...
return Base64.encodeToString(encVal, Base64.NO_WRAP);
Though it's unrelated to your actual question, DES is generally slower than AES (at least in software), so unless you really need to keep the key small, AES is almost certainly a better choice.
Second, it's perfectly normal that encryption (DES or AES) would/will produce new-line characters in its output. Producing output without them will be entirely up to the base-64 encoder, so that's where you clearly need to look.
It's not particularly surprising to see a base-64 encoder insert new-line characters at regular intervals in its output, though. The most common use for base-64 encoding is putting raw data into something like the body of an email, where a really long line would cause a problem. To prevent that, the data is broken up into pieces, typically no more than 80 columns (and usually a bit less). In this case, the new-lines should be ignored, however, so you should be able to just delete them, if memory serves.

how to write hexadecimal values to a binary file

I'm currently trying to build a save editor for a video game. I figured out how to write to the binary file with an output stream rather than a writer, but I'm running into a problem. I'm trying to overwrite certain hexadecimal values, but every time I try I end up replacing the whole file. There's probably an easy explanation for this, but I also wanted advice on how to replace the hex values: converting the hex values (e.g. 5acd) from a string only gives me the byte data for the string's characters. Here's what I'm doing:
String textToWrite = inputField.getText();
byte[] charsToWrite = textToWrite.getBytes();
FileOutputStream out = new FileOutputStream(theFile);
out.write(charsToWrite, 23, charsToWrite.length);
Use a RandomAccessFile. This has the methods that you are looking for. FileOutputStream will only allow you to overwrite or append. However, note that, as Murali VP alluded to, this will only allow you to perform direct replacements (byte-for-byte), not removal or insertion of bytes.
Converting from Hex String to Byte Array (which is essentially what you need) - see this SO post for what you need.
HTH
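A rough sketch of how the two pieces could fit together, in case it helps. The class SavePatcher, its helpers, and the temporary 32-byte file are all made up for this example; only the offset 23 and the value 5acd are borrowed from the question.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SavePatcher {
    // Hypothetical helper: "5acd" -> {0x5A, (byte) 0xCD}.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string needs an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Overwrites bytes in place starting at offset; the rest of the file is untouched.
    static void patch(Path file, long offset, String hex) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(hexToBytes(hex));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Patches a throwaway 32-byte file and returns its contents afterwards.
    static byte[] demo() {
        try {
            Path save = Files.createTempFile("save", ".bin");
            try {
                Files.write(save, new byte[32]); // stand-in for a real save file
                patch(save, 23, "5acd");
                return Files.readAllBytes(save);
            } finally {
                Files.delete(save);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] after = demo();
        System.out.println(after.length);                       // 32
        System.out.printf("%02x%02x%n", after[23], after[24]);  // 5acd
    }
}
```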

Compact a string into a smaller string?

This may sound foolish, but I'm wondering all the same...
Is it possible to take a string composed of a given character set and compress it by using a bigger character set, or composing it into a number then converting it back at one?
For example, if you had a string that you know would be composed of [a-z][A-Z][0-9]-_+=, could you turn that into a number, then swap it back using more characters in order to compress it?
This is an area I'm not familiar with, I still want to keep it as a string, just a shorter one. (for displaying/echoing/etc, not memory)
I wouldn't bother doing that, unless the string is huge. You can then try to compress it with commons-compress or java.util.zip
A String internally keeps an array of 16 bit characters, which for western european languages is a waste, you can convert to utf-8 which should give you 50% reduction by doing
String myString = .....
ByteArrayOutputStream baos = new ByteArrayOutputStream();
baos.write(myString.getBytes("UTF-8"));
byte[] data = baos.toByteArray();
and hold onto it as a byte array.
Of course this is rather inconvenient if you actually want to use them as Strings, but if the point is long term storage, without much access, this would save you a bunch.
You would have to do the reverse to recreate a String.
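A minimal sketch of both directions, plus the java.util.zip option, under the assumption that the text is mostly ASCII (otherwise UTF-8 may not save anything). The class CompactString and its method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompactString {
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The "reverse" step: rebuild the String from its UTF-8 bytes.
    static String fromUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished: the input is incomplete or unsupported.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("abcABC019-_+=");
        }
        byte[] utf8 = toUtf8(sb.toString());
        byte[] packed = deflate(utf8);
        System.out.println(utf8.length + " bytes as UTF-8, " + packed.length + " deflated");
        System.out.println(fromUtf8(inflate(packed)).equals(sb.toString())); // true
    }
}
```

Note that the compressed result is binary, so it is no longer a displayable string unless it is encoded again (for example as Base64), which gives some of the savings back.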
String is a built-in type (though not a primitive one); you are unlikely to regain any space by converting unless you use Java's zip library, and even that will not yield the performance benefits you are presumably seeking.

How does a person go about learning Java? (convert byte array to hex string)

I know this sounds like a broad question but I can narrow it down with an example. I am VERY new at Java. For one of my "learning" projects, I wanted to create an in-house MD5 file hasher for us to use. I started off very simple by attempting to hash a string and then moving on to a file later. I created a file called MD5Hasher.java and wrote the following:
import java.security.*;
import java.io.*;
public class MD5Hasher{
public static void main(String[] args){
String myString = "Hello, World!";
byte[] myBA = myString.getBytes();
MessageDigest myMD;
try{
myMD = MessageDigest.getInstance("MD5");
myMD.update(myBA);
byte[] newBA = myMD.digest();
String output = newBA.toString();
System.out.println("The Answer Is: " + output);
} catch(NoSuchAlgorithmException nsae){
// print error here
}
}
}
I visited java.sun.com to view the javadocs for java.security to find out how to use MessageDigest class. After reading I knew that I had to use a "getInstance" method to get a usable MessageDigest object I could use. The Javadoc went on to say "The data is processed through it using the update methods." So I looked at the update methods and determined that I needed to use the one where I fed it a byte array of my string, so I added that part. The Javadoc went on to say "Once all the data to be updated has been updated, one of the digest methods should be called to complete the hash computation." I, again, looked at the methods and saw that digest returned a byte array, so I added that part. Then I used the "toString" method on the new byte array to get a string I could print. However, when I compiled and ran the code all that printed out was this:
The Answer Is: [B@4cb162d5
I have done some looking around here on StackOverflow and found some information here:
How can I generate an MD5 hash?
that gave the following example:
String plaintext = "your text here";
MessageDigest m = MessageDigest.getInstance("MD5");
m.reset();
m.update(plaintext.getBytes());
byte[] digest = m.digest();
BigInteger bigInt = new BigInteger(1,digest);
String hashtext = bigInt.toString(16);
// Now we need to zero pad it if you actually want the full 32 chars.
while(hashtext.length() < 32 ){
hashtext = "0"+hashtext;
}
It seems the only part I MAY be missing is the "BigInteger" part, but I'm not sure.
So, after all of this, I guess what I am asking is, how do you know to use the "BigInteger" part? I wrongly assumed that the "toString" method on my newBA object would convert it to a readable output, but I was, apparently, wrong. How is a person supposed to know which way to go in Java? I have a background in C so this Java thing seems pretty weird. Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?
Thank you all for taking the time to read. :-)
The key in this particular case is that you need to realize that bytes are not "human readable", but characters are. So you need to convert bytes to characters in a certain format. For arbitrary bytes like hashes, hexadecimal is usually used as the "human readable" format. Every byte is then converted to a 2-character hexadecimal string, and these are concatenated together.
This is unrelated to the language you use. You just have to understand how it works "under the hood" in a language-agnostic way. You have to understand what you have (a byte array) and what you want (a hex string). The programming language is just a tool to achieve the desired result. You just google the "functional requirement" along with the programming language you'd like to use to achieve the requirement. E.g. "convert byte array to hex string in java".
That said, the code example you found is wrong. You should actually determine each byte inside a loop and test if it is less than 0x10 and then pad it with zero instead of only padding the zero depending on the length of the resulting string (which may not necessarily be caused by the first byte being less than 0x10!).
StringBuilder hex = new StringBuilder(bytes.length * 2);
for (byte b : bytes) {
if ((b & 0xff) < 0x10) hex.append("0");
hex.append(Integer.toHexString(b & 0xff));
}
String hexString = hex.toString();
Update as per the comments on the answer of @extraneon, using new BigInteger(byte[]) is also the wrong solution. This doesn't unsign the bytes. Bytes (like all primitive integer types except char) in Java are signed. They have a negative range. The byte in Java ranges from -128 to 127 while you want to have a range of 0 to 255 to get a proper hex string. You basically just need to remove the sign to make them unsigned. The & 0xff in the above example does exactly that.
The hex string as obtained from new BigInteger(bytes).toString(16) is NOT compatible with the result of all other hex-string-producing MD5 generators the world is aware of. They will differ whenever the leading byte of the MD5 digest is negative, because the whole value is then treated as a negative number.
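To make the sign issue concrete, here is a small sketch with a two-byte array standing in for a digest. The class name SignedBytesDemo and the helper toHex are made up for this example. The one-argument BigInteger constructor reads the bytes as a two's-complement number, while the two-argument form with signum 1 treats them as an unsigned magnitude.

```java
import java.math.BigInteger;

class SignedBytesDemo {
    // Per-byte conversion: mask off the sign, always emit two digits.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xff, 0x00};
        System.out.println(new BigInteger(bytes).toString(16));    // -100
        System.out.println(new BigInteger(1, bytes).toString(16)); // ff00
        System.out.println(toHex(bytes));                          // ff00

        byte[] leadingZero = {0x00, 0x0f};
        // BigInteger drops leading zeros, the per-byte loop keeps them.
        System.out.println(new BigInteger(1, leadingZero).toString(16)); // f
        System.out.println(toHex(leadingZero));                          // 000f
    }
}
```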
You have actually successfully digested the message. You just don't know how to present the found digest value properly. What you have is a byte array. That's a bit difficult to read, and a toString of a byte array yields [B@somewhere which is not useful at all.
The BigInteger comes into it as a tool to format the byte array to a single number.
What you do is:
Construct a BigInteger with the proper value (in this case that value happens to be encoded in the form of a byte array: your digest).
Instruct the BigInteger object to return a String representation (i.e. plain, readable text) of that number in base 16 (i.e. hex).
And the while loop prefixes that value with 0-characters to get a width of 32. I'd probably use String.format for that, but whatever floats your boat :)
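The String.format variant could look roughly like this. The class Md5Format and the method md5Hex are made up for this example, and UTF-8 is an assumption on my part, since the original code relied on the platform default encoding.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Format {
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            // signum 1: treat the digest bytes as an unsigned magnitude.
            // %032x: lower-case hex, zero-padded to 32 characters.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex(""));  // d41d8cd98f00b204e9800998ecf8427e
        System.out.println(md5Hex("a")); // 0cc175b9c0f1b6a831c399e269772661
    }
}
```

The second value starts with a zero, which is exactly the case the padding exists for.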
MessageDigests compute a byte array of something, the string that you usually see (such as 1f3870be274f6c49b3e31a0c6728957f) is actually just a conversion of the byte array to a hexadecimal string.
When you call toString() on the byte[] returned by MessageDigest.digest(), you get Java's default toString for arrays, which returns a sort of reference to the bytes, not the actual bytes.
In the code you posted, the byte array is changed to an integer (in this case a BigInteger because it would be extremely large), and then converted to hexadecimal to be printed to a String.
The byte array computed by the digest represents a number (a 128-bit number according to http://en.wikipedia.org/wiki/MD5), and that number can be converted to any other base, so the result of the MD5 could be represented as a base-10 number, a base-2 number (as in a byte array), or, most commonly, a base-16 number.
It is OK to google for answers as long as you (eventually) understand what you copy-pasted into your app :-)
In general, I recommend starting with a good Java introductory book, or web tutorial. See these threads for more tips:
https://stackoverflow.com/questions/77839/what-are-the-best-resources-for-learning-java-books-websites-etc
Learning Java
https://stackoverflow.com/questions/78293/good-book-to-learn-to-program-well-in-java-engineering-or-architecture-wise-not
Though I'm afraid that I have no experience whatsoever using Java to play with MD5 hashes, I can recommend Sun's Java Tutorials as a fantastic resource for learning Java. They go through most of the language, and helped me out a ton when I was learning Java.
Also look around for other posts asking the same thing and see what suggestions popped up there.
The reason BigInteger is used is that the value in the byte array is too big to fit into an int or long. However, if you do want to see everything in the byte array, there's an alternate approach. You could just replace the line:
String output = newBA.toString();
with:
String output = Arrays.toString(newBA);
This will print out the contents of the array, not the reference address.
Use an IDE that shows you where the "toString()" method is coming from. In most cases it's just from the Object class and won't be very useful. It's generally recommended to override the toString method to provide some clean output, but many classes don't do this.
I'm also a newbie to development. For the current problem, I suggest the Book "Introduction To Cryptography With Java Applets" by David Bishop. It demonstrates what you need and so forth...
Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?
By not starting out with an MD5 hasher! Seriously, work your way up little by little on programs that you can complete without worrying about domain-specific stuff like MD5.
If you're dumping everything into main, you're not programming Java.
In a program of this scale, your main() should do one thing: create an MD5Hasher object and then call some methods on it. You should have a constructor that takes an initial string, a method to "do the work" (update, digest), and a method to print the result.
Get some tutorials and spend time on simple, traditional exercises (a Fibonacci generator, a program to solve some logic puzzle), so you understand the language basics before bothering with the libraries, which is what you are struggling with now. Then you can start doing useful stuff.
I wrongly assumed that the "toString" method on my newBA object would convert it to a readable output, but I was, apparently, wrong. How is a person supposed to know which way to go in Java?
You could replace Java here with any language that you don't know or haven't mastered yet. Even if you have worked 10 years in a specific language, you will still get those "Aha! This is the way it's working!" moments, though not as often as in the beginning.
The point you need to learn here is that toString() does not return the representation you want or expect, but whatever the implementer has chosen. The default implementation of toString() is like this (javadoc):
Returns a string representation of the object. In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read. It is recommended that all subclasses override this method.
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:
getClass().getName() + '@' + Integer.toHexString(hashCode())
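A quick sketch to check that formula against a byte array, which inherits both toString() and hashCode() from Object. The class DefaultToString and the helper describe are made up for this example.

```java
class DefaultToString {
    // Rebuilds what Object.toString() returns when a class does not override it.
    static String describe(Object o) {
        return o.getClass().getName() + '@' + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, 3};
        System.out.println(bytes.getClass().getName());               // [B
        System.out.println(bytes.toString().equals(describe(bytes))); // true
    }
}
```

So the "[B" part is just the JVM's name for the class byte[], and the hex digits after the at-sign are the hash code, not the contents.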
How is a person supposed to know which way to go in Java? I have a background in C so this Java thing seems pretty weird. Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?
Obvious answers are 1- google when you have questions (and it's not considered cheating imo) and 2- read books on the subject matter.
Apart from these two, I would recommend trying to find a mentor for yourself. If you do not have experienced Java developers at work, then try to join a local Java developer user group. You can find more experienced developers there and perhaps pick their brains to get answers to your questions.

Convert ASCII byte[] to String

I am trying to pass a byte[] containing ASCII characters to log4j, to be logged into a file using the obvious representation. When I simply pass in the byte[] it is of course treated as an object and the logs are pretty useless. When I try to convert them to strings using new String(byte[] data), the performance of my application is halved.
How can I efficiently pass them in, without incurring the approximately 30us time penalty of converting them to strings?
Also, why does it take so long to convert them?
Thanks.
Edit
I should add that I am optimising for latency here - and yes, 30us does make a difference! Also, these arrays vary from ~100 all the way up to a few thousand bytes.
ASCII is one of the few encodings that can be converted to/from UTF16 with no arithmetic or table lookups so it's possible to convert manually:
String convert(byte[] data) {
StringBuilder sb = new StringBuilder(data.length);
for (int i = 0; i < data.length; ++ i) {
if (data[i] < 0) throw new IllegalArgumentException();
sb.append((char) data[i]);
}
return sb.toString();
}
But make sure it really is ASCII, or you'll end up with garbage.
What you want to do is delay processing of the byte[] array until log4j decides that it actually wants to log the message. This way you could log it at DEBUG level, for example, while testing and then disable it during production. For example, you could:
final byte[] myArray = ...;
Logger.getLogger(MyClass.class).debug(new Object() {
@Override public String toString() {
return new String(myArray);
}
});
Now you don't pay the speed penalty unless you actually log the data, because the toString method isn't called until log4j decides it'll actually log the message!
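The same idea as a reusable wrapper, sketched without log4j so that it runs on its own. The class LazyAscii, its conversions counter, and the debugEnabled flag are all made up for this example; the flag only stands in for the logger's level check.

```java
import java.nio.charset.StandardCharsets;

class LazyAscii {
    private final byte[] data;
    int conversions = 0; // only here to make the laziness observable

    LazyAscii(byte[] data) {
        this.data = data;
    }

    // The byte[] is only decoded when somebody actually asks for the text.
    @Override
    public String toString() {
        conversions++;
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        LazyAscii msg = new LazyAscii("GET /index".getBytes(StandardCharsets.US_ASCII));
        boolean debugEnabled = false; // stand-in for the logger's level check
        if (debugEnabled) {
            System.out.println(msg);
        }
        System.out.println(msg.conversions); // 0
        System.out.println(msg);             // GET /index
        System.out.println(msg.conversions); // 1
    }
}
```

One caveat: the wrapper keeps a reference to the array rather than a copy, so this assumes the array is not modified before the message is written.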
Now I'm not sure what you mean by "the obvious representation" so I've assumed that you mean convert to a String by reinterpreting the bytes as the default character encoding. Now if you are dealing with binary data, this is obviously worthless. In that case I'd suggest using Arrays.toString(byte[]) to create a formatted string along the lines of
[54, 23, 65, ...]
If your data is in fact ASCII (i.e. 7-bit data), then you should be using new String(data, "US-ASCII") instead of depending on the platform default encoding. This may be faster than trying to interpret it as your platform default encoding (which could be UTF-8, which requires more introspection).
You could also speed this up by avoiding the Charset-Lookup hit each time, by caching the Charset instance and calling new String(data, charset) instead.
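A minimal sketch of the cached-Charset variant, assuming Java 7+ where StandardCharsets is available. The class AsciiDecoder and the method decode are made up for this example. Whether it is actually faster for your arrays is something to measure rather than assume.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiDecoder {
    // Resolved once; no charset lookup by name on every call.
    private static final Charset ASCII = StandardCharsets.US_ASCII;

    static String decode(byte[] data) {
        return new String(data, ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {72, 105};
        System.out.println(decode(data)); // Hi
    }
}
```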
Having said that: it's been a very, very long time since I've seen real ASCII data in a production environment.
Halved performance? How large is this byte array? If it's for example 1MB, then there are certainly more factors to take into account than just "converting" from bytes to chars (which is supposed to be fast enough though). Writing 1MB of data instead of "just" 100bytes (which the byte[].toString() may generate) to a log file is obviously going to take some time. The disk file system is not as fast as RAM memory.
You'll need to change the string representation of the byte array. Maybe with some more sensible information, e.g. the name associated with it (filename?), its length and so on. After all, what does that byte array actually represent?
Edit: I don't remember seeing the "approximately 30us" phrase in your question, maybe you edited it in within 5 minutes after asking, but this is actually micro-optimization and it should certainly not cause "halved performance" in general. Unless you write them a million times per second (and even then, why would you want to do that? Aren't you overusing logging?).
Take a look here: Faster new String(bytes, cs/csn) and String.getBytes(cs/csn)
