How to encode a ByteArray to a base36 String - java

To encode a ByteArray to Base64 I could simply use Base64 from java.util. But now I need to change my code to create base36
instead. Unfortunately java.util doesn't have this functionality. What I need is a function/method that takes a ByteArray and outputs a String containing a base36 representation of it. No other changes like cutting off leading zeroes.
So this other question looks similar but we're are 2 problems. First the question got edited so that I don't understand the answer. Second, the answer uses BigInteger and I'm afraid that converting the ByteArray to a BigInteger could lead to information loss (like leading zeroes).
Similar question on stackoverflow.

Related

Hashing a file with SHA256

I need to add a digital signature to a picture using the RSA method. To do this, I first need to hash the file with SHA256. But how can this be done? As I understand it, I should get an array of byte hash[] with hashed bytes.
For example in c# example MD5 there is such a way:
byte[] hash = MD5.Create.ComputeHash(bytesOfFile);
I tried like this, but how can I get an array of hashes and how can I change the size of the final one?
MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
byte[] array = Files.readAllBytes(Path.of("C:\\Users\\User\\IdeaProjects\\CryptoLab1\\src\\crypto\\dino.png"));
sha256.digest(array);
System.out.println(Arrays.toString(sha256.digest()));
You're almost there. sha256.digest(array) itself returns the result, your next digest call wouldn't work anymore. The API of MessageDigest is: Send as many bytes as you want in chunks via the update method, and then do a final call with a digest method. Purely as a convenience, digest(byteArr) is identical to update(byteArr); digest();.
So, replace your last 2 lines with just System.out.println(Arrays.toString(sha256.digest(array));.
A few more notes:
This code eats a ton of memory if the file is large, and there is absolutely no need for it; you can just repeatedly read a smallish byte array, send it to the MessageDigest object via an update method. Imagine this file is 4GB large; such a setup can easily do it in a tiny memory footprint, whereas your code would require 4GB worth of memory.
A hash's size is what it is, you can't "resize" it, that doesn't make sense. Because bytes are annoying to show, various solutions exist. There is no one unified standard. Whatever tool gives you the supposed hash will have picked a way to turn a bunch of bytes (the hash) into string form. The common candidates are hex-nibbles, which looks like a sequence of symbols where each symbol is 0-9 or A-F, and all the letters are either all uppercased, or all lowercased. Or, base64, which is a mix of letters and digits.
To turn a byte array to hex-nibble form in java, follow this guide. To turn a byte array into base64, it's simply Base64.getEncoder().encodeToString(byteArr).

How to inflate in Python some data that was deflated by Peoplesoft (Java)?

DISCLAIMER: Peoplesoft knowledge is not mandatory in order to help me with this one!
How could i extract the data from that Peoplesoft table, from the PUBDATALONG column?
The description of the table is here:
http://www.go-faster.co.uk/peopletools/psiblogdata.htm
Currently i am using a program written in Java and below is a piece of the code:
Inflater inflater = new Inflater();
byte[] result = new byte[rs.getInt("UNCOMPDATALEN")];
inflater.setInput(rs.getBytes("PUBDATALONG"));
int length = inflater.inflate(result);
System.out.println(new String(result, 0, length, "UTF-8"));
System.out.println();
System.out.println("-----");
System.out.println();
How could I rewrite this using Python?
It is a question that appeared in other forms on Stackoverflow but had no real answer.
I have basic understanding of what the code does in Java but i don't know any library in Python i could work with to achieve the same thing.
Some recommended to try zlib, as it is compatible with the algorithm used by Java Inflater class, but i did not succeed in doing that.
Considering the below facts from PeopleSoft manual:
When the message is received by the PeopleSoft database, the XML data
is converted to UTF-8 to prevent any UCS2 byte order issues. It is
also compressed using the deflate algorithm prior to storage in the
database.
I tried something like this:
import zlib
import base64
UNCOMPDATALEN = 362 #this value is taken from the DB and is the dimension of the data after decompression.
PUBDATALONG = '789CB3B1AFC8CD51284B2D2ACECCCFB35532D43350B2B7E3E5B2F130F40C8977770D8977F4710D0A890F0E710C090D8EF70F0D09080DB183C8BAF938BAC707FBBBFB783ADA19DAE86388D904B90687FAC0F4DAD940CD70F67771B533B0D147E6DAE8A3A9D5C76B3F00E2F4355C=='
print zlib.decompress(base64.b64decode(PUBDATALONG), 0, 362)
and I get this:
zlib.error: Error -3 while decompressing data: incorrect header check
for sure I do something wrong but I am not smart enough to figure it out by myself.
That string is not Base-64 encoded. It is simply hexadecimal. (I have no idea why it ends in ==, which makes it look a little like a Base-64 string.) You should be able to see by inspection that there are no lower case letters, or for that matter upper case letters after F as there would be in a typical Base-64 encoded string of compressed, i.e. random-appearing data.
Remove the equal signs at the end and use .decode("hex") in Python 2, or bytes.fromhex() in Python 3.

String to Byte[] and Byte to String

Given the following example:
String f="FF00000000000000";
byte[] bytes = DatatypeConverter.parseHexBinary(f);
String f2= new String (bytes);
I want the output to be FF00000000000000 but it's not working with this method.
You're currently trying to interpret the bytes as if they were text encoded using the platform default encoding (UTF-8, ISO-8859-1 or whatever). That's not what you actually want to do at all - you want to convert it back to hex.
For that, just look at the converter you're using for the parsing step, and look for similar methods which work in the opposite direction. In this case, you want printHexBinary:
String f2 = DatatypeConverter.printHexBinary(bytes);
The approach of "look for reverse operations near the original operation" is a useful one in general... but be aware that sometimes you need to look at a parallel type, e.g. DataInputStream / DataOutputStream. When you find yourself using completely different types for inverse operations, that's usually a bit of a warning sign. (It's not always wrong, it's just worth investigating other options.)

Avoiding line breaks in encrypted and encoded URL string

I am trying to implement a simple string encoder to obfuscates some parts of a URL string (to prevent them from getting mucked with by a user). I'm using code nearly identical to the sample in the JCA guide, except:
using DES (assuming it's a little faster than AES, and requires a smaller key) and
Base64 en/decoding the string to make sure it stays safe for a URL.
For reasons I can't understand, the output string ends up with linebreaks, which I presume won't work. I can't figure out what's causing this. Suggestions on something similar that's easier or pointers to some other resources to read? I'm finding all the cryptography references a bit over my head (and overkill), but a simple ROT13 implementation won't work since I want to deal with a larger character set (and don't want to waste time implementing something likely to have issues with obscure characters i didn't think of).
Sample input (no line break):
http://maps.google.com/maps?q=kansas&hl=en&sll=42.358431,-71.059773&sspn=0.415552,0.718918&hnear=Kansas&t=m&z=7
Sample Output (line breaks as shown below):
GstikIiULcJSGEU2NWNTpyucSWUFENptYk4m5lD8RJl8l1CuspiuXiE9a07fUEAGM/tC7h0Vzus+
jAH6cT4Wtz2RUlBdGf8WtQxVDKZVOzKwi84eQh2kZT9T3KomlnPOu2owJ/2RAEvG+QuGem5UGw==
my encode snippet:
final Key key = new SecretKeySpec(seed.getBytes(), "DES");
final Cipher c = Cipher.getInstance("DES");
c.init(Cipher.ENCRYPT_MODE, key);
final byte[] encVal = c.doFinal(s.getBytes());
return new BASE64Encoder().encode(encVal);
Simply perform base64Str = base64Str.replaceAll("(?:\\r\\n|\\n\\r|\\n|\\r)", "")
on the encoded string.
It works fine when you try do decode it back to bytes. I did test it several times with random generated byte arrays. Obviously decoding process just ignores the newlines either they are present or not.
I tested this "confirmed working" by using com.sun.org.apache.xml.internal.security.utils.Base64
Other encoders not tested.
Base64 encoders usually impose some maximum line (chunk) length, and adds newlines when necessary. You can normally configure that, but that depends on the particular coder implementation.
For example, the class from Apache Commons has a linelength attribute, setting it to zero (or negative) disables the line separation.
BTW: I agree with the other answer in that DES is hardly advisable today. Further, are you just "obfuscating" or really encrypting? Who has the key? The whole thing does not smell very well to me.
import android.util.Base64;
...
return new BASE64.encodeToString(encVal, Base64.NO_WRAP);
Though it's unrelated to your actual question, DES is generally slower than AES (at least in software), so unless you really need to keep the key small, AES is almost certainly a better choice.
Second, it's perfectly normal that encryption (DES or AES) would/will produce new-line characters in its output. Producing output without them will be entirely up to the base-64 encoder, so that's where you clearly need to look.
It's not particularly surprising to see a base-64 insert new-line characters at regular intervals in its output though. The most common use for base-64 encoding is putting raw data into something like the body of an email, where a really long line would cause a problem. To prevent that, the data is broken up into pieces, typically no more than 80 columns (and usually a bit less). In this case, the new-lines should be ignored, however, so you should be able to just delete them, if memory serves.

How does a person go about learning Java? (convert byte array to hex string)

I know this sounds like a broad question but I can narrow it down with an example. I am VERY new at Java. For one of my "learning" projects, I wanted to create an in-house MD5 file hasher for us to use. I started off very simple by attempting to hash a string and then moving on to a file later. I created a file called MD5Hasher.java and wrote the following:
import java.security.*;
import java.io.*;
public class MD5Hasher{
public static void main(String[] args){
String myString = "Hello, World!";
byte[] myBA = myString.getBytes();
MessageDigest myMD;
try{
myMD = MessageDigest.getInstance("MD5");
myMD.update(myBA);
byte[] newBA = myMD.digest();
String output = newBA.toString();
System.out.println("The Answer Is: " + output);
} catch(NoSuchAlgorithmException nsae){
// print error here
}
}
}
I visited java.sun.com to view the javadocs for java.security to find out how to use MessageDigest class. After reading I knew that I had to use a "getInstance" method to get a usable MessageDigest object I could use. The Javadoc went on to say "The data is processed through it using the update methods." So I looked at the update methods and determined that I needed to use the one where I fed it a byte array of my string, so I added that part. The Javadoc went on to say "Once all the data to be updated has been updated, one of the digest methods should be called to complete the hash computation." I, again, looked at the methods and saw that digest returned a byte array, so I added that part. Then I used the "toString" method on the new byte array to get a string I could print. However, when I compiled and ran the code all that printed out was this:
The Answer Is: [B#4cb162d5
I have done some looking around here on StackOverflow and found some information here:
How can I generate an MD5 hash?
that gave the following example:
String plaintext = 'your text here';
MessageDigest m = MessageDigest.getInstance("MD5");
m.reset();
m.update(plaintext.getBytes());
byte[] digest = m.digest();
BigInteger bigInt = new BigInteger(1,digest);
String hashtext = bigInt.toString(16);
// Now we need to zero pad it if you actually want the full 32 chars.
while(hashtext.length() < 32 ){
hashtext = "0"+hashtext;
}
It seems the only part I MAY be missing is the "BigInteger" part, but I'm not sure.
So, after all of this, I guess what I am asking is, how do you know to use the "BigInteger" part? I wrongly assumed that the "toString" method on my newBA object would convert it to a readable output, but I was, apparently, wrong. How is a person supposed to know which way to go in Java? I have a background in C so this Java thing seems pretty weird. Any advice on how I can get better without having to "cheat" by Googling how to do something all the time?
Thank you all for taking the time to read. :-)
The key in this particular case is that you need to realize that bytes are not "human readable", but characters are. So you need to convert bytes to characters in a certain format. For arbitrary bytes like hashes, usually hexadecimal is been used as "human readable" format. Every byte is then to be converted to a 2-character hexadecimal string which you in turn concatenate together.
This is unrelated to the language you use. You just have to understand/realize how it works "under the hoods" in a language agnostic way. You have to understand what you have (a byte array) and what you want (a hexstring). The programming language is just a tool to achieve the desired result. You just google the "functional requirement" along with the programming language you'd like to use to achieve the requirement. E.g. "convert byte array to hex string in java".
That said, the code example you found is wrong. You should actually determine each byte inside a loop and test if it is less than 0x10 and then pad it with zero instead of only padding the zero depending on the length of the resulting string (which may not necessarily be caused by the first byte being less than 0x10!).
StringBuilder hex = new StringBuilder(bytes.length * 2);
for (byte b : bytes) {
if ((b & 0xff) < 0x10) hex.append("0");
hex.append(Integer.toHexString(b & 0xff));
}
String hexString = hex.toString();
Update as per the comments on the answer of #extraneon, using new BigInteger(byte[]) is also the wrong solution. This doesn't unsign the bytes. Bytes (as all primitive numbers) in Java are signed. They have a negative range. The byte in Java ranges from -128 to 127 while you want to have a range of 0 to 255 to get a proper hexstring. You basically just need to remove the sign to make them unsigned. The & 0xff in the above example does exactly that.
The hexstring as obtained from new BigInteger(bytes).toString(16) is NOT compatible with the result of all other hexstring producing MD5 generators the world is aware of. They will differ whenever you've a negative byte in the MD5 digest.
You have actually successfully digested the message. You just don't know how to present the found digest value properly. What you have is a byte array. That's a bit difficult to read, and a toString of a byte array yields [B#somewhere which is not useful at all.
The BigInteger comes into it as a tool to format the byte array to a single number.
What you do is:
construct a BigInteger with the proper value (in this case that value happens to be encoded in the form of a byte array - your digest
Instruct the BigInteger object to return a String representation (e.g. plain, readable text) of that number, base 16 (e.g. hex)
And the while loop prefixes that value with 0-characters to get a width of 32. I'd probably use String.format for that, but whatever floats your boat :)
MessageDigests compute a byte array of something, the string that you usually see (such as 1f3870be274f6c49b3e31a0c6728957f) is actually just a conversion of the byte array to a hexadecimal string.
When you call MessageDigest.toString(), it calls MessageDigest.digest().toString(), and in Java, the toString method for a byte[] (returned by MessageDigest.digest()) returns a sort of reference to the bytes, not the actual bytes.
In the code you posted, the byte array is changed to an integer (in this case a BigInteger because it would be extremely large), and then converted to hexadecimal to be printed to a String.
The byte array computed by the digest represents a number (a 128-bit number according to http://en.wikipedia.org/wiki/MD5), and that number can be converted to any other base, so the result of the MD5 could be represented as a base-10 number, a base-2 number (as in a byte array), or, most commonly, a base-16 number.
It is OK to google for answers as long as you (eventually) understand what you copy-pasted into your app :-)
In general, I recommend starting with a good Java introductory book, or web tutorial. See these threads for more tips:
https://stackoverflow.com/questions/77839/what-are-the-best-resources-for-learning-java-books-websites-etc
Learning Java
https://stackoverflow.com/questions/78293/good-book-to-learn-to-program-well-in-java-engineering-or-architecture-wise-not
Though I'm afraid that I have no experience whatsoever using Java to play with MD5 hashes, I can recommend Sun's Java Tutorials as a fantastic resource for learning Java. They go through most of the language, and helped me out a ton when I was learing Java.
Also look around for other posts asking the same thing and see what suggestions popped up there.
The reason BigInteger is used is because the byte array is very long, too big too fit into an int or long. However, if you do want to see everything in the byte array, there's an alternate approach. You could just replace the line:
String output = newBA.toString();
with:
String output = Arrays.toString(newBA);
This will print out the contents of the array, not the reference address.
Use an IDE that shows you where the "toString()" method is coming from. In most cases it's just from the Object class and won't be very useful. It's generally recommended to overwrite the toString-method to provide some clean output, but many classes don't do this.
I'm also a newbie to development. For the current problem, I suggest the Book "Introduction To Cryptography With Java Applets" by David Bishop. It demonstrates what you need and so forth...
Any advice on how I can get better
without having to "cheat" by Googling
how to do something all the time?
By by not starting out with an MD5 hasher! Seriously, work your way up little by little on programs that you can complete without worrying about domain-specific stuff like MD5.
If you're dumping everything into main, you're not programming Java.
In a program of this scale, your main() should do one thing: create an MD5Hasher object and then call some methods on it. You should have a constructor that takes an initial string, a method to "do the work" (update, digest), and a method to print the result.
Get some tutorials and spend time on simple, traditional exercises (a Fibonacci generator, a program to solve some logic puzzle), so you understand the language basics before bothering with the libraries, which is what you are struggling with now. Then you can start doing useful stuff.
I wrongly assumed that the "toString" method on my newBA object would convert it to a readable output, but I was, apparently, wrong. How is a person supposed to know which way to go in Java?
You could replace here Java with the language of your choice that you don't know/haven't mastered yet. Even if you worked 10 years in a specific language, you will still get those "Aha! This is the way it's working!"-effects, though not that often as in the beginning.
The point you need to learn here is that toString() is not returning the representation you want/expect, but any the implementer has chosen. The default implementation of toString() is like this (javadoc):
Returns a string representation of the object. In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read. It is recommended that all subclasses override this method.
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `#', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:
getClass().getName() + '#' + Integer.toHexString(hashCode())
How is a person supposed to know which
way to go in Java? I have a background
in C so this Java thing seems pretty
weird. Any advice on how I can get
better without having to "cheat" by
Googling how to do something all the
time?
Obvious answers are 1- google when you have questions (and it's not considered cheating imo) and 2- read books on the subject matter.
Apart from these two, I would recommend trying to find a mentor for yourself. If you do not have experienced Java developers at work, then try to join a local Java developer user group. You can find more experienced developers there and perhaps pick their brains to get answers to your questions.

Categories