MD5 Hash of ISO-8859-1 string in Java

I'm implementing an interface for a digital payment service called Suomen Verkkomaksut. The information about the payment is sent to them via an HTML form. To ensure that no one tampers with the information during the transfer, an MD5 hash is calculated at both ends with a secret key that is not sent with the form.
My problem is that for some reason they seem to decide that the incoming data is encoded with ISO-8859-1 and not UTF-8. The hash that I send to them is calculated from UTF-8 strings, so it differs from the hash that they calculate.
I tried this with the following code:
String prehash = "6pKF4jkv97zmqBJ3ZL8gUw5DfT2NMQ|13466|123456||Testitilaus|EUR|http://www.esimerkki.fi/success|http://www.esimerkki.fi/cancel|http://www.esimerkki.fi/notify|5.1|fi_FI|0412345678|0412345678|esimerkki#esimerkki.fi|Matti|Meikäläinen||Testikatu 1|40500|Jyväskylä|FI|1|2|Tuote #101|101|1|10.00|22.00|0|1|Tuote #202|202|2|8.50|22.00|0|1";
String prehashIso = new String(prehash.getBytes("ISO-8859-1"), "ISO-8859-1");
String hash = Crypt.md5sum(prehash).toUpperCase();
String hashIso = Crypt.md5sum(prehashIso).toUpperCase();
Unfortunately, both hashes are identical, with the value C83CF67455AF10913D54252737F30E21. The correct value for this example case is 975816A41B9EB79B18B3B4526569640E according to Suomen Verkkomaksut's documentation.
Is there a way to calculate MD5 hash in Java with ISO-8859-1 strings?
UPDATE: While waiting for an answer from Suomen Verkkomaksut, I found an alternative way to make the hash. Michael Borgwardt corrected my understanding of Strings and encodings, and I looked for a way to make the hash from a byte[].
Apache Commons is an excellent source of libraries, and I found their DigestUtils class, which has an md5Hex method that takes a byte[] input and returns a 32-character hex string.
For some reason this still doesn't work. Both of these return the same value (presumably because the platform's default charset, e.g. Cp1252, encodes every character in this string the same way ISO-8859-1 does):
DigestUtils.md5Hex(prehash.getBytes());
DigestUtils.md5Hex(prehash.getBytes("ISO-8859-1"));

You seem to misunderstand how string encoding works, and your Crypt class's API is suspect.
Strings don't really "have an encoding" - an encoding is what you use to convert between Strings and bytes.
Java Strings are internally stored as UTF-16, but that does not really matter, as MD5 works on bytes, not Strings. Your Crypt.md5sum() method has to convert the Strings it's passed to bytes first - what encoding does it use to do that? That's probably the source of your problem.
Your example code is pretty nonsensical, as the only effect of this line:
String prehashIso = new String(prehash.getBytes("ISO-8859-1"), "ISO-8859-1");
is to replace characters that cannot be represented in ISO-8859-1 with question marks.
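To make this concrete, here is a minimal sketch (the class name and sample string are illustrative, not from the question) showing that the digest depends entirely on which charset is used to turn the String into bytes:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CharsetDigestDemo {
    // Hex-encodes the MD5 digest of the given bytes.
    static String md5Hex(byte[] input) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(input);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String s = "Meikäläinen"; // contains non-ASCII characters
        // Same String, different byte sequences, hence different hashes:
        System.out.println(md5Hex(s.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(md5Hex(s.getBytes(StandardCharsets.UTF_8)));
    }
}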

Java has a standard java.security.MessageDigest class for calculating different hashes.
Here is the sample code:
import java.security.MessageDigest;

// Exception handling not shown
String prehash = ...
final byte[] prehashBytes = prehash.getBytes("ISO-8859-1");
System.out.println(prehash.length());
System.out.println(prehashBytes.length);
final MessageDigest digester = MessageDigest.getInstance("MD5");
digester.update(prehashBytes);
final byte[] digest = digester.digest();
final StringBuffer hexString = new StringBuffer();
for (final byte b : digest) {
    final int intByte = 0xFF & b;
    if (intByte < 0x10) { // values below 0x10 need a leading zero in two-digit hex
        hexString.append("0");
    }
    hexString.append(Integer.toHexString(intByte));
}
System.out.println(hexString.toString().toUpperCase());
Unfortunately for you, it produces the same "C83CF67455AF10913D54252737F30E21" hash. So I guess your Crypt class is exonerated. I specifically added the prehash and prehashBytes length printouts to verify that 'ISO-8859-1' is indeed used. In this case both are 328.
When I did prehash.getBytes("UTF-8") it produced "9CC2E0D1D41E67BE9C2AB4AABDB6FD3" (and the length of the byte array became 332). Again, not the result you are looking for.
So I guess Suomen Verkkomaksut does some massaging of the prehash string that they did not document, or that you have overlooked.

Not sure if you solved your problem, but I had a similar problem with ISO-8859-1 encoded strings containing the Nordic ä and ö characters, calculating a SHA-256 hash to compare with values in documentation. The following snippet worked for me:
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
// other imports omitted

@Test
public void test() throws ProcessingException {
    String test = "iamastringwithäöchars";
    System.out.println(this.digest(test));
}

public String digest(String data) throws ProcessingException {
    MessageDigest hash = null;
    try {
        hash = MessageDigest.getInstance("SHA-256");
    } catch (Throwable throwable) {
        throw new ProcessingException(throwable);
    }
    byte[] digested = null;
    try {
        digested = hash.digest(data.getBytes("ISO-8859-1"));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    String ret = BinaryUtils.BinToHexString(digested);
    return ret;
}
To transform bytes to a hex string there are many options, including the Apache Commons Codec Hex class mentioned in this thread.
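For example, with Commons Codec on the classpath, the conversion is a one-liner (digested being the byte[] returned by MessageDigest.digest() above):
import org.apache.commons.codec.binary.Hex;

String hex = Hex.encodeHexString(digested).toUpperCase();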

If you send UTF-8 encoded data that they treat as ISO-8859-1, that could be the source of your problem. I suggest you either send the data in ISO-8859-1 or try to communicate to Suomen Verkkomaksut that you're sending UTF-8. In an HTTP-based protocol you do this by adding charset=utf-8 to the Content-Type header, as sketched below.
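For instance, if the page containing the payment form is written out by a servlet, declaring the page charset also controls the charset browsers use when submitting the form (a sketch; the method name and markup are illustrative, not part of any API in this thread):
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

void writeFormPage(HttpServletResponse response) throws IOException {
    // Browsers submit a form in the charset of the page that contains it.
    response.setContentType("text/html; charset=ISO-8859-1");
    response.getWriter().println("<form action=\"...\" method=\"post\">...</form>");
}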
A way to rule out some issues would be to try a prehash String that only contains characters that are encoded the same in UTF-8 and ISO-8859-1. From what I can see, you can achieve this by removing all "ä" characters from the string you've used.

Related

Correctly compare code_verifier with code_challenge in Java

I'm using passport-oauth2 (passportjs.org and https://github.com/jaredhanson/passport-oauth2/blob/master/lib/strategy.js) for OAuth2+PKCE integration in a nodejs application.
The backend it's authenticating against is written in Java.
The problem is that I can't seem to decode->hash the code_verifier to correctly match the code_challenge that comes from passport-oauth2.
I know that the Base64 encoding that comes from passport has been generated to be URL safe (no padding, no wrapping, replacements for + or /), so I'm using a Url Decoder:
Base64.getUrlDecoder().decode(...)
Then I'm using commons DigestUtils to generate a SHA256 of the decoded verifier and comparing it with the challenge. So the whole thing looks something like this:
java.util.Base64.Decoder decoder = java.util.Base64.getUrlDecoder();
String codeChallenge = // get the code challenge from my cache
byte[] decodedCodeChallenge = decoder.decode(codeChallenge);
byte[] decodedCodeVerifier = decoder.decode(codeVerifier);
if (!Arrays.equals(sha256(decodedCodeVerifier), decodedCodeChallenge)) {
    return Response.status(400).entity(ERROR_INVALID_CHALLENGE_VERIFIER).build();
}
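(sha256 here is the question's own helper, not shown; a minimal version using only the JDK could be:)
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Assumed helper: returns the raw SHA-256 digest of the input bytes.
private static byte[] sha256(byte[] input) throws NoSuchAlgorithmException {
    return MessageDigest.getInstance("SHA-256").digest(input);
}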
Example:
This code verifier: 5CFCAiZC0g0OA-jmBmmjTBZiyPCQsnq_2q5k9fD-aAY
should match this code challenge: Fw7s3XHRVb2m1nT7s646UrYiYLMJ54as0ZIU_injyqw once both have been Base64-url-decoded and the verifier has been SHA256 hashed, but it doesn't.
What am I doing wrong?
Just 5 minutes later I figured it out.
In passport-oauth2, the code verifier is Base64-url-encoded(random bytes):
verifier = base64url(crypto.pseudoRandomBytes(32))
See: https://github.com/jaredhanson/passport-oauth2/blob/master/lib/strategy.js#L236
The challenge is then Base64-url-encoded(sha256(verifier)), which expands to Base64-url-encoded(sha256(Base64-url-encoded(random bytes))):
challenge = base64url(crypto.createHash('sha256').update(verifier).digest());
See: https://github.com/jaredhanson/passport-oauth2/blob/master/lib/strategy.js#L242
So to do the verification, I don't need to decode anything. It was SHA-256'd in its encoded state.
This worked in the end:
java.util.Base64.Encoder encoder = java.util.Base64.getUrlEncoder();
String codeChallenge = // get code challenge from my cache;
String encodedVerifier = new String(encoder.encode(sha256(codeVerifier))).split("=")[0]; // Remember to remove padding
if (!encodedVerifier.equals(codeChallenge)) {
    return Response.status(400).entity(ERROR_INVALID_CHALLENGE_VERIFIER).build();
}
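A slightly tidier variant (a sketch; sha256 is assumed to return the raw digest bytes of the verifier string) avoids the manual padding strip by asking for a padding-free encoder, available since Java 8:
String encodedVerifier = java.util.Base64.getUrlEncoder().withoutPadding()
        .encodeToString(sha256(codeVerifier));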

What encoding does Java use to create a String from given Unicode data?

I am quite perplexed about why I should not encode Unicode text with UTF-8 for comparison when the other text (to compare against) has been encoded with UTF-8.
I wanted to compare text (アクセス拒否, which means "Access denied") stored in an external file encoded as UTF-8 with a constant string stored in a .java file as
public static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426"; // means Access denied
The .java file was encoded as Cp1252.
I read the file as an input stream using the code below. Note that I am using UTF-8 for decoding.
InputStream in = new FileInputStream("F:\\sample.txt");
int b1;
byte[] bytes = new byte[4096];
int i = 0;
while (true) {
    b1 = in.read();
    if (b1 == -1)
        break;
    bytes[i++] = (byte) b1;
}
String japTextFromFile = new String(bytes, 0, i, Charset.forName("UTF-8"));
Now when I compare them:
System.out.println(ACCESS_DENIED_IN_JAPANESE.equals(japTextFromFile)); // result is `true`, and works fine
but when I encode ACCESS_DENIED_IN_JAPANESE with UTF-8 and try to compare it with japTextFromFile, the result is false. The code is:
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(), Charset.forName("UTF-8"));
System.out.println(encodedAccessDenied.equals(japTextFromFile)); // result is `false`
So my doubt is: why does the above comparison fail when both strings are the same and have been encoded with UTF-8? The result should be true.
However, in the first case, when comparing differently encoded strings, one UTF-16 (Java's internal representation of String) and the other UTF-8, the result is true, which I would expect to be false since the encodings differ, no matter that the text being read is the same.
Where am I wrong in my understanding? Any clarification is greatly appreciated.
ACCESS_DENIED_IN_JAPANESE.getBytes() does not use UTF-8. It uses your platform's default charset. But then you use UTF-8 to turn those bytes back into a String. This gets you a different String from the one you started with.
Try this:
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
System.out.println(encodedAccessDenied.equals(japTextFromFile)); // result is `true`
The best way I know is to put all static text into a text file encoded with UTF-8, and then read those resources with an InputStreamReader constructed with the "UTF-8" charset (plain FileReader only gained a charset parameter in Java 11).
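A minimal sketch of that approach (the file path is illustrative):
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public static void main(String[] args) throws IOException {
    // Read a UTF-8 encoded text file line by line.
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream("F:\\sample.txt"), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}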

How to unescape html special characters in Java?

I have some text strings that I need to process, and inside the strings there are HTML special characters. For example:
10😭😭😂😂😂😂😢😂10😭😭😂😂😂😂😢😂😂
I would like to convert those characters to UTF-8.
I used org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4 but didn't have any luck. Is there an easy way to deal with this problem?
The Apache Commons Text library has a StringEscapeUtils class with an unescapeHtml4() utility method.
String utf8Str = StringEscapeUtils.unescapeHtml4(htmlStr);
You may also need unescapeXml()
@Bohemian's code is correct; it works for me. Your unescaped string is 10😭😭😂😂😂😂😢😂10😭😭😂😂😂😂😢😂😂.
Now I'm adding another answer instead of commenting on Bohemian's answer, because there are two things that still need to be mentioned:
I copy-pasted your string into HTML code and the browser can't render your characters properly, because your string is incorrectly encoded, i.e. it encodes the high surrogate and the low surrogate of each two-unit character separately, instead of encoding the whole code point (it seems the original string came from a UTF-16 encoded string, maybe a Java String?).
You want the string to be re-encoded to UTF-8.
Once you have your String unescaped by StringEscapeUtils.unescapeHtml(htmlStr) (which unescapes your string successfully despite it being encoded incorrectly), it doesn't make much sense to talk about "string encodings", as Java strings are "unaware" of encodings (they use UTF-16 internally, though).
If you need a group of bytes containing a UTF-8 encoded "string", you need to get the "raw" bytes from a String encoded as UTF-8:
String javaStr = StringEscapeUtils.unescapeHtml(htmlStr);
byte[] rawUft8String = javaStr.getBytes("UTF-8");
And do whatever you need with that byte array.
Now if what you need is to write a UTF-8 encoded string to a File, instead of that byte array you need to specify the encoding when you create the proper java.io.Writer.
Try this code to unescape your string (change the file path first) and then open the resulting file in any editor that supports UTF-8:
java.io.Writer approach (better):
public static void main(String[] args) throws IOException {
    String str = "10😭😭😂😂😂😂😢😂10😭😭😂😂😂😂😢😂😂";
    String javaString = StringEscapeUtils.unescapeHtml(str);
    try (Writer output = new OutputStreamWriter(
            new FileOutputStream("/path/to/testing.txt"), "UTF-8")) {
        output.write(javaString);
    }
}
java.io.OutputStream approach (if you already have a "raw string"):
public static void main(String[] args) throws IOException {
    String str = "10😭😭😂😂😂😂😢😂10😭😭😂😂😂😂😢😂😂";
    String javaString = StringEscapeUtils.unescapeHtml(str);
    try (OutputStream output = new FileOutputStream("/path/to/testing.txt")) {
        for (byte b : javaString.getBytes(Charset.forName("UTF-8"))) {
            output.write(b);
        }
    }
}

Why do those calls to Base64 classes return different results?

My code:
private static String convertToBase64(String string)
{
    final byte[] encodeBase64 =
            org.apache.commons.codec.binary.Base64.encodeBase64(string.getBytes());
    System.out.println(Hex.encodeHexString(encodeBase64));
    final byte[] data = string.getBytes();
    final String encoded =
            javax.xml.bind.DatatypeConverter.printBase64Binary(data);
    System.out.println(encoded);
    return encoded;
}
Now I'm calling it as convertToBase64("stackoverflow"); and get the following result:
6333526859327476646d56795a6d787664773d3d
c3RhY2tvdmVyZmxvdw==
Why do I get different results?
I think Hex.encodeHexString will encode your Base64 bytes to a hex string, while the second one is printed as a normal String.
From the API doc of Base64.encodeBase64():
byte[] containing Base64 characters in their UTF-8 representation.
So instead
System.out.println(Hex.encodeHexString(encodeBase64));
you should write
System.out.println(new String(encodeBase64, "UTF-8"));
BTW: You should never use the String.getBytes() version without an explicit encoding, because the result depends on the default platform encoding (for Windows this is usually "Cp1252" and for Linux "UTF-8").
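Putting both points together, here is a corrected sketch of the method using the java.util.Base64 API (available since Java 8, which replaces both of the APIs above):
import java.nio.charset.StandardCharsets;
import java.util.Base64;

private static String convertToBase64(String string) {
    // Explicit charset, so the result does not depend on the platform default.
    byte[] data = string.getBytes(StandardCharsets.UTF_8);
    return Base64.getEncoder().encodeToString(data);
}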

Parsing Facebook signed_request using Java returns malformed JSON

I'm trying to parse the Facebook signed_request inside a Java Servlet's doPost, and I decode the signed request using commons-codec-1.3's Base64.
Here is the code I use to do it inside the servlet's doPost:
String signedRequest = (String) req.getParameter("signed_request");
String payload = signedRequest.split("[.]", 2)[1];
payload = payload.replace("-", "+").replace("_", "/").trim();
String jsonString = new String(Base64.decodeBase64(payload.getBytes()));
When I System.out the jsonString, it's malformed. Sometimes it is missing the ending } of the JSON,
and sometimes it is missing the "} at the end of the string.
How can I get the proper JSON response from Facebook?
Facebook is using Base64 for URLs, and you are probably trying to decode the text using the standard Base64 algorithm.
Among other things, the URL variant doesn't require padding with "=".
You could add the required characters in code (padding, etc.),
or you can use commons-codec 1.5 (new Base64(true)), where they added support for this encoding, as sketched below.
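A minimal sketch of the commons-codec 1.5 approach (payload as in the question's code; the boolean constructor argument selects the URL-safe alphabet, and decoding tolerates missing "=" padding):
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base64;

Base64 urlSafeBase64 = new Base64(true);
byte[] decoded = urlSafeBase64.decode(payload);
String jsonString = new String(decoded, StandardCharsets.UTF_8);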
Facebook is sending you "unpadded" Base64 values (the URL "standard"), and this is problematic for Java decoders that don't expect it. You can tell you have the problem when the Base64 encoded data that you want to decode has a length that is not a multiple of 4.
I used this function to fix the values:
public static String padBase64(String b64) {
    String padding = "";
    // If you are a Java developer, *this* is the critical bit: FB expects
    // the Base64 decode to do this padding for you (as the PHP one apparently does...)
    switch (b64.length() % 4) {
    case 0:
        break;
    case 1:
        padding = "===";
        break;
    case 2:
        padding = "==";
        break;
    default:
        padding = "=";
    }
    return b64 + padding;
}
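Usage is then just a matter of padding before decoding (a sketch, with payload as in the question's code and the commons-codec decoder the question already uses):
String jsonString = new String(
        org.apache.commons.codec.binary.Base64.decodeBase64(padBase64(payload).getBytes()));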
I have never done this in Java, so I don't have a full answer, but the fact that you are sometimes losing one and sometimes two characters from the end of the string suggests it may be an issue with Base64 padding. You might want to output the value of payload and see whether, when it ends with '=', jsonString is missing '}', and when it ends with '==', jsonString is missing '"}'. If that seems to be the case, then something is going wrong with the interpretation of the equals signs at the end of payload, which are supposed to represent empty bits.
Edit: On further reflection I believe this is because Facebook is using Base64 URL encoding (which does not add = as pad chars) instead of regular Base64, whereas your decoding function is expecting regular Base64 with the trailing = chars.
I've upgraded to commons-codec-1.5, using code very similar to this, and am not experiencing this issue. Have you confirmed that payload really is malformed by using an online decoder?
Hello in the year 2021.
The other answers are obsolete, because with Java 8 and newer you can decode the base64url scheme by using the new Base64.getUrlDecoder() (instead of getDecoder).
The base64url scheme is a URL and filename safe dialect of the main base64 scheme; it uses "-" instead of "+" and "_" instead of "/" (because the plus and slash chars have special meanings in URLs), and it does not use "=" chars for the padding (0 to 2 chars) at the end of the string.
Here is how you can parse the Facebook signed_request parameter in Java into a Map object:
public static Map<String, String> parseSignedRequest(HttpServletRequest httpReq, String facebookSecret) throws ServletException {
    String signedRequest = httpReq.getParameter("signed_request");
    String[] splitArray = signedRequest.split("\\.", 2);
    String sigBase64 = splitArray[0];
    String payloadBase64 = splitArray[1];
    String payload = new String(Base64.getUrlDecoder().decode(payloadBase64));
    try {
        Mac sha256_HMAC = Mac.getInstance("HmacSHA256");
        SecretKeySpec secretKey = new SecretKeySpec(facebookSecret.getBytes(), "HmacSHA256");
        sha256_HMAC.init(secretKey);
        String sigExpected = Base64.getUrlEncoder().withoutPadding().encodeToString(sha256_HMAC.doFinal(payloadBase64.getBytes()));
        if (!sigBase64.equals(sigExpected)) {
            LOG.warn("sigBase64 = {}", sigBase64);
            LOG.warn("sigExpected = {}", sigExpected);
            throw new ServletException("Invalid sig = " + sigBase64);
        }
    } catch (IllegalStateException | InvalidKeyException | NoSuchAlgorithmException ex) {
        throw new ServletException("parseSignedRequest", ex);
    }
    // use Jetty JSON parsing or some other library
    return (Map<String, String>) JSON.parse(payload);
}
I have used the Jetty JSON parser:
<dependency>
    <groupId>org.eclipse.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <version>9.4.43.v20210629</version>
</dependency>
but there are more libraries available in Java for parsing JSON.
