Issue with java, String.getBytes method

I have a byte array of size 8.
I convert it to a String using the code below.
When I convert the String back to a byte[] with the getBytes method, the result is a 16-byte array in which only a few bytes match the original array. Can someone tell me where I am going wrong?
byte[] message = new byte[8];
//initialize message
printBytes("message: " + message.length + " = ", message);
try {
String test = new String(message, "utf-8");
System.out.println(test);
byte[] f = test.getBytes("utf-8");
Help.printBytes("test = " + f.length, f);
} catch (UnsupportedEncodingException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
printBytes function:
public static void printBytes(String msg, byte[] b){
System.out.print(msg + " = ");
for(int i = 0; i < b.length; i++){
System.out.print("" + String.format("%02X", b[i]));
}
System.out.println("\n");
}
Output:
message: 8 = = 9A52D5D6C6E999AD
�R���陭
test = 16 = EFBFBD52EFBFBDEFBFBDEFBFBDE999AD

Your original byte[] had illegal byte sequences (that is, sequences that don't form valid UTF-8 characters). This has unspecified behavior for the String(byte[], String) constructor, but in your implementation, these bad bytes are replaced by the "�" character, which is \uFFFD, a three-byte character in UTF-8. You have four of these, which account for 12 bytes right there. The remaining four bytes (52 and E9 99 AD) are valid UTF-8 and survive unchanged, which gives the 16 bytes you see.
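The effect described above can be reproduced with a small sketch. It uses the eight bytes from the question's output; the class and method names (RoundTripDemo, viaUtf8, viaLatin1) are made up for illustration. It also shows that ISO-8859-1, which maps every byte value to exactly one char, round-trips arbitrary bytes without loss.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // The eight bytes printed in the question: 9A 52 D5 D6 C6 E9 99 AD.
    static final byte[] MESSAGE = {
        (byte) 0x9A, 0x52, (byte) 0xD5, (byte) 0xD6,
        (byte) 0xC6, (byte) 0xE9, (byte) 0x99, (byte) 0xAD
    };

    // Decode as UTF-8 and encode again. Malformed input is replaced by U+FFFD,
    // which takes three bytes when it is encoded back to UTF-8.
    static byte[] viaUtf8(byte[] in) {
        return new String(in, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
    static byte[] viaLatin1(byte[] in) {
        return new String(in, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 round trip length:   " + viaUtf8(MESSAGE).length);
        System.out.println("Latin-1 round trip intact: "
                + Arrays.equals(MESSAGE, viaLatin1(MESSAGE)));
    }
}
```

If the bytes are not text at all (keys, hashes, compressed data), it is usually simpler to keep them as a byte[] and only convert to hex or Base64 when a printable form is needed.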

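new String(message, "utf-8");
This code tells the String constructor that the bytes in message are UTF-8 encoded, so it decodes them as UTF-8.
test.getBytes("utf-8");
This code means: give me the characters of the string encoded as UTF-8 bytes. Using the same charset in both directions is a symmetric round trip for valid input. A double-encoded string only appears when the charset used for decoding differs from the one used for encoding.
Name the charset explicitly in both calls, because getBytes() without an argument falls back to the platform default charset:
String test = new String(message, "utf-8");
test.getBytes("utf-8");
Sample showing how a charset mismatch produces a double-encoded string: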
public class Test {
public static void main(String[] args) {
try {
String message = "äöü";
Test.printBytes("java internal encoded: = ", message.getBytes());
Test.printBytes("utf-8 encoded: = ", message.getBytes("utf-8"));
// get the string utf-8 encoded and create a new string with the
// utf-8 encoded content
message = new String(message.getBytes("utf-8"), "utf-8");
Test.printBytes("test get bytes without charset: = ", message.getBytes());
Test.printBytes("test get bytes with charset: = ", message.getBytes("utf-8"));
System.out.println(message);
System.out.println("double encoded: " + new String(message.getBytes("utf-8")));
} catch (Exception e) {
e.printStackTrace();
}
}
public static void printBytes(String msg, byte[] b) {
System.out.print(msg + " = ");
for (int i = 0; i < b.length; i++) {
System.out.print("" + String.format("%02X", b[i]));
}
System.out.println("\n");
}
}
Output:
java internal encoded: = = E4F6FC
utf-8 encoded: = = C3A4C3B6C3BC
test get bytes without charset: = = E4F6FC
test get bytes with charset: = = C3A4C3B6C3BC
äöü
double encoded: Ã¤Ã¶Ã¼ <-- the UTF-8 bytes were decoded with the platform default charset instead of UTF-8, so every character appears as two

Related

what this android function returns

I am trying to decompile an APK file, and I need to find out what the m21862a function returns.
Simply put, I need the HASH value that is sent in the request to https://api.SOMESITE.net/external/auth. How is it generated?
Here is the relevant part of the code:
a = HttpTools.m22199a("https://api.somesite.net/external/hello", false);
String str = BuildConfig.FLAVOR;
str = BuildConfig.FLAVOR;
str = BuildConfig.FLAVOR;
try {
str = ((String) new JSONObject(a).get("token")) + ZaycevApp.f15130a.m21564W();
Logger.m22256a("ZAuth", "token - " + str);
str = m21862a(str);
a = new JSONObject(HttpTools.m22199a(String.format("https://api.SOMESITE.net/external/auth?code=%s&hash=%s", new Object[]{a, str}), false)).getString("token");
if (!ae.m21746b((CharSequence) a)) {
ZaycevApp.f15130a.m21595f(a);
}
}
I need to know what the m21862a function does. Is there a PHP replacement for it? Here is the m21862a function:
private String m21862a(String str) {
try {
MessageDigest instance = MessageDigest.getInstance("MD5");
instance.update(str.getBytes());
byte[] digest = instance.digest();
StringBuffer stringBuffer = new StringBuffer();
for (byte b : digest) {
String toHexString = Integer.toHexString(b & RadialCountdown.PROGRESS_ALPHA);
while (toHexString.length() < 2) {
toHexString = "0" + toHexString;
}
stringBuffer.append(toHexString);
}
return stringBuffer.toString();
} catch (Exception e) {
Logger.m22252a((Object) this, e);
return BuildConfig.FLAVOR;
}
}

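The function computes the MD5 digest of the input, masks each byte of the digest with RadialCountdown.PROGRESS_ALPHA, converts it to hex (left-padded with 0 to two characters) and appends that to the output. The constant name looks like a decompiler artifact; if its value is 255 (0xFF), which is the usual mask for reading a signed Java byte as an unsigned value, the result is the ordinary lowercase hex MD5 string.
In that case PHP's md5() function should produce the same value for the same input bytes.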
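As a rough sketch of what the decompiled method appears to do, here is a cleaned-up version. It assumes that the mask constant is 0xFF and that the input is encoded as UTF-8 (the original calls getBytes() with the platform default charset); the names Md5Hex and md5Hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Hex {
    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumption: UTF-8 input bytes.
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // b & 0xFF turns the signed byte into 0..255; %02x pads to two digits.
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

For ASCII input this should match what PHP's md5() returns for the same string; for non-ASCII input the two sides also have to agree on the byte encoding.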

convert byte array to string in java

I am trying to convert a byte array to a string in Java using new String(bytes, "UTF-8"), but it only returns something that looks like an object reference, like this: #AB4634bSbbfa
So I searched for a way to solve this problem.
I finally got a valid string by converting each byte to hex digits using a character array,
like this: char[] chars = {'0', '1', ... 'e', 'f'};
This never happened before. Why do I have to convert to hex to get a valid string?
Here is the method.
The byte array is an HMAC-SHA256 hash computed with a specific key.
public static String getHashString() {
String algorithm = "HmacSHA256";
String hashKey = "some_key";
String message = "abcdefg";
String hexed = "";
try {
Mac sha256_HMAC = Mac.getInstance(algorithm);
SecretKeySpec secret_key = new SecretKeySpec(hashKey.getBytes(), algorithm);
sha256_HMAC.init(secret_key);
byte[] hash = sha256_HMAC.doFinal(message.getBytes("UTF-8"));
// it doesn't work for me.
// hexed = new String(hash, "UTF-8");
// it works.
hexed = bytesToHex(hash);
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
} catch (InvalidKeyException e) {
e.printStackTrace();
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return hexed;
}
public static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();
public static String bytesToHex(final byte[] data ) {
final int l = data.length;
final char[] hexChars = new char[l<<1];
for( int i=0, j =0; i < l; i++ ) {
hexChars[j++] = HEX_DIGITS[(0xF0 & data[i]) >>> 4];
hexChars[j++] = HEX_DIGITS[0x0F & data[i]];
}
return new String(hexChars);
}
Thanks.

The following sample shows the conversion of a byte array to a String:
public class TestByte
{
public static void main(String[] argv) {
String example = "This is an example";
byte[] bytes = example.getBytes();
System.out.println("Text : " + example);
System.out.println("Text [Byte Format] : " + bytes);
System.out.println("Text [Byte Format] : " + bytes.toString());
String s = new String(bytes);
System.out.println("Text Decoded : " + s);
}}

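I'm not sure the string you get in the end is what you're after. A common approach is to Base64-encode the hash with
new BASE64Encoder().encode(hash)
which returns the hashed message as a String. Note that BASE64Encoder is an internal sun.misc class; since Java 8 the supported equivalent is java.util.Base64.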
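A minimal sketch of both printable forms, hex and Base64, for an HMAC-SHA256 result. The class and method names (HmacText, hmacSha256, toHex, toBase64) are illustrative, and UTF-8 is assumed for both key and message. A hash is arbitrary binary data, which is why new String(hash, "UTF-8") does not give a usable result.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class HmacText {
    // Computes HMAC-SHA256 over the UTF-8 bytes of the message.
    static byte[] hmacSha256(String key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Two lowercase hex digits per byte.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Standard Base64 using the JDK's own encoder (Java 8+).
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] hash = hmacSha256("some_key", "abcdefg");
        System.out.println("hex:    " + toHex(hash));
        System.out.println("base64: " + toBase64(hash));
    }
}
```

Which form to use depends on what the receiving side expects; both can be converted back to the exact original bytes.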

just do new String(byteArray);

Java java.io.IOException: Not in GZIP format

I searched for an example of how to compress a string in Java.
I have a function to compress then uncompress. The compress seems to work fine:
public static String encStage1(String str)
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("String length : " + str.length());
ByteArrayOutputStream out = new ByteArrayOutputStream();
String outStr = null;
try
{
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(str.getBytes());
gzip.close();
outStr = out.toString(format2);
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
But the reverse is complaining about the string not being in GZIP format, even when I pass the return from encStage1 straight back into the decStage3:
public static String decStage3(String str)
{
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("Input String length : " + str.length());
String outStr = "";
try
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes(format2)));
BufferedReader bf = new BufferedReader(new InputStreamReader(gis, format2));
String line;
while ((line = bf.readLine()) != null)
{
outStr += line;
}
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
I get this error when I call with a string return from encStage1:
public String encIDData(String idData)
{
String tst = "A simple test string";
System.out.println("Enc 0: " + tst);
String stg1 = encStage1(tst);
System.out.println("Enc 1: " + toHex(stg1));
String dec1 = decStage3(stg1);
System.out.println("unzip: " + toHex(dec1));
}
Output/Error:
Enc 0: A simple test string
String length : 20
Output String lenght : 40
Enc 1: 1fefbfbd0800000000000000735428efbfbdefbfbd2defbfbd495528492d2e51282e29efbfbdefbfbd4b07005aefbfbd21efbfbd14000000
Input String length : 40
java.io.IOException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)

A small error is:
gzip.write(str.getBytes());
This uses the default platform encoding, which on Windows will never be ISO-8859-1. Better:
gzip.write(str.getBytes(format1));
You could consider using "Cp1252" (Windows Latin-1, for some European languages) instead of "ISO-8859-1" (Latin-1). Cp1252 adds characters such as comma-like (curly) quotes.
The major error is converting the compressed bytes to a String. Java separates binary data (byte[], InputStream, OutputStream) from text (String, char, Reader, Writer), which internally is always kept in Unicode. A byte sequence does not need to be valid UTF-8, so decoding arbitrary compressed bytes as UTF-8 corrupts them. You might get away with converting the bytes using a single-byte encoding (ISO-8859-1, for instance).
The best way would be
gzip.write(str.getBytes(StandardCharsets.UTF_8));
That gives you full Unicode, so text in any mix of scripts survives.
For decompression, read into a ByteArrayOutputStream and then call new String(baos.toByteArray(), StandardCharsets.UTF_8).
Using a BufferedReader on an InputStreamReader with UTF-8 is okay too, but readLine throws away the newline characters, so they have to be added back:
outStr += line + "\r\n"; // Or so.
Clean answer:
public static byte[] encStage1(String str) throws IOException
{
try (ByteArrayOutputStream out = new ByteArrayOutputStream())
{
try (GZIPOutputStream gzip = new GZIPOutputStream(out))
{
gzip.write(str.getBytes(StandardCharsets.UTF_8));
}
return out.toByteArray();
//return out.toString(StandardCharsets.ISO_8859_1);
// Some single byte encoding
}
}
public static String decStage3(byte[] str) throws IOException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str)))
{
int b;
while ((b = gis.read()) != -1) {
baos.write((byte) b);
}
}
return new String(baos.toByteArray(), StandardCharsets.UTF_8);
}

Using toString/getBytes to encode and decode binary data is the wrong approach. Use something like Base64 encoding for this purpose (java.util.Base64 in JDK 1.8).
As proof, try this simple test, in which the assertion fails because the byte 0xFF does not survive the UTF-8 round trip:
import org.testng.annotations.Test;
import java.io.ByteArrayOutputStream;
import static org.testng.Assert.assertEquals;
public class SimpleTest {
@Test
public void test() throws Exception {
final String CS = "utf-8";
byte[] b0 = {(byte) 0xff};
ByteArrayOutputStream out = new ByteArrayOutputStream();
out.write(b0);
out.close();
byte[] b1 = out.toString(CS).getBytes(CS);
assertEquals(b0, b1);
}
}
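If the compressed data really has to travel as a String, one option is to Base64-encode the gzip bytes, as suggested above. The following is a sketch under that assumption, not the asker's original design; the names GzipBase64, compressToBase64 and decompressFromBase64 are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBase64 {
    // Text -> UTF-8 bytes -> gzip -> Base64 string (safe to store as text).
    static String compressToBase64(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Base64 string -> gzip bytes -> UTF-8 bytes -> text.
    static String decompressFromBase64(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String packed = compressToBase64("A simple test string");
        System.out.println(packed);
        System.out.println(decompressFromBase64(packed));
    }
}
```

Base64 makes the data roughly a third larger, so for very short strings the result can be longer than the input; keeping the data as a byte[] avoids that overhead.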

URL decoding in Java for non-ASCII characters

I'm trying to decode a URL containing percent-encoded characters in Java.
I've tried using java.net.URI class to do the job, but it's not always working correctly.
String test = "https://fr.wikipedia.org/wiki/Fondation_Alliance_fran%C3%A7aise";
URI uri = new URI(test);
System.out.println(uri.getPath());
For the test String "https://fr.wikipedia.org/wiki/Fondation_Alliance_fran%C3%A7aise", the result is correct "/wiki/Fondation_Alliance_française" (%C3%A7 is correctly replaced by ç).
But for some other test strings, like "http://sv.wikipedia.org/wiki/Anv%E4ndare:Lsjbot/Statistik#Drosophilidae", it gives an incorrect result "/wiki/Anv�ndare:Lsjbot/Statistik" (%E4 is replaced by � instead of ä).
I did some testing with getRawPath() and URLDecoder class.
System.out.println(URLDecoder.decode(uri.getRawPath(), "UTF8"));
System.out.println(URLDecoder.decode(uri.getRawPath(), "ISO-8859-1"));
System.out.println(URLDecoder.decode(uri.getRawPath(), "WINDOWS-1252"));
Depending on the test String, I get correct results with different encodings:
For %C3%A7, I get a correct result with "UTF-8" encoding as expected, and incorrect results with "ISO-8859-1" or "WINDOWS-1252" encoding
For %E4, it's the opposite.
For both test URLs, I get the correct page if I put them in the Chrome address bar.
How can I correctly decode the URL in all situations?
Thanks for any help
==== Answer ====
Thanks to the suggestions in McDowell's answer below, it now seems to work. Here's what I now have as code:
private static void appendBytes(ByteArrayOutputStream buf, String data) throws UnsupportedEncodingException {
byte[] b = data.getBytes("UTF8");
buf.write(b, 0, b.length);
}
private static byte[] parseEncodedString(String segment) throws UnsupportedEncodingException {
ByteArrayOutputStream buf = new ByteArrayOutputStream(segment.length());
int last = 0;
int index = 0;
while (index < segment.length()) {
if (segment.charAt(index) == '%') {
appendBytes(buf, segment.substring(last, index));
if ((index + 2 < segment.length()) && // both hex digits must lie inside the string
("ABCDEFabcdef0123456789".indexOf(segment.charAt(index + 1)) >= 0) &&
("ABCDEFabcdef0123456789".indexOf(segment.charAt(index + 2)) >= 0)) {
buf.write((byte) Integer.parseInt(segment.substring(index + 1, index + 3), 16));
index += 3;
} else if ((index + 1 < segment.length()) && // the next char must lie inside the string
(segment.charAt(index + 1) == '%')) {
buf.write((byte) '%');
index += 2;
} else {
buf.write((byte) '%');
index++;
}
last = index;
} else {
index++;
}
}
appendBytes(buf, segment.substring(last));
return buf.toByteArray();
}
private static String parseEncodedString(String segment, Charset... encodings) {
if ((segment == null) || (segment.indexOf('%') < 0)) {
return segment;
}
try {
byte[] data = parseEncodedString(segment);
for (Charset encoding : encodings) {
try {
if (encoding != null) {
return encoding.newDecoder().
onMalformedInput(CodingErrorAction.REPORT).
decode(ByteBuffer.wrap(data)).toString();
}
} catch (CharacterCodingException e) {
// Incorrect encoding, try next one
}
}
} catch (UnsupportedEncodingException e) {
// Nothing to do
}
return segment;
}

Anv%E4ndare
As PopoFibo says, this is not a valid UTF-8 encoded sequence.
You can do some tolerant best-guess decoding:
public static String parse(String segment, Charset... encodings) {
byte[] data = parse(segment);
for (Charset encoding : encodings) {
try {
return encoding.newDecoder()
.onMalformedInput(CodingErrorAction.REPORT)
.decode(ByteBuffer.wrap(data))
.toString();
} catch (CharacterCodingException notThisCharset_ignore) {}
}
return segment;
}
private static byte[] parse(String segment) {
ByteArrayOutputStream buf = new ByteArrayOutputStream();
Matcher matcher = Pattern.compile("%([A-Fa-f0-9][A-Fa-f0-9])")
.matcher(segment);
int last = 0;
while (matcher.find()) {
appendAscii(buf, segment.substring(last, matcher.start()));
byte hex = (byte) Integer.parseInt(matcher.group(1), 16);
buf.write(hex);
last = matcher.end();
}
appendAscii(buf, segment.substring(last));
return buf.toByteArray();
}
private static void appendAscii(ByteArrayOutputStream buf, String data) {
byte[] b = data.getBytes(StandardCharsets.US_ASCII);
buf.write(b, 0, b.length);
}
This code will successfully decode the given strings:
for (String test : Arrays.asList("Fondation_Alliance_fran%C3%A7aise",
"Anv%E4ndare")) {
String result = parse(test, StandardCharsets.UTF_8,
StandardCharsets.ISO_8859_1);
System.out.println(result);
}
Note that this isn't some foolproof system that allows you to ignore correct URL encoding. It works here because v%E4n - the byte sequence 76 E4 6E - is not a valid sequence as per the UTF-8 scheme (E4 announces a three-byte character, but 6E is not a continuation byte) and the decoder can detect this.
If you reverse the order of the encodings the first string can happily (but incorrectly) be decoded as ISO-8859-1.
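The effect of the ordering can be checked in isolation. The sketch below works directly on the decoded bytes and leaves out the percent-parsing step; the class name CharsetFallback and the byte constants are made up for illustration, and returning null when no charset matches is an arbitrary choice for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetFallback {
    // Bytes of "fran%C3%A7aise" after percent-decoding.
    static final byte[] FRANCAISE =
        {'f', 'r', 'a', 'n', (byte) 0xC3, (byte) 0xA7, 'a', 'i', 's', 'e'};
    // Bytes of "v%E4n" after percent-decoding.
    static final byte[] VAN = {0x76, (byte) 0xE4, 0x6E};

    // Returns the text from the first charset that decodes the bytes without
    // error, or null if none of the candidates accepts them.
    static String decode(byte[] data, Charset... candidates) {
        for (Charset cs : candidates) {
            try {
                return cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data))
                        .toString();
            } catch (CharacterCodingException e) {
                // Not valid in this charset, try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(decode(VAN, utf8, latin1));       // UTF-8 rejects it, Latin-1 is used
        System.out.println(decode(FRANCAISE, utf8, latin1)); // valid UTF-8, decoded as such
        System.out.println(decode(FRANCAISE, latin1, utf8)); // Latin-1 accepts anything: mojibake
    }
}
```

ISO-8859-1 never reports an error because every byte value is a valid character in it, so it only makes sense as the last candidate in the list.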
Note: HTTP doesn't care about percent-encoding and you can write a web server that accepts http://foo/%%%%% as a valid form. The URI spec mandates UTF-8 but this was done retroactively. It is really up to the server to describe what form its URIs should be and if you have to handle arbitrary URIs you need to be aware of this legacy.
I've written a bit more about URLs and Java here.

Java SHA512 digest output differs from PHP script

Can someone figure out why these PHP and Java snippets don't return the same SHA512 for the same input?
$password = 'whateverpassword';
$salt = 'ieerskzcjy20ec8wkgsk4cc8kuwgs8g';
$salted = $password.'{'.$salt.'}';
$digest = hash('sha512', $salted, true);
echo "digest: ".base64_encode($digest);
for ($i = 1; $i < 5000; $i++) {
$digest = hash('sha512', $digest.$salted, true);
}
$encoded_pass = base64_encode($digest);
echo $encoded_pass;
This is the code on the android application:
public String processSHA512(String pw, String salt, int rounds)
{
try {
md = MessageDigest.getInstance("SHA-512");
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
throw new RuntimeException("No Such Algorithm");
}
String result = hashPw(pw, salt, rounds);
System.out.println(result);
return result;
}
private static String hashPw(String pw, String salt, int rounds) {
byte[] bSalt;
byte[] bPw;
String appendedSalt = new StringBuilder().append('{').append(salt).append('}').toString();
try {
bSalt = appendedSalt.getBytes("ISO-8859-1");
bPw = pw.getBytes("ISO-8859-1");
} catch (UnsupportedEncodingException e) {
throw new RuntimeException("Unsupported Encoding", e);
}
byte[] digest = run(bPw, bSalt);
Log.d(LCAT, "first hash: " + Base64.encodeBytes(digest));
for (int i = 1; i < rounds; i++) {
digest = run(digest, bSalt);
}
return Base64.encodeBytes(digest);
}
private static byte[] run(byte[] input, byte[] salt) {
md.update(input);
return md.digest(salt);
}
The library for base64 encoding is this: base64lib
This java code is actually some modified code I found around another question in StackOverflow.
Although the Android code is running fine it doesn't match with the output from the php script. It doesn't even match the first hash!
Note 1: On php hash('sha512',$input, $raw_output) returns raw binary output
Note 2: On java I tried to change the charset (UTF-8, ASCII) but it also didn't work.
Note 3: The code from the server can not be changed, so I would appreciate any answer regarding how to change my android code.

The first hash should be the same on the server and in Java. But then in the loop what gets appended to the digest is password{salt} in the PHP code, but only {salt} in the Java code.
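A sketch of a Java loop that mirrors the PHP code, feeding the previous digest followed by the full password{salt} string in every round. It uses java.util.Base64 (Java 8+, or Android API 26+) instead of the question's Base64 library and assumes UTF-8 for the input; the names SaltedSha512 and hash are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class SaltedSha512 {
    // Mirrors the PHP loop: digest = sha512(password{salt}), then
    // digest = sha512(digest . password{salt}) for the remaining rounds.
    static String hash(String password, String salt, int rounds) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] salted = (password + "{" + salt + "}").getBytes(StandardCharsets.UTF_8);
            byte[] digest = md.digest(salted);
            for (int i = 1; i < rounds; i++) {
                md.update(digest);          // raw bytes of the previous round
                digest = md.digest(salted); // followed by password{salt}, not only {salt}
            }
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hash("whateverpassword", "ieerskzcjy20ec8wkgsk4cc8kuwgs8g", 5000));
    }
}
```

With rounds set to 5000 the loop body runs 4999 times, the same count as the PHP for loop that starts at 1.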

For the lazy ones, one example is better than a thousand words ;). I finally understood what was happening. The update method appends bytes to the data being digested, so hashing $password.'{'.$salt.'}' is the same as calling mda.update(password bytes) followed by mda.digest("{salt}" bytes). I am posting this answer because I was going crazy trying to find out why it was not working, and the explanation was all in the answer above.
Thanks guys.
This is the example that works on a Java server:
public static String hashPassword(String password, String salt) throws Exception {
String result = password;
String appendedSalt = new StringBuilder().append('{').append(salt).append('}').toString();
String appendedSalt2 = new StringBuilder().append(password).append('{').append(salt).append('}').toString();
if(password != null) {
//Security.addProvider(new BouncyCastleProvider());
MessageDigest mda = MessageDigest.getInstance("SHA-512");
byte[] pwdBytes = password.getBytes("UTF-8");
byte[] saltBytes = appendedSalt.getBytes("UTF-8");
byte[] saltBytes2 = appendedSalt2.getBytes("UTF-8");
byte[] digesta = encode(mda, pwdBytes, saltBytes);
//result = new String(digesta);
System.out.println("first hash: " + new String(Base64.encode(digesta),"UTF-8"));
for (int i = 1; i < ROUNDS; i++) {
digesta = encode(mda, digesta, saltBytes2);
}
System.out.println("last hash: " + new String(Base64.encode(digesta),"UTF-8"));
result = new String(Base64.encode(digesta));
}
return result;
}
private static byte[] encode(MessageDigest mda, byte[] pwdBytes,
byte[] saltBytes) {
mda.update(pwdBytes);
byte [] digesta = mda.digest(saltBytes);
return digesta;
}
