The following code (using Commons Codec Base64):
byte[] a = Hex.decodeHex("9349c513ed080dab".toCharArray());
System.out.println(Base64.encodeBase64URLSafeString(a));
System.out.println(Base64.encodeBase64String(a));
gives the following output:
k0nFE-0IDas //should be k0nFE-0IDas=
k0nFE+0IDas=
Base64.encodeBase64URLSafeString(a) returns k0nFE-0IDas instead of k0nFE-0IDas=. Why is this happening?
Why is this happening?
Because that's what it's documented to do:
Note: no padding is added.
The = characters at the end of a base64 string are called padding. They're used to make sure that the final string's length is a multiple of 4 characters - but they're not really required, in terms of information theory, so it's reasonable to remove them so long as you then convert the data back to binary using a method which doesn't expect padding. The Apache Codec Base64 class claims it transparently handles both regular and URL-safe base64, so presumably does handle a lack of padding.
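The padding behaviour can be checked with a small sketch. It uses the JDK's own java.util.Base64 rather than Commons Codec (an assumption made only to keep the example self-contained); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class PaddingDemo {
    // The eight bytes from the question (hex 9349c513ed080dab).
    static final byte[] DATA = {
        (byte) 0x93, 0x49, (byte) 0xc5, 0x13, (byte) 0xed, 0x08, 0x0d, (byte) 0xab
    };

    // URL-safe alphabet, padding kept.
    static String urlSafePadded() {
        return Base64.getUrlEncoder().encodeToString(DATA);
    }

    // URL-safe alphabet, padding dropped.
    static String urlSafeUnpadded() {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(DATA);
    }

    // The JDK decoder accepts input with or without trailing '='.
    static byte[] decodeUrlSafe(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    public static void main(String[] args) {
        System.out.println(urlSafePadded());   // k0nFE-0IDas=
        System.out.println(urlSafeUnpadded()); // k0nFE-0IDas
        // Both forms decode to the same eight bytes.
        System.out.println(Arrays.equals(decodeUrlSafe(urlSafePadded()), DATA));   // true
        System.out.println(Arrays.equals(decodeUrlSafe(urlSafeUnpadded()), DATA)); // true
    }
}
```

Since both strings decode to the same bytes, the missing = loses no information as long as the decoder tolerates unpadded input.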
I understand the need to specify an encoding when converting a byte[] to a String in Java using an appropriate format (hex, Base64, etc.), because the default encoding may not be the same on different platforms. But I am not sure I understand the same need when converting a String to bytes. So this question is an attempt to wrap my head around why a character set has to be specified when transferring Strings over the web.
Consider the following code in Java.
Note: The String in the example below is created in code; it is not read from a file or another resource.
1: String message = "a good message";
2: byte[] encryptedMsgBytes = encrypt(key,,message.getBytes());
3: String base64EncodedMessage = new String (Base64.encodeBase64(encryptedMsgBytes));
I need to send this over the web using HTTP POST, and it will be received and processed (decrypted, decoded from Base64, etc.) at the other end.
Based on an article I read, the recommended practice is to use .getBytes("UTF-8") on line 2, i.e. message.getBytes("UTF-8"),
and a similar approach is recommended at the other end to process the data, as shown on line 7 below.
4: String base64EncodedMsg =
5: byte[] base64EncodedMsgBytes = Base64.encodeBase64(base64EncodedMsg));
6: byte[] decryptedMsgBytes = decrypt(aesKey, "AES", Base64.decodeBase64(base64EncodedMessage);
7: String originalMsg = new String(decryptedMsgBytes, "UTF-8");
Given that Java's internal in-memory string representation is UTF-16 (excluding UTF-8 during serialization and file saving), do we really need this if the decryption is also done in Java? (Note: this is not a practical assumption, it is only for the sake of discussion, to understand the need to specify an encoding.) Since the String 'message' on line 1 is represented in the JVM using UTF-16, wouldn't the .getBytes() method without an explicit encoding always return the UTF-16 bytes? Or is that incorrect, and .getBytes() without an explicit encoding always returns raw bytes? Since the internal representation is UTF-16, why would the default character encoding of a particular JVM matter?
If it does indeed return UTF-16, is there any need to use new String(decryptedMsgBytes, "UTF-8") at the other end?
wouldn't the .getBytes() method without specifying the encoding
always return the UTF-16 bytes ?
This is incorrect. Per the Javadoc, this uses the platform's default charset:
Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
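As a rough illustration of why the charset argument matters, the sketch below encodes the message from the question with two explicit charsets and compares the results. The class and helper names are hypothetical; the point is only that the byte sequence depends on the charset passed to getBytes, not on the in-memory representation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String message = "a good message";
        // Whatever this prints is what a bare getBytes() would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(utf8(message).length);    // 14 (one byte per character here)
        System.out.println(utf16be(message).length); // 28 (two bytes per character)
        // Decoding with the same charset that was used for encoding restores the text.
        System.out.println(fromUtf8(utf8(message)).equals(message)); // true
    }
}
```

Naming the charset on both sides removes the dependency on whatever default the sending and receiving machines happen to have.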
I am trying to debug a flaky Java application. I can't (easily) debug it in the only way I would know how - by putting a log statement in it and re-compiling. Then checking the logs.
(I don't have access to a reliable set of source code). And I'm not a Java developer.
The actual question:
If I did this:
str = URLDecoder.decode("%25C3%2596");
What would be in str?
Would it realize that this is double-encoded and handle that i.e. turn it into %C3%96 - and then decode that? (Which decodes into a German Umlaut).
Thanks
--Justin Wyllie
From the Java API URLDecoder:
A sequence of the form "%xy" will be treated as representing a byte where xy is the two-digit hexadecimal representation of the 8 bits.
So my guess would be - most likely not.
You could however call the decode method twice.
str = URLDecoder.decode(URLDecoder.decode("%25C3%2596"));
str = URLDecoder.decode("%25C3%2596");
The result of this operation is system-dependent (the reason the method is deprecated.)
The result of this call:
str = URLDecoder.decode("%25C3%2596", "UTF-8");
...would be %C3%96 which is Ö in percent-encoded UTF-8. The API does not try to recursively decode any percent-signs.
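A minimal sketch of the single and double decode, assuming UTF-8 is the intended charset. The class and helper names are hypothetical, and the checked exception is wrapped only to keep the helpers easy to call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DoubleDecodeDemo {
    // One pass of percent-decoding with an explicit charset.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Two passes, for input that is known to be double-encoded.
    static String decodeTwice(String s) {
        return decodeOnce(decodeOnce(s));
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("%25C3%2596"));  // %C3%96
        System.out.println(decodeTwice("%25C3%2596")); // Ö
    }
}
```

Decoding twice is only safe when the input is known to be double-encoded; applied to singly encoded data that contains a literal percent sign, the second pass would corrupt it.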
I have a Blowfish encryption script in PHP and a matching one in Java. They were working fine until today, when I came across a problem.
The same content is encrypted differently in Java vs PHP by only 2 chars, which is really weird.
PHP
wTHzxfxLHdMm/JMFnoh0hciS/JADvFFg
Java
wTHzxfxLHdMm/JMFnoh0hciS/D8DvFFg
-------------------------^^
As you can see, those two positions do not match. Unfortunately the value is a real email address and I can't share it. I was also not able to reproduce the problem with the few other values I've tested. I've tried changing the Base64 encoder class in Java, and that didn't help either.
The source code for PHP is here, and for Java is here.
What could I do to resolve this problem?
Let's have a look at your Java code:
String c = new String(Test.encrypt((new String("thevalue")).getBytes(),
(new String("mykey")).getBytes()));
...
System.out.println("Base64 encoded String:" +
new sun.misc.BASE64Encoder().encode(c.getBytes()));
What you are doing here is:
1. convert the plaintext string to bytes, using the system's default encoding
2. convert the key to bytes, using the system's default encoding
3. encrypt the bytes
4. convert the encrypted bytes back to a string, using the system's default encoding
5. convert the encrypted string back to bytes, using the system's default encoding
6. encode these encrypted bytes using Base64.
The problem is in step 4. It assumes that an arbitrary byte array represents a string in your system's default encoding, and that encoding this string back gives the same byte[]. This is valid for some encodings (the ISO-8859 series, for example), but not for others. In Java, when some byte (or byte sequence) is not valid in the given encoding, it is replaced by some other character, which is later mapped to byte 63 (ASCII ?) when converting back to bytes. Actually, the documentation even says:
The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.
In your case, there is no reason to do this at all - simply use the bytes which your encrypt method outputs directly to convert them to Base64.
byte[] encrypted = Test.encrypt("thevalue".getBytes(),
"mykey".getBytes());
System.out.println("Base64 encoded String:"+ new sun.misc.BASE64Encoder().encode(encrypted));
(Also note that I removed the superfluous new String("...") constructor calls here, though this does not relate to your problem.)
The point to remember: Never ever convert an arbitrary byte[], which did not come from encoding a string, to a string. Output of an encryption algorithm (and most other cryptographic algorithms, except decryption) certainly belongs to the category of data which should not be converted to a string.
And never ever use the System's default encoding, if you want portable programs.
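The lossy round trip can be reproduced in a few lines. This is only a sketch: the three bytes are sample values standing in for cipher output, the charsets are named explicitly so the result does not depend on the machine, and the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class LossyRoundTripDemo {
    // Sample binary data standing in for encryption output.
    static final byte[] CIPHERTEXT = {(byte) 0xFC, (byte) 0x90, 0x03};

    // byte[] -> String -> byte[] using the given charset (the problematic pattern).
    static byte[] throughString(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    // byte[] -> Base64 text -> byte[] (safe for arbitrary bytes).
    static byte[] throughBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // US-ASCII cannot represent 0xFC or 0x90, so they are replaced and lost.
        System.out.println(Arrays.equals(
                throughString(CIPHERTEXT, StandardCharsets.US_ASCII), CIPHERTEXT)); // false
        // ISO-8859-1 maps every byte to a character, so it happens to round-trip.
        System.out.println(Arrays.equals(
                throughString(CIPHERTEXT, StandardCharsets.ISO_8859_1), CIPHERTEXT)); // true
        // Base64 round-trips regardless of the byte values.
        System.out.println(Arrays.equals(throughBase64(CIPHERTEXT), CIPHERTEXT)); // true
    }
}
```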
Your code seems right to me.
It looks like you have a trailing white space in the input to one of these programs, and it is only one. I'll tell you why:
Each of these 4-char blocks represents 3 bytes of the encrypted data. The differing part (JA and D8 in the 7th block) actually comes from a single differing byte.
wTHz xfxL HdMm /JMF noh0 hciS /JAD vFFg
wTHz xfxL HdMm /JMF noh0 hciS /D8D vFFg
If I have got it right your email address is 19 characters long. The 20th character in one of your input strings is a white space.
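The block comparison can be checked mechanically by decoding both Base64 strings from the question and locating the first byte that differs. This is a sketch using java.util.Base64; the class and helper names are hypothetical, and it only reports where the outputs diverge, not why.

```java
import java.util.Base64;

public class Base64DiffDemo {
    static final String PHP_OUTPUT  = "wTHzxfxLHdMm/JMFnoh0hciS/JADvFFg";
    static final String JAVA_OUTPUT = "wTHzxfxLHdMm/JMFnoh0hciS/D8DvFFg";

    // Index of the first differing decoded byte, or -1 if the data is identical.
    static int firstDifference(String a, String b) {
        byte[] x = Base64.getDecoder().decode(a);
        byte[] y = Base64.getDecoder().decode(b);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return i;
            }
        }
        return x.length == y.length ? -1 : n;
    }

    // Unsigned value of one decoded byte.
    static int byteAt(String base64, int index) {
        return Base64.getDecoder().decode(base64)[index] & 0xFF;
    }

    public static void main(String[] args) {
        int i = firstDifference(PHP_OUTPUT, JAVA_OUTPUT);
        System.out.println(i); // 19, i.e. the 20th byte
        System.out.println(Integer.toHexString(byteAt(PHP_OUTPUT, i)));  // 90
        System.out.println(Integer.toHexString(byteAt(JAVA_OUTPUT, i))); // 3f
    }
}
```

The Java-side byte is 0x3F, which is the ASCII code of the ? character.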
Question: Have you tried the associated PHP decryption library to decrypt the PHP generated encrypted text? Have you tried the associated JAVA decryption library to decrypt the JAVA encrypted text?
If both produce differing outputs, then one MUST fail decrypting.
Is that one PHP, or Java?
Whichever one it is -- I would try to duplicate another such failure with a publicly shareable string... give that string as a unit test -- to the developer or developers that created the encrypt/decrypt code in the language that the round-trip encrypt/decrypt fails in.
Then... wait for them to fix it.
Not sure of any faster solutions -- except maybe change encryption/decryption library providers... or roll your own...
I want to compress/transform a string as new string.
i.e.:
input string:
USERNAME/REGISTERID
output string after compress:
<some-string-in-UTF8-format>
output string after decompress:
USERNAME/REGISTERID
Is there a compression or hash method for this transformation?
I would prefer a solution in Java, or an algorithm described in basic steps.
I have already read about Huffman coding and tried to use it, but the compressed output consists of bytes that are not valid in the UTF-8 charset.
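You could use ZipOutputStream.
ByteArrayOutputStream result = new ByteArrayOutputStream();
try (ZipOutputStream zip = new ZipOutputStream(result)) {
    zip.putNextEntry(new ZipEntry("data")); // an entry must be opened before writing
    zip.write("myString".getBytes(StandardCharsets.UTF_8));
} // closing the stream finishes the archive and flushes it into result
byte[] bytes = result.toByteArray();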
You just have to figure out the right string encoding. This can be done with a Base64 representation.
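Putting the two ideas together, here is a minimal sketch that compresses a string and turns the result into printable text, then reverses both steps. It swaps in DeflaterOutputStream for ZipOutputStream (an assumption, chosen because it has no archive entries to manage) and uses java.util.Base64; the class and method names are hypothetical, and readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressToText {
    // String -> UTF-8 bytes -> deflate -> URL-safe Base64 text.
    static String compress(String input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(input.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buffer.toByteArray());
    }

    // URL-safe Base64 text -> inflate -> UTF-8 bytes -> String.
    static String decompress(String text) {
        byte[] compressed = Base64.getUrlDecoder().decode(text);
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "USERNAME/REGISTERID";
        String packed = compress(original);
        System.out.println(original.length() + " -> " + packed.length());
        System.out.println(decompress(packed).equals(original)); // true
    }
}
```

Printing the two lengths makes it easy to see whether compression actually pays off for inputs of a given size.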
Take a look at Base64, commons-codec, etc.
Commons-codec provides a very simple Base64 class to use.
You can't use a hash function as hashing functions are typically meant to be one-way only: i.e. given a MD5 or SHA1 hash, you should not be able to decode it to find out what the source message was.
See iconv and mb_convert_encoding. For encoding, maybe consider base64_encode.
If you have database IDs for your identifiers, as your names suggest, why not use that number as the encoding? (Store it as a string if you like.)
You shouldn't hope to get better compression using compression algorithms as they all need some headers and the header size by itself is probably longer than your input string.
It looks like someone is asking you to obfuscate username/password combinations. This is probably not a good idea, since it suggests security where there is none. You might as well implement a ROT13 encryption for this and use double ROT13 to decrypt.
Currently, when I make a signature using java.security.Signature, it passes back a string.
I can't seem to use this string, since it contains special characters that can only be seen when I copy the string into Notepad++. If I remove these special characters there, I can use the rest of the string in my program.
In Notepad++ they look like black boxes with the words ACK GS STX SI SUB ETB BS VT
I don't really understand what they are, so it's hard to tell how to get rid of them.
Is there a function that I can run to remove these and potentially similar characters?
When I use the Base64 class supplied in the posts, I can't get back to a signature:
System.out.println(signature);
String base64 = Base64.encodeBytes(sig);
System.out.println(base64);
String sig2 = new String (Base64.decode(base64));
System.out.println(sig2);
gives the output
”zÌý¥y]žd”xKmËY³ÕN´Ìå}ÏBÊNÈ›`Αrp~jÖüñ0…Rõ…•éh?ÞÀ_û_¥ÂçªsÂk{6H7œÉ/”âtTK±Ï…Ã/Ùê²
lHrM/aV5XZ5klHhLbctZs9VOtMzlfc9Cyk7Im2DOkXJwfmoG1vzxMIVS9YWV6Wg/HQLewF/7X6XC56pzwmt7DzZIN5zJL5TidFRLsc+Fwy/Z6rIaNA2uVlCh3XYkWcu882tKt2RySSkn1heWhG0IeNNfopAvbmHDlgszaWaXYzY=
[B@15356d5
The odd characters are there because cryptographic signatures produce bytes rather than strings. Consequently if you want a printable representation you should Base64 encode it (here's a public domain implementation for Java).
Stripping the non-printing characters from a cryptographic signature will render it useless as you will be unable to use it for verification.
Update:
[B@15356d5
This is the result of toString called on a byte array. "[" means array, "B" means byte and "15356d5" is the array's identity hash code in hex (it says nothing about the contents). You should be passing the array you get out of decode to [Signature.verify](http://java.sun.com/j2se/1.4.2/docs/api/java/security/Signature.html#verify(byte[])).
Something like:
Signature sig = Signature.getInstance("dsa"); // Signature is abstract, so it is obtained through the factory method
sig.initVerify(key);
sig.verify(Base64.decode(base64)); // <-- bytes go here
How are you "making" the signature? If you use the sign method, you get back a byte array, not a string. That's not a binary representation of some text, it's just arbitrary binary data. That's what you should use, and if you need to convert it into a string you should use a base64 conversion to avoid data corruption.
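A self-contained sketch of the whole round trip: sign, Base64-encode for transport, decode, and verify with the decoded bytes. The key pair, the RSA/SHA256withRSA algorithm choice, and all class and method names are assumptions made for illustration; the original code may well use a different algorithm and key source.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Base64;

public class SignatureBase64Demo {
    static boolean signAndVerifyViaBase64(String message) {
        try {
            // Throwaway key pair, generated only for this demonstration.
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair();
            byte[] data = message.getBytes(StandardCharsets.UTF_8);

            // sign() returns raw bytes, not text.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(pair.getPrivate());
            signer.update(data);
            byte[] signature = signer.sign();

            // Printable form for logging or transport.
            String printable = Base64.getEncoder().encodeToString(signature);

            // Decode back to bytes; do not wrap the result in new String(...).
            byte[] decoded = Base64.getDecoder().decode(printable);

            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(pair.getPublic());
            verifier.update(data);
            return verifier.verify(decoded);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signAndVerifyViaBase64("a good message")); // true
    }
}
```

The signature stays a byte[] everywhere except in the Base64 text form, which is the only string representation that needs to exist.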
If I understand your problem correctly, you need to get rid of characters with code below 32, except maybe char 9 (tab), char 10 (new line) and char 13 (return).
Edit: I agree with the others as handling a crypto output like this is not what you usually want.