I am working on the Matasano CryptoChallenge, and the first one is to create a Hex to Base 64 converter. I honestly don't know how to continue from here. My code:
public class HexToBase64 {
public static void main(String[] args) {
// String hex = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d";
String hex = "DA65A";
convertHexTo64(hex);
}
public static String convertHexTo64(String hex) {
//convert each letter in the hex string to a 4-digit binary string to create a binary representation of the hex string
StringBuilder binary = new StringBuilder();
for (int i = 0; i < hex.length(); i++) {
int dec = Integer.parseInt(hex.charAt(i) + "", 16);
StringBuilder bin = new StringBuilder(Integer.toBinaryString(dec));
while(bin.length() < 4){
bin.insert(0,'0');
}
binary.append(bin);
}
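//now take 6 bits at a time and convert to a single b64 digit to create the final b64 representation
StringBuilder b64 = new StringBuilder();
for (int i = 0; i < binary.length(); i += 6) {
// take the next group of 6 bits (the last group may be shorter and would still need zero-padding)
String temp = binary.substring(i, Math.min(i + 6, binary.length()));
// the group is a binary string, so parse it with radix 2, not 10
int dec = Integer.parseInt(temp, 2);
//convert dec to b64 with the lookup table here then append to b64
}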
return b64.toString();
}
}
So after I separate the binary 6 bits at a time and convert to decimal, how do I map the decimal number to the corresponding digit in b64? Would a Hashmap/Hashtable implementation be efficient?
Also, this algorithm displays how I would go about doing the conversion by hand. Is there a better way of doing this? I am looking for a way to convert that will take a reasonable amount of time, so time, and implicitly efficiency, is relevant.
Thank you for your time
EDIT: And the page also mentions that "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing." What does that mean exactly?
Extracted from this Stack Overflow post, which references Apache Commons Codec
byte[] decodedHex = Hex.decodeHex(hex);
byte[] encodedHexB64 = Base64.encodeBase64(decodedHex);
String hex = "00bc9d2a05ef06c79a6e972f8a36737e";
byte[] decodedHex = org.apache.commons.codec.binary.Hex.decodeHex(hex.toCharArray());
String result = Base64.encodeBase64String(decodedHex);
System.out.println("==> " + result);
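For comparison, here is a minimal JDK-only sketch of the same conversion, assuming Java 8 or later (for java.util.Base64). The class and method names (HexToBase64Sketch, hexToBytes, hexToBase64, encodeManually) are made up for illustration. It decodes the hex text to raw bytes first and only then produces Base64 for display, which is one reading of the "operate on raw bytes" advice. The encodeManually method shows that the lookup table can simply be a 64-character string indexed by the 6-bit value, so a HashMap is not needed.

```java
import java.util.Base64;

final class HexToBase64Sketch {
    // Standard Base64 alphabet: the 6-bit value is the index into this string.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Decodes a hex string (even length assumed) into raw bytes. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Hex text to Base64 text, going through raw bytes and the JDK encoder. */
    static String hexToBase64(String hex) {
        return Base64.getEncoder().encodeToString(hexToBytes(hex));
    }

    /** Hand-rolled encoder: 3 bytes become 4 alphabet characters, padded with '='. */
    static String encodeManually(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            int b0 = data[i] & 0xFF;
            int b1 = i + 1 < data.length ? data[i + 1] & 0xFF : 0;
            int b2 = i + 2 < data.length ? data[i + 2] & 0xFF : 0;
            int chunk = (b0 << 16) | (b1 << 8) | b2;
            sb.append(ALPHABET.charAt((chunk >> 18) & 0x3F));
            sb.append(ALPHABET.charAt((chunk >> 12) & 0x3F));
            sb.append(i + 1 < data.length ? ALPHABET.charAt((chunk >> 6) & 0x3F) : '=');
            sb.append(i + 2 < data.length ? ALPHABET.charAt(chunk & 0x3F) : '=');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hex = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d";
        System.out.println(hexToBase64(hex));
        System.out.println(encodeManually(hexToBytes(hex)));
    }
}
```

Both lines printed by main should be identical. Working on bytes with shifts and masks avoids building the intermediate string of '0' and '1' characters.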
Suppose there is a hex string for an emoji character such as "1f1e81f1f3". It is not a well-formed hex encoding: it is the emoji's code points concatenated, and should really be read as two values, 1f1e8 and 1f1f3.
I'm using org.apache.commons.codec.binary.Hex to decode the hex string, but Hex requires the input length to be even, so I need to zero-pad the hex string, like "01f1e801f1f3".
Currently I simply replace "1f" with "01f". So far so good, but an emoji glyph may consist of a sequence of Unicode characters, so:
Is it safe to simply replace "1f" with "01f"?
If it's not safe, how can I decode such a hex string properly and restore it to the correct emoji character or character sequence? It seems I need to implement a custom UTF-16BE decoder?
Background
The hex string is extracted from a "<span class="emoji emojiXXXXXXXXXX"></span>" string in a text message retrieved from a popular IM application via an unofficial HTTP API.
I ended up writing a small function to restore the emoji characters.
Basic procedure:
1. Set a pointer to the start of the hex string.
2. Examine the hex string at the pointer position.
3. If it starts with "1f", prepend three zeroes, store the result as a new hex string, and advance the pointer by 5. Otherwise, take the next 4 characters without padding as the new hex string, and advance the pointer by 4.
4. Decode the new hex string to a byte array.
5. Create a new String from the byte array, using UTF_32BE for the padded case and UTF_16BE otherwise.
6. Loop back to step 2 until the end of the hex string is reached.
It works, but it's not perfect. It could introduce a bug if
one character of the emoji character sequence is a supplementary character
and
its hex string does not start with "1f", or the length of its hex string is not 5.
Code snippet:
import java.nio.charset.*;
import java.util.*;
import java.util.regex.*;
import org.apache.commons.codec.*;
import org.apache.commons.codec.binary.Hex;
import org.apache.commons.lang3.*;

public static final Charset UTF_32BE = Charset.forName ("UTF-32BE");
public static final String REGEXP_FindTransformedEmojiHexString = "<span class=\"emoji emoji(\\p{XDigit}+)\"></span>";
public static final Pattern PATTERN_FindTransformedEmojiHexString = Pattern.compile (REGEXP_FindTransformedEmojiHexString, Pattern.CASE_INSENSITIVE);

public static String RestoreEmojiCharacters (String sContent)
{
    boolean bMatched = false;
    Matcher matcher = PATTERN_FindTransformedEmojiHexString.matcher (sContent);
    StringBuffer sbReplace = new StringBuffer ();
    while (matcher.find ())
    {
        bMatched = true;
        String sEmojiHexString = matcher.group(1);
        Hex hex = new Hex (StandardCharsets.ISO_8859_1);
        // Collects every glyph decoded from this one <span> match
        StringBuilder sbEmoji = new StringBuilder ();
        try
        {
            for (int i=0; i<sEmojiHexString.length ();)
            {
                Charset charset = null;
                String sSingleEmojiGlyphHexString = null;
                String sStartString = StringUtils.substring (sEmojiHexString, i, i+2);
                if (StringUtils.startsWithIgnoreCase (sStartString, "1f"))
                {
                    // 5 hex digits: pad to 8 digits (4 bytes) and decode as UTF-32BE
                    sSingleEmojiGlyphHexString = "000" + StringUtils.substring (sEmojiHexString, i, i+5);
                    i += 5;
                    charset = UTF_32BE;
                }
                else
                {
                    // 4 hex digits: already 2 bytes, decode as UTF-16BE
                    sSingleEmojiGlyphHexString = StringUtils.substring (sEmojiHexString, i, i+4);
                    i += 4;
                    charset = StandardCharsets.UTF_16BE;
                }
                byte[] arrayEmoji = (byte[])hex.decode (sSingleEmojiGlyphHexString);
                sbEmoji.append (new String (arrayEmoji, charset));
            }
        }
        catch (DecoderException e)
        {
            e.printStackTrace();
        }
        // appendReplacement must be called only once per match
        matcher.appendReplacement (sbReplace, Matcher.quoteReplacement (sbEmoji.toString ()));
    }
    matcher.appendTail (sbReplace);
    if (bMatched)
        sContent = sbReplace.toString ();
    return sContent;
}
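As an alternative to padding the hex and decoding it through a charset, the code points can be parsed as integers and appended directly. The following is only a sketch under the same assumption as the procedure above (a code point starting with "1f" is 5 hex digits, anything else is 4), so it shares the same limitation: a 4-digit code point that happens to start with "1f" would be misread. The names EmojiHexSketch and decodeCodePoints are hypothetical.

```java
final class EmojiHexSketch {
    /**
     * Splits a concatenated code-point hex string and rebuilds the text.
     * Assumption: a code point starting with "1f" is 5 hex digits long,
     * every other code point is 4 hex digits long.
     */
    static String decodeCodePoints(String hex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < hex.length()) {
            // Case-insensitive check for the "1f" prefix at position i.
            int width = hex.regionMatches(true, i, "1f", 0, 2) ? 5 : 4;
            if (i + width > hex.length()) {
                throw new IllegalArgumentException("truncated code point at index " + i);
            }
            int codePoint = Integer.parseInt(hex.substring(i, i + width), 16);
            // appendCodePoint emits a surrogate pair for supplementary code points.
            out.appendCodePoint(codePoint);
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String flag = decodeCodePoints("1f1e81f1f3");
        System.out.println(flag + " has " + flag.codePointCount(0, flag.length()) + " code points");
    }
}
```

StringBuilder.appendCodePoint takes care of surrogate pairs, so no UTF-16BE or UTF-32BE byte handling is needed in this variant.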
I need to save a binary stream that I will later convert to text. Since Java has no bit-level stream type, I stored my 'bits' in an array of bytes just to test my code. Now I have an array of bytes where each bit takes up one whole byte.
byte [] stream = new byte [1500];
int str = 0;
byte [] data = new byte [1];
for (int i = 0; i<original.cols(); i++)
{
for (int j= 0; j<original.rows(); j++)
{
original.get(j,i, data);
if ((data[0]==0))
{
stream [str]=0;
str = str+1;
}
else
{
stream [str]=1;
str = str+1;
}
}
}
Can anyone help me pack these bits properly, so that each byte holds 8 bits?
A java.util.BitSet contains helper methods for dealing with raw bits, and conversions to and from byte arrays. In the following example, bytes will contain a single byte:
int numberOfBits = 8;
BitSet bitSet = new BitSet(numberOfBits);
bitSet.set(3, true);
bitSet.set(7, true);
byte[] bytes = bitSet.toByteArray();
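Applied to the one-bit-per-byte array from the question, the packing could look like the sketch below. The class and method names are made up for illustration. Two variants are shown because they order the bits differently: the manual version puts the first bit in the most significant position of each byte, while BitSet.toByteArray() is little-endian (bit 0 is the least significant bit of byte 0) and drops trailing all-zero bytes.

```java
import java.util.Arrays;
import java.util.BitSet;

final class BitPackingSketch {
    /** Packs one-bit-per-byte input into bytes, most significant bit first. */
    static byte[] packMsbFirst(byte[] bits) {
        byte[] packed = new byte[(bits.length + 7) / 8];
        for (int i = 0; i < bits.length; i++) {
            if (bits[i] != 0) {
                packed[i / 8] |= (byte) (0x80 >>> (i % 8));
            }
        }
        return packed;
    }

    /** Same input packed through BitSet: input bit i becomes bit (i % 8) of byte (i / 8). */
    static byte[] packWithBitSet(byte[] bits) {
        BitSet set = new BitSet(bits.length);
        for (int i = 0; i < bits.length; i++) {
            set.set(i, bits[i] != 0);
        }
        // The result is only as long as the highest set bit requires.
        return set.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bits = {1, 0, 0, 0, 0, 0, 0, 1, 1};
        System.out.println(Arrays.toString(packMsbFirst(bits)));
        System.out.println(Arrays.toString(packWithBitSet(bits)));
    }
}
```

If the packed data has to be unpacked later, the original number of bits should be stored alongside it, since neither variant records how many bits of the last byte are meaningful.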
I have generated a SipHash for one string and two long values (for many such combinations of a string and longs). I used:
Hasher hash = Hashing.sipHash24().newHasher().putUnencodedChars("abcd").putLong(123).putLong(123);
I then converted this hash to a string using:
String hashString = hash.hash().toString();
But I want the byte array instead. Is there a way to get, from this string, the same byte array I would have got from byte[] hashBytes = hash.hash().asBytes()?
I realised that the byte array uses only 8 bytes for the SipHash, whereas the string was 18 bytes long. So I guess storing the hash as a byte array would be more space-efficient.
BaseEncoding.base16().lowerCase().decode(string)
should convert HashCode.toString() back into the byte array you would've gotten from asBytes().
You can parse the string back into a HashCode instance with HashCode.fromString(string). Then you can call .asBytes() on the HashCode instance to get back a copy of the underlying byte[].
So basically you want:
byte[] bytes = HashCode.fromString(string).asBytes();
Here is code to get the byte array from the string:
public static byte[] getBytes(String hashString) {
final byte[] bytes = new byte[8];
HashMap<Character, String> bin = new HashMap<>();
bin.put('0', "0000");
bin.put('1', "0001");
bin.put('2', "0010");
bin.put('3', "0011");
bin.put('4', "0100");
bin.put('5', "0101");
bin.put('6', "0110");
bin.put('7', "0111");
bin.put('8', "1000");
bin.put('9', "1001");
bin.put('a', "1010");
bin.put('b', "1011");
bin.put('c', "1100");
bin.put('d', "1101");
bin.put('e', "1110");
bin.put('f', "1111");
for (int i = 0; i < 16 && i + 1 < hashString.length(); i += 2) {
// two hex digits give the 8-bit binary string for one byte
String byteBinary = bin.get(hashString.charAt(i)) + bin.get(hashString.charAt(i + 1));
// parse the bits directly; BitSet.toByteArray() returns an empty array for a 00 byte, so indexing [0] would fail
bytes[i/2] = (byte) Integer.parseInt(byteBinary, 2);
//System.out.println(byteBinary);
}
return bytes;
}
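If Guava is not at hand, the same conversion can be written without the lookup map. This is a minimal sketch using only the JDK, with hypothetical names (HexRoundTripSketch, toHex, fromHex); it assumes the string is plain hex with an even number of digits, as the lower-case hex string in the question is.

```java
import java.util.Arrays;

final class HexRoundTripSketch {
    /** Renders bytes as lower-case hex, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    /** Parses hex text (even length assumed) back into bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // Arbitrary 8-byte value standing in for a 64-bit hash.
        byte[] original = {0x12, (byte) 0xAB, 0, 0x7F, (byte) 0xFF, 1, 2, 3};
        String hex = toHex(original);
        System.out.println(hex + " (" + hex.length() + " chars for " + original.length + " bytes)");
        System.out.println(Arrays.equals(original, fromHex(hex)));
    }
}
```

Each byte takes two hex characters, which is why the string form is at least twice the size of the byte array.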
How can a UTF-8 value like =D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0 be converted to a string in Java?
I have tried something like:
Character.toCodePoint((char)(Integer.parseInt("D0", 16)),(char)(Integer.parseInt("93", 16)));
but it does not produce a valid code point.
That string is an encoding of bytes in hex, so the best way is to decode the string into a byte[], then call new String(bytes, StandardCharsets.UTF_8).
Update
Here is a slightly more direct way of decoding the string than the one provided by "sstan" in another answer. Both versions work, so use whichever you are more comfortable with, or write your own.
String src = "=D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0";
assert src.length() % 3 == 0;
byte[] bytes = new byte[src.length() / 3];
for (int i = 0, j = 0; i < bytes.length; i++, j+=3) {
assert src.charAt(j) == '=';
bytes[i] = (byte)(Character.digit(src.charAt(j + 1), 16) << 4 |
Character.digit(src.charAt(j + 2), 16));
}
String str = new String(bytes, StandardCharsets.UTF_8);
System.out.println(str);
Output
Газета
In UTF-8, a single character is not always encoded with the same number of bytes. Depending on the character, it may require 1, 2, 3, or even 4 bytes. Therefore, it's definitely not a trivial matter to map UTF-8 bytes yourself to a Java char, which uses UTF-16 encoding, where each char is 2 bytes. Not to mention that, for characters with a code point above 0xffff, you also have to deal with surrogate pairs, which is one more complication that is easy to get wrong.
All this to say that Andreas is absolutely right. You should focus on parsing your string to a byte array, and then let the built-in libraries convert the UTF-8 bytes to a Java string for you. From a Java String, it's trivial to extract the Unicode code points if that's what you want.
Here is some sample code that shows one way this can be achieved:
public static void main(String[] args) throws Exception {
String src = "=D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0";
// Parse string into hex string tokens.
String[] tokens = Arrays.stream(src.split("="))
.filter(s -> s.length() != 0)
.toArray(String[]::new);
// Convert the hex string representations to a byte array.
byte[] utf8bytes = new byte[tokens.length];
for (int i = 0; i < utf8bytes.length; i++) {
utf8bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
}
// Convert UTF-8 bytes to Java String.
String str = new String(utf8bytes, StandardCharsets.UTF_8);
// Display string + individual unicode code points.
System.out.println(str);
str.codePoints().forEach(System.out::println);
}
Output:
Газета
1043
1072
1079
1077
1090
1072
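To illustrate the point about variable-length encoding made above, here is a small sketch that prints how many UTF-8 bytes and how many Java chars a few sample characters take. The class name, helper name, and sample characters are chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {
    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                   // U+0041, 1 byte in UTF-8
            "\u0413",                              // U+0413 (Cyrillic Ge), 2 bytes
            "\u20AC",                              // U+20AC (euro sign), 3 bytes
            new String(Character.toChars(0x1F600)) // U+1F600, 4 bytes and a surrogate pair
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": " + utf8Length(s) + " UTF-8 byte(s), "
                    + s.length() + " Java char(s)");
        }
    }
}
```

The last sample is a single code point but occupies two Java chars, which is the surrogate-pair case mentioned above.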
I have been searching for quite some time for a proper solution to this problem.
I could not find a solution for this conversion/encoding in Java.
I need to encode a hex string into a base 36 formatted string.
For example, here are some sample inputs and outputs.
ID and reversed B36 encoding
3028354D8202028000000000,CHL58FYDITHJ83VN0G1
3028354D8202028000000001,DHL58FYDITHJ83VN0G1
3028354D8202028000000002,EHL58FYDITHJ83VN0G1
Suggestions are highly appreciated.
Have you tried:
String convertHexToBase36(String hex) {
BigInteger big = new BigInteger(hex, 16);
return big.toString(36);
}
Thanks #rossum for your help and patience.
I could now do a conversion from hex to base36 and vice-versa as per my requirements.
public static String convertHexToBase36(String hex)
{
BigInteger big = new BigInteger(hex, 16);
StringBuilder sb = new StringBuilder(big.toString(36));
return sb.reverse().toString();
}
public static String convertBase36ToHex(String b36)
{
StringBuilder sb = new StringBuilder(b36);
BigInteger base = new BigInteger(sb.reverse().toString(), 36);
return base.toString(16);
}
I just did reverse B36 encoding.
Loads of applause to #rossum for his patience and help.
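For reference, here is a self-contained sketch of the reversed base 36 round trip with a couple of small checks. The class and method names are hypothetical. It upper-cases the output, on the assumption that the upper-case form shown in the sample data is what is wanted. Note that going through BigInteger drops any leading zeros of the hex input.

```java
import java.math.BigInteger;
import java.util.Locale;

final class ReversedBase36Sketch {
    /** Hex text to base 36, least significant digit first, upper case. */
    static String encode(String hex) {
        String b36 = new BigInteger(hex, 16).toString(36).toUpperCase(Locale.ROOT);
        return new StringBuilder(b36).reverse().toString();
    }

    /** Reversed base 36 text back to upper-case hex (leading zeros are not restored). */
    static String decode(String reversedB36) {
        String b36 = new StringBuilder(reversedB36).reverse().toString();
        return new BigInteger(b36, 36).toString(16).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String id = "3028354D8202028000000000";
        String encoded = encode(id);
        System.out.println(id + " -> " + encoded + " -> " + decode(encoded));
        // Small value that is easy to check by hand: 0x24 = 36 = "10" in base 36, reversed "01".
        System.out.println(encode("24"));
    }
}
```

Because the digits are reversed, incrementing the ID changes the first character of the encoded string, which matches the pattern in the sample data.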