How to do a bitwise XOR operation on two strings in Java?
You want something like this:
import java.nio.charset.StandardCharsets;
import java.util.Base64;
public class StringXORer {
    public String encode(String s, String key) {
        return base64Encode(xorWithKey(s.getBytes(StandardCharsets.UTF_8), key.getBytes(StandardCharsets.UTF_8)));
    }
    public String decode(String s, String key) {
        return new String(xorWithKey(base64Decode(s), key.getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8);
    }
    // XOR each byte of a with the key, repeating the key as often as needed
    private byte[] xorWithKey(byte[] a, byte[] key) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ key[i % key.length]);
        }
        return out;
    }
    // java.util.Base64 (Java 8+) is used instead of the internal sun.misc classes,
    // which are not available in current JDKs
    private byte[] base64Decode(String s) {
        return Base64.getDecoder().decode(s);
    }
    private String base64Encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }
}
The base64 encoding is done because xor'ing the bytes of a string may not give valid bytes back for a string.
Note: this only works for low characters, i.e. below 0x8000. It works for all ASCII characters.
I would XOR each charAt() to create a new String, like this:
String s = "text to scramble", key = "key"; // example values; substitute your own
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++)
    sb.append((char) (s.charAt(i) ^ key.charAt(i % key.length())));
String result = sb.toString();
In response to #user467257's comment
If your input/output is utf-8 and you xor "a" and "æ", you are left with an invalid utf-8 string consisting of one character (decimal 135, a continuation character).
It is the char values which are being XORed, not the byte values, and this produces a character which can be UTF-8 encoded.
public static void main(String... args) throws UnsupportedEncodingException {
char ch1 = 'a';
char ch2 = 'æ';
char ch3 = (char) (ch1 ^ ch2);
System.out.println((int) ch3 + " UTF-8 encoded is " + Arrays.toString(String.valueOf(ch3).getBytes("UTF-8")));
}
prints
135 UTF-8 encoded is [-62, -121]
Pay attention:
A Java char corresponds to a UTF-16 code unit, and in some cases two consecutive chars (a so-called surrogate pair) are needed for one real Unicode character (codepoint).
XORing two valid UTF-16 sequences (i.e. Java Strings char by char, or byte by byte after encoding to UTF-16) does not necessarily give you another valid UTF-16 string - you may have unpaired surrogates as a result. (It would still be a perfectly usable Java String, just the codepoint-concerning methods could get confused, and the ones that convert to other encodings for output and similar.)
The same is valid if you first convert your Strings to UTF-8 and then XOR these bytes - here you quite probably will end up with a byte sequence which is not valid UTF-8, if your Strings were not already both pure ASCII strings.
Even if you try to do it right and iterate over your two Strings by codepoint and try to XOR the codepoints, you can end up with codepoints outside the valid range (for example, U+FFFFF (plane 15) XOR U+100000 (plane 16) = U+1FFFFF (which would be the last character of plane 31), way above the range of existing codepoints). And you could also end up this way with codepoints reserved for surrogates (= not valid ones).
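The out-of-range case can be checked directly with Character.isValidCodePoint. This is only a small sketch; the class name CodePointXorCheck and the helper xorIsValid are made up for illustration.

```java
public class CodePointXorCheck {
    // XOR two code points and report whether the result is still a valid code point.
    static boolean xorIsValid(int cp1, int cp2) {
        return Character.isValidCodePoint(cp1 ^ cp2);
    }

    public static void main(String[] args) {
        // Last code point of plane 15 XOR first code point of plane 16
        int x = 0xFFFFF ^ 0x100000;
        System.out.println(Integer.toHexString(x) + " valid? " + Character.isValidCodePoint(x));
        // 'a' XOR U+00E6 stays well inside the valid range
        System.out.println(xorIsValid('a', '\u00E6'));
    }
}
```

Valid code points end at U+10FFFF, so the first result is rejected while the second is accepted.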
If your strings only contain chars < 128, 256, 512, 1024, 2048, 4096, 8192, 16384, or 32768, then the (char-wise) XORed strings will be in the same range, and thus certainly not contain any surrogates. In the first two cases you could also encode your String as ASCII or Latin-1, respectively, and have the same XOR-result for the bytes. (You still can end up with control chars, which may be a problem for you.)
What I'm finally saying here: don't expect the result of encrypting Strings to be a valid string again - instead, simply store and transmit it as a byte[] (or a stream of bytes). (And yes, convert to UTF-8 before encrypting, and from UTF-8 after decrypting).
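A minimal sketch of that advice, assuming a non-empty key: encode to UTF-8, XOR the bytes, keep the result as a byte[], and only build a String again after XORing back. The class name XorBytes and the sample values are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class XorBytes {
    // XOR every data byte with the key, repeating the key as needed.
    // Assumes key.length > 0.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "secret".getBytes(StandardCharsets.UTF_8);
        String text = "caf\u00E9 \u30A2";
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        byte[] scrambled = xor(plain, key); // store or transmit these bytes as-is
        // XORing with the same key restores the original UTF-8 bytes
        String back = new String(xor(scrambled, key), StandardCharsets.UTF_8);
        System.out.println(back.equals(text));
    }
}
```

The intermediate bytes are never turned into a String, so it does not matter whether they form valid UTF-8.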
This solution is compatible with Android (I've tested and used it myself). Thanks to #user467257 whose solution I adapted this from.
import android.util.Base64;
public class StringXORer {
public String encode(String s, String key) {
return new String(Base64.encode(xorWithKey(s.getBytes(), key.getBytes()), Base64.DEFAULT));
}
public String decode(String s, String key) {
return new String(xorWithKey(base64Decode(s), key.getBytes()));
}
private byte[] xorWithKey(byte[] a, byte[] key) {
byte[] out = new byte[a.length];
for (int i = 0; i < a.length; i++) {
out[i] = (byte) (a[i] ^ key[i%key.length]);
}
return out;
}
private byte[] base64Decode(String s) {
return Base64.decode(s,Base64.DEFAULT);
}
private String base64Encode(byte[] bytes) {
return new String(Base64.encode(bytes,Base64.DEFAULT));
}
}
Assuming (!) the strings are of equal length, why not convert the strings to byte arrays and then XOR the bytes? The resulting byte arrays may be of different lengths too, depending on your encoding (e.g. UTF-8 will expand to different byte lengths for different characters).
You should be careful to specify the character encoding to ensure consistent/reliable string/byte conversion.
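To illustrate how the byte length depends on the encoding, here is a short sketch; the class name EncodingLengths and the helper byteLength are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {
    // Number of bytes the string occupies in the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "a\u00E6"; // two chars: 'a' and U+00E6
        System.out.println(byteLength(s, StandardCharsets.ISO_8859_1)); // 2
        System.out.println(byteLength(s, StandardCharsets.UTF_8));      // 3
        System.out.println(byteLength(s, StandardCharsets.UTF_16BE));   // 4
    }
}
```

Passing the charset explicitly, rather than calling getBytes() with no argument, keeps the result independent of the platform default.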
This is the code I'm using:
private static byte[] xor(final byte[] input, final byte[] secret) {
final byte[] output = new byte[input.length];
if (secret.length == 0) {
throw new IllegalArgumentException("empty security key");
}
int spos = 0;
for (int pos = 0; pos < input.length; ++pos) {
output[pos] = (byte) (input[pos] ^ secret[spos]);
++spos;
if (spos >= secret.length) {
spos = 0;
}
}
return output;
}
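The abs function handles the case where the strings are not the same length, so the length of the result will be the same as the length of the shorter of the two strings a and b:
public String xor(String a, String b) {
    // Make a the shorter string so the offset below never runs past the end of b
    if (a.length() > b.length()) { String t = a; a = b; b = t; }
    int offset = Math.abs(a.length() - b.length());
    StringBuilder sb = new StringBuilder();
    for (int k = 0; k < a.length(); k++)
        // the int result of ^ is appended as a decimal number, e.g. '0' ^ '1' appends "1"
        sb.append(a.charAt(k) ^ b.charAt(k + offset));
    return sb.toString();
}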
Related
My project requires converting Arabic text to binary, then converting the binary back to text (the reverse process).
I used this code, but I noticed that when I use UTF-16 to convert a string to binary and then read this binary to convert it back to the original UTF-16 string, I get different characters.
For example: the Arabic char used in encoding is (ن), which is converted to the binary 0100011000000110 by UTF-16LE.
Now, when I want to convert these binary bits (0100011000000110) back to the original UTF-16 string, I get a different character: F.
This problem only appears if the string contains Arabic characters and the encoding is UTF-16. How can I solve this problem?
// Convert the text to binary
public static String getBinaryFromText(String secretText) {
byte[] bytes = secretText.getBytes(StandardCharsets.UTF_16LE);
StringBuilder binary = new StringBuilder();
for (byte b : bytes) {
int val = b;
for (int i = 0; i < 8; i++) {
binary.append((val & 128) == 0 ? 0 : 1);
val <<= 1;
}
}
return binary.toString();
}
// Convert the binary to text.
public static String getTextFromBinary(String binaryString) {
String binary = binaryString.replace(" ", "");
String binaryPer8Bits;
byte[] byteData;
byteData = new byte[binary.length() / 8];
for (int i = 0; i < binary.length() / 8; i++) {
// To divide the string into 8 characters
binaryPer8Bits = binary.substring(i * 8, (i + 1) * 8);
// The integer of decimal string of binary numbers
Integer integer = Integer.parseInt(binaryPer8Bits, 2);
// The casting to a byte type variable
byteData[i] = integer.byteValue();
}
return new String(byteData);
}
With new String(byteData); you interpret the created byte[] using the default encoding. To interpret it as UTF_16LE you need to use a different constructor:
new String(byteData, StandardCharsets.UTF_16LE);
(Almost) never use new String(byte[]): it uses the default encoding, so your application will be platform dependent.
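A compact round trip with the charset given on both sides might look like the sketch below. The class and method names (BinaryRoundTrip, toBinary, fromBinary) are hypothetical and not taken from the question.

```java
import java.nio.charset.StandardCharsets;

public class BinaryRoundTrip {
    // Text -> string of '0'/'1' characters, 8 per UTF-16LE byte.
    static String toBinary(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_16LE)) {
            for (int bit = 7; bit >= 0; bit--) {
                sb.append((b >> bit) & 1); // most significant bit first
            }
        }
        return sb.toString();
    }

    // Inverse: 8 bits per byte, decoded with the same charset used for encoding.
    static String fromBinary(String bits) {
        byte[] data = new byte[bits.length() / 8];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) Integer.parseInt(bits.substring(i * 8, i * 8 + 8), 2);
        }
        return new String(data, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String original = "\u0646"; // Arabic letter noon, the character from the question
        String bits = toBinary(original);
        System.out.println(bits); // 0100011000000110
        System.out.println(fromBinary(bits).equals(original)); // true
    }
}
```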
I have a string which I believe contains some ISO-8859-1 hex character codes:
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
And I want to change it into this:
Áo thun bé gái cột dây xanh biển
I have tried this method, but with no luck:
byte[] isoBytes = doc.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));
What is the proper way to convert it? Many thanks for your help!
On the assumption that the #nnnn; sequences are plain old Unicode character representation, I suggest the following approach.
class Cvt {
static String convert(String in) {
String str = in;
int curPos = 0;
while (curPos < str.length()) {
int j = str.indexOf("#x", curPos);
if (j < 0) // no more #x
curPos = str.length();
else {
int k = str.indexOf(';', j + 2); // look for the terminator after this "#x", not after curPos
if (k < 0) // unterminated #x
curPos = str.length();
else { // convert #xNNNN;
int n = Integer.parseInt(str.substring(j+2, k), 16);
char[] ch = { (char)n };
str = str.substring(0, j) + new String(ch) + str.substring(k+1);
curPos = j + 1; // after ch
}
}
}
return str;
}
static public void main(String... args) {
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
System.out.println(convert(doc));
}
}
This is very similar to the approach of the previous answer, except for the assumption that the character is a Unicode codepoint and not an 8859-1 codepoint.
And the output is
Áo thun bé gái cột dây xanh biển
There is no hex literal syntax for strings in Java. If you need to support that String format, I would make a helper function which parses that format and builds up a byte array and then parse that as ISO-8859-1.
import java.io.ByteArrayOutputStream;
public class translate {
private static byte[] parseBytesWithHexLiterals(String s) throws Exception {
final ByteArrayOutputStream baos = new ByteArrayOutputStream();
while (!s.isEmpty()) {
if (s.startsWith("#x")) {
s = s.substring(2);
while (s.charAt(0) != ';') {
int i = Integer.parseInt(s.substring(0, 2), 16);
baos.write(i);
s = s.substring(2);
}
} else {
baos.write(s.substring(0, 1).getBytes("US-ASCII")[0]);
}
s = s.substring(1);
}
return baos.toByteArray();
}
public static void main(String[] args) throws Exception {
String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
byte[] parsedAsISO88591 = parseBytesWithHexLiterals(doc);
doc = new String(parsedAsISO88591, "ISO-8859-1");
System.out.println(doc); // Print out the string, which is in Unicode internally.
byte[] asUTF8 = doc.getBytes("UTF-8"); // Get a UTF-8 version of the string.
}
}
This is a case where the code can really obscure the requirements. The requirements are a bit uncertain but seem to be to decode a specialized Unicode character entity reference similar to HTML and XML, as documented in the comments.
It is also a somewhat rare case where the advantage of the regular expression engine outweighs any studying needed to understand the pattern language.
String input = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
// Hex digits between "#x" and ";" are a Unicode codepoint value
String text = java.util.regex.Pattern.compile("(#x([0-9A-Fa-f]+);)")
.matcher(input)
// group 2 is the matched input between the 2nd ( in the pattern and its paired )
.replaceAll(x -> new String(Character.toChars(Integer.parseInt(x.group(2), 16))));
System.out.println(text);
The matcher function finds candidate strings to replace that match the pattern. The replaceAll function replaces them with the calculated Unicode codepoint. Since a Unicode codepoint might be encoded as two char (UTF-16) values the desired replacement string must be constructed from a char[].
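The lambda overload of replaceAll used above needs Java 9 or later. On Java 8 the same idea can be sketched with an explicit find/appendReplacement loop. The class name EntityDecoder is made up, and the sketch assumes every matched hex value is a valid code point that fits in an int (otherwise Character.toChars or Integer.parseInt throws).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Hex digits between "#x" and ";" are captured as group 1.
    private static final Pattern ENTITY = Pattern.compile("#x([0-9A-Fa-f]+);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String replacement = new String(Character.toChars(codePoint));
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n"));
    }
}
```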
I have the following code that tries to convert between byte and bit arrays, but somehow it's not converting correctly. What is wrong, and how can I correct it?
String getBitsFromBytes(byte[] Byte_Array) // 129
{
String Bits="";
for (int i=0;i<Byte_Array.length;i++) Bits+=String.format("%8s",Integer.toBinaryString(Byte_Array[i] & 0xFF)).replace(' ','0');
System.out.println(Bits); // 10000001
return Bits;
}
byte[] getBytesFromBits(int[] bits)
{
byte[] results=new byte[(bits.length+7)/8];
int byteValue=0;
int index;
for (index=0;index<bits.length;index++)
{
byteValue=(byteValue<<1)|bits[index];
if (index%8==7) results[index/8]=(byte)byteValue;
}
if (index%8!=0) results[index/8]=(byte)((byte)byteValue<<(8-(index%8)));
System.out.println(Arrays.toString(results));
return results;
}
...
String bit_string=getBitsFromBytes("ab".getBytes()); // 0110000101100010 : 01100001 + 01100010 --> ab
int[] bits=new int[bit_string.length()];
for (int i=0;i<bits.length;i++) bits[i]=Integer.parseInt(bit_string.substring(i,i+1));
getBytesFromBits(bits);
When I ran it, I got the following :
0110000101100010
[97, 98]
I was expecting this :
0110000101100010
[a, b]
You need to convert from byte to char if you plan to display numeric values as their corresponding ASCII character:
char[] chars = new char[results.length];
for (int i = 0; i < results.length; i++) {
chars[i] = (char) results[i];
}
System.out.println(Arrays.toString(chars));
To convert from byte[] to String you should use the new String(byte[], Charset) constructor and specify the right charset. Arrays.toString() exists only to print a sequence of elements.
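For the bytes from the question, that could look like the following sketch. The class name BytesToText and the helper decodeAscii are illustrative, and US-ASCII is an assumption that fits the "ab" example.

```java
import java.nio.charset.StandardCharsets;

public class BytesToText {
    // Decode bytes as text using an explicitly chosen charset.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] results = {97, 98};
        System.out.println(decodeAscii(results)); // ab
    }
}
```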
I have a string "1234567(Asics (アシックスワーキング) )". It contains Unicode characters; some are part of ASCII and some are not. What Java does is take one byte for an ASCII character and two bytes for other Unicode characters.
Some part of my program is unable to process the string in this format, so I wanted to encode the values as escape sequences.
So the string
"1234567(Asics (アシックスワーキング) )"
would map to
"\u0031\u0032\u0033\u0034\u0035\u0036\u0037\u0028\u0041\u0073\u0069\u0063\u0073\u0020\u0028\u30a2\u30b7\u30c3\u30af\u30b9\u30ef\u30fc\u30ad\u30f3\u30b0\u0029\u0020\u0029".
I wrote this function to do this:
public static String convertToEscaped(String utf8) throws java.lang.Exception
{
char[] str = utf8.toCharArray();
StringBuilder unicodeStringBuilder = new StringBuilder();
for(int i = 0; i < str.length; i++){
char charValue = str[i];
int intValue = (int) charValue;
String hexValue = Integer.toHexString(intValue);
unicodeStringBuilder.append("\\u");
for (int length = hexValue.length(); length < 4; length++) {
unicodeStringBuilder.append("0");
}
unicodeStringBuilder.append(hexValue);
}
return unicodeStringBuilder.toString();
}
This was working fine outside of my program but caused issues inside it. The problem occurred on the line char[] str = utf8.toCharArray();
Somehow I was losing my Japanese Unicode characters, and this was happening because it was dividing these characters into 2 in the char array.
So I decided to go with byte[] instead.
public static String convertToEscaped(String utf8) throws java.lang.Exception
{
byte str[] = utf8.getBytes();
StringBuilder unicodeStringBuilder = new StringBuilder();
for(int i = 0; i < str.length - 1 ; i+=2){
int intValue = (int) str[i]* 256 + (int)str[i+1];
String hexValue = Integer.toHexString(intValue);
unicodeStringBuilder.append("\\u");
for (int length = hexValue.length(); length < 4; length++) {
unicodeStringBuilder.append("0");
}
unicodeStringBuilder.append(hexValue);
}
return unicodeStringBuilder.toString();
}
Output :
\u3132\u3334\u3536\u3738\u2841\u7369\u6373\u2028\uffffe282\uffffa1e3\uffff81b7\uffffe283\uffff82e3\uffff81af\uffffe282\uffffb8e3\uffff82af\uffffe283\uffffbbe3\uffff81ad\uffffe283\uffffb2e3\uffff81b0\u2920
But this is also wrong as I am merging two single byte characters into one. What can I do to overcome this?
I don't know your other code's specific requirements. But my advice is to not reinvent the wheel and use the built-in encoding capabilities of the API.
For instance call getBytes with either StandardCharsets.UTF_16BE or StandardCharsets.UTF_16LE based on the endian-ness you need:
String s = "1234567(Asics (アシックスワーキング) )";
byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE); // high order byte first
System.out.println(s.length()); // 28
System.out.println(utf8.length); // 48
System.out.println(utf16.length); // 56 (2 bytes for each char)
As noted in the comments above, the internal representation of a String in Java is UTF-16, so each char is already one 16-bit code unit. Integer.toHexString() is helpful in your case.
I renamed the parameter to just theString and removed the throws Exception clause from your original method, since no exception was thrown. (It is bad practice in general to throw such generic exceptions.)
public static String convertToEscaped(String theString) {
    char[] charArr = theString.toCharArray();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < charArr.length; i++) {
        // One escape sequence per UTF-16 code unit
        String hexString = Integer.toHexString(charArr[i]);
        sb.append("\\u");
        // Left-pad with zeros to exactly four hex digits
        for (int len = hexString.length(); len < 4; len++) {
            sb.append('0');
        }
        sb.append(hexString);
    }
    return sb.toString();
}
I am probably overlooking something silly, but I've never had to deal with binary in code and thought it'd be a good idea to practice it in an encryption program, for kicks.
Long story short, I'm able to convert a string into binary (in the form of a string), but can't figure out how to do the reverse.
Right now, I have something like this:
public static String bytesToString(String bytes){
int i = bytes.length()/8;
int pos = 0;
String result = "";
for(int j=0; j<i; j++){
String temp = bytes.substring(pos,pos+8);
byte b = (byte) Integer.parseInt(temp);
result = result + Byte.toString(b);
pos++;
}
System.out.println("Result: " + result);
return result;
}
I think the bytes are being parsed as literal numbers. What am I missing?
Edit: To clarify, I will previously have parsed a string of text into bits and written them to a string. I want to split this string into bytes and parse them back into letters. It would take "011010000110010101111001" and return "hey".
How about using Integer.parseInt(text, 2)? As in,
public static int binaryToInt(String binary)
{
return Integer.parseInt(binary, 2);
}
I'm not sure why your bytesToString method both takes and returns a string.
Integer.parseInt(temp) will attempt to read temp as a decimal number and return the corresponding int. For example, Integer.parseInt("123") returns 123.
EDIT: Be aware that the binary value of a character or text depends on the encoding you are using. For example, "hi" is 0110100001101001 in ASCII, but it may not be in UTF-16 or UTF-32. And Java encodes Strings as UTF-16: see http://download.oracle.com/javase/6/docs/api/java/lang/String.html
(for this reason Java chars are 16-bit unsigned integers).
So your bytesToString method must treat its input differently depending on the encoding of the input. Or you may write it specifically for ASCII characters, and maybe rename it to, say, asciiBytesToString.
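To make the dependence on the encoding concrete, here is a small sketch that prints the bits of "hi" under two charsets. The class name BitsByEncoding and the helper bits are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BitsByEncoding {
    // Bits of the text in the given charset, 8 per byte, most significant bit first.
    static String bits(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            String s = Integer.toBinaryString(b & 0xFF);
            for (int i = s.length(); i < 8; i++) {
                sb.append('0'); // left-pad each byte to 8 digits
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bits("hi", StandardCharsets.US_ASCII)); // 0110100001101001
        System.out.println(bits("hi", StandardCharsets.UTF_16BE)); // 00000000011010000000000001101001
    }
}
```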
You'd better see:
constructor String(byte[])
http://download.oracle.com/javase/6/docs/api/java/lang/String.html
Integer.parseInt(String s, int radix) http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html
public class BinaryStringToChars {
public static void main(String[] args) {
String bin = "011010000110010101111001";
StringBuilder b = new StringBuilder();
int len = bin.length();
int i = 0;
while (i + 8 <= len) {
char c = convert(bin.substring(i, i+8));
i+=8;
b.append(c);
}
System.out.println(b.toString());
}
private static char convert(String bs) {
return (char)Integer.parseInt(bs, 2);
}
}
You need to advance 8 digits at a time, not digit by digit. Otherwise you are reusing bits. Also, you need to tell Integer.parseInt() what radix you want to use, since parseInt(String val) cannot really detect binary (you want Integer.parseInt(String val, int radix)). You also need to choose a character encoding to convert bytes into characters (they are not the same thing!). Assuming ISO-8859-1 is ok:
public static String bytesToString(String bytes){
int i = bytes.length()/8;
int pos = 0;
String result = "";
byte[] buffer = new byte[i];
for(int j=0; j<i; j++){
String temp = bytes.substring(pos,pos+8);
buffer[j] = (byte) Integer.parseInt(temp, 2);
pos+=8;
}
result = new String(buffer, java.nio.charset.StandardCharsets.ISO_8859_1); // Charset overload: no checked UnsupportedEncodingException
System.out.println("Result: " + result);
return result;
}