Length of Strings regarding XOR operation for byte array - java

I am creating an encryption algorithm and part of it is to XOR two strings. While I know how to XOR the two strings, the problem is the length. I have two byte arrays: one for the plain text, which is of variable size, and one for the key, which is, let's say, 56 bytes. What I want to know is the correct method of XORing the two strings. Concatenate them into one binary string and XOR the two values? Have each byte array position XOR the corresponding binary value of the key? Any help is greatly appreciated.
Regards,
Milinda

To encode, just move through the array of bytes from the plain text, repeating the key as necessary with the mod (%) operator. Be sure to use the same character set at both ends.
Conceptually we're repeating the key like this, ignoring encoding.
hello world, there are sheep
secretsecretsecretsecretsecr
Encrypt
String plainText = "hello world, there are sheep";
Charset charSet = Charset.forName("UTF-8");
byte[] plainBytes = plainText.getBytes(charSet);

String key = "secret";
byte[] keyBytes = key.getBytes(charSet);

byte[] cipherBytes = new byte[plainBytes.length];
for (int i = 0; i < plainBytes.length; i++) {
    cipherBytes[i] = (byte) (plainBytes[i] ^ keyBytes[i % keyBytes.length]);
}

String cipherText = new String(cipherBytes, charSet);
System.out.println(cipherText);
To decrypt just reverse the process.
// decode
for (int i = 0; i < cipherBytes.length; i++) {
    plainBytes[i] = (byte) (cipherBytes[i] ^ keyBytes[i % keyBytes.length]);
}

plainText = new String(plainBytes, charSet); // <= make sure same charset both ends
System.out.println(plainText);

(As noted in comments, you shouldn't use this for anything real. Proper cryptography is incredibly hard to do properly from scratch - don't do it yourself, use existing implementations.)
There's no such concept as "XOR" when it comes to strings, really. XOR specifies the result given two bits, and text isn't made up of bits - it's made up of characters.
Now you could just take the Unicode representation of each character (an integer) and XOR those integers together - but the result may well be a sequence of integers which is not a valid Unicode representation of any valid string.
It's not clear that you're even thinking in the right way to start with - you talk about having strings, but also having 56 bytes. You may have an encoded representation of a string (e.g. the result of converting a string to UTF-8) but that's not the same thing.
If you've got two byte arrays, you can easily XOR those together - and perhaps cycle back to the start of one of them if it's shorter than the other, so that the result is always the same length as the longer array. However, even if both inputs are (say) UTF-8 encoded text, the result often won't be valid UTF-8 encoded text. If you must have the result in text form, I'd suggest using Base64 at that point - there's a public domain base64 encoder which has a simple API.
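For illustration, here is a minimal sketch using java.util.Base64 (in the JDK since Java 8), which can stand in for the third-party encoder mentioned above; the xorBytes value below is just a made-up placeholder for the XOR result:

import java.util.Base64;

byte[] xorBytes = {(byte) 0x1D, 0x00, 0x0F, 0x06, 0x1B};    // placeholder for the XOR output

String text = Base64.getEncoder().encodeToString(xorBytes); // always yields valid, printable text
byte[] back = Base64.getDecoder().decode(text);             // restores the original bytes exactly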

Related

Converting a byte array with range of -128 to 127 to String array

I have a function for hashing passwords that returns a byte[] with entries using the full range of the byte datatype, from -128 to 127. I have tried to convert the byte[] to a String using new String(byte_array, StandardCharsets.UTF_8);. This does return a String; however, it cannot properly encode the negative values and instead maps them to the "�" character. When comparing two of those characters using new String(new byte[]{-1}, StandardCharsets.UTF_8).equals(new String(new byte[]{-2}, StandardCharsets.UTF_8)), it turns out the String representation of all negative values is equal, as the expression above returns true. While this doesn't fully ruin my hashing functionality, since the hash of the same input will still always yield the same result, it is obviously not what I want, as it drastically increases the chance of two different inputs yielding the same output.
Is there some easy fix for this, or any alternative idea of how to convert the byte[] to a String? For context: I want to write the String to a file to store it, and later read it back again to compare it to other hashes.
Edit: After a bit of experimenting with the tips from the comments, my solution is to convert the byte[] to a char[] and add 128 to every value. The char array can then easily be converted to a String or written to a file directly (byteHash is the byte[]):
char[] charHash = new char[byteHash.length];
for (int i = 0; i < byteHash.length; i++) {
    charHash[i] = (char) (byteHash[i] + 128);
}
return new String(charHash);
I do not really like the solution but it works.
The appropriate solution to this is to use an encoding like hexadecimal (https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) or Base64 (https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) to convert an arbitrary byte sequence to a string reversibly.
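As a rough illustration of that round trip (HexFormat requires Java 17; java.util.Base64 works the same way from Java 8 onward; byteHash is the array from the question above):

import java.util.HexFormat;

String hex = HexFormat.of().formatHex(byteHash);  // e.g. "a3f01c..." - printable and reversible
byte[] restored = HexFormat.of().parseHex(hex);   // identical to the original byteHash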

How can I save a String Byte without losing information?

I'm developing a JPEG decoder (I'm in the Huffman phase) and I want to write BinaryStrings into a file.
For example, let's say we have this:
String huff = "00010010100010101000100100";
I've tried to convert it to integers by splitting it into groups of 8 and saving the integer representation, as I can't write bits:
huff.split("(?<=\\G.{8})"))
int val = Integer.parseInt(str, 2);
out.write(val); //writes to a FileOutputStream
The problem is that, in my example, if I try to save "00010010" it converts it to 18 (10010), and I need the 0's.
And finally, when I read :
int enter;
String code = "";
while ((enter = in.read()) != -1) {
    code += Integer.toBinaryString(enter);
}
I got :
Code = 10010
instead of:
Code = 00010010
Also I've tried to convert it to a BitSet and then to Byte[], but I have the same problem.
Your example is that you have the string "10010" and you want the string "00010010". That is, you need to left-pad this string with zeroes. Note that since you're joining the results of many calls to Integer.toBinaryString in a loop, you need to left-pad these strings inside the loop, before concatenating them.
while ((enter = in.read()) != -1) {
    String binary = Integer.toBinaryString(enter);
    // left-pad to length 8
    binary = ("00000000" + binary).substring(binary.length());
    code += binary;
}
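(As an aside, not part of the original answer: an equivalent padding idiom is String.format("%8s", binary).replace(' ', '0'), which left-pads with spaces to width 8 and then swaps the spaces for zeros.)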
You might want to look at the UTF-8 algorithm, since it does exactly what you want. It stores massive amounts of data while discarding zeros, keeping relevant data, and encoding it to take up less disk space.
Works with: Java version 7+
import java.nio.charset.StandardCharsets;
import java.util.Formatter;

public class UTF8EncodeDecode {

    public static byte[] utf8encode(int codepoint) {
        return new String(new int[]{codepoint}, 0, 1).getBytes(StandardCharsets.UTF_8);
    }

    public static int utf8decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.printf("%-7s %-43s %7s\t%s\t%7s%n",
                "Char", "Name", "Unicode", "UTF-8 encoded", "Decoded");
        for (int codepoint : new int[]{0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E}) {
            byte[] encoded = utf8encode(codepoint);
            Formatter formatter = new Formatter();
            for (byte b : encoded) {
                formatter.format("%02X ", b);
            }
            String encodedHex = formatter.toString();
            int decoded = utf8decode(encoded);
            System.out.printf("%-7c %-43s U+%04X\t%-12s\tU+%04X%n",
                    codepoint, Character.getName(codepoint), codepoint, encodedHex, decoded);
        }
    }
}
https://rosettacode.org/wiki/UTF-8_encode_and_decode#Java
UTF-8 is a variable-width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The encoding is defined by the Unicode Standard, and was originally designed by Ken Thompson and Rob Pike. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.
It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as "/" (slash) in filenames, "\" (backslash) in escape sequences, and "%" in printf.
https://en.wikipedia.org/wiki/UTF-8
Binary 11110000 10010000 10001101 10001000 becomes F0 90 8D 88 in UTF-8. Since you are storing it as text, you go from having to store 32 characters to storing 8. And because it's a well known and well designed encoding, you can reverse it easily. All the math is done for you.
Your example of 00010010100010101000100100 (or rather 00000001 0010100 0101010 00100100) converts to *$ (two unprintable characters on my machine). That's the UTF-8 encoding of the binary. I had mistakenly used a different site that was using the data I put in as decimal instead of binary.
https://onlineutf8tools.com/convert-binary-to-utf8
For a really good explanation of UTF-8 and how it can apply to the answer:
https://hackaday.com/2013/09/27/utf-8-the-most-elegant-hack/
Edit:
I took this question as asking for a way to reduce the number of characters needed to store values, which is a type of encoding. UTF-8 is a type of encoding. Used in a "non-standard" way, the OP can use UTF-8 to encode their strings of 0's and 1's in a much shorter format. That's how this answer is relevant.
If you concatenate the characters, you can go from 4x 8 bits (32 bits) to 8x 8 bits (64 bits) easily and encode a value as large as 9,223,372,036,854,775,807.

Converting string to binary and back again does not give the same string

I'm writing a Simplified DES algorithm to encrypt and subsequently decrypt a string. Suppose I have the initial character (, which has the binary value 00101000, which I get using the following algorithm:
public void getBinary() throws UnsupportedEncodingException {
    byte[] plaintextBinary = text.getBytes("UTF-8");
    for (byte b : plaintextBinary) {
        int val = b;
        int[] tempBinRep = new int[8];
        // extract the bits of the byte, most significant bit first
        for (int i = 0; i < 8; i++) {
            tempBinRep[i] = (val & 128) == 0 ? 0 : 1;
            val <<= 1;
        }
        binaryRepresentations.add(tempBinRep);
    }
}
After I perform the various permutations and shifts, ( and its binary equivalent are transformed into 10001010, whose ASCII equivalent is Š. When I come around to decryption and pass the same character through the getBinary() method, I now get the binary string 11000010 and another binary string 10001010, which translates into ASCII as x(.
Where is this rogue x coming from?
Edit: The full class can be found here.
You haven't supplied the decrypting code, so we can't know for sure, but I would guess you missed the encoding somewhere, probably when populating your String. Java Strings are encoded in UTF-16 by default. Since you're forcing UTF-8 when encrypting, I'm assuming you're doing the same when decrypting. The problem is, when you convert your encrypted bytes to a String for storage, if you let it default to UTF-16, you're probably ending up with a two-byte character, because 10001010 is 138, which is beyond the 0-127 range of ASCII characters that get represented with a single byte.
So the "x" you're getting is the byte for the code page, followed by the actual character's byte. As suggested in the comments, you'd do better to just store the encrypted bytes as bytes, and not convert them to Strings until they're decrypted.

Standard way to create a hash in Java

The question is about the correct way of creating a hash in Java:
Let's assume I have a positive BigInteger value that I would like to create a hash from, and that the messageDigest used below is a valid instance of MessageDigest initialized with SHA-256:
public static final BigInteger B = new BigInteger("BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58", 16);
byte[] byteArrayBBigInt = B.toByteArray();
this.printArray(byteArrayBBigInt);
messageDigest.reset();
messageDigest.update(byteArrayBBigInt);
byte[] outputBBigInt = messageDigest.digest();
Now I only assume that the code below is correct, as according to the test the hashes I produce match with the one produced by:
http://www.fileformat.info/tool/hash.htm?hex=BD0C61512C692C0CB6D041FA01BB152D4916A1E77AF46AE105393011BAF38964DC46A0670DD125B95A981652236F99D9B681CBF87837EC996C6DA04453728610D0C6DDB58B318885D7D82C7F8DEB75CE7BD4FBAA37089E6F9C6059F388838E7A00030B331EB76840910440B1B27AAEAEEB4012B7D7665238A8E3FB004B117B58
However, I am not sure why we are doing the step below: because the byte array returned by the digest() call is interpreted as signed, and in this case it comes out negative, I suspect that we need to convert it to a positive number, e.g. using a function like this:
public static String byteArrayToHexString(byte[] b) {
    String result = "";
    for (int i = 0; i < b.length; i++) {
        result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
    }
    return result;
}
thus:
String hex = byteArrayToHexString(outputBBigInt)
BigInteger unsignedBigInteger = new BigInteger(hex, 16);
When I construct a BigInteger from the new hex string and convert it back to a byte array, I see that the sign bit, that is, the most significant (leftmost) bit, is set to 0, which means the number is positive; moreover, the whole leading byte consists of zeros (00000000).
My question is: is there any RFC that describes why we always need to convert the hash to a "positive" unsigned byte array? I mean, even if the number produced after the digest call is negative, it is still a valid hash, right? So why do we need that additional procedure? Basically, I am looking for a paper, standard, or RFC describing that we need to do so.
A hash consists of an octet string (called a byte array in Java). How you convert it to or from a large number (a BigInteger in Java) is completely out of the scope for cryptographic hash algorithms. So no, there is no RFC to describe it as there is (usually) no reason to treat a hash as a number. In that sense a cryptographic hash is rather different from Object.hashCode().
That you can only treat hexadecimals as unsigned is a bit of an issue, but if you really want to, you can first convert the value back to a byte array and then perform new BigInteger(result). That constructor does treat the encoding within result as signed. Note that in protocols it is often not necessary to convert back and forth to hexadecimals; hexadecimals are mainly for human consumption, and a computer is fine with bytes.
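For completeness (this is standard BigInteger behavior, not something the answer above relies on): BigInteger has a constructor taking a signum and a magnitude, so the hex round trip can be skipped entirely if all you want is an unsigned interpretation of the digest. A minimal sketch, assuming messageDigest is the SHA-256 instance from the question:

byte[] digestBytes = messageDigest.digest();

// signum = 1 forces a non-negative interpretation of the magnitude bytes
BigInteger unsigned = new BigInteger(1, digestBytes);

// for comparison: the single-argument constructor treats the bytes as signed two's complement
BigInteger signed = new BigInteger(digestBytes);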

My java class implementation of XOR encryption has gone wrong

I am new to Java, but I am very fluent in C++ and C#, especially C#. I know how to do XOR encryption in both C# and C++. The problem is that the algorithm I wrote in Java to implement XOR encryption seems to be producing wrong results. The results are usually a bunch of spaces, and I am sure that is wrong. Here is the class below:
public final class Encrypter {

    public static String EncryptString(String input, String key) {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for (byte b : ibytes) {
            if (index == length) {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char) val;
            index++;
            index2++;
        }
        return new String(output);
    }

    public static String DecryptString(String input, String key) {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for (byte b : ibytes) {
            if (index == length) {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char) val;
            index++;
            index2++;
        }
        return new String(output);
    }
}
Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here - you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (but you'll get the same as you got converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding uses a single byte per character (e.g. US-ASCII), then not all of the byte sequences you will generate represent valid characters.
So, two pieces of advice come from this:
Don't use strings as general holders for bytes
Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true (see comments below). Refer back to point 1 above! Use bytes, not characters, when you want bytes.
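Along those lines, here is a minimal sketch (not the original poster's code; the class and method names are illustrative) that keeps the XOR result as bytes and only produces text via Base64, so no information is lost:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class XorCipher {

    // XOR the input with the key, cycling the key; operates on bytes only
    private static byte[] xor(byte[] input, byte[] key) {
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = (byte) (input[i] ^ key[i % key.length]);
        }
        return output;
    }

    public static String encryptToBase64(String plainText, String key) {
        byte[] cipher = xor(plainText.getBytes(StandardCharsets.UTF_8),
                            key.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipher);
    }

    public static String decryptFromBase64(String cipherText, String key) {
        byte[] plain = xor(Base64.getDecoder().decode(cipherText),
                           key.getBytes(StandardCharsets.UTF_8));
        return new String(plain, StandardCharsets.UTF_8);
    }
}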
If you use non-ASCII strings as keys, you'll get pretty strange results. The bytes in the kbytes array will be negative. Sign extension then means that val will come out negative. The cast to char will then produce a character in the FF80-FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
