My Java class implementation of XOR encryption has gone wrong

I am new to Java, but I am very fluent in C++ and C#, especially C#. I know how to do XOR encryption in both C# and C++. The problem is that the algorithm I wrote in Java to implement XOR encryption seems to be producing wrong results: the output is usually a bunch of spaces, and I am sure that is wrong. Here is the class:
public final class Encrypter {
    public static String EncryptString(String input, String key)
    {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for (byte b : ibytes)
        {
            if (index == length)
            {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char) val;
            index++;
            index2++;
        }
        return new String(output);
    }

    public static String DecryptString(String input, String key)
    {
        int length;
        int index = 0, index2 = 0;
        byte[] ibytes = input.getBytes();
        byte[] kbytes = key.getBytes();
        length = kbytes.length;
        char[] output = new char[ibytes.length];
        for (byte b : ibytes)
        {
            if (index == length)
            {
                index = 0;
            }
            int val = (b ^ kbytes[index]);
            output[index2] = (char) val;
            index++;
            index2++;
        }
        return new String(output);
    }
}

Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here - you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (though you'll get the same one you used when converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding uses a single byte per character (e.g. US-ASCII), then not all of the byte sequences you will generate represent valid characters.
So, two pieces of advice come from this:
Don't use strings as general holders for bytes
Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true (see comments below). Refer back to point 1 above! Use bytes, not characters, when you want bytes.
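Putting both pieces of advice together: XOR over bytes, keep bytes as bytes, and only apply an explicit encoding at the text boundaries. A minimal sketch of that approach (the class name, the Base64 transport encoding, and the sample strings are my own choices, not from the original post):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class ByteXor {
    // XOR the input with the key, repeating the key as needed.
    public static byte[] xor(byte[] input, byte[] key) {
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = (byte) (input[i] ^ key[i % key.length]);
        }
        return output;
    }

    public static void main(String[] args) {
        byte[] plain = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);
        // Base64 makes the ciphertext safe to store or display as text.
        String ciphertext = Base64.getEncoder().encodeToString(xor(plain, key));
        byte[] decrypted = xor(Base64.getDecoder().decode(ciphertext), key);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8)); // Hello
    }
}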

If you use non-ASCII strings as keys you'll get pretty strange results. The bytes in the kbytes array will be negative. Sign extension then means that val will come out negative, and the cast to char will then produce a character in the FF80-FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
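A small demonstration of the sign-extension effect described above (a sketch; the specific byte value is just an example):

public class SignExtension {
    public static void main(String[] args) {
        byte b = (byte) 0xE9;                 // 'é' in ISO-8859-1; negative as a Java byte
        int bad = b;                          // sign-extended to -23 (0xFFFFFFE9)
        int good = b & 0xFF;                  // masked to 233 (0x000000E9)
        System.out.println(bad);              // -23
        System.out.println(good);             // 233
        System.out.println((int) (char) bad); // 65513: a char in the FF80-FFFF range
    }
}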

Related

How to convert each char in string to 8 bit int? JAVA

It has been suggested that I use a TCP-like checksum, which consists of the sum of the (integer) sequence and ack field values, added to a character-by-character sum of the payload field of the packet (i.e., treat each character as if it were an 8-bit integer and just add them together).
I'm assuming it would go along the lines of:
char[] a = data.toCharArray();
for (int i = 0; i < len; i++) {
    ...
}
Though I'm pretty clueless as to how to complete the actual conversion.
My data is a String, and I wish to go through it (converted to a char array, though if there's a better way to do this, let me know!). Now that I'm ready to iterate, how does one convert each character to an int? I will then sum the total.
As a String contains Unicode text, and char is a two-byte UTF-16 code unit, it might be better to first convert the String to bytes:
byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
data = new String(bytes, StandardCharsets.UTF_8); // The inverse operation.
int crc = 0;
for (byte b : bytes) {
    int n = b & 0xFF; // an int in 0..255, without sign extension
    crc ^= n;
}
Now you can handle any Unicode content of a String. UTF-8 is compact when the text contains mostly ASCII characters, as in Chinese HTML pages, where the markup is ASCII. (For plain Chinese text, UTF-16 might be smaller.)
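If you want the literal summing checksum the question describes rather than an XOR, a sketch along the same lines (the seq, ack, and payload names are taken from the question; the class name is illustrative):

import java.nio.charset.StandardCharsets;

public class Checksum {
    // Sum of the seq and ack values plus a byte-by-byte sum of the payload.
    public static int checksum(int seq, int ack, String payload) {
        int sum = seq + ack;
        for (byte b : payload.getBytes(StandardCharsets.UTF_8)) {
            sum += b & 0xFF; // treat each byte as an unsigned 8-bit integer
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(checksum(1, 0, "hello")); // 533
    }
}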

Convert String to/from byte array without encoding

I have a byte array read over a network connection that I need to transform into a String without any encoding, that is, simply by treating each byte as the low end of a character and leaving the high end zero. I also need to do the converse where I know that the high end of the character will always be zero.
Searching the web yields several similar questions that have all got responses indicating that the original data source must be changed. This is not an option so please don't suggest it.
This is trivial in C but Java appears to require me to write a conversion routine of my own that is likely to be very inefficient. Is there an easy way that I have missed?
No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text—which would make complete sense if it isn't text. You could do it the hard way that you propose.
An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 or IBM437 both qualify. (Windows-1252 only has 251 codepoints. UTF-8 doesn't allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as your hard way.
As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.
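For illustration, a round trip through ISO-8859-1 preserves every byte value, because that charset maps the bytes 0-255 one-to-one onto the code points U+0000-U+00FF (a self-contained sketch):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    public static void main(String[] args) {
        byte[] original = { 0, 37, (byte) 0x80, (byte) 0xFF }; // arbitrary byte values
        String s = new String(original, StandardCharsets.ISO_8859_1);
        byte[] roundTripped = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(original, roundTripped)); // true
    }
}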
This will convert a byte array to a String while filling only the low 8 bits of each character.
public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}
The efficiency should be quite good. Like Ben Thurley said, if performance is really such an issue don't convert to a String in the first place but work with the byte array instead.
Here is sample code which will convert a String to a byte array and back to a String without applying a character encoding, by packing each char into two bytes manually.
public class Test
{
    public static void main(String[] args)
    {
        Test t = new Test();
        t.Test();
    }

    public void Test()
    {
        String input = "Hèllo world";
        byte[] inputBytes = GetBytes(input);
        String output = GetString(inputBytes);
        System.out.println(output);
    }

    public byte[] GetBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8); // high byte
            bytes[i * 2 + 1] = (byte) chars[i];    // low byte
        }
        return bytes;
    }

    public String GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
        {
            chars[i] = (char) (((bytes[i * 2] & 0xFF) << 8) + (bytes[i * 2 + 1] & 0xFF));
        }
        return new String(chars);
    }
}
Using the deprecated constructor String(byte[] ascii, int hibyte):
String string = new String(byteArray, 0);
A Java String is already stored internally as UTF-16. UTF-16 means that it can take up to two chars (16-bit code units) to make one displayable character. What you really want to use is:
byte[] bytes = myString.getBytes(StandardCharsets.UTF_16BE);
to convert a String to an array of bytes; big-endian UTF-16 produces exactly the same bytes as the manual packing above, with far less code. If you would like to cut the transmitted data nearly in half for mostly-ASCII text, convert it to UTF-8 (ASCII is a subset of UTF-8) - the format the internet uses most of the time:
byte[] bytes = myString.getBytes(StandardCharsets.UTF_8);
To convert back to a String use:
String myString = new String(bytes, StandardCharsets.UTF_16BE);
or
String myString = new String(bytes, StandardCharsets.UTF_8);
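As a rough illustration of the size difference (a self-contained sketch; the exact counts hold for ASCII-only content):

import java.nio.charset.StandardCharsets;

public class EncodingSize {
    public static void main(String[] args) {
        String myString = "Hello world"; // 11 ASCII characters
        byte[] utf16 = myString.getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = myString.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf16.length); // 22: two bytes per character
        System.out.println(utf8.length);  // 11: one byte per ASCII character
    }
}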

Converting string to binary and back again does not give the same string

I'm writing a Simplified DES algorithm to encrypt and subsequently decrypt a string. Suppose I have the initial character '(', which has the binary value 00101000, which I get using the following algorithm:
public void getBinary() throws UnsupportedEncodingException {
    byte[] plaintextBinary = text.getBytes("UTF-8");
    for (byte b : plaintextBinary) {
        int val = b;
        int[] tempBinRep = new int[8];
        for (int i = 0; i < 8; i++) {
            tempBinRep[i] = (val & 128) == 0 ? 0 : 1; // test the high bit
            val <<= 1;
        }
        binaryRepresentations.add(tempBinRep);
    }
}
After I perform the various permutations and shifts, '(' and its binary equivalent are transformed into 10001010, whose character equivalent is Š. When I come around to decryption and pass that same character through the getBinary() method, I now get the binary string 11000010 followed by 10001010, which translates into ASCII as x(.
Where is this rogue x coming from?
Edit: The full class can be found here.
You haven't supplied the decrypting code, so we can't know for sure, but I would guess you missed specifying the encoding when populating your String. Java Strings are encoded in UTF-16 by default. Since you're forcing UTF-8 when encrypting, I'm assuming you're doing the same when decrypting. The problem is, when you convert your encrypted bytes to a String for storage and let the encoding default, you probably end up with a two-byte encoding of that character, because 10001010 is 138, which is beyond the 0-127 range of ASCII characters that are represented with a single byte.
So the rogue byte you're getting is the lead byte of that two-byte encoding, followed by the actual character's byte. As suggested in the comments, you'd do better to just store the encrypted bytes as bytes, and not convert them to Strings until they're decrypted.

How to replace/remove 4(+)-byte characters from a UTF-8 string in Java?

Because MySQL 5.1 does not support 4 byte UTF-8 sequences, I need to replace/drop the 4 byte sequences in these strings.
I'm looking a clean way to replace these characters.
Apache's libraries replacing the characters with a question mark would be fine for this case, although an ASCII equivalent would be nicer, of course.
N.B. The input is from external sources (e-mail names) and upgrading the database is not a solution at this point in time.
We ended up implementing the following method in Java for this problem.
Basically, it replaces every character whose code point is higher than the last 3-byte UTF-8 character with a replacement character.
The offset calculations are there to make sure we stay on Unicode code point boundaries.
public static final String LAST_3_BYTE_UTF_CHAR = "\uFFFF";
public static final String REPLACEMENT_CHAR = "\uFFFD";

public static String toValid3ByteUTF8String(String s) {
    final int length = s.length();
    StringBuilder b = new StringBuilder(length);
    for (int offset = 0; offset < length; ) {
        final int codepoint = s.codePointAt(offset);
        if (codepoint > CharUtils.LAST_3_BYTE_UTF_CHAR.codePointAt(0)) {
            b.append(CharUtils.REPLACEMENT_CHAR); // above U+FFFF: needs 4 bytes in UTF-8
        } else if (Character.isValidCodePoint(codepoint)) {
            b.appendCodePoint(codepoint);
        } else {
            b.append(CharUtils.REPLACEMENT_CHAR); // defensive check
        }
        offset += Character.charCount(codepoint);
    }
    return b.toString();
}
Another simple solution is to use the regular expression [^\u0000-\uFFFF]. For example, in Java:
text.replaceAll("[^\\u0000-\\uFFFF]", "\uFFFD");
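For example (a sketch; the emoji stands for any code point above U+FFFF, which Java stores as a surrogate pair but the regex engine matches as a single code point):

String withEmoji = "caf\u00E9 \uD83D\uDE00"; // "café 😀"; U+1F600 needs 4 bytes in UTF-8
String cleaned = withEmoji.replaceAll("[^\\u0000-\\uFFFF]", "\uFFFD");
System.out.println(cleaned);                 // "café �": the emoji becomes one U+FFFD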
5-byte UTF-8 sequences begin with a 111110xx byte, and 6-byte UTF-8 sequences begin with a 1111110x byte. It is important to note that no continuation bytes of 1- to 4-byte UTF-8 sequences are ever that large, because continuation bytes are always of the form 10xxxxxx.
Therefore you can just go through the bytes, and every time you see a byte of the form 111110xx, emit a single '?' to the output stream/array while skipping the next 4 bytes of the input; analogously for the 6-byte sequences.
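A sketch of that byte-level scan (extended to also catch the 4-byte lead bytes of the form 11110xxx, since those are what the question actually asks about; the class name is illustrative):

import java.io.ByteArrayOutputStream;

public class Utf8Filter {
    // Replace each 4-, 5-, or 6-byte UTF-8 sequence with a single '?'.
    public static byte[] filter(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length);
        for (int i = 0; i < utf8.length; i++) {
            int b = utf8[i] & 0xFF;
            if ((b & 0xF8) == 0xF0) {        // 11110xxx: 4-byte sequence
                out.write('?');
                i += 3;                      // skip the three continuation bytes
            } else if ((b & 0xFC) == 0xF8) { // 111110xx: 5-byte sequence
                out.write('?');
                i += 4;
            } else if ((b & 0xFE) == 0xFC) { // 1111110x: 6-byte sequence
                out.write('?');
                i += 5;
            } else {
                out.write(b);                // ASCII and 2-/3-byte sequences pass through
            }
        }
        return out.toByteArray();
    }
}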

Byte array - locating a character position

I have a byte array in Java. The array contains the '%' symbol somewhere in it. I want to find the position of that symbol in the array. Is there any way to find it?
Thanks in advance!
[EDIT]
I tried the code below and it worked fine.
byte[] b = {55,37,66};
String s = new String(b);
System.out.println(s.indexOf("%"));
One doubt, though: does every character take exactly one byte in Java?
A correct and more direct Guava solution:
Bytes.indexOf(byteArray, (byte) '%');
Using Google Guava:
com.google.common.primitives.Bytes.asList(byteArray).indexOf(Byte.valueOf('%'))
I come from the future with some streaming and lambda stuff.
If it's just a matter of finding a byte in a byte[]:
Input:
byte[] bytes = {55,37,66};
byte findByte = '%';
With streaming and lambda stuff:
OptionalInt firstMatch = IntStream.range(0, bytes.length)
        .filter(i -> bytes[i] == findByte)
        .findFirst();
int index = firstMatch.isPresent() ? firstMatch.getAsInt() : -1; // or simply: firstMatch.orElse(-1)
Which is pretty much the same as:
Actually, I think I still just prefer this (e.g., put it in some utility class):
int index = -1;
for (int i = 0; i < bytes.length; i++) {
    if (bytes[i] == findByte) {
        index = i;
        break;
    }
}
EDIT
Your question is actually more about finding a character rather than finding a byte.
What could be improved in your solution:
String s = new String(bytes);          // may not always give the same result:
                                       // there is an invisible 2nd argument, the charset
String s = new String(bytes, charset); // the default charset depends on your system
So, your program may behave differently on different platforms.
Some charsets use 1 byte per character; others use 2, 3, or more, or are variable-width.
So, the size of your string may vary from platform to platform.
Secondly, some byte sequences cannot be represented as strings at all, i.e. when the charset does not have a character for the matching value.
So, how could you improve it:
If you just know that your byte array will always contain plain old ascii values, you could use this:
byte[] b = {55,37,66};
String s = new String(b, StandardCharsets.US_ASCII);
System.out.println(s.indexOf("%"));
On the other hand, if you know that your content contains UTF-8 characters, use:
byte[] b = {55,37,66};
String s = new String(b, StandardCharsets.UTF_8);
System.out.println(s.indexOf("%"));
etc ...
