Java: How to get a bi-directional numeric representation of a string?

I need to compute a numeric representation of a string that is bidirectional. For example, if I have the string "US", I would like an algorithm that, when applied to "US", generates a number X (int or long). When another algorithm is applied to X, I want to get "US" back. Each string consists of two characters.
Thanks in advance.

The following does it by using DataInputStream and DataOutputStream (both from java.io) to read from and write to an underlying byte array.
public static void main(String[] args) {
String original = "US";
int i = stringToInt(original);
String copy = intToString(i);
System.out.println("original: "+original);
System.out.println("i: "+i);
System.out.println("copy: "+copy);
}
static int stringToInt(String s) {
byte[] bytes = s.getBytes();
if (bytes.length > 4) {
throw new IllegalArgumentException("String too large to be" +
" stored in an int");
}
byte[] fourBytes = new byte[4];
System.arraycopy(bytes, 0, fourBytes, 0, bytes.length);
try {
return new DataInputStream(new ByteArrayInputStream(fourBytes))
.readInt();
} catch (IOException e) {
throw new RuntimeException("impossible");
}
}
static String intToString(int i) {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
try {
new DataOutputStream(byteArrayOutputStream).writeInt(i);
} catch (IOException e) {
throw new RuntimeException("impossible");
}
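// writeInt always emits four bytes, so strip the NUL padding that stringToInt added
// (note that trim() also removes genuine leading/trailing whitespace).
return new String(byteArrayOutputStream.toByteArray()).trim();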
}

This is in the general sense impossible; there are only 2^64 long values, and there are more than 2^64 64-character strings consisting only of the characters X, Y and Q.
Maybe you want to have a pair of hash tables A and B and a counter; if you're given a string you check whether it's in the first hash table, if so return the value you stored there, if not then you set
A[string]=counter; B[counter]=string; counter=1+counter;
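The two-table idea above could be sketched roughly as follows. The class and method names (StringRegistry, idOf, stringOf) are made up for illustration, and the list index plays the role of the counter; this is a minimal sketch, not a thread-safe or persistent implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hands out the next free id to every string it has not seen yet.
final class StringRegistry {
    private final Map<String, Integer> idByString = new HashMap<>(); // table A
    private final List<String> stringById = new ArrayList<>();       // table B; its size is the counter

    /** Returns the id already stored for s, or registers s under the next free id. */
    int idOf(String s) {
        Integer existing = idByString.get(s);
        if (existing != null) {
            return existing;
        }
        int id = stringById.size();
        idByString.put(s, id);
        stringById.add(s);
        return id;
    }

    /** Reverse lookup; throws IndexOutOfBoundsException for an id that was never handed out. */
    String stringOf(int id) {
        return stringById.get(id);
    }

    public static void main(String[] args) {
        StringRegistry registry = new StringRegistry();
        int us = registry.idOf("US");
        int de = registry.idOf("DE");
        System.out.println(us + " " + de + " " + registry.stringOf(us)); // prints "0 1 US"
    }
}
```

The mapping only exists inside one registry instance, so the numbers mean nothing to another process unless the tables are shared or stored.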

What you're describing is bidirectional encryption. Something like this may help you. Another way to do this, if you specifically want a numerical value, is to store the character codes (ASCII codes) of each letter. However, the resulting number is going to be huge (especially for really long strings), and you probably won't be able to store it in a 32-bit or 64-bit integer. Even a long won't help you here.
UPDATE
According to your edit, which says that you only need two characters, you can use the ASCII codes by using getBytes() on the String. When you need to convert it back, the first two digits will correspond to the first character, whereas the last two will correspond to the second character.
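A rough sketch of the digit-pair idea, assuming both characters have codes between 10 and 99 (which covers uppercase ASCII letters such as "US" but not most lowercase ones). The class name TwoLetterCode is made up, and charAt is used instead of getBytes() to keep the example independent of the platform encoding.

```java
// Hypothetical helper: packs two character codes into one decimal number, two digits each.
final class TwoLetterCode {
    static int encode(String s) {
        if (s.length() != 2) {
            throw new IllegalArgumentException("expected exactly two characters");
        }
        char first = s.charAt(0);
        char second = s.charAt(1);
        // Assumption: each code needs exactly two decimal digits.
        if (first < 10 || first > 99 || second < 10 || second > 99) {
            throw new IllegalArgumentException("character code outside 10..99");
        }
        return first * 100 + second;
    }

    static String decode(int code) {
        // The first two digits are the first character, the last two the second.
        return "" + (char) (code / 100) + (char) (code % 100);
    }

    public static void main(String[] args) {
        int code = encode("US");
        System.out.println(code + " -> " + decode(code)); // prints "8583 -> US"
    }
}
```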

This could do, assuming your Strings have length 2, i.e. consist of two Java char values:
public int toNumber(String s) {
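// Low 16 bits hold the first char, high 16 bits the second.
// The parentheses are required: + binds tighter than <<.
return s.charAt(0) | (s.charAt(1) << 16);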
}
public String toString(int number) {
return (char)number + "" + (char)(number >> 16);
}
There are Unicode characters (those with code points of 2^16 and above) that do not fit into a single Java char, but are represented (in UTF-16) by two consecutive surrogates. This algorithm would work for a string holding a single such character, i.e. its two surrogates, but not for longer strings containing more than one such character.
Also, there are int values which do not map back to valid Unicode (or UTF-16) strings (e.g. which produce unpaired surrogates instead of valid characters). But each normal string gets converted to an int and back to the same string.

Related

Encrypt using repeating XOR

I have to encrypt a string using repeating XOR with the key "ICE".
I think I wrote a correct algorithm for it, but the expected solution is 5 bytes shorter than my calculated hex string. Why? Apart from those 5 extra bytes, the strings are equal.
Did I miss something about how to do repeating XOR?
public class ES5 {
public static void main(String[] args) throws UnsupportedEncodingException {
String str1 = "Burning 'em, if you ain't quick and nimble";
String str2 = "I go crazy when I hear a cymbal";
String correct1 = "0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a2622632427276527";
byte[] cr = Encript(str1.getBytes(StandardCharsets.UTF_8),"ICE");
String cr22 = HexFormat.of().formatHex(cr);
System.out.println(cr22);
System.out.println(correct1);
}
private static byte doXOR(byte b, byte b1) {
return (byte) (b^b1);
}
private static byte[] Encript(byte[] bt1, String ice) {
int x = 0;
byte[] rt = new byte[bt1.length];
for (int i=0;i< bt1.length;i++){
rt[i] = doXOR(bt1[i],(byte) (ice.charAt(x) & 0x00FF));
x++;
if(x==3)x=0;
}
return rt;
}
}
Hmmm. The String contains characters, and XOR works on bytes.
That's why the first thing is to run String.getBytes() to receive a byte array.
Here, depending on the characters and their encoding, the number of bytes can be larger than the number of characters. You may want to print and compare both counts first.
Then you perform XOR on the bytes, which may bring you into a completely different area for characters - so you cannot rely on new String(byte[]) at all. Instead you have to create a HEX string representation of the byte[].
Finally, compare this HEX string with the value in correct1. To me that string already looks like a HEX representation, so do not apply HEX encoding to it again.
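To make the byte and hex counts concrete, here is a minimal sketch of repeating-key XOR with a hand-rolled hex formatter (the class name RepeatingXor is made up). The first line of the question's text is 42 bytes in UTF-8, so its ciphertext must be 84 hex digits; the reference string in the question has only 74 digits (37 bytes), so it cannot be the complete expected output for that line.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating repeating-key XOR followed by hex encoding.
final class RepeatingXor {
    /** XORs every byte of data with the key bytes, cycling through the key. */
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    /** Formats each byte as exactly two lowercase hex digits. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] plain = "Burning 'em, if you ain't quick and nimble".getBytes(StandardCharsets.UTF_8);
        byte[] key = "ICE".getBytes(StandardCharsets.UTF_8);
        String hex = toHex(xor(plain, key));
        System.out.println(plain.length + " bytes -> " + hex.length() + " hex digits"); // 42 bytes -> 84 hex digits
        System.out.println(hex);
    }
}
```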

Is there some sort of functionality in Java that converts a char into a bit?

I'm trying to find a way to convert a char (Precondition is the char can only be '0' or '1') into an actual bit in Java. I'm not sure if Java has some built-in functionality for this, or if there is an algorithm that can be implemented to do so.
I need to implement the following method:
public void writeBit(char bit) {
//PRE:bit == '0' || bit == '1'
try {
} catch (IOException e) {
System.out.println(e);
}
}
I cannot change the method structure in any way. I am implementing Huffman Encoding and have an array of Strings that represent the encodings for every character within an input file. For example, 'A' or array[65] contains the String: "01011". So if I see the letter A in my file, I need to use writeBit to write out A's respective String to a binary file. Every time I reach 8 bits (one byte) I will call writeByte to send those 8 bits to the binary file, then reset some sort of counter variable to 0 and continue.
What I'm stuck on is how I am supposed to convert the char bit into an actual bit, so that it can be properly written out to a binary file.
Java does not have a primitive data type representing a single bit. On many hardware architectures, it is not even possible to access memory with that granularity.
When you say "an actual bit", then, I can only assume that you mean an integer value that is either 0 or 1, as opposed to char values '0' and '1'. There are numerous ways to perform such a conversion, among them:
byte the_bit = (byte) (bit - '0');. This takes advantage of the fact that char is an integer type, and that the decimal digits zero and one are encoded in Java with consecutive character codes. The cast is needed because the subtraction yields an int.
byte the_bit = (bit == '0') ? 0 : 1;. This just explicitly tests whether bit contains the value '0', evaluating to 0 if so or 1 if not.
It gets more complicated from there, for example:
byte the_bit = Byte.parseByte(String.valueOf(bit));. This converts the char to a string containing (only) that char, and then parses it as the string representation of a byte.
All of the above rely to one degree or another on the precondition given: that bit does not have any value other than '0' or '1'.
With that said, I think anything like this is probably the wrong approach for implementing a Huffman encoding, because Java Strings are an unlikely, very heavyweight, representation for the bit strings involved.
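For the buffering described in the question (collect bits, emit a byte every 8 bits), a sketch along these lines might do. The BitWriter class, its fields, and the flush method are assumptions for illustration; only the writeBit(char) signature comes from the question, and the IOException is rethrown unchecked here rather than printed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical bit accumulator: packs '0'/'1' chars into bytes, most significant bit first.
final class BitWriter {
    private final OutputStream out;
    private int buffer; // bits collected so far
    private int count;  // how many bits are in buffer (0..7)

    BitWriter(OutputStream out) {
        this.out = out;
    }

    public void writeBit(char bit) {
        // PRE: bit == '0' || bit == '1'
        buffer = (buffer << 1) | (bit - '0');
        count++;
        if (count == 8) {
            try {
                out.write(buffer); // writes the low 8 bits
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            buffer = 0;
            count = 0;
        }
    }

    /** Pads a trailing partial byte with zero bits so nothing is lost. */
    public void flush() {
        while (count != 0) {
            writeBit('0');
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BitWriter writer = new BitWriter(sink);
        for (char c : "01011010".toCharArray()) {
            writer.writeBit(c);
        }
        writer.flush();
        System.out.println(sink.toByteArray()[0]); // prints 90 (binary 01011010)
    }
}
```

A decoder would need to know how many padding bits flush added, for example by storing the total bit count in a header.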
You can use Integer.parseInt(String s, int radix) or Integer.parseUnsignedInt(String s, int radix) with radix 2 to convert a string of binary digits to a Java int.
public static void main(String[] args) {
int num = Integer.parseInt("101010", 2);
// print 42
System.out.println(num);
}
And reversely with method Integer.toBinaryString(int i) you can generate the binary string representation:
// print 101010
System.out.println(Integer.toBinaryString(42));
Similarly you can use Byte.parseByte(String s, int radix) to parse a byte:
public static void main(String[] args) {
byte num = Byte.parseByte("101010", 2);
// print 42
System.out.println(num);
}

Converting from Hexadecimal to Bytes

I have some Java code that converts a hexadecimal string into bytes. It seems to work okay for very short hexadecimal strings but flags an error if I use a long string, and I can't figure out why. I'm new to Java and programming in general. Feel free to point out any other areas which I could improve.
Here is my code:
public class Hextobinary {
static String hexToBinary(String hex) {
int i = Integer.parseInt(hex, 16);
String bin = Integer.toBinaryString(i);
return bin;
}
public static void main(String[] args) {
String h = "5F";
String x = hexToBinary(h);
System.out.println(x);
}
}
Many Thanks
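There is a built-in for this using DatatypeConverter, so you may not have to do it yourself. Note that javax.xml.bind was removed from the JDK in Java 11, so on newer versions this needs the separate JAXB API dependency; java.util.HexFormat (Java 17 and later) offers parseHex and formatHex instead.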
import javax.xml.bind.DatatypeConverter;
public class HexUtils {
public String toHex(final byte[] arr) {
return DatatypeConverter.printHexBinary(arr);
}
public byte[] fromHex(final String str) {
return DatatypeConverter.parseHexBinary(str);
}
}
You are parsing your string to an int. That will work for short hex strings, but not for longer ones. An int is 32 bits, or 8 hex characters. Any string longer than that will not fit into an int.
If you do write your own method, then split the hex string up into two character chunks, and process each pair of characters separately into a byte, and store the bytes in a byte array. That will allow you to deal with longer hex strings.
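A hand-written version of the pair-by-pair approach could look roughly like this. The class name HexBytes is made up, and Character.digit is used so that stray characters such as a sign are rejected.

```java
// Hypothetical helper: converts a hex string of any even length into a byte array.
final class HexBytes {
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Each byte comes from one pair of hex digits.
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((high << 4) | low); // values above 0x7F wrap to negative bytes
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = fromHex("5F0b3637272a2b2e63622c2e69");
        System.out.println(bytes.length + " bytes, first = " + bytes[0]); // prints "13 bytes, first = 95"
    }
}
```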
If you are using huge strings, the int variable i cannot store the value contained in the string hex. Integer.parseInt only accepts values ranging from -80000000 (hexadecimal) to +7FFFFFFF; for anything outside that range it throws a NumberFormatException.
One quick solution is to use the type Long (and the function parseLong) instead of Integer. The type Long can hold values ranging from -8000000000000000 (hexadecimal) to +7FFFFFFFFFFFFFFF. But if you need to convert longer strings, this is not going to work anymore.
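If the goal is still a string of binary digits, as in the question's hexToBinary, one option not mentioned above is java.math.BigInteger, which has no fixed width. A minimal sketch, keeping the question's method name:

```java
import java.math.BigInteger;

// Sketch: same idea as the question's hexToBinary, but without the 32-bit or 64-bit limit.
final class HexToBinary {
    static String hexToBinary(String hex) {
        // BigInteger(String, int radix) parses arbitrarily long digit strings.
        return new BigInteger(hex, 16).toString(2);
    }

    public static void main(String[] args) {
        System.out.println(hexToBinary("5F"));                          // prints 1011111
        System.out.println(hexToBinary("FFFFFFFFFFFFFFFFFF").length()); // prints 72
    }
}
```

As with Integer.toBinaryString, leading zero bits are not printed.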

Why does the character stream read ints?

In the example given in the Oracle Java Tutorial, characters are read as integers.
Why and how does that work?
try {
inputStream = new FileReader("xanadu.txt");
outputStream = new FileWriter("characteroutput.txt");
int c;
while ((c = inputStream.read()) != -1) {
outputStream.write(c);
}
If you read char, there would be no value you could use for end of file.
By using the larger type int, it's possible to represent every possible character AND one extra value that means end of file.
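A small sketch of why the extra range matters: '\uFFFF' is a legal char with value 65535, and casting -1 to char also gives '\uFFFF', so a char return type could not tell the two apart. The class name ReadDemo is made up; StringReader stands in for the tutorial's FileReader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: records every value read() returns, including the final -1.
final class ReadDemo {
    static List<Integer> readAll(Reader in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int c;
        while ((c = in.read()) != -1) {
            values.add(c); // always in the range 0..65535
        }
        values.add(c);     // the end-of-stream marker, outside the char range
        return values;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("a\uFFFF"))); // prints [97, 65535, -1]
        System.out.println((char) -1 == '\uFFFF');                // prints true
    }
}
```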
This is because characters ARE integers. Each character has a Unicode equivalent.
Basically a char is an int. Try the following:
char c = 'c';
int i = c;
This will not cause a compile error.
Behind the scenes in Java, a char is just a 16-bit unsigned value. An int is a 32-bit signed value.
chars are a subset of ints whose values have meaning as UTF-16 code units (the first 128 of which coincide with the ASCII table).
Because of this relationship, the language lets a char be converted to an int implicitly, while going from int back to char requires an explicit cast.
Well, if you read the documentation for Reader/Writer you can see the following explanation:
Writer Class - write Method
Writes a single character. The character to be written is contained
in the 16 low-order bits of the given integer value; the 16
high-order bits are ignored.
And the code simply does:
public void write(int c) throws IOException {
synchronized (lock) {
if (writeBuffer == null){
writeBuffer = new char[writeBufferSize];
}
writeBuffer[0] = (char) c;
write(writeBuffer, 0, 1);
}
}
So, in the case of Writer, as far as I can see, this could have been done with a char data type.
The Reader, on the other hand, has the responsibility in its read method of returning either a character or the end-of-stream indicator.
The documentation says:
Reader Class read Method
The character read, as an integer in the range 0 to 65535
or -1 if the end of the stream has been reached.
As such, a data type bigger than just a char is needed, and in this case int is used.
And it is implemented as follows:
public int read() throws IOException {
char cb[] = new char[1];
if (read(cb, 0, 1) == -1)
return -1;
else
return cb[0];
}
So, this second case justifies the use of a bigger data type.
The reason why they use an int in both classes could be just a matter of consistency.

My java class implementation of XOR encryption has gone wrong

I am new to Java but I am very fluent in C++ and C#, especially C#. I know how to do XOR encryption in both C# and C++. The problem is that the algorithm I wrote in Java to implement XOR encryption seems to be producing wrong results. The results are usually a bunch of spaces, and I am sure that is wrong. Here is the class below:
public final class Encrypter {
public static String EncryptString(String input, String key)
{
int length;
int index = 0, index2 = 0;
byte[] ibytes = input.getBytes();
byte[] kbytes = key.getBytes();
length = kbytes.length;
char[] output = new char[ibytes.length];
for(byte b : ibytes)
{
if (index == length)
{
index = 0;
}
int val = (b ^ kbytes[index]);
output[index2] = (char)val;
index++;
index2++;
}
return new String(output);
}
public static String DecryptString(String input, String key)
{
int length;
int index = 0, index2 = 0;
byte[] ibytes = input.getBytes();
byte[] kbytes = key.getBytes();
length = kbytes.length;
char[] output = new char[ibytes.length];
for(byte b : ibytes)
{
if (index == length)
{
index = 0;
}
int val = (b ^ kbytes[index]);
output[index2] = (char)val;
index++;
index2++;
}
return new String(output);
}
}
Strings in Java are Unicode - and Unicode strings are not general holders for bytes like ASCII strings can be.
You're taking a string and converting it to bytes without specifying what character encoding you want, so you're getting the platform default encoding - probably US-ASCII, UTF-8 or one of the Windows code pages.
Then you're performing arithmetic/logic operations on these bytes. (I haven't looked at what you're doing here - you say you know the algorithm.)
Finally, you're taking these transformed bytes and trying to turn them back into a string - that is, back into characters. Again, you haven't specified the character encoding (but you'll get the same as you got converting characters to bytes, so that's OK), but, most importantly...
Unless your platform default encoding uses a single byte per character (e.g. US-ASCII), then not all of the byte sequences you will generate represent valid characters.
So, two pieces of advice come from this:
1. Don't use strings as general holders for bytes.
2. Always specify a character encoding when converting between bytes and characters.
In this case, you might have more success if you specifically give US-ASCII as the encoding. EDIT: This last sentence is not true (see comments below). Refer back to point 1 above! Use bytes, not characters, when you want bytes.
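Following that advice, one possible shape is to keep the XOR result as bytes and only turn it into text with an encoding meant for binary data, such as Base64. This is a sketch with made-up names (XorCodec, encrypt, decrypt), using UTF-8 explicitly for the text-to-bytes steps; it is not the original poster's class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical rewrite: XOR works on bytes, Base64 carries the result as text.
final class XorCodec {
    private static byte[] xor(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    static String encrypt(String plain, String key) {
        byte[] cipher = xor(plain.getBytes(StandardCharsets.UTF_8), key.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipher); // arbitrary bytes survive as text
    }

    static String decrypt(String base64, String key) {
        byte[] plain = xor(Base64.getDecoder().decode(base64), key.getBytes(StandardCharsets.UTF_8));
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String secret = encrypt("abc", "abc");
        System.out.println(secret);                 // prints AAAA (three zero bytes in Base64)
        System.out.println(decrypt(secret, "abc")); // prints abc
    }
}
```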
If you use non-ASCII strings as keys you'll get pretty strange results. Some of the bytes in the kbytes array will be negative. Sign-extension then means that val will come out negative. The cast to char will then produce a character in the FF80-FFFF range.
These characters will certainly not be printable, and depending on what you use to check the output you may be shown "box" or some other replacement characters.
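The sign-extension effect can be reproduced in a few lines. In this sketch (the class and method names are made up) the key byte 0xC3 is the first UTF-8 byte of a character such as 'é'; masking with 0xFF is shown only to contrast the two results.

```java
// Hypothetical demo of how a negative key byte ends up in the FF80-FFFF char range.
final class SignExtensionDemo {
    /** Mirrors the question's code: both bytes are promoted to int with sign extension. */
    static char xorUnmasked(byte input, byte key) {
        return (char) (input ^ key);
    }

    /** Keeps only the low 8 bits, so the result stays in 0x00..0xFF. */
    static char xorMasked(byte input, byte key) {
        return (char) ((input ^ key) & 0xFF);
    }

    public static void main(String[] args) {
        byte input = 'A';        // 0x41
        byte key = (byte) 0xC3;  // negative as a Java byte (-61)
        System.out.println(Integer.toHexString(xorUnmasked(input, key))); // prints ff82
        System.out.println(Integer.toHexString(xorMasked(input, key)));   // prints 82
    }
}
```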
