How can I create a variable character that can hold a four byte value?
I am trying to write a program to encrypt messages in Java, for fun. I figured out how to use RSA, and managed to write a program that will encrypt a message and save it to a .txt file.
For example if "Quiet" is entered the outcome will be "041891090280". I wrote my code so that the number would always have length that is a multiple of six. So I thought that I could convert the numbers into a hash code. The first three letters are "041" so I could convert that into ")".
However, I am having trouble creating a char with a number greater than 255. I have looked around online and found a few examples, but I can't figure out how to implement them. I created a new method just to test them.
int a = 256;
char b = (char) a;
char c = 0xD836;
char[] cc = Character.toChars(0x1D50A);
System.out.println(b);
System.out.println(c);
System.out.println(cc);
The program outputs
?
?
?
I am only getting two bytes. I read that Java uses Unicode which should go up to 65535 which is four bytes. I am using Eclipse if that makes a difference.
I apologize for the noob question.
And thanks in advance.
edit
I am sorry, I think I gave too much information and ended up being confusing.
What I want to do is store a string of numbers as a string of Unicode characters. The only way I know how to do that is to break the number string into pieces small enough to fit into a character, then add the characters one by one to a new string. But I don't know how to add a variable Unicode character to a string.
All chars are 16-bit already. 0 to 65535 only need 16-bit and 2^16 = 65536.
Note: not all char values are valid characters on their own. In particular, 0xD800 to 0xDFFF are surrogates, which are used in pairs to encode code points beyond 65535 (U+FFFF).
If you want to store arbitrary 16-bit values rather than text, I suggest you use short instead. It holds the same 16 bits (read as -32768 to 32767 rather than 0 to 65535), and it may be less confusing to use.
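As a minimal sketch of the approach described in the edit to the question, the digit string can be cut into fixed-width groups and each group cast to a char, then appended to a StringBuilder. The class and method names here (DigitChunkCodec, encode, decode) are made up for illustration. With a width of at most four digits the values stay below 0xD800, so no surrogate is ever produced. The ? in the printed output is quite possibly the console failing to display the character rather than lost data, which is why the sketch prints numeric values instead.

```java
final class DigitChunkCodec {
    // Hypothetical helper: packs each group of `width` decimal digits into one char.
    static String encode(String digits, int width) {
        if (width < 1 || width > 4 || digits.length() % width != 0) {
            throw new IllegalArgumentException("width must be 1..4 and divide the input length");
        }
        StringBuilder out = new StringBuilder(digits.length() / width);
        for (int i = 0; i < digits.length(); i += width) {
            int value = Integer.parseInt(digits.substring(i, i + width));
            out.append((char) value); // at most 9999, so it always fits in a 16-bit char
        }
        return out.toString();
    }

    // Reverses encode by printing each char value as a zero-padded decimal group.
    static String decode(String packed, int width) {
        StringBuilder out = new StringBuilder(packed.length() * width);
        for (int i = 0; i < packed.length(); i++) {
            out.append(String.format("%0" + width + "d", (int) packed.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String packed = encode("041891090280", 3);
        System.out.println(packed.length());        // 4
        System.out.println((int) packed.charAt(1)); // 891
        System.out.println(decode(packed, 3));      // 041891090280
    }
}
```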
I am trying to convert a string like "password" to hex values and store them in a long array. The loop works fine until it reaches the value "6F" (the hex value for the char o), and then I get a java.lang.NumberFormatException.
String password = "password";
char array[] = password.toCharArray();
int index = 0;
for (char c : array) {
String hex = (Integer.toHexString((int) c));
data[index] = Long.parseLong(hex);
index++;
}
How can I store the 6F value inside a byte array, given that 6F is greater than 1 byte? Please help me with this.
Long.parseLong parses decimal numbers. It turns the string "10" into the number 10. If the input is hex, that is incorrect - the string "10" is supposed to be turned into the number 16. The fix is to use the Long.parseLong(String input, int radix) method. The radix you want is 16, though writing that as 0x10 may be more readable - it's the same thing to the compiler, purely a personal style choice. Thus, Long.parseLong(hex, 0x10) is what you want.
Note that in practice char has numbers that go from 0 to 65535, which doesn't fit in a single byte. In effect, you must put a marker down that passwords must not contain any characters that aren't ASCII characters (so no umlauts, snowmen, emoji, funny quotes, etc).
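For illustration, here is the loop from the question with the radix fix applied. It is only a sketch: the class name HexParseDemo and the method toCodes are invented, and the data array is assumed to be a long[] sized to the password length.

```java
import java.util.Arrays;

final class HexParseDemo {
    // Converts each char to a hex string and parses it back with radix 16.
    static long[] toCodes(String text) {
        long[] data = new long[text.length()];
        for (int i = 0; i < text.length(); i++) {
            String hex = Integer.toHexString(text.charAt(i)); // 'o' becomes "6f"
            data[i] = Long.parseLong(hex, 16);                // radix 16 accepts "6f"
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toCodes("password")));
        // [112, 97, 115, 115, 119, 111, 114, 100]
    }
}
```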
If you fail to check this, Integer.toHexString((int) c) will turn into something like 16F or worse (3 to 4 characters), and it may also turn into a single character.
More generally, converting from char c to a hex string, and then parsing the hex string into a number, is completely pointless. It's turning 15 into "F" and then turning "F" into 15. If you just want to shove a char into a byte: data[index++] = (byte) c; is all you need - that is the only line you need in your for loop.
But, heed this:
This really isn't how you're supposed to do that!
What you're doing is converting character data to a byte array. This is not actually simple - there are only 256 possible bytes, and there are way more characters that folks have invented. Literally hundreds of thousands of them.
Thus, to convert characters to bytes or vice versa, you must apply an encoding. Encodings have wildly varying properties. The most commonly used encoding, however, is 'UTF-8'. It can represent every Unicode symbol, and has the interesting property that basic ASCII characters look the exact same. However, it has the downside that any given character is smeared out into 1, 2, 3, or even 4 bytes, depending on what character it is. Fortunately, Java has plenty of tools for this, thus, you don't need to care. What you really want, is this:
byte[] data = password.getBytes(StandardCharsets.UTF_8);
That's asking the string to turn itself into a byte array, using UTF-8 encoding. That means "password" turns into the sequence '112 97 115 115 119 111 114 100' which is no doubt what you want, but you can also have as password, say, außgescheignet ☃, and that works too - it's turned into bytes, and you can get back to your snowman enabled password:
String in = "außgescheignet ☃";
byte[] data = in.getBytes(StandardCharsets.UTF_8);
String andBackAgain = new String(data, StandardCharsets.UTF_8);
assert in.equals(andBackAgain); // true
If you stick this in a source file, make sure you save it as UTF-8 in whatever text editor you use, and that javac compiles it that way too (javac has an -encoding parameter to enforce this).
If you think this is going to cause issues on whatever you send this to, and you want to restrict it to what someone with a rather USA-centric view would call 'normal' characters, then you want the same approach as showcased here, but with StandardCharsets.US_ASCII instead. Be aware that password.getBytes(StandardCharsets.US_ASCII) does not throw on non-ASCII characters; it silently replaces each one with '?'. To get an actual error, encode through a CharsetEncoder (StandardCharsets.US_ASCII.newEncoder()), which by default reports unmappable characters with an exception. That's a good thing: Your infrastructure would not deal with it correctly, we just posited that in this hypothetical exercise. Throwing an exception early in the process on a relevant line is exactly what you want.
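One point worth checking when restricting to ASCII: String.getBytes(Charset) substitutes '?' for characters the charset cannot represent instead of throwing. If an exception is what you want, a CharsetEncoder can be used instead. The following is a sketch under that assumption; the class name StrictAscii and the choice to wrap the checked exception in IllegalArgumentException are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictAscii {
    // Encodes text as US-ASCII and fails loudly on anything outside that range.
    static byte[] encode(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("text contains non-ASCII characters", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("password").length); // 8
        try {
            encode("snowman \u2603");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```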
I am trying to implement Huffman Tree compression. Pretty much how it works is giving < 8-bit codes to the most common characters in text documents and larger codes to the less common characters. Then there is a binary tree encoded that lets you navigate down with 1's telling you to go left and 0's telling you to go right which leads you to the characters.
So obviously there are chunks that aren't 8 bits long. I have been rounding them off as need be with 0's at the end and converting them to characters. However, I just found that Java writes 3 bytes per character. Because this is about compression I obviously want one byte.
The problem is that I don't know what bytes are going to end up trying to be written. Three different < 8-bit codes might get mushed together. I need to be able to write any code to the text file. There are invalid byte sequences however and so my entire approach is all gummed up.
Is there any way that I can let any byte sequence be valid in a certain section of the file and just store it as it literally is and not worry about a character ending the file prematurely or causing other mischief? I am coding on a Mac so that is a problem unlike in Windows where they just have the length of the file at the beginning so that they don't need an end of file character.
If there is not a direct solution here then perhaps I could make my own encoding that would not exit the file and nest that inside a more common one?
This looks like a good use case for Java's BitSet: https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html
When writing out the data to a file, you should output the number of values which were encoded and afterwards you only need the serialized stream of bits.
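A rough sketch of that layout, assuming the Huffman output is available as a string of '0' and '1' characters: a 4-byte bit count comes first, followed by the BitSet's raw bytes. The class name BitPacking and its methods are invented for the example. The resulting byte[] would be written with a byte-oriented API such as Files.write or an OutputStream, not through a Writer, so no character encoding gets involved and every byte value is legal.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

final class BitPacking {
    // Layout: 4-byte big-endian bit count, then the packed bits.
    static byte[] pack(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                set.set(i);
            }
        }
        byte[] payload = set.toByteArray(); // trailing all-zero bytes are omitted
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(bits.length())
                .put(payload)
                .array();
    }

    // The stored bit count tells us how many bits to read back, zeros included.
    static String unpack(byte[] packed) {
        ByteBuffer buffer = ByteBuffer.wrap(packed);
        int bitCount = buffer.getInt();
        BitSet set = BitSet.valueOf(buffer); // consumes the remaining bytes
        StringBuilder bits = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            bits.append(set.get(i) ? '1' : '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("1010000011");
        System.out.println(packed.length);  // 6 (4 for the count, 2 for the bits)
        System.out.println(unpack(packed)); // 1010000011
    }
}
```

Storing the count is what makes the padding at the end harmless: the reader stops after the recorded number of bits instead of guessing where the data ends.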
I have searched for some time about this matter and didn't find a proper answer anywhere.
Let's say I have a string:
"The quick brown fox jumps over the lazy dog"
I need to find unique words in this string and their byte positions and also byte distance between same words.
OK, I can manage to find the words, but what are their byte positions, and are there any ideas for tracking distance in bytes? For example, is 5 the position of the string quick, and does it need to be converted to bytes?
I hope this doesn't sound too stupid (I am fairly new to Java).
Finding unique words should be fairly easy: split on whitespace, add the strings to a Set, and whatever's in the Set at the end of the method will be the unique words in the file. This can be made arbitrarily complex though, depending on what defines a unique word, and whether characters other than whitespace separate words.
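A small sketch of the split-and-Set idea, assuming whitespace is the only separator and that case matters (so "The" and "the" count as different words). The class name UniqueWords is made up; a LinkedHashSet is used only so the words come out in first-seen order.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class UniqueWords {
    // Splits on runs of whitespace and keeps each distinct word once.
    static Set<String> of(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(of("The quick brown fox jumps over the lazy dog"));
        // [The, quick, brown, fox, jumps, over, the, lazy, dog]
    }
}
```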
The byte position/distance question is a bit harder. If memory serves, String objects in Java are wrappers around char[] objects, and chars are 16-bit unicode characters in Java (http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
So I'm guessing byte distance is just a linear function of the character position?
If you're working with other encodings though the getBytes() method might be useful.
http://docs.oracle.com/javase/tutorial/i18n/text/string.html
So for something like that, a naive solution would be to determine the number of bytes for each character, which would allow for really easy calculation of byte positions/distances, but determining that probably isn't that efficient. It should, however, yield correct results if done correctly.
Positions are counted from 0, not 1. So "quick" would have character position 4, which for US-ASCII is also the byte position. Maybe character positions suffice.
String s = "The quick brown fox jumps over the lazy dog";
int charsIndex = s.indexOf("quick"); // 4
int charsLength = "The ".length(); // 4
int bytesLength = "The ".getBytes(StandardCharsets.UTF_8).length; // 4 (needs import java.nio.charset.StandardCharsets)
char ch = s.charAt(4); // 'q'
int c = s.codePointAt(4); // (int) 'q'
In Java text (String) is always in Unicode, hence all chars are possible and combinable.
Bytes (byte[]) are in some encoding and may vary per encoding.
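To make the character-versus-byte distinction concrete, here is one possible way to get a byte position: encode the text before the word and count the bytes. This is a sketch assuming UTF-8 as the target encoding; ByteOffsets and byteOffset are invented names. The byte distance between two occurrences is then just the difference of their byte offsets.

```java
import java.nio.charset.StandardCharsets;

final class ByteOffsets {
    // Byte position of the character at charIndex when the text is encoded as UTF-8.
    static int byteOffset(String text, int charIndex) {
        return text.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "The quick brown fox";
        System.out.println(byteOffset(ascii, ascii.indexOf("quick")));       // 4

        String accented = "Th\u00e9 quick"; // the accented e takes two bytes in UTF-8
        System.out.println(accented.indexOf("quick"));                       // 4
        System.out.println(byteOffset(accented, accented.indexOf("quick"))); // 5
    }
}
```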
I'm now working on a challenge from the website http://www.net-force.nl/challenges/ and I am facing an interesting problem I can't solve. I'm not asking for the whole result (as it would be breaking the rules), but I need help with the programming theory behind the hash function.
Basically, it's based on Java applet with one textfield, where user has to enter the right password. When I decompile the .class file, one of the methods I get is this hash method.
String s contains the entered password, which is passed straight to the method:
private int hash(String s)
{
int i = 0;
for(int j = 0; j < s.length(); j++)
i += s.charAt(j);
return i;
}
The problem is that the method returns integer as the "hash", but how can characters be converted to integer at all? I got an idea that maybe the password is a number, but it doesn't lead anywhere at all. Another idea talks about ASCII, but still nothing.
Thanks for any help or tips.
The trick is that it's converting each character into an integer. Each character (char) in Java is a UTF-16 code unit. For the most part [1], you can just think of that as each character is mapped to a number between 0 and 65535 inclusive, in a scheme called Unicode. For example, 65 is the number for 'A', and if you'd typed in the Euro symbol, that would map to Unicode U+20AC (8364).
Your hashing function basically adds together the numbers for each character in the string. It's a very poor hash (in particular it gives the same results for the same characters regardless of ordering), but hopefully you'll get the idea.
[1] Things get trickier when you need to bear in mind surrogate pairs, where a single Unicode character is actually made up of two UTF-16 code units - that's for characters with a Unicode number of more than 65535. Let's stick to the basics for the moment though :)
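The implicit char-to-int widening can be seen by running the same summation on a few inputs. This sketch copies the logic of the decompiled method into a class with the made-up name SumHash; the sample strings are arbitrary and have nothing to do with the challenge's password.

```java
final class SumHash {
    // Same logic as the decompiled method: each char is widened to int and added.
    static int hash(String s) {
        int total = 0;
        for (int j = 0; j < s.length(); j++) {
            total += s.charAt(j);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(hash("A"));      // 65
        System.out.println(hash("\u20ac")); // 8364, the Euro sign
        System.out.println(hash("AB"));     // 131 (65 + 66)
        // Reordering the characters does not change the sum.
        System.out.println(hash("listen") == hash("silent")); // true
    }
}
```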
The hash function you present is the simplest hashing function you could possibly write for a string.
It is easy to implement and really fast in its computation.
It is problematic though since it doesn't distribute the input well.
Assuming ASCII chars, an 8-character string can hash to values from 0 to 1016 (8 × 127), since each ASCII char is between 0 and 127.
I.e. each character in the string is "treated" as its ASCII equivalent (for a more advanced analysis, check John's answer).
Anyway, you should note that strings containing the same characters but in a different order map to the same hash value with this function. Perhaps this is of interest to you in the challenge you are trying to attack (??)
Q: When casting an int to a char in Java, it seems that the default result is the ASCII character corresponding to that int value. My question is, is there some way to specify a different character set to be used when casting?
(Background info: I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process.
I have been able to do this, but currently I have only been able to compress up to 6 "bits" into a single character, because when I allow for larger amounts, there are some values in the range which do not seem to be handled well by ASCII; they become boxes or question marks and when they are cast back into an int, their original value has not been preserved. If I could use another character set, I imagine I could avoid this problem and compress the binary by 8 bits at a time, which is my goal.)
I hope this was clear, and thanks in advance!
Your problem has nothing to do with ASCII or character sets.
In Java, a char is just a 16-bit integer. When casting ints (which are 32-bit integers) to chars, the only thing you are doing is keeping the 16 least significant bits of the int, and discarding the upper 16 bits. This is called a narrowing conversion.
References:
http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#20232
http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#25363
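The narrowing conversion can be checked directly. In this sketch (NarrowingDemo and narrow are illustrative names) a value wider than 16 bits loses its upper bits, while every 8-bit value survives the trip through char unchanged. That suggests the lost values in the question are more likely caused by how the chars are later written or displayed than by the cast itself, though that is an assumption about code not shown.

```java
final class NarrowingDemo {
    // int -> char keeps the low 16 bits; char -> int widens without loss.
    static int narrow(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(narrow(0x12345))); // 2345
        boolean allSurvive = true;
        for (int v = 0; v < 256; v++) {
            if (narrow(v) != v) {
                allSurvive = false;
            }
        }
        System.out.println(allSurvive); // true
    }
}
```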
The conversion between characters and integers uses the Unicode values, of which ASCII is a subset. If you are handling binary data you should avoid characters and strings and instead use an integer array - note that Java doesn't have unsigned 8-bit integers.
What you are looking for is not a cast, it's a conversion.
There is a String constructor that takes an array of byte and a charset encoding. This should help you.
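As a sketch of that constructor in use: ISO-8859-1 maps each of the 256 byte values to exactly one char (U+0000 to U+00FF), so bytes pushed through it come back unchanged. The class name Latin1RoundTrip is made up, and whether storing binary data in a String is a good idea at all is a separate question; this only shows that the conversion itself can be lossless.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // Each byte becomes exactly one char in the range U+0000..U+00FF.
    static String toText(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String text = toText(all);
        System.out.println(text.length());                       // 256
        System.out.println(Arrays.equals(all, toBytes(text)));   // true
    }
}
```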
"I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process."
You don't mention why you are doing that, and (to be honest) it's a little hard to follow what you're trying to describe (for one thing, I don't see why the resulting characters would be "compressed" in any way).
If you just want to represent binary data as text, there are plenty of standard ways of accomplishing that. But it sounds like you may be after something else?
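One of those standard ways is Base64, which the JDK provides in java.util.Base64 (Java 8 and later). A minimal sketch, with Base64Demo as an invented class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class Base64Demo {
    // Turns arbitrary bytes into printable ASCII text.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recovers the original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        System.out.println(encode("Man".getBytes(StandardCharsets.US_ASCII))); // TWFu
        byte[] raw = {0, (byte) 0xFF, (byte) 0x80, 7};
        System.out.println(Arrays.equals(raw, decode(encode(raw))));           // true
    }
}
```

Note that Base64 makes the data larger (four output characters per three input bytes), so it is a transport format, not a compression scheme.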