I have a string in Radix64 characters:
HR5nYD8xGrw
and I need to be able to perform bitwise operations on the bits in this string, but preserve the Radix64 encoding. For example, if I do a left shift, have it drop the overflow bit, and stay inside the character set of Radix64, not turn into some random ASCII character. Aside from manually converting them to binary and writing my own versions of all of the operators I would need, is there a way to do this?
You just convert them to plain numbers, apply the shift to them and convert back to "base64".
It's no different from applying bit operators to numbers written in base 10: you don't use the string, you use the number corresponding to the string, and then print it back to a string.
9 << 1 == 18
but "9" and "18" are not really related as strings...
I am trying to convert a string like "password" to hex values and store them in a long array. The loop works fine until it reaches the value "6F" (the hex value for the 'o' char), then I get an exception: java.lang.NumberFormatException
String password = "password";
char array[] = password.toCharArray();
long[] data = new long[array.length]; // the long array mentioned above
int index = 0;
for (char c : array) {
    String hex = Integer.toHexString((int) c); // "70", "61", "73", ..., "6f"
    data[index] = Long.parseLong(hex);         // throws NumberFormatException at "6f"
    index++;
}
How can I store the 6F value inside a byte array, as 6F is greater than 1 byte? Please help me with this.
Long.parseLong parses decimal numbers. It turns the string "10" into the number 10. If the input is hex, that is incorrect - the string "10" is supposed to be turned into the number 16. The fix is to use the Long.parseLong(String input, int radix) method. The radix you want is 16, though writing it as 0x10 may be more readable - it's the same thing to the compiler, purely a personal style choice. Thus, Long.parseLong(hex, 0x10) is what you want.
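For instance (a tiny sketch; 'o' is the character from the question that triggered the exception):

public class HexParse {
    public static void main(String[] args) {
        String hex = Integer.toHexString('o');   // "6f"
        long value = Long.parseLong(hex, 0x10);  // radix 16, so "6f" parses fine
        System.out.println(value);               // 111, no NumberFormatException
    }
}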
Note that in practice a char holds values from 0 to 65535, which don't fit in a byte. In effect, you must put a marker down that passwords must not contain any characters that aren't ASCII characters (so no umlauts, snowmen, emoji, funny quotes, etc).
If you fail to check this, Integer.toHexString((int) c) can produce something like 16F or worse (3 to 4 characters), and it may also produce a single character.
More generally, converting a char c to a hex string and then parsing the hex string back into a number is completely pointless. It's turning 15 into "F" and then turning "F" back into 15. If you just want to shove a char into a byte, data[index++] = (byte) c; is all you need - that is the only line you need in your for loop.
But, heed this:
This really isn't how you're supposed to do that!
What you're doing is converting character data to a byte array. This is not actually simple - there are only 256 possible bytes, and there are way more characters that folks have invented. Literally hundreds of thousands of them.
Thus, to convert characters to bytes or vice versa, you must apply an encoding. Encodings have wildly varying properties. The most commonly used encoding, however, is 'UTF-8'. It represents every Unicode symbol, and has the interesting property that basic ASCII characters look exactly the same. However, it has the downside that any given character is smeared out into 1, 2, 3, or even 4 bytes, depending on what character it is. Fortunately, Java has plenty of tools for this, so you don't need to care. What you really want is this:
byte[] data = password.getBytes(StandardCharsets.UTF_8);
That's asking the string to turn itself into a byte array, using UTF-8 encoding. That means "password" turns into the sequence '112 97 115 115 119 111 114 100', which is no doubt what you want, but you can also have as a password, say, außgescheignet ☃, and that works too - it's turned into bytes, and you can get back to your snowman-enabled password:
String in = "außgescheignet ☃";
byte[] data = in.getBytes(StandardCharsets.UTF_8);
String andBackAgain = new String(data, StandardCharsets.UTF_8);
assert in.equals(andBackAgain); // true
If you stick this in a source file, make sure you save it, in whatever text editor you use, as UTF-8, and that javac compiles it that way too (javac has an -encoding parameter to enforce this).
If you think this is going to cause issues for whatever you send this to, and you want to restrict it to what someone with a rather USA-centric view would call 'normal' characters, use StandardCharsets.US_ASCII instead. Note, however, that password.getBytes(StandardCharsets.US_ASCII) will not flat out error on non-ASCII characters; it silently replaces each unmappable character with '?'. To actually fail on such input, you need a CharsetEncoder configured to report errors. Failing is a good thing here: your infrastructure would not deal with those characters correctly, as we just posited in this hypothetical exercise, and throwing an exception early in the process on a relevant line is exactly what you want.
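A minimal sketch of that fail-fast approach (the helper name toAsciiStrict is mine, not a standard API):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictAscii {
    static byte[] toAsciiStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s)); // throws on non-ASCII input
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        return data;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toAsciiStrict("password").length); // 8
        toAsciiStrict("außgescheignet ☃"); // throws UnmappableCharacterException, early and loudly
    }
}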
Recently I found some negative bytes hidden in a Java string in my code which were causing a .equals String comparison to fail.
What is the significance of negative byte values in Strings? Can they mean anything? Is there any situation in which a negative byte value in a String could be interpreted as anything? I'm a noob at this encoding business, so if it requires an explanation of different encoding schemes, please feel free.
A Java string contains characters, BUT you can interpret them in different ways. If each character fits in a byte, it can range from 0 to 255, inclusive. That's 8 bits.
Now, the leftmost bit can be interpreted as a sign bit or as part of the magnitude of the character. If that bit is interpreted as a sign bit then you will have data items ranging from -128 to +127, inclusive.
You didn't post the code you used to print the characters, but if you used logic that interpreted the characters as signed data items, then you will get negative numbers in your output.
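A small illustration (assuming the string's characters came from an 8-bit encoding, so each fits in one byte):

public class SignedBytes {
    public static void main(String[] args) {
        char c = 'é';                  // code point 233: fits in 8 bits
        byte b = (byte) c;             // same bit pattern, read as signed
        System.out.println((int) c);   // 233
        System.out.println(b);         // -23: the leftmost bit acted as a sign bit
        System.out.println(b & 0xFF);  // 233 again: mask to recover the unsigned value
    }
}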
I know that ASCII codes are between 0-127 in decimal and 0000 0000 to 0111 1111 in binary, and that values between 128-255 are extended ASCII.
I also thought that an int accepts 9 digits (I was wrong: the int range is -2,147,483,648 to 2,147,483,647), so if we cast every number between 0 and the max int value to a char, there will be many, many symbols; for example:
(char)999999999 gives 짿, which is a Korean symbol (I don't know what it even means; Google Translate can't find any meaning!).
The same thing happens with values between the min int value and 0.
It doesn't make sense that those symbols were input one by one.
I don't understand - how could they assign each of those big numbers its own character?
The assignments are made by the Unicode consortium. See http://unicode.org for details.
In your particular case, however, you are doing something completely nonsensical. You have the integer 999999999, which in hex is 0x3B9AC9FF. You then cast that to char, which discards the top 16 bits (the 3B9A) and gives you 0xC9FF. If you then look that up at Unicode.org (http://www.unicode.org/cgi-bin/Code2Chart.pl) you discover that yes, it is a Korean character.
Unicode code points can in fact be quite large; there are over a million of them. But you can't get to them just by casting. To get to Unicode code points that are outside of the "normal" range using UTF-16 (as C# does), you need to use two characters. See http://en.wikipedia.org/wiki/UTF-16, the section on surrogate pairs.
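The same behavior is easy to see in Java, which also uses UTF-16 (a small sketch; the emoji code point is just an arbitrary example above 0xFFFF):

public class Surrogates {
    public static void main(String[] args) {
        // Casting simply truncates: only the low 16 bits survive.
        System.out.println((char) 999999999);       // 0xC9FF -> '짿'

        // To reach code points above 0xFFFF you need a surrogate pair:
        int codePoint = 0x1F600;                    // 😀, outside the 16-bit range
        char[] pair = Character.toChars(codePoint); // high surrogate + low surrogate
        System.out.println(pair.length);            // 2
        System.out.println(new String(pair));       // 😀
    }
}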
To address some of the other concerns in your question:
I know that ACCII codes are between (0-127) in decimal and (0000 0000 to 0000 1111) in binary.
That's ASCII, not ACCII, and 127 in binary is 01111111, not 00001111
Also we know that int accepts 9 digits, so if we cast every number between
The range of an int is larger than that.
don't know what it mean even Google translate can't find any meaning
Korean is not like Chinese, where each glyph represents a word. Those are letters. They don't have a meaning unless they happen to accidentally form a word. You'd have about as much luck googling randomly chosen English letters and trying to find their meaning; maybe sometimes you'd choose CAT at random, but most of the time you'd choose XRO or some such thing that is not a word.
Read this if you want to understand how the Korean alphabet works: http://en.wikipedia.org/wiki/Hangul
Reworded question, as it seems I wasn't specific enough:
Given an RSA system with p = 263, q = 587, public key e = 683 and private key d = 81599. Therefore n = pq = 154381. For a message, say "I AM A STUDENT", the encryption is conducted as follows:
Convert any letter (including blank space) into a 3-digit ASCII code, i.e. 073 032 065 077 032 065 032 083 084 085 068 069 078 084.
Join every two adjacent ASCII codes to form a block, i.e. 073032 065077 032065 032083 084085 068069 078084. (use 000 if last letter has nothing to join with).
Using the encryption algorithm c = m^e mod n to encrypt every block; c1 = 73032^683 mod 154381 = 103300, etc.
Assume you are the receiver of a message: 33815872282353670979238213794429016637939017111351. What is the content?
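For reference, the arithmetic in step 3 is easy to reproduce with BigInteger.modPow; a minimal sketch using the parameters above:

import java.math.BigInteger;

public class RsaBlock {
    public static void main(String[] args) {
        // Parameters from the question: n = 263 * 587 = 154381, e = 683, d = 81599
        BigInteger n = BigInteger.valueOf(154381);
        BigInteger e = BigInteger.valueOf(683);
        BigInteger d = BigInteger.valueOf(81599);

        BigInteger m = BigInteger.valueOf(73032); // block "073032" ("I ")
        BigInteger c = m.modPow(e, n);            // c = m^e mod n
        System.out.println(c);                    // expected 103300 per the question

        System.out.println(c.modPow(d, n));       // decrypts back to 73032
    }
}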
After a bit more consideration, I'm thinking that since I have to decode in parts (i.e. decode 33815, then 87228, etc.), I should just split each decoded part in half and check whether each half is in the ASCII range; if not, go back to the original and split it differently. Does this sound like a better solution than trying to hack something out with regex?
P.S. The decoding is considered homework. I have done this by hand and know that the message decodes to "i hate cryptography" (it seems my lecturer has a sense of humor), so you're not helping me do my homework. Turning this into a program is just something extracurricular that I thought might be fun/interesting.
It is generally an incredibly bad idea to have variable-length records without a delimiter or index. In this case, the best approach is a fixed-width integer with leading zeros.
That said, you do actually have an implicit delimiter, assuming you're always reading from start to end of the string without skipping anything: a leading 0 or 1 indicates a 3-digit number, and a leading 2-9 indicates a 2-digit number. Something like this would work:
[01][0-9][0-9]|[2-9][0-9]
But really - just print your numbers into the string with leading zeros. Or look into 2-character hexadecimal encoding if you're worried about space. Or Base64, or one of the other printable encodings.
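If you do go the regex route, here is a minimal Java sketch of the tokenizing (the digit string is a made-up, unpadded example, not the actual decryption output):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsciiSplit {
    public static void main(String[] args) {
        // "10532104" = 105 ('i'), 32 (space), 104 ('h') with no leading zeros
        String digits = "10532104";
        Matcher m = Pattern.compile("[01][0-9][0-9]|[2-9][0-9]").matcher(digits);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            out.append((char) Integer.parseInt(m.group())); // each token is one ASCII code
        }
        System.out.println(out); // "i h"
    }
}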
Q: When casting an int to a char in Java, it seems that the default result is the ASCII character corresponding to that int value. My question is, is there some way to specify a different character set to be used when casting?
(Background info: I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process.
I have been able to do this, but currently I have only been able to compress up to 6 "bits" into a single character, because when I allow for larger amounts, there are some values in the range which do not seem to be handled well by ASCII; they become boxes or question marks and when they are cast back into an int, their original value has not been preserved. If I could use another character set, I imagine I could avoid this problem and compress the binary by 8 bits at a time, which is my goal.)
I hope this was clear, and thanks in advance!
Your problem has nothing to do with ASCII or character sets.
In Java, a char is just a 16-bit integer. When casting ints (which are 32-bit integers) to chars, the only thing you are doing is keeping the 16 least significant bits of the int, and discarding the upper 16 bits. This is called a narrowing conversion.
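A quick demonstration of that narrowing conversion (the input value is arbitrary):

public class Narrowing {
    public static void main(String[] args) {
        int i = 0x12345A;                   // 24 significant bits
        char c = (char) i;                  // narrowing cast: keeps only the low 16 bits
        System.out.printf("%X%n", (int) c); // prints 345A; the 0x12 is gone
    }
}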
References:
http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#20232
http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#25363
The conversion between characters and integers uses the Unicode values, of which ASCII is a subset. If you are handling binary data you should avoid characters and strings and instead use an integer array - note that Java doesn't have unsigned 8-bit integers.
What you're searching for is not a cast, it's a conversion.
There is a String constructor that takes a byte array and a charset encoding. This should help you.
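For instance (a minimal sketch; the byte values are "Hi☃" encoded as UTF-8):

import java.nio.charset.StandardCharsets;

public class BytesToString {
    public static void main(String[] args) {
        byte[] chunk = { 72, 105, (byte) 0xE2, (byte) 0x98, (byte) 0x83 };
        String s = new String(chunk, StandardCharsets.UTF_8);   // bytes -> text
        System.out.println(s);                                  // Hi☃
        byte[] roundTrip = s.getBytes(StandardCharsets.UTF_8);  // text -> bytes again
        System.out.println(roundTrip.length);                   // 5
    }
}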
I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process.
You don't mention why you are doing that, and (to be honest) it's a little hard to follow what you're trying to describe (for one thing, I don't see why the resulting characters would be "compressed" in any way).
If you just want to represent binary data as text, there are plenty of standard ways of accomplishing that. But it sounds like you may be after something else?
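For instance, Base64 is one of those standard ways (a minimal sketch with arbitrary bytes):

import java.util.Base64;

public class BinaryAsText {
    public static void main(String[] args) {
        byte[] data = { (byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF };
        String text = Base64.getEncoder().encodeToString(data); // "3q2+7w=="
        byte[] back = Base64.getDecoder().decode(text);         // lossless round trip
        System.out.println(text);
        System.out.println(back.length); // 4
    }
}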