I know that ASCII codes are between 0-127 in decimal and 0000 0000 to 0111 1111 in binary, and that values between 128-255 are extended ASCII.
I also know that int accepts 9 digits (I was wrong about this; the range of int is -2,147,483,648 to 2,147,483,647), so if we cast every number between 0 and the max int value to a char, there will be many, many symbols; for example:
(char)999999999 gives 짿, which is a Korean symbol (I don't know what it means; even Google Translate can't find any meaning!).
The same thing happens with values between the minimum int value and 0.
It doesn't make sense that those symbols were input one by one.
I don't understand: how could they assign each of those big numbers its own character?
The assignments are made by the Unicode consortium. See http://unicode.org for details.
In your particular case, however, you are doing something completely nonsensical. You have the integer 999999999, which in hex is 0x3B9AC9FF. You then cast that to char, which discards the top two bytes and gives you 0xC9FF. If you then look that up at Unicode.org (http://www.unicode.org/cgi-bin/Code2Chart.pl), you discover that yes, it is a Korean character.
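The same truncation is easy to reproduce in Java, whose char is likewise a 16-bit value; a minimal sketch:

int n = 999999999;                             // 0x3B9AC9FF in hex
char c = (char) n;                             // the cast keeps only the low 16 bits: 0xC9FF
System.out.printf("%c U+%04X%n", c, (int) c);  // prints: 짿 U+C9FF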
Unicode code points can in fact be quite large; there are over a million of them. But you can't get to them just by casting. To get to Unicode code points that are outside of the "normal" range using UTF-16 (as C# does), you need to use two characters. See http://en.wikipedia.org/wiki/UTF-16, the section on surrogate pairs.
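For instance, a sketch in Java (which also uses UTF-16 for char):

int codePoint = 0x1F600;                      // GRINNING FACE emoji, well beyond the 16-bit range
char[] units = Character.toChars(codePoint);  // produces the surrogate pair
System.out.println(units.length);             // 2
System.out.printf("U+%04X U+%04X%n", (int) units[0], (int) units[1]);  // U+D83D U+DE00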
To address some of the other concerns in your question:
I know that ACCII codes are between (0-127) in decimal and (0000 0000 to 0000 1111) in binary.
That's ASCII, not ACCII; and 127 in binary is 01111111, not 00001111.
Also we know that int accepts 9 digits, so if we cast every number between
The range of an int is larger than that.
don't know what it mean even Google translate can't find any meaning
Korean is not like Chinese, where each glyph represents a word. Those are letters. They don't have a meaning unless they happen to accidentally form a word. You'd have about as much luck googling randomly chosen English letters and trying to find their meaning; maybe sometimes you'd choose CAT at random, but most of the time you'd choose XRO or some such thing that is not a word.
Read this if you want to understand how the Korean alphabet works: http://en.wikipedia.org/wiki/Hangul
Related
In Java, for Double, we have a value for NaN (Not A Number).
Now, for Character, do we have a similar equivalent for "Not A Character"?
If the answer is no, then I think a safe substitute may be Character.MIN_VALUE (which is of type char and has value \u0000). Do you think this substitute is safe enough? Or do you have another suggestion?
In mathematics, there is a concept of "not a number" - 5 divided by 0 is not a number. Since this concept exists, there is NaN for the double type.
Characters are an abstract concept of mapping numbers to characters. The idea of "not a character" doesn't really exist, since the character encoding in use can vary (UTF-8, UTF-16, etc.).
Think of it this way. If I ask you, "what is 5 divided by 0?", you would say it's "not a number". But, we do have a defined way to represent the value, even though it's not a number. If I draw a random squiggle and ask you, "what letter is this?", you would say "it's not a letter". But, we don't have a way to actually represent that squiggle outside of what I just drew. There's no real way to communicate the "non-character" I've just drawn, but there is a way to communicate the "non-number" of 5 divided by 0.
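In Java the number side looks like this (note that IEEE 754 only yields NaN for operations like 0.0 / 0.0; 5.0 / 0.0 actually gives Infinity):

double nan = 0.0 / 0.0;
System.out.println(Double.isNaN(nan));  // true
System.out.println(nan == nan);         // false: NaN isn't even equal to itself
System.out.println(5.0 / 0.0);          // Infinity, not NaN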
\u0000 is the null character, which is still a character. What exactly are you trying to achieve? Depending on your goal \u0000 may suffice.
The "not-a-number" concept does not really belong to Java; rather, Java defines double as being IEEE 754 double precision floating-point numbers, which have that concept. (That said, if I recall correctly, Java does specify some details about NaN that IEEE 754 leaves open to implementations.)
The analogous standard for Java char is Unicode: Java defines char as being UTF-16 code units.
Unicode does have various reserved-undefined characters that you could use; for example, U+FFFF ('\uFFFF') will never be a character. Alternatively, you could use U+FFFD ('\uFFFD'), which is a character, but is specifically the "replacement character" suitable for replacing garbage or invalid characters.
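A quick sketch of both choices:

char sentinel = '\uFFFF';     // permanently reserved noncharacter in Unicode
char replacement = '\uFFFD';  // REPLACEMENT CHARACTER, usually rendered as �
System.out.println(Character.isDefined(sentinel));     // false: no assigned character
System.out.println(Character.isDefined(replacement));  // true: it's a real character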
It depends on what you're trying to do. If you're trying to represent the lack of a character, you could do:
Optional<Character> noCharacter = Optional.empty();
You could check whether the character's code is within the range 'a' to 'z' or 'A' to 'Z'. That would qualify it as a letter; anything outside those ranges would be "not a character", if by "not a character" you mean "not an alphabet letter". You could extend the check to symbols like the question mark, full stop, comma, etc., but if you want to go further than ASCII territory, I think it gets out of hand.
One other approach would be to check whether something is a number. If it's not, you could check whether it's a whitespace character; if it's not that either, everything else qualifies as a character, and therefore you get your answer.
It's a long discussion, IMO, because the answers vary depending on your view of what counts as a character.
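A minimal sketch of the range-check idea above (ASCII letters only; the built-in Character.isLetter covers far more of Unicode):

static boolean isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}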
Recently I found some negative byte values hidden in a Java string in my code, which were causing a .equals String comparison to fail.
What is the significance of negative byte values in Strings? Can they mean anything? Is there any situation in which a negative byte value in a String could be interpreted as anything? I'm a noob at this encoding business, so if it requires an explanation of different encoding schemes, please feel free.
A Java string contains characters, BUT you can interpret them in different ways. If each character is stored as a byte, then it can range from 0 to 255, inclusive. That's 8 bits.
Now, the leftmost bit can be interpreted as a sign bit or as part of the magnitude of the character. If that bit is interpreted as a sign bit then you will have data items ranging from -128 to +127, inclusive.
You didn't post the code you used to print the characters, but if you used logic that interpreted the characters as signed data items, then you will get negative numbers in your output.
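A small sketch of that signed-versus-unsigned reading, using a character whose high bit is set:

String s = "é";  // U+00E9, bit pattern 1110 1001 in ISO-8859-1
byte b = s.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1)[0];
System.out.println(b);         // -23: the same bits read as a signed byte
System.out.println(b & 0xFF);  // 233: masked back to the unsigned value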
I have a very simple java class that I am trying to port to Python, but I'm stuck on the line:
int value = Character.getNumericValue(character);
I have found this example, but I don't understand how Character.getNumericValue(character) could return 10 (since the ASCII value of 'A', for instance, is 65).
Without understanding this line, it's difficult to port it.
Character.getNumericValue() returns 10 for A because it is a digit in number systems with bases beyond the decimal system (hexadecimal, for example, and on up to base 36):
>>> int('A', 36)
10
>>> int('Z', 36)
35
From the Character.getNumericValue() documentation:
The letters A-Z in their uppercase ('\u0041' through '\u005A'), lowercase ('\u0061' through '\u007A'), and full width variant ('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have numeric values from 10 through 35. This is independent of the Unicode specification, which does not assign numeric values to these char values.
There is no direct Python equivalent; Character.getNumericValue() also interprets Roman numerals, for example, so you'd have to code such mappings yourself.
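For reference, here's what the Java side reports (the Roman-numeral case included):

System.out.println(Character.getNumericValue('A'));       // 10
System.out.println(Character.getNumericValue('Z'));       // 35
System.out.println(Character.getNumericValue('7'));       // 7
System.out.println(Character.getNumericValue('\u216B'));  // 12: Ⅻ, ROMAN NUMERAL TWELVE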
There is no single ready-made Python function that's equivalent, but I think two functions combined cover the whole range. For ASCII characters 0..9 and A..Z, int() with the proper base argument (36) suffices. For getting the numeric value of a Unicode code point, there's unicodedata.numeric, discounting the fact that it returns a float.
Reworded question, as it seems I wasn't specific enough:
Given an RSA system with p = 263, q = 587, public key e = 683, and private key d = 81599. Therefore n = pq = 154381. For a message, say "I AM A STUDENT", the encryption is conducted as follows:
Convert any letter (including blank space) into a 3-digit ASCII code, i.e. 073 032 065 077 032 065 032 083 084 085 068 069 078 084.
Join every two adjacent ASCII codes to form a block, i.e. 073032 065077 032065 032083 084085 068069 078084. (use 000 if last letter has nothing to join with).
Using the encryption algorithm c = m^e mod n, encrypt every block; c1 = 73032^683 mod 154381 = 103300, etc.
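(As a sanity check, the block arithmetic is easy to reproduce with java.math.BigInteger; a sketch using the values above:)

import java.math.BigInteger;

BigInteger n = BigInteger.valueOf(154381);
BigInteger e = BigInteger.valueOf(683);
BigInteger d = BigInteger.valueOf(81599);
BigInteger m = BigInteger.valueOf(73032);  // first block of "I AM A STUDENT"
BigInteger c = m.modPow(e, n);             // c = m^e mod n -> 103300, per step 3
System.out.println(c.modPow(d, n));        // c^d mod n recovers 73032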
Assume you are the receiver of a message: 33815872282353670979238213794429016637939017111351. What is the content?
After a bit more consideration, I'm thinking that since I have to decode in parts (i.e. decode 33815, then 87228, etc.), I should just split the decoded part in half and check whether each half is in the ASCII range; if not, go back to the original and split it differently. Does this sound like a better solution than trying to hack something out with regex?
P.S. The decoding is considered homework. I have done this by hand and know that the message decodes to "i hate cryptography" (it seems my lecturer has a sense of humor), so you're not helping me do my homework. Turning this into a program is just something extracurricular that I thought might be fun/interesting.
It is generally an incredibly bad idea to have variable-length records without a delimiter or index. In this case, the best approach is a fixed-width integer with leading zeros.
That said, you do actually have an implicit delimiter, assuming you're always reading the string from start to end without skipping anything: a leading 0 or 1 indicates a 3-digit number, and a leading 2-9 indicates a 2-digit number. Something like this would work:
[01][0-9][0-9]|[2-9][0-9]
But really - just print your numbers into the string with leading zeros. Or look into 2-character hexadecimal encoding if you're worried about space. Or Base64, or one of the other printable encodings.
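A quick sketch of that implicit-delimiter parse in Java (the digit string here is a hypothetical decrypted block):

java.util.regex.Matcher m = java.util.regex.Pattern
        .compile("[01][0-9][0-9]|[2-9][0-9]")
        .matcher("73032065");                         // hypothetical: "73", "032", "065"
while (m.find()) {
    int code = Integer.parseInt(m.group());
    System.out.println(code + " -> " + (char) code);  // 73 -> I, 32 -> space, 65 -> A
}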
I'm now working on a challenge from the website http://www.net-force.nl/challenges/ and I've hit an interesting problem I can't solve. I'm not asking for the whole result (as that would be breaking the rules), but I need help with the programming theory behind the hash function.
Basically, it's based on a Java applet with one text field, where the user has to enter the right password. When I decompiled the .class file, one of the methods I got was this hash method.
The String s contains the entered password, passed directly to the method:
private int hash(String s)
{
    int i = 0;
    for (int j = 0; j < s.length(); j++)
        i += s.charAt(j);  // each char is implicitly widened to an int
    return i;
}
The problem is that the method returns an integer as the "hash", but how can characters be converted to integers at all? I had the idea that maybe the password is a number, but that doesn't lead anywhere. Another idea involves ASCII, but still nothing.
Thanks for any help or tips.
The trick is that it's converting each character into an integer. Each character (char) in Java is a UTF-16 code unit. For the most part¹, you can just think of it as each character being mapped to a number between 0 and 65535 inclusive, in a scheme called Unicode. For example, 65 is the number for 'A', and if you'd typed in the Euro symbol, that would map to Unicode U+20AC (8364).
Your hashing function basically adds together the numbers for each character in the string. It's a very poor hash (in particular it gives the same results for the same characters regardless of ordering), but hopefully you'll get the idea.
¹ Things get trickier when you need to bear in mind surrogate pairs, where a single Unicode character is actually made up of two UTF-16 code units - that's for characters with a Unicode number of more than 65535. Let's stick to the basics for the moment though :)
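To make this concrete, a tiny sketch of chars behaving as numbers:

System.out.println((int) 'A');  // 65: a char, viewed as its UTF-16 number
System.out.println('A' + 'B');  // 131: both chars are widened to ints and summed, just like the hash
System.out.println('B' + 'A');  // 131 again: reordering doesn't change the sum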
The hash function you present is the simplest hashing function you could possibly write for a string.
It is easy to implement and really fast to compute.
It is problematic, though, since it doesn't distribute its inputs well.
Assuming ASCII chars and a password of, say, 8 characters, the hash can take values from 0 to 8 × 127 = 1016, since an ASCII char is between 0 and 127.
That is, each character in the string is "treated" as its ASCII equivalent (for a more advanced analysis, check @John's answer).
Anyway, you should note that strings containing the same characters in a different order map to the same hash value with this function. Perhaps that is of interest for the challenge you are trying to attack?