Adding two Unicode chars in Java

I'm stuck on my program that emulates an 8-bit processor. I have an array of chars, written as Unicode escapes (hex), that I call memory. I'm trying to pass my operand, of type int, a two-byte word made from two consecutive values in that array. Let's say myArray[1] holds '\u00BF' and myArray[2] holds '\u00FF'; I want to pass '\uBFFF' to my operand. I was thinking of converting each char's byte value to a string, concatenating the two, and then parsing the result into an int. Any help would be GREATLY appreciated!
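One way to do this without going through strings is to shift and OR the two values. The sketch below is only an illustration under the assumption that each memory cell holds a value from 0x00 to 0xFF; the class name WordFromMemory and the method readWord are made up for this example.

```java
final class WordFromMemory {
    // Combine two consecutive 8-bit cells into one 16-bit word (high byte first).
    // Assumes each cell holds a value in the range 0x00 to 0xFF.
    static int readWord(char[] memory, int address) {
        int high = memory[address] & 0xFF;     // mask to the low 8 bits
        int low = memory[address + 1] & 0xFF;
        return (high << 8) | low;              // high byte goes in bits 8..15
    }

    public static void main(String[] args) {
        char[] memory = {0x00, '\u00BF', '\u00FF'};
        int operand = readWord(memory, 1);
        System.out.println(Integer.toHexString(operand)); // prints bfff
    }
}
```

If a char is needed rather than an int, the result can be cast with (char) since it always fits in 16 bits.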

Related

java unicode value of char

When I do Collections.sort(List), it sorts based on String's compareTo() logic, where it compares both strings char by char.
List<String> file1 = new ArrayList<String>();
file1.add("1,7,zz");
file1.add("11,2,xx");
file1.add("331,5,yy");
Collections.sort(file1);
My understanding is char means it specifies the unicode value, I want to know the unicode values of char like ,(comma) etc. How can I do it? Any url contains the numeric value of these?
My understanding is char means it specifies the unicode value, I want to know the unicode values of char like ,(comma) etc
Well there's an implicit conversion from char to int, which you can easily print out:
int value = ',';
System.out.println(value); // Prints 44
This is the UTF-16 code unit for the char. (As fge notes, a char in Java is a UTF-16 code unit, not a Unicode character. There are Unicode code points greater than 65535, which are represented as two UTF-16 code units.)
Any url contains the numeric value of these?
Yes - for more information about Unicode, go to the Unicode web site.
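To tie this back to the sorting in the question, here is a small sketch that prints the numeric values of ',' and '1' and then sorts the same three strings. The class name SortByCodeUnit and the helper sorted are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortByCodeUnit {
    // Return a sorted copy, using String's natural (char by char) ordering.
    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println((int) ','); // prints 44
        System.out.println((int) '1'); // prints 49

        List<String> file1 = Arrays.asList("11,2,xx", "331,5,yy", "1,7,zz");
        // "1,7,zz" sorts before "11,2,xx" because ',' (44) is less than '1' (49).
        System.out.println(sorted(file1)); // prints [1,7,zz, 11,2,xx, 331,5,yy]
    }
}
```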
Uhm no, char is not a "unicode value" (and the word to use is Unicode code point).
A char is a code unit in the UTF-16 encoding. And it so happens that in Unicode's Basic Multilingual Plane (ie, Unicode code points ranging from U+0000 to U+FFFF, for code points defined in this range), yes, there is a 1-to-1 mapping between char and Unicode.
In order to know the numeric value of a code point you can just do:
System.out.println((int) myString.charAt(0));
But this IS NOT THE CASE for code points outside the BMP. For these, one code point translates to two chars. See Character.toChars(). And more generally, all static methods in Character relating to code points. There are quite a few!
This also means that String's .length() is actually misleading, since it returns the number of chars (UTF-16 code units), not the number of code points, let alone graphemes.
Demonstration with one Unicode emoticon (the first in that page):
System.out.println(new String(Character.toChars(0x1f600)).length());
prints 2. Whereas:
final String s = new String(Character.toChars(0x1f600));
System.out.println(s.codePointCount(0, s.length()));
prints 1.
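If you need to walk a string one code point at a time rather than one char at a time, something like the following should work on Java 8 or later. The class name CodePointWalk and the helper codePointsOf are just names picked for this sketch.

```java
final class CodePointWalk {
    // Collect the code points of a string; a surrogate pair becomes one entry.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F600));
        System.out.println(s.length()); // prints 3 (one char for 'a', two for the emoticon)
        for (int cp : codePointsOf(s)) {
            // Character.charCount tells how many chars this code point needs.
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " uses " + Character.charCount(cp) + " char(s)");
        }
    }
}
```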

String's charAt method using long

I want to find the character at a particular position in a very large string. However, I am unable to use the charAt() method because the index exceeds the range of int. Is there a workaround for this?
In Java, Strings are backed by a character array. The theoretical size of an array is limited by the maximum value of int, so it's impossible to have a String with over 2^31-1 characters to begin with.
To overcome this issue you can create a string class of your own that uses multiple arrays or strings as storage.
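A rough sketch of that idea, assuming the text is appended piece by piece and stored in fixed-size chunks. ChunkedText, its chunk size parameter, and its methods are all hypothetical names for this example; this is not an existing library class.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedText {
    private final int chunkSize;
    private final List<StringBuilder> chunks = new ArrayList<>();
    private long length;

    ChunkedText(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Append text, starting a new chunk whenever the current one is full.
    void append(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (chunks.isEmpty() || chunks.get(chunks.size() - 1).length() == chunkSize) {
                chunks.add(new StringBuilder());
            }
            chunks.get(chunks.size() - 1).append(text.charAt(i));
            length++;
        }
    }

    // Look up a char by a long index: pick the chunk, then the offset inside it.
    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int chunk = (int) (index / chunkSize);
        int offset = (int) (index % chunkSize);
        return chunks.get(chunk).charAt(offset);
    }

    long length() {
        return length;
    }

    public static void main(String[] args) {
        ChunkedText text = new ChunkedText(4); // tiny chunks just for the demo
        text.append("hello world");
        System.out.println(text.charAt(6L)); // prints w
        System.out.println(text.length());   // prints 11
    }
}
```

In real use the chunk size would be much larger, and the data would likely be streamed from a file rather than held entirely in memory.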
Would taking just a shorter substring from the large string and accessing the corresponding position help?
As a String is internally represented by an array of chars, its maximum length cannot exceed the maximum value of an int. So in the first place you cannot have a String whose indices exceed the range of int.

I am having trouble creating a 16bit char in java

How can I create a variable character that can hold a four byte value?
I am trying to write a program to encrypt messages in Java, for fun. I figured out how to use RSA, and managed to write a program that will encrypt a message and save it to a .txt file.
For example if "Quiet" is entered the outcome will be "041891090280". I wrote my code so that the number would always have length that is a multiple of six. So I thought that I could convert the numbers into a hash code. The first three letters are "041" so I could convert that into ")".
However, I am having trouble creating a char with a number greater than 255. I have looked around online and found a few examples, but I can't figure out how to implement them. I created a new method just to test them.
int a = 256;
char b = (char) a;
char c = 0xD836;
char[] cc = Character.toChars(0x1D50A);
System.out.println(b);
System.out.println(c);
System.out.println(cc);
The program outputs
?
?
?
I am only getting two bytes. I read that Java uses Unicode which should go up to 65535 which is four bytes. I am using eclipse if that makes a difference.
I apologize for the noob question.
And thanks in advance.
edit
I am sorry, I think I gave too much information and ended up being confusing.
What I want to do is store a string of numbers as a string of Unicode characters. The only way I know how to do that is to break the number string into pieces small enough to fit into a character, then add the characters one by one to a new string. But I don't know how to add a variable Unicode character to a string.
All chars are 16-bit already. 0 to 65535 only need 16-bit and 2^16 = 65536.
Note: not all char values are valid characters on their own. In particular, 0xD800 to 0xDFFF are surrogates, used in pairs to encode code points beyond 65535 (U+FFFF).
If you want to be able to store all possible 16-bit values I suggest you use short instead. You can store the same values but it may be less confusing to use.
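For the edit in the question (adding a variable character to a string), a cast plus StringBuilder.append is probably all that is needed. The sketch below assumes the digit string is split into three-digit groups, so every value is between 0 and 999 and stays clear of the surrogate range; DigitPacking, pack, and unpack are names invented here. The "?" output in the question may well be the console failing to display the character rather than the char holding the wrong value, so this example prints numbers instead.

```java
final class DigitPacking {
    // Turn each three-digit group into one char and append it to the result.
    // Assumes digits.length() is a multiple of three.
    static String pack(String digits) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i += 3) {
            int value = Integer.parseInt(digits.substring(i, i + 3));
            out.append((char) value); // values 0..999 always fit in a char
        }
        return out.toString();
    }

    // Reverse of pack: each char becomes a zero-padded three-digit group.
    static String unpack(String packed) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < packed.length(); i++) {
            out.append(String.format("%03d", (int) packed.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String packed = pack("041891090280");
        System.out.println(packed.length());         // prints 4
        System.out.println((int) packed.charAt(1));  // prints 891
        System.out.println(unpack(packed));          // prints 041891090280
    }
}
```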

Retrieving 1D arrays from NetCDF char array variables in HDF5 files using Java

Using the Java API for NetCDF, I have an HDF5 file with an array of type CHAR, which according to the documentation is similar to strings containing only ASCII characters: "The char type contains uninterpreted characters, one character per byte. Typically these contain 7-bit ASCII characters." In HDFView, an example of one of these values in the array is "13".
I know that for an array of integers I can get them all as a Java array like this:
int[] data = (int[]) netCDFArray.get1DJavaArray(int.class);
But how do I get back an array of this CHAR type? Unfortunately the documentation I referenced is of no help.
The following cannot be correct, because some of the items are more than single characters:
char[] data = (char[]) netCDFArray.get1DJavaArray(char.class);
The following attempts all throw a ForbiddenConversionException:
char[] data = (char[]) netCDFArray.get1DJavaArray(char.class);
char[][] data = (char[][]) netCDFArray.get1DJavaArray(char[].class);
String[] data = (String[]) netCDFArray.get1DJavaArray(String.class);
If I use netCDFArray.toString() I see my array of strings, because ArrayChar uses a StringIterator. I too could use such an iterator and do something with each string, I suppose. But I don't need to get an int iterator to retrieve integers. How can I efficiently retrieve all strings of a CHAR type in one go, analogous to how I can retrieve integers (see above)? I would be content with retrieving a Java String[], CharSequence[], or char[][].
It seems that a NetCDF string of type CHAR is represented logically as a two-dimensional array of type char, but internally it is stored as a single array of type char. Therefore, the most efficient way to retrieve the data is the following:
char[] data = (char[]) netCDFArray.get1DJavaArray(char.class);
One must then extract the individual strings from this single array:
assert netCDFArray.getRank()==2 : "Expected a two-dimensional logical array of chars.";
int stringLength=netCDFArray.getShape()[1];
int stringCount=netCDFArray.getShape()[0];
//iterate through stringCount positions of stringLength length
The added twist is that apparently strings are stored as zero-terminated, that is, the supposedly fixed-length strings apparently may be variable-length strings of less than stringLength using ASCII 0 as a delimiter. I derived this from the code; I couldn't find it in the documentation.
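The iteration step could look roughly like this. It deliberately avoids any NetCDF calls and works on a plain char[], assuming the flat array is laid out row by row (stringCount rows of stringLength chars) and that a 0 char ends a string early, as described above. FlatCharSplitter and split are made-up names.

```java
final class FlatCharSplitter {
    // Split a flat char array into stringCount strings of at most stringLength
    // chars each, stopping early at the first zero char in each row.
    static String[] split(char[] flat, int stringCount, int stringLength) {
        String[] result = new String[stringCount];
        for (int i = 0; i < stringCount; i++) {
            int start = i * stringLength;
            int len = 0;
            while (len < stringLength && flat[start + len] != '\0') {
                len++;
            }
            result[i] = new String(flat, start, len);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three rows of width three: "13", "7", and "256" (the last has no terminator).
        char[] flat = {'1', '3', '\0', '7', '\0', '\0', '2', '5', '6'};
        for (String s : split(flat, 3, 3)) {
            System.out.println(s);
        }
    }
}
```

With the variables from the snippet above, the call would be split(data, stringCount, stringLength).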

Casting int to char in java- must I use ASCII?

Q: When casting an int to a char in Java, it seems that the default result is the ASCII character corresponding to that int value. My question is, is there some way to specify a different character set to be used when casting?
(Background info: I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process.
I have been able to do this, but currently I have only been able to compress up to 6 "bits" into a single character, because when I allow for larger amounts, there are some values in the range which do not seem to be handled well by ASCII; they become boxes or question marks and when they are cast back into an int, their original value has not been preserved. If I could use another character set, I imagine I could avoid this problem and compress the binary by 8 bits at a time, which is my goal.)
I hope this was clear, and thanks in advance!
Your problem has nothing to do with ASCII or character sets.
In Java, a char is just a 16-bit integer. When casting ints (which are 32-bit integers) to chars, the only thing you are doing is keeping the 16 least significant bits of the int, and discarding the upper 16 bits. This is called a narrowing conversion.
References:
http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#20232
http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#25363
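A quick way to see the narrowing in action (NarrowingCast and narrow are just illustrative names):

```java
final class NarrowingCast {
    // Casting int to char keeps only the low 16 bits.
    static char narrow(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        // A value that fits in 16 bits survives the round trip unchanged.
        System.out.println((int) narrow(200));                    // prints 200
        // The upper 16 bits are discarded: 0x12345 becomes 0x2345.
        System.out.println(Integer.toHexString(narrow(0x12345))); // prints 2345
        // char is unsigned, so -1 (all bits set) becomes 65535.
        System.out.println((int) narrow(-1));                     // prints 65535
    }
}
```

So any int from 0 to 65535 can be cast to char and back without losing its value; whether the char displays as a box or a question mark is a separate matter from what it stores.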
The conversion between characters and integers uses the Unicode values, of which ASCII is a subset. If you are handling binary data you should avoid characters and strings and instead use an integer array - note that Java doesn't have unsigned 8-bit integers.
What you are looking for is not a cast, it's a conversion.
There is a String constructor that takes an array of byte and a charset encoding. This should help you.
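As a sketch of that constructor in use: the example below picks ISO-8859-1 as the charset, which is my assumption rather than something stated in the question, because it maps each of the 256 byte values to exactly one char and back. ByteTextRoundTrip, toText, and toBytes are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteTextRoundTrip {
    // Decode bytes to a String, one char per byte.
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Encode the String back to the original bytes.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i; // every possible 8-bit value
        }
        String text = toText(all);
        System.out.println(text.length());                     // prints 256
        System.out.println((int) text.charAt(200));            // prints 200
        System.out.println(Arrays.equals(all, toBytes(text))); // prints true
    }
}
```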
I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process.
You don't mention why you are doing that, and (to be honest) it's a little hard to follow what you're trying to describe (for one thing, I don't see why the resulting characters would be "compressed" in any way).
If you just want to represent binary data as text, there are plenty of standard ways of accomplishing that. But it sounds like you may be after something else?
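Base64 is one such standard way, and it ships with Java 8 and later as java.util.Base64. Whether it fits your use case is a guess on my part; the class BinaryAsText and its two helpers are names made up for this sketch.

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryAsText {
    // Encode arbitrary bytes as printable ASCII text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode the text back into the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xBF, (byte) 0xFF, 0x00, 0x7F};
        String text = encode(data);
        System.out.println(text);
        System.out.println(Arrays.equals(data, decode(text))); // prints true
    }
}
```

Note that this makes the text longer than the binary input (four characters for every three bytes), so it is a representation, not a compression.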
