This question already has answers here:
Creating Unicode character from its number
(13 answers)
Closed 13 days ago.
I have to read a string from a file and display the corresponding Unicode character in a text field in my application.
For example, I read the string "e13a" from the file and I'd like to display the corresponding "\ue13a" character in the text field.
Is there a way to obtain the desired behaviour?
I have already tried escaping the string directly in the file, but I always get the raw string instead of the Unicode character.
tl;dr
Character.toString( Integer.parseInt( "e13a" , 16 ) )
See this code run at Ideone.com.
Code point
Parse your input string as a hexadecimal number (base 16) to obtain an int value.
That number represents a code point, the number permanently assigned to each of the over 144,000 characters defined in Unicode. Code points range from zero to just over one million, with most of that range unassigned.
String input = "e13a" ;
int codePoint = Integer.parseInt( input , 16 ) ;
Instantiate a String object whose content is the character identified by that code point.
String output = Character.toString( codePoint ) ;
Avoid char
The char type has been essentially broken since Java 2, and legacy since Java 5. As a 16-bit value, char is physically incapable of representing most characters.
To work with individual characters, use code point integers as seen above.
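Putting the pieces together, a minimal sketch (assuming Java 11+ for the Character.toString(int) overload) might look like this:

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // The hex string as read from the file.
        String input = "e13a";

        // Parse it as a base-16 number to get the code point.
        int codePoint = Integer.parseInt(input, 16);

        // Build a String containing that single character.
        String output = Character.toString(codePoint);
        System.out.println(output);

        // The int overload also handles supplementary characters,
        // e.g. U+1F600, which a single char cannot represent.
        String supplementary = Character.toString(0x1F600);
        System.out.println(supplementary.length());   // 2 (a surrogate pair)
    }
}
```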
I posted the question after a lot of trying and searching.
Shortly after posting I found a more trivial solution than I expected:
The converted string is:
String converted = String.valueOf((char) Integer.parseInt(unicodeString,16));
where "unicodeString" is the string I read from the file.
This question already has answers here:
Does java define string as null terminated?
(3 answers)
Closed 3 years ago.
I have searched but I found mixed answers. One site says string is terminated with a special character ‘\0’.
What is the purpose of the \0 special character, and is the source correct?
Source 1
Source 2
The \0 escape is part of a set of escape sequences for octal values (link). If you try foo\377bar, the \377 is converted from octal to the corresponding character and it reads as fooÿbar. The \0 escape is simply octal zero, which maps to the NUL character, a non-printing character, so it displays as nothing at all. Read here.
This suggests in Java \0 is not used to null terminate a string, and further research shows Java doesn't use anything to null terminate strings, as the length of the string is enough to tell it that it has reached the end of the string.
This can be proven by running the following:
String test = "foo\0bar";
char[] list = test.toCharArray();
for (char ch : list) System.out.println(ch);
System.out.println(test);
EDIT: Java actually uses Unicode. This does not change the effect described above, however. (Unicode escapes of the form \uXXXX can even appear in identifiers, because they are processed in an early lexical-analysis stage of the compiler.) This source goes through the escape sequences and explains how they can be used in Java. \0 is the same as \0n with n also 0, and denotes the null character. It is somewhat like epsilon in a regular expression, which represents the empty string.
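A short sketch to confirm that an embedded NUL is an ordinary character rather than a terminator:

```java
public class NulDemo {
    public static void main(String[] args) {
        String test = "foo\0bar";

        // The length is 7: the NUL counts like any other character.
        System.out.println(test.length());      // 7

        // The text after the NUL is still reachable.
        System.out.println(test.indexOf('\0')); // 3
        System.out.println(test.substring(4));  // bar
    }
}
```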
I've recently started learning Java. I am currently working on a few daily exercises related to stdout and formatting.
Exercise:
Each line of output must have 2 columns
1st column: string left justified to 15 characters
2nd column: integer expressed in three digits
I finally figured out a way and came up with:
System.out.format("%-15s %03d %n",s1 , x );
But when I ran the program there was an error.
Later, when I removed the spaces between the specifiers, i.e.
System.out.format("%-15s%03d%n",s1 , x );
the code seemed to work.
I just wanted to know what difference it makes when the space is not added.
The added space(s) within the format string are literal whitespace. This means you get the 15-character field for the supplied string plus one literal space, for a total width of 16 before the number begins. The same applies to the numeric value: the space after %03d adds one more whitespace after it. Knowing this, you can apply it to an example:
String s1 = "My Name";
int x = 61;
System.out.format("%-15s %03d %n", s1, x ); // Spaces in format.
System.out.format("%-15s%03d%n", s1, x ); // No Spaces in format.
The output will be:
My Name         061 
My Name        061
Whereas:
My Name         061 
^^^^^^^^^^^^^^^^^^^^
|      16      ||4 |
My Name        061
^^^^^^^^^^^^^^^^^^
|      15     ||3|
This format (either way) will not generate an error. What will generate an error is if one or both of the variables supplied to the format() method are of the wrong data type. Is the variable s1 declared as a String, and is the variable x declared as a decimal integer (int)? If not, you can end up with an IllegalFormatConversionException. See Here for the proper conversion tags to use with the format() method.
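For example, a sketch of the mismatch case (here deliberately passing a String where %d expects an integer):

```java
import java.util.IllegalFormatConversionException;

public class FormatErrorDemo {
    public static void main(String[] args) {
        String s1 = "My Name";
        int x = 61;

        // Correct types: works with or without the literal spaces.
        System.out.format("%-15s %03d %n", s1, x);
        System.out.format("%-15s%03d%n", s1, x);

        try {
            // Wrong type: %d cannot format a String argument.
            System.out.format("%-15s%03d%n", s1, "61");
        } catch (IllegalFormatConversionException e) {
            System.out.println("Caught: " + e);
        }
    }
}
```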
This question already has answers here:
What's the default value of char?
(14 answers)
Closed 4 years ago.
I have found that the default value of a char instance variable is '\u0000' (the Unicode null character). But when I tried the piece of code below, I could only see an empty print line. Please give me clarification.
public class Basics {
    char c;
    int x;

    public static void main(String[] args) {
        Basics s = new Basics();
        System.out.println(s.c);
        System.out.println(s.x);
    }
}
Console output as follows:
(empty line)
0
'\u0000' (char c = 0;) is a Unicode control character. You are not supposed to see it.
System.out.println(Character.isISOControl(s.c) ? "<control>" : s.c);
Try
System.out.println((int) s.c);
if you want to see the numeric value of the default char (which is 0).
Otherwise, it just prints a blank (not an empty line).
You can see that it's not an empty line if you add visible characters before and after s.c:
System.out.print ("--->");
System.out.print (s.c);
System.out.println ("<---");
will print:
---> <---
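The fragments above, assembled into one runnable sketch:

```java
public class BlankCharDemo {
    public static void main(String[] args) {
        char c = '\u0000';  // the default char value

        System.out.println((int) c);                   // 0: its numeric value
        System.out.println(Character.isISOControl(c)); // true: a control character

        // The markers show that a blank is printed, not nothing.
        System.out.print("--->");
        System.out.print(c);
        System.out.println("<---");
    }
}
```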
Could you please provide more information about why the Unicode null character was selected as the default value for the char data type? Is there any specific reason behind this?
It was recognized that the language that was to become Java needed to support multilingual character sets by default. At that time Unicode was the new standard way of doing it1. When Java first adopted Unicode, Unicode used 16-bit codes exclusively. That caused the Java designers to specify char as an unsigned 16-bit integral type. Unfortunately, Unicode rapidly expanded beyond 16 bits, and Java had to adapt by switching to UTF-16 as Java's native in-memory text encoding scheme.
For more background:
Why Java char uses UTF-16?
Why does Java use UTF-16 for the internal text representation
But note that:
In the latest versions of Java, you have the option of enabling a more compact representation for text data.
The width of char is so hard-wired that it would be impossible to change. In fact, if you want to represent a Unicode code point, you should use an int rather than a char.
1 - It is still the standard way. AFAIK there are no credible alternatives to Unicode at this time.
The specific reason that \u0000 was chosen as the default initial value for char, is because it is zero. Objects are default initialized by writing all zero bytes to all fields irrespective of their types. This maps to zero for integral types and floating point types, false for boolean, and null for reference types.
It so happens that the \u0000 character maps to the ASCII NUL control character which is a non-printing character.
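A small sketch of the all-zero-bytes rule across field types (array elements are default-initialized the same way as fields):

```java
public class DefaultsDemo {
    static char c;     // '\u0000'
    static int i;      // 0
    static double d;   // 0.0
    static boolean b;  // false
    static String s;   // null

    public static void main(String[] args) {
        System.out.println((int) c);  // 0: the numeric value of '\u0000'
        System.out.println(i);        // 0
        System.out.println(d);        // 0.0
        System.out.println(b);        // false
        System.out.println(s);        // null
    }
}
```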
This question already has answers here:
How does Java 16 bit chars support Unicode?
(3 answers)
Closed 8 years ago.
Someone asked a similar question, but I didn't really get the answer.
When I say
char myChar = 'k';
in Java, is it going to reserve 16 bits for it (according to the Java docs below)?
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
Now let's say I have a Unicode character '電', and assume that its code point is something like U+FFFF1. This code point could not be stored in 2 bytes, so would Java allocate extra bytes (a UTF-16 based string) for it?
In short, when I have something like this:
char myChar = '電';
assuming that its code point representation is long and will require more than 2 bytes,
how many bits will myChar have: 16 or 32?
Thanks
Java uses UTF-16, and yes, every Java char is 16 bits. From the Java Tutorial, Primitive Data Types,
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
Further, the Character Javadoc says (in part),
The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.
The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
So, supplementary characters (like your second example) aren't represented as a single 16-bit character.
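A sketch illustrating the difference (the '電' from the question is U+96FB, inside the BMP, so it fits in one char; the supplementary ideograph U+2F81A from the Javadoc quote does not):

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // '電' is U+96FB: inside the BMP, so it fits in a single 16-bit char.
        char myChar = '電';
        System.out.println(Integer.toHexString(myChar));   // 96fb

        // U+2F81A lies outside the BMP: it needs two chars (a surrogate pair).
        String supplementary = new String(Character.toChars(0x2F81A));
        System.out.println(supplementary.length());        // 2
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 1

        // The int-based API recognizes it as a letter; a lone surrogate char is not.
        System.out.println(Character.isLetter(supplementary.codePointAt(0))); // true
        System.out.println(Character.isLetter('\uD840'));                     // false
    }
}
```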
This question already has answers here:
How many characters can a Java String have?
(7 answers)
Size of Initialisation string in java
(3 answers)
Closed 9 years ago.
I am trying to solve a CodeChef problem in Java and found that I could not create a String with a length greater than one million characters (with my compiler, at least). I pasted the first one million decimal digits of Pi into a string (e.g. String PI = "3.1415926535...151") in the Java file, and it fails to compile. When I take out the Pi and replace it with a shorter string like "dog", the code compiles. Can anyone confirm whether this is indeed a limitation of Java?
Thanks.
Can anyone confirm if this is indeed a limitation of Java?
Yes. There is an implementation limit of 65535 on the length of a string literal1. It is not stated in the JLS, but it is implied by the structure of the class file; see JVM Spec 4.4.7 and note that the string length field is 'u2' ... which means a 16 bit unsigned integer.
Note that a String object can have up to 2^31 - 1 characters. The 2^16 - 1 limit is actually for string-valued constant expressions; e.g. string literals or concatenations of literals that are embedded in the source code of a Java program.
If you want a String that represents the first million digits of Pi, it would be better to read the characters from a file in the filesystem, or from a resource on the classpath.
1 - This limit is actually on the number of bytes in the (modified) UTF-8 representation of the string. If the string consists of characters in the range 0x01 to 0x7f, each byte represents a single character. Otherwise, a character can require up to three bytes, and a supplementary character (encoded as a surrogate pair) up to six.
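As a sketch of the file-based approach (the file name pi-digits.txt is hypothetical; Files.readString requires Java 11+):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PiFromFile {
    public static void main(String[] args) throws IOException {
        // Hypothetical file containing the digits. A runtime String is not
        // subject to the 65535-byte literal limit, only to 2^31 - 1 chars.
        Path file = Path.of("pi-digits.txt");
        String pi = Files.readString(file);
        System.out.println("Read " + pi.length() + " characters");
    }
}
```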
I think this problem is not related to string literals but to method size: http://chrononsystems.com/blog/method-size-limit-in-java. According to that, the size of a method cannot exceed 64k.