So I'm making a basic GUI with the NetBeans IDE (in Java), and I want to make a button with a √ sign in it. It didn't let me copy-paste the character in, so I tried using its ASCII code: char sqrt = (char) 251. Instead of the square root sign, however, it gave me "û", and I have no idea why. Can someone please explain why this is happening, and suggest how I should go about this?
Java characters are Unicode, not ASCII. Unicode code point 251 (U+00FB) is "Latin Small Letter U with Circumflex". To allow input of arbitrary Unicode characters using only basic ASCII symbols, Java provides an escape format for Unicode characters in source code. So, you can do this:
char sqrt = '\u221a';
since U+221A is the Unicode code point for the square root symbol.
This \uXXXX format can also be used in String literals:
String s = "The square root of 2 (\u221a2) is approximately 1.4142";
If you print that String, you will see
The square root of 2 (√2) is approximately 1.4142
Java uses Unicode, and the Unicode value for '√' is 8730 (hex 221A). So this should do it:
char sqrt = 8730;
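Since the original goal was a button in a NetBeans GUI, here is a minimal Swing sketch (the frame scaffolding is illustrative, not from the question) showing the escape in a button label:

import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.SwingUtilities;

public class SqrtButton {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("\u221A demo");
            // \u221A survives any source-file encoding,
            // so no copy-paste of √ is needed.
            frame.add(new JButton("\u221A"));
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}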
In the IntelliJ 2019.2 code editor, if I paste this entire line:
String abc = "A🥝Z" ;
The KIWIFRUIT character stays intact. Good.
But if I start with this line:
String abc = "AZ"; // Before pasting
…and paste the single character 🥝 between the A and the Z of the string, the KIWIFRUIT is replaced with its UTF-16 representation as a pair of high-surrogate and low-surrogate escapes: \uD83E\uDD5D.
String abc = "A\uD83E\uDD5DZ" ;
I understand that this pair of hex numbers is just an alternate way to represent the KIWIFRUIT character. But I would rather see the single character in my source code.
I have verified that the .java file is configured within IntelliJ to use UTF-8. And as I have shown, pasting the entire line results in the green emoji glyph being displayed within the code editor.
➥ Is there some way to turn off IntelliJ’s converting of high-numbered Unicode characters into a hex string?
Use Edit | Paste as Plain Text (Paste without Formatting or Paste Simple if you are using an older IDE version)
Mac: Alt+Shift+Cmd+V
PC: Alt+Shift+Ctrl+V
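Whichever way it is pasted, both spellings compile to exactly the same string constant, which a short sketch can confirm (assuming the source file is saved as UTF-8):

public class KiwiDemo {
    public static void main(String[] args) {
        String escaped = "A\uD83E\uDD5DZ";  // surrogate-pair escapes
        String raw = "A🥝Z";                // raw emoji in a UTF-8 source file

        System.out.println(escaped.equals(raw));  // true: identical contents
        System.out.println(raw.length());         // 4 UTF-16 code units
        System.out.println(raw.codePointCount(0, raw.length()));  // 3 code points
    }
}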
The string that I want to convert into a character array is ষ্টোর; it is a Bengali word in Unicode.
The problem is that when I convert it in Visual Studio it returns 6 characters, but when I convert it in Android Studio it shows 5 characters.
In VS I am using char[] arrayOfChars = someString.ToCharArray(); and in
Android Studio char[] arrayOfChars = someString.toCharArray();
N.B.: My Android Studio IDE and project encoding is UTF-8. I am expecting the same result in Android Studio as in Visual Studio.
Those two arrays are Unicode-equivalent, but are represented in different normalization forms. What seems to be happening is that the Java string (or its source representation) is in one normalization form, while the C# string is in the other.
This page contains a chart of different normalization forms for Bengali text; the fourth row there describes exactly what you're seeing.
I am only learning about this now, but it seems that the motivation is to keep Unicode implementations compatible with pre-existing encodings wherever possible and practical.
For example, one pre-existing encoding may have used a single character where another used two characters combined. The solution settled on by the Unicode folks is to support both, at the cost of not having a single "canonical" representation, as you've encountered here.
If you wish for your Java array to be in the "D" normalization form that your C# array seems to be using, it appears that this page provides such a function. You may be looking for something like:
someString = Normalizer.normalize(someString, Normalizer.Form.NFD);
Unicode Standard Annex #15 is the official document that describes these normalization forms.
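To see the effect in Java, a minimal sketch using the code points from the question:

import java.text.Normalizer;

public class NormalizeDemo {
    public static void main(String[] args) {
        String composed = "\u09B7\u09CD\u099F\u09CB\u09B0";  // ষ্টোর, NFC: 5 chars

        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(composed.length());    // 5
        System.out.println(decomposed.length());  // 6: U+09CB decomposes into U+09C7 + U+09BE
    }
}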
You must have entered the string differently.
The text ষ্টোর is Unicode code points 09B7 09CD 099F 09CB 09B0, i.e. 2487 2509 2463 2507 2480, as your Java shows.
The values shown by C#, i.e. 2487 2509 2463 2503 2494 2480, have the 4th character 2507 / 09CB split into the two characters 2503 2494 / 09C7 09BE.
Looking them up, they are:
ো ↔ 'BENGALI VOWEL SIGN O' (U+09CB)
vs.
ে ↔ 'BENGALI VOWEL SIGN E' (U+09C7)
া ↔ 'BENGALI VOWEL SIGN AA' (U+09BE)
which, combined, come out to the same thing:
ষ্টোর ↔ 09B7 09CD 099F 09CB 09B0
ষ্টোর ↔ 09B7 09CD 099F 09C7 09BE 09B0
They are combining characters, and there are different ways to combine characters to get the same result.
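A quick way to check which form a given string is actually in is to dump its code points; a minimal sketch:

public class CodePointDump {
    public static void main(String[] args) {
        String s = "\u09B7\u09CD\u099F\u09CB\u09B0";  // ষ্টোর in the composed (5-char) form
        // Print each code point as U+XXXX so the combining marks are visible.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}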
I have a ""binary"" string String s1 = "10011000" and want to print the corresponding Character (Ф) of this Byte, how can I make this?
I have read and tested so many solutions and tutorials...and can't find exactly what I want! Moreover, I think therected is an encoding problem.
For example, this code doesn't work, but why (I have "?" in output, so encoding problem?)?
int j = Integer.parseInt("10011000", 2);
System.out.println(new Character ((char)j));
Binary 10011000 is Unicode code point 152, an extended character that will only appear if your console's encoding supports it.
The character Ф is a Cyrillic capital letter; in Unicode, its hexadecimal value is \u0424. The binary string you are trying to parse is 152 in decimal. The binary string for \u0424 is 10000100100 (1060 decimal), so I would fix that first. And as others noted, until your environment's character set supports Unicode output, Java will substitute a "?" for any character that the current character set doesn't support. See Unicode characters in Eclipse for setting up the Eclipse console for Unicode.
You have used the wrong code. If you want to see Ф in the output, you need to change your code to this:
int j = Integer.parseInt("10000100100", 2);
System.out.println((char) j);
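If the parse is fixed but the console still prints "?", forcing a UTF-8 stream can help; a minimal sketch (the Charset-taking PrintStream constructor needs Java 10 or later, and the console must itself understand UTF-8):

import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class BinaryToChar {
    public static void main(String[] args) {
        int j = Integer.parseInt("10000100100", 2);  // 1060 = U+0424

        // Wrap stdout so characters are encoded as UTF-8
        // instead of the platform default.
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println((char) j);  // Ф
    }
}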
I am having some trouble encoding this string into the Code 128 barcode symbology.
Text to encode:
1021448642241082212700794828592311
I am using the universal encoder from idautomation.com:
https://www.bcgen.com/fontencoder/
I get the following output for the encoded text for Code 128:
Í*5LvJ8*r5;ÂoP<[7+.Î
However, in ";Âo" the character between the semi-colon and o (let us call it special A) - is not part of the extended character set used in Code128. (See the Latin Supplements at https://www.fonts2u.com/code-128.font)
Yet the same string shows a valid barcode at
https://www.bcgen.com/linear-barcode-creator.html
How?
If I use the output with the Special A on a webpage with a font face for barcodes, the special A character does not show up as the barcode (and that seems correct since the special A is not part of the character set).
What gives? Please help.
I am using the IDAutomation utility to encode the string to 128c symbology. If you can share code to do the encoding (in Java/Python/C/Perl) that would help too.
There are multiple fonts for Code128 that may use different characters to represent the barcode symbols. Make sure the font and the encoding logic match each other.
I used this one http://www.jtbarton.com/Barcodes/Code128.aspx (there is also sample code on the site showing how to encode it, but you have to translate it from VB). The font works for all three code sets (A, B and C).
Sorry, this is very late.
When you are dealing with the encoding of code 128, in any subset, it's a good idea to think of that coding in terms of numbers, not characters. At this level, when you have shifts, code-changes, checksums and stuff, intermixed with the data, the whole concept of "character" is lost.
However, this is what is happening:
The semicolon in the output corresponds to "27"
The lowercase o corresponds to "79" and the P to "48"
The "A with Macron" corresponds to your "00" sequence. This is why you should be dealing with numbers, not characters, at this level of encoding.
How would you expect it to show a character with a code of 00? That would be a space or NULL, neither of which is particularly visible.
Your software has simply rendered it the best way it can, which is to make the character 'visible' by adding 0x80 to it. If you look at charmap, you will see that code 0x80 is indeed A with macron.
The rest (indeed all) of your encoded string looks correct for a set-C encodation.
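Since the question also asked for code, here is a sketch in Java of that numbers-first view: it rebuilds the symbol values and check symbol for the data above, and deliberately stops short of mapping values to glyphs, since that mapping is font-specific (which is exactly where the "special A" confusion came from):

public class Code128CValues {
    public static void main(String[] args) {
        String digits = "1021448642241082212700794828592311";

        int startC = 105;                      // symbol value of Start Code C
        int checksum = startC;
        StringBuilder values = new StringBuilder("105");

        // Code set C packs each pair of digits into one symbol value 0..99.
        for (int i = 0, pos = 1; i < digits.length(); i += 2, pos++) {
            int v = Integer.parseInt(digits.substring(i, i + 2));
            checksum += v * pos;               // position-weighted sum
            values.append(' ').append(v);
        }
        values.append(' ').append(checksum % 103);  // check symbol (14 here, the '.')
        values.append(" 106");                      // Stop symbol
        System.out.println(values);
    }
}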
Hey all, I have only just started attempting to learn Java and have run into something that is really confusing!
I was typing out an example from the book I am using. It is to demonstrate the
char data type.
The code is as follows :
public class CharDemo
{
    public static void main(String[] args)
    {
        char a = 'A';
        char b = (char) (a + 1);
        System.out.println(a + b);                // chars promote to int: prints 131
        System.out.println("a + b is " + a + b);  // string concatenation: prints "a + b is AB"
        int x = 75;
        char y = (char) x;                        // 75 is the code for 'K'
        char half = '\u00AB';
        System.out.println("y is " + y + " and half is " + half);
    }
}
The bit that is confusing me is the statement char half = '\u00AB'. The book states that \u00AB is the code for the symbol '1/2'. As described, when I compile and run the program from cmd, the symbol produced on this line is in fact a '1/2'.
So everything appears to be working as it should. I decided to play around with the code and try some different Unicode values. I googled multiple Unicode tables and found none of them to be consistent with the above result.
In every one I found, it stated that the code \u00AB was not for '1/2' and was in fact for this:
http://www.fileformat.info/info/unic...r/ab/index.htm
So what character set is Java using? I thought Unicode was supposed to be just that: uni, only one. I have searched for hours and nowhere can I find a character set that states \u00AB is equal to 1/2, yet this is what my Java compiler interprets it as.
I must be missing something obvious here! Thanks for any help!
It's a well-known problem with console encoding mismatch on Windows platforms.
Java Runtime expects that encoding used by the system console is the same as the system default encoding. However, Windows uses two separate encodings: ANSI code page (system default encoding) and OEM code page (console encoding).
So, when you try to write Unicode character U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK to the console, Java runtime expects that console encoding is the ANSI encoding (that is Windows-1252 in your case), where this Unicode character is represented as 0xAB. However, the actual console encoding is the OEM encoding (CP437 in your case), where 0xAB means ½.
Therefore, printing data to the Windows console with System.out.println() produces wrong results.
To get correct results you can use System.console().writer().println() instead.
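For example (System.console() returns null when output is redirected, so a sketch should guard against that):

import java.io.Console;

public class ConsoleDemo {
    public static void main(String[] args) {
        Console console = System.console();
        if (console != null) {
            // The Console writer uses the real console encoding
            // (the OEM code page on Windows), not the ANSI default.
            console.writer().println('\u00AB');
        } else {
            System.out.println('\u00AB');  // fallback when output is redirected
        }
    }
}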
The \u00ab character is not the 1/2 character; see this definitive code page from the Unicode.org website.
What you are seeing is (I think) a consequence of using the System.out PrintStream on a platform where the default character encoding is not UTF-8 or Latin-1. Maybe it is some Windows character set, as suggested by @axtavt's answer? (It also has a plausible explanation of why \u00ab is displayed as 1/2 ... and not some "splat" character.)
(In Unicode and Latin-1, \u00BD is the code point for the 1/2 character.)
0xAB is 1/2 in good old Codepage 437, which is what Windows terminals will use by default, no matter what codepage you actually set.
So, in fact, the char value represents the "«" character to a Java program, and if you render that char in a GUI or run it on a sane operating system, you will get that character. If you want to see proper output in Windows as well, switch your Font settings in CMD away from "Raster Fonts" (click top-left icon, Properties, Font tab). For example, with Lucida Console, I can do this:
C:\Users\Documents>java CharDemo
131
a + b is AB
y is K and half is ½
C:\Users\Documents>chcp 1252
Active code page: 1252
C:\Users\Documents>java CharDemo
131
a + b is AB
y is K and half is «
C:\Users\Documents>chcp 437
Active code page: 437
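To check which encoding Java actually picked up as its default, a one-liner is enough (the result varies by system and locale):

import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // On a western-locale Windows machine this typically prints windows-1252,
        // even though the console itself is using CP437.
        System.out.println(Charset.defaultCharset());
    }
}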
One great thing about Java is that it is Unicode-based. That means you can use characters from writing systems other than the English alphabet (e.g. Chinese or math symbols), not just in data strings, but in class and variable names too.
Here's an example using Unicode characters in class names and variable names.
class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println(x1.北);
        System.out.println(x1.π);
    }
}
Java was created around the time when the Unicode standard defined values for a much smaller set of characters. Back then it was felt that 16 bits would be more than enough to encode all the characters that would ever be needed. With that in mind, Java was designed to use UTF-16. In fact, the char data type was originally used to represent a 16-bit Unicode code point.
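That assumption no longer holds: supplementary characters need two chars. A short sketch, using U+1D11E MUSICAL SYMBOL G CLEF as the classic example:

public class CharLimitDemo {
    public static void main(String[] args) {
        String clef = "\uD834\uDD1E";  // U+1D11E as a surrogate pair
        System.out.println(clef.length());        // 2: one char no longer equals one code point
        System.out.println(clef.codePointAt(0));  // 119070, i.e. 0x1D11E
    }
}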
The UTF-8 charset is specified by RFC 2279;
The UTF-16 charsets are specified by RFC 2781
The UTF-16 charsets use sixteen-bit quantities and are therefore sensitive to byte order. In these encodings the byte order of a stream may be indicated by an initial byte-order mark represented by the Unicode character '\uFEFF'. Byte-order marks are handled as follows:
When decoding, the UTF-16BE and UTF-16LE charsets ignore byte-order marks; when encoding, they do not write byte-order marks.
When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.
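Those rules are easy to verify by encoding a single character with each charset and dumping the bytes; a sketch:

import java.nio.charset.Charset;

public class BomDemo {
    public static void main(String[] args) {
        for (String name : new String[] { "UTF-16", "UTF-16BE", "UTF-16LE" }) {
            byte[] bytes = "A".getBytes(Charset.forName(name));
            StringBuilder hex = new StringBuilder(name + ":");
            for (byte b : bytes) {
                hex.append(String.format(" %02X", b));
            }
            System.out.println(hex);
        }
        // Expected output:
        // UTF-16:   FE FF 00 41   (big-endian BOM written when encoding)
        // UTF-16BE: 00 41         (no BOM)
        // UTF-16LE: 41 00         (no BOM)
    }
}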
Also see this
Well, when I use that code I get the « as I should, and 1/2 for \u00BD, as it should be.
http://www.unicode.org/charts/