Cannot compile Java file with non-ASCII character

Important:
I must use plain Windows Notepad only (no IDE, Notepad++ or any other text editor is allowed).
So I have a simple class:
class Test {
    public static void main(String[] args) {
        char c = 'қ';
        System.out.println(c);
    }
}
By default, Notepad saves text files using the ANSI encoding, but as you can see I have a non-ANSI character in my code. I can compile and run this code via the command prompt, but the output is ? instead of қ, which seems obvious. When I change the file's encoding to UTF-8, the compiler throws an error. I have read the question Illegal Character when trying to compile java code, but there is no solution for my particular problem because, as I wrote above, I am not allowed to use any text editor but Windows Notepad.
Thank you!

Probably you need something like this:
char c = '\u049B';
U+049B is the code point of қ (CYRILLIC SMALL LETTER KA WITH DESCENDER); you can look up the codes of other characters at https://www.ssec.wisc.edu/~tomw/java/unicode.html
Also, hopefully the Windows console can display this character.
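A minimal sketch, assuming Java 7 or later (for Character.getName): қ is U+049B (CYRILLIC SMALL LETTER KA WITH DESCENDER), and a pure-ASCII escape like the one below compiles identically whatever encoding Notepad and javac use. The class name KaEscape is only illustrative; printing the Unicode name is one way to check you picked the right code point without trusting the console.

```java
class KaEscape {
    // U+049B is CYRILLIC SMALL LETTER KA WITH DESCENDER; the source stays pure ASCII
    static final char KA = '\u049B';

    // Formats a char as "U+XXXX NAME" using the official Unicode character name
    static String describe(char c) {
        return String.format("U+%04X %s", (int) c, Character.getName(c));
    }

    public static void main(String[] args) {
        System.out.println(describe(KA)); // U+049B CYRILLIC SMALL LETTER KA WITH DESCENDER
    }
}
```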
P.S. The Windows console uses a specific code page. Try changing it in the console, for example:
REM change CHCP to UTF-8
CHCP 65001
CLS
Also remember that Windows console fonts differ, and some of them can't draw certain characters.
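On the Java side, a sketch of the counterpart to CHCP 65001 is to encode standard output as UTF-8 explicitly instead of relying on the JVM's default charset. The class and method names are hypothetical, and whether the letter actually appears still depends on the console code page and font.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps any stream in a PrintStream that always encodes text as UTF-8,
    // independent of the JVM's default charset
    static PrintStream utf8(OutputStream os) {
        try {
            return new PrintStream(os, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        // Writes the bytes D2 9B; the console shows the letter only with
        // a UTF-8 code page (CHCP 65001) and a font that has the glyph
        out.println('\u049B');
    }
}
```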

Yes, the problem is that javac is non-compliant in not accepting the byte-order mark (BOM) that Notepad writes at the start of UTF-8 files.
Use Notepad to save as Unicode (actually UTF-16LE).
Compile with
javac -encoding UTF-16 Test.java
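To see why UTF-16 works where UTF-8 fails, here is a sketch (Java 7+ for StandardCharsets) that decodes hand-written byte arrays imitating what Notepad saves. The byte arrays are an assumption about Notepad's output (EF BB BF for UTF-8 with BOM, FF FE for "Unicode"), not captured files, and the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Imitates Notepad's "UTF-8" save of the text "class": BOM EF BB BF, then the letters
    static final byte[] UTF8_WITH_BOM =
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'c', 'l', 'a', 's', 's'};
    // Imitates Notepad's "Unicode" save: BOM FF FE, then UTF-16 little-endian code units
    static final byte[] UTF16LE_WITH_BOM =
        {(byte) 0xFF, (byte) 0xFE, 'c', 0, 'l', 0, 'a', 0, 's', 0, 's', 0};

    public static void main(String[] args) {
        // The UTF-8 decoder keeps the BOM as the character U+FEFF, which javac
        // then reports as an illegal character at the start of the file
        String utf8 = new String(UTF8_WITH_BOM, StandardCharsets.UTF_8);
        System.out.println(utf8.length() + " " + (utf8.charAt(0) == '\uFEFF')); // 6 true

        // The UTF-16 decoder consumes the BOM and uses it to choose little-endian order
        String utf16 = new String(UTF16LE_WITH_BOM, StandardCharsets.UTF_16);
        System.out.println(utf16.length() + " " + utf16); // 5 class
    }
}
```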

Related

(Intellij) "unclosed character literal" and "illegal character '\u00a7'" when compiling

I'm trying to compile the '§' character into a char (char c = '§'), but when I try to build, the build output says "Unclosed character literal" and "Illegal character: '\u00a7'", followed by "Unclosed character literal" again. If I put the character into a String (String s = "§"), it works fine. But when I print it to the console, it prints the 'Â' character (which it shouldn't).
In another Java project, I can use the '§' character and compile normally, and it works as intended; printing it to the console shows nothing (which is normal, because it's used as an escape character for colouring the text). That project and the current one don't use a BOM in IntelliJ, and they both use UTF-8 encoding.
Does anyone know how to fix this? Thanks :)
Here is how I could reproduce that error:
created a new Test.java with UTF-8 encoding (the default)
added a main method with char c = '§'; and printed it
ran Test.main()
no errors, as expected.
So I tried:
created a second file, Test1.java, and changed its encoding to ISO-8859-1
added a main method with just a print statement
ran Test1.java
this time I got the reported error, but for Test.java (still encoded as UTF-8)
It looks like IDEA uses the encoding of the first file for all the source files.
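The symptom can be reproduced without IntelliJ. This sketch (Java 7+, hypothetical class name) decodes the UTF-8 bytes of § as ISO-8859-1, the way a compiler using the wrong encoding would: one character becomes two, Â followed by §. A char literal cannot hold two characters, hence "unclosed character literal", while a String literal can, which would also fit the stray Â in the printed output.

```java
import java.nio.charset.StandardCharsets;

class SectionSignDemo {
    // Simulates a compiler reading a UTF-8 source file as ISO-8859-1
    static String misread(String typed) {
        byte[] onDisk = typed.getBytes(StandardCharsets.UTF_8);   // what the editor saved
        return new String(onDisk, StandardCharsets.ISO_8859_1);   // what the compiler sees
    }

    public static void main(String[] args) {
        String seen = misread("\u00A7"); // the section sign
        System.out.println(seen.length()); // 2
        // Prints U+00C2 U+00A7, i.e. A-circumflex followed by the section sign
        System.out.printf("U+%04X U+%04X%n", (int) seen.charAt(0), (int) seen.charAt(1));
    }
}
```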
Solutions:
make sure all files are encoded with UTF-8; and/or
use javac -encoding UTF-8 ... as commented by saka1029
Use Java's Character class: https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html
You can retrieve the char value through charValue().
Also, make sure you're using an encoding (e.g. UTF-16) that supports your character.

ISO-8859-1 character encoding not working in Linux

I tried the code below on Windows and was able to decode the message. But when I tried the same code on Linux, it didn't work.
String message ="ööööö";
String encodedMsg = new String(message.getBytes("ISO-8859-1"), "UTF-8");
System.out.println(encodedMsg);
I verified that the default character set on the Linux platform is UTF-8 (Charset.defaultCharset().name()).
Kindly suggest how I can do the same encoding on the Linux platform.
The explanation for this is almost always that somewhere bytes are turned into characters, or characters into bytes, without the encoding being explicitly specified, so it defaults to the 'platform default' and produces different results depending on the platform you run it on.
Except that every place in your snippet where you turn bytes into chars or chars into bytes explicitly specifies an encoding.
Or does it?
String message ="ööööö";
Ah, no, you forgot one place: javac itself.
You compile this code. That is where raw bytes get converted into characters: the compiler is looking at ManmohansSourceFile.java, which is a file, i.e. not characters but a bunch of bytes, and since the Java compiler works on characters, it decodes those bytes using some encoding. If you don't use the -encoding switch when running javac (or, if Maven or Gradle runs javac, whichever encoding your pom/Gradle file passes), the file is read using the system encoding, so whether the string actually contains the characters you typed is anyone's guess.
This is most likely the source of your problem.
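A sketch of that mechanism, assuming the same source file is compiled on both machines without -encoding and that it stores each Ã¶ pair of the literal as the bytes C3 B6 (as a Windows single-byte editor would save it). ISO-8859-1 stands in for the Windows default here; windows-1252 decodes C3 and B6 the same way. The class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

class PlatformDefaultDemo {
    // The snippet from the question, with both charsets spelled out
    static String fix(String message) {
        return new String(message.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed bytes of one "A-tilde, pilcrow" pair inside the source file
        byte[] sourceBytes = {(byte) 0xC3, (byte) 0xB6};

        // javac decoding the file with a single-byte Windows-like default: two chars
        String literalOnWindows = new String(sourceBytes, StandardCharsets.ISO_8859_1);
        // javac decoding the same file as UTF-8 (the Linux default): one char, U+00F6
        String literalOnLinux = new String(sourceBytes, StandardCharsets.UTF_8);

        System.out.println(fix(literalOnWindows).equals("\u00F6")); // true: the expected o-umlaut
        System.out.println(fix(literalOnLinux).equals("\uFFFD"));   // true: byte F6 alone is invalid UTF-8
    }
}
```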
The fix? Pick one:
Don't put non-ASCII characters in your source files. Note that you can write the Unicode character "Latin Capital Letter A with Tilde" as \u00C3 in your source file instead of as Ã, and \u00B6 for ¶.
String message ="\u00C3\u00B6\u00C3\u00B6\u00C3\u00B6\u00C3\u00B6\u00C3\u00B6";
String encodedMsg = new String(message.getBytes("ISO-8859-1"), "UTF-8");
System.out.println(encodedMsg);
> ööööö
Ensure you specify the right -encoding switch when compiling. So, if your text editor (the one you use to type String message = "¶";) is configured for UTF-8, run javac -encoding UTF-8 manMohansFile.java.
First of all, I'm not sure exactly what you are expecting. Your use of the term "encode" is a bit confusing, but from your comments, it appears that with the input "ööööö", you expect the output "ööööö".
On both Linux and OS X with Java 1.8, I do get that result.
I do not have a Windows machine to try this on.
As @Pshemo indicated, it is possible that your input, since it's hardcoded in the source code as a string, is being represented as UTF-8, not as ISO-8859-1. Actually, this is what I expected, and I was surprised that the code worked as you expected.
Try creating the input with an explicit ISO-8859-1 encoding, for example via Charset.encode() or String.getBytes("ISO-8859-1").

how to insert the ≠ sign into a string

What I want as an end result is this
System.out.println("This is the not equal to sign\n≠");
to appear (when run) as
This is the not equal to sign
≠
not to appear as
This is the not equal to sign
?
Is there any way to do this? I used the Windows Character Map and copied the symbol both here and into my code, but after changing the encoding to UTF-8 and inserting it, it comes up as ? when run.
What can be done? Thanks in advance for answers to this utterly simple question.
If your text editor already uses UTF-8 (or otherwise supports this character), set the character encoding to UTF-8 by passing this VM argument:
-Dfile.encoding=UTF-8
As @Tobias Brandt says, you could use \u2260, like this:
System.out.println("This is the not equal to sign \n\u2260");
And by the way, @Crozin is also right about your console configuration.
There are five potential issues here:
1) In which charset encoding are you saving your Java source (from your editor)?
2) Which charset encoding does the Java compiler assume?
3) Which charset does your console use?
4) Are you using a terminal that translates characters?
5) Does your console font include that particular character?
For issues 1-2, you should use UTF-8 for both (editor and javac settings) or, more robustly, specify the Unicode character as an escape in pure ASCII text (Frakcool's answer).
For issue 3, try -Dfile.encoding=UTF-8 or see this answer. Issues 4-5 are outside the scope of your Java program. If you are unsure, just redirect the output to a file and look at it with a hex editor.
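When inspecting redirected output in a hex editor, it helps to know which bytes to expect. This sketch (Java 7+, illustrative class name) prints the encoded bytes of ≠ (U+2260): E2 89 A0 means UTF-8 made it out intact, while a lone 3F is the '?' that Java substitutes when the output charset cannot represent the character.

```java
import java.nio.charset.StandardCharsets;

class HexCheck {
    // Formats bytes the way a hex editor or hexdump shows them, e.g. "E2 89 A0"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The not-equal sign, written as an escape so the source stays pure ASCII
        String ne = "\u2260";
        System.out.println(hex(ne.getBytes(StandardCharsets.UTF_8)));      // E2 89 A0
        System.out.println(hex(ne.getBytes(StandardCharsets.ISO_8859_1))); // 3F, the '?' replacement
    }
}
```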
When you save the Java file, make sure it is saved in the same charset as the one it was opened with.
In my Eclipse, when I save a file with special characters (such as \u2260), it asks me which charset I want to use.
Open your file in the terminal and inspect its contents.
Make sure it is the same character as the one in the editor you are using.
It seems that it worked after Eclipse asked me whether I wanted to change to UTF-8; that happened only after I posted this.
Sorry for wasting your time

Ant compile: unclosed character literal

When I compile my web application using ant I get the following compiler message:
unclosed character literal
The line of offending code is:
protected char[] diacriticVowelsArray = { 'á', 'é', 'í', 'ó', 'ú' };
What does the compiler message mean?
Unless you pass -encoding, javac reads source files using the platform's default encoding, which is not always UTF-8. Have you got your editor set up to save the source file using UTF-8 encoding? If the editor and the compiler use different encodings, the Java compiler will be confused (since you're using characters that are encoded differently in UTF-8 and other encodings) and unable to decode your source.
It's also possible that your Java is set up to use a different encoding. In that case, try:
javac -encoding UTF8 YourSourceFile.java
Use UTF-8 encoding for the text files containing your Java sources,
or
use '\uCODE', where CODE is the hexadecimal Unicode code point of á, é, etc. (for 'á' you write '\u00E1').
You might need this:
http://www.fileformat.info/info/unicode/char/e1/index.htm
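Applied to the line from the question, the escape approach might look like the sketch below (class name assumed; the code points are U+00E1, U+00E9, U+00ED, U+00F3 and U+00FA). The file is pure ASCII, so it compiles the same way with or without -encoding; Character.getName needs Java 7+.

```java
class Vowels {
    // a, e, i, o, u with acute accents, written as Unicode escapes so the file stays pure ASCII
    protected char[] diacriticVowelsArray = { '\u00E1', '\u00E9', '\u00ED', '\u00F3', '\u00FA' };

    public static void main(String[] args) {
        for (char c : new Vowels().diacriticVowelsArray) {
            // Print the Unicode name to avoid depending on the console encoding
            System.out.println(Character.getName(c));
        }
    }
}
```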
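Using " instead of ' worked for me, i.e. a String literal instead of a char literal: a mis-decoded accented letter becomes more than one character, which a String can hold but a char cannot.
Passing -encoding UTF8 to javac, as previously described, also worked.
This means that the compiler was not using UTF-8.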

String.getBytes("ISO-8859-1") gives me 16-bit characters on OS X

Using Java 6 to get 8-bit characters from a String:
System.out.println(Arrays.toString("öä".getBytes("ISO-8859-1")));
gives me, on Linux: [-10, -28]
but on OS X I get: [63, 63, 63, -89]
I seem to get the same result when using the fancy new NIO CharsetEncoder class. What am I doing wrong? Or is it Apple's fault? :)
I managed to reproduce this problem by saving the source file as UTF-8, then telling the compiler it was really MacRoman:
javac -encoding MacRoman Test.java
I would have thought javac would default to UTF-8 on OS X, but maybe not. Or maybe you're using an IDE that defaults to MacRoman. Whatever the case, you have to make it use UTF-8 instead.
What is the encoding of the source file? 63 is the code for "?", which means "character can't be converted to the specified encoding".
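That reproduction can also be simulated in code. The sketch below (Java 7+, illustrative class name) assumes the JDK ships the extended charset x-MacRoman, which standard full JDK builds include but trimmed runtime images may not. It encodes "öä" as UTF-8 (C3 B6 C3 A4) and decodes those bytes as MacRoman, giving √∂√§; encoding that to ISO-8859-1 turns the three characters without a Latin-1 mapping into 63 ('?') and the section sign into -89 (0xA7), matching the output reported in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MacRomanRepro {
    // Simulates "javac -encoding MacRoman" reading a source file saved as UTF-8
    static String readAsMacRoman(String typedInEditor) {
        byte[] savedAsUtf8 = typedInEditor.getBytes(StandardCharsets.UTF_8); // C3 B6 C3 A4
        return new String(savedAsUtf8, Charset.forName("x-MacRoman"));
    }

    public static void main(String[] args) {
        // o-umlaut and a-umlaut, as the asker typed them
        String literal = readAsMacRoman("\u00F6\u00E4");
        // Four characters: square root, partial differential, square root, section sign
        System.out.println(literal.length()); // 4
        // Only the section sign exists in ISO-8859-1; the rest become '?' (63)
        System.out.println(Arrays.toString(literal.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```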
So my guess is that you copied the source file to the Mac and that the source file uses an encoding which the Mac java compiler doesn't expect. IIRC, OS X will expect the file to be UTF-8.
Your source file is producing "öä" by combining characters.
Look at this:
System.out.println(Arrays.toString("\u00F6\u00E4".getBytes("ISO-8859-1")));
This should print [-10, -28] as you expect (I don't like printing it this way, but I know that's not the point of your question), because there the Unicode code points are specified explicitly, carved in stone, and your text editor is not allowed to "play smart" by combining 'o' and 'a' with diacritic signs.
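The combining-character hypothesis can be checked with java.text.Normalizer. In this sketch (Java 7+, illustrative class name), the decomposed string o + U+0308, a + U+0308 encodes to [111, 63, 97, 63] in ISO-8859-1, since the combining diaeresis has no Latin-1 mapping, while its NFC form encodes to [-10, -28]. This byte pattern differs from the [63, 63, 63, -89] reported in the question, so it illustrates this answer's theory rather than confirming it.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

class CombiningDemo {
    // Shows a string's ISO-8859-1 bytes; unmappable characters become '?' (63)
    static String latin1Bytes(String s) {
        return Arrays.toString(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Decomposed form: each base letter followed by U+0308 COMBINING DIAERESIS
        String decomposed = "o\u0308a\u0308";
        // Precomposed (NFC) form: the single code points U+00F6 and U+00E4
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);

        System.out.println(latin1Bytes(decomposed)); // [111, 63, 97, 63]
        System.out.println(latin1Bytes(composed));   // [-10, -28]
    }
}
```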
Typically, when you encounter such problems, you probably want to use two OS X Un*x commands to figure out what's going on under the hood: file and hexdump are very convenient in such cases.
You want to run them on your source file and you may want to run them on your class file.
Maybe the character set for the source is not set (and thus differs depending on the system locale)?
Can you run the same compiled class on both systems (not re-compile)?
Bear in mind that there's more than one way to represent characters. Mac OS X uses Unicode by default, so your string literal may not actually be represented by two bytes. You need to make sure that you load the string from the appropriate incoming character set, for example by using \u escapes in the source.
