Ant compile: unclosed character literal - java

When I compile my web application using Ant, I get the following compiler message:
unclosed character literal
The offending line of code is:
protected char[] diacriticVowelsArray = { 'á', 'é', 'í', 'ó', 'ú' };
What does the compiler message mean?

The Java compiler expects its source files to be in a particular encoding (the platform default, unless you tell it otherwise). Have you got your editor set up to save the source file using UTF-8 encoding? If the editor and the compiler disagree, characters such as 'á' turn into byte sequences the compiler can't decode, so the char literal appears to be unterminated.
It's also possible that your Java compiler is set up to use a different encoding. In that case, try:
javac -encoding UTF8 YourSourceFile.java
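Since the build runs through Ant, the fix can also live in the build file itself; Ant's <javac> task takes an encoding attribute. A minimal sketch (the srcdir/destdir paths here are placeholders):
<javac srcdir="src" destdir="build/classes" encoding="UTF-8"/>
With that attribute set, the compiler reads the sources as UTF-8 regardless of the platform default.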

Use UTF-8 encoding for the text files containing your Java sources.
or
Use '\uCODE' escapes, where CODE is the hexadecimal Unicode code point of á, é, etc. (for 'á' you write '\u00E1').
You might need this:
http://www.fileformat.info/info/unicode/char/e1/index.htm
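For example, the array from the question can be written entirely in ASCII using the code points U+00E1, U+00E9, U+00ED, U+00F3, U+00FA:
protected char[] diacriticVowelsArray = { '\u00E1', '\u00E9', '\u00ED', '\u00F3', '\u00FA' };
Because the escapes are plain ASCII, this line compiles identically under any source encoding.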

Using " instead of the ' character worked for me (which means switching from char literals to String literals).
The javac -encoding UTF8 parameter described above also worked,
which suggests the compiler was not reading the source as UTF-8 by default.
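For the record, switching from ' to " means the field's type has to change too; a sketch of what that looks like (assuming the surrounding code can be adapted to strings):
protected String[] diacriticVowelsArray = { "á", "é", "í", "ó", "ú" };
Note that this only changes which literal syntax is used; the source file still has to be read in the right encoding for the literals to hold the intended characters.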

Related

(Intellij) "unclosed character literal" and "illegal character '\u00a7'" when compiling

I'm trying to compile the '§' character into a char (char c = '§'), but when I try to build, the build output says "Unclosed character literal" and "Illegal character: '\u00a7'", followed by "Unclosed character literal" again. If I put the character into a String (String s = "§") it works fine, but when I print it to the console it prints the 'Â' character (which it shouldn't).
In another Java project I can use the '§' character fine; it compiles normally and works as intended: printing it to the console shows nothing (which is normal, because it's used as an escape character for colouring the text). Neither that project nor the current one uses a "BOM" in IntelliJ, and both use UTF-8 encoding.
Does anyone know how to fix this? Thanks :)
How I could reproduce that error:
created a new Test.java with UTF-8 encoding (the default)
added a main method with char c = '§'; and printed it
ran Test.main()
no errors, as expected.
So I tried:
created a second file Test1.java and changed it to ISO-8859-1 encoding
added a main method with just a print statement
ran Test1.java
this time I got the reported error, but for Test.java (still encoded as UTF-8)
It looks like IDEA uses the encoding of the first file for the whole source code.
Solutions:
make sure all files are encoded with UTF-8; and/or
use javac -encoding UTF-8 ... as commented by saka1029
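A Unicode escape also sidesteps the file-encoding question entirely; a sketch (U+00A7 is the code point for '§'):
char c = '\u00A7'; // '§', readable under any source encoding
The escape itself is plain ASCII, so every encoding decodes that line the same way.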
Use Java's Character class: https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html
You can retrieve the primitive char through charValue().
Also, make sure you're using an encoding scheme (e.g. UTF-16) that supports your character.
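A minimal sketch of that suggestion (the encoding caveats above still apply to the literal itself):
Character boxed = '\u00A7'; // the char is autoboxed into a Character
char c = boxed.charValue(); // back to the primitive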

ISO-8859-1 character encoding not working in Linux

I have tried the code below on Windows and was able to decode the message, but the same code does not work when I run it on Linux:
String message = "ööööö";
String encodedMsg = new String(message.getBytes("ISO-8859-1"), "UTF-8");
System.out.println(encodedMsg);
I have verified that the default character set on the Linux platform is UTF-8 (Charset.defaultCharset().name()).
Kindly suggest how to do the same encoding on the Linux platform.
The explanation for this is, almost always, that somewhere bytes are turned into characters, or characters into bytes, without the encoding being clearly specified, thus defaulting to the 'platform default' and causing different results depending on which platform you run on.
Except every place in your snippet where bytes are turned into chars, or chars into bytes, explicitly specifies the encoding.
Or does it?
String message ="ööööö";
Ah, no, you forgot one place: javac itself.
You compile this code. That is where raw bytes are converted into characters: the compiler is looking at ManmohansSourceFile.java, which is a file, i.e. a bunch of bytes, while the compiler works on characters, so the conversion has to use some encoding. If you don't pass the -encoding switch when running javac (or Maven or Gradle runs javac for you and passes an encoding, which one depending on your pom/Gradle file), the file is read using the system encoding, and whether the string actually contains the bytes you intended, who knows.
This is most likely the source of your problem.
The fix? Pick one:
Don't put non-ASCII in your source files. Note that you can write the Unicode symbol "Latin Capital Letter A with Tilde" as \u00C3 in your source file instead of as Ã, and \u00B6 for ¶.
String message ="\u00C3\u00B6\u00C3\u00B6\u00C3\u00B6\u00C3\u00B6\u00C3\u00B6";
String encodedMsg = new String(message.getBytes("ISO-8859-1"), "UTF-8");
System.out.println(encodedMsg);
> ööööö
Ensure you specify the right -encoding switch when compiling. So, if the text editor you use to type String message = "¶"; is configured as UTF-8, run javac -encoding UTF-8 manMohansFile.java.
First of all, I'm not sure exactly what you are expecting... your use of the term "encode" is a bit confusing, but from your comments it appears that with the input "ööööö", you expect the output "ööööö".
On both Linux and OS X with Java 1.8, I do get that result.
I do not have a Windows machine to try this on.
As @Pshemo indicated, it is possible that your input, since it's hardcoded in the source code as a string, is being represented as UTF-8, not as ISO-8859-1. Actually, this is what I expected, and I was surprised that the code worked as you expected.
Try building the input from bytes in an explicitly named encoding (ISO-8859-1) instead of from a source-file literal.
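A minimal sketch of that idea, which takes the source-file encoding out of the equation entirely; the byte values 0xC3 0xB6 are the UTF-8 encoding of 'ö':
import java.nio.charset.StandardCharsets;

public class ExplicitBytes {
    public static void main(String[] args) {
        byte[] utf8OfOuml = { (byte) 0xC3, (byte) 0xB6 }; // UTF-8 bytes of 'ö'
        // Misread those bytes as ISO-8859-1, reproducing the original 'message' ("Ã¶")
        String mojibake = new String(utf8OfOuml, StandardCharsets.ISO_8859_1);
        // The round-trip from the question now reliably recovers 'ö'
        String decoded = new String(mojibake.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
        System.out.println(decoded); // ö
    }
}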

Using a unicode character in a .java file?

I want to set a unicode character in a class file like this:
TextView tv = ...;
tv.setText("·");
Is there anything potentially wrong with using a Unicode character in a .java file?
Thanks
No. Java strings support Unicode, so you shouldn't run into any problems. You might have to check that the TextView class handles all the Unicode characters you need (which it should), but Java itself will handle them fine.
You should also ensure that the file is saved with the correct encoding settings; essentially, your editor should save the .java file as UTF-8-encoded Unicode. See the comments to this answer for more details.
Is there anything potentially wrong with using a unicode character in a .java file?
As you know, Strings within the JVM are stored as Unicode - so the question is how to deal with Unicode in Java source files ...
In short, using Unicode is fine. There are a few ways to approach it ...
By default, the javac compiler expects the source file to be in the platform default encoding. This can be overridden using the -encoding flag:
-encoding encoding
Sets the source file encoding name, such as EUCJIS/SJIS/ISO8859-1/UTF8. If -encoding is not specified, the platform default converter is used.
Alternatively, if it's a single character (like it appears to be), you can keep your source file in your platform default encoding, and specify the character using the Unicode escape sequence:
tv.setText("\u1234");
... where 1234 is the hexadecimal Unicode code point of the character you want.
Another alternative is to first save your file in your Unicode-compatible encoding (say UTF-8), then use native2ascii to convert that file to your native encoding (it will convert any out of range characters to the corresponding Unicode escape sequence).
NAME
native2ascii - native to ASCII converter
SYNOPSIS
native2ascii [ options ] [ inputfile [outputfile]]
DESCRIPTION
The Java compiler and other Java tools can only process files that contain Latin-1 or Unicode-encoded (\udddd notation) characters. native2ascii converts files that contain other character encodings into files containing Latin-1 or Unicode-encoded characters.
If outputfile is omitted, standard output is used for output. In addition, if inputfile is omitted, standard input is used for input.
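A usage sketch (MyView.java is a placeholder file name; -encoding names the encoding of the input file):
native2ascii -encoding UTF-8 MyView.java MyViewAscii.java
The output is pure ASCII, with every non-ASCII character replaced by its \udddd escape, so it compiles the same way under any platform default encoding.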

Java: Turkish Encoding Mac/Windows

I have a problem with turkish special characters on different machines. The following code:
String turkish = "ğüşçĞÜŞÇı";
String test1 = new String(turkish.getBytes());
String test2 = new String(turkish.getBytes("UTF-8"));
String test3 = new String(turkish.getBytes("UTF-8"), "UTF-8");
System.out.println(test1);
System.out.println(test2);
System.out.println(test3);
On a Mac the three Strings are the same as the original string. On a Windows machine the three lines are (printed with the NetBeans 6.7 console):
?ü?ç?Ü?Ç?
ğüşçĞÜŞÇı
?ü?ç?Ü?Ç?
I don't get the problem.
String test1 = new String(turkish.getBytes());
You're taking the Unicode String, including the Turkish characters, and turning it into bytes using the default encoding (using the default encoding is usually a mistake), then taking those bytes and decoding them back into a String, again using the default encoding. The result is that you've achieved nothing, except losing any characters that don't fit in the default encoding. Whether you have put a String through an encode/decode cycle has no effect on what the following System.out.println(test1) does, because that still prints a String, not bytes.
String test2 = new String(turkish.getBytes("UTF-8"));
Encodes as UTF-8 and then decodes using the default encoding. On Mac the default encoding is UTF-8, so this does nothing. On Windows the default encoding is never UTF-8, so the result is the wrong characters.
String test3 = new String(turkish.getBytes("UTF-8"), "UTF-8");
Does precisely nothing.
To write Strings to stdout in an encoding other than the default, you'd create an encoder, something like new OutputStreamWriter(System.out, "cp1252"), and send the string content to that.
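A minimal sketch of that wrapper, using cp1252 as suggested (characters cp1252 cannot represent come out as '?'):
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class ConsoleOut {
    public static void main(String[] args) throws Exception {
        PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "cp1252"), true);
        out.println("ğüşçĞÜŞÇı"); // ğşĞŞı are not in cp1252 and will print as '?'
    }
}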
However in this case, it looks like the console is using Windows code page 1252 Western European (+1 ATorres). There is no encoding mismatch issue here at all, so you won't be able to solve it by re-encoding strings!
The default encoding cp1252 matches the console's encoding, it's just that cp1252 doesn't contain the Turkish characters ğşĞŞı at all. You can see the other characters that are in cp1252, üçÜÇ, come through just fine. Unless you can reconfigure the console to use a different encoding that does include all the characters you want, there is no way you'll be able to output those characters.
Presumably on a Turkish Windows install, the default code page will be cp1254 instead and you will get the characters you expect (but other characters don't work). You can test this by changing the ‘Language to use for non-Unicode applications’ setting in the Regional and Language Options Control Panel app.
Unfortunately no Windows locale uses UTF-8 as the default code page. Putting non-ASCII output onto the console with the stdio stream functions is not something that's really reliable at all. There is a Win32 API to write Unicode directly to the console, but unfortunately nothing much uses it.
Don't rely on the console, or on the default platform encoding. Always specify the character encoding for calls like getBytes and for the String constructor that takes a byte array, and if you want to examine the contents of a string, print out the Unicode value of each character.
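A sketch of that diagnostic, using the string from the question:
for (char ch : turkish.toCharArray()) {
    System.out.printf("U+%04X ", (int) ch); // e.g. U+011F for 'ğ'
}
System.out.println();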
I would also advise either restricting your source code to use ASCII (and \uxxxx to encode non-ASCII characters) or explicitly specifying the character encoding when you compile.
Now, what bigger problem are you trying to solve?
You may be dealing with different settings of the default encoding.
java -Dfile.encoding=utf-8
versus
java -Dfile.encoding=something else
Or, you may just be seeing the fact that the Mac terminal window works in UTF-8, and the Windows DOS box does not work in UTF-8.
As per Mr. Skeet, you have a third possible problem, which is that you are trying to embed UTF-8 chars in your source. Depending on the compiler options, you may or may not be getting what you intend there. Put this data in a properties file, or use \u escapes.
Finally, also per Mr. Skeet, never, ever call the zero-argument getBytes().
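That is, always pass the charset explicitly; for example (StandardCharsets requires Java 7+, otherwise pass the charset name as a String):
byte[] bytes = turkish.getBytes(java.nio.charset.StandardCharsets.UTF_8); // explicit and portable
// not: byte[] bytes = turkish.getBytes(); // platform default, varies by machine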
If you are using the AspectJ compiler, do not forget to set its encoding to UTF-8 too. I struggled for hours to find this.

String.getBytes("ISO-8859-1") gives me 16-bit characters on OS X

Using Java 6 to get 8-bit characters from a String:
System.out.println(Arrays.toString("öä".getBytes("ISO-8859-1")));
gives me, on Linux: [-10, -28]
but OS X I get: [63, 63, 63, -89]
I seem to get the same result when using the fancy new nio CharsetEncoder class. What am I doing wrong? Or is it Apple's fault? :)
I managed to reproduce this problem by saving the source file as UTF-8, then telling the compiler it was really MacRoman:
javac -encoding MacRoman Test.java
I would have thought javac would default to UTF-8 on OSX, but maybe not. Or maybe you're using an IDE and it's defaulting to MacRoman. Whatever the case, you have to make it use UTF-8 instead.
What is the encoding of the source file? 63 is the code for ? which means "character can't be converted to the specified encoding".
So my guess is that you copied the source file to the Mac and that the source file uses an encoding which the Mac java compiler doesn't expect. IIRC, OS X will expect the file to be UTF-8.
Your source file is producing "öä" with combining characters.
Look at this:
System.out.println(Arrays.toString("\u00F6\u00E4".getBytes("ISO-8859-1")));
This will print [-10, -28] as you expect (I don't like printing it this way, but I know it's not the point of your question), because there the Unicode code points are specified, carved in stone, and your text editor is not allowed to "play smart" by combining 'o' and 'a' with diacritic signs.
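If a file really does contain the decomposed form, java.text.Normalizer (available since Java 6) can recompose it; a sketch:
import java.text.Normalizer;

String decomposed = "o\u0308a\u0308"; // 'o' and 'a' each followed by a combining diaeresis
String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC); // "öä" as U+00F6 U+00E4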
Typically, when you encounter such problems, you will want to use two OS X Un*x commands to figure out what's going on under the hood: file and hexdump are very convenient in such cases.
You want to run them on your source file, and you may want to run them on your class file.
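For example (assuming the source file is named Test.java):
file Test.java
hexdump -C Test.java | head
file guesses the encoding, and hexdump -C shows the raw bytes beside their ASCII rendering; a decomposed 'o' + U+0308 shows up as 6f cc 88 in UTF-8, while a precomposed ö is c3 b6.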
Maybe the character set for the source is not set (and thus differs according to the system locale)?
Can you run the same compiled class on both systems (without re-compiling)?
Bear in mind that there's more than one way to represent characters. Mac OS X uses Unicode by default, so your string literal may actually not be represented by two bytes. You need to make sure that you load the string from the appropriate incoming character set; for example, by specifying a \u escape character in the source.
