I have an Enum class in Java that contains the letter É:
public enum Types {
RÉSUMÉ
}
When trying to build the project, IntelliJ complains on the É sign:
error: illegal character: '\u2030'
RÉSUMÉ
^
I'm using Windows 10.
In the past, the same project was compiled and run with no problems on my computer. So it seems like something in the settings was changed that caused this.
Replacing É with E is not an option.
Any idea how to fix this?
EDIT
The code used to run with JDK 11 and was upgraded to Java 17; maybe that is related. Trying to downgrade the project's JDK (Settings -> Build -> Gradle -> back to JDK 11) didn't help.
Character set & encoding of Java source code files
As commented, your source code files were likely saved in one character encoding while your compiler is decoding them with another — for example, a file saved as UTF-8 being read with a legacy charset such as windows-1252, or vice versa.
This problem could occur for either of two reasons:
Your settings for your compiler or IDE changed
👉 You changed your JDK from an earlier version of Java to Java 18 or later.
Earlier versions of Java by default expect source code to be written in the default character set of the host OS platform.
Java 18+ defaults to UTF-8 across OS platforms for most purposes.
The main clue for this character-encoding misreading hypothesis is that code point U+2030 is neither LATIN CAPITAL LETTER E WITH ACUTE nor the decomposition of an uppercase E followed by a combining acute accent. Code point 2030 in hex (8,240 in decimal) is PER MILLE SIGN: ‰ — which happens to be exactly what the second UTF-8 byte of É (0x89) decodes to under windows-1252.
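This misreading is easy to reproduce in plain Java. É (U+00C9) encodes to the two UTF-8 bytes 0xC3 0x89, and decoding those bytes as windows-1252 produces Ã followed by the per-mille sign — the very character the compiler complained about. A minimal sketch (the class name is just an example):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // É (U+00C9) as stored in a UTF-8 source file: two bytes, 0xC3 0x89
        byte[] utf8Bytes = "\u00C9".getBytes(StandardCharsets.UTF_8);

        // A compiler expecting windows-1252 decodes each byte separately
        String misread = new String(utf8Bytes, Charset.forName("windows-1252"));

        // misread is "Ã‰": U+00C3 followed by U+2030 (PER MILLE SIGN)
        System.out.println(misread);
        System.out.printf("U+%04X%n", (int) misread.charAt(1)); // U+2030
    }
}
```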
Java 18+ defaults to UTF-8
See JEP 400: UTF-8 by Default. To quote:
If source files were saved with a non-UTF-8 encoding and compiled with an earlier JDK, then recompiling on JDK 18 or later may cause problems. For example, if a non-UTF-8 source file has string literals that contain non-ASCII characters, then those literals may be misinterpreted by javac in JDK 18 or later unless -encoding is used.
You can easily verify if this new default is the source of your problem. 👉 Go back to your old project, and specify UTF-8 for compiling. Again, quoting the JEP:
Prior to compiling on a JDK where UTF-8 is the default charset, developers are strongly encouraged to check for charset issues by compiling with javac -encoding UTF-8 ... on their current JDK (8-17).
Related
I have a script that launches a Java program, and allows the user to specify the path of the Java installation to use via an environment variable.
I'd like that script to supply Java module system arguments (specifically --add-opens) when the target JDK has the module system (JPMS, or "Jigsaw"), and omit them when it does not (if they are not omitted, the startup will fail, as JDK 8 complains about the unrecognized arguments). Right now it omits them, which results in undesirable warnings on JDK 9+ (and yes, I am looking into fixing the root causes as well).
I can implement this. Probably the most robust way would be to have the script first invoke a small Java program that detects the module system and reports the result on standard output (or via its exit status, maybe); the calling script could then examine that result to know whether the underlying JDK is JPMS-enabled.
I could also parse java -version but I'm not sure what's guaranteed about the format of that string.
I am hoping there's a way that's (1) robust, and (2) performant -- maybe checking for the existence of a particular file in the installation, or scanning a particular JAR file from within the calling script, or something.
Does anyone with JPMS expertise have a heuristic that is robust and performant for this?
Every version of Java since and including 9 contains the module system.
Every OpenJDK-derived implementation of Java 9 and later, which includes Oracle’s commercial JDK builds, supports the --version option; earlier versions only support the -version (single dash) option. So a quick and dirty way to check for 9 and later would be if $JAVA_HOME/bin/java --version >&/dev/null; then ....
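The "invoke a Java program" approach from the question can also be made version-proof: compile a tiny detector with `-source 8 -target 8` and probe for a class that only exists on JPMS-capable runtimes. A sketch, with `JpmsDetector` as a hypothetical class name:

```java
public class JpmsDetector {
    /** True when the runtime has the Java module system (Java 9+). */
    static boolean hasModuleSystem() {
        try {
            // java.lang.Module only exists on JDK 9 and later
            Class.forName("java.lang.Module");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Exit status 0 on JPMS runtimes, 1 otherwise, so the calling
        // script can branch with a plain `if` and no output parsing
        System.exit(hasModuleSystem() ? 0 : 1);
    }
}
```

The exit-status contract keeps the shell side trivial and avoids depending on the (unspecified) format of `java -version` output.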
The Java class file format begins with the same header layout in every Java version, and that header contains the class-file version number. Just read the header bytes of one of the relevant .class files.
For example, the last line of the output from the below PowerShell command is the version number.
Get-Content -Path "Key.class" -TotalCount 8 -Encoding Byte
The output is
202
254
186
190
0
0
0
52
52 is the major class-file version, which means the file Key.class was compiled with JDK 8.
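The same header check can be done from Java itself: the magic number 0xCAFEBABE occupies the first four bytes, followed by the minor and major versions as big-endian unsigned shorts. A sketch (the class name and the Key.class path are just examples):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassFileVersion {
    /** Reads the class-file major version (e.g. 52 for JDK 8) from a stream. */
    static int majorVersion(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (in.readInt() != 0xCAFEBABE) {   // bytes 0-3: magic number
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort();             // bytes 4-5: minor version
        return in.readUnsignedShort();      // bytes 6-7: major version
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(majorVersion(in)); // 52 for a JDK 8 class
        }
    }
}
```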
We are migrating our application from Java 1.6 to Java 1.7. We recompiled the code using Java 1.7 and received an error while compiling which was due to a character (an Ó).
Was there a change in Java 1.7 related to characters? Our application does a lot of processing of incoming files before loading them into a database, and I want to ensure that when we upgrade to Java 1.7, reading a file from Java and writing its content to the database won't result in odd character conversions.
Do I need to be concerned at all when upgrading to 1.7? If so, how do I get the same encoding that we had in Java 1.6?
The error occurs because you've told the Java compiler that your source is UTF-8 encoded, but it still contains some ISO-8859-1 extended characters. I recently had to fix similar errors in a codebase that was migrated from 1.5 to 1.6. I believe that Java 7 is much stricter about UTF-8 encoding than previous versions and will issue errors where previously the incorrect encodings were silently accepted.
You will need to make sure that your source code is "Unicode-clean", that is, you must replace any extended ISO-8859-1 characters with their Unicode equivalents.
I ran into this problem on Windows and discovered that the default encoding for 1.7 was Cp1252. I was able to get clean compiles by setting the following environment variable:
JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
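To verify what default the JVM is actually picking up, a quick sketch that prints its view of the encoding (run it with and without JAVA_TOOL_OPTIONS set and compare):

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    public static void main(String[] args) {
        // file.encoding is the system property the -D flag above overrides
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        // Charset.defaultCharset() is what most I/O falls back to
        System.out.println("defaultCharset() = " + Charset.defaultCharset());
    }
}
```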
I want to know how some special characters get converted automatically. Example:
The Unicode character 0x3 is converted to &#three; (you can see it got converted; I have changed the digit 3 to "three" so it displays here). I am not sure how this conversion happens automatically.
I am using Java 1.6; below is the JAXB info.
xjc version "JAXB 2.1.10 in JDK 6"
JavaTM Architecture for XML Binding(JAXB) Reference Implementation, (build JAXB 2.1.10 in JDK 6)
The above conversion happens in one of our test environments; however, if I try to do the same on my local machine, I get the exception below:
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:505)
I have spent a lot of time trying to figure this out but have not been able to find an answer. Just to be clear: I have checked, and we do not have any custom escape class in the code.
I have verified this with the JRockit JDK as well as the Sun JDK, and I have the same problem with both. In the test environment we are using the JRockit JDK.
Has anyone faced the same issue? Is it related to JAXB, or could it be related to Java?
Sorry if this is a stupid question, I may be missing something basic here.
I'm just trying to encode a String using UTF-8. Following best practices, I don't assume that the default charset is UTF-8, and so I use:
"Ñ".getBytes(Charset.forName("UTF-8"))
According to the official Unicode spec, this should come out as: 0xc391
However, what I'm getting instead is: 0xc383e28098.
I'm failing to make any sense of this. This happens whether I set -Dfile.encoding=UTF-8 or not.
Strangely enough, when I don't specify the charset (or use Charset.defaultCharset()), the windows-1252 encoding is used, and the output is correctly encoded UTF-8!
What's more, when I run the code through IntelliJ and not the command line, the UTF-8 charset actually does work as expected. IntelliJ adds a lot of unrelated libraries to the classpath, so I guess one of them is responsible for the correction, but I want it to work in production.
My java -version:
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) Client VM (build 25.201-b09, mixed mode)
There’s nothing wrong with your code. The problem is how the compiler treats your source code.
When you write "Ñ" in your code and save the file, what bytes are actually written to the source file?
It appears you saved the source file as a UTF-8 file (which is usually a good choice). This means "Ñ" was written to the file as the UTF-8 bytes 0xC3 0x91.
If you were to compile it on an operating system whose default encoding is UTF-8 (as on most Linux and macOS systems), things would build and run exactly as you expect.
But when you build on Windows, where the system’s default charset is windows-1252, those two bytes in the source file get treated differently. The compiler interprets those two bytes using windows-1252. Regardless of what the code looks like in your editor, the compiler sees 0xC3 0x91 and treats each byte as a windows-1252 character. In windows-1252, those bytes represent:
0xC3 → Ã (LATIN CAPITAL LETTER A WITH TILDE)
0x91 → ‘ (LEFT SINGLE QUOTATION MARK)
So the compiler compiles your string constant as "Ã‘".
All of that translation takes place at compile time only. In a compiled .class file, all string constants are represented in the same manner; any information about how the source was encoded is lost. At runtime, Java only knows that you have (apparently) compiled your string as "Ã‘".
At runtime, when you encode that two-character string as UTF-8, you get the UTF-8 byte sequences for those two characters:
à → 0xc3 0x83
‘ → 0xe2 0x80 0x98
The solution, as you have surmised, is to tell the compiler that your source files are in UTF-8, so it will interpret the bytes 0xc3 0x91 as Ñ.
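The whole round trip can be reproduced without a miscompiled source file, assuming windows-1252 as the compile-time charset: decode Ñ's UTF-8 bytes as windows-1252 (what the compiler did), then encode the resulting two characters back to UTF-8 (what the program did at runtime).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    public static void main(String[] args) {
        // Ñ (U+00D1) as stored in a UTF-8 source file: 0xC3 0x91
        byte[] sourceBytes = "\u00D1".getBytes(StandardCharsets.UTF_8);

        // What a windows-1252 compiler thinks the literal is: Ã + ‘
        String misCompiled = new String(sourceBytes, Charset.forName("windows-1252"));

        // Encoding that two-character string as UTF-8 at runtime
        byte[] runtimeBytes = misCompiled.getBytes(StandardCharsets.UTF_8);

        StringBuilder hex = new StringBuilder();
        for (byte b : runtimeBytes) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex); // c383e28098, the bytes from the question
    }
}
```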
I'm having the following errors:
Error: Registry key Software\JavaSoft\Java Runtime Environment.
Error: Could not find java.dll.
Error: Could not find java SE Runtime Environment.
when I'm running JRE 7 (x86) from a folder whose path contains Unicode characters.
To reproduce this malfunction, follow these steps:
Copy the JRE 7 (x86) folder into a folder whose name contains Unicode characters, for example 功能障碍.
Try to execute java.exe / javaw.exe.
The system language must be different from the language of those Unicode characters (Chinese, Japanese, ...); in my case the system language is English.
Any clue what is going on?