I have a project that contains many Java source files.
All of the source files are treated as UTF-8 encoded, but some of them contain characters in comments that are unmappable in UTF-8. These files were committed by some members of our team who use GBK encoding for their local projects.
I want to compile the project with ant on an AIX system that has the IBM JDK installed,
but the compile task fails because javac throws errors like this:
xx/xx/XX.java unmappable character for encoding UTF-8
Any easy solution?
Edit:
I know why the error happens, and I know how to fix the encoding problem for a single Java source file. My actual problem is how to identify which files have encoding problems when there are so many source files.
Here are the steps I used to fix my problem:
Since only some of the source files contain unmappable UTF-8 characters, we can find all such Java files by raising javac's maximum error count via a compiler argument when compiling with ant:
<compilerarg line="-Xmaxerrs 100000"/>
Then dump the error messages to a file when you run the ant command:
ant -buildfile compile.xml > error.txt
Then you can use Notepad++ to massage the output file into a list of the files that have encoding problems, which you can then fix (a small program can do the same job, as sketched after this list):
Use a regular expression to remove the unneeded content; and
use TextFX to sort the lines and remove duplicates.
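Alternatively, instead of post-processing javac's output, a small program can scan the source tree directly. Below is a minimal sketch (assuming Java 8+; the FindBadUtf8 class name and the root-directory argument are made up for illustration) that prints every .java file that is not valid UTF-8:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FindBadUtf8 {
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> p.toString().endsWith(".java"))
                 .forEach(FindBadUtf8::check);
        }
    }

    static void check(Path file) {
        // Strict decoder: report malformed/unmappable bytes instead of replacing them.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(Files.readAllBytes(file)));
        } catch (CharacterCodingException e) {
            System.out.println(file); // not valid UTF-8 -- likely one of the GBK files
        } catch (IOException e) {
            System.err.println("cannot read " + file);
        }
    }
}

Run it from the source root; every path it prints is a file that needs to be converted from GBK to UTF-8.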
Related
I have a program that uses certain characters (arrows, gender symbols) that are not typically supported. When compiling in Eclipse, I just choose "save as UTF-8" and it works fine. However, when I'm in the console trying to compile (using javac *.java or something to that effect) it throws an error because of those UTF-8 characters. How do I adjust the way I compile so it is able to use the UTF-8 characters?
Creating a JAR file is essentially just archiving the .class files, so you should not get any errors because of UTF-8 characters in the source files at that stage.
To resolve the compilation error due to UTF-8 characters, use the -encoding argument to specify the source file encoding:
javac -encoding UTF-8 SourceJavaFile.java
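If you are compiling through ant instead, the javac task has an equivalent encoding attribute. A minimal sketch (the srcdir and destdir values are made up):

<javac srcdir="src" destdir="build" encoding="UTF-8"/>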
You can create a JAR file natively in Eclipse by right-clicking on your project and then selecting Export. From there you can expand the Java node, select the JAR file type, and configure it. See the reference on the Eclipse website for full instructions on using the built-in JAR file exporter. I hope this allows the use of the UTF-8-only characters you are working with.
I have some projects that are encoded in Windows-1252/CP-1252, and I can't change the encoding. The problem is that no matter what I do, IntelliJ keeps trying to read these files as UTF-8 unless I manually add every single file to the encoding list.
That requires a lot of time and effort, it's error-prone, and it's not a solution at all. I have set both the project and the IDE encoding to CP-1252, but it keeps trying to read the files as UTF-8 anyway.
I don't know what causes this. We use Subversion to commit files and Maven to compile (which reads files as UTF-8, except for the super POM, which uses CP-1252).
Any idea how to solve the problem? I had a look at other posts but found no real solution yet. I'm currently using the latest IntelliJ version (2017.1.2).
I actually found out what the problem was: the Maven project encoding was overriding the IntelliJ configuration. I had tried to edit the source encoding property before, but it didn't work because I had misspelled Cp1252. Now it works.
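For reference, that property is set in the pom.xml. A minimal sketch (project.build.sourceEncoding is the standard Maven property; the surrounding POM is omitted):

<properties>
    <project.build.sourceEncoding>Cp1252</project.build.sourceEncoding>
</properties>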
I've searched around the web for a while now and haven't found anything giving me a proper answer.
I've got a Linux server running Debian with a Bukkit server on it. I ran the server on Windows before and my files were fine with UTF-8 encoding. I uploaded my files via WinSCP, and now they seem to be ASCII or something else, because in the files every special character, like umlauts, changed to placeholders, and in-game to question marks.
I've tried changing the encoding of one file (it would be painful to do this for every file, especially if I need to do it every time I upload a new one), but it only changed the placeholders to a single question mark.
For Jenkins I needed to set the encoding via encoding=... on the javac call in my build.xml, but I don't know of any flag to change the encoding for the java command.
I also read that it should be possible to change the encoding for the whole JVM, but the commands I tried didn't work at all.
I would be happy to get some tips on how to fix this, or in general how to avoid converting every file I upload...
Thank you very much :)
~Julian
You can try
java -Dfile.encoding=UTF-8 -jar yourserver.jar
to run a Java application with a specific default encoding, no matter what default encoding the current system uses.
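To verify which default the JVM actually picked up, a quick check may help (ShowEncoding is a made-up class name):

import java.nio.charset.Charset;

public class ShowEncoding {
    public static void main(String[] args) {
        // Prints the JVM's default charset, e.g. UTF-8
        System.out.println(Charset.defaultCharset());
    }
}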
If you intend to change all files in a project to a specific encoding in Eclipse:
right-click on your project in the Project Explorer -> Properties (or Alt+Enter) -> Resource -> on the right you will see Text File Encoding; there you can choose UTF-8 as needed.
Remember to check all your packages (right-click and check the Text File Encoding part) so that they all inherit from the container.
Hope this helps!
My .NET utility AjGenesis is a code generation tool. The compiled binaries run without glitches under Ubuntu 10.x and Mono. But I have a problem: when generating a Java text file (a normal text file to my tool), it writes a byte order mark at the beginning of each file. I'm using System.Text.Encoding.Default: on Windows, all is OK; on Ubuntu, the byte order mark is three bytes, indicating UTF-8, I guess.
This difference is a problem: when I want to compile the generated .java files using ant or javac, the BOMs cause errors. So:
What encoding should I use under Ubuntu/Mono so that the generated files can be processed by javac?
I tried javac -encoding UTF8 without success; any clues? My guess: it isn't meant to skip BOMs.
I tried System.Text.Encoding.ASCII, but my generated files contain non-ASCII characters (Spanish accented letters). If I change the encoding back, the BOMs are added again, and javac refuses the files. Any suggestions?
TIA
Don't use Encoding.Default. Why make your output platform-specific? Use UTF-8 - and if you have to use UTF-8 without a BOM, you can do that with:
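// false = do not emit the UTF-8 byte order mark (BOM)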
Encoding utf8 = new UTF8Encoding(false);
To be honest, though, I'm surprised javac fails. You say you've tried it "without success" - what was the result?
Try instantiating System.Text.UTF8Encoding with the constructor parameter that tells it not to emit a BOM. You can read about this here:
http://msdn.microsoft.com/en-us/library/s064f8w2.aspx
I have an Eclipse project that was working OK.
One day I had to format my machine, so I copied my workspace to a backup. After installing Eclipse again, I imported my projects from the backed-up workspace.
What happened is that it corrupted all the strings that contain special characters,
like é, são, etc., which turned into É, são...
Is there a way to refactor it back to normal?
I tried changing the character encoding in Eclipse, but it doesn't update the class files.
You need to reconfigure the workspace encoding. Go to Window > Preferences, enter "encoding" in the filter text, and set UTF-8 everywhere applicable, especially the workspace text file encoding.
Is there a way to refactor it back to normal?
Did you try closing an individual file, right-clicking it to open properties and setting its encoding manually?
like.. é, são, etc.. to É, são...
Are you sure it wasn't É (U+00C9) that was becoming É (U+00C3 U+2030)?
That would suggest that files that were being interpreted as UTF-8 before are now being interpreted as something else (probably windows-1252).
Many Java compilation problems can be fixed by sticking to the subset of characters that appear in the US-ASCII encoding and using Unicode escape sequences for everything else (this assumes you aren't using UTF-16 or UTF-32 or something).
The code String foo = "É"; would become String foo = "\u00C9";. This improves source code portability at the expense of readability.
You can use the JDK native2ascii tool to perform the conversion:
native2ascii -encoding UTF-8 Foo.java
This is probably a stupid suggestion, but I figured I'd mention it anyway. Are you opening your file.class or your file.java? Because the *.class files, if I recall correctly, are binary files, which explains why you're seeing those weird characters. The *.java files are the plain-text source files.
I figured you knew that, but the wording made me think otherwise, so I figured I'd mention it.
How do the actual .java files look on disk? Are they damaged, or is it just that Eclipse can't display them properly? If the files look good on disk, setting the encoding property as jjnguy suggests should do the trick.
If the files are damaged on disk, maybe iconv can "undamage" them?
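For example, if the damage is UTF-8 text that was re-saved as if it were windows-1252, reversing the conversion may restore it (a sketch; the file names are made up, and you should test on a copy first):

iconv -f UTF-8 -t WINDOWS-1252 Damaged.java > Restored.java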
To avoid such problems in the future, I actually suggest keeping the Java files in plain ASCII, using \uNNNN escapes for non-ASCII characters in strings etc. (e.g. \u00E4 is ä, an a with two dots above).