Java argument with Greek characters in Windows

I have created a simple .jar file that takes a string with Greek characters as an argument and prints it to a file.
However, I have the following issue:
When I execute the jar file on my local Windows machine, the string is properly passed as an argument to the jar and the output file contains the Greek characters I entered.
When I try to execute the same jar file on a Windows VM, the Greek characters are not properly encoded and the output file contains unreadable characters.
I have even set the command prompt code page in the VM to 1253 (chcp 1253) and set the environment variable JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8, with no luck...
Any suggestions?

Running chcp 1253 sets your console code page to Windows-1253, and yet you then tell Java not to use it...
If you are running your program via a batch script, save the script as UTF-8 and add -Dfile.encoding=UTF-8 to the parameters of the java command.
If you are running your program via the console, run chcp 65001 to switch the console to UTF-8. You have also set the environment variable correctly and can leave it that way, but you can just as well run Java with the option set explicitly:
chcp 65001
java -Dfile.encoding=UTF-8 -jar binks.jar
EDIT: If Windows is still complaining and/or messing things up, try changing 65001 to 1253 and UTF-8 to windows-1253. You'll lose support for most of Unicode, but there is a greater chance it will work.
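Independent of the console settings, you can also make the output encoding explicit in the code itself, so the file contents no longer depend on file.encoding at all. A minimal sketch (my own illustration, not the asker's actual code; the output file name is made up):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class GreekArgWriter {
    public static void main(String[] args) throws IOException {
        // Write the first argument to out.txt, always encoded as UTF-8,
        // regardless of the platform's default charset.
        Files.write(Paths.get("out.txt"), args[0].getBytes(StandardCharsets.UTF_8));
    }
}

Note that this only fixes the writing side; the argument still has to reach main() intact, which is what the chcp setting above is for.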

Related

VSCode Java UTF-8 does not print characters to the console

I have a problem with encoding Java System output that occurs only in Visual Studio Code.
As you can see in the screenshot below, the bullet point prints as ?.
Eclipse and IntelliJ print the bullet point just fine.
My program is very simple (it just prints a string containing a bullet character).
Things I have tried/checked:
chcp shows that UTF-8 (code page 65001) is set.
I only have the Java Extension Pack by Microsoft installed.
It is a fresh new file created from VS Code.
chcp inside the VS Code terminal reports Active code page: 65001.
The UTF-8 displayed in the lower right corner of VS Code is the encoding of the current file, not the encoding of the terminal.
You should check the terminal encoding by typing chcp in the VS Code terminal.
The terminal encoding in VS Code may be different from the encoding of the system's cmd and PowerShell windows,
so please check the encoding in the VS Code terminal, not in the system's cmd or PowerShell window.
Here is my test display:
My code:
public class App {
    public static void main(String[] args) {
        System.out.println("example ●");
    }
}
The encoding of the system cmd window is 65001.
The encoding of the system PowerShell window is 65001.
But the terminal encoding in VS Code is 437.
Running the code directly gives this result (the symbol can't be displayed).
So you need to use chcp 65001 to change the current terminal encoding in VS Code,
then run the code (the symbol is displayed successfully).
But this still has a problem: every time you open a new terminal, you have to manually type the command chcp 65001 to change the encoding.
I found a way after some searching. Add the following configuration to settings.json:
"terminal.integrated.shellArgs.windows": ["-noexit", "chcp 65001"]
A yellow squiggly line will appear indicating that this setting is deprecated and there are now newer configuration options. Never mind, this still works. If you want to see the new configuration, see here.
Note: after adding this configuration, you need to restart VS Code for it to take effect.
This way, whenever you create a new terminal, the command chcp 65001 runs automatically and changes the encoding to 65001.
Now run the code directly, and the symbols are displayed.
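As a code-side complement (my own addition, not part of the answer above): if the terminal is already on code page 65001 but Java's default charset is not UTF-8, you can also force UTF-8 on standard output explicitly. This sketch assumes Java 10 or later for the Charset-accepting PrintStream constructor:

import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class App {
    public static void main(String[] args) {
        // Wrap System.out in a PrintStream that always encodes as UTF-8,
        // instead of relying on the default charset / -Dfile.encoding.
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println("example ●");
    }
}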

How to fix Java args not getting Japanese characters properly in string from Windows Explorer?

On Windows 10, I have a shortcut file in the "SendTo" directory. It is a shortcut to a .bat file.
The .bat file contains just the command "python <filepath> %*" or "java -jar <filepath> %*".
When I select and right-click file(s) in Windows Explorer and send them to this shortcut, it runs the program at <filepath> with the selected file(s) as arguments.
I am trying to send files whose filenames contain Japanese characters as arguments. The filenames are passed to Python programs just fine, but for Java programs the filename arguments are mangled and the Java program cannot find the file.
For example, in Java and with the locale set to Japan, a filename of Filename ファイル名.txt becomes Filename 繝輔ぃ繧、繝ォ蜷�.txt in the args. Other locales do not work either. The result is the same if I send the args to Python and then from Python to Java.
How can I make Java get the proper filename or find the file properly?
You are encountering an unresolved issue with Java. See the open bug JDK-8124977 cmdline encoding challenges on Windows, which consolidates several problems related to passing Unicode arguments to a Java application from the command line.
Java 18 (to be released next month) resolves some UTF-8 issues with the implementation of JEP 400: UTF-8 by Default, but unfortunately not your specific problem. From the "Goals" of JEP 400:
Standardize on UTF-8 throughout the standard Java APIs, except for console I/O. [Emphasis mine]
However, there is a workaround. See Netbeans Chinese characters in java project properties run arguments, and in particular this answer which successfully processes Chinese characters passed as command line arguments using JNA (Java Native Access). From that answer:
JNA allows you to invoke Windows API methods from Java, without using native code. So in your Java application you can call Win API methods such as GetCommandLineW() and CommandLineToArgvW() directly, to access details about the command line used to invoke your program, including any arguments passed. Both of those methods support Unicode.
So the code in that answer does not read the arguments passed to main() directly. Instead it uses JNA to invoke the Win API methods to access them.
While that code was written to process Chinese characters passed as command-line arguments, it works just as well for Japanese characters, including your Japanese filenames.
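For reference, here is a minimal sketch of that JNA approach (my own condensed version, not the exact code from the linked answer; it assumes JNA 5.x, artifact net.java.dev.jna:jna, is on the classpath):

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.WString;
import com.sun.jna.ptr.IntByReference;

public class UnicodeArgs {
    interface Kernel32 extends Library {
        Kernel32 INSTANCE = Native.load("kernel32", Kernel32.class);
        WString GetCommandLineW();                 // full command line as UTF-16
    }
    interface Shell32 extends Library {
        Shell32 INSTANCE = Native.load("shell32", Shell32.class);
        Pointer CommandLineToArgvW(WString cmdLine, IntByReference argc);   // split into argv[]
    }

    public static void main(String[] ignored) {
        WString cmdLine = Kernel32.INSTANCE.GetCommandLineW();
        IntByReference argc = new IntByReference();
        Pointer argv = Shell32.INSTANCE.CommandLineToArgvW(cmdLine, argc);
        // argv[0] is the launcher itself; the real arguments follow.
        // (A production version should also release argv via Kernel32's LocalFree.)
        for (Pointer p : argv.getPointerArray(0, argc.getValue())) {
            System.out.println(p.getWideString(0));
        }
    }
}

Because the arguments are fetched as wide (UTF-16) strings straight from the Windows API, they bypass the lossy conversion that mangles them on the way into main().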

Java - working with different cmd charsets

I want to read a file path from the user in a Java console application;
some of the file paths may contain Hebrew characters.
How can I read the input from the command line when I don't know the encoding charset?
I have spent some time searching the web and did not find any solution that works dynamically on every platform.
Screenshot when running in the console:
If you are using Windows, you first need to check the terminal encoding to make sure it supports Hebrew.
To do this, just type chcp in the console;
the output should show Active code page: 28598.
If you see a different number, type chcp 28598.
Now your console encoding is set to Hebrew and you should be able to type the path in Hebrew without errors.
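A code-side alternative worth noting (my own suggestion, not part of the answer above): when the program is started from an interactive terminal, java.io.Console decodes input with the console's own encoding, so you don't have to hard-code a charset:

import java.io.Console;

public class ReadPath {
    public static void main(String[] args) {
        Console console = System.console();   // null when no interactive console is attached
        if (console == null) {
            System.err.println("No interactive console available");
            return;
        }
        // Console decodes typed characters using the console's encoding, so a Hebrew
        // path entered in a correctly configured terminal arrives intact.
        String path = console.readLine("Enter file path: ");
        System.out.println("You entered: " + path);
    }
}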

rendering the US unit separator on unix machine

I have a Java application that feeds a file on a Unix machine; each string contains multiple US (unit separator) characters.
Locally, when I run it in Eclipse on a Windows machine, it displays fine in the console:
1▼somedata▼somedata▼0▼635064▼0▼somedata▼6
But when I run the program on the Unix machine, the content of the file appears as:
1â¼N/Aâ¼somedataoâ¼somedataâ¼somedata
Changing the LANG variable to any value from locale -a does not seem to help.
This looks like a character-set mismatch. On Linux you most probably have UTF-8, while with Java you usually get UTF-16. Try converting from UTF-16 to UTF-8 with iconv and see how it looks on Linux.
cat file | iconv -f UTF-16 -t UTF-8
But actually it would have been much worse if it were UTF-16. Maybe it is simply a font mismatch. You can also play with the character encoding (see what the source encoding is and convert it to UTF-8) if that's the issue. Or maybe your source is UTF-8 and the destination uses some local encoding.
This makes sense, because your special character appears as two characters on the UNIX machine, which means the source is most likely UTF-8 and UNIX is using an encoding where each byte is a single character.
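If the mismatch is on the writing side, one way to take the platform default out of the picture is to write the file with an explicit charset. A sketch under the assumption that the file should be UTF-8 end to end (the file name and record contents are made up):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FeedFile {
    public static void main(String[] args) throws IOException {
        // \u001F is the US (unit separator) control character.
        String record = "1\u001Fsomedata\u001Fsomedata\u001F0";
        // Always write UTF-8, regardless of the JVM's default charset or the LANG setting.
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("output.dat"), StandardCharsets.UTF_8)) {
            writer.write(record);
            writer.newLine();
        }
    }
}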

Junk character output even after encoding

So, I have basically been trying to use Spanish characters in my program, but wherever I use them, Java prints out '?'.
I am using Slackware and executing my code there.
I updated lang.sh and added: export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
After this, when I tried printing, it no longer printed question marks, but other junk characters instead. I printed the default Charset to the screen, and it has been set successfully, but the output is still not printed properly.
Help?
Thanks!
EDIT: I'm writing the code on Windows in NetBeans and executing the .class or .jar on Slackware.
Furthermore, I cannot seem to execute the locale command; I get the error "bash: locale: command not found".
This is what confuses me: when I echo any special characters in the Slackware console, they are displayed perfectly, but when I run a Java program that simply prints its command-line arguments (and I enter the special characters as command-line input), it outputs garbage.
If you are using an SSH client such as PuTTY, check that it is using a UTF-8 character set as well.
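One way to see where the corruption happens (my own diagnostic sketch, not from the answer) is to print the default charset and the code points Java actually received for each argument:

import java.nio.charset.Charset;

public class ArgDump {
    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        for (String arg : args) {
            // Dump each argument's UTF-16 code units in hex; if these are already wrong,
            // the argument was mangled before it reached the JVM.
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < arg.length(); i++) {
                hex.append(String.format("%04X ", (int) arg.charAt(i)));
            }
            System.out.println(arg + " -> " + hex.toString().trim());
        }
    }
}

If the code points are correct but the terminal shows garbage, the problem is on the display side (for example PuTTY's character set, as suggested above); if they are already wrong, the argument was decoded with the wrong charset on the way in.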
