How do I use Chinese characters in Java?

I'm trying to use Chinese characters in Java GUI components. I have changed my keyboard output to Chinese and can type in Netbeans in Chinese. Further, I can compile these Java files. However, when I run these programs, the characters are displayed as English question marks. What can I do to change this?

First, make sure you are compiling with a suitable value for javac's -encoding option, one that matches the encoding your source files are actually saved in.
Second, you have to be running with a suitable character encoding. In most cases, setting -Dfile.encoding=UTF-8 will do it, but it also depends on what sort of program (command line? GUI?) and what environment you're running it in.
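As a rough illustration of those two steps, here is a minimal sketch. It assumes the class is compiled with something like `javac -encoding UTF-8 EncodingDemo.java` and run with `java -Dfile.encoding=UTF-8 EncodingDemo`; the class name `EncodingDemo` and the helper `roundTrip` are made up for this example. The Chinese text is written as Unicode escapes so the source compiles the same way whatever -encoding is in effect, and the output goes through a PrintStream with an explicit charset rather than relying on the default.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is not from the original post.
class EncodingDemo {

    // Two Chinese characters written as Unicode escapes, so the source
    // is pure ASCII and compiles identically under any -encoding value.
    static final String SAMPLE = "\u4e2d\u6587";

    // Encode to UTF-8 bytes and decode again; a lossless round trip
    // shows the string is intact inside the JVM.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The default charset is what file.encoding influences.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Writing through an explicit UTF-8 PrintStream avoids relying on the default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(roundTrip(SAMPLE));
    }
}
```

Whether the characters then show up correctly still depends on the terminal or on the font the GUI component uses, so treat this as a way to rule out the compile and runtime encodings, not as a complete fix.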

I don't think the shortest answer is best here so I recommend you read this excellent blog post on "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky.
I think the couple minutes spent reading this will be more than worth it.

Related

Printing up a line in java console (reverse of '\n')

Foreword
I've seen some questions related to this in other languages and done quite some research on this. First, before we begin going down the rabbit hole, I must describe why I'm doing this. I'm pretty printing a binary tree and I found a way to do it that I don't think anyone else has found (can't find it anywhere on the web I've searched forever). It solves a lot of the problems related to spacing and conflicts and I'll probably provide it as an answer to prettyprint Btree questions later. Anyways, an essential part of doing the above is to be able to move the line feed (the cursor) one line up. I will also settle for moving the carriage return one line up.
I have made three EDITs to this question named "Edit 1-3" near the bottom.
My specific Windows version (might be relevant): Windows 10 v1607
OS build: 14393.2035
The problem??
The problem is that this is easily done with ANSI escape characters
ANSI escape codes: https://en.wikipedia.org/wiki/ANSI_escape_code
With "\033[A" being the specific code I want to implement (got it from a question about moving one line up in python).
The problem
So problem solved, right? Oh no, not even close. The problem is that Windows apparently had this """wonderful""" update that may have broken everything ANSI/console related. The link below is to a question about this with different ANSI requirements, but the root problem is the same.
How to print color in console using System.out.println?
tl;dr: It doesn't work on Windows for some reason; you need a big library to do this, or something else.
Sooo then I tried doing some in-depth research on the update itself and why this happens to be windows specific (of course) and I was led to this.
the root cause of windows problem: https://github.com/Microsoft/WSL/issues/1173
tl;dr:
Basically, the update changed a default setting for cmd.exe, which is the default console that Java uses. What changed was how the console setting for ANSI codes is handled for 'child applications', AKA Java (stuff that uses the console but isn't the console itself). The default before the update was that child applications inherited the default of the console, which was to have ANSI enabled. NOW (because apparently the previous default broke things) they changed it so that child applications DO NOT inherit the default settings from the console. This means Java effectively has to set this console mode itself if it wants to have ANSI enabled. There are obviously a plethora of libraries for Java to do that, but the good available ones have some issues. Firstly, they require Maven and I don't want Maven; also, I don't like NEEDING a million extra libraries to do something very simple, and it adds to the number of things my code relies on (which is bad). So I'm trying to avoid simple solutions like "just use JANSI" to set the console mode for ANSI.
What I've tried after this
After I was done reading/understanding that, I tried going into my Registry Editor to change the default setting of the console (cmd.exe) to always enable ANSI, even for child applications (AKA Java). It had been suggested in the GitHub discussion, but to my surprise the specific setting wasn't even there for me. (It was supposed to be in Computer -> HKEY_CURRENT_USER -> Console, between "TrimLeadingZeros" and "WindowAlpha".)
Try 2
Since I couldn't edit the console setting without big libraries full of things I don't want, I tried thinking outside the box and messed around in the Java settings. I discovered where exactly Java sets its console and that I could change that setting to use a different console. I had recently installed git and knew that git bash was available, so I tried that, using the question below (and Google in general) as a starting point.
https://superuser.com/questions/1196463/start-sh-exe-bash-with-given-path
Third time is the charm?
I couldn't get the darn thing to work. It didn't return any errors or do anything new, like at all. It was even set 'for the current project' so it should've done something different when I ran my project no? I believe part of the problem is that my root git folder is screwy (not in program files). This is probably because the computer I'm using is a work computer and maybe some setting there affects where git was installed, I don't know but what I do know is that when I changed consoles I tried this path:
C:\Users\abbotts1\AppData\Local\Programs\Git\bin\bash.exe
I also tried:
C:\Users\abbotts1\AppData\Local\Programs\Git\bash.exe
and
C:\Users\abbotts1\AppData\Local\Programs\Git\bin\sh.exe
and
"C:\Users\abbotts1\AppData\Local\Programs\Git\bin\sh.exe" --login -i
And after each change of console I tried this code:
public class ExpressionEvaluator {
public static void main(String[] args) {
System.out.println("1");
System.out.println("\033[A");
System.out.println("2");
}
}
And I always got:
1
Extra Line here
2
As the output. If I removed the 'move cursor up' ANSI println statement attempt I got:
1
2
What I wanted was:
2
1
(the whole point being, I'm able to move up a line freely)
I've also tried making them print statements and that didn't work either.
So I'm now at wits end
So here is what I want, an answer something like:
A nonintrusive way to change the setting for cmd.exe to allow child applications like java to use ANSI BY DEFAULT when ran
OR a nonintrusive library that doesn't require maven and a million other things to enable ANSI so I can run the ANSI in java in the console and get my desired output
OR a programmatic workaround that allows me to effectively print one line up without this whole ANSI thing
OR help to configure IntelliJ console so that I can actually use a different console and use ANSI to print one line up
To address the obvious concerns
Since 1 and 4 are superuser questions and 2 is 'I need a library' and offtopic I'm more talking about point 3 here. If there is no workaround for this and the answer is one of the offtopic ones just tell me there is no workaround. I don't know how else to ask this question since it is a 'programming specific problem' it just has many solutions some of which are not 'on the topic' because they aren't programming solutions. If the only answer is 'the solution isn't on topic here' then I'll go ask in the appropriate place. Let me know if I should delete this question when I go to ask it somewhere else.
What I think might work for on topic part of this
Since 3 is on topic here I'll discuss what I'm thinking:
Maybe I could make some sort of system that prints only to certain arrays instead of moving up and down lines (i.e. have an array to represent the lines, traverse the tree and, instead of moving up and down lines, just switch which array entry to print to). I don't see this as very efficient, and it's kind of a waste of arrays and processing power, but if it's a solution I'm willing to hear it. That's all I can 'think of', but most of what I've tried is trying to get ANSI to work.
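A minimal sketch of that array-of-lines idea, for what it's worth. The class `LineCanvas` and its methods `put` and `render` are hypothetical names invented for this illustration. The output is collected in one buffer per line, and "moving up" simply means writing to an earlier row before anything is printed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original post: a text "canvas" that
// collects output per line so that earlier lines can still be changed.
class LineCanvas {
    private final List<StringBuilder> lines = new ArrayList<>();

    // Write text starting at (row, col), adding rows and padding with spaces as needed.
    void put(int row, int col, String text) {
        while (lines.size() <= row) {
            lines.add(new StringBuilder());
        }
        StringBuilder line = lines.get(row);
        while (line.length() < col + text.length()) {
            line.append(' ');
        }
        line.replace(col, col + text.length(), text);
    }

    // Join all rows with newlines; nothing reaches the console before this.
    String render() {
        StringBuilder out = new StringBuilder();
        for (StringBuilder line : lines) {
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LineCanvas canvas = new LineCanvas();
        canvas.put(1, 0, "1"); // written first, but placed on the second row
        canvas.put(0, 0, "2"); // "moves up" simply by choosing row 0
        System.out.print(canvas.render());
    }
}
```

Run as is, this prints 2 on the first line and 1 on the second, which is the output asked for above. The cost is memory proportional to the size of the printed tree, which is usually small next to the tree itself.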
For future reference:
what is the appropriate way to ask these questions with multiple off-topic solutions/ solution questions? Is it better just not to ask them at all? Do solutions that involve questions constitute a chameleon question? I don't feel like bringing the meta effect upon myself.
Edit 1: where I'm at so far:
I've tried the whole git bash thing again and boy was it a process. The actual git bash path in the terminal setting needs to be in quotes, with the --login -i arguments coming afterward. Ex:
"C:\Git\bin\sh.exe" --login -i
This setting is for the Java terminal. THEN you have to add your JDK to the Windows environment variable named PATH. Go to your environment variables (there are a million and one YouTube videos for that) and add your JDK's bin directory to PATH. Ex:
C:\Program Files\Java\jdk-10.0.1\bin
This question:
How can i make gitbash find the javac command?
Goes through that process extensively
Tl;dr: You have to set the Windows PATH variable in order for git bash to recognize the javac and java commands.
Once that's done you need to actually run your java and javac commands like you would in any terminal. BTW be careful, because paths in git bash require two '\' characters where you would normally write one, so your source directory path might look like this:
C:\\IdeaProjects\\Calculator\\src\\
Then you just run:
javac ClassName.java
java ClassName
BUT THEN it doesn't actually print the ANSI output, it prints the raw escape characters. Also, I found out I was using the WRONG escape sequence (I had the wrong number to represent the 'esc' button since the 'esc' key is represented as some number, but I had the wrong hex number I was using like x330 or something). I also learned that the notation is like this:
'esc key hex number' + '[' + 'parameter hex values separated by commas'
so this might look like:
\x1B [ A
where the actual letters and numbers are hex value stuff (without the obvious 0x...) and the first escape hex value has an x in it (why?). Anyways, when running them in Java as strings you need to escape the escape character (duh right?) with an extra '\' so, for example, the code might look like:
System.out.println("\\\x1B [A");
I just noticed that stack overflow escapes these too so I
actually have three '\' but for you guys, it only displays two '\',
weird right? Anyways back to what I was saying
BUT STILL, the output doesn't actually work! This is where I'm at. I've done the above and I 99.9% know git bash is installed right and runs fine but when I run this:
public class ExpressionEvaluator {
public static void main(String[] args) {
System.out.println("1");
System.out.println("\\x1B [A");
System.out.println("2");
}
}
I get this in java console (not git bash??):
1
\x1B [A
2
and this in git bash:
1
\x1B [A
2
What I actually want is:
2
1
Because the ANSI escape character is supposed to move my cursor/linefeed/whatever, one line upwards. The same thing happens if I run the above code but instead use the ANSI code: "\x1B [F". Only raw ANSI is output. I'm pretty sure git bash was supposed to be 'natively ANSI aware' and I've seen people say that on websites so I don't know why it isn't working.
And I still don't know for sure if those are two separate console outputs or the same console output. I really can't tell so if anyone wants to leave a comment saying 'yeah its the same dummy', I'd appreciate it because I can't find a definitive source out on the web that it is. I think it is but nothing other than the console setting in IntelliJ indicates that as true.
I've heard rumors of a TERM variable that needs to be set or otherwise manipulated on windows. I've checked myself what it is using:
echo $TERM
in git bash and I got back:
cygwin
So I don't exactly know if that's good or bad because I've literally gone through all the search terms you can think of and they all lead to the same basic page of results for 'git bash colors not working' and most of them involve windows 7 (don't have it) installing maven/jansi (don't want or shouldn't need it) or some other language that isn't Java and using some other IDE which isn't IntelliJ. Some pages that do have my specific requirements have said something about TERM supposed to be xterm or some other thing like xterm-256 or something for 'color' output, something like this. I am so unfamiliar with this stuff so I don't even know where to begin.
Too long give us a tl;dr
I need to know why git bash is printing raw ANSI instead of actually using the ANSI.
what I know
I'm using git bash with IntelliJ, 99.9% sure I have my path set correctly, I am able to run my java class from git bash, I have it set as the IntelliJ terminal and I currently have the windows TERM variable set to cygwin.
What I don't know
I don't know what TERM needs to be and can't find it on the web, I can't tell if the IntelliJ console that appears when I click the green arrow 'run' is the same as the git bash console, and I can't figure out if some other thing is preventing me from actually interpreting the ANSI.
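One way to probe the second unknown (whether the Run window and git bash are the same environment) is to run the same small diagnostic from both and compare what it prints. The sketch below is only that: the class `ConsoleProbe` and its `describe` helper are hypothetical names, and it assumes that on the JDK 10 mentioned above `System.console()` typically returns null when output is captured by an IDE or redirected (newer JDK releases may behave differently). It will not tell you what TERM "should" be, only what the JVM actually sees.

```java
// Hypothetical diagnostic, not from the original post.
class ConsoleProbe {

    // Summarise what the JVM can see about its output environment.
    static String describe(boolean hasConsole, String term, String osName) {
        return "interactive console attached: " + hasConsole
                + ", TERM=" + (term == null ? "(unset)" : term)
                + ", os.name=" + osName;
    }

    public static void main(String[] args) {
        System.out.println(describe(
                System.console() != null,       // null usually means "not a real terminal"
                System.getenv("TERM"),          // what git bash reported as "cygwin"
                System.getProperty("os.name")));
    }
}
```

If the two runs print different values, the green Run arrow is not going through the git bash terminal, and escape sequences sent there are handled by whatever the IDE's run window does with them.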
What I need
I need a simple explanation, something straight out of r/ELI5 of what is wrong with git bash if anything and how to fix it. If it can't be explained simply or nothing is wrong then maybe I'll try another supposedly 'natively ANSI aware' terminal. I think Powershell was another option that was listed. My best bet is that the TERM variable needs to be something else, or git bash was never really natively ANSI aware and capable, to begin with. I've seen other questions with the same problem for colors but their fixes are for older versions/different languages and things or they don't actually work. I have yet to find a good page for 'git bash outputs raw ANSI in IntelliJ' and I've used variations of those exact words for hours now. All I can get is long GitHub discussions on the 'bugs' related to this and they confuse me, don't lead to solutions, and may be active or just don't contain any resolution.
Edit 2
After doing some more research I've learned that my previous escape code was correct:
\033[A
\x1B[A
should be similar.
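For reference, a small sketch of how that sequence can be spelled in a Java string literal. Java has no `\x` hex escape, so `\x1B` is not available; the octal form `\033` and the Unicode form `\u001B` both produce the ESC character (code 27), while `"\\x1B[A"` is just six ordinary characters, which matches the raw text seen in the output above. The class name `CursorUp` is made up for this example.

```java
// Hypothetical demo class; the name is not from the original post.
class CursorUp {
    // ESC (decimal 27) followed by "[A", written two equivalent ways.
    static final String OCTAL = "\033[A";      // octal escape
    static final String UNICODE = "\u001B[A";  // Unicode escape

    public static void main(String[] args) {
        // Both literals contain exactly the same three characters.
        System.out.println("same sequence: " + OCTAL.equals(UNICODE));
        System.out.println("first char code: " + (int) OCTAL.charAt(0));

        // With a doubled backslash, nothing is escaped: the string holds
        // a backslash, 'x', '1', 'B', '[' and 'A' as plain text.
        System.out.println("plain-text version length: " + "\\x1B[A".length());
    }
}
```

Note also that System.out.println appends a line terminator after the sequence, so even on a terminal that honours the code the cursor moves up and then straight back down; System.out.print avoids that.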
I also learned that it isn't the console I'm using that's the problem, it's Windows itself. I now know this because I've tried compiling and running on cmd.exe, git bash and PowerShell. To change the default setting talked about earlier (consoles not supporting ANSI for child applications) you have to EXPLICITLY enable it from the program itself rather than rely on a console or something.
link to a question that explains this in Python here:
How to use the new support for ANSI escape sequences in the Windows 10 console?
tl;dr
The method they use involves the GetConsoleMode and SetConsoleMode functions and the ENABLE_VIRTUAL_TERMINAL_PROCESSING flag. Apparently you need to use these to actually set the console mode to support ANSI. I don't currently know if those are things that are hidden somewhere in Java or (more likely) something that needs to be added to the base Java libraries. I'm going to try to figure out how they actually get the ctypes thing from that question (it seems to be what they're importing to use these functions) and get the methods I need. Once I do, I'll post that as an answer, unless one of you figures it out before I do and can explain it better.
Apparently, the escape sequences work fine if you only use them from the console but If you use a "child application" then they don't work. So at the very least we now for sure know the root cause of the problem.
Edit 3
Found this which is highly relevant especially the console virtual terminal sequences section (lefthand navbar):
https://learn.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences
That page has, near the bottom, a whole C implementation of how to enable the console to read ANSI. Apparently, this doesn't need libraries at all, but the process to actually change the console defaults to use code like this requires sysadmin privileges, intimate knowledge of the program files and a whole host of other things (at least if you're before the Windows 10 update when color support was changed). Now it's still disabled by default but can be enabled. I don't yet know how to try ANSI from the console directly. I've tried multiple commands of the form
echo \x1B[(insert ANSI code here)
but none of the commands seem to work in ANY terminal (cmd, git bash, powershell). They just return the raw code
\x1B[(whatever the ANSI code was)
I'm obviously new to the console, so I might be using the wrong command; if so, feel free to enlighten me, but the examples I've seen use echo. ANYWAYS, I thought that calling the ANSI directly from the terminal was supposed to work, since it's supposedly enabled by default, just not for child applications (post Windows 10 update). BUT maybe it's not; maybe it's disabled by default, and even when enabled it's still disabled for child applications (Java) unless explicitly changed in said child application (Java). I'm going to try to see if it's possible to enable ANSI directly from the console, or if the linked C code needs to be directly ported to Java or run in the console just to work. The problem is I don't know how to get the imports/includes that the C code uses and use them for code in Java. I'd rather not just accept a coded 'solution' in C and try to use that alongside Java code; I'd rather translate it, understand it better and have my own code that does the same thing.
Another option I've been told is something called ANSICON which is like some sort of plugin that you install in the console with the -i flag and that's supposed to enable ANSI at least in-console. I found this
https://community.liferay.com/blogs/-/blogs/enable-ansi-colors-in-windows-command-prompt
the above explains that process in a little more detail.
My specific windows version and the version of the update
Another thing I learned was that the specific version of Windows 10 whose 'update' changed ANSI console behavior was something like Windows 10 v1511. I'll try to find the webpage to source this directly, but I currently have Windows 10 v1607, so I think I should be good. Included in that was the actual OS build, of which I have 14393.2035, and I think that was identified as a particularly intermediate update to this process in one of my previous links (I believe it's the one with the whole GitHub discussion about the update, you can find it here: https://github.com/Microsoft/WSL/issues/1173). I have a work computer, so I can't really make system updates because I'm not an administrator, and I doubt IT would let an intern go around updating work computers.
Anyways, I'll continue trying to see if I can get the C code into Java; I'll test it and then try to post an answer. If y'all are ahead of me, let me know.
The Python approach you reference in "Edit 2" (https://stackoverflow.com/a/36760881/309816) simply invokes Windows-specific native code (kernel32, which is non-portable) to "fix" this.
I suppose you are OK with that and want to do the same in Java (i.e. invoke kernel32 when you detect Windows)...
A very lightweight library for achieving the same in Java is JNA, which has an out-of-the-box wrapper for kernel32 (see: https://java-native-access.github.io/jna/4.2.1/com/sun/jna/platform/win32/Kernel32.html)
You seem to be after this method: https://java-native-access.github.io/jna/4.2.1/com/sun/jna/platform/win32/Wincon.html#SetConsoleMode-com.sun.jna.platform.win32.WinNT.HANDLE-int-
Hope this works for you.
EDIT: technically, you only need jna.jar (see getting started here: https://github.com/java-native-access/jna/blob/master/www/GettingStarted.md), but I would suggest you also use jna-platform.jar so that you don't need to write the code that generates the mappings for kernel32 at runtime yourself.
Home for JNA: https://github.com/java-native-access/jna
I think adding 1 (or 2 if you add jna-platform) jars that have a very specific scope (doing native calls without all the JNI preparation overhead) is lightweight enough. You don't need to generate any headers, or change anything in your compilation process. It will just work by adding those jars to your classpath.
You should also clarify in your question that this is about Windows. Maybe edit the title to: "Printing up a line in java console (reverse of '\n') on Windows" as this is really about a platform-specific concern that you want to address with Java.

special characters from git to eclipse

So I am cloning some Java files from git into Eclipse. There are special characters like Ü and è, letters you would see in the Spanish language, that Java normally does not like. When I open up the project in java it turns them into the square with the ? in the middle of it, and Java complains about it, saying there is a special character problem. It wouldn't be that big of a problem, but I'm doing it for work and there is A LOT of code to go through and a lot of special characters. Is there anything I can do about this, either to make Java like it or to not change the characters when I go from git to Eclipse?
(Assuming that when you say "I open up the project in java" you actually mean opening the project in Eclipse:)
You need to do two things: first figure out what the file encoding is, then change your settings so that Eclipse uses that encoding. Figuring out the encoding can be troublesome. With a git remote repository on Unix the obvious guess would be UTF-8; with Windows, UTF-16 would also come to mind as a possibility. In the worst case you can always open your file in a hex editor and check how your special characters are actually encoded. After that, making Eclipse use that encoding is easy. (And you may consider changing it only for this particular project of yours.)
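As an alternative to the hex editor, a rough sketch of a programmatic check: try to decode the file's bytes strictly as UTF-8 and see whether that fails. The class `EncodingGuess` and the method `decodesCleanly` are hypothetical names for this illustration. The check is only meaningful for files that contain non-ASCII characters, because pure ASCII is valid in UTF-8 and in most single-byte encodings alike.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not from the original answer.
class EncodingGuess {

    // True if the bytes decode under the charset without malformed or unmappable input.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of one affected source file as the first argument.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("valid UTF-8: " + decodesCleanly(data, StandardCharsets.UTF_8));
    }
}
```

If a file with accented letters passes, UTF-8 is a strong candidate. If it fails, a single-byte encoding such as ISO-8859-1 or windows-1252 is more likely, though this check cannot tell those apart.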

Atom Processing having problems with folders with spaces

I have recently started using the Java-based language Processing.
I started off with using the standard processing editor that has been installed on my Windows machine, but didn't quite take to it. It's not very customizable and lacks things like highlighting variable and function names throughout the code.
So I decided to use Atom instead, and so far it's been great. Although with one problem:
I can't build sketches that have spaces in their directories.
If I want to build a sketch that has the path...
C:\Users\Sulu\Documents\Processing\Test Sketches\Test\test.pde
I get the message:
DPI detection failed, fallback to 96 dpi
C:\Users\Sulu\Documents\Processing\Test does not exist.
[Finished in 1.008s]
I'm sure this is down to the fact that there is a space in the path.
My question is this: is there any way that I can get Atom (or maybe it is 'processing-java.exe' that I need to modify) to cope with spaces in the path?
To automatically add doublequotes?
I'd be really grateful to any help with this as I have a lot of sketches that have spaces in their path names and renaming them all would be tedious.
Thanks.
It is a part of Processing language, it is a rule, just like syntax which you have to follow and can't be modified, at least as far as I know.
This is what the official github wiki says -
Names of sketches cannot start with a number, or have spaces inside. This is mostly because of a restriction on the naming of Java classes. I suppose if lots of people find this upsetting, we could add some extra code to unhinge the resulting class name from the sketch name, but it adds complexity, and complexity == bugs. :)
So, I am afraid there is no solution to your problem other than renaming your sketches, at least not yet!
EDIT :
It seems I misinterpreted the question a little. I suppose your sketch works in the Processing editor but not in external editors, because they use the CLI to compile your project and the command they issue contains the file path. In a shell such as bash a path cannot contain bare spaces: it has to be enclosed in quotes, or the spaces have to be "escaped" with \. That command is, again, controlled by Processing, and I don't think you can modify Processing to add quotes to paths or escape spaces during building, so the answer still remains the same.
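To illustrate the quoting point only (this does not change how the Atom package builds its command line): when a program launches the compiler itself and passes each argument as a separate list element, the space in the path needs no manual quoting. In the sketch below the class `SketchCommand` is hypothetical, and the executable name `processing-java` with the `--sketch=` and `--run` options is an assumption to check against your installed Processing version.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the original answer.
class SketchCommand {

    // Build the argument list. Each element is handed over as one argument,
    // so the space inside the path is not split on.
    static List<String> build(String sketchDir) {
        // Executable and option names are assumptions; verify them locally.
        return Arrays.asList("processing-java", "--sketch=" + sketchDir, "--run");
    }

    public static void main(String[] args) {
        List<String> command =
                build("C:\\Users\\Sulu\\Documents\\Processing\\Test Sketches\\Test");
        ProcessBuilder builder = new ProcessBuilder(command).inheritIO();
        System.out.println(builder.command());
        // builder.start() would launch the build; it is left out so that
        // this sketch runs on machines without Processing installed.
    }
}
```

The truncated path in the error message above ("...\Processing\Test does not exist") is what you would expect when the whole command is instead assembled as one unquoted string and split on spaces.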

Java - Unmappable character for encoding Cp1252

I'm working on this extra project, and I just took a break for a couple of days. I come back and try to debug a problem that was occurring and this appears.
The compiler reports 100 errors of three types: unmappable character for encoding Cp1252, illegal escape character, and illegal character: '\u####' (see screenshot for examples of what # can represent). The Google searches I've done lead me to believe that this is "Unicode".
This error appears for 3/4 of my source files when I try to javac them (tested each file individually)
Being rather new to java (I'm in Computer Science 120 at highschool...) I haven't the slightest clue what to do about this error.
Google search talks about special characters (such as trade-mark characters), of which there are none anywhere in my code, as well as encoding.
While I don't really understand this encoding thing, from what I've read in the Google searches, it sounds like each operating system uses a default encoding format and the JVM will use that format. However, I've javac'd the program (and several other programs) several times; furthermore, the code hasn't changed for roughly a week (due to another issue I've been having).
Further after commenting out everything related to my other issue, this problem still appears, despite that I know my program should be javac-ing fine with the code I've commented out.
With no changes for the past week, I haven't the slightest clue what could be possibly causing this, noting this error has not appeared for me before.
As for my question... What's happening? How do I fix this?
I'm using -
Windows 7
Java 8
Notepad
Command-Prompt

IntelliJ and internationalisation: accented characters

I have a java application which has a GUI in both English and French, using the standard Java internationalisation services. I wrote it in JBuilder 2005 on an old machine, and recently upgraded, which has meant changing IDEs. I have finally settled on IntelliJ.
However, it doesn't seem able to handle the accented characters in my ListResourceBundle descendants which contain French. When I first created the IntelliJ project and added my source (which I did manually, to be sure nothing weird was going on behind the scenes), I noticed that all the accented characters had been changed into pairs of characters such as é. I went through the code and corrected all of these, and assumed that the problem was fixed.
But I find on running the (rebuilt) project that the pairs of characters are still showing, instead of the accented characters that I see in my code!
Can someone who has done internationalisation in IntelliJ please tell me what I need to do to fix this?
PS: I'm on the Mac.
Two things --
First, make sure your files are being stored in a Unicode encoding such as UTF-8, and that your source control supports the encoding.
Second, consider using the resource bundle editing support built into IntelliJ http://www.jetbrains.com/idea/features/i18n_support.html
Java resource bundles should only hold ASCII and Unicode escape codes;
see http://java.sun.com/developer/technicalArticles/Intl/ResourceBundles/.
E.g. \u00d6ffnen for German Öffnen.
The command line tool native2ascii converts from your native encoding to ASCII plus Unicode escape codes. It is a bit of a hassle, but it is a Java problem rather than an IntelliJ one.
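To make the conversion concrete, here is a minimal sketch of the kind of transformation native2ascii performs: every character outside ASCII is replaced by a backslash-u escape of its UTF-16 code unit. The class `AsciiEscaper` and the method `toAsciiEscapes` are hypothetical names, and this is not the tool's actual implementation. The tool itself ships with older JDKs (it was removed in JDK 9), so a small helper like this may be useful where it is missing.

```java
// Hypothetical helper, not part of the JDK or of the original answer.
class AsciiEscaper {

    // Replace every char above 0x7F with a backslash-u escape of its UTF-16 code unit.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The German word from the answer; its first letter is written as an
        // escape here so that this source file stays pure ASCII.
        System.out.println(toAsciiEscapes("\u00d6ffnen"));
    }
}
```

For the German example this prints the escaped spelling given above, which can be pasted into a bundle without depending on the file's encoding.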
Note: I use Intellij on a Mac to create programs localized in English, German and Japanese.
