How to use UTF-8 / Unicode in J-Creator?

I'm trying to do a project for a programming class, and I need to figure out how to get Unicode working in J-Creator, if possible. I haven't been able to find anything so far. When I try to print a word in a non-Latin alphabet, such as "цитата", it prints "??????". How do I get UTF-8 in J-Creator?

I don't use JCreator, and therefore couldn't verify this answer, but it may still be helpful since there are three general points to consider when rendering UTF-8 text from a Java application to a terminal window:
Assuming that you are using the PrintStream method println() to display your text in JCreator's terminal window, you probably need to instantiate that PrintStream to explicitly use the character encoding UTF-8:
try{
PrintStream outStream = new PrintStream(System.out, true, "UTF-8");
outStream.println("цитата");
} catch(UnsupportedEncodingException e){
// Error handling
}
If you don't do that, and simply call System.out.println("цитата"); instead, then you will be using the default encoding at runtime. That is determined by the JVM during startup, and is unlikely to be UTF-8.
You may need to set the code page in the terminal window to UTF-8. You don't mention your operating system, but from a terminal on Windows (e.g. PowerShell, Command Prompt, etc.) you would call chcp 65001. That may or may not be applicable for your environment, and I have no idea whether JCreator even allows you to set the code page when running your application.
You obviously must use a font which supports the text being rendered within the terminal window, which is Russian in your case. I don't know how JCreator determines the font to use and whether it allows you to explicitly set that font, but it can be set for a Command Prompt window on Windows.
It's possible that despite your best efforts the JCreator terminal simply won't support what you want to do. In that case your workaround will be to use a terminal window supported by your operating system and run your application there, assuming that you are able to set the code page and the font.
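As a variation on the first point, here is a minimal sketch that avoids the checked UnsupportedEncodingException by passing a Charset instead of a charset name. It assumes Java 10 or later, where that PrintStream constructor exists; the class and method names are made up for illustration, and whether JCreator's terminal then renders the bytes correctly still depends on the code page and font points above.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Out {
    // Wraps any stream in an auto-flushing PrintStream that always encodes as UTF-8,
    // regardless of the JVM's default encoding.
    public static PrintStream utf8(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        utf8(System.out).println("цитата");
    }
}
```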

Related

How to get file's icon and type description in Ubuntu?

With Java 11 on Windows, I can get info about my files using:
import javax.swing.filechooser.FileSystemView
var type = FileSystemView.getFileSystemView().getSystemTypeDescription(file)
var icon = FileSystemView.getFileSystemView().getSystemIcon(file)
On Ubuntu (20.04) however, things are different. By now, I've figured out that the icon has a ToolkitImage inside instead of a BufferedImage, which is annoying because it's internal API, but I can render that now.
The remaining problem is the file type, which still returns null on Ubuntu when using the FileSystemView, or returns "Generic File" for every file if using the new FileChooser().getTypeDescription(file) way.
How can I get a proper file type description on Ubuntu?

getFileSystemView is broken
A bold claim: Whatever you wanted this to do, it won't work. Based on looking at the source. You can skip this section if you're willing to accept it's a dead end, but I best back up such a claim, so read on if you'd like to be convinced:
The sources I have here for both JDK11 and JDK17 do the following relatively simplistic approach for FileSystemView.getFileSystemView():
If the value of File.separator is \\, return the windows implementation.
If the value is /, return the unix implementation (that'd therefore be just about everything else, notably including macs).
Otherwise return the generic implementation. Let's forget about this: which OS has neither / nor \\ at this point? Pre-MacOSX mac os is long dead; that's the only one I can think of.
The unix implementation of getSystemTypeDescription is:
return null;
Oof. That's not going to get us very far. The windows implementation goes with ShellFolder. Which is general code; I do not understand why the unix implementation just disregards it.
Perhaps this explanation makes the most sense: .getSystemTypeDescription is intended to return the opinion of the OS itself as to how one would describe the type of file this is. The reason the unix implementation just return null;s is simply that as a concept this isn't how unix works. The OS itself doesn't have some sort of registry that maps file extensions to names (such as windows' HKEY_LOCAL_MACHINE/.txt and friends), nor does it have a concept where each file has its own metadata that contains additional info, such as 'which app created me' / 'which app should be used to open me when double clicked', such as MacOSX does. (Of course, if you do run this on a mac, you still get null, which really isn't excusable).
Of course, we now get into a more tricky debate: What is your OS, really. One could say 'well, it's linux, and KDE, or GnomeDesktop or whatnot, well, that's just this app, you know'. But one could also say that you run the java app on the OS 'KDE/Linux'. In other words, what does System mean when we talk about FileSystemView. Evidently, the JDK impl source I'm looking at (which is OpenJDK's) chooses to define it as 'just linux', which has no such thing as the 'system's opinion on what type of file this is', making return null; a correct, but mostly useless, answer.
The getSystemIcon implementation of the abstract supertype itself is weird: It is a near carbon copy of the windows-specific implementation of getSystemTypeDescription - namely: Get the ShellFolder object, then ask it. I have no idea why on unix, 'just ask the shellfolder' is the implementation of getSystemIcon, whereas 'just return null' is the implementation of getSystemTypeDescription - why not also ask the shellfolder?
At any rate, even if you did, not much use there: The default shell folder implementation always returns null. This is sun.awt code so it is considerably more likely that the implementation of AWT for that specific platform overrides it, but this isn't in the openjdk sources as far as I looked, at any rate.
The default impl of getSystemIcon will return either a generic file icon or a generic folder icon (by invoking UIManager.getIcon("FileView.directoryIcon"), for example) if the ShellFolder returns null as an icon.
So let's give up on this implementation: Conclude it cannot help you.
Define 'type description'
What does that really mean? I can only foresee 3 useful takes on what this is supposed to mean:
Something that human eyeballs and brains will likely understand.
A mime type, which is a universal standard for describing file types.
"Whatever the window manager that the user is using would see in the local equivalent of a file explorer app - explorer.exe, on windows, Finder.app on mac, etc".
Presumably the getSystemTypeDescription is the method that is supposed to answer the 3rd option (the local window manager's description). But, given that OpenJDK doesn't actually implement this (well, it does, in a useless way, by just returning null), the only way you're getting that is if you put in the considerable effort to figure out how each and every popular window manager used worldwide does it and port it all over to java code. I assume you're not interested in doing that kind of work.
But the other 2 - there are ways to get that.
Let's start with mime types.
Plan A is to ask java:
import java.nio.file.*;
class Test {
public static void main(String[] args) throws Exception {
var p = Paths.get("test.otf");
Files.createFile(p);
System.out.println(Files.probeContentType(p));
Files.delete(p);
}
}
Save that to a file and run it: java Test.java (yay JDK11+ where you can just pass java files to the JVM executable), and see if it works. That is, that should be returning application/font-sfnt for you. It does, for me at any rate, with Corretto JDK17 (java -version: openjdk version "17.0.3" 2022-04-19 LTS) on Ubuntu 20.04.1.
Running it with Temurin 17 (JDK from the Adoptium project) on mac: font/otf. Oh well, that's embarrassing, perhaps. But it's not necessarily a bad answer. Unfortunately, the Mac's own Finder app has a 'type description' column and that's "OpenType® font", not "font/otf". Presumably macs have a mimetype to human readable description database someplace that as far as I know you can't access with generic java code. Still, "font/otf" is better than "an .otf file", presumably.
If the probe method isn't working for you, you can always check whether /etc/mime.types exists, which it should on linuxen. Split each line on whitespace (.split("\\s+")): element 0 is the mime type, and the remaining elements are each an extension without the dot, e.g. my ubuntu install would list application/font-sfnt as being the mime type for types otf and ttf.
Yet another alternative is to ship a known list of extension-to-mimetype mappings. For example, Eclipse Jetty has a MimeTypes class that is pre-filled with a sizable list of known extensions.
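The /etc/mime.types lookup could be sketched as follows. This assumes the usual layout of that file (a mime type followed by zero or more extensions, with # starting a comment line); the class and method names are hypothetical, and the first mapping seen for an extension wins.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MimeTypesFile {
    // Builds an extension -> mime type map from lines in the /etc/mime.types layout.
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> byExtension = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] v = trimmed.split("\\s+");
            // v[0] is the mime type; every further element is an extension without the dot.
            for (int i = 1; i < v.length; i++) {
                byExtension.putIfAbsent(v[i].toLowerCase(Locale.ROOT), v[0]);
            }
        }
        return byExtension;
    }

    public static void main(String[] args) throws IOException {
        Path mimeTypes = Path.of("/etc/mime.types");
        if (Files.exists(mimeTypes)) {
            System.out.println(parse(Files.readAllLines(mimeTypes)).get("otf"));
        }
    }
}
```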
Steve Jobs / flash the MIME gang-sign
If you're like Steve or the MIME consortium, this whole business of treating 'the stuff after the last dot in the file name' as somehow indicative of what kind of file it is, leaves a bad taste in your mouth and you'd like to avoid it. You can, sort of. On unixen anyway. Most unix installs have /usr/bin/file - both my mac and the ubuntu install I'm looking at have this. You can run it from Java with a ProcessBuilder. This tool does not look at the file name at all, solely at the actual content. It might be slow (reads the whole thing if it needs to), but, if I run it on an OTF file, it spits out:
actual-valid-font-file.tof: OpenType font data
which is certainly a string I could show to a user that's "prettier" than font/otf, though it isn't quite what a native mac app would show (which shows OpenType® font, as mentioned before).
On windows, where file (the filetype guesser application) isn't usually available, well, it sounds like FileSystemView.getFileSystemView().getSystemTypeDescription(file) actually works. I bet the number of systems where /usr/bin/file doesn't exist, and getSystemTypeDescription returns nothing useful, is infinitesimal.
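Calling /usr/bin/file from Java might look roughly like this. It is a sketch that assumes the tool lives at that path and accepts the --brief option (which drops the "name: " prefix from the output) alongside --mime-type; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FileCommand {
    // Builds the command line for file(1); mimeType switches from prose to a mime type.
    public static List<String> command(Path target, boolean mimeType) {
        List<String> cmd = new ArrayList<>(List.of("/usr/bin/file", "--brief"));
        if (mimeType) {
            cmd.add("--mime-type");
        }
        cmd.add(target.toString());
        return cmd;
    }

    // Runs the tool and returns whatever it printed, trimmed.
    public static String describe(Path target, boolean mimeType)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(target, mimeType))
                .redirectErrorStream(true)
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            System.out.println(describe(Path.of(args[0]), false));
        }
    }
}
```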
Icons
Presumably you want the same thing here: Give me that icon which would be familiar to the user, which runs into the same issue, especially on linux: Each and every 'file explorer' app has its own icon set, and there are a lot of file explorer apps - just about every window manager ships their own version of it and there are a lot of linux window managers. I'm not sure any JVM impl out there has code to fetch the right icon out of all of those different window manager implementations, and I don't think there's a standardized way to accomplish this using just plain jane disk access, either.
But, we've established you can pick up the mime type (if using /usr/bin/file, there's the --mime-type option). My /usr/bin/file gives me application/vnd.ms-opentype, so that's 3 different mimetypes for the same thing already; boy, that whole XKCD comic of 'there are 14 different standards' comes up a lot, doesn't it?
Given a mime type, there are loads of icon sets out there, free and open source.
The Oxygen icons project is a FOSS iconset hosted on github with an icon (in various sizes) for a boatload of mimetypes. You could use .getSystemIcon first, and if that doesn't return a suitable answer (bit tricky; sometimes you get a generic 'it's a file' icon which you might not want), then use an icon set. You won't be matching the Look-n-Feel of the platform, but then again if this question is really just "I want to write an app in swing that looks indistinguishable from the host OS, be it windows, mac, KDE, Gnome, Xfce, Cinnamon, Budgie, or Enlightenment", the only pragmatic answer pretty much has to be: "Just give up on that pipe dream".
NB: Hoi :)

The code for getSystemTypeDescription is
public String getSystemTypeDescription(File f) {
return null;
}
which is overridden in WindowsFileSystemView, but not in UnixFileSystemView. Maybe JFileChooser suits your needs:
JFileChooser chooser = new JFileChooser();
String type = chooser.getTypeDescription(file);
I get a FileNotFoundException during the execution of getSystemIcon. Following the code, in the method getShellFolder there is this snippet
if (!Files.exists(Paths.get(file.getPath()), LinkOption.NOFOLLOW_LINKS)) {
throw new FileNotFoundException();
}
so symbolic links are not followed, and maybe that's the issue. But again, JFileChooser works:
Icon icon = chooser.getIcon(file);

How to print Unicode symbols U+2610 and U+2612 to Windows console with Java?

What I do:
public class Main {
public static void main(String[] args) {
char i = 0x25A0;
System.out.println(i);
i = 0x2612;
System.out.println(i);
i = 0x2610;
System.out.println(i);
}
}
What I get in IDE: [screenshot]
What I get in Windows console: [screenshot]
I have Windows 10 (Russian locale), Cp866 default coding in console, UTF-8 coding in IDE.
How to make characters in console look correct?

Two problems here, actually:
Java converts output to its default encoding which doesn't have anything to do with the console encoding, usually. This can apparently only be overridden at VM startup with, e.g.
java -Dfile.encoding=UTF-8 MyClass
The console window has to use a TrueType font in order to display Unicode. However, neither Consolas, nor Lucida Console have ☐, or ☒. So they show up as boxes with Lucida Console and boxes with a question mark with Consolas (i.e. the missing glyph glyph). The output is still fine, you can copy/paste it easily, it just doesn't look right, and since the Windows console doesn't use font substitution (hard to do that with a character grid anyway), there's little you can do to make them show up.
I'd probably just use [█], [ ], and [X] instead.

Cp866 default coding in console
Well, yeah. Code page 866 doesn't include characters U+2610 or U+2612 (of the three, only U+25A0 is in it). So even if Java were using the correct encoding for the console (either because you set something like -Dfile.encoding=cp866, or it guessed the right encoding, which it almost never manages), you couldn't get the checkbox characters out.
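Whether a given code page can represent a character at all can be checked from Java before blaming the console. The sketch below uses CharsetEncoder.canEncode; the class and method names are hypothetical, it only looks at individual chars (so it ignores surrogate pairs), and it assumes the JDK in use ships the IBM866 charset, falling back to US-ASCII otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {
    // Returns the chars of text that the given charset has no mapping for.
    public static String unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder missing = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // IBM866 is an optional charset, so check for it instead of assuming it is present.
        Charset console = Charset.isSupported("IBM866")
                ? Charset.forName("IBM866")
                : StandardCharsets.US_ASCII;
        String boxes = "\u25A0\u2610\u2612";
        System.out.println("Not encodable in " + console + ":");
        unencodable(boxes, console).chars()
                .forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```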
How to make characters in console look correct?
You can't.
In theory you could use -Dfile.encoding=utf-8, and set the console encoding to UTF-8 (or near enough, code page 65001). Unfortunately the Windows console is broken for multi-byte encodings (other than the legacy locale-default supported ones, which UTF-8 isn't); you'll get garbled output and hangs on input. This approach is normally unworkable.
The only reliable way to get Unicode to the Windows console is to skip the byte-based C-standard-library I/O functions that Java uses and go straight to the Win32 native WriteConsoleW interface, which accepts Unicode characters (well, UTF-16 code units, same as Java strings) and so avoids the console bugs in byte conversion. You can use JNA to access this API—see example code in this question: Java, UTF-8, and Windows console though it takes some extra tedious work if you want to make it switch between console character output and regular byte output for command piping.
And then you have to hope the user has non-raster fonts (as @Joey mentioned), and then you have to hope the font has glyphs for the characters you want (Consolas doesn't for U+2610 or U+2612). Unless you really really have to, getting the Windows console to do Unicode is largely a waste of your time.

Are you sure that the font you use has glyphs for these characters? No font supports every possible Unicode character. U+2610, U+25A0 and U+2612 (decimal 9744, 9632 and 9746) are not supported by e.g. the Arial font. You can change the font of your IDE console and your Windows console too.

Java encodings for Japanese

Our software has a script that creates different language JAR files; for Japanese we use the encoding SJIS in a call to native2ascii. This worked last time a Japanese build was attempted but now seems to only work in certain contexts. For example in the following dialog the encoding seems to only work in the title bar:
Anyone have any idea about what might be causing this? Could this problem be related to a change in Java?

What exactly do you pass through native2ascii? Just to make sure, you're using native2ascii -encoding Shift_JIS, right? And you're passing text files or source files through native2ascii, right?
My only other idea is that after the text has been converted to \uXXXX format, the font you're using to display the dialog may not have all the Kanji and Kana. Explicitly set a font, and try that.

I would suggest checking these 2 things:
Make absolutely sure that the native2ascii conversions are correct. You should do a round trip conversion with the -reverse flag, and make sure that your input and output are in sync.
Double-check that your fonts used can support Shift-JIS. Those blocks and symbols that appear in the dialog text and button text look like the characters might be OK, but the fonts might not support them.
An additional word of caution: If this application is intended for use on Windows, then you really should be using the MS932 or windows-31j encoding. SJIS will work for all but a dozen or so symbols, but it turns out these symbols (like the full-width tilde) are actually used quite frequently in Japan.
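When checking the conversion by hand, it can help to know what the escaped form of a string should look like. The sketch below is a stand-in for the escaping step only, not native2ascii itself: it assumes the text has already been decoded into a Java String (from Shift_JIS, MS932 or anything else) and rewrites every non-ASCII char as a \uXXXX escape. The class name is made up.

```java
public class AsciiEscaper {
    // Replaces every char above 0x7F with a backslash-u escape of four hex digits;
    // plain ASCII is passed through unchanged.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("OK: 日本語"));
    }
}
```

If the escapes in the generated properties files differ from what this produces for the same source text, the input was probably decoded with the wrong encoding.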

I think the right way to do this is to use UTF-8 or UTF-16 exclusively. Kanji and Katakana demand special attention.

clear screen option in java [duplicate]

Is there any option to clear the screen in Java, like clrscr() in C?

As dirty hacks go, I like msparer's solution. An even dirtier method that I've seen used (I would never do this myself. I swear. Really.) is to write a bunch of newlines to the console. This doesn't clear the screen at all, but creates the illusion of a clear screen to the user.
char c = '\n';
int length = 25;
char[] chars = new char[length];
Arrays.fill(chars, c);
System.out.print(String.valueOf(chars));

If you're talking about a console application, then there isn't a clear screen option AFAIK. A quite dirty option would be to invoke the clear screen command of the underlying OS.
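Then it's something like
new ProcessBuilder("cmd", "/c", "cls").inheritIO().start().waitFor();
for Windows (cls is a built-in of cmd.exe rather than a program of its own, and the child process has to inherit the console for the clear to be visible) or
new ProcessBuilder("clear").inheritIO().start().waitFor();
for a load of other OS. You can find out the OS with System.getProperty("os.name").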
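Picking the command from os.name could be sketched like this. The class and method names are invented, the "contains windows" test is a simplifying assumption about what os.name reports, and the whole thing only has an effect when the program is attached to a real console.

```java
import java.util.List;
import java.util.Locale;

public class ClearCommand {
    // Chooses the external clear-screen command for the given os.name value.
    public static List<String> forOs(String osName) {
        if (osName.toLowerCase(Locale.ROOT).contains("windows")) {
            // cls is a cmd.exe built-in, so it has to be run through cmd /c.
            return List.of("cmd", "/c", "cls");
        }
        return List.of("clear");
    }

    public static void main(String[] args) throws Exception {
        // inheritIO lets the child process write to this program's console.
        new ProcessBuilder(forOs(System.getProperty("os.name")))
                .inheritIO()
                .start()
                .waitFor();
    }
}
```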

If you're talking about the console, then no. Writing to the console is just a special case of an output stream. Output streams don't know anything about the screen, as they can be just as easily redirected to a file or another system device.

For any console which supports ANSI escapes the following would work (would e.g. work in Win98 console).
private final String ANSI_CLS = "\u001b[2J";
....
System.out.print(ANSI_CLS);
System.out.flush();
...
Starting with Win NT this won't work anymore and you can either
Do a JNI call (e.g. like here: Java: Clear console and control attributes)
Or write out a bunch of empty lines
Otherwise you are out of luck.
And by the way, keep in mind that System.out and System.err don't have to be a console; they could be set to whatever (writing into a file, for example), a use case where clearing the screen wouldn't make any sense at all.

On linux, you can do something like:
System.out.println("\f");
You can also use Jcurses

To clear the screen just type:
System.out.print('\u000C');

You can also try ANSI Escape Codes:
If your terminal supports them, try something like this:
System.out.print("\033[2J\033[1;1H");
You can include \033[1;1H in case \033[2J on its own does not move the cursor to the upper left corner.
More specifically:
033 is the octal of ESC
2J is for clearing the entire console/terminal screen
1;1H moves the cursor to row 1 and column 1
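The same sequence, spelled out piece by piece as a small sketch (the class name is made up, and it only does anything useful on a terminal that interprets ANSI escapes):

```java
public class AnsiClear {
    private static final char ESC = 27; // the same character as octal 033

    // ESC[2J clears the screen, ESC[1;1H moves the cursor to row 1, column 1.
    public static String sequence() {
        return ESC + "[2J" + ESC + "[1;1H";
    }

    public static void main(String[] args) {
        System.out.print(sequence());
        System.out.flush();
    }
}
```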

Jansi is an excellent workaround. I am an amateur coder and Jansi is easy to set up, especially with Eclipse.
The following is a link to the homepage of Jansi:
http://jansi.fusesource.org/
The following is a link to a site containing a code as a demonstration of AnsiConsole class contained in the Jansi package:
http://www.rgagnon.com/javadetails/java-0047.html

For Windows, Java Console API project provides functionality to determine console size and set cursor position. Clearing the screen is trivial with that. It's a version 0.2 now so it's not exactly production ready, but it works.
Alternatively, you can simply print out some new lines via System.out.println(). 640 should be enough for everybody :-) It's not the same as clearing screen, but for user's intents and purposes it'd do.

You should give JNA a try and map native libraries:
on Linux you must map C functions from the ncurses library
on Windows you must map functions from both msvcrt and kernel32, as clearly stated here
PS
Let me know if you need some sample code

How to get correct encoding?

I have a UTF-8 file which I want to read and display in my Java program.
In the Eclipse console (stdout) or in Swing I'm getting question marks instead of the correct characters.
BufferedReader fr = new BufferedReader(
new InputStreamReader(
new FileInputStream(f),"UTF-8"));
System.out.println(fr.readLine());
inputStreamReader.getEncoding() // prints UTF-8
I generally don't have problem displaying accented letters either on the linux console or firefox etc.
Why is that so? I'm ill from this :/
thank you for help

I'm not a Java expert, but it seems like you're creating a UTF-8 InputStreamReader with a file that's not necessarily UTF-8.
See also: Java : How to determine the correct charset encoding of a stream
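One way to rule this out is to decode the file strictly and see whether it fails. The sketch below is an illustration with hypothetical names: it uses a CharsetDecoder set to report malformed input instead of silently replacing it, so a file that is not valid UTF-8 is detected rather than turned into replacement characters. Passing the check does not prove the file was meant as UTF-8, only that its bytes are consistent with it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8Check {
    // True if the bytes decode as UTF-8 without any malformed sequence.
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            System.out.println(isValidUtf8(Files.readAllBytes(Path.of(args[0]))));
        }
    }
}
```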

It sounds like the Eclipse console is not processing UTF-8 characters, and/or the font configured for that console does not support the Unicode characters you are trying to display.
You might be able to get this to work if you configure Eclipse to expect UTF-8 characters, and also make sure that the font in use can display those Unicode characters that are encoded in your file.
From the Eclipse 3.1 New and Noteworthy page:
You can configure the console to display output using a character encoding different from the default using the Console Encoding settings on the Common tab of a launch configuration.
As for Swing, I think you're going to need to select the right font.

There are several parameters at work, when the system has to display Unicode characters -
The first and foremost that comes to the mind, is the encoding of the input stream or buffer, which you've already figured out.
The next one in the list is the Unicode capabilities of the application - Eclipse does support display of Unicode characters in the console output; with a workaround :).
The last one in my mind is that of the font used in your console output - not all fonts come with glyphs for displaying Unicode characters.
Update
The non-display of Unicode characters is most likely due to the fact that Cp1252 is used for encoding characters in the console output. This can be modified by visiting the Run configuration of the application - it appears in the Common tab of the run-time configuration.
