Character encoding between Java (Linux) and Windows system - java

I have a simple program that makes a request to a remote server running a service which I believe is written in Delphi, but definitely running on Windows.
I'm told the service will be using whatever the default encoding is for Windows.
When I get a response and use println to output it I'm getting some strange symbols in the output, which make me think it is a character encoding issue.
How can I tell Java that the input from the remote system is in the Windows encoding?
I have tried the following:
_receive = new BufferedReader(new InputStreamReader(_socket.getInputStream(),"ISO-8859-1"));
System.out.println(_receive.readLine());
The extra characters appear as squares in the output with 4 numbers in the square.

Unless you KNOW what the "default encoding" is, you can't tell what it is. The "default encoding" is generally the system-global codepage, which can be different on different systems.
You should really try to make people use an encoding that both sides agree on; nowadays, this should almost always be UTF-16 or UTF-8.
Btw, if you are sending one character on the Windows box, and you receive multiple "strange symbols" on the Java box, there's a good chance that the Windows box is already sending UTF-8.

Use cp1252 instead of ISO-8859-1, as it is the default on Windows.
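A minimal sketch of the difference (no network needed; the bytes simulate what the Windows service might send — "windows-1252" is the canonical Java name for cp1252):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Cp1252Demo {
    public static void main(String[] args) throws Exception {
        // 0xE9 is 'é' in both cp1252 and ISO-8859-1,
        // but 0x80 is '€' in cp1252 only (undefined in ISO-8859-1).
        byte[] fromServer = {(byte) 0xE9, (byte) 0x80, '\n'};

        BufferedReader receive = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(fromServer),
                Charset.forName("windows-1252")));
        System.out.println(receive.readLine()); // é€
    }
}
```

The same `Charset.forName("windows-1252")` argument can be passed to the `InputStreamReader` in the question in place of "ISO-8859-1".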

Related

Issues with windows command line using utf-8 code page

In my application, I'm reading a properties file (UTF-8 encoded) containing Chinese characters and printing them on the Windows command line. But for some reason the messages are not displayed correctly (some extra characters appear). However, the same messages are displayed correctly on the Eclipse console and in Cygwin. I've set the command-line code page to UTF-8 (65001) and used the "Lucida" font as well.
As the screenshots showed, on Windows it printed one extra 0 on the second line, which is not expected; on Cygwin the message was printed correctly.
Please let me know if I'm missing something. From this post, I can see that there are some issues with the Windows UTF-8 code page implementation. If so, is there any other way to get around this problem?
I can see that there are some issues with Windows UTF-8 code page implementation
oh most surely yes
is there any other way to get over this problem ?
The typical solution is to accept that the Windows command prompt is a broken disaster and move on. But if you really have to, you can use JNA to call the Win32 function WriteConsole directly, avoiding the broken byte encoding layer, when you can determine you are outputting to the console (rather than eg a pipe).
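Short of JNA, a partial workaround (only a sketch, and it only helps when the console code page really is 65001) is to force the `PrintStream` encoding to UTF-8 so that Java at least emits the right bytes instead of the platform-default ones:

```java
import java.io.PrintStream;

public class Utf8Console {
    public static void main(String[] args) throws Exception {
        // Re-wrap stdout with an explicit UTF-8 encoder instead of the
        // platform default; similar in effect to -Dfile.encoding=UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("\u4F60\u597D"); // the two Chinese characters "ni hao"
    }
}
```

Whether the console then renders those bytes correctly is still up to Windows, which is exactly the broken layer described above.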

How do I get the character encoding of the currently running JVM?

I have an issue where I think my local JVM is running with a different character encoding to the one that runs on a server. Is there a command I can pass to the VM to display what character encoding it has? Something like
java show 'charset'
I've been searching for ages, and all of the solutions require writing a test in java code, but I don't really want to have to write code, deploy it to the server, etc.
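Without writing any code, `java -XshowSettings:properties -version` dumps the system properties, including `file.encoding`. If a tiny class is acceptable after all, this is all it takes:

```java
import java.nio.charset.Charset;

public class ShowCharset {
    public static void main(String[] args) {
        // The effective default charset of this JVM...
        System.out.println(Charset.defaultCharset());
        // ...and the system property it is usually derived from.
        System.out.println(System.getProperty("file.encoding"));
    }
}
```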

JavaFx application in Windows is not displaying text correctly

So I have an application written in JavaFX 2.2 that has been packaged for Linux, Mac, and Windows. I am getting a strange issue with some of the text fields, though. The application reads a file and populates some labels based on what's found in the file. When run on Ubuntu or Mac, the special accent character over the c displays just fine. However, in Windows it shows up garbled. Any idea as to why this is happening? I was a bit confused, as it is the same exact application on all three. Thanks.
Make sure to specify character encoding when reading the file, in order to avoid using the platform's default encoding, which varies between operating systems. Just by coincidence, the default on Linux and Mac happens to match the file encoding and produces correct output, but you should not rely on it.
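A sketch of the fix, assuming the data file is UTF-8 (substitute whatever encoding the file is actually written in; the file name here is hypothetical):

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadWithCharset {
    public static void main(String[] args) throws Exception {
        // Files.newBufferedReader takes the charset explicitly, so the
        // result is identical on Linux, Mac, and Windows.
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get("labels.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```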

clear output in telnet

Is there any way for a telnet app to clear the output client-side (using a Java Socket connection + buffers)? For example, the program queries the connected user for a login and password, and once they've logged in successfully I want to clear the screen, like cls on Windows or clear on Linux.
The telnet application is a terminal emulator. In really old times, the only way to communicate with a computer was by using a terminal with a pure text-based screen and a keyboard. The terminal sent everything you typed to the computer; the computer sent characters back that were printed on the screen. Just like telnet.
DEC created a series of terminals called VT52, VT100, etc. They were able to interpret special control sequences, so that the computer could give more fancy instructions to the terminal. These control sequences were standardized by ANSI and are now called ANSI escape codes. Terminal emulators that understand the VT100 escape codes are called VT100 terminal emulators.
You can look up the ANSI escape codes on Wikipedia and other places. They all start with the character codes for escape and [, followed by the control characters. The control characters for clearing the screen are "2J".
So, what you need to do is sending this string from your server to the telnet client:
myOutputStream.print("\u001B[2J");
myOutputStream.flush();
You may send other control characters as well. Try "\u001B[7m" to reverse the screen.
On the Linux side, clear simply issues some terminal control characters to tell the terminal to clear the screen. For VT terminals, that's Esc[2J. Not sure if Windows would support something similar.
Scrolling blank lines is the only way I can think of in Java if you need to support Windows.
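Putting the pieces above together, a minimal server-side sketch (the socket variable is hypothetical) that clears a VT100-compatible client and then homes the cursor:

```java
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class AnsiClear {
    // ESC [ 2J erases the whole screen; ESC [ H moves the cursor
    // back to the top-left corner.
    static final String CLEAR = "\u001B[2J\u001B[H";

    static void clearClient(Socket client) throws Exception {
        PrintWriter out = new PrintWriter(new OutputStreamWriter(
                client.getOutputStream(), StandardCharsets.US_ASCII), true);
        out.print(CLEAR);
        out.flush();
    }
}
```

Clients that are not VT100-compatible (e.g. the plain Windows telnet in some configurations) will print the sequence as garbage, which is where the scrolling-blank-lines fallback comes in.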

C++ socket message contains extra ASCII 0 character

So this is a really strange problem. I have a Java app that acts as a server, listens for and accepts incoming client connections, and then reads data (XML) off of the socket. Using my Java client driver, everything works great. I receive messages as expected. However, using my C++ client driver, on the first message only, the very first character is read as an ASCII 0 (it shows up like a little box). We're using the standard socket API in C++, sending in a char* (we've tried char*, std::string, and just text in quotes).
I used Wireshark to sniff the packet and sure enough, it's in there off of the wire. Admittedly, I haven't done the same on the client computer. My argument is that it really shouldn't matter, but correct me if that assumption is incorrect.
So my question: what the heck? Why does just the first message contain this extra prepended data, but all other messages are fine? Is there some little trick to making things work?
This is most likely an encoding issue. If you're just using char * for your C++ client, you're assuming a one-byte-per-character encoding such as ASCII, while Java strings are UTF-16 internally; what actually goes on the wire depends on the encoding used when the string is converted to bytes.
Either have your Java server emit 7-bit/character ASCII, or have your C++ client read the encoding Java is emitting.
Ahhh. I'm going to have to spend some time curled up with Google by a fireplace to figure out how to match up the encoding, but that does give me something to go on. I'll probably need to change my Java encoding to match what C++ uses, since that matches the customer scenario. Anyone with a good link, additional info, or code snippet, please post.
If you've got your XML packed as a string, you can use getBytes() to do your encoding:
byte [] asciiEncodedBytes = myString.getBytes("US-ASCII");
EDIT: It's been a while since I've been in Java land, but the core library can actually do this directly: OutputStreamWriter accepts an encoding name, so you can wrap the socket's output stream in an ASCII-encoding writer rather than encoding each string by hand.
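A sketch of that server-side approach (the helper and its arguments are hypothetical, assuming the server holds the client Socket):

```java
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class AsciiSender {
    // Send an XML string to the client as 7-bit ASCII, so the C++
    // side reading into a char* sees exactly one byte per character.
    static void sendXml(Socket client, String xml) throws Exception {
        Writer out = new OutputStreamWriter(
                client.getOutputStream(), StandardCharsets.US_ASCII);
        out.write(xml);
        out.flush();
    }
}
```

Note that US-ASCII can only represent code points 0–127; any character outside that range is replaced, so this only works if the XML really is plain ASCII.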
Not that I know of. It's time to binary-search the space of possible culprits.
I would run Wireshark on the client computer to make sure the problem really is originating there. Theoretically some misbehaving router or something could do this (very hard to believe though).
Then I would check the arguments to the socket APIs while the program is actually running, using a debugger.
At that point, if the program is definitely correct and the packets coming out of the computer are definitely wrong, you're looking at a misbehaving networking library or a bad driver.
So, the encoding thing didn't work. In the end, I simply did a substring(startIndex) call on the incoming message, using xmlMessage.indexOf("<") as the starting index. It may not be elegant, but it'll work. And the box will remain a mystery. I appreciate the insight that you three provided.
