Handling newline character in input between Windows and Linux - java

I think this is a standard problem that may have been asked before, but I could not find an exact answer, so I am posting the issue.
Our server runs on a Linux box. We access it through a browser on a Windows box and enter data into a field that is supposed to contain multiple lines, which the user enters by pressing the Enter key after each line:
Abc
Def
GHI
When this input field (a text area) is read on the Linux machine, we want to split the data on the newline character.
I have three questions about this.
Does the incoming data contain "\r\n" or "\n"?
If the incoming data does contain "\r\n", the Linux line.separator property (a VM property) would not work for me, since it would be "\n" and could therefore leave "\r" in the data.
If "\r" is left in the data and I open the file on a Windows machine, will it be treated as a newline character?
Finally, can anyone tell me the standard way to deal with this issue?

The standard java.io.DataInputStream and java.io.BufferedReader both handle this automatically through the readLine() method. You really should not use DataInputStream for this, since it does not support character sets correctly and its readLine() has been deprecated for a long time.
For text output, you can use java.io.PrintWriter, whose println() and related methods with parameters output the correct newline sequence for the current platform. java.io.BufferedWriter also handles this correctly and provides a public newLine() method.
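As a rough illustration of the two output options mentioned above, here is a minimal sketch that writes into a StringWriter rather than a real file. The class and method names are made up for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class PlatformNewlineDemo {
    // Writes each line followed by the platform line separator via PrintWriter.println().
    static String withPrintWriter(String... lines) {
        StringWriter out = new StringWriter();
        try (PrintWriter pw = new PrintWriter(out)) {
            for (String line : lines) {
                pw.println(line);
            }
        }
        return out.toString();
    }

    // Produces the same text using BufferedWriter.newLine().
    static String withBufferedWriter(String... lines) {
        StringWriter out = new StringWriter();
        try (BufferedWriter bw = new BufferedWriter(out)) {
            for (String line : lines) {
                bw.write(line);
                bw.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        String expected = "Abc" + sep + "Def" + sep;
        System.out.println(withPrintWriter("Abc", "Def").equals(expected));    // true
        System.out.println(withBufferedWriter("Abc", "Def").equals(expected)); // true
    }
}
```

Both helpers end every line with System.lineSeparator(), so the same code yields "\r\n" on Windows and "\n" on Linux.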

Linux uses \n.
Windows uses \r\n.
Therefore, unless you've tweaked something in linux, it should be coming in \n.
You could regex out \r\n and \n and replace with whatever you want to avoid problem 3.
http://en.wikipedia.org/wiki/Newline

Rather than using a regular expression, you can also keep it simple by doing something like this:
StringBuilder sb = new StringBuilder();
// append your text here; to go to a new line use
if (System.getProperty("os.name").startsWith("Windows")) {
sb.append("\r\n"); // append(), because insert() also requires an offset
}
else {
sb.append("\n");
}
So if your local environment is Windows, this works locally and will also work when you deploy to a Linux-based environment.

Probably try this?
String[] lines = inputString.split("\r?\n");
Not 100% sure about the syntax, but the basic idea of the regex is: "zero or one \r, and exactly one \n". Or, if you just want to normalize the input (note replaceAll, since replace treats its argument as literal text, not a regex):
inputString = inputString.replaceAll("\r?\n", "\n");
Doesn't seem very painful to me. ;-)
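For reference, a small self-contained sketch of the split and normalize ideas from this answer. The class name and the sample input are invented for illustration.

```java
import java.util.Arrays;

public class TextAreaLines {
    // Splits on CRLF or bare LF, so no '\r' is left at the end of a line.
    static String[] splitLines(String input) {
        return input.split("\\r?\\n");
    }

    // Rewrites every CRLF as a single LF; replaceAll treats its first argument as a regex.
    static String normalize(String input) {
        return input.replaceAll("\\r?\\n", "\n");
    }

    public static void main(String[] args) {
        // Hypothetical text area value mixing Windows and Unix line endings.
        String fromBrowser = "Abc\r\nDef\nGHI";
        System.out.println(Arrays.toString(splitLines(fromBrowser)));         // [Abc, Def, GHI]
        System.out.println(normalize(fromBrowser).equals("Abc\nDef\nGHI"));   // true
    }
}
```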

Thanks for the responses, guys. In the end, following the suggestion given by Kevin, we used a StringReader with a BufferedReader wrapper over it to overcome the issue. We used a StringReader because the data is read as a string from the request.
Hopefully this question helps people in the future.
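The poster's actual code is not shown, so the following is only a sketch of what a StringReader wrapped in a BufferedReader might look like. The names readLines and requestValue are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RequestLines {
    // readLine() accepts "\n", "\r" and "\r\n" as terminators and strips them from the result.
    static List<String> readLines(String requestValue) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(requestValue))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines("Abc\r\nDef\nGHI\r")); // [Abc, Def, GHI]
    }
}
```

Because the reader strips the terminator regardless of its style, the server does not need to know which platform the browser ran on.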

Related

Java: PrintWriter and newlines in a string

My question is pretty straightforward: if I have a single long string with a lot of "\n" newlines within it, i.e.:
strings = "Hey\nThere\nFriend\n"
And use a PrintWriter in Java to do the following:
PrintWriter save = new PrintWriter("test.txt");
save.println(strings);
save.close();
Will the file I end up with be formatted with the \n? i.e the file will have:
Hey
There
Friend
Or will it have:
Hey\nThere\nFriend
If it's the latter, can someone guide me on how I might change my code (and understanding of how PrintWriter works) to create the former output?
In fact, \n will work, but only on Unix-based OSes. Windows-based OSes use \r\n as the separator.
You should avoid using a specific OS line separator if you want to write portable code.
Favor System.lineSeparator() so that you are not OS dependent.
Note also that PrintWriter provides println() to write a line break that is not OS dependent (even if it is not necessarily useful for your use case).
You will get a text file containing a single text line Hey\nThere\nFriend\n followed by your operating system new-line sequence (inserted by println()).
The meaning of \n depends on the operating system and possibly the text editor. On Linux \n usually will be interpreted as new-line sequence but on Windows the new-line sequence is \r\n so most text editors (e.g. native Notepad) will display a single HeyThereFriend line.
On the Windows platform a new line is char(13) + char(10), i.e. \r\n, so you can use
String nl = Character.toString ((char) 13)+Character.toString ((char) 10);
String strings = "Hey"+nl+"There"+nl+"Friend"+nl;
System.out.print(strings);
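To see which characters println() actually writes, one option is to capture the output in a StringWriter instead of test.txt and make the control characters visible. This is only a sketch, and the class and helper names are illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class EmbeddedNewlines {
    // Returns exactly what PrintWriter.println(text) writes.
    static String written(String text) {
        StringWriter out = new StringWriter();
        try (PrintWriter pw = new PrintWriter(out)) {
            pw.println(text);
        }
        return out.toString();
    }

    // Replaces CR and LF characters with the two-character sequences \r and \n for display.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // The embedded LF characters are written unchanged; only the terminator
        // appended by println() depends on the platform.
        System.out.println(visible(written("Hey\nThere\nFriend\n")));
    }
}
```

On Linux this prints Hey\nThere\nFriend\n\n, while on Windows the last terminator shows up as \r\n. In both cases the file holds real line feed characters, not a backslash followed by the letter n.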

JTextPane and newlines

I'm writing a program that (at one point) makes a command-line call to another native application, gets the output from that application, and puts it into a JTextPane as a String. The problem is, it doesn't seem to grab the newline characters the way it should. Because I'm using linux, each line ends with a ^M instead of a \n.
Is there any way to tell Java to look for those and create a newline in the string?
private void getSettings() {
Commander cmd = new Commander();
settings = cmd.getCommandOutput("hdhomerun_config " + ipAddress + " get /sys/boot");
settingsTextPane.setText(settings);
}
I end up with the output barfed into one line and wrapped around in the text pane.
As I recall Unix displays ^M for the carriage return character \r so you could try to replace it by using the replace method of the String class
settingsTextPane.setText(settings.replace('\r', '\n'));
Thanks guys, I looked through my code again and realized I was reading the output from the program one line at a time, and just appending the lines. I needed to add a \n at the end of each line that I read. Correct me if I'm wrong, but I believe Java automatically corrects newlines based on your operating system.
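If the raw output does contain carriage returns, a slightly more careful variant of the replace suggestion above handles "\r\n" first, so that a Windows-style ending does not turn into two line feeds. A minimal sketch with a made-up class name and sample string:

```java
public class CarriageReturns {
    // Converts CRLF pairs first, then any remaining lone CR, to a single LF.
    static String toLineFeeds(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String raw = "one\r\ntwo\rthree\n"; // hypothetical command output
        System.out.println(toLineFeeds(raw).equals("one\ntwo\nthree\n")); // true
    }
}
```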

In which OS that Java runs on is "\n" not a valid newline sequence?

I know Java recommends the use of line.separator for \n newline because some operating systems may not necessarily recognize \n. What operating systems are these?
http://docs.oracle.com/javase/tutorial/essential/environment/sysprop.html
On Windows, \r\n is the newline sequence.
The original Mac OS (but not OS X) uses \r.
I'm fairly certain that Java translates the \n into the appropriate newline character for the OS you are using, thus making your job as a programmer easier.
---- EDIT ----
As it turns out what I wrote above might be incorrect, although I believe it is correct for certain scenarios e.g. within UI controls etc. But here are two methods on two different classes you can use to write the OS-appropriate newline character(s) to a file, rather than trying to manage it yourself:
PrintWriter.println()
and
BufferedWriter.newLine()

Carriage returns/line breaks with \n in strings in Android

I'm writing an Android app where I'm writing a file to disk with one data value per line. Later, such files can be read back into the app, and this simple data format is deserialized back into an array. At the moment, I'm delineating data values/lines in the serialization and deserialization code with \n.
How does Android handle carriage returns and such line breaks? Can I use \n safely in this context?
It's better to use
String lineSep = System.getProperty("line.separator");
Yes, the line separator under Windows is \r\n (CR+LF), on classic Mac OS \r, and on Linux (Android) \n. Java has a clever reader which returns the line without the separator, and a println which uses the platform setting System.getProperty("line.separator").
1 - Like the two people before me, I say System.getProperty("line.separator") is your friend.
2 - As Android is based on Linux, it is most likely that the line.separator property is in fact \n.
3 - When reading or writing your files, use a buffered reader and writer and, respectively, their methods readLine and newLine. You should not have any trouble dealing with input/output files from/for other platforms.
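The readLine/newLine advice can be sketched as a round trip through a temporary file. This is plain Java rather than Android-specific code; the class name, method names and sample values are assumptions, and on Android the file location would come from the app's own storage APIs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LineFileRoundTrip {
    // Writes one value per line, terminated by the platform line separator.
    static void save(Path file, List<String> values) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String value : values) {
                writer.write(value);
                writer.newLine();
            }
        }
    }

    // Reads the values back; readLine() strips "\n", "\r" or "\r\n" alike.
    static List<String> load(Path file) throws IOException {
        List<String> values = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                values.add(line);
            }
        }
        return values;
    }

    // Saves to a temporary file, loads it again, and cleans up.
    static List<String> roundTrip(List<String> values) {
        try {
            Path file = Files.createTempFile("values", ".txt");
            try {
                save(file, values);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("1.5", "2.5", "3.5"))); // [1.5, 2.5, 3.5]
    }
}
```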

Get filename as UTF-8? (ä,ü,ö ... is always '?')

I have to read the names of some files and put them in a list as strings. It's not so hard, I just have some problems with characters like ä, ö, ü ...: they always show up as '?' in my string.
What's the problem? Well, the encoding. OK, this should be easy... that's what I thought. So I tried to use functions like:
new String(insert.getBytes("UTF-8"))
or
new String(insert.getBytes("ISO-8859-1"), "UTF-8")
because most of the files are ISO-8859-1.
It's not helping. This is my code:
...
File[] fileList = dir.listFiles();
String insert;
for(File f : fileList) {
...
insert=f.getName().substring(0,f.getName().length()-4);
insert=insert.charAt(0)+insert.substring(1,insert.length()).toLowerCase().replaceFirst("([0-9]*(_s?(i)?(_dat)?)*$)", "").replaceFirst("_", " ");
...
System.out.println("test UTF8: " + new String(insert.getBytes("UTF-8"))); //not helping
System.out.println("test ISO , UTF8: " + new String(insert.getBytes("ISO-8859-1"), "UTF-8")); //not helping
...
names.add(insert);
}
At the end there are a lot of strings with '?' characters in my list.
How can I fix the problem? And what's the best way if there are not only ISO-8859-1 files? (Let's say there are a lot of files with unknown encodings.)
Thank You!
Given the extended comments back and forth under the question, it now looks like this is either a font problem or (perhaps more likely) a filename encoding problem.
I asked Lissy to run the following commands to let us figure out what the problem is. If she is sure that the filenames contain "ä", but that character does not appear when she runs ls on them, then these commands will tell us whether this is a font or an encoding problem.
touch filenäme
ls filen*me
If this shows "filenäme" in the output of ls then we know the problem is with the creation/copy of the files onto this system. This could happen if the program which created the files didn't realize what the filesystem encoding was or was too stupid to do the right thing. The convmv program will probably be the best way to fix this.
convmv -f ENCODING -t utf8 -r .
The question is what is the proper encoding. Possibilities include UTF-16, cp850, or perhaps iso8859-1. convmv --list will show you the list of currently known (to your system) encodings. Since the listed command above only shows you what it might do, it is safe to run several times with different encodings until you find one which works for all files.
If this is a font problem, we'll have to look into that.
Unexpected question marks, splats, etc. in a String are a sign that something somewhere doesn't recognize a particular character when converting from one character set to another.
In your case, the problem could be occurring in a couple of places:
It could be occurring when your Java program is reading the file names from the directory (in the dir.listFiles() call).
It could be happening when you print the characters to the console stream.
In either case, the root cause is most likely a mismatch between what Java thinks the locale settings should be and the settings that the operating system and/or command shell are using.
As an experiment, try to list a directory containing the problematic file names from the command line. Do you see question marks or other splats there?
A second experiment to perform is to modify your Java program to dump one of the problem Strings as a sequence of numbers representing the character codes for each of the characters. Do you see the character code for an ASCII / Unicode '?'?
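The second experiment could look something like the sketch below, which prints each character of a String as a Unicode code point. The class and method names are made up, and the sample names stand in for real file names.

```java
public class CodePointDump {
    // Formats every code point of the string as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded a-umlaut shows up as U+00E4.
        System.out.println(dump("filen\u00e4me")); // U+0066 U+0069 U+006C U+0065 U+006E U+00E4 U+006D U+0065
        // A name that was already damaged contains a literal question mark, U+003F.
        System.out.println(dump("filen?me"));      // U+0066 U+0069 U+006C U+0065 U+006E U+003F U+006D U+0065
    }
}
```

If the dump shows U+003F, the character was lost before the String reached the print call; if it shows U+00E4, the String is fine and the console is the likely culprit.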
The encoding of the content of a file has nothing to do with the encoding of the file name itself.
You should get correct results from System.out.println(insert)
If you don't, it means that the shell has a different character encoding than the default character encoding for your system (this rarely happens; it would usually be the result of an explicit command to switch encodings in the shell).
If the file names are displayed correctly when you list the directory in the shell, I would expect them to be displayed correctly without specifying an encoding in your Java program.
If the shell is incapable of displaying the character (it is substituting the replacement character 0xFFFD (�) for these unprintable characters), there's nothing you can do from your Java application to change that. You need to change the terminal character encoding, install the right fonts, etc.; that is an operating system issue, not a Java issue.
At the same time, even if your terminal can't display the correct results, the Java program should be handling the character encodings correctly without your intervention.
The library behind the File API is figuring out the correct character encoding for your system and doing the necessary decoding into characters. Likewise, the database driver should negotiate with the database to determine the correct encoding, and do any necessary encoding into bytes on behalf of your application.
In a comment you wrote:
@mdrg: well, there's a problem. I have to read the names of the files and then put them into a database. And there are a lot of '?' that shouldn't be there... – Lissy 27 mins ago
My guess is that the column you're inserting the filenames into specifies US-ASCII as the encoding and replaces characters outside that range with a replacement character, which in your case is the question mark.
So you have to find out the encoding for the column in your database table where you store the filenames. Various products have various syntaxes for retrieving that information.
In Java 1.6 you can use System.console() instead of System.out.println() to display accented characters on the console.
public class Test {
public static void main(String args[]){
String s = "caractères français : à é \u00e9"; // Unicode for "é"
System.console().writer().println(s);
}
}
and the output is
C:\temp>java Test
caractères français : à é é
