Save to file with special characters - java

My program works fine on my Mac, but when I run it on Windows all the special characters turn into garbage such as %$&. I am Norwegian, so the special characters are mostly æøå.
This is the code I use to write to file:
File file = new File("Notes.txt");
if (file.exists() && !file.isDirectory()) {
try(PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter("Notes.txt", true)))) {
pw.println("");
pw.println("*****");
pw.println(notat.getId());
pw.println(notat.getTitle());
pw.println(notat.getNote());
pw.println(notat.getDate());
pw.close();
}catch (Exception e) {
//Did not find file
}
} else {
//Did not find file
}
How can I ensure that the special characters are written correctly on both operating systems?
NOTE: I use IntelliJ, and my program is a .jar file.

Make sure that you use the same encoding on Windows as you do on the Mac.
IDEA displays the file encoding in the lower right corner. You can also configure it under Settings -> Editor -> File Encodings.
The encoding can be configured project-wide or per file.
Also read the question "java default file encoding" to make sure that reading and writing files always use the same charset.
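One way to take the platform default out of the picture is to pass the charset explicitly when opening the writer. The following is a minimal sketch, assuming UTF-8 is acceptable on both machines; the class name Utf8NoteAppender and the method appendLines are hypothetical and not part of the question's code. FileWriter in the question uses the JVM default charset, which can differ between macOS and Windows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: appends lines to a file as UTF-8 on every platform.
class Utf8NoteAppender {

    // The charset is passed explicitly, so the JVM default encoding is never consulted.
    static void appendLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        } // try-with-resources closes the writer; no explicit close() is needed
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("Notes.txt");
        // Unicode escapes for the Norwegian letters keep this source file
        // independent of the encoding the compiler assumes.
        appendLines(file, Arrays.asList("*****", "\u00e6\u00f8\u00e5"));
    }
}
```

Whatever reads Notes.txt later has to decode it as UTF-8 as well, for example with Files.newBufferedReader(file, StandardCharsets.UTF_8).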

Related

Writing to Buffered Writer UTF-8 Characters With Accents Are Coming Out Garbled

I am reading from a UTF-8 input file with accented characters, reading the lines and writing them back to a different file (also UTF-8) but the accented characters are coming out garbled in the output. For instance the following words:
León
Mānoa
are output as:
Le�n
Manoa
I've looked at about 100 answers to this question which all suggest reading and writing the files as the code indicates below, but I keep having the same result.
I've broken down the code to the elemental features below:
public class UTF8EncoderTest
{
public static void main(String[] args)
{
try
{
BufferedReader inputFileReader = new BufferedReader(new InputStreamReader(new FileInputStream("utf8TestInput.txt"), "UTF-8"));
BufferedWriter outputFileWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("utf8TestOutput.txt"), "UTF-8"));
String line = inputFileReader.readLine();
while (line != null)
{
outputFileWriter.write(line + "\r\n");
line = inputFileReader.readLine();
}
inputFileReader.close();
outputFileWriter.close();
System.out.println("Finished!");
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
But this still results in garbled characters in the output file. Any help would be appreciated!
Try this:
String sText = "This león and this is Mānoa";
File oFile = new File(getExternalFilesDir("YourFolder"), "YourFile.txt");
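try {
FileOutputStream oFileOutputStream = new FileOutputStream(oFile, true); // append
// UTF-8 rather than ISO-8859-1, because ISO-8859-1 cannot encode ā (U+0101)
OutputStreamWriter writer = new OutputStreamWriter(oFileOutputStream, StandardCharsets.UTF_8);
writer.append(sText);
writer.close();
} catch (IOException e) {
e.printStackTrace(); // do not swallow the error silently
}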
I tried your code with your examples and it works without problems (characters are not changed or lost).
A few tips for dealing with charsets in Java:
Default character encoding in Java is the character encoding used by JVM.
By default, JVM uses platform encoding i.e. character encoding of your server (OS).
Java determines the default character encoding once, at JVM start-up, from the file.encoding system property. On older JDKs that property is initialized from the platform locale when it is not set explicitly; since JDK 18 the default is UTF-8. The most important point to remember is that Java caches this value in most of its core classes, such as InputStreamReader, which need a character encoding after the JVM has started. Changing the file.encoding system property programmatically while the application is running therefore has no effect. That is why you should always pass an explicit character encoding in your own code, and if the default itself has to change, set it when you start the JVM.
How to get default character encoding?
The easiest way is to call System.getProperty("file.encoding"), which returns the default character encoding as long as the JVM was started with the -Dfile.encoding property or the program has not called System.setProperty("file.encoding", someEncoding).
java.nio.charset.Charset provides a convenient static method, Charset.defaultCharset(), which returns the default character encoding.
By using InputStreamReader#getEncoding().
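The three lookups listed above can be compared side by side. This is a small sketch; the class name DefaultCharsetProbe and the helper defaultCharsetName are made up for illustration. Note that InputStreamReader#getEncoding() may report a historical name (for example UTF8 rather than UTF-8), so the three values can differ in spelling even when they refer to the same charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical probe that prints the default encoding as seen through three APIs.
class DefaultCharsetProbe {

    // Canonical name of the charset the JVM uses when none is specified.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) throws IOException {
        // 1. The system property (may be null or overridden by the program)
        System.out.println("file.encoding property : " + System.getProperty("file.encoding"));

        // 2. The charset actually in effect for this JVM
        System.out.println("Charset.defaultCharset : " + defaultCharsetName());

        // 3. A reader created without an explicit charset reports what it picked
        try (InputStreamReader reader =
                     new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            System.out.println("InputStreamReader      : " + reader.getEncoding());
        }
    }
}
```

Running it once on each machine shows quickly whether the two environments disagree about the default.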
How to set default character encoding?
By providing the file.encoding system property when JVM starts e.g.:
java -Dfile.encoding="UTF-8" HelloWorld
If you don't control how the JVM starts up, you can set the environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF-16 or any other character encoding, and it will be picked up when the JVM starts on your Windows machine. The JVM will also print Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-16 on the console to indicate that it has picked up JAVA_TOOL_OPTIONS.
Alternatively, you can try:
Path inputFilePath = Paths.get("utf8TestInput.txt");
BufferedReader inputFileReader = Files.newBufferedReader(inputFilePath, StandardCharsets.UTF_8);
Path outputFilePath = Paths.get("utf8TestOutput.txt");
BufferedWriter outputFileWriter = Files.newBufferedWriter(outputFilePath, StandardCharsets.UTF_8);
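Put together, the NIO variant of the copy loop could look like the sketch below. The class name Utf8LineCopier and the method copyLines are hypothetical; the file names are the ones from the question, and the "\r\n" line ending is kept from the original code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: copies a text file line by line, decoding and encoding as UTF-8.
class Utf8LineCopier {

    static void copyLines(Path input, Path output) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write("\r\n"); // same line ending as the code in the question
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("utf8TestInput.txt");
        if (Files.exists(input)) {
            copyLines(input, Paths.get("utf8TestOutput.txt"));
            System.out.println("Finished!");
        } else {
            System.out.println("Input file not found: " + input.toAbsolutePath());
        }
    }
}
```

A reader obtained from Files.newBufferedReader throws a MalformedInputException when the bytes are not valid in the given charset, instead of silently substituting replacement characters. If this version fails on the input, the input file is probably not UTF-8 in the first place.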

can not save utf8 file in windows server with java

I have a simple Java application that saves some strings in UTF-8 encoding.
But when I open the file with Notepad and choose Save As, it shows the encoding as ANSI. I don't know where the problem is.
The code that saves the file is:
File fileDir = new File("c:\\Sample.txt");
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(fileDir), "UTF8"));
out.append("kodehelp UTF-8").append("\r\n");
out.append("??? UTF-8").append("\r\n");
out.append("???? UTF-8").append("\r\n");
out.flush();
out.close();
The characters you are writing to the file, as they appear in the code snippet, are in the basic ASCII subset of UTF-8. Notepad is likely auto-detecting the format and, seeing nothing outside the ASCII range, decides the file is ANSI.
If you want to force a different decision, include characters such as 字 or õ, which are well outside the ASCII range.
It is possible that the ??? strings in your example were intended to be non-ASCII text. If so, make sure your IDE and/or build tool treats the source files as UTF-8, and that the files are indeed UTF-8 encoded. If you provide more information about your build system, we can help further.
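The point about the ASCII subset can be checked directly: a string that contains only ASCII characters produces exactly the same bytes in UTF-8 as in US-ASCII, so nothing in the file tells a detector which encoding was intended. The sketch below illustrates this; the class name AsciiSubsetCheck and the method encodesLikeAscii are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check: does the UTF-8 form of a string differ from plain ASCII at all?
class AsciiSubsetCheck {

    // True when the UTF-8 bytes are byte-for-byte identical to the US-ASCII bytes,
    // which is the case exactly when the text contains no non-ASCII characters.
    static boolean encodesLikeAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // Lines as they appear in the question: literal question marks are ASCII
        System.out.println(encodesLikeAscii("kodehelp UTF-8"));
        System.out.println(encodesLikeAscii("??? UTF-8"));
        // A line containing the letter o with tilde, written as a Unicode escape
        System.out.println(encodesLikeAscii("\u00f5 UTF-8"));
    }
}
```

Only the last string yields bytes that distinguish a UTF-8 file from an ASCII one.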

Saving certain text from JTextArea to a file using JFileChooser

I have this text from my JTextArea:
Getting all .mp3 files in C:\Users\Admin\Music including those in subdirectories
C:\Users\Admin\Music\Sample Music\Kalimba.mp3
C:\Users\Admin\Music\Sample Music\Maid with the Flaxen Hair.mp3
C:\Users\Admin\Music\Sample Music\Sleep Away.mp3
Finished Searching...
I want to save only this part:
C:\Users\Admin\Music\Sample Music\Kalimba.mp3
C:\Users\Admin\Music\Sample Music\Maid with the Flaxen Hair.mp3
C:\Users\Admin\Music\Sample Music\Sleep Away.mp3
Unfortunately I can't with the code below:
JFileChooser saveFile = new JFileChooser("./");
int returnVal = saveFile.showSaveDialog(this);
File file = saveFile.getSelectedFile();
BufferedWriter writer = null;
if (returnVal == JFileChooser.APPROVE_OPTION)
{
try {
writer = new BufferedWriter( new FileWriter( file.getAbsolutePath()+".txt")); // txt for now but needs to be m3u
searchMP3Results.write(writer); // using JTextArea built-in writer
writer.close( );
JOptionPane.showMessageDialog(this, "Search results have been saved!",
"Success", JOptionPane.INFORMATION_MESSAGE);
}
catch (IOException e) {
JOptionPane.showMessageDialog(this, "An error has occurred",
"Failed", JOptionPane.INFORMATION_MESSAGE);
}
}
With the code above, it saves everything from the JTextArea. Can you help me?
P.S. If possible, I want to save it as an M3U Playlist.
I'm assuming searchMP3Results is the JTextArea containing the text. In that case you can get the text as a String with searchMP3Results.getText() and run it through a regular expression that looks for file paths. An example regex for Windows paths is in the question "java regular expression to match file path". Unfortunately this ties your application to Windows. If that's acceptable, you're good to go; otherwise, detect the OS using system properties and select the matching regex.
As for the M3U, you should be able to just export the paths, one per line. Extended M3U files (using the header #EXTM3U) require additional information, but you should be able to get away with the simple version.
Update: Added code
Update 2: Changed regex to a modified version of path regex (vice file) and now run it against each line instead of performing a multiline assessment
String text = searchMP3Results.getText();
StringBuilder output = new StringBuilder();
for ( String s : text.split("\n") ) {
if ( java.util.regex.Pattern.matches("^([a-zA-Z]:)?(\\\\[\\s\\.a-zA-Z0-9_-]+)+\\\\?$", s) ) {
output.append(s).append("\n");
}
}
This code splits the input into an array of lines (you may want to split on \r\n instead of just \n) and then uses a regex to check whether each line is a path/filename combination. No further checks are performed, and the path/filename is assumed to be valid since it presumably comes from an external application. In other words, the regex neither checks for invalid characters in the path/filename nor checks that the file exists, though the latter would be trivial to add.
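The filtering step and the simple (non-extended) M3U export can be combined as in the following sketch. It reuses the regex from the snippet above; the class name PlaylistExtractor and the methods extractPaths and writePlaylist are hypothetical, and writing the playlist as UTF-8 is an assumption, since the question does not say which encoding the player expects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the lines that look like Windows paths.
class PlaylistExtractor {

    // Same pattern as in the answer, compiled once instead of once per line.
    private static final Pattern WINDOWS_PATH =
            Pattern.compile("^([a-zA-Z]:)?(\\\\[\\s\\.a-zA-Z0-9_-]+)+\\\\?$");

    static List<String> extractPaths(String text) {
        List<String> paths = new ArrayList<>();
        // The regex \R matches any line break, so both LF and CRLF input work.
        for (String line : text.split("\\R")) {
            if (WINDOWS_PATH.matcher(line).matches()) {
                paths.add(line);
            }
        }
        return paths;
    }

    // A simple M3U file is one path per line (encoding is an assumption here).
    static void writePlaylist(Path target, List<String> paths) throws IOException {
        Files.write(target, paths, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Getting all .mp3 files in C:\\Users\\Admin\\Music including those in subdirectories\n"
                + "C:\\Users\\Admin\\Music\\Sample Music\\Kalimba.mp3\n"
                + "C:\\Users\\Admin\\Music\\Sample Music\\Sleep Away.mp3\n"
                + "Finished Searching...";
        for (String path : extractPaths(text)) {
            System.out.println(path);
        }
    }
}
```

In the Swing code, extractPaths(searchMP3Results.getText()) would replace the call to searchMP3Results.write(writer), and the selected file from the JFileChooser would be passed to writePlaylist.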

saving Java file in UTF-8

When I run this program, it prints a '?' for the Unicode code point \u0508. This is because the default Windows character encoding, CP-1252, is unable to map this code point.
But when I save the file in Eclipse with 'Text file encoding' = UTF-8 and run the program, it gives me the correct output AԈC.
Why does this work? The Java file is saved as UTF-8, but the underlying Windows OS encoding is still CP-1252. My question is similar to this one: when I try to read a text file as UTF-16 that was originally written in UTF-8, the output is weird, with various box symbols.
public class e {
public static void main(String[] args) {
System.out.println(System.getProperty("file.encoding"));
String original = new String("A" + "\u0508" + "C");
try {
System.out.println("original = " + original);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Saving the Java source file as either UTF-8 or Windows-1252 shouldn't make any difference, because both encodings encode all the ASCII code points the same way, and your source file only uses ASCII characters.
So you should look for the cause somewhere else. I suggest carefully redoing the steps you took and running the tests again.
The issue is the value of file.encoding when you run the program, and the destination of System.out. If System.out is an Eclipse console, it may well be configured as a UTF-8 console. If it's just a Windows DOS box, it uses the CP1252 code page and will only display ? in this case.
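The effect of the output encoding can be made visible by wrapping the stream in a PrintStream with an explicit charset and looking at the bytes it produces. This is a sketch only; the class name ConsoleEncodingDemo and the method printedBytes are hypothetical, and US-ASCII stands in here for any charset that cannot map U+0508.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo: the same string, printed through two different encodings.
class ConsoleEncodingDemo {

    // Returns the bytes a PrintStream emits for the text in the given charset.
    static byte[] printedBytes(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "A\u0508C";

        // An unmappable character is replaced by '?' (0x3F) on output.
        System.out.println("US-ASCII byte count: " + printedBytes(original, "US-ASCII").length);
        System.out.println("UTF-8 byte count   : " + printedBytes(original, "UTF-8").length);

        // Forcing UTF-8 on standard output; whether the glyph then shows up
        // still depends on what the console itself expects.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("original = " + original);
    }
}
```

The replacement with '?' happens inside the encoder, which matches the behaviour described in the question: the character is lost before it ever reaches the console.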

Carriage Return\Line feed in Java

I have created a text file in Unix environment using Java code.
To write the text file I am using java.io.FileWriter and BufferedWriter, and for the newline after each row I am using the bw.newLine() method (where bw is the BufferedWriter object).
I send that text file as a mail attachment from the Unix environment itself (automated with Unix commands).
My issue is that after I download the text file from the mail on a Windows system and open it, the data is not properly aligned. I think the newLine() character is not working.
I want the text file to have the same alignment when opened on Windows as it has in the Unix environment.
How do I resolve the problem?
Java code below for your reference (running in Unix environment):
File f = new File(strFileGenLoc);
BufferedWriter bw = new BufferedWriter(new FileWriter(f, false));
rs = stmt.executeQuery("select * from jpdata");
while ( rs.next() ) {
bw.write(rs.getString(1)==null? "":rs.getString(1));
bw.newLine();
}
Java only knows about the platform it is currently running on, so bw.newLine() can only give you the line ending of that platform. Because you open the file on a Windows system, you either have to convert the file before using it (with something you have written, or with a program like unix2dos), or you have to write Windows-style line endings from your Java program in the first place. So if you know the file will always be opened on a Windows machine, you will have to output
bw.write(rs.getString(1)==null? "":rs.getString(1));
bw.write("\r\n");
It's worth noting that you aren't going to be able to output a file that looks correct on both platforms if you are using plain text. You may want to consider using HTML if it is an email, or XML if it is data. Alternatively, you may need some kind of client that reads the data and then formats it for the platform the viewer is using.
The method newLine() ensures a platform-compatible new line is added (0Dh 0Ah for DOS, 0Dh for older Macs, 0Ah for Unix/Linux). Java has no way of knowing on which platform you are going to send the text. This conversion should be taken care of by the mail sending entities.
Don't know who looks at your file, but if you open it in wordpad instead of notepad, the linebreaks will show correct. In case you're using a special file extension, associate it with wordpad and you're done with it. Or use any other more advanced text editor.
bw.newLine(); cannot ensure compatibility with all systems.
If you are sure it is going to be opened in windows, you can format it to windows newline.
If you are already using native Unix commands, try unix2dos to convert the already generated file to Windows format, and then send the mail.
If you are not using Unix commands and prefer to do it in Java, use bw.write("\r\n"), and if it does not complicate your program, add a method that finds out the operating system and writes the appropriate newline.
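Instead of detecting the operating system, the line separator can simply be made a parameter chosen for the machine that will open the file. Below is a minimal sketch of that idea; the class name LineEndingWriter and the method writeRows are hypothetical, and the null handling mirrors the rs.getString(1)==null check in the question.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes rows with a line separator chosen by the caller.
class LineEndingWriter {

    static final String WINDOWS = "\r\n";
    static final String UNIX = "\n";

    // Null rows are written as empty lines, as in the original loop.
    static void writeRows(Writer out, Iterable<String> rows, String lineSeparator)
            throws IOException {
        for (String row : rows) {
            out.write(row == null ? "" : row);
            out.write(lineSeparator);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = Arrays.asList("first", null, "third");
        StringWriter buffer = new StringWriter();
        writeRows(buffer, rows, WINDOWS);
        // Make the separators visible when printing the result
        System.out.println(buffer.toString().replace("\r", "\\r").replace("\n", "\\n\n"));
    }
}
```

In the question's code, writeRows would receive the BufferedWriter and the values read from the ResultSet, with WINDOWS as the separator when the attachment is meant for Windows users.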
If I understand you right, we talk about a text file attachment.
That's unfortunate, because if it were the email's message body, you could always use "\r\n", referring to http://www.faqs.org/rfcs/rfc822.html
But as it's an attachment, you must live with system differences. If I were in your shoes, I would choose one of these options:
a) only support windows clients by using "\r\n" as line end.
b) provide two attachment files, one with linux format and one with windows format.
c) I don't know if the attachment is to be read by people or machines, but if it is people I would consider attaching an HTML file instead of plain text. more portable and much prettier, too :)
Encapsulate your writer to provide char replacement, like this:
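import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

public class WindowsFileWriter extends Writer {
    private Writer writer;

    public WindowsFileWriter(File file) throws IOException {
        try {
            writer = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-15");
        } catch (UnsupportedEncodingException e) {
            // Fall back to the platform default encoding
            writer = new FileWriter(file);
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // Every LF becomes CRLF; input that already contains CRLF would turn into CR CR LF
        writer.write(new String(cbuf, off, len).replace("\n", "\r\n"));
    }

    @Override
    public void flush() throws IOException {
        writer.flush();
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}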
