My application needs to use a .properties file for configuration.
In the properties file, users are allowed to specify paths.
Problem
Properties files need backslashes in values to be escaped, e.g.
dir = c:\\mydir
Needed
I need some way to accept a properties file where the values are not escaped, so that the users can specify:
dir = c:\mydir
Why not simply extend the Properties class so that it doubles the backslashes while loading? A nice feature of this is that the rest of your program can still use the original Properties class.
public class PropertiesEx extends Properties {
public void load(FileInputStream fis) throws IOException {
Scanner in = new Scanner(fis);
ByteArrayOutputStream out = new ByteArrayOutputStream();
while(in.hasNext()) {
out.write(in.nextLine().replace("\\","\\\\").getBytes());
out.write("\n".getBytes());
}
InputStream is = new ByteArrayInputStream(out.toByteArray());
super.load(is);
}
}
Using the new class is as simple as:
PropertiesEx p = new PropertiesEx();
p.load(new FileInputStream("C:\\temp\\demo.properties"));
p.list(System.out);
The escaping code could also be improved upon, but the general principle is there.
Two options:
use the XML properties format instead
Write your own parser for a modified .properties format without escapes
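For the first option, a minimal sketch of what loading the XML format could look like. The class name, the sample content and the dir key are only illustrative; the XML layout is the one that Properties.loadFromXML documents. Backslashes need no escaping in this format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class XmlPropertiesSketch {

    // Hypothetical sample content; note the single, unescaped backslash in the value.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "<entry key=\"dir\">c:\\mydir</entry>\n"
        + "</properties>\n";

    // Loads properties from XML text instead of the classic key=value format.
    static Properties loadXml(String xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return props;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadXml(SAMPLE).getProperty("dir"));
    }
}
```

The trade-off is that users then have to edit XML, including XML's own escaping of characters such as & and <.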
You can "preprocess" the file before loading the properties, for example:
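public InputStream preprocessPropertiesFile(String myFile) throws IOException {
    Scanner in = new Scanner(new FileReader(myFile));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while (in.hasNextLine()) {
        // double every backslash so that Properties.load() sees it as a literal one
        out.write(in.nextLine().replace("\\", "\\\\").getBytes());
        // keep the line breaks, otherwise all entries end up on a single line
        out.write("\n".getBytes());
    }
    in.close();
    return new ByteArrayInputStream(out.toByteArray());
}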
And your code could look like this:
Properties properties = new Properties();
properties.load(preprocessPropertiesFile("path/myfile.properties"));
This way your .properties file can look the way you need, and the property values are still loaded correctly.
*I know there should be better ways to manipulate files, but I hope this helps.
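A variant of the same preprocessing idea, sketched with a Reader and a StringReader so that no bytes are converted with the platform default charset along the way. The class and method names are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

final class UnescapedProperties {

    // Reads a properties file whose backslashes are meant literally.
    static Properties load(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder escaped = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            // double each backslash and keep one entry per line
            escaped.append(line.replace("\\", "\\\\")).append('\n');
        }
        Properties props = new Properties();
        props.load(new StringReader(escaped.toString()));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties p = load(new StringReader("dir = c:\\mydir\n"));
        System.out.println(p.getProperty("dir"));
    }
}
```

Note that this also disables line continuations and \uXXXX escapes in the file, which is the price of treating every backslash literally.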
The right way would be to provide your users with a property file editor (or a plugin for their favorite text editor) which lets them enter the text as plain text and saves the file in the property file format.
If you don't want this, you are effectively defining a new format for the same (or a subset of the) content model as the property files have.
Go the whole way and actually specify your format, and then think about a way to either
transform the format to the canonical one, and then use this for loading the files, or
parse this format and populate a Properties object from it.
Both of these approaches will only work directly if you actually can control your property object's creation, otherwise you will have to store the transformed format with your application.
So, let's see how we can define this. The content model of normal property files is simple:
A map of string keys to string values, both allowing arbitrary Java strings.
The escaping which you want to avoid serves just to allow arbitrary Java strings, and not just a subset of these.
An often sufficient subset would be:
A map of string keys (not containing any whitespace, : or =) to string values (not containing any leading or trailing white space or line breaks).
In your example dir = c:\mydir, the key would be dir and the value c:\mydir.
If we want our keys and values to contain any Unicode character (other than the forbidden ones mentioned), we should use UTF-8 (or UTF-16) as the storage encoding - since we have no way to escape characters outside of the storage encoding. Otherwise, US-ASCII or ISO-8859-1 (as normal property files) or any other encoding supported by Java would be enough, but make sure to include this in your specification of the content model (and make sure to read it this way).
Since we restricted our content model so that all "dangerous" characters are out of the way, we can now define the file format simply as this:
<simplepropertyfile> ::= (<line> <line break> )*
<line> ::= <comment> | <empty> | <key-value>
<comment> ::= <space>* "#" < any text excluding line breaks >
<key-value> ::= <space>* <key> <space>* "=" <space>* <value> <space>*
<empty> ::= <space>*
<key> ::= < any text excluding ':', '=' and whitespace >
<value> ::= < any text starting and ending not with whitespace,
not including line breaks >
<space> ::= < any whitespace, but not a line break >
<line break> ::= < one of "\n", "\r", and "\r\n" >
Every \ occurring in either key or value now is a real backslash, not anything which escapes something else.
Thus, for transforming it into the original format, we simply need to double it, like Grekz proposed, for example in a filtering reader:
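public class DoubleBackslashFilter extends FilterReader {
    private boolean bufferedBackslash = false;
    public DoubleBackslashFilter(Reader org) {
        super(org);
    }
    public int read() throws IOException {
        if (bufferedBackslash) {
            bufferedBackslash = false;
            return '\\';
        }
        int c = super.read();
        if (c == '\\')
            bufferedBackslash = true;
        return c;
    }
    public int read(char[] buf, int off, int len) throws IOException {
        int read = 0;
        if (bufferedBackslash && len > 0) {
            buf[off] = '\\';
            read++;
            off++;
            len--;
            bufferedBackslash = false;
        }
        if (len > 1) {
            // read at most half of the space, so every char could be doubled
            int step = super.read(buf, off, len / 2);
            if (step < 0) {
                // end of stream: report what we already have, or -1
                return read > 0 ? read : -1;
            }
            for (int i = 0; i < step; i++) {
                if (buf[off + i] == '\\') {
                    // shift everything from here on one char to the right.
                    System.arraycopy(buf, off + i, buf, off + i + 1, step - i);
                    // adjust parameters: one more char, skip the copied backslash
                    step++; i++;
                }
            }
            read += step;
        } else if (len == 1 && read == 0) {
            // room for a single char only: fall back to the one-char read
            int c = read();
            if (c < 0)
                return -1;
            buf[off] = (char) c;
            read = 1;
        }
        return read;
    }
}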
Then we would pass this Reader to our Properties object (or save the contents to a new file).
Instead, we could simply parse this format ourselves.
public Properties parse(Reader in) throws IOException {
    BufferedReader r = new BufferedReader(in);
    Properties prop = new Properties();
    // the pattern also grabs space around the separator.
    Pattern keyValPattern = Pattern.compile("\\s*=\\s*");
    String line;
    while ((line = r.readLine()) != null) {
        line = line.trim(); // remove leading and trailing space
        if (line.equals("") || line.startsWith("#")) {
            continue; // ignore empty and comment lines
        }
        String[] kv = keyValPattern.split(line, 2);
        if (kv.length < 2) {
            // no key-value separator. TODO: Throw exception or simply ignore this line?
            continue;
        }
        prop.setProperty(kv[0], kv[1]);
    }
    r.close();
    return prop;
}
Again, using Properties.store() after this, we can export it in the original format.
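To illustrate that last step, here is a small sketch (class and method names are invented for the example) that stores a Properties object holding a raw backslash value and loads it back. Properties.store() does the escaping on its own, so the stored text contains the doubled backslash and the round trip returns the original value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

final class StoreRoundTrip {

    // Exports the properties in the canonical (escaped) format.
    static String toCanonical(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, null);
        return out.toString();
    }

    // Reads the canonical format back with the standard loader.
    static Properties fromCanonical(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("dir", "c:\\mydir"); // the value is c:\mydir
        String stored = toCanonical(props);
        System.out.print(stored);
        System.out.println(fromCanonical(stored).getProperty("dir"));
    }
}
```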
Based on @Ian Harrigan's answer, here is a complete solution to read and write Netbeans properties files (and other properties files with unescaped backslashes) as plain ASCII text files:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
/**
* This class handles Netbeans properties files.
* It is based on the work of : http://stackoverflow.com/questions/6233532/reading-java-properties-file-without-escaping-values.
* It overrides both load methods in order to load a Netbeans properties file, taking into account the \ that
* would otherwise be treated as escapes by the original java.util.Properties load methods.
* @author stephane
*/
public class NetbeansProperties extends Properties {
@Override
public synchronized void load(Reader reader) throws IOException {
BufferedReader bfr = new BufferedReader( reader );
ByteArrayOutputStream out = new ByteArrayOutputStream();
String readLine = null;
while( (readLine = bfr.readLine()) != null ) {
out.write(readLine.replace("\\","\\\\").getBytes());
out.write("\n".getBytes());
}//while
InputStream is = new ByteArrayInputStream(out.toByteArray());
super.load(is);
}//met
@Override
public void load(InputStream is) throws IOException {
load( new InputStreamReader( is ) );
}//met
@Override
public void store(Writer writer, String comments) throws IOException {
PrintWriter out = new PrintWriter( writer );
if( comments != null ) {
out.print( '#' );
out.println( comments );
}//if
List<String> listOrderedKey = new ArrayList<String>();
listOrderedKey.addAll( this.stringPropertyNames() );
Collections.sort(listOrderedKey );
for( String key : listOrderedKey ) {
String newValue = this.getProperty(key);
out.println( key+"="+newValue );
}//for
// push any buffered characters through to the underlying writer
out.flush();
}//met
@Override
public void store(OutputStream out, String comments) throws IOException {
store( new OutputStreamWriter(out), comments );
}//met
}//class
You could try using guava's Splitter: split on '=' and build a map from resulting Iterable.
The disadvantage of this solution is that it does not support comments.
@pdeva: one more solution
//Reads entire file in a String
//available in java1.5
Scanner scan = new Scanner(new File("C:/workspace/Test/src/myfile.properties"));
scan.useDelimiter("\\Z");
String content = scan.next();
//Use apache StringEscapeUtils.escapeJava() method to escape java characters
ByteArrayInputStream bi=new ByteArrayInputStream(StringEscapeUtils.escapeJava(content).getBytes());
//load properties file
Properties properties = new Properties();
properties.load(bi);
It's not an exact answer to your question, but a different solution that may be appropriate to your needs. In Java, you can use / as a path separator and it works on Windows, Linux, and OS X alike. This is especially useful for relative paths.
In your example, you could use:
dir = c:/mydir
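A quick sketch of the difference when such a line goes through the standard loader (the class name is just for this example). A forward slash is kept as it is, while Properties.load() silently drops a single backslash in front of an ordinary character.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class SlashDemo {

    // Loads a single properties line and returns the value of "dir".
    static String loadDir(String line) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(line));
        return p.getProperty("dir");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadDir("dir = c:/mydir"));   // forward slash survives
        System.out.println(loadDir("dir = c:\\mydir"));  // file content is c:\mydir, backslash is lost
    }
}
```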
Related
Here's the deal :
I was asked to develop a Java program that reorganises .tsv files (moving cells to do some kind of transposition).
So, I tried to do it cleanly and now have 3 different packages.
Only tsvExceptions and tsvTranspositer are needed to make the main (TSVTransposer.java) work.
Yesterday I learned that I would have to implement it in Talend myself, which I had never heard of.
So by searching, I came across this Stack Overflow topic. I followed the steps: creating a routine, copy/pasting my main inside it (changing the package to "routines") and adding the needed external libraries to it (my two packages exported as jar files, and openCSV). Now, when I open the routine, no error is shown, but I can't drag & drop it onto the job I created!
Nothing happens. It just opens the component infos as shown with "Properties not available."
package routines;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import tsvExceptions.ArgsExceptions;
import tsvExceptions.EmptyArgsException;
import tsvExceptions.OutOfBordersArgsException;
import tsvTranspositer.CommonLine;
import tsvTranspositer.HeadOfValuesHandler;
import tsvTranspositer.InputFile;
import tsvTranspositer.OutputFile;
public class tsvRoutine {
public static void main(String[] args) throws ArgsExceptions {
// Boolean set to true while everything is good
Boolean everythingOk = true;
String inputFile = null; // Name of the entry file to be transposed.
String outputFile = null; // Name of the output file.
int serieNb = 1 ; // Number of columns before the actual values in the input file. Can be columns describing the product as well as empty columns before the values.
int linesToCopy = 0; // Number of lines composing the header of the file (those lines will be copy/pasted in the output)
/*
* Handling the arguments first.
*/
try {
switch (args.length) {
case 0:
throw new EmptyArgsException();
case 1:
inputFile = args[0];
String[] parts = inputFile.split("\\.");
// If no outPutFile name is given, will add "Transposed" to the inputFile Name
outputFile = parts[0] + "Transposed." + parts[1];
break;
case 2:
inputFile = args[0];
outputFile = args[1];
break;
case 3:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
break;
case 4:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
linesToCopy = Integer.parseInt(args[3]);
break;
default:
inputFile = args[0];
outputFile = args[1];
serieNb = Integer.parseInt(args[2]);
linesToCopy = Integer.parseInt(args[3]);
throw new OutOfBordersArgsException();
}
}
catch (ArgsExceptions a) {
a.notOk(everythingOk);
}
catch (NumberFormatException n) {
System.out.println("Arguments 3 & 4 should be numbers."
+ " Number 3 is the Number of columns before the actual values in the input file. \n"
+ "(Can be columns describing the product as well as empty columns before the values. (1 by default)) \n"
+ "Number 4 is the number of lines to copy/pasta. (0 by default) \n"
+ "Please try again.");
everythingOk = false;
}
// Creating an InputFile and an OutputFile
InputFile ex1 = new InputFile(inputFile, linesToCopy);
OutputFile ex2 = new OutputFile(outputFile);
if (everythingOk) {
try ( FileReader fr = new FileReader(inputFile);
CSVReader reader = new CSVReader(fr, '\t', '\'', 0);
FileWriter fw = new FileWriter(outputFile);
CSVWriter writer = new CSVWriter(fw, '\t', CSVWriter.NO_QUOTE_CHARACTER))
{
ex1.setReader(reader);
ex2.setWriter(writer);
// Reading the header of the file
ex1.readHead();
// Writing the header of the file (copy/pasta)
ex2.write(ex1.getHeadFile());
// Handling the line containing the columns names
HeadOfValuesHandler handler = new HeadOfValuesHandler(ex1.readLine(), serieNb);
ex2.writeLine(handler.createOutputHOV());
// Each line will be read and written (in multiple lines) one after the other.
String[] row;
CommonLine cl1;
// If the period is monthly
if (handler.isMonthly()) {
while (!ex1.isAllDone()) {
row = ex1.readLine();
if (!ex1.isAllDone()) {
cl1 = new CommonLine(row, handler.getYears(), handler.getMonths(), serieNb);
ex2.write(cl1.exportOutputLines());
}
}
}
// If the period is yearly
else {
while (!ex1.isAllDone()) {
row = ex1.readLine();
if (!ex1.isAllDone()) {
cl1 = new CommonLine(row, handler.getYears(), serieNb);
ex2.write(cl1.exportOutputLines());
}
}
}
}
catch (FileNotFoundException f) {
System.out.println(inputFile + " can't be found. Cancelling...");
}
catch (IOException e) {
System.out.println("Unknown exception raised.");
e.printStackTrace();
}
}
}
}
I know the exceptions aren't correctly handled yet, but they are in some kind of hurry for it to work in some way.
Another problem that will come up later is that I have no idea how to pass the required arguments to the program.
Anyway, thanks for reading this post!
You cannot add routines per drag and drop to a job. You will need to access the routines functions through components.
For example, you would start with a tFileListInput to get all files you need. Then you could add a tFileInputDelimited where you describe all fields of your input. After this, with e.g. a tJavaRow component, you can write some code which would access your routine.
NOTE: Keep in mind that Talend usually works row-wise. This means that your routines should handle things row by row, and your code may have to be refactored accordingly. A main function won't work; it has to become at least a class which can be instantiated or which has static functions.
If you want to handle everything on your own, instead of a tJavaRow component you might use a tJava component which adds more flexibility.
Still, it won't be as easy as simply adding the routine and everything will work.
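As a rough idea of what "row-wise with static functions" could mean for your case, here is a sketch. Everything in it (class name, method name, the way a row is split into leading columns and value columns) is an assumption for illustration, not your actual transposition logic. In Talend the class would be declared in the routines package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical routine: in Talend this class would live in the "routines" package.
final class TsvRowRoutine {

    /**
     * Turns one wide input row into one output line per value column.
     * serieNb is the number of leading columns that describe the row.
     */
    static List<String> unpivot(String[] row, String[] headers, int serieNb) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < serieNb; i++) {
            prefix.append(row[i]).append('\t');
        }
        List<String> lines = new ArrayList<>();
        for (int i = serieNb; i < row.length; i++) {
            // leading columns, then the column name, then the cell value
            lines.add(prefix + headers[i] + '\t' + row[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] headers = {"product", "2019", "2020"};
        String[] row = {"apple", "1", "2"};
        for (String line : unpivot(row, headers, 1)) {
            System.out.println(line);
        }
    }
}
```

A tJavaRow (or tJava) component would then call such a static method for each incoming row instead of running a main method over the whole file.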
In fact, the whole code can become a job on its own. Talend generates the whole Java code for you:
The parameters can become Context variables.
The check if numbers are numbers could be done several ways, for example with a tPreJob and a tJava
Input file could be connected with a tFileInputDelimited with a dot separator
Then, every row will be processed with either a tJavaRow with your custom code or with a tMap if it's not too complex.
Afterwards, you can write the file with a tFileOutputDelimited component
Everything will get connected via right click / main to iterate over the rows
All exception handling is done by Talend. If you want to react to exceptions, you can use a component like tLogRow.
Hope this helps a bit to set the direction.
I'm reading in an XML configuration file that I don't control the format of, and the data I need is in the last element. Unfortunately, that element is a base64 encoded serialised Java class (yes, I know) that is 31200 characters in length.
Some experimenting seems to show that not only can the Java XML/XPath libraries not see the value in this element (they silently set the value to a blank string), if I just read the file into a string and print it out to console, everything (even a closing element on the next line) gets printed, but not this one element.
Finally, if I manually go into the file and break the line into rows, Java can see the line, although this obviously breaks XML parsing and deserialisation. It also isn't practical as I want to make a tool that will work across many such files.
Is there some line length limit in Java that stops this working? Can I get around it with a third party library?
EDIT: here's the XML-related code:
FileInputStream fstream = new FileInputStream("path/to/xml/file.xml");
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document d = db.parse(fstream);
String s = XPathFactory.newInstance().newXPath().compile("//el1").evaluate(d);
For reading a large XML file, you can use a SAX parser.
In addition, the text passed to the characters() callback of the SAX handler should be accumulated in a StringBuffer rather than a String, since the parser may deliver the content of one element in several chunks.
You can check out the SAX parser here.
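A minimal sketch of that approach, using only the SAX classes that ship with the JDK. The class and method names are made up, and el1 is taken from the XPath in the question; a StringBuilder is used here in place of StringBuffer since no synchronisation is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ElementTextReader {

    // Returns the concatenated text of all elements with the given name.
    static String readElementText(InputSource source, final String elementName) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) inside = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(elementName)) inside = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // may be called several times for one element, so append each chunk
                if (inside) text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><el1>QUJD</el1></root>";
        System.out.println(readElementText(new InputSource(new StringReader(xml)), "el1"));
    }
}
```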
I wondered if it might be possible to do some pre-processing to the XML as you read it in.
I've been having a play to see if I could break down the long element into a list of sub-elements. Then this could be parsed and the sub-elements could be built back into a string. My testing threw up the fact that my initial guess of 4500 characters per sub element was still a bit high for my XML parsing to cope with, so I just arbitrarily picked 1000 and it seems to cope with that.
Anyway, this might help, it might not, but here's what I came up with:
private static final String ELEMENT_TO_BREAK_UP_OPEN = "<element>";
private static final String ELEMENT_TO_BREAK_UP_CLOSE = "</element>";
private static final String SUB_ELEMENT_OPEN = "<subelement>";
private static final String SUB_ELEMENT_CLOSE = "</subelement>";
private static final int SUB_ELEMENT_SIZE_LIMIT = 1000;
public static void main(final String[] args) {
try {
/* The XML currently looks like this:
*
* <root>
* <element> ... Super long input with 30000+ characters ... </element>
* </root>
*
*/
final File file = new File("src\\main\\java\\longxml\\test.xml");
final BufferedReader reader = new BufferedReader(new FileReader(file));
final StringBuffer buffer = new StringBuffer();
String line = reader.readLine();
while( line != null ) {
if( line.contains(ELEMENT_TO_BREAK_UP_OPEN) ) {
buffer.append(ELEMENT_TO_BREAK_UP_OPEN);
String substring = line.substring(ELEMENT_TO_BREAK_UP_OPEN.length(), (line.length() - ELEMENT_TO_BREAK_UP_CLOSE.length()) );
while( substring.length() > SUB_ELEMENT_SIZE_LIMIT ) {
buffer.append(SUB_ELEMENT_OPEN);
buffer.append( substring.substring(0, SUB_ELEMENT_SIZE_LIMIT) );
buffer.append(SUB_ELEMENT_CLOSE);
substring = substring.substring(SUB_ELEMENT_SIZE_LIMIT);
}
if( substring.length() > 0 ) {
buffer.append(SUB_ELEMENT_OPEN);
buffer.append(substring);
buffer.append(SUB_ELEMENT_CLOSE);
}
buffer.append(ELEMENT_TO_BREAK_UP_CLOSE);
}
else {
buffer.append(line);
}
line = reader.readLine();
}
reader.close();
/* The XML now looks something like this:
*
* <root>
* <element>
* <subElement> ... First Part of Data ... </subElement>
* <subElement> ... Second Part of Data ... </subElement>
* ... Multiple Other SubElements of Data ..
* <subElement> ... Final Part of Data ... </subElement>
* </element>
* </root>
*/
//This parses the xml with the new subElements in
final InputSource src = new InputSource(new StringReader(buffer.toString()));
final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getFirstChild();
//This gives us the first child (element) then that's children (subelements)
final NodeList childNodes = document.getFirstChild().getChildNodes();
//Then concatenate them back into a big string.
final StringBuilder finalElementValue = new StringBuilder();
for( int i = 0; i < childNodes.getLength(); i++ ) {
final Node node = childNodes.item(i);
finalElementValue.append( node.getFirstChild().getNodeValue() );
}
//At this point do whatever you need to do. Decode, Deserialize, etc...
System.out.println(finalElementValue.toString());
}
catch (final Exception e) {
e.printStackTrace();
}
}
There are a few issues with this in terms of its general application:
It does rely on the element you want to break up being uniquely identifiable. (But I'm guessing the logic to find the element can be improved quite a bit)
It relies on knowing the format of the XML and hoping that doesn't change. (Only in the latter parsing section, you could potentially parse it better with xPath once it has been broken into subelements)
Having said all of that, you do end up with a parsable XML string, which you can build your encoded string from, so this might help you on your way to a solution.
I have a resource file (.properties file) that contains Thai characters.
When I read that file using the code below, it shows junk characters like "?"
package RD1.Common;
import java.util.Enumeration;
import java.util.Locale;
import java.util.ResourceBundle;
public class LabelManagerRD {
public static String[] getLabel(String ParamString1)
{
String NextEle = "";
String str2 = ParamString1;
int i = 1;
String Final[] = new String[1000];
ResourceBundle bundle =
ResourceBundle.getBundle("rd", Locale.US);
Enumeration<String> enumeration = bundle.getKeys();
while (enumeration.hasMoreElements())
{
NextEle = enumeration.nextElement();
if (NextEle.toLowerCase().contains(str2.toLowerCase()))
{
Final[i] = NextEle+"="+bundle.getString(NextEle);
i++;
}
}
return Final;
}
public static void main(String[] args)
{
try
{
String TestValue[] = getLabel("RD.RDRAPCEX");
for(int i=1;i<TestValue.length;i++)
{
if (TestValue[i] != null && !(TestValue[i].length()==0))
{
System.out.println(i+" - "+TestValue[i]);
}
}
}
catch (Exception e)
{
}
}
}
And properties file (rd_en_US.properties) is like below
BL_BLNG_GROUP.BL_BLNG_GRP.BLNG_GRP_ID.IP=รสสรืเ เพนีย รก~^PAGE_1~^Y~^N
BL_BLNG_GROUP.BL_BLNG_GRP.LONG_DESC.IP=Long Desc~^PAGE_1~^Y~^N
BL_BLNG_GROUP.BL_BLNG_GRP.SHORT_DESC.IP=Short Desc~^PAGE_1~^Y~^N
BL_BLNG_GROUP.BL_BLNG_GRP.DETAIL_DESC.IP=Explanatory Note~^PAGE_1~^Y~^N
Please suggest how to proceed with this.
Thanks in advance,
Sandy
If the encoding of your file is correct, note that System.out will not be able to print UTF-8 characters with the default console settings. Make sure the console you use to display the output is also set to UTF-8.
In Eclipse for example, you need to go to Run Configuration > Common to do this.
Property files are typically interpreted in ISO 8859-1 encoding. If you need characters not included in this set, use Unicode escapes like \uxxxx. There are also tools available to convert property files from a different encoding to this one (see native2ascii).
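If converting the file is not an option, another possibility is to read it through a Reader with an explicit charset. This is only a sketch, assuming the file really is saved as UTF-8; the class name and the label key are invented for the example. PropertyResourceBundle has had a Reader constructor since Java 6.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class Utf8Bundle {

    // Reads a .properties stream as UTF-8 instead of ISO 8859-1.
    static ResourceBundle load(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // hypothetical entry with a Thai value, written with escapes to keep this source ASCII
        byte[] file = "label=\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35\n".getBytes(StandardCharsets.UTF_8);
        ResourceBundle bundle = load(new ByteArrayInputStream(file));
        System.out.println(bundle.getString("label").length());
    }
}
```

Whether the characters then show up correctly on the console still depends on the console encoding mentioned above.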
I have a config file, named config.txt, that looks like this.
IP=192.168.1.145
PORT=10022
URL=http://www.stackoverflow.com
I want to change some value in the config file from Java, say the port to 10045. How can I achieve this easily?
IP=192.168.1.145
PORT=10045
URL=http://www.stackoverflow.com
In my attempt, I had to write lots of code to read every line, find the PORT, delete the original 10022, and then write 10045 instead. My code is clumsy and hard to read. Is there a more convenient way in Java?
Thanks a lot !
If you want something short you can use this.
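public static void changeProperty(String filename, String key, String value) throws IOException {
    Properties prop = new Properties();
    // close the input before the same file is opened for writing
    try (FileInputStream in = new FileInputStream(filename)) {
        prop.load(in);
    }
    prop.setProperty(key, value);
    try (FileOutputStream out = new FileOutputStream(filename)) {
        prop.store(out, null);
    }
}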
Unfortunately it doesn't preserve the order of fields or any comments.
If you want to preserve order, reading a line at a time isn't so bad.
This untested code would keep comments, blank lines and order. It won't handle multi-line values.
public static void changeProperty(String filename, String key, String value) throws IOException {
final File tmpFile = new File(filename + ".tmp");
final File file = new File(filename);
PrintWriter pw = new PrintWriter(tmpFile);
BufferedReader br = new BufferedReader(new FileReader(file));
boolean found = false;
final String toAdd = key + '=' + value;
for (String line; (line = br.readLine()) != null; ) {
if (line.startsWith(key + '=')) {
line = toAdd;
found = true;
}
pw.println(line);
}
if (!found)
pw.println(toAdd);
br.close();
pw.close();
tmpFile.renameTo(file);
}
My suggestion would be to read the entire config file into memory (maybe into a list of (attribute:value) pair objects), do whatever processing you need to do (and consequently make any changes), then overwrite the original file with all the changes you have made.
For example, you could read the config file you have provided by line, use String.split("=") to separate the attribute:value pairs - making sure to name each pair read accordingly. Then make whatever changes you need, iterate over the pairs you have read in (and possibly modified), writing them back out to the file.
Of course, this approach would work best if you had a relatively small number of lines in your config file, that you can definitely know the format for.
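A sketch of that read, modify, write-back cycle using java.nio.file (Java 7+). The class and method names are assumptions for the example, and the file is assumed to be UTF-8 with one key=value pair per line. Lines that do not match the key, including comments and blank lines, are written back unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ConfigEditor {

    // Returns a copy of the lines with the given key set to the new value.
    static List<String> withValue(List<String> lines, String key, String value) {
        List<String> result = new ArrayList<>();
        boolean found = false;
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq >= 0 && line.substring(0, eq).trim().equals(key)) {
                result.add(key + "=" + value);
                found = true;
            } else {
                result.add(line); // keep everything else as it was
            }
        }
        if (!found) {
            result.add(key + "=" + value); // key was missing: append it
        }
        return result;
    }

    // Reads the whole file into memory, applies the change and overwrites the file.
    static void changeProperty(Path file, String key, String value) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        Files.write(file, withValue(lines, key, value), StandardCharsets.UTF_8);
    }
}
```

With the config.txt from the question, changeProperty(Paths.get("config.txt"), "PORT", "10045") would rewrite only the PORT line.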
This code works for me.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Properties;
public void setProperties(String key, String value) throws IOException {
    Properties prop = new Properties();
    // try-with-resources closes the file before it is reopened for writing
    try (FileInputStream ip = new FileInputStream("config.txt")) {
        prop.load(ip);
    } catch (FileNotFoundException e) {
        // no config file yet: continue with an empty set of properties
        e.printStackTrace();
    }
    prop.setProperty(key, value);
    try (PrintWriter pw = new PrintWriter("config.txt")) {
        prop.store(pw, null);
    }
}
Use the Properties class to load and save the configuration. Then simply set the value and store it again.
Properties p = new Properties();
p.load(...);
p.setProperty("key", "value");
p.store(...);
It's easy and straightforward.
As an aside, if your application is a single application that does not need to scale to run on multiple computers, do not bother to use a database to save config. It is utter overkill. However, if your application needs real-time config changes and needs to scale, Redis works pretty well to distribute config and handle the synchronization for you. I have used it for this purpose with great success.
Consider using java.util.Properties and its load() and store() methods.
But remember that this would not preserve comments and extra line breaks in the file.
Also certain chars need to be escaped.
If you are open to use third party libraries, explore http://commons.apache.org/configuration/. It supports configurations in multiple format. Comments will be preserved as well. (Except for a minor bug -- apache-commons-config PropertiesConfiguration: comments after last property is lost)
I use the following code to save Chinese characters into a .txt file, but when I opened it with Wordpad, I couldn't read it.
StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;
FileOutputStream fos;
fos = new FileOutputStream(FileName, Append);
for (int i = 0;i < Shanghai_StrBuf.length(); i++) {
fos.write(Shanghai_StrBuf.charAt(i));
}
fos.close();
What can I do ? I know if I cut and paste Chinese characters into Wordpad, I can save it into a .txt file. How do I do that in Java ?
There are several factors at work here:
Text files have no intrinsic metadata for describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular)
The default encoding for Windows is still an 8bit (or doublebyte) "ANSI" character set with a limited range of values - text files written in this format are not portable
To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark at the start of the file (not strictly true - Raymond Chen explains). In theory, the BOM is there to tell you the endianness (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to automatically figure out that it is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, it would be an error to write a BOM to a script file, for example, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want it on Windows, either, like when you're appending data to an existing file: fos = new FileOutputStream(FileName,Append);
Here is a method of reliably appending UTF-8 data to a file:
private static void writeUtf8ToFile(File file, boolean append, String data)
throws IOException {
boolean skipBOM = append && file.isFile() && (file.length() > 0);
Closer res = new Closer();
try {
OutputStream out = res.using(new FileOutputStream(file, append));
Writer writer = res.using(new OutputStreamWriter(out, Charset
.forName("UTF-8")));
if (!skipBOM) {
writer.write('\uFEFF');
}
writer.write(data);
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
String chinese = "\u4E0A\u6D77";
boolean append = true;
writeUtf8ToFile(new File("chinese.txt"), append, chinese);
}
Note: if the file already existed and you chose to append and existing data wasn't UTF-8 encoded, the only thing that code will create is a mess.
Here is the Closer type used in this code:
public class Closer implements Closeable {
private Closeable closeable;
public <T extends Closeable> T using(T t) {
closeable = t;
return t;
}
@Override public void close() throws IOException {
if (closeable != null) {
closeable.close();
}
}
}
This code makes a Windows-style best guess about how to read the file based on byte order marks:
private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };
private static Charset getEncoding(InputStream in) throws IOException {
charsetLoop: for (Charset encodings : UTF_ENCODINGS) {
byte[] bom = "\uFEFF".getBytes(encodings);
in.mark(bom.length);
for (byte b : bom) {
if ((0xFF & b) != in.read()) {
in.reset();
continue charsetLoop;
}
}
return encodings;
}
return Charset.defaultCharset();
}
private static String readText(File file) throws IOException {
Closer res = new Closer();
try {
InputStream in = res.using(new FileInputStream(file));
InputStream bin = res.using(new BufferedInputStream(in));
Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
StringBuilder out = new StringBuilder();
for (int ch = reader.read(); ch != -1; ch = reader.read())
out.append((char) ch);
return out.toString();
} finally {
res.close();
}
}
Usage:
public static void main(String[] args) throws IOException {
System.out.println(readText(new File("chinese.txt")));
}
(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
If you can rely on the default character encoding being UTF-8 (or some other Unicode encoding), you may use the following:
Writer w = new FileWriter("test.txt");
w.append("上海");
w.close();
The safest way is to always explicitly specify the encoding:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
w.append("上海");
w.close();
P.S. You may use any Unicode characters in Java source code, even as method and variable names, if the -encoding parameter for javac is configured right. That makes the source code more readable than the escaped \uXXXX form.
Be very careful with the approaches proposed. Even specifying the encoding for the file as follows:
Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
will not work if you're running under an operating system like Windows. Even setting the system property for file.encoding to UTF-8 does not fix the issue. This is because Java fails to write a byte order mark (BOM) for the file. Even if you specify the encoding when writing out to a file, opening the same file in an application like Wordpad will display the text as garbage because it doesn't detect the BOM. I tried running the examples here in Windows (with a platform/container encoding of CP1252).
The following bug exists to describe the issue in Java:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
The solution for the time being is to write the byte order mark yourself to ensure the file opens correctly in other applications. See this for more details on the BOM:
http://mindprod.com/jgloss/bom.html
and for a more correct solution see the following link:
http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html
Here's one way among many. Basically, we're just specifying that the conversion be done to UTF-8 before outputting bytes to the FileOutputStream:
String FileName = "output.txt";
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer writer = new OutputStreamWriter(new FileOutputStream(FileName,Append), "UTF-8");
writer.write(Shanghai_StrBuf.toString(), 0, Shanghai_StrBuf.length());
writer.close();
I manually verified this against the images at http://www.fileformat.info/info/unicode/char/ . In the future, please follow Java coding standards, including lower-case variable names. It improves readability.
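For what it's worth, the same idea can be sketched with java.nio.file (Java 7+), where try-with-resources closes the writer even if writing fails. The class and method names are made up. This writes no byte order mark, so whether Wordpad picks the right encoding is a separate matter from the encoding itself.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class Utf8Append {

    // Appends text to the file as UTF-8, creating the file if it does not exist.
    static void appendUtf8(Path file, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        appendUtf8(Paths.get("output.txt"), "\u4E0A\u6D77");
    }
}
```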
Try this,
StringBuffer Shanghai_StrBuf=new StringBuffer("\u4E0A\u6D77");
boolean Append=true;
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(FileName,Append), "UTF8"));
for (int i=0;i<Shanghai_StrBuf.length();i++) out.write(Shanghai_StrBuf.charAt(i));
out.close();