I am currently using FreeMarker to generate a number of configuration files. So far these have been either XML files or proprietary-format text files. I now would like to generate some Java .properties files, but have hit a couple of issues.
The first is character encoding. As far as I can see simply adding
<#ftl encoding="8859_1">
to the start of the file should sort this out.
The second issue is the escaping of the keys and values. The keys are probably OK, since I'll be hard-coding them in the template anyway and can escape them there. The values will be coming from my data model and so will need escaping.
I can see how I could create my own user-defined directive and, by installing it as a shared variable, use it in my template.
Is this the best, or only, way to do this? I would have thought generating .properties files is something that has been tackled many times before, and I was hoping something might already exist before I start writing my own code.
The class java.util.Properties has various store methods for saving properties to OutputStreams or Writers. That seems preferable to trying to adapt FreeMarker.
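For instance, a minimal sketch (the file name and keys are invented) of letting Properties do the escaping and the ISO-8859-1 \uXXXX encoding for you:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Minimal sketch: java.util.Properties escapes keys/values itself and writes
// ISO-8859-1 with \uXXXX escapes for everything outside it, so no template is needed.
public class StoreDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("db.url", "jdbc:mysql://localhost/test?useUnicode=true");
        props.setProperty("greeting", "grüß dich");  // will be \uXXXX-escaped on output

        OutputStream out = new FileOutputStream("generated.properties");
        try {
            props.store(out, "generated configuration");
        } finally {
            out.close();
        }
    }
}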
I don't see what charset issues would be specific to generating properties files. But note that the charset of the template and the charset of the output are independent, so you might as well use the same charset for these templates as for the others (such as UTF-8).
As for escaping, always use auto-escaping if you can. In 2.3.24 that will be especially sleek, but unless you are allowed to use unreleased versions, you'll have to wait for that until the end of February or so. (If you can use unreleased/unofficial versions, you can find out about the internal testing releases in the developer list archive.) Before 2.3.24, there's <#escape x as propEsc(x)>all the template content here</#escape>, where propEsc is a TemplateMethodModelEx (not a TemplateDirectiveModel) that you have added as a shared variable or such. That way all ${...}-s will be magically escaped.
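A rough sketch of such a propEsc method is below; only TemplateMethodModelEx itself is FreeMarker API, while the class name and the exact escaping rules are my own assumptions about what the .properties format needs:

import java.util.List;

import freemarker.template.TemplateMethodModelEx;
import freemarker.template.TemplateModelException;
import freemarker.template.TemplateScalarModel;

// Hypothetical .properties value escaper; the rules below follow the
// java.util.Properties text format (backslash, line breaks, '=', ':', non-ASCII).
public class PropertiesEscaperMethod implements TemplateMethodModelEx {

    public Object exec(List arguments) throws TemplateModelException {
        if (arguments.size() != 1) {
            throw new TemplateModelException("propEsc expects exactly one argument");
        }
        Object arg = arguments.get(0);
        String s = (arg instanceof TemplateScalarModel)
                ? ((TemplateScalarModel) arg).getAsString()
                : String.valueOf(arg);

        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '=':  sb.append("\\=");  break;
                case ':':  sb.append("\\:");  break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}

It would then be registered on your FreeMarker Configuration with something like cfg.setSharedVariable("propEsc", new PropertiesEscaperMethod()), so that the <#escape x as propEsc(x)> block can find it.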
I'm designing a new Java SE application that I want to internationalize.
I've never built such an application before, and I'm looking for best practices for internationalization. The application will write the translated data to files or a database. I've searched for best practices but didn't find anything about my main question (the first one below).
Should I put all the internationalization data in its own layer, or next to the objects it relates to?
Could I use the properties files directly, as a kind of enum, to drive a switch case?
Or can I reverse-engineer the data I receive, recover the default internationalized value, and work with that?
I did encounter several strategies. I would start with a properties file.
One factor is that the data must be professionally maintained:
keep it in version control.
keep a version number for us humans, "1.0.23"
keep the texts ordered and nice, to help translation.
keep a second properties file with a glossary for consistent translation.
Furthermore, I have seen properties files or Java ListResourceBundles generated from DocBook XML, Excel, and translation memories. And yes, from a database.
Maintenance of the data must be done carefully, as several different parties will use the texts at different times.
Programming tools, consistency checks, data preparation and communication are tasks not to neglect.
Properties files are not entirely ideal, but IDEs generally have some support for them.
Set everything up for UTF-8, but note that properties files use ISO-8859-1; you can use \uXXXX escaping or do an encoding conversion in your build process. Generated ListResourceBundle Java sources would be an alternative.
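For the ListResourceBundle alternative, a minimal hand-written sketch (bundle name and keys invented; in practice these classes could be generated by the build):

import java.util.ListResourceBundle;

// Hypothetical bundle: Messages_de holds the German texts, while a Messages
// class (not shown) holds the defaults. Java sources can be UTF-8, so no
// \uXXXX escaping is needed here.
public class Messages_de extends ListResourceBundle {
    protected Object[][] getContents() {
        return new Object[][] {
            { "app.title",    "Beispielanwendung" },
            { "button.close", "Schließen" },
        };
    }
}

It is looked up the same way as a properties-backed bundle, e.g. ResourceBundle.getBundle("Messages", Locale.GERMAN).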
I need to parse the configuration defined in a Vagrantfile, written in Ruby, and use the settings elsewhere in my Java code. I tried exploring JRubyParser but didn't come across any documentation describing its use.
I cloned the Vagrant repo locally, but browsing through the code does not help either, as I have no prior experience with Ruby. How does Vagrant read the configuration defined in the file? Any input?
A Vagrantfile is a regular Ruby script, i.e. it's meant to be interpreted by a Ruby interpreter rather than read as a configuration file.
To make things harder, some configuration options aren't declared as top-level variables in the Vagrantfile, but rather as properties of objects inside function calls (like "config.vm.provider").
Depending on how complex your configuration is, I would consider just reading the file line by line and doing regular-expression matching to pull out the variables you need. Not the most elegant solution, but probably much quicker to implement than the alternatives.
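A minimal sketch of that line-by-line approach (the regex and the file path are assumptions about a simple Vagrantfile, e.g. one that sets config.vm.box):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull simple settings out of a Vagrantfile by regex, line by line.
public class VagrantfileScraper {
    // Matches lines like: config.vm.box = "ubuntu/trusty64"
    private static final Pattern BOX =
            Pattern.compile("config\\.vm\\.box\\s*=\\s*\"([^\"]+)\"");

    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Paths.get("Vagrantfile"), StandardCharsets.UTF_8)) {
            Matcher m = BOX.matcher(line);
            if (m.find()) {
                System.out.println("box = " + m.group(1));
            }
        }
    }
}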
Also, if your provider is always the same, say VirtualBox, maybe you could get some of your configuration from there. In that case, you would just need to read a file located somewhere under the "VirtualBox VMs" directory (on a Mac, it's "$HOME/VirtualBox VMs"). It's an XML file, so you could use one of the Java XML parsers to get what you need.
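If you go that route, a plain DOM parse is enough. In the rough sketch below, the .vbox path and the Machine element/name attribute are assumptions about the VirtualBox settings format, so inspect an actual file first:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Rough sketch: read a VirtualBox machine settings file and print an attribute.
// Element and attribute names are assumptions, not a documented contract.
public class VBoxSettingsReader {
    public static void main(String[] args) throws Exception {
        File vbox = new File(System.getProperty("user.home"),
                "VirtualBox VMs/myvm/myvm.vbox");   // hypothetical path

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(vbox);

        NodeList machines = doc.getElementsByTagName("Machine");
        for (int i = 0; i < machines.getLength(); i++) {
            Element machine = (Element) machines.item(i);
            System.out.println("name = " + machine.getAttribute("name"));
        }
    }
}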
I am designing a simple library that deals with properties files.
I noticed that since JRE 1.5 the class Properties defines methods like:
public synchronized void loadFromXML(InputStream in)
public void storeToXML(OutputStream os, String comment)
I question whether this is a real enhancement to the API of this class. Properties files have always been plain text files, and the newly introduced XML format does not add anything to the functionality, other than the possibility of using a different format, which is:
more verbose
more complex (to understand, to change, to parse)
more inefficient (it uses DOM internally to parse into a hashtable: it consumes more memory, it requires helper classes in the implementation, and it is most likely slower as well)
more fragile (xml requires escaping of characters <>&"' while properties only need to escape backslashes, since it also supports Java backslash escaping)
it breaks backward compatibility of the programs using it, since users running JDK 1.4 won't be able to read xml properties. (ok, who cares...)
So I fail to understand why the engineers at Sun added this feature.
The question is:
Does anybody find any advantage in using an XML-based properties file over a traditional text-based one?
I need to evaluate this, since I don't want to add a useless feature to the simple library I mentioned above.
Have you ever used an XML-based properties file instead of a plain Java properties file? If so, why?
Note: the same question could be asked about the Log4J XML file format, but at least the Log4J XML format adds nesting and a syntax that carries some meaning, and I understand that. With this XML format for properties, I don't.
If staying within the Java environment, using a Java properties file works great. Even if you expect other programming languages to interact with your library, you'll probably be ok with a 'regular' properties file. However, for hierarchical data, XML is the standard. The reason you may want to support this change, and possibly the reason why Sun included it, is that other programming languages have extensive libraries for parsing XML files for hierarchical data.
The reason I'm answering is because I have actually used this feature before! But not for a great reason. In one program I'm working on now, I've found it easiest to keep a set of data in a properties object and I output the object to XML so that it can later be read by Python. At the moment, the data is further manipulated in a Python script and more children are added to the XML file. Without being able to output easily to XML, this would be a little more painful.
If I had the time, I wouldn't bother outputting to XML though. The main reason I'm using the Python code that takes in the XML is because somebody else wrote it and I'm temporarily using it until I have the time to reevaluate that section of my program and re-code it.
So there's a reason for using the XML! It isn't a good one, but it's a reason.
I imagine there are other cases like this where having the properties output as XML aids compatibility with other languages, since most languages have a robust XML parsing library and it makes it easier to manipulate hierarchical data. And in scientific programming, it seems you rarely get the luxury of sticking to one language.
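For reference, a minimal round-trip sketch from the Java side (the file name, key and comment are made up); the XML form is written as UTF-8 by default, which is part of why it travels well to other languages:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

// Sketch: write a Properties object as XML (UTF-8 by default) and read it back.
public class XmlPropertiesDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("greeting", "héllo");

        FileOutputStream out = new FileOutputStream("settings.xml");
        try {
            props.storeToXML(out, "example comment");
        } finally {
            out.close();
        }

        Properties loaded = new Properties();
        FileInputStream in = new FileInputStream("settings.xml");
        try {
            loaded.loadFromXML(in);
        } finally {
            in.close();
        }
        System.out.println(loaded.getProperty("greeting"));
    }
}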
Some points:
You can use standard, cross-platform tools to create it
You don't need to worry about peculiarities of escaping and character encoding, as you can use standard tools, which actually makes it more robust. The old properties file format is poorly specified.
Standard, cross-platform tools can use the data.
For most applications Java is used in, a bit of start up time isn't going to make much difference (particularly given the start up time of the rest of the system).
Java SE 1.6 is about to reach its end of life. Pre-1.5 isn't particularly relevant for Java SE (or EE).
But no, I've never seen it actually used.
AFAIK the XML format is encouraged because of the encoding: (per the specs) strictly ASCII for plain files (may I suggest http://mojo.codehaus.org/native2ascii-maven-plugin/), and UTF-8 (by default) for XML properties files, as stated in http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Properties.html
edit: I beg your pardon: ISO-8859-1 for plain properties files
I am writing a program using JSP and Java. How can I use properties files to support multiple languages?
And by the way, there are always things like \u4345 in them.
What are these? Where do they come from?
For the multiple languages, check out the ResourceBundle class.
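A minimal sketch, assuming a bundle named messages (messages.properties, messages_zh.properties, ...) on the classpath:

import java.util.Locale;
import java.util.ResourceBundle;

// Sketch: look up a translated string; the bundle name and key are assumptions.
public class Greeter {
    public static void main(String[] args) {
        ResourceBundle bundle = ResourceBundle.getBundle("messages", Locale.SIMPLIFIED_CHINESE);
        System.out.println(bundle.getString("greeting"));
    }
}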
About the \u4345: this is one of the dark and very annoying legacy corners of Java. Properties files need to be in ASCII, so all non-ASCII characters need to be encoded as \uxxxx (their Unicode value). You can convert a file to this encoding with the native2ascii command-line tool.
If you are using an IDE or a build tool, there should be an option to invoke this automatically.
If the properties file is something you have full control over yourself, then starting from Java 6 you can also use UTF-8 (or any other character set) directly in the file, and specify that encoding when you load it:
// new in Java 6: Properties.load(Reader)
props.load(new InputStreamReader(new FileInputStream(file), "UTF-8"));
Again, this only works if you load the Properties yourself, not if someone else does it, such as a ResourceBundle (used for internationalization).
There is an entire tutorial at http://java.sun.com/docs/books/tutorial/i18n/index.html
It specifies and explains everything you need to know.
The Java tutorial on i18n has been mentioned already by Peter. If you are building JSPs you probably want to look at the JSTL which basically allows you to use the functionality of ResourceBundle through JSP tags.
(Disclaimer: I looked at a number of posts on here before asking, I found this one particularly helpful, I was just looking for a bit of a sanity check from you folks if possible)
Hi All,
I have an internal Java product that I have built for processing data files for loading into a database (i.e. an ETL tool). I have pre-rolled stages for XSLT transformation and for doing things like pattern replacement within the original file. The input files can be of any format: they may be flat data files or XML data files, and you configure the stages you require for the particular data feed being loaded.
I have until now ignored the issue of file encoding (a mistake, I know), because everything was working fine, in the main. However, I am now running into file encoding issues. To cut a long story short: because of the way stages can be configured together, I need to detect the encoding of the input file and create a Java Reader object with the appropriate arguments. I just wanted to do a quick sanity check with you folks before I dive into something I can't claim to fully comprehend:
Adopt a standard file encoding of UTF-16 (I'm not ruling out loading double-byte characters in the future) for all files that are output from every stage within my toolkit
Use JUniversalChardet or jchardet to sniff the input file encoding
Use the Apache Commons IO library to create a standard reader and writer for all stages (am I right in thinking this doesn't have a similar encoding-sniffing API?)
Do you see any pitfalls/have any extra wisdom to offer in my outlined approach?
Is there any way I can be confident of backwards compatibility with any data loaded using my existing approach of letting the Java runtime decide the encoding (which ends up being windows-1252)?
Thanks in advance,
-James
With flat character data files, any encoding detection will need to rely on statistics and heuristics (like the presence of a BOM, or character/pattern frequency) because there are byte sequences that will be legal in more than one encoding, but map to different characters.
XML encoding detection should be more straightforward, but it is certainly possible to create ambiguously encoded XML (e.g. by leaving out the encoding in the header).
It may make more sense to use encoding detection APIs to indicate the probability of error to the user rather than rely on them as decision makers.
When you transform data from bytes to chars in Java, you are transcoding from encoding X to UTF-16(BE). What gets sent to your database depends on your database, its JDBC driver and how you've configured the column. That probably involves transcoding from UTF-16 to something else. Assuming you're not altering the database, existing character data should be safe; you might run into issues if you intend parsing BLOBs. If you've already parsed files written in disparate encodings, but treated them as another encoding, the corruption has already taken place - there are no silver bullets to fix that. If you need to alter the character set of a database from "ANSI" to Unicode, that might get painful.
Adoption of Unicode wherever possible is a good idea. It may not be possible, but prefer file formats where you can make encoding unambiguous - things like XML (which makes it easy) or JSON (which mandates UTF-8).
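To make that byte-to-char step concrete, here is a minimal sketch of decoding with a known (or detected) source charset and re-encoding the stage output, never relying on the platform default; the charset names are just examples:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Example: decode the input with whatever charset was detected/configured,
// and encode the stage output as UTF-16, with no reliance on platform defaults.
public class Transcoder {
    public static void transcode(String inPath, String inCharset, String outPath)
            throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(inPath), Charset.forName(inCharset)));
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(outPath), Charset.forName("UTF-16")));
        try {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } finally {
            reader.close();
            writer.close();
        }
    }
}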
Option 1 strikes me as breaking backwards compatibility (certainly in the long run), although it is the "right way" to go (the right way generally does break backwards compatibility); it is perhaps worth some additional thought about whether UTF-8 would be a good choice.
Sniffing the encoding strikes me as reasonable if you have a limited, known set of encodings that you have tested, so that you know your sniffer correctly distinguishes and identifies them.
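For what it's worth, a minimal sketch of such a sniffer using juniversalchardet (assuming its classic UniversalDetector API), with an explicit fallback for when detection is inconclusive:

import java.io.FileInputStream;
import java.io.IOException;
import org.mozilla.universalchardet.UniversalDetector;

// Sketch: sniff a file's charset; the detector returns null when it is not
// confident, so fall back to a default (windows-1252 here is an assumption).
public class EncodingSniffer {
    public static String detect(String path) throws IOException {
        UniversalDetector detector = new UniversalDetector(null);
        byte[] buf = new byte[4096];
        FileInputStream in = new FileInputStream(path);
        try {
            int read;
            while ((read = in.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, read);
            }
        } finally {
            in.close();
        }
        detector.dataEnd();
        String charset = detector.getDetectedCharset();   // null if undetermined
        detector.reset();
        return charset != null ? charset : "windows-1252";
    }
}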
Another option here is to use some form of metadata (a file-naming convention, if nothing more robust is an option) that lets your code know that the data was provided as UTF-16 and behave accordingly; otherwise, convert it to UTF-16 before moving forward.