CSV parsing in Java - working example..? [closed] - java

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I want to write a program for a school java project to parse some CSV I do not know. I do know the datatype of each column - although I do not know the delimiter.
The problem I do not even marginally know how to fix is to parse Date or even DateTime Columns. They can be in one of many formats.
I found many libraries but have no clue which is the best for my needs:
http://opencsv.sourceforge.net/
http://www.csvreader.com/java_csv.php
http://supercsv.sourceforge.net/
http://flatpack.sourceforge.net/
The problem is I am a total java beginner. I am afraid non of those libraries can do what I need or I can't convince them to do it.
I bet there are a lot of people here who have code sample that could get me started in no time for what I need:
automatically split in Columns (delimiter unknown, Columntypes are known)
cast to Columntype (should cope with $, %, etc.)
convert dates to Java Date or Calendar Objects
It would be nice to get as many code samples as possible by email.
Thanks a lot!
AS

You also have the Apache Commons CSV library, maybe it does what you need. See the guide. Updated to Release 1.1 in 2014-11.
Also, for the foolproof edition, I think you'll need to code it yourself...through SimpleDateFormat you can choose your formats, and specify various types, if the Date isn't like any of your pre-thought types, it isn't a Date.

There is a serious problem with using
String[] strArr=line.split(",");
in order to parse CSV files, and that is because there can be commas within the data values, and in that case you must quote them, and ignore commas between quotes.
There is a very very simple way to parse this:
/**
* returns a row of values as a list
* returns null if you are past the end of the input stream
*/
public static List<String> parseLine(Reader r) throws Exception {
int ch = r.read();
while (ch == '\r') {
//ignore linefeed chars wherever, particularly just before end of file
ch = r.read();
}
if (ch<0) {
return null;
}
Vector<String> store = new Vector<String>();
StringBuffer curVal = new StringBuffer();
boolean inquotes = false;
boolean started = false;
while (ch>=0) {
if (inquotes) {
started=true;
if (ch == '\"') {
inquotes = false;
}
else {
curVal.append((char)ch);
}
}
else {
if (ch == '\"') {
inquotes = true;
if (started) {
// if this is the second quote in a value, add a quote
// this is for the double quote in the middle of a value
curVal.append('\"');
}
}
else if (ch == ',') {
store.add(curVal.toString());
curVal = new StringBuffer();
started = false;
}
else if (ch == '\r') {
//ignore LF characters
}
else if (ch == '\n') {
//end of a line, break out
break;
}
else {
curVal.append((char)ch);
}
}
ch = r.read();
}
store.add(curVal.toString());
return store;
}
There are many advantages to this approach. Note that each character is touched EXACTLY once. There is no reading ahead, pushing back in the buffer, etc. No searching ahead to the end of the line, and then copying the line before parsing. This parser works purely from the stream, and creates each string value once. It works on header lines, and data lines, you just deal with the returned list appropriate to that. You give it a reader, so the underlying stream has been converted to characters using any encoding you choose. The stream can come from any source: a file, a HTTP post, an HTTP get, and you parse the stream directly. This is a static method, so there is no object to create and configure, and when this returns, there is no memory being held.
You can find a full discussion of this code, and why this approach is preferred in my blog post on the subject: The Only Class You Need for CSV Files.

My approach would not be to start by writing your own API. Life's too short, and there are more pressing problems to solve. In this situation, I typically:
Find a library that appears to do what I want. If one doesn't exist, then implement it.
If a library does exist, but I'm not sure it'll be suitable for my needs, write a thin adapter API around it, so I can control how it's called. The adapter API expresses the API I need, and it maps those calls to the underlying API.
If the library doesn't turn out to be suitable, I can swap another one in underneath the adapter API (whether it's another open source one or something I write myself) with a minimum of effort, without affecting the callers.
Start with something someone has already written. Odds are, it'll do what you want. You can always write your own later, if necessary. OpenCSV is as good a starting point as any.

i had to use a csv parser about 5 years ago. seems there are at least two csv standards: http://en.wikipedia.org/wiki/Comma-separated_values and what microsoft does in excel.
i found this libaray which eats both: http://ostermiller.org/utils/CSV.html, but afaik, it has no way of inferring what data type the columns were.

You might want to have a look at this specification for CSV. Bear in mind that there is no official recognized specification.
If you do not now the delimiter it will not be possible to do this so you have to find out somehow. If you can do a manual inspection of the file you should quickly be able to see what it is and hard code it in your program. If the delimiter can vary your only hope is to be able to deduce if from the formatting of the known data. When Excel imports CSV files it lets the user choose the delimiter and this is a solution you could use as well.

I agree with #Brian Clapper. I have used SuperCSV as a parser though I've had mixed results. I enjoy the versatility of it, but there are some situations within my own csv files for which I have not been able to reconcile "yet". I have faith in this product and would recommend it overall--I'm just missing something simple, no doubt, that I'm doing in my own implementation.
SuperCSV can parse the columns into various formats, do edits on the columns, etc. It's worth taking a look-see. It has examples as well, and easy to follow.
The one/only limitation I'm having is catching an 'empty' column and parsing it into an Integer or maybe a blank, etc. I'm getting null-pointer errors, but javadocs suggest each cellProcessor checks for nulls first. So, I'm blaming myself first, for now. :-)
Anyway, take a look at SuperCSV. http://supercsv.sourceforge.net/

At a minimum you are going to need to know the column delimiter.

Basically you will need to read the file line by line.
Then you will need to split each line by the delimiter, say a comma (CSV stands for comma-separated values), with
String[] strArr=line.split(",");
This will turn it into an array of strings which you can then manipulate, for example with
String name=strArr[0];
int yearOfBirth = Integer.valueOf(strArr[1]);
int monthOfBirth = Integer.valueOf(strArr[2]);
int dayOfBirth = Integer.valueOf(strArr[3]);
GregorianCalendar dob=new GregorianCalendar(yearOfBirth, monthOfBirth, dayOfBirth);
Student student=new Student(name, dob); //lets pretend you are creating instances of Student
You will need to do this for every line so wrap this code into a while loop. (If you don't know the delimiter just open the file in a text editor.)

I would recommend that you start by pulling your task apart into it's component parts.
Read string data from a CSV
Convert string data to appropriate format
Once you do that, it should be fairly trivial to use one of the libraries you link to (which most certainly will handle task #1). Then iterate through the returned values, and cast/convert each String value to the value you want.
If the question is how to convert strings to different objects, it's going to depend on what format you are starting with, and what format you want to wind up with.
DateFormat.parse(), for example, will parse dates from strings. See SimpleDateFormat for quickly constructing a DateFormat for a certain string representation.
Integer.parseInt() will prase integers from strings.
Currency, you'll have to decide how you want to capture it. If you want to just capture as a float, then Float.parseFloat() will do the trick (just use String.replace() to remove all $ and commas before you parse it). Or you can parse into a BigDecimal (so you don't have rounding problems). There may be a better class for currency handling (I don't do much of that, so am not familiar with that area of the JDK).

Writing your own parser is fun, but likely you should have a look at
Open CSV. It provides numerous ways of accessing the CSV and also allows to generate CSV. And it does handle escapes properly. As mentioned in another post, there is also a CSV-parsing lib in the Apache Commons, but that one isn't released yet.

Related

How to modify update a large file with small content changes at specific indexes

I need to modify a file. We've already written a reasonably complex component to build sets of indexes describing where interesting things are in this file, but now I need to edit this file using that set of indexes and that's proving difficult.
Specifically, my dream API is something like this
//if you'll let me use kotlin for a second, assume we have a simple tuple class
data class IdentifiedCharacterSubsequence { val indexOfFirstChar : int, val existingContent : String }
//given these two structures
List<IdentifiedCharacterSubsequences> interestingSpotsInFile = scanFileAsPerExistingBusinessLogic(file, businessObjects);
Map<IdentifiedCharacterSubsequences, String> newContentByPreviousContentsLocation = generateNewValues(inbterestingSpotsInFile, moreBusinessObjects);
//I want something like this:
try(MutableFile mutableFile = new com.maybeGoogle.orApache.MutableFile(file)){
for(IdentifiedCharacterSubsequences seqToReplace : interestingSpotsInFile){
String newContent = newContentByPreviousContentsLocation.get(seqToReplace);
mutableFile.replace(seqToReplace.indexOfFirstChar, seqtoReplace.existingContent.length, newContent);
//very similar to StringBuilder interface
//'enqueues' data changes in memory, doesnt actually modify file until flush call...
}
mutableFile.flush();
// ...at which point a single write-pass is made.
// assumption: changes will change many small regions of text (instead of large portions of text)
// -> buffering makes sense
}
Some notes:
I cant use RandomAccessFile because my changes are not in-place (the length of newContent may be longer or shorter than that of seq.existingContent)
The files are often many megabytes big, thus simply reading the whole thing into memory and modifying it as an array is not appropriate.
Does something like this exist or am I reduced to writing my own implementation using BufferedWriters and the like? It seems like such an obvious evolution from io.Streams for a language which typically emphasizes indexed based behaviour heavily, but I cant find an existing implementation.
Lastly: I have very little domain experience with files and encoding schemes, so I have taken no effort to address the 'two-index' character described in questions like these: Java charAt used with characters that have two code units. Any help on this front is much appreciated. Is this perhaps why I'm having trouble finding an implementation like this? Because indexes in UTF-8 encoded files are so pesky and bug-prone?

Reading ahead with BufferedReader (Java)

I'm writing a parser for files that look like this:
LOCUS SCU49845 5028 bp DNA PLN 21-JUN-1999
DEFINITION Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
(AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION U49845
VERSION U49845.1 GI:1293613
I want to get information preceded by certain tags (DEFINITION, VERSION etc.) but some descriptions cover multiple lines and I do need all of it. This is a problem when using BufferdReader to read my file.
I almost figured it out by using mark() and reset() but when executing my program I noticed that it only works for one tag and other tags are somehow skipped. This is the code I have so far:
Pattern pTag = Pattern.compile("^[A-Z]{2,}");//regex: 2 or more uppercase letters is a tag
Matcher mTagCurr = pTag.matcher(line);
if (mTagCurr.find()) {
reader.mark(1000);
String nextLine = reader.readLine();
Matcher mTagNext = pTag.matcher(nextLine);
if (mTagNext.find()){
reader.reset();
continue;
}
Pattern pWhite = Pattern.compile("^\\s{6,}");
Matcher mWhite = pWhite.matcher(nextLine);
while (mWhite.find()) {
line = line.concat(nextLine);
}
System.out.println(line);
}
This piece of code is supposed to find tags and concatenate descriptions that cover more than one line. Some answers I found here advised using Scanner. This is not an option for me. The files I work with can be very large (largest I encountered was >50GB) and by using BufferedReader I wish to put less of a strain on my system.
I suggest accumulating the information you get as your read it in a single pass parser. This will be simpler and faster in this case I suspect.
BTW, you want to cache your Patterns as creating them is quite expensive. You may find that you want ovoid using them entirely in some cases.
The code starts by finding a continuation line and calling reset() if it does not find it, but the code that reads additional lines does not seem to do that. Could it be reading the start of another section in the Genbank file and not putting it back? I don't see all the loop control code here, but what I do see appears to be correct.
If all else fails and you need something easy, there's always BioJava (see How to Read a Genbank File with Biojava3 and see if it helps). I have tried to use BioJava for my own projects, but it always falls a little short.
When I have written FASTA and FASTQ parsers, I read into a byte or char buffer and process it that way, but there is more buffer management code to write. That way, I don't have to worry about putting bytes back in a buffer. This can also avoid regex, which can be expensive in a time-critical application. Of course, this take more time to implement.
Tip: For fastest implementation if you are managing the buffer yourself, check out NIO (Java NIO Tutorial). I have seen give up a 10x speedup in some cases (writing data). The only drawback is that I have not found an easy way to read gzipped sequence data with NIO yet.

Java CSV parser with unescaped quotes [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I have a CSV file that has some quoting issues:
"Albanese Confectionery","157137","ALBANESE BULK ASST. MINI WILD FRUIT WORMS 2" 4/5LB",9,90,0,0,0,.53,"21",50137,"3441851137","5 lb",1,4,4,$6.7,$6.7,$26.8
SuperCSV is choking on these fruit worms (pun intended). I know that the 2" should probably be 2"", but it's not. LibreOffice actually parses this correctly (which surprises me). I was thinking of just writing my own little parser but other rows have commas inside the string:
"Albanese Confectionery","157230","ALBANESE BULK JET FIGHTERS,ASSORTED 4/5 B",9,90,0,0,0,.53,"21",50230,"3441851230","5 lb",1,4,4,$6.7,$6.7,$26.8
Does anyone know of a Java library that will handle crazy stuff like this? Or should I try all the available ones? Or am I better off hacking this out myself?
The right solution is to find the person who generated the data and beat them over the head with a keyboard until they fix the problem on their end.
Once you've exhausted that route, you could try some of the other CSV parsers on the market, I've used OpenCSV with success in the past.
Even if OpenCSV won't solve the problem out of the box, the code is fairly easy to read and available under an Apache license, so it might be possible to modify the algorithm to work with your wonky data, and probably easier than starting from scratch.
Surprising even myself here, but I think I would hack it myself. I mean, you only need to read the lines and generate the tokens by splitting on quotes/commas, whichever you want. That way you can adjust the logic the way it suites you. It's not very hard. The file seems to be broken as much so that going through some existing solutions seems like more work.
One point though - if LibreOffice already parses it correctly, couldn't you just save the file from there, thus generating a file that is more reasonable. However, if you think LibreOffice might be guessing, just write the tokenizer yourself.
+1 for the 'choking on fruit worms' pun - I nearly choked on my coffee reading that :)
If you really can't get that CSV fixed, then you could just supply your own Tokenizer (Super CSV is very flexible like that!).
You'd normally write your own readColumns() implementation, but it's quicker to extend the default Tokenizer and override the readLine() method to intercept the String (and fix the unescaped quotes) before it's tokenized.
I've made an assumption here that any quotes not next to a delimiter or at the start/end of the line should be escaped. It's far from perfect, but it works for your sample input. You can implement this however you like - it was too early in the morning for me to use a regex :)
This way you don't have to modify Super CSV at all (it just plugs in), so you get all of the other features like cell processors and bean mapping as well.
package org.supercsv;
import java.io.IOException;
import java.io.Reader;
import org.supercsv.io.Tokenizer;
import org.supercsv.prefs.CsvPreference;
public class FruitWormTokenizer extends Tokenizer {
public FruitWormTokenizer(Reader reader, CsvPreference preferences) {
super(reader, preferences);
}
#Override
protected String readLine() throws IOException {
final String line = super.readLine();
if (line == null) {
return null;
}
final char quote = (char) getPreferences().getQuoteChar();
final char delimiter = (char) getPreferences().getDelimiterChar();
// escape all quotes not next to a delimiter (or start/end of line)
final StringBuilder b = new StringBuilder(line);
for (int i = b.length() - 1; i >= 0; i--) {
if (quote == b.charAt(i)) {
final boolean validCharBefore = i - 1 < 0
|| b.charAt(i - 1) == delimiter;
final boolean validCharAfter = i + 1 == b.length()
|| b.charAt(i + 1) == delimiter;
if (!(validCharBefore || validCharAfter)) {
// escape that quote!
b.insert(i, quote);
}
}
}
return b.toString();
}
}
You can just supply this Tokenizer to the constructor of your CsvReader.

Benefits of MessageFormat in Java

In a certain Java class for a Struts2 web application, I have this line of code:
try {
user = findByUsername(username);
} catch (NoResultException e) {
throw new UsernameNotFoundException("Username '" + username + "' not found!");
}
My teacher wants me to change the throw statement into something like this:
static final String ex = "Username '{0}' not found!" ;
// ...
throw new UsernameNotFoundException(MessageFormat.format(ex, new Object[] {username}));
But I don't see the point of using MessageFormat in this situation. What makes this better than simple string concatenation? As the JDK API for MessageFormat says:
MessageFormat provides a means to produce concatenated messages in language-neutral way. Use this to construct messages displayed for end users.
I doubt that the end users would see this exception since it would only be displayed by the application logs anyway and I have a custom error page for the web application.
Should I change the line of code or stick with the current?
Should I change the line of code or stick with the current?
According to your teacher your should.
Perhaps he wants you to learn different approaches for the same thing.
While in the sample you provided it doesn't make much sense, it would be useful when using other types of messages or for i18n
Think about this:
String message = ResourceBundle.getBundle("messages").getString("user.notfound");
throw new UsernameNotFoundException(MessageFormat.format( message , new Object[] {username}));
You could have a messages_en.properties file and a messages_es.properties
The first with the string:
user.notfound=Username '{0}' not found!
And the second with:
user.notfound=¡Usuario '{0}' no encontrado!
Then it would make sense.
Another use of the MessageFormat is described in the doc
MessageFormat form = new MessageFormat("The disk \"{1}\" contains {0}.");
double[] filelimits = {0,1,2};
String[] filepart = {"no files","one file","{0,number} files"};
ChoiceFormat fileform = new ChoiceFormat(filelimits, filepart);
form.setFormatByArgumentIndex(0, fileform);
int fileCount = 1273;
String diskName = "MyDisk";
Object[] testArgs = {new Long(fileCount), diskName};
System.out.println(form.format(testArgs));
The output with different values for fileCount:
The disk "MyDisk" contains no files.
The disk "MyDisk" contains one file.
The disk "MyDisk" contains 1,273 files.
So perhaps your teacher is letting you know the possibilities you have.
Teachers way allows for easier localisation as you can extract a single string rather than several little bits.
But I don't see the point of using
MessageFormat in this situation
In that specific situation it doesn't buy you much. In general, using MessageFormat allows you to externalize those messages in a file. This allows you to:
localize the messages by language
edit the messages outside without
modifying source code
Personally, I would stick with the concatenation way, but it's just a matter of preference. Some people think it's cleaner to write a string with variables as one string, and then pass the params as a list after the string. The more variables you have in the string, the more sense using MessageFormat makes, but you only have one, so it's not a big difference.
Of course if you don't need internationalization, it is overhead, but basically the code as the teach wants it is more "internationalizable" (although not actually internationalized as the string is still hard coded).
Since this is a teaching situation, though he may be doing it just to show you how to use those classes rather than as the best way to program for this specific example.
In terms of the best way to program, if internationalization is a requirement, then you need to code for it, if not then don't. I just adds overhead and time (to write the code) for no reason.
Pace the other answers, the importance of MessageFormat for internationalizion is not just that it makes it easier to make an external file. In other languages the location of the parameter may be different in the sentence structure of the messages, so using MessageFormat allows you to change that per language, something that string concatenation would not.
One advantage that I see in using MessageFormat is that when you decide to externalize your strings, it would be much easier to build the message and also, it makes more sense to see "Username '{0}' not found!" in your resource file as one string accessed by only one ID.

I need a fast key substitution algorithm for java

Given a string with replacement keys in it, how can I most efficiently replace these keys with runtime values, using Java? I need to do this often, fast, and on reasonably long strings (say, on average, 1-2kb). The form of the keys is my choice, since I'm providing the templates here too.
Here's an example (please don't get hung up on it being XML; I want to do this, if possible, cheaper than using XSL or DOM operations). I'd want to replace all #[^#]*?# patterns in this with property values from bean properties, true Property properties, and some other sources. The key here is fast. Any ideas?
<?xml version="1.0" encoding="utf-8"?>
<envelope version="2.3">
<delivery_instructions>
<delivery_channel>
<channel_type>#CHANNEL_TYPE#</channel_type>
</delivery_channel>
<delivery_envelope>
<chan_delivery_envelope>
<queue_name>#ADDRESS#</queue_name>
</chan_delivery_envelope>
</delivery_envelope>
</delivery_instructions>
<composition_instructions>
<mime_part content_type="application/xml">
<content><external_uri>#URI#</external_uri></content>
</mime_part>
</composition_instructions>
</envelope>
The naive implementation is to use String.replaceAll() but I can't help but think that's less than ideal. If I can avoid adding new third-party dependencies, so much the better.
The appendReplacement method in Matcher looks like it might be useful, although I can't vouch for its speed.
Here's the sample code from the Javadoc:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
EDIT: If this is as complicated as it gets, you could probably implement your own state machine fairly easily. You'd pretty much be doing what appendReplacement is already doing, although a specialized implementation might be faster.
It's premature to leap to writing your own. I would start with the naive replace solution, and actually benchmark that. Then I would try a third-party templating solution. THEN I would take a stab at the custom stream version.
Until you get some hard numbers, how can you be sure it's worth the effort to optimize it?
Does Java have a form of regexp replace() where a function gets called?
I'm spoiled by the Javascript String.replace() method. (For that matter you could run Rhino and use Javascript, but somehow I don't think that would be anywhere near as fast as a pure Java call even if the Javascript compiler/interpreter were efficient)
edit: never mind, #mmyers probably has the best answer.
gratuitous point-groveling: (and because I wanted to see if I could do it myself :)
Pattern p = Pattern.compile("#([^#]*?)#");
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find())
{
m.appendReplacement(sb,substitutionTable.lookupKey(m.group(1)));
}
m.appendTail(sb);
// replace "substitutionTable.lookupKey" with your routine
You really want to write something custom so you can avoid processing the string more than once. I can't stress this enough - as most of the other solutions I see look like they are ignoring that problem.
Optionally turn the text into a stream. Read it char by char forwarding each char to an output string/stream until you see the # then read to the next # slurping out the key, substituting the key into the output: repeat until end of stream.
I know it's plain old brute for - but it's probably the best.
I'm assuming you have some reasonable assumption around '#' not just 'showing up' independant of your token keys in the input. :)
please don't get hung up on it being XML; I want to do this, if possible, cheaper than using XSL or DOM operations
Whatever's downstream from your process will get hung up if you don't also process the inserted strings for character escapes. Which isn't to say that you can't do it yourself if you have good cause, but does mean you either have to make sure your patterns are all in text nodes, and you also correctly escape the replacement text.
What exact advantage does #Foo# have over the standard &Foo; syntax already built into the XML libraries which ship with Java?
Text processing is going to always be bounded if you dont shift your paradigm. I dont know how flexible your domain is, so not sure if this is applicable, but here goes:
try creating an index into where your text substitution is - this is especially good if the template doesnt change often, because it becomes part of the "compile" of the template, into a binary object that can take in the value required for the substitutions, and blit out the entire string as a byte array. This object can be cached/saved, and next time, resubstitute in the new value to use again. I.e., you save on parsing the document every time. (implementation is left as an exercise to the reader =D )
But please use a profiler to check whether this is actually the bottleneck that you say it is before embarking on writing a custom templating engine. The problem may actually be else where.
As others have said, appendReplacement() and appendTail() are the tools you need, but there's something you have watch out for. If the replacement string contains any dollar signs, the method will try to interpret them as capture-group references. If there are any backslashes (which are used to escape the dollars sing), it will either eat them or throw an exception.
If your replacement string is dynamically generated, you may not know in advance whether it will contain any dollar signs or backslashes. To prevent problems, you can append the replacement directly to the StringBuffer, like so:
Pattern p = Pattern.compile("#([^#]*?)#");
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find())
{
m.appendReplacement("");
sb.append(substitutionTable.lookupKey(m.group(1)));
}
m.appendTail(sb);
You still have to call appendReplacement() each time, because that's what keeps you in sync with the match position. But this trick avoids a lot of pointless processing, which could give you a noticeable performance boost as a bonus.
this is what I use, from the apache commons project
http://commons.apache.org/lang/api/org/apache/commons/lang/text/StrSubstitutor.html
I also have a non-regexp based substitution library, available here. I have not tested its speed, and it doesn't directly support the syntax in your example. But it would be easy to extend to support that syntax; see, for instance, this class.
Take a look at a library that specializes in this, e.g., Apache Velocity. If nothing else, you can bet their implementation for this part of the logic is fast.
I wouldn't be so sure the accepted answer is faster than String.replaceAll(String,String). Here for your comparison is the implementation of String.replaceAll and the Matcher.replaceAll that is used under the covers. looks very similar to what the OP is looking for, and I'm guessing its probably more optomized than this simplistic solution.
public String replaceAll(String s, String s1)
{
return Pattern.compile(s).matcher(this).replaceAll(s1);
}
public String replaceAll(String s)
{
reset();
boolean flag = find();
if(flag)
{
StringBuffer stringbuffer = new StringBuffer();
boolean flag1;
do
{
appendReplacement(stringbuffer, s);
flag1 = find();
} while(flag1);
appendTail(stringbuffer);
return stringbuffer.toString();
} else
{
return text.toString();
}
}
... Chii is right.
If this is a template that has to be run so many times that speed matters, find the index of your substitution tokens to be able to get to them directly without having to start at the beginning each time. Abstract the 'compilation' into an object with the nice properties, they should only need updating after a change to the template.
Rythm a java template engine now released with an new feature called String interpolation mode which allows you do something like:
String result = Rythm.render("Hello #who!", "world");
The above case shows you can pass argument to template by position. Rythm also allows you to pass arguments by name:
Map<String, Object> args = new HashMap<String, Object>();
args.put("title", "Mr.");
args.put("name", "John");
String result = Rythm.render("Hello #title #name", args);
Since your template content is relatively long you could put them into a file and then call Rythm.render using the same API:
Map<String, Object> args = new HashMap<String, Object>();
// ... prepare the args
String result = Rythm.render("path/to/my/template.xml", args);
Note Rythm compile your template into java byte code and it's fairly fast, about 2 times faster than String.format
Links:
Check the full featured demonstration
read a brief introduction to Rythm
download the latest package or
fork it

Categories