java string tokenizer - java

What if I have a file that I am using a StringTokenizer on to get the values between commas? It's a CSV file. Here is sample input:
test,first,second,,fourth,fifth
So how can I catch that empty field between the two commas? Right now it just pretends nothing is there. It doesn't even see that there is a place with nothing in it.

Using String#split() would be recommended over StringTokenizer.
String[] s = "test,first,second,,fourth,fifth".split(",");
System.out.println(Arrays.asList(s));
System.out.println(s.length);
// output:
// [test, first, second, , fourth, fifth]
// 6
Also, if you have much more involved CSV parsing in your code, if possible, try using an existing library like JavaCSV.
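One caveat about split(","): it keeps empty fields in the middle of a line but drops trailing empty ones unless a negative limit is passed. Here is a minimal sketch of the difference. The class and method names are made up for illustration, and the sample line is the one from the question with two extra trailing commas added.

```java
import java.util.StringTokenizer;

public class EmptyFieldDemo {
    // Splits on commas and keeps every empty field, including trailing ones.
    // A negative limit tells split() not to discard trailing empty strings.
    static String[] splitKeepingEmpty(String line) {
        return line.split(",", -1);
    }

    // Counts tokens the way StringTokenizer sees them: empty fields are skipped.
    static int tokenizerCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    public static void main(String[] args) {
        // The question's sample line plus two trailing commas (an assumption for the demo).
        String line = "test,first,second,,fourth,fifth,,";
        System.out.println(tokenizerCount(line));           // 5
        System.out.println(line.split(",").length);         // 6
        System.out.println(splitKeepingEmpty(line).length); // 8
    }
}
```

If every line must yield the same number of columns, the split(",", -1) form is probably the one you want.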

I am not sure if I am understanding your question correctly. I would use well-known packages like opencsv.

The split technique works great, so long as none of your elements contains a comma. If they might, you can use an existing library. I've also had good results using regular expressions for CSV processing.

Related

Read CSV with semicolons into a String on Java

My main problem is that I'm trying to read a CSV delimited by ; in Java and the problem comes when I try to read a field of the CSV that contains a ;. For example:
"I want you to do that;"
In this case the field is recognized like
"I want you to do that"
And it creates another field that is just an empty string.
I use a BufferedReader to read the CSV and the split method to separate it with the ;. I'm not allowed to use libraries like OpenCSV so I want to find a solution with the method I'm using.
Parse according to the quotation marks
If the data incidentally containing the delimiter is wrapped in double quotes (QUOTATION MARK), then you should have no problem with parsing. Your parsing should look first for pairs of double quote characters. After that, look for delimiters outside of those pairs.
Rather than writing the parsing code yourself, I highly recommend using a CSV library. In the Java ecosystem, you have a wealth of good products to choose. For example, I have made successful use of Apache Commons CSV.
See also the specification for CSV: RFC 4180.
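If a library really is off the table, the quote-pair approach described above can be sketched as a small character loop. This is only an illustration under a few assumptions: the class and method names are invented, quote characters are kept in the output, and a quoted field never spans more than one line (reading line by line with a BufferedReader would break such fields apart).

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedSplit {
    // Splits one line on the delimiter, ignoring delimiters inside double quotes.
    // Quote characters are kept in the output. A doubled quote ("") toggles the
    // state twice, so it leaves the in-quotes state unchanged.
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == delimiter && !inQuotes) {
                // A delimiter outside quotes ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        // Prints three fields; the middle one keeps its semicolon.
        System.out.println(split("1;\"I want you to do that;\";done", ';'));
    }
}
```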

If data contains commas, how to store it in csv?

If the task is to create a csv file out of some data where commas may be present, is there a way to do it without later confusing which comma is a delimiter and which comma is part of a value?
Obviously, we can use a different delimiter, replace all occurrences, or replace the original comma with something else, but for the purpose of this question let's say that modifying the original data is not an option and a comma is the only delimiter allowed.
How would you approach something like this? Would it be easier to create the xls instead? Can you recommend any java libraries that handle this well?
A true CSV reader should be able to handle this; the values should be in quotes, e.g.:
one,two,"a, b, c",four
...per item #6 in Section 2 of the RFC.
While there's no single CSV standard, the usual convention is to surround entries containing commas in double quotes (i.e. ").
Preempting the next question: what do you do if your data contains a double quote? In that case each double quote is usually replaced with a pair of double quotes.
While I hate to cite wikipedia as a source, they do have a pretty good roundup of basic rules and examples for CSV formatting.
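For the writing side, the two conventions above (wrap in double quotes, double any embedded quote) fit in a few lines. This is a rough sketch, not a full CSV writer; the class and method names are made up, and it also quotes fields that contain line breaks, which is what RFC 4180 describes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvWriteSketch {
    // Quotes a value only when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled.
    static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped values into one CSV row.
    static String toRow(List<String> values) {
        return values.stream().map(CsvWriteSketch::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toRow(Arrays.asList("one", "two", "a, b, c", "say \"hi\"")));
        // one,two,"a, b, c","say ""hi"""
    }
}
```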
I would either use a different delimiter or use a library like Apache POI.
I think the best way is to use Apache POI: http://poi.apache.org/
You can easily create XLS documents without much hassle.
However, if you really need CSV and not XLS, you can surround the value with quotes. This should also solve the problem.
Usually, you work with , as the separator and " as the quote character. So your values would look like:
foo,"bar, baz",iik,aje
"the task is to create a csv file"
Actually an impossible task, since there is no such thing as "a CSV" file. Different Microsoft products have used different (subtly different, I grant) formats and named them all "CSV". As most spreadsheets can read delimiter-separated values (DSV) files, you might be better off writing one of those.

Is there a tool to take a block of text and turn it into Java StringBuffer code?

I have several blocks of text that I need to be able to paste inline in my code for some unit tests. It would make the code difficult to read if they were externalized, so is there some web tool where I can paste in my text and it will generate the code for a StringBuffer that preserves its formatting? Or even a String, I'm not that picky at this point.
It seems like a code generator like this must exist somewhere on the web. I tried to Google one, but I have yet to come up with a set of search terms that doesn't fill my results with Java examples and documentation.
I suppose I could write one myself, but I'm in a bit of a time crunch and would rather not duplicate effort.
If I understood it correctly, any text editor which supports regexps should make it an easy task. For instance Notepad++ - just replace ^(.+)$ with "\1"+, then copy the result to the code, remove the last + and add String s = to the beginning :)
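The editor trick above does not escape double quotes or backslashes that appear in the text. If that matters, the same transformation can be sketched as a small Java helper. The class and method names are invented for illustration, and only backslashes and double quotes are escaped (tabs and other control characters are left as they are).

```java
public class LiteralGenerator {
    // Turns a block of text into Java source for a concatenated String literal,
    // one source line per input line.
    static String toJavaLiteral(String text) {
        String[] lines = text.split("\r?\n", -1);
        StringBuilder out = new StringBuilder("String s =\n");
        for (int i = 0; i < lines.length; i++) {
            // Escape backslashes first, then double quotes.
            String escaped = lines[i].replace("\\", "\\\\").replace("\"", "\\\"");
            out.append("    \"").append(escaped);
            if (i < lines.length - 1) {
                out.append("\\n\" +\n"); // keep the line break inside the literal
            } else {
                out.append("\";\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toJavaLiteral("SELECT *\nFROM \"users\""));
    }
}
```

On Java 15 or later, a text block (triple double quotes) lets you paste the text directly and makes a generator like this unnecessary.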
If you want to externalize it, then use a properties file or something like that to read the text.
If you are looking for a simple tool to break up your text into concatenated strings, most modern IDEs will do it for you automatically. Here's how:
Copy the block of text into the IDE.
Surround it in double quotes and assign it to a String variable. (This step may not be required.)
Press Enter wherever you want to wrap the text to the next line. The IDE will automatically break the literal, close and reopen it with double quotes, and join the pieces with +.
Java compilers fold concatenated literals such as "addas" + "addasfdas" into a single String constant at compile time, so splitting the text this way costs nothing at run time.
The SQuirreL SQL client has a function called "convert to string buffer"; it works nicely.

Regarding Java Split Command CSV File Parsing

I have a CSV file in the format below. I get an issue when either of the following CSV lines is read by the program:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below is used to ignore commas inside double quotes. I got it from an earlier post, whose title I have pasted after the code.
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)",15);
System.out.println("items.length"+items.length);
Regarding Java Split Command Parsing Csv File
The items.length is printed as 14 instead of 15. The abc"def value is not recognized as an individual field, and it is incorrectly stored as
"D",abc"def in items[0]. I want it to be stored as follows:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when the value is "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
Also, this split command works perfectly if the double quotes are doubled inside the quoted field (for example, D,"abc""def",1).
How can I resolve this issue?
I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files that have carriage returns inside fields, the regex will probably fall apart. It wouldn't take that much code to write a simple while loop that goes through all the characters and splits up the data. It would be a lot easier to deal with "non-standard"* CSV files such as yours when you have a parser rather than a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
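To give an idea of what such a loop might look like for data like abc"def, here is a rough, lenient sketch. The rules are assumptions chosen to fit the two sample lines, not a standard: a field counts as quoted only if it starts with a double quote; inside a quoted field, a quote closes the field only when a comma or the end of the line follows it; a doubled quote becomes one literal quote; any other stray quote is kept as data. Unlike the split approach, the surrounding quotes are stripped from the result. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientCsvParser {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    if (line.charAt(i) == '"') {
                        boolean atEnd = (i + 1 == n);
                        if (atEnd || line.charAt(i + 1) == ',') {
                            i++; // closing quote: stop before the comma
                            break;
                        }
                        if (line.charAt(i + 1) == '"') {
                            i++; // doubled quote: skip one, append the other below
                        }
                    }
                    field.append(line.charAt(i)); // data, including stray quotes
                    i++;
                }
            } else {
                // Unquoted field: everything up to the next comma is data.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            i++; // skip the comma, or step past the end of the line
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the first sample line from the question.
        System.out.println(parseLine("\"D\",abc\"def,\"\",\"0429\"292\"0\",\"11\""));
        // [D, abc"def, , 0429"292"0, 11]
    }
}
```

The heuristic has a known blind spot: a quoted value that itself contains a quote immediately followed by a comma would be cut short, so it is worth checking against real data before relying on it.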
opencsv is a great, simple, and lightweight CSV parser for Java. It will easily handle your data.
If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882
opencsv is a very simple and capable API for CSV parsing. The file can also be cleaned up with sed on Linux before it is processed in Java. If the file is not in a proper format, convert the column delimiter (",") into a pipe or another unique delimiter, so that opencsv can easily tell field content from column delimiters. Use the power of Linux with your Java code.

What's the best way to have stringTokenizer split up a line of text into predefined variables

I'm not sure if the title is very clear, but basically what I have to do is read a line of text from a file and split it up into 8 different string variables. Each line will have the same 8 chunks in the same order (title, author, price, etc). So for each line of text, I want to end up with 8 strings.
The first problem is that the last two fields in the line may or may not be present, so I need to do something with stringTokenizer.hasMoreTokens, otherwise it will die messily when fields 7 and 8 are not present.
I would ideally like to do it in one while or for loop, but I'm not sure how to tell that loop what the order of the fields is going to be so it can fill all 8 (or 6) strings correctly. Please tell me there's a better way than using 8 nested if statements!
EDIT: The String.split solution definitely seems to be part of it, so I will use that instead of StringTokenizer. However, I'm still not sure what the best way of feeding the individual strings into the constructor is. Would the best way be to have the class expect an array, and then just do something like this in the constructor:
isbn = line[1];
title = line[2];
The best way is to not use a StringTokenizer at all, but use String's split method. It returns an array of Strings, and you can get the length from that.
For each line in your file you can do the following:
String[] tokens = line.split("#");
tokens will now have 6 to 8 Strings. Use tokens.length (a field, not a method) to find out how many, then create your object from the array.
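A small helper avoids the nested if statements for the optional fields. This is a sketch under assumptions: the field order and the placeholder values are invented (the question only says "title, author, price, etc"), the delimiter is the # used above, and missing fields default to an empty string.

```java
public class LineToFields {
    // Returns tokens[index], or the fallback when the line has fewer fields.
    static String fieldOr(String[] tokens, int index, String fallback) {
        return index < tokens.length ? tokens[index] : fallback;
    }

    public static void main(String[] args) {
        // Placeholder line with only the 6 mandatory fields present.
        String line = "title#author#price#isbn#publisher#year";
        String[] tokens = line.split("#", -1); // -1 keeps trailing empty fields
        if (tokens.length < 6) {
            throw new IllegalArgumentException("Expected at least 6 fields: " + line);
        }
        String title = tokens[0];
        String author = tokens[1];
        String price = tokens[2];
        // Fields 4 to 6 are read the same way. Fields 7 and 8 are optional:
        String seventh = fieldOr(tokens, 6, "");
        String eighth = fieldOr(tokens, 7, "");
        System.out.println(title + " / " + author + " / " + price
                + " / [" + seventh + "] [" + eighth + "]");
    }
}
```

Passing the whole array to the constructor and doing this unpacking there works just as well; the point is only that the index check lives in one place.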
Regular expression is the way. You can convert your incoming String into an array of String using the split method
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split(java.lang.String)
Would a regular expression with capture groups work for you? You can certainly make parts of the expression optional.
An example line of data or three might be helpful.
Is this a CSV or similar file by any chance? If so, there are libraries to help you, for example Apache Commons CSV (link to alternatives on their page too). It will get you a String[] for each line in the file. Just check the array size to know what optional fields are present.
