A strategy for parsing a tab-separated file


What would be the most primitive way of parsing a tab-separated file in Java so that the tabular data does not lose its structure? I am not looking for a way to do it with Bean or Jsoup, since they are not familiar to me as a beginner. I need advice on the logic behind it and on an efficient way to do it, for example if I have a table like
ID reference | Identifier | Type 1| Type 2 | Type 3 |
1 | red#01 | 15% | 20% | 10% |
2 | yellow#08 | 13% | 20% | 10% |
Correction: In this example I have Types 1 - 3, but my question applies to N number of types.
Can I achieve table parsing by just using arrays or is there a different data structure in Java that would be better for this task? This is how I think I should do it:
Scan/read the first line splitting at "\t" and create a String array.
Split that array into sub-arrays of 1 table heading per sub-array
Then, start reading the next line of the table, and for each sub-array, add the corresponding values from the columns.
Does this plan sound right or am I overcomplicating things/being completely wrong? Is there an easier way to do it? (provided that I still don't know how to split arrays into subarrays and how to populate the subarrays with the values from the table)

I would strongly suggest you use a real flat-file parsing library for this, like the excellent OpenCSV.
Failing that, here is a solution in Java 8.
First, create a class to represent your data:
static class Bean {
private final int id;
private final String name;
private final List<Integer> types;
public Bean(int id, String name, List<Integer> types) {
this.id = id;
this.name = name;
this.types = types;
}
//getters
}
Your suggestion to use various lists is very scripting based. Java is OO so you should use that to your advantage.
Now we just need to parse the file:
public static void main(final String[] args) throws Exception {
final Path path = Paths.get("path", "to", "file.tsv");
final List<Bean> parsed;
try (final Stream<String> lines = Files.lines(path)) {
parsed = lines.skip(1).map(line -> line.split("\\s*\\|\\s*")).map(line -> {
final int id = Integer.parseInt(line[0]);
final String name = line[1];
final List<Integer> types = Arrays.stream(line).
skip(2).map(t -> Integer.parseInt(t.replaceAll("\\D", ""))).
collect(Collectors.toList());
return new Bean(id, name, types);
}).collect(Collectors.toList());
}
}
In essence, the code skips the first line, then loops over the remaining lines in the file, and for each line:
Split the line on the delimiter, which seems to be |. split takes a regex, so the pipe must be escaped because it is a special character. The pattern also consumes any whitespace before and after the delimiter. For a genuinely tab-separated file, split on "\t" instead.
Create a new Bean for each line by parsing the array elements.
First parse the id to an int.
Next get the name.
Finally get a Stream of the array elements, skip the first two, strip the non-digit characters (such as %) from each remaining element, and parse the results into a List<Integer>.
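The answer above splits on |, while the question asks about tab-separated data with N type columns. A minimal sketch of the same idea for tabs, keeping the header so that cells can be looked up by column name, might look like this. The class name TsvTable and its get method are hypothetical, and the sketch assumes every data line has as many cells as the header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the answer above: parses tab-separated
// lines into rows while keeping the header, for any number of columns.
class TsvTable {
    final List<String> header;
    final List<List<String>> rows = new ArrayList<>();

    TsvTable(List<String> lines) {
        // The -1 limit keeps trailing empty cells instead of dropping them.
        header = Arrays.asList(lines.get(0).split("\t", -1));
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
    }

    // Looks up a cell by zero-based row index and column name.
    // Assumes the row has at least as many cells as the header.
    String get(int row, String column) {
        int col = header.indexOf(column);
        if (col < 0) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return rows.get(row).get(col);
    }
}
```

The lines could come from Files.readAllLines(path), after which something like table.get(0, "Type 2") returns the raw cell text. Converting "20%" to a number is left to the caller.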

I would suggest to use Apache Commons CSV package, like described on the homepage: http://commons.apache.org/proper/commons-csv/

I'd use Guava's Splitter and Table:
https://code.google.com/p/guava-libraries/wiki/StringsExplained#Splitter
https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained#Table

Related

How to fix "The type List is not generic; it cannot be parameterized with arguments <String>" error in cucumber selenium JAVA

I tried to use a data table and implemented my function to fetch the values from this Cucumber data table. I used List<List<String>>, but it doesn't work!
public void myfunction(DataTable dt) throws Throwable {
List<List<String>> list = dt.asLists(String.class);
driver.findElement(By.id("name")).sendKeys(list.get(0).get(0));
driver.findElement(By.id("age")).sendKeys(list.get(0).get(1));
driver.findElement(By.id("nphone")).sendKeys(list.get(1).get(0));
driver.findElement(By.id("address")).sendKeys(list.get(1).get(1));
}

Using the header row, we can work with the data table in a much cleaner and more precise way. Suppose the data table looks like this:
And fill up the first & last name form with the following data
| First Name | Last Name |
| Tom | Adam |
| Hyden | Pointing |
public void myfunction(DataTable table) throws Throwable {
List<Map<String, String>> list = table.asMaps(String.class,String.class);
driver.findElement(By.id("name")).sendKeys(list.get(0).get("First Name"));
driver.findElement(By.id("age")).sendKeys(list.get(0).get("Last Name"));
driver.findElement(By.id("nphone")).sendKeys(list.get(1).get("First Name"));
driver.findElement(By.id("address")).sendKeys(list.get(1).get("Last Name"));
}
Implementation rules: the step-definition snippet that Cucumber generates for a step with a data table declares the method argument as DataTable dataTable. The snippet suggests that you can replace the DataTable dataTable argument with any of:
- List<E>
- List<List<E>>
- List<Map<K,V>>
- Map<K,V>
- Map<K, List<V>>
It also tells us that each type, E, K, V must be of any of these types:
String
Integer
Float
Double
Byte
Short
Long
BigInteger
BigDecimal

Check your imports, please. I had imported java.awt.List by mistake. It worked when I imported java.util.List instead.
Like:
import java.util.List;

Facilitate SQL table query functionality in Java application [closed]

Say I have a few columns: FirstName, LastName, Email, Phone.
I want to query for a row based on a dynamic selection of columns.
For example, the application may ask for a record based on 1) LastName and Phone, 2) FirstName, or 3) Phone and Email.
Instead of creating a database table and running a SQL query to find a row by its column data, is there a data structure that suits my needs? I am coding in Java, so if there is a built-in API, please suggest one.
FirstName | LastName | Email | Phone
abc | xyz | abc#m.com | 123
pqr | qwe | pqr#m.com | 342
ijk | uio | ijk#m.com | 987

I'd point you to any of the available in memory SQL Db libraries:
H2
Derby
HSQL
Or maybe you want an indexable, queryable in-memory store:
Hazelcast
Ehcache
Any one of these allows you to write a query against the data stored.

If you want to have the information loaded into memory and available for multiple queries, I would use a lookup structure built from a Map (e.g. a HashMap) and ArrayList.
Note: If you're only going to query once, I would do it directly in the loop when reading the lines.
E.g.: HashMap<String, ArrayList<wordLocation>> lookup = new HashMap<String, ArrayList<wordLocation>>();
Example:
import java.util.ArrayList;
import java.util.HashMap;
public class WordLookup {
public static void main(String args[]) {
WordLookup wl = new WordLookup();
String[] simulatedFileRows = new String[5];
simulatedFileRows[0] = "cat,dog";
simulatedFileRows[1] = "hen,dog";
simulatedFileRows[2] = "cat,mouse";
simulatedFileRows[3] = "moose,squirrel";
simulatedFileRows[4] = "chicken,rabbit";
String columns[];
String row;
int column = 0;
for(int i=0; i<simulatedFileRows.length; i++) //Simulated readline
{
row = simulatedFileRows[i];
columns = row.split(",");
column=0;
for(String col:columns)
{
column++;
wl.addWord(col, i, column);
}
}
//Where is moose?
ArrayList<wordLocation> locs = wl.getWord("moose");
if(locs!=null)
{
System.out.println("Moose found at:");
for(wordLocation loc: locs)
System.out.println("\t line:"+ loc.line + " column" + loc.column);
}
}
private HashMap<String, ArrayList<wordLocation>> lookup= new HashMap<String, ArrayList<wordLocation>>();
public void addWord(String word, int line, int column)
{
ArrayList<wordLocation> wordLocArr = lookup.get(word);
if(wordLocArr == null)
{
wordLocArr = new ArrayList<wordLocation>();
lookup.put(word,wordLocArr);
}
wordLocArr.add( new wordLocation(line, column));
}
public ArrayList<wordLocation> getWord(String word)
{
return lookup.get(word);
}
class wordLocation{
public int line, column;
public wordLocation(int l, int c)
{this.line = l; this.column = c;}
}
}

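Suppose you have something like a HashMap<String, String> map of field => value.
Then you can build the query like this (if you don't want to filter by value, leave out the WHERE clause):
if (!map.isEmpty()) {
    // Column names cannot be bound as parameters, so validate them against a known list first.
    StringBuilder whereStatement = new StringBuilder(" WHERE 1=1");
    StringBuilder selectStatement = new StringBuilder();
    for (String field : map.keySet()) {
        // Use a ? placeholder and bind map.get(field) through a PreparedStatement
        // rather than concatenating the value into the SQL text.
        whereStatement.append(" AND ").append(field).append(" = ?");
        if (selectStatement.length() > 0) {
            selectStatement.append(", ");
        }
        selectStatement.append(field);
    }
    String query = "SELECT " + selectStatement + " FROM sometable" + whereStatement;
}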

If you don't index the columns in an SQL DB, that's roughly equivalent to simply having an array, where each element corresponds to a row.
If you do index the columns, that's about the same as additionally having something like a TreeMap (of string or integer or some collection of objects, depending on the type of the fields, to array index) for each index (at least based on my somewhat limited knowledge of the underlying structure of DBs - actually I think databases typically use b-trees, but there isn't a b-tree structure in the standard Java API to my knowledge).
Actually a TreeMap to array index isn't sufficient for non-unique indices, you'll have to have a TreeMap to a list of array indices, or a MultiMap (not in the standard API).
If you don't have an index for any given query, you'll have to iterate through all the rows to find the correct one. Similarly, you'll have to iterate through the whole array to find the correct element.
So, if you only want to query single columns (and do so efficiently), and this can be any of the columns, you'll have to have a TreeMap (or similar) for each column, and an array or similar as a base structure.
If, however, we're talking about querying any combination of columns, you're unlikely to get a particularly efficient generic solution, as there would simply be too many combinations to have a structure for all of them, even for a small number of columns.
Note: I say TreeMap as opposed to HashMap, as this is closer to how databases actually work. If the types of queries you're running don't require sorted data, you could happily use a HashMap instead.
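The structure described above can be sketched as a list of rows plus one map per column, where each map goes from a cell value to the indices of the rows containing it. This is only an illustration under assumptions: the class IndexedTable and its methods are hypothetical names, all cells are treated as strings, and a HashMap is used because the example needs only equality lookups. A query on several columns intersects the row-index lists of the individual columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rows in a list, plus one HashMap index per column
// mapping a cell value to the indices of the rows that contain it.
class IndexedTable {
    private final String[] columns;
    private final List<String[]> rows = new ArrayList<>();
    private final Map<String, Map<String, List<Integer>>> indexes = new HashMap<>();

    IndexedTable(String... columns) {
        this.columns = columns;
        for (String column : columns) {
            indexes.put(column, new HashMap<String, List<Integer>>());
        }
    }

    void addRow(String... values) {
        if (values.length != columns.length) {
            throw new IllegalArgumentException("Expected " + columns.length + " values");
        }
        int rowIndex = rows.size();
        rows.add(values);
        for (int i = 0; i < columns.length; i++) {
            indexes.get(columns[i])
                    .computeIfAbsent(values[i], k -> new ArrayList<Integer>())
                    .add(rowIndex);
        }
    }

    // Returns the rows that match every column=value pair in the criteria.
    // Empty criteria return no rows.
    List<String[]> query(Map<String, String> criteria) {
        List<Integer> candidates = null;
        for (Map.Entry<String, String> entry : criteria.entrySet()) {
            Map<String, List<Integer>> index = indexes.get(entry.getKey());
            if (index == null) {
                throw new IllegalArgumentException("Unknown column: " + entry.getKey());
            }
            List<Integer> hits = index.get(entry.getValue());
            if (hits == null) {
                return new ArrayList<String[]>(); // one criterion has no match at all
            }
            if (candidates == null) {
                candidates = new ArrayList<Integer>(hits);
            } else {
                candidates.retainAll(hits); // intersect with earlier criteria
            }
        }
        List<String[]> result = new ArrayList<>();
        if (candidates != null) {
            for (int rowIndex : candidates) {
                result.add(rows.get(rowIndex));
            }
        }
        return result;
    }
}
```

With the table from the question, a criteria map of LastName=qwe and Phone=342 would return the single pqr row. Swapping the inner HashMap for a TreeMap would additionally allow range queries, as the answer notes.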

String Array into arraylist?

I have a CSV file with an unknown number of columns and rows. The only thing I know is that each entry is separated by a comma. Can I use the split method to convert each line of the data into an array, and can I then store that array in an ArrayList? One thing that concerns me is whether I would be able to sort the ArrayList alphabetically or numerically.

I would suggest using OpenCSV. If you just split on the comma separator, and you happen to have a single cell text containing a comma, but which is enclosed in double quotes to make it clear that it's a single cell, the split method won't work:
1, "I'm a single cell, with a comma", 2
3, hello, 4
OpenCSV will let you read each line as an array of Strings, handling this problem for you, and you can of course store each array inside a List. You will need a custom comparator to sort your list of lines. Search StackOverflow: the question of how to sort a list comes back twice a day here.

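Yes, you can use:
String[] array = input.split("\",\"");
List<String> words = new ArrayList<String>(Arrays.asList(array));
Note that Arrays.asList(..) also returns a List, but it is a fixed-size view of the array, so you can't add or remove elements.
Also note that the above split is on "," (quote, comma, quote), because CSVs usually look like this:
"foo","foo, bar"
With this split, the first and last elements keep their outer quote characters, which you need to strip separately.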

Using split with a simple comma is not foolproof. If your column data contains a comma, the CSV would be stored as something like a,"b,x",c. In that case split would fail.
I'm not a regex expert; maybe someone could write an EMBEDDED_COMMA_DETECTING_REGEX, or GIYF.
String[] array = input.split(EMBEDDED_COMMA_DETECTING_REGEX);
List<String> words = new ArrayList<String>(Arrays.asList(array));

There are several questions here so I'll cover each point individually.
Can I use the split method convert each line of the data into an array
This would work as you expect in the naive case. However, it doesn't know anything about escaping; so if a comma is embedded within a field (and properly escaped, usually by double-quoting the whole field) the simple split won't do the job here and will chop the field in two.
If you know you'll never have to deal with embedded commas, then calling line.split(",") is acceptable. The real solution however is to write a slightly more involved parse method which keeps track of quotes, and possibly backslash escapes etc.
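As a rough illustration of a parse method that keeps track of quotes, the sketch below walks the line character by character and only treats a comma as a separator when it is outside double quotes. The class name CsvLine is hypothetical. It assumes quotes inside a field are escaped by doubling them, that a record never spans several lines, and it does not trim whitespace or handle backslash escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line on commas that are outside double quotes.
// Assumes embedded quotes are written as "" and that a record never spans lines.
class CsvLine {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // commas inside quotes are ordinary characters
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }
}
```

For real-world files a library such as OpenCSV remains the safer choice, since it also covers multi-line records and other dialect details that this sketch ignores.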
...into an array than can I store that Array into an Arraylist
You certainly could have an ArrayList<String[]>, but that doesn't strike me as particularly useful. A better approach would be to write a simple class for whatever it is the CSV lines are representing, and then create instances of that class when you're parsing each line. Something like this perhaps:
public class Order {
private final int orderId;
private final String productName;
private final int quantity;
private final BigDecimal price;
// Plus constructor, getters etc.
}
private Order parseCsvLine(String line) {
String[] fields = line.split(",");
// TODO validation of input/error checking
final int orderId = Integer.parseInt(fields[0]);
final String productName = fields[1];
final int quantity = Integer.parseInt(fields[2]);
final BigDecimal price = new BigDecimal(fields[3]);
return new Order(orderId, productName, quantity, price);
}
Then you'd have a list of Orders, which more accurately represents what you have in the file (and in memory) than a list of string-arrays.
One of the things that concerns me is would I be able to rearrange the Arraylist according alphabetically or numerically?
Sure - the standard collections support a sort method, into which you can pass an instance of Comparator. This takes two instances of the object in the list, and decides which one comes before the other.
So following on from the above example, if you have a List<Order> you can pass in whatever comparator you want to sort it, for example:
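final Comparator<Order> quantityAsc = new Comparator<Order>() {
    public int compare(Order o1, Order o2) {
        // smaller order comes before bigger one; Integer.compare avoids overflow
        return Integer.compare(o1.getQuantity(), o2.getQuantity());
    }
};
final Comparator<Order> productDesc = new Comparator<Order>() {
    public int compare(Order o1, Order o2) {
        // orders without a product name sort last
        if (o1.getProductName() == null) {
            return o2.getProductName() == null ? 0 : 1;
        }
        if (o2.getProductName() == null) {
            return -1;
        }
        return o2.getProductName().compareTo(o1.getProductName());
    }
};
final List<Order> orders = ...; // populated by parsing the CSV
// Collections.sort sorts in place and returns void, so sort copies of the list.
// getQuantity() and getProductName() are assumed to be among the getters of Order.
final List<Order> ordersByQuantity = new ArrayList<Order>(orders);
Collections.sort(ordersByQuantity, quantityAsc);
final List<Order> ordersByProductZToA = new ArrayList<Order>(orders);
Collections.sort(ordersByProductZToA, productDesc);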

Why does my XML parser only returns one string, instead of multiple ones?

I got a problem regarding parsing XML data. I have divided my program into 3 different java files, each containing a class. One of them is rssparser.java. This file holds a function called iterateRSSFeed(String URL), this function returns a string containing the parsed description tag. In my main.java files where my main method is, I call this iterateRSSFeed function this way:
rssparser r = new rssparser();
String description = r.iterateRSSFeed();
And then I am planning to add this String to a JLabel, this way:
JLabel news = new JLabel(description);
which obviously works great; my program runs. BUT there are more description tags in my XML file, and the JLabel only contains one (1) parsed description tag. I should say that the return statement in my iterateRSSFeed function is wrapped in a for loop, which in my head should return all of the description tags. But no.
Please ask if something is unclear, or if showing the source code would be a better way to get a solution to my problem. Thanks in advance! :)

When Java executes a return statement, it will leave the method, and not continue running the loop.
If you want to return multiple values from a method, you have to put them in some object grouping them together. Normally one would use a List<String> as return type.
Then your loop will fill the list, and the return statement (after the loop) can return the whole list at once.
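A minimal sketch of that pattern, assuming the feed is available as an XML string and using the JDK's DOM parser: the loop only adds to the list, and the single return comes after the loop. The class and method names here are hypothetical and are not the asker's rssparser code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the original rssparser source is not shown.
class RssDescriptions {
    // Collects the text of every <description> element and returns them all at once.
    static List<String> descriptions(String xml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("description");
            for (int i = 0; i < nodes.getLength(); i++) {
                // No return inside the loop: just remember each value.
                result.add(nodes.item(i).getTextContent());
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result; // single return, after the loop has finished
    }
}
```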
If you want to have one large string instead of multiple ones, you'll have to merge them into one.
The easiest would be to simply use the .toString() method on the list, this will give (if you are using the default list implementations) something like [element1, element2, element3].
If you don't like the [,], you could simply concatenate them:
List<String> list = r.iterateRSSFeed();
StringBuilder b = new StringBuilder();
for(String s : list) {
b.append(s);
}
String description = b.toString();
This will give element1element2element3.
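If a separator between the elements is wanted, String.join (available since Java 8) is a shorter alternative to the loop. The wrapper class and method below are hypothetical and only there to make the sketch self-contained.

```java
import java.util.List;

// Hypothetical wrapper around String.join, for illustration only.
class JoinExample {
    // Joins the items with ", " between them, without surrounding brackets.
    static String joinWithCommas(List<String> items) {
        return String.join(", ", items);
    }
}
```

For the three example elements this gives element1, element2, element3.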
As Java's JLabel has some rudimentary HTML support, you could also use this to format your list as a list:
List<String> list = r.iterateRSSFeed();
StringBuilder b = new StringBuilder();
b.append("<html><ul>");
for(String s : list) {
b.append("<li>");
b.append(s);
b.append("</li>");
}
b.append("</ul>");
String description = b.toString();
The result will be <html><ul><li>element1</li><li>element2</li><li>element3</li></ul>, which will be formatted by the JLabel as something like this:
element1
element2
element3

Groovy / Java method for converting nested List String representation back to List

I need to convert a String representation of a nested List back to a nested List (of Strings) in Groovy / Java, e.g.
String myString = "[[one, two], [three, four]]"
List myList = isThereAnyMethodForThis(myString)
I know that there's the Groovy .split method for splitting Strings by comma for example and that I could use regular expressions to identify nested Lists between [ and ], but I just want to know if there's an existing method that can do this or if I have to write this code myself.
I guess the easiest thing would be a List constructor that takes the String representation as an argument, but I haven't found anything like this.

In Groovy, if your strings are delimited as such, you can do this:
String myString = "[['one', 'two'], ['three', 'four']]"
List myList = Eval.me(myString)
However, if they are not delimited like in your example, I think you need to start playing with the shell and a custom binding...
class StringToList extends Binding {
def getVariable( String name ) {
name
}
def toList( String list ) {
new GroovyShell( this ).evaluate( list )
}
}
String myString = "[[one, two], [three, four]]"
List myList = new StringToList().toList( myString )
Edit to explain things
The Binding in Groovy "Represents the variable bindings of a script which can be altered from outside the script object or created outside of a script and passed into it."
So here, we create a custom binding which returns the name of the variable when a variable is requested (think of it as setting the default value of any variable to the name of that variable).
We set this as being the Binding that the GroovyShell will use for evaluating variables, and then run the String representing our list through the Shell.
Each time the Shell encounters one, two, etc., it assumes it is a variable name, and goes looking for the value of that variable in the Binding. The binding simply returns the name of the variable, and that gets put into our list
Another edit... I found a shorter way
You can use Maps as Binding objects in Groovy, and you can add a withDefault closure to a Map so that when a key is missing, the result of the closure is returned as the default value for that key.
This means, we can cut the code down to:
String myString = "[[one, two], [three, four]]"
Map bindingMap = [:].withDefault { it }
List myList = new GroovyShell( bindingMap as Binding ).evaluate( myString )
As you can see, the Map (thanks to withDefault) returns the key that was passed to it if it is missing from the Map.

I would parse this String manually. Each time you see a '[' create a new List, each time you see a ',' add an element to the list and each time you see a ']' return.
With a recursive method.
public int parseListString(String listString, int currentOffset, List<Object> list) {
    while (currentOffset < listString.length()) {
        char c = listString.charAt(currentOffset);
        if (c == '[') {
            // If there is a [ we need a new List
            List<Object> newList = new ArrayList<Object>();
            currentOffset = parseListString(listString, currentOffset + 1, newList);
            list.add(newList);
        } else if (c == ']') {
            // If it's a ], then the list is ended
            return currentOffset + 1;
        } else if (c == ',' || c == ' ') {
            // Skip the separators between elements
            currentOffset++;
        } else {
            // Here we have a string, parse it until next ',' or ']'
            int nextOffset = currentOffset;
            while (nextOffset < listString.length()
                    && listString.charAt(nextOffset) != ','
                    && listString.charAt(nextOffset) != ']') {
                nextOffset++;
            }
            list.add(listString.substring(currentOffset, nextOffset).trim());
            // increment currentOffset
            currentOffset = nextOffset;
        }
    }
    return currentOffset;
}
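A self-contained variant of the same recursive idea, which returns the parsed list directly instead of an offset, could look like the sketch below. The class name NestedListParser is hypothetical. It assumes that elements contain no commas or brackets, and it trims whitespace around each element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical self-contained variant of the recursive approach.
// Assumes elements contain no commas or brackets; surrounding whitespace is trimmed.
class NestedListParser {
    private final String text;
    private int pos = 0;

    private NestedListParser(String text) {
        this.text = text;
    }

    static List<Object> parse(String input) {
        NestedListParser parser = new NestedListParser(input.trim());
        if (parser.text.isEmpty() || parser.text.charAt(0) != '[') {
            throw new IllegalArgumentException("Expected '[' at the start");
        }
        parser.pos = 1; // step over the outermost '['
        return parser.parseList();
    }

    // Parses elements up to and including the closing ']' of the current list.
    private List<Object> parseList() {
        List<Object> list = new ArrayList<>();
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '[') {
                pos++;
                list.add(parseList()); // nested list
            } else if (c == ']') {
                pos++;
                return list; // current list is complete
            } else if (c == ',' || Character.isWhitespace(c)) {
                pos++; // skip separators
            } else {
                int start = pos;
                while (pos < text.length()
                        && text.charAt(pos) != ','
                        && text.charAt(pos) != ']') {
                    pos++;
                }
                list.add(text.substring(start, pos).trim());
            }
        }
        throw new IllegalArgumentException("Missing closing ']'");
    }
}
```

Calling NestedListParser.parse("[[one, two], [three, four]]") yields a list of two lists of strings. Unlike the Groovy Eval-based answers, nothing in the input is executed as code.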
