Parsing file and detection of pattern in Java

My problem is quite simple: I am parsing a CSV file line by line, and I want to get the values of the columns.
The separator is simply ";", but my file can have quite a lot of columns, and they won't always be in the same order.
As an example, my .CSV file looks like this:
time;columnA;columnB;columnC
27-08-2013 14:43:00; this is a text; this too; same here
I would like to be able to get all the values of time, columnA, columnB and columnC.
What would be the easiest way?
I used StringUtils.countMatches(input, ";"); to get the number of separators I have.
I started by trying to make a String index[][] = {{"time", "columnA", "columnB", "columnC"}, {}};
My plan was to fill this with the number of ";" before each variable, so that I could easily know which result stands for which variable.
But now I'm quite stuck...
If you want me to show more of my code, I can.
I hope someone can help me! :)
(Sorry for the poor English.)

You can simply use the split() method.
For instance:
Scanner aScanner = ...
ArrayList<String> L = new ArrayList<String>();
while (aScanner.hasNextLine()) {
    L.add(aScanner.nextLine());
}
int rows = L.size();
String[][] S = new String[rows][];
for (int i = 0; i < rows; i++) {
    S[i] = L.get(i).split(";");
}
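Since the columns may appear in a different order from file to file, one way to build on split() is to read the header line first and map each column name to its position. The sketch below is only an illustration: it assumes the first line is a header that uses ";" throughout and that no value contains a ";" itself. The class and method names (CsvHeaderLookup, indexHeader, valueOf) are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class CsvHeaderLookup {

    // Maps each column name in the header line to its position, e.g. "time" -> 0.
    static Map<String, Integer> indexHeader(String headerLine) {
        Map<String, Integer> columnIndex = new HashMap<>();
        String[] names = headerLine.split(";", -1);
        for (int i = 0; i < names.length; i++) {
            columnIndex.put(names[i].trim(), i);
        }
        return columnIndex;
    }

    // Returns the trimmed value of the named column in one data line,
    // or null if the column is unknown or the line has too few cells.
    static String valueOf(String dataLine, Map<String, Integer> columnIndex, String column) {
        Integer position = columnIndex.get(column);
        String[] cells = dataLine.split(";", -1);
        if (position == null || position >= cells.length) {
            return null;
        }
        return cells[position].trim();
    }

    public static void main(String[] args) {
        String header = "time;columnA;columnB;columnC";
        String row = "27-08-2013 14:43:00; this is a text; this too; same here";
        Map<String, Integer> columnIndex = indexHeader(header);
        System.out.println(valueOf(row, columnIndex, "time"));
        System.out.println(valueOf(row, columnIndex, "columnB"));
    }
}
```

With this approach the lookup is by name, so the order of the columns in the file no longer matters.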

Related

Unable to customize String format to print column-wise features

I am trying to print the results of my computation into a JTextArea in columns, similar to a table. I have three features (three columns) and I use the following String format solution that I found here, but it does not align them properly. Since I don't understand "%1$5s %2$-40s %3$-20s";, could someone kindly fix it?
String format = "%1$5s %2$-40s %3$-20s";
for (int i = 0; i < 10; i++) {
    String s = String.format(format, i*5, km.runningTime[i], km.DB[i]);
    jtextarea.append(s + "\n");
}
You can find the explanations in the Format String Syntax section of Oracle's Formatter documentation.
For example, %1$5s means: first argument (counting starts at 1, not 0), 5 is the minimum width, and s means string. In %2$-40s, the minus sign means that the result will be left-justified. I'll let you work out the last part.
I hope it helps.
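To see what the format string actually produces, here is a small sketch with made-up sample values (the real km.runningTime and km.DB values are not shown in the question). The class name FormatDemo and the helper formatRow are hypothetical. Note that padding with spaces only lines up visually when the text is shown in a monospaced font, so the font of the text area may matter as much as the format string.

```java
final class FormatDemo {
    // Same format string as in the question: a right-aligned column of width 5,
    // then left-aligned columns of width 40 and 20, separated by single spaces.
    static final String FORMAT = "%1$5s %2$-40s %3$-20s";

    static String formatRow(Object first, Object second, Object third) {
        return String.format(FORMAT, first, second, third);
    }

    public static void main(String[] args) {
        // Brackets make the padding visible.
        System.out.println("[" + formatRow(5, 0.25, "db-1") + "]");
        System.out.println("[" + formatRow(100, 12.5, "db-2") + "]");
    }
}
```

Each row comes out at least 67 characters long (5 + 1 + 40 + 1 + 20); a value longer than its column width is not truncated, it just pushes the following columns to the right.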

parsing values from text file in java

I've got some text files I need to extract data from. The file itself contains around a hundred lines and the interesting part for me is:
AA====== test==== ====================================================/
AA normal low max max2 max3 /
AD .45000E+01 .22490E+01 .77550E+01 .90000E+01 .47330E+00 /
Say I need to extract the double values under "normal", "low" and "max". Is there any efficient and not-too-error-prone solution other than regexing the hell out of the text file?
If you really want to avoid regexes, and assuming you'll always have this same basic format, you could do something like:
HashMap<String, Double> map = new HashMap<>();
// Wrap the path in a java.io.File; new Scanner(String) would scan the path text itself. Or use your preferred input mechanism.
Scanner scan = new Scanner(new File(filePath));
// Consume the top line outside the assert, because assertions can be disabled at runtime
String topLine = scan.nextLine();
assert topLine.startsWith("AA===="); // ensure it is the top line
while (scan.hasNextLine()) {
    String[] headings = scan.nextLine().trim().split("\\s+"); // ("\t") can be used if you're sure the delimiters will always be tabs
    String[] vals = scan.nextLine().trim().split("\\s+");
    assert headings[0].equals("AA"); // ensure this is a heading line
    assert vals[0].equals("AD"); // ensure this is a value line
    // start with 1 to skip the "AA"/"AD" tag, stop before the trailing "/"
    for (int i = 1; i < headings.length - 1; i++) {
        map.put(headings[i], Double.parseDouble(vals[i]));
    }
}
// to make sure a certain value is contained in the map:
assert map.containsKey("normal");
// use it:
double normalValue = map.get("normal");
Code is untested as I don't have access to an IDE at the moment. Also, I obviously don't know what's variable and what will remain constant here (read: the "AD", "AA", etc.), but hopefully you get the gist and can modify as needed.
If each line will always have this exact form, you can use String.split(). Note that split() takes a regex, so the dot must be escaped:
String line = ...; // Fill with one value line from the file
String[] cols = line.split("\\."); // cols[0] is the "AD " tag before the first dot
String normal = "." + cols[1].trim();
String low = "." + cols[2].trim();
String max = "." + cols[3].trim();
If you know at what index each value starts, you can just take substrings of the row. (The split method technically does a regex match.)
i.e.
String normal = line.substring(x, y).trim();
String low = line.substring(z, w).trim();
etc.

Java simple line parser

I can see a bunch of Java parsers like OpenCSV, ANTLR, JSaPar, etc., but I don't see any with the ability to specify both a custom line separator and a custom column separator. Are there any such easy-to-use libraries? I don't want to write one using Scanner or StringTokenizer right now!
E.g. A | B || C | D || E | F
I want to break the string above into something like {{A,B},{C,D},{E,F}}
You can parse it yourself; it's quite simple to achieve. I haven't tested this code, so you may want to try it yourself.
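// requires: import java.util.regex.Pattern;
String lineDelimiter = "||";
String columnDelimiter = "|";
// split() takes a regex and "|" is a regex metacharacter, so quote the delimiters
String[] rows = str.split(Pattern.quote(lineDelimiter));
for (int i = 0; i < rows.length; i++) {
    String[] columns = rows[i].split(Pattern.quote(columnDelimiter));
    for (int j = 0; j < columns.length; j++) {
        // Do something with columns[j].trim() here
    }
}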
Actually, with JSaPar you can have any character sequence for both line delimiter as well as cell delimiter. You specify which delimiter to use within your schema and it can be any number of characters.
The problem you will face by using the same character in both is that the parser will not know if you have a line break or if it is just an empty cell.
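As a self-contained sketch of the hand-rolled approach, the following splits on literal delimiters and trims each cell. It is an illustration only, not how JSaPar works internally: the class name DelimitedParser and the method parse are hypothetical, and Pattern.quote is used so that split(), which expects a regex, treats "|" and "||" as plain text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class DelimitedParser {
    // Splits text into rows on lineDelimiter, then each row into trimmed cells
    // on columnDelimiter. Both delimiters are matched literally.
    static List<List<String>> parse(String text, String lineDelimiter, String columnDelimiter) {
        List<List<String>> table = new ArrayList<>();
        for (String row : text.split(Pattern.quote(lineDelimiter), -1)) {
            List<String> cells = new ArrayList<>();
            for (String cell : row.split(Pattern.quote(columnDelimiter), -1)) {
                cells.add(cell.trim());
            }
            table.add(cells);
        }
        return table;
    }

    public static void main(String[] args) {
        // prints [[A, B], [C, D], [E, F]]
        System.out.println(parse("A | B || C | D || E | F", "||", "|"));
    }
}
```

The ambiguity described above still applies: with these delimiters, an input such as "A||B" is read as two rows, never as one row with an empty middle cell.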

Java: How To Grab Each nth Lines From a String

I'm wondering how I could grab every nth line from a String, say every 100th, with the lines in the String being separated by a '\n'.
This is probably a simple thing to do but I really can't think of how to do it, so does anybody have a solution?
Thanks much,
Alex.
UPDATE:
Sorry I didn't explain my question very well.
Basically, imagine there's a 350-line file. I want to grab the start and end of each 100-line chunk. Pretending each line is 10 characters long, I'd finish with 2 separate arrays (containing start and end indexes) like this:
(Lines 0-100) 0-1000
(Lines 100-200) 1000-2000
(Lines 200-300) 2000-3000
(Lines 300-350) 3000-3500
So then if I wanted to mess around with say the second set of 100 lines (100-200) I have the regions for them.
You can split the string into an array using split() and then just get the indexes you want, like so:
String[] strings = myString.split("\n");
int nth = 100;
for (int i = nth; i < strings.length; i += nth) { // i += nth; "i + nth" alone does not compile
    System.out.println(strings[i]);
}
String newLine = System.getProperty("line.separator");
String lines[] = text.split(newLine);
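Here text is the string holding your whole text. (This assumes the text uses the platform line separator; if the lines are always separated by "\n", split on "\n" instead.)
Now to get the nth line, do e.g.: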
System.out.println(lines[nth - 1]); // Minus one, because arrays in Java are zero-indexed
One approach is to create a StringReader from the string, wrap it in a BufferedReader and use that to read lines. Alternatively, you could just split on \n to get the lines, of course...
String[] allLines = text.split("\n");
List<String> selectedLines = new ArrayList<String>();
for (int i = 0; i < allLines.length; i += 100)
{
    selectedLines.add(allLines[i]);
}
This is simpler code than using a BufferedReader, but it does mean having the complete split string in memory (as well as the original, at least temporarily, of course). It's also less flexible in terms of being adapted to reading lines from other sources such as a file. But if it's all you need, it's pretty straightforward :)
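For comparison, the reader-based alternative mentioned above could look roughly like this. It is a sketch only; the class name EveryNthLine and the method select are made up for illustration. It reads one line at a time, so it would adapt to a FileReader with little change.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class EveryNthLine {
    // Returns lines 0, n, 2n, ... of text, reading one line at a time.
    static List<String> select(String text, int n) {
        List<String> selected = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                if (lineNumber % n == 0) {
                    selected.add(line);
                }
                lineNumber++;
            }
        } catch (IOException e) {
            // Not expected for an in-memory StringReader; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return selected;
    }

    public static void main(String[] args) {
        // prints [a, c, e]
        System.out.println(select("a\nb\nc\nd\ne", 2));
    }
}
```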
EDIT: If the start indexes are needed too, it becomes slightly more complicated... but not too bad. You probably want to encapsulate the "start and line" in a single class, but for the sake of brevity:
String[] allLines = text.split("\n");
List<String> selectedLines = new ArrayList<String>();
List<Integer> selectedIndexes = new ArrayList<Integer>();
int index = 0;
for (int i = 0; i < allLines.length; i++)
{
    if (i % 100 == 0)
    {
        selectedLines.add(allLines[i]);
        selectedIndexes.add(index);
    }
    index += allLines[i].length() + 1; // length() is a method on String; add 1 for the trailing "\n"
}
Of course given the start index and the line, you can get the end index just by adding the line length :)
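If what is needed is the start and end offset of each whole 100-line chunk, as in the update to the question, a variation is to walk the string with indexOf and record a range every time a chunk fills up. This is a sketch under the assumption that lines are separated by '\n' and that the end offset is exclusive; LineChunks and chunkRanges are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LineChunks {
    // Returns {start, end} character offsets (end exclusive) for each block of
    // linesPerChunk lines; the last block may contain fewer lines.
    static List<int[]> chunkRanges(String text, int linesPerChunk) {
        List<int[]> ranges = new ArrayList<>();
        int chunkStart = 0;
        int linesInChunk = 0;
        int position = 0;
        while (position < text.length()) {
            int newline = text.indexOf('\n', position);
            // Move past this line, including its '\n' if there is one.
            position = (newline == -1) ? text.length() : newline + 1;
            linesInChunk++;
            if (linesInChunk == linesPerChunk) {
                ranges.add(new int[] {chunkStart, position});
                chunkStart = position;
                linesInChunk = 0;
            }
        }
        if (linesInChunk > 0) {
            ranges.add(new int[] {chunkStart, position});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 350 lines of 9 characters plus '\n', i.e. 10 characters per line.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 350; i++) {
            text.append("123456789\n");
        }
        // prints 0-1000, 1000-2000, 2000-3000 and 3000-3500, one per line
        for (int[] range : chunkRanges(text.toString(), 100)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

A chunk can then be pulled out with text.substring(range[0], range[1]).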

Get csv and compare lines. ArrayList? Java

I don't use Java very often and now I have a problem.
I want to read a CSV file like this one:
A,B,C,D
A,B,F,K
E,F,S,A
A,B,C,S
A,C,C,S
Java doesn't have dynamic arrays, so I chose an ArrayList. This works so far. The problem is:
How can I store the ArrayList? I think another ArrayList would help.
This is what I got:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(this.getClass().getResourceAsStream(
                "../data/" + filename + ".csv")));
List rows = new ArrayList();
String line;
while ((line = reader.readLine()) != null) {
    rows.add(Arrays.asList(line.split(",")));
}
Now I get an ArrayList with a size of 5 for rows.size().
How do I get row[0][0], for example?
What do I want to do? I want to find rows that are the same except for the last column.
For example, I want to find row 0 and row 3.
Thank you very much.
Thank you all! You helped me a lot. =) Maybe Java and I will become friends =) THANKS!
You don't need to know the row size in advance, String.split() returns a String array:
List<String[]> rows = new ArrayList<String[]>();
String line = null;
while ((line = reader.readLine()) != null)
    rows.add(line.split(",", -1));
To access a specific row:
int len = rows.get(0).length;
String val = rows.get(0)[0];
Also, are you always comparing by the entire row except the last column? You could just take off the last value (line.replaceFirst(",[^,]*$", ""), which removes everything from the last comma onward) and compare the rows as strings (you have to be careful of whitespace and other formatting, of course).
A slightly different way:
Set<String> rows = new HashSet<String>();
String line = null;
while ((line = reader.readLine()) != null) {
    if (!rows.add(line.substring(0, line.lastIndexOf(','))))
        System.out.println("duplicate found: " + line);
}
Of course, modify as necessary if you actually need to capture the matching lines.
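If the matching rows themselves are needed, one possible modification is to key a map on everything before the last comma and collect the row numbers for each key. This is only a sketch; RowGrouper and groupByPrefix are names invented for the example, and rows without any comma are simply skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowGrouper {
    // Groups row numbers by the row text up to (not including) the last comma.
    static Map<String, List<Integer>> groupByPrefix(List<String> lines) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int lastComma = line.lastIndexOf(',');
            if (lastComma < 0) {
                continue; // single-column row: nothing to compare
            }
            String prefix = line.substring(0, lastComma);
            groups.computeIfAbsent(prefix, key -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A,B,C,D", "A,B,F,K", "E,F,S,A", "A,B,C,S", "A,C,C,S");
        // prints A,B,C -> rows [0, 3]
        groupByPrefix(lines).forEach((prefix, rowNumbers) -> {
            if (rowNumbers.size() > 1) {
                System.out.println(prefix + " -> rows " + rowNumbers);
            }
        });
    }
}
```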
You'll need to declare an ArrayList of arrays. Assuming the CSV file has a known number of columns, the only dynamic list needed here is the "rows" of your "table", formed by an ArrayList (rows) of char[] arrays (columns). (If not, then an ArrayList of ArrayLists is fine.)
It's just like a 2D table in any other language: an array of arrays. Just that in this case one of the arrays needs to be dynamic.
To read the file you'll need two loops. One that reads each line, just as you're doing, and another one that reads char per char.
Just a quick note: if you are going to declare an array like this:
char[] row = new char[5];
and then going to add each row to the ArrayList like this:
yourList.add(row);
If you reuse that one array for every row, you will have a list full of references to the same array. You'll need to add a copy using the .clone() method, like this:
yourList.add(row.clone());
To access it like table[1][2], use arraylist.get(1)[2] for a list of arrays, or arraylist.get(1).get(2) for a list of lists.
