I have a txt file where some columns do not appear in every row, which causes a problem: in the rows where they do appear, they mess up the order of my columns:
35=d|5799=00000000|980=A|779=20190721173046000465|1180=310|1300=64|462=5|207=XCME|1151=ES|6937=ES|55=ESM0|48=163235|22=8|167=FUT|461=FFIXSX|200=202006|15=USD|1142=F|562=1|1140=3000|969=25.000000000|9787=0.010000000|996=IPNT|1147=50.000000000|1150=302775.000000000|731=00000110|5796=20190724|1149=315600.000000000|1148=285500.000000000|1143=600.000000000|1146=12.500000000|9779=N|864=2|865=5|1145=20190315133000000000|865=7|1145=20200619133000000000|1141=1|1022=GBX|264=10|870=1|871=24|872=00000000000001000010000000001111|1234=0|5791=279|5792=10121|
35=d|5799=00000000|980=A|779=20190721173046000465|1180=310|1300=64|462=5|207=XCME|1151=ES|6937=ES|55=ESU9|48=191262|22=8|167=FUT|461=FFIXSX|200=201909|15=USD|1142=F|562=1|1140=3000|969=25.000000000|9787=0.010000000|996=IPNT|1147=50.000000000|1150=302150.000000000|731=00000110|5796=20190724|1149=315700.000000000|1148=285600.000000000|1143=600.000000000|1146=12.500000000|9779=N|864=2|865=5|1145=20180615133000000000|865=7|1145=20190920133000000000|1141=1|1022=GBX|264=10|870=1|871=24|872=00000000000001000010000000001111|1234=0|5791=250519|5792=452402|
35=d|5799=00000000|980=A|779=20190721173046000465|1180=310|1300=64|462=5|207=XCME|1151=$E|6937=0ES|55=0ESQ9|48=229588|22=8|167=FUT|461=FFIXSX|200=201908|15=USD|1142=F|562=1|1140=3000|969=25.000000000|9787=0.010000000|996=IPNT|1147=50.000000000|1150=25.000000000|731=00000011|5796=20190607|1143=0.000000000|1146=12.500000000|9779=N|864=2|865=5|1145=20190621133000000000|865=7|1145=20190816133000000000|1141=1|1022=GBX|264=10|870=1|871=24|872=00000000000001000010000000001111|1234=0|
35=d|5799=00000000|980=A|779=20190721173114000729|1180=441|1300=56|462=16|207=DUMX|1151=1O|6937=OQE|55=OQEH4 C6100|48=1546|22=8|167=OOF|461=OCEFPS|201=1|200=202403|15=USD|202=6100.000000000|947=USD|9850=0.100000000|1142=F|562=1|1140=999|969=1.000000000|1146=10.000000000|9787=0.010000000|996=BBL|1147=1000.000000000|731=00000001|1148=0.100000000|9779=N|5796=20190718|864=2|865=5|1145=20181031213000000000|865=7|1145=20240126193000000000|1141=1|1022=GBX|264=3|870=1|871=24|872=00000000000001000000000100000101|1234=1|1093=4|1231=1.0000|711=1|309=211120|305=8|311=OQDH4|1647=0|
35=d|5799=00000000|980=A|779=20190721173115000229|1180=441|1300=56|462=16|207=DUMX|1151=1O|6937=OQE|55=OQEM4 C5700|48=2053|22=8|167=OOF|461=OCEFPS|201=1|200=202406|15=USD|202=5700.000000000|947=USD|9850=0.100000000|1142=F|562=1|1140=999|969=1.000000000|1146=10.000000000|9787=0.010000000|996=BBL|1147=1000.000000000|731=00000001|1148=0.100000000|9779=N|5796=20190718|864=2|865=5|1145=20181031213000000000|865=7|1145=20240425183000000000|1141=1|1022=GBX|264=3|870=1|871=24|872=00000000000001000000000100000101|1234=1|1093=4|1231=1.0000|711=1|309=329748|305=8|311=OQDM4|1647=0|
For example, in the first three rows 461=… is always followed directly by 200=…, while from the 4th row onwards there is 201=… between 461=… and 200=…
Now my idea is to move every column that appears in a later row but was not present in the first row to the end of its row, so that it becomes the last column, but I do not know how to implement exactly this operation. Here is what I have tried:
// list of all column tags seen so far, in order of first appearance
private static List<String> numbers = new ArrayList<>();

private static void ladeDatei(String datName) {
    File file = new File(datName);
    if (!file.canRead() || !file.isFile())
        System.exit(0);
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(datName));
        String row = null;
        String row2 = null;
        while ((row = in.readLine()) != null) {
            System.out.println("Read line: " + row);
            while (row.contains("|")) {
                // cut off everything up to and including the next '|'
                row2 = row.substring(row.indexOf("|") + 1);
                row = row2;
                // extract the tag before the next '='
                row2 = row.substring(0, row.indexOf("=") + 1);
                row2 = row2.replace("=", "");
                if (!numbers.contains(row2)) {
                    numbers.add(row2);
                }
                System.out.println(row);
                //System.out.println(row2);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (in != null)
            try {
                in.close();
            } catch (IOException e) {
            }
    }
}
I thought about splitting every row at | and saving the parts in the textArr list, but then I wouldn't know which parts belong to which row. My main problem is that I don't know a good way to check whether a column already exists in an earlier row, nor how to move it to the end of the row.
EDIT: I now save every new tag in the numbers ArrayList (see my edit in the code above), but I am stuck because I don't know how to shift those columns, and all the ones that come after them, to the end of each row.
That's a hell of a job. What I would do is:
(1) split the lines at |
(2) make a List where you append the numbers between | and = (append each new number at the end)
(3) make a Map where the line parts are mapped to the numbers from (2) as keys
(4) make a second Map where the max column widths of the line parts are mapped to the numbers from (2)
(5) read through the List from (2), joining the associated line parts with |, padded to the max column widths
(if there is no line part for a specific number, you must do the padding as well)
A sketch of this follows below. Whenever possible, I would prefer to structure the line parts in an HTML table.
The change of the column order will not solve the problem of wider or narrower columns.
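A minimal sketch of steps (1) to (3) plus the rejoining from (5), skipping the padding and assuming the order of first appearance is acceptable as the global column order (the file name is a placeholder; note also that the sample rows repeat tags such as 865 and 1145, which a plain Map collapses, so repeated tags would need extra handling):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnAligner {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("input.txt"));

        List<String> tagOrder = new ArrayList<>();          // (2) tags in order of first appearance
        List<Map<String, String>> rows = new ArrayList<>(); // (3) line parts keyed by tag, per row
        for (String line : lines) {
            Map<String, String> row = new LinkedHashMap<>();
            for (String part : line.split("\\|")) {         // (1) split the line at |
                if (part.isEmpty()) continue;
                String tag = part.substring(0, part.indexOf('='));
                row.put(tag, part);
                if (!tagOrder.contains(tag)) {
                    tagOrder.add(tag);                      // new tags go to the end
                }
            }
            rows.add(row);
        }

        // (5) rejoin each row following the global tag order;
        // tags missing from a row are simply skipped here instead of padded
        for (Map<String, String> row : rows) {
            StringBuilder sb = new StringBuilder();
            for (String tag : tagOrder) {
                String part = row.get(tag);
                if (part != null) {
                    sb.append(part).append('|');
                }
            }
            System.out.println(sb);
        }
    }
}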
I am building a small Java utility (using Jackson) to catch errors in JSON files. One part of it is a text area into which you can paste some JSON content, and it will tell you the line and column where it found an error.
I am using the error message to extract the line and column as strings and print them out in the interface for whoever is using it.
This is the JSON sample I'm working with, and there is an intentional error beside "age", where it's missing a colon:
{
    "name": "mkyong.com",
    "messages": ["msg 1", "msg 2", "msg 3"],
    "age" 100
}
What I want to do is also highlight the problematic area in a cyan color, and for that purpose, I have this code for the button that validates what's inserted in the text area:
cmdValidate.addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent e) {
        functionsClass ops = new functionsClass();
        String JSONcontent = JSONtextArea.getText();
        Results obj = new Results();
        ops.validate_JSON_text(JSONcontent, obj);
        String result = obj.getResult();
        String caret = obj.getCaret();
        //String lineNum = obj.getLineNum();
        //showStatus(result);
        if (result == null) {
            textAreaError.setText("JSON code is valid!");
        } else {
            textAreaError.setText(result);
            Highlighter.HighlightPainter cyanPainter;
            cyanPainter = new DefaultHighlighter.DefaultHighlightPainter(Color.cyan);
            int caretPosition = Integer.parseInt(caret);
            int lineNumber = 0;
            try {
                lineNumber = JSONtextArea.getLineOfOffset(caretPosition);
            } catch (BadLocationException e2) {
                e2.printStackTrace();
            }
            try {
                JSONtextArea.getHighlighter().addHighlight(lineNumber, caretPosition + 1, cyanPainter);
            } catch (BadLocationException e1) {
                e1.printStackTrace();
            }
        }
    }
});
The "addHighlight" method works with a start range, end range and a color, which didn't become apparent to me immediately, thinking I had to get the reference line based on the column number. Some split functions to extract the numbers, I assigned 11 (in screenshot) to a caret value, not realizing that it only counts character positions from the beginning of the string and represents the end point of the range.
For reference, this is the class that does the work behind the scenes, and the error handling at the bottom is about extracting the line and column numbers. For the record, "x" is the error message that is generated for an invalid file.
package parsingJSON;

import java.io.IOException;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class functionsClass extends JSONTextCompare {

    public boolean validate_JSON_text(String JSONcontent, Results obj) {
        boolean valid = false;
        try {
            ObjectMapper objMapper = new ObjectMapper();
            JsonNode validation = objMapper.readTree(JSONcontent);
            valid = true;
        } catch (JsonParseException jpe) {
            String x = jpe.getMessage();
            printTextArea(x, obj);
            //return part_3;
        } catch (IOException ioe) {
            String x = ioe.getMessage();
            printTextArea(x, obj);
            //return part_3;
        }
        return valid;
    }

    public void printTextArea(String x, Results obj) {
        System.out.println(x);
        String err = x.substring(x.lastIndexOf("\n"));
        String parts[] = err.split(";");
        //part 1 is the discarded leading edge, i.e. the closing brackets of the JSON content
        String part_2 = parts[1];
        //split again to get rid of the closing square bracket
        String parts2[] = part_2.split("]");
        String part_3 = parts2[0];
        //JSONTextCompare feedback = new JSONTextCompare();
        //split the output to get the exact location of the error to communicate back
        //and highlight it in the JSONTextCompare class
        //first we need the line number from the output
        String[] parts_lineNum = part_3.split("line: ");
        String[] parts_lineNum_final = parts_lineNum[1].split(", column:");
        String lineNum = parts_lineNum_final[0];
        String[] parts_caret = part_3.split("column: ");
        String caret = parts_caret[1];
        System.out.println(caret);
        obj.setLineNum(lineNum);
        obj.setCaret(caret);
        obj.setResult(part_3);
        System.out.println(part_3);
    }
}
Screenshot of what the interface currently looks like:
Long story short: how do I turn the coordinates Line 4, Col 11 into a caret value (e.g. its value is 189, for the sake of argument) that I can use to get the highlighter to work properly? Some kind of custom parsing formula might be possible, but in general, is that even possible to do?
how do I turn the coordinates Line 4, Col 11 into a caret value (e.g. it's value 189,
Check out: Text Utilities for methods that might be helpful when working with text components. It has methods like:
centerLineInScrollPane
getColumnAtCaret
getLineAtCaret
getLines
gotoStartOfLine
gotoFirstWordOnLine
getWrappedLines
In particular the gotoStartOfLine() method contains code you can modify to get the offset of the specified row/column.
The basic code would be:
int line = 4;
int column = 11;
Element root = textArea.getDocument().getDefaultRootElement();
int offset = root.getElement( line - 1 ).getStartOffset() + column;
System.out.println(offset);
The way it works is essentially to count the number of characters in each line up until the line in which the error occurs, and add the caretPosition to that sum of characters, which gives the offset the Highlighter needs to apply the marking at the correct location.
I've added the code for the Validate button for context.
functionsClass ops = new functionsClass();
String JSONcontent = JSONtextArea.getText();
Results obj = new Results();
ops.validate_JSON_text(JSONcontent, obj);
String result = obj.getResult();
String caret = obj.getCaret();
String lineNum = obj.getLineNum();
//showStatus(result);
if (result == null) {
    textAreaError.setText("JSON code is valid!");
} else {
    textAreaError.setText(result);
    Highlighter.HighlightPainter cyanPainter;
    cyanPainter = new DefaultHighlighter.DefaultHighlightPainter(Color.cyan);
    //the column number as per the location of the error
    int caretPosition = Integer.parseInt(caret); //JSONtextArea.getCaretPosition();
    //the line number as per the location of the error
    int lineNumber = Integer.parseInt(lineNum);
    //count the characters in the string up to the line in which the error is found
    int totalChars = 0;
    int counter = 0; //used to only go to the line above where the error is located
    String[] lines = JSONcontent.split("\\r?\\n");
    for (String line : lines) {
        counter = counter + 1;
        //as long as we're above the line of the error (lineNumber variable), keep counting characters
        if (counter < lineNumber) {
            totalChars = totalChars + line.length();
        }
        //if we are at the line that contains the error, only add the caretPosition value
        //to get the final position where the highlighting should go
        if (counter == lineNumber) {
            totalChars = totalChars + caretPosition;
            break;
        }
    }
    //put down the highlighting in the area where the JSON file is having a problem
    try {
        JSONtextArea.getHighlighter().addHighlight(totalChars - 2, totalChars + 2, cyanPainter);
    } catch (BadLocationException e1) {
        e1.getMessage();
    }
}
The contents of the JSON file are treated as a string, and that's why I'm also iterating through it in that fashion. There are certainly better ways to go through the lines of a string; here are some reference topics on SO:
What is the easiest/best/most correct way to iterate through the characters of a string in Java? - Link
Check if a string contains \n - Link
Split Java String by New Line - Link
What is the best way to iterate over the lines of a Java String? - Link
Generally a combination of these led to this solution, and I am also not targeting it for use on very large JSON files.
A screenshot of the output, with the interface highlighting the same area that Notepad++ would complain about, if it could debug code:
I'll post the project on GitHub after I clean it up and comment it some, and will give a link to that later, but for now, hopefully this helps the next dev in a similar situation.
What I want this code to do: when the search button is clicked, it should read a file, match the search values with the data inside the file, and show the search results in the jTable.
Problems I am facing: if GPA "A+" is selected, it shows both A+ and A-, and when I press the search button again with another search value, the table just adds more data.
Solutions needed: I want to read the file and show only the current search results in the jTable, not keep appending results again and again. The search button should search the GPA and Class columns only, and when a GPA "A/B/C" with "+" or "-" is selected, the search result should contain only the data with that exact GPA.
NOTE: I don't want to change the search options.
I'm a total newbie in Java, so any kind of help would be appreciated! :)
Screenshot of the UI
private void srchBtnActionPerformed(java.awt.event.ActionEvent evt) {
    //file read
    String filepath = "E:\\Netbeans workspace\\modified\\Project\\Info.txt";
    File file = new File(filepath);
    try {
        BufferedReader br = new BufferedReader(new FileReader(file));
        model = (DefaultTableModel) jTable1.getModel();
        Object[] tableLines = br.lines().toArray();
        for (int i = 0; i < tableLines.length; i++) {
            String line = tableLines[i].toString().trim();
            String[] dataRow = line.split("/");
            model.addRow(dataRow);
        }
    } catch (Exception ex) {
        Logger.getLogger(ReceiverF.class.getName()).log(Level.SEVERE, null, ex);
    }
    //search from file
    String bGroupSrch = (String) jComboBoxBGroup.getSelectedItem();
    if (positiveRBtn.isSelected())
        bGroupSrch = bGroupSrch + "+";
    else if (negativeRBtn.isSelected())
        bGroupSrch = bGroupSrch + "-";
    String areaSrch = (String) jComboBoxArea.getSelectedItem();
    if (bgGroup.getSelection() != null) {
        filter(bGroupSrch);
        filter(areaSrch);
    } else {
        SrchEMsg sem = new SrchEMsg(this);
        sem.setVisible(true);
        sem.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
    }
}
//Filter Method
private void filter(String query) {
    TableRowSorter<DefaultTableModel> tr = new TableRowSorter<DefaultTableModel>(model);
    jTable1.setRowSorter(tr);
    tr.setRowFilter(RowFilter.regexFilter(query));
}
the table just adds more data in it.
When you start the search, do:
model.setRowCount(0);
to clear the existing data from the table model.
Or the easier solution is to NOT reload the data all the time. Instead you just change the filter that is used by the table.
Read the section from the Swing tutorial on Sorting and Filtering. The code there replaces the filter every time a character is typed.
Your code will change the filter when the search option is changed.
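For example, a minimal sketch of such a combined filter, replacing the two filter(...) calls in the question (the column indices 3 for GPA and 4 for Class are assumptions, adjust them to your table; note that calling filter() twice as in the question just replaces the first filter with the second):

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.swing.RowFilter;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableRowSorter;

// inside the question's class (uses the model and jTable1 fields)
private void filter(String gpa, String area) {
    TableRowSorter<DefaultTableModel> sorter = new TableRowSorter<>(model);
    jTable1.setRowSorter(sorter);
    List<RowFilter<Object, Object>> filters = new ArrayList<>();
    // Pattern.quote makes "A+" literal (a bare "A+" regex matches any cell containing
    // an A, which is why A- rows show up) and ^...$ forces a whole-cell match
    filters.add(RowFilter.regexFilter("^" + Pattern.quote(gpa) + "$", 3));
    filters.add(RowFilter.regexFilter("^" + Pattern.quote(area) + "$", 4));
    sorter.setRowFilter(RowFilter.andFilter(filters));
}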
I am new to Java. I am trying to iterate over a couple of .txt files to compare each line of one file to every line of the second file. These are my two files: listread.txt and csvread.txt.
Here is the code I am using:
try {
    BufferedReader csvReader = new BufferedReader(new FileReader("/data/csvread.txt"));
    BufferedReader listReader = new BufferedReader(new FileReader("/data/list.txt"));
    String csvItem, listItem;
    int count = 0;
    while ((csvItem = csvReader.readLine()) != null) {
        System.out.println("before second loop:" + csvItem);
        while ((listItem = listReader.readLine()) != null) {
            System.out.println("list Item: " + listItem.toLowerCase().split("¬")[1]);
            System.out.println("csv Item: " + csvItem.toLowerCase());
            if (listItem.toLowerCase().split("¬")[1].contains(csvItem.toLowerCase())) {
                count++;
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
When I run this, only the first line in the csvread.txt (which is stored in the variable csvItem) is being compared to each of all the lines in listread.txt. Here is an example output:
before second loop:Record Category
list item: provisions
csv Item: record category
list item: request category
csv Item: record category
list item: elevator
csv Item: record category
list item: assessment
csv Item: record category
list item: associates
csv Item: record category
list item: score
csv Item: record category
list item: attachments
csv Item: record category
It only iterates over all the lines of the list.txt file with the first line of the csvread.txt file, doesn't move on to the second line of csvread.txt, and the program ends by throwing an error:
java.lang.ArrayIndexOutOfBoundsException: 1
at test.main(test.java:52)
This refers to the line System.out.println("list item: "+listItem.toLowerCase().split("¬")[1]);, which I guess has nothing to do with the iterations; I'm not sure why this error is thrown.
However, when I comment out the inner loop, it runs fine, iterating over all the lines in the csvread.txt file. Here is a sample output with just the first while loop and the inner loop commented out:
before second loop:Record Category
before second loop:Type
before second loop:Name
before second loop:State
before second loop:Number
before second loop:ID (Self)
before second loop:Parent
before second loop:Title
This issue occurs only when there is a nested loop; with a single loop there is no problem at all. Can somebody shed some light on this strange behavior? Also, how do I overcome it?
EDIT:
I've added an if condition to check whether the line contains ¬ before I split the line on that character:
if (listItem.contains("¬")) {
    System.out.println("list item: " + listItem.toLowerCase().split("¬")[1]);
    System.out.println("csv Item: " + csvItem.toLowerCase());
    if (listItem.toLowerCase().split("¬")[1].contains(csvItem.toLowerCase())) {
        count++;
    }
}
Now I don't get the exception anymore. However, the behavior is still strange. Here's the output after adding the if:
before second loop:Record Category
list item: provisions
csv Item: record category
list item: request category
csv Item: record category
list item: elevator
csv Item: record category
list item: assessment
csv Item: record category
list item: associates
csv Item: record category
list item: score
csv Item: record category
list item: attachments
csv Item: record category
before second loop:Type
before second loop:Name
before second loop:State
before second loop:Number
before second loop:ID (Self)
before second loop:Parent
before second loop:Title
The other lines of csvread.txt are now being iterated over, but the comparison with the lines in listread.txt is not happening except for the first element.
Any help would be appreciated. Thank you!
Expanded from my comment about listReader pointing to the end of the file after the first iteration: BufferedReader doesn't provide a mechanism to move the file pointer back, so a simple approach would be to move the creation of listReader inside the outer loop:
try {
    BufferedReader csvReader = new BufferedReader(new FileReader("/data/csvread.txt"));
    // BufferedReader listReader = new BufferedReader(new FileReader("/data/list.txt"));
    String csvItem, listItem;
    int count = 0;
    while ((csvItem = csvReader.readLine()) != null) {
        System.out.println("before second loop:" + csvItem);
        // re-opened on every iteration, so reading starts at the top of the file again
        BufferedReader listReader = new BufferedReader(new FileReader("/data/list.txt"));
        while ((listItem = listReader.readLine()) != null) {
            System.out.println("list Item: " + listItem.toLowerCase().split("¬")[1]);
            System.out.println("csv Item: " + csvItem.toLowerCase());
            if (listItem.toLowerCase().split("¬")[1].contains(csvItem.toLowerCase())) {
                count++;
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
so each iteration will have a new listReader which starts at the top of the file.
But that might be too much I/O. If the size of list.txt isn't too big, then perhaps read it once, parse it, and store in a Set<String> for later comparison:
try (BufferedReader listReader = new BufferedReader(new FileReader("/data/list.txt"));
     BufferedReader csvReader = new BufferedReader(new FileReader("/data/csvread.txt"))) {
    String listItem = null;
    Set<String> listItems = new HashSet<>();
    while ((listItem = listReader.readLine()) != null) {
        listItems.add(listItem.toLowerCase().split("¬")[1]);
    }
    String csvItem;
    int count = 0;
    while ((csvItem = csvReader.readLine()) != null) {
        System.out.println("before second loop:" + csvItem);
        for (String item : listItems) {
            System.out.println("list Item: " + item);
            System.out.println("csv Item: " + csvItem.toLowerCase());
            if (item.contains(csvItem.toLowerCase())) {
                count++;
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
Also moved to try-with-resources to make sure csvReader and listReader are properly closed.
Your access to
listItem.toLowerCase().split("¬")[1]
is critical, since you always expect all lines to contain your "¬". If that is not the case, split returns a single-element array, and accessing the returned array at position [1] fails with the ArrayIndexOutOfBoundsException.
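A minimal guard along the lines of the EDIT in the question (splitting once and checking the array length is my own variation, to avoid splitting the same line twice):

String[] parts = listItem.toLowerCase().split("¬");
// only index into [1] when the separator was actually present
if (parts.length > 1 && parts[1].contains(csvItem.toLowerCase())) {
    count++;
}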
When you use nested loops, the inner loop gets executed fully; then control comes out of the inner loop and starts the next iteration of the outer loop. Hence, if you want to compare the contents of the two files line by line, you should not have an inner loop at all. Below is sample code that you may try in this case, though I have not tested it.
try {
    BufferedReader csvReader = new BufferedReader(new FileReader("/data/csvread.txt"));
    BufferedReader listReader = new BufferedReader(new FileReader("/data/list.txt"));
    String csvItem, listItem;
    int count = 0;
    while ((csvItem = csvReader.readLine()) != null) {
        System.out.println("before second loop:" + csvItem);
        listItem = listReader.readLine();
        if (listItem != null) {
            if (listItem.toLowerCase().split("¬")[1].contains(csvItem.toLowerCase())) {
                count++;
            }
        } else {
            //listReader has no more lines to compare, so end the process
            break;
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}
I hope this helps.
Note: The above answer was given with a belief that the requirement was to compare the contents of two files line-by-line.
I have two CSV files. One master CSV file with around 500,000 records, and a daily CSV file with 50,000 records.
The daily CSV file is missing a few columns, which have to be fetched from the master CSV file.
For example
DailyCSV File
id,name,city,zip,occupation
1,Jhon,Florida,50069,Accountant
MasterCSV File
id,name,city,zip,occupation,company,exp,salary
1, Jhon, Florida, 50069, Accountant, AuditFirm, 3, $5000
What I have to do is read both files, match the records by ID, and if the ID is present in the master file, fetch company, exp and salary and write them to a new CSV file.
How do I achieve this?
What I have done currently:
while (true) {
    line = bstream.readLine();
    lineMaster = bstreamMaster.readLine();
    if (line == null || lineMaster == null) {
        break;
    } else {
        while (lineMaster != null)
            readlineSplit = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        String splitId = readlineSplit[4];
        String[] readLineSplitMaster = lineMaster.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        String SplitIDMaster = readLineSplitMaster[13];
        System.out.println(splitId + "|" + SplitIDMaster);
        //System.out.println(splitId.equalsIgnoreCase(SplitIDMaster));
        if (splitId.equalsIgnoreCase(SplitIDMaster)) {
            String writeLine = readlineSplit[0] + "," + readlineSplit[1] + "," + readlineSplit[2] + "," + readlineSplit[3] + "," + readlineSplit[4] + "," + readlineSplit[5] + "," + readLineSplitMaster[15] + "," + readLineSplitMaster[16] + "," + readLineSplitMaster[17];
            System.out.println(writeLine);
            pstream.print(writeLine + "\r\n");
        }
    }
}
pstream.close();
fout.flush();
bstream.close();
bstreamMaster.close();
First of all, your current parsing approach will be painfully slow. Use a dedicated CSV parsing library to speed things up. With uniVocity-parsers you can process your 500K records in less than a second. This is how you can use it to solve your problem:
First let's define a few utility methods to read/write your files:
//opens the file for reading (using UTF-8 encoding)
private static Reader newReader(String pathToFile) {
    try {
        return new InputStreamReader(new FileInputStream(new File(pathToFile)), "UTF-8");
    } catch (Exception e) {
        throw new IllegalArgumentException("Unable to open file for reading at " + pathToFile, e);
    }
}

//creates a file for writing (using UTF-8 encoding)
private static Writer newWriter(String pathToFile) {
    try {
        return new OutputStreamWriter(new FileOutputStream(new File(pathToFile)), "UTF-8");
    } catch (Exception e) {
        throw new IllegalArgumentException("Unable to open file for writing at " + pathToFile, e);
    }
}
Then, we can start reading your daily CSV file, and generate a Map:
public static void main(String... args) {
    //First we parse the daily update file.
    CsvParserSettings settings = new CsvParserSettings();
    //here we tell the parser to read the CSV headers
    settings.setHeaderExtractionEnabled(true);
    //and to select ONLY the following columns.
    //This ensures rows with a fixed size will be returned in case some records
    //come with less or more columns than anticipated.
    settings.selectFields("id", "name", "city", "zip", "occupation");
    CsvParser parser = new CsvParser(settings);

    //Here we parse all data into a list.
    List<String[]> dailyRecords = parser.parseAll(newReader("/path/to/daily.csv"));

    //And convert them to a map. ID's are the keys.
    Map<String, String[]> mapOfDailyRecords = toMap(dailyRecords);
    ... //we'll get back here in a second.
This is the code to generate a Map from the list of daily records:
/* Converts a list of records to a map. Uses the element at index 0 as the key */
private static Map<String, String[]> toMap(List<String[]> records) {
    HashMap<String, String[]> map = new HashMap<String, String[]>();
    for (String[] row : records) {
        //column 0 will always have an ID.
        map.put(row[0], row);
    }
    return map;
}
With the map of records, we can process your master file and generate the list of updates:
private static List<Object[]> processMasterFile(final Map<String, String[]> mapOfDailyRecords) {
    //we'll put the updated data here
    final List<Object[]> output = new ArrayList<Object[]>();

    //configures the parser to process only the columns you are interested in.
    CsvParserSettings settings = new CsvParserSettings();
    settings.setHeaderExtractionEnabled(true);
    settings.selectFields("id", "company", "exp", "salary");

    //All parsed rows will be submitted to the following RowProcessor.
    //This way the bigger Master file won't have all its rows stored in memory.
    settings.setRowProcessor(new AbstractRowProcessor() {
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            // Incoming rows from MASTER will have the ID at index 0.
            // If the daily update map contains the ID, we'll get the daily row
            String[] dailyData = mapOfDailyRecords.get(row[0]);
            if (dailyData != null) {
                //We got a match. Let's join the data from the daily row with the master row.
                Object[] mergedRow = new Object[8];
                for (int i = 0; i < dailyData.length; i++) {
                    mergedRow[i] = dailyData[i];
                }
                for (int i = 1; i < row.length; i++) { //starts from 1 to skip the ID at index 0
                    mergedRow[i + dailyData.length - 1] = row[i];
                }
                output.add(mergedRow);
            }
        }
    });

    CsvParser parser = new CsvParser(settings);
    //the parse() method will submit all rows to the RowProcessor defined above.
    parser.parse(newReader("/path/to/master.csv"));

    return output;
}
Finally, we can get the merged data and write everything to another file:
    ... // getting back to the main method here

    //Now we process the master data and get a list of updates
    List<Object[]> updatedData = processMasterFile(mapOfDailyRecords);

    //And write the updated data to another file
    CsvWriterSettings writerSettings = new CsvWriterSettings();
    writerSettings.setHeaders("id", "name", "city", "zip", "occupation", "company", "exp", "salary");
    writerSettings.setHeaderWritingEnabled(true);

    CsvWriter writer = new CsvWriter(newWriter("/path/to/updates.csv"), writerSettings);
    //Here we write everything, and get the job done.
    writer.writeRowsAndClose(updatedData);
}
This should work like a charm. Hope it helps.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I would approach the problem in a step-by-step manner.
First, parse/read the master CSV file and keep its content in a hashmap, where the key is each record's unique 'id'; as for the value, you can store the remaining fields in another hash, or simply create a Java class to hold the information.
Example of hash:
{
    '1' : { 'name': 'Jhon',
            'City': 'Florida',
            'zip' : 50069,
            ....
          }
}
Next, read your daily CSV file. For each row, read the 'id' and check whether the key exists in the hashmap you created earlier.
If it exists, access the information you need from the hashmap and write it to a new CSV file.
Also, you might want to consider using a 3rd party CSV parser to make this task easier.
If you have Maven, you can follow this example I found on the net. Otherwise you can just google for an Apache 'csv parser' example:
http://examples.javacodegeeks.com/core-java/apache/commons/csv-commons/writeread-csv-files-with-apache-commons-csv-example/
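A minimal sketch of that hashmap approach using Apache Commons CSV (the library choice, file paths and the id trimming are my own assumptions; the header names follow the example in the question):

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class MergeCsv {
    public static void main(String[] args) throws IOException {
        // 1) read the master file into a map keyed by each record's id
        Map<String, CSVRecord> master = new HashMap<>();
        try (Reader in = new FileReader("master.csv")) {
            for (CSVRecord r : CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in)) {
                master.put(r.get("id").trim(), r);
            }
        }
        // 2) stream the daily file, look up each id and write the merged row
        try (Reader in = new FileReader("daily.csv");
             CSVPrinter out = new CSVPrinter(new FileWriter("merged.csv"),
                     CSVFormat.DEFAULT.withHeader("id", "name", "city", "zip",
                             "occupation", "company", "exp", "salary"))) {
            for (CSVRecord d : CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in)) {
                CSVRecord m = master.get(d.get("id").trim());
                if (m != null) {
                    out.printRecord(d.get("id"), d.get("name"), d.get("city"),
                            d.get("zip"), d.get("occupation"),
                            m.get("company"), m.get("exp"), m.get("salary"));
                }
            }
        }
    }
}

Keeping all 500K master records in memory is fine at this scale; if it were not, you could invert the approach and map the smaller daily file instead, as the uniVocity answer above does.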
Suppose the csv file contains:
1,112,,ASIF
The following code eliminates the empty value between the two consecutive commas. The code provided is more than is required:
String p1 = null, p2 = null;
while ((lineData = Buffreadr.readLine()) != null) {
    row = new Vector();
    int i = 0;
    StringTokenizer st = new StringTokenizer(lineData, ",");
    while (st.hasMoreTokens()) {
        row.addElement(st.nextElement());
        if (row.get(i).toString().startsWith("\"") == true) {
            while (row.get(i).toString().endsWith("\"") == false) {
                p1 = row.get(i).toString();
                p2 = st.nextElement().toString();
                row.set(i, p1 + ", " + p2);
            }
            String CellValue = row.get(i).toString();
            CellValue = CellValue.substring(1, CellValue.length() - 1);
            row.set(i, CellValue);
            //System.out.println(" Final Cell Value : " + row.get(i).toString());
        }
        eror = row.get(i).toString();
        try {
            eror = eror.replace('\'', ' ');
            eror = eror.replace('[', ' ');
            eror = eror.replace(']', ' ');
            //System.out.println("Error " + eror);
            row.remove(i);
            row.insertElementAt(eror, i);
        } catch (Exception e) {
            System.out.println("Error exception " + eror);
        }
        //}
        i++;
    }
How do I read two consecutive commas in a .csv file as a distinct (empty) value in Java?
Here is an example of doing this by splitting into a String array. Changed lines are marked with comments.
// Start of your code.
row = new Vector();
int i = 0;
String[] st = lineData.split(","); // Changed
for (String s : st) { // Changed
    row.addElement(s); // Changed
    if (row.get(i).toString().startsWith("\"") == true) {
        while (row.get(i).toString().endsWith("\"") == false) {
            p1 = row.get(i).toString();
            p2 = s.toString(); // Changed
            row.set(i, p1 + ", " + p2);
        }
        ... // Rest of code here
    }
}
The StringTokenizer skips empty tokens. This is its behaviour. From the Javadoc:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
Just use String.split(",") and you are done.
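A small self-contained demo of the difference, using the sample line from the question:

import java.util.Arrays;
import java.util.StringTokenizer;

public class SplitDemo {
    public static void main(String[] args) {
        String lineData = "1,112,,ASIF";

        // String.split keeps the empty cell between the consecutive commas
        String[] cells = lineData.split(",");
        System.out.println(Arrays.toString(cells)); // prints [1, 112, , ASIF]

        // StringTokenizer silently drops it
        StringTokenizer st = new StringTokenizer(lineData, ",");
        while (st.hasMoreTokens()) {
            System.out.print(st.nextToken() + " "); // prints 1 112 ASIF
        }
    }
}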
Just read the whole line into a string then do string.split(",").
The resulting array should have exactly what you are looking for...
If you need to check for "escaped" commas then you will need some regex for the query instead of a simple ",".
while ((lineData = Buffreadr.readLine()) != null) {
    String[] row = lineData.split(","); // was line.split(","), but the variable read above is lineData
    // Now process the array however you like; each cell in the csv is one entry in the array
}