Using I/O stream to parse CSV file - java

I have a CSV file of US population data for every county in the US. I need to get each population from the 8th column of the file. I'm using a FileReader and a BufferedReader and am not sure how to use the split method to accomplish this. I know this isn't much information, but I know that I'll be using args[0] as the path to the input file in my class.
To be honest, I'm at a loss as to where to begin.
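import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        // args[0] is the path to the CSV file; try-with-resources closes the reader
        try (BufferedReader buff = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = buff.readLine()) != null) {
                // not sure how to split the line and get the 8th column here
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}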
The output should be an integer of the total US population. Any help with pointing me in the right direction would be great.

Don't reinvent the wheel; don't parse CSV yourself, use a library. Even a format as simple as CSV has nuances: fields may or may not be escaped with quotes, the file may or may not have a header, and so on. Besides that, you have to test and maintain the code you've written, so writing less code and reusing libraries is a good thing.
There are plenty of CSV libraries for Java:
Apache Commons CSV
OpenCSV
Super CSV
Univocity
flatpack
IMHO, the first two are the most popular.
Here is an example for Apache Commons CSV:
import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

try (final Reader in = new FileReader("counties.csv")) {
    final Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
    for (final CSVRecord record : records) { // Simply iterate over the records with a for-each loop; all the parsing is handled for you
        String populationString = record.get(7); // Indexes are zero-based
        // Or, if your file has a header row (parse with CSVFormat.DEFAULT.withFirstRecordAsHeader()), use the column name:
        // String populationString = record.get("population");
        // … Do whatever you want with the population
    }
}
Look how easy it is! And it will be similar with other parsers.
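If adding a dependency is not an option, the BufferedReader/split approach from the question can be sketched as below. It assumes the population is the 8th column (index 7), that the first line is a header row, and that no field contains a quoted comma; those are exactly the nuances a library handles for you. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class PopulationSum {
    // Sums column index 7 (the 8th column) of a simple, unquoted CSV.
    // Assumes the first line is a header row and every data row has at least 8 fields.
    static long sumPopulation(Reader source) throws IOException {
        long total = 0; // long, to be safe against overflow when summing large counts
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] fields = line.split(",");
                total += Long.parseLong(fields[7].trim());
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // args[0] is the path to the CSV file, as in the question
            System.out.println(sumPopulation(new FileReader(args[0])));
        } else {
            String demo = "a,b,c,d,e,f,g,population\n"
                    + "1,2,3,4,5,6,7,100\n"
                    + "1,2,3,4,5,6,7,250\n";
            System.out.println(sumPopulation(new StringReader(demo))); // prints 350
        }
    }
}
```

Using a long for the total is a defensive choice; summing with an int would also work for today's US population, but leaves little headroom.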

Related

Parsing a text file using java with multiple values per line to be extracted

I'm not going to lie, I'm really bad at writing regular expressions. I'm currently trying to parse a text file that is giving me a lot of issues. The goal is to extract the data between the respective "tags/titles". The file in question is a .qbo file laid out as follows (personal information replaced with "DATA"). The parts I care about retrieving are between the "STMTTRN" and "/STMTTRN" tags; I don't plan on putting the rest in my database, but I figured it would help others see the file content I'm working with. I apologize for any confusion prior to this update.
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1><SONRS>
<STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
<DTSERVER>20190917133617.000[-4:EDT]</DTSERVER>
<LANGUAGE>ENG</LANGUAGE>
<FI>
<ORG>DATA</ORG>
<FID>DATA</FID>
</FI>
<INTU.BID>DATA</INTU.BID>
<INTU.USERID>DATA</INTU.USERID>
</SONRS></SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0</TRNUID>
<STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
<STMTRS>
<CURDEF>USD</CURDEF>
<BANKACCTFROM>
<BANKID>DATA</BANKID>
<ACCTID>DATA</ACCTID>
<ACCTTYPE>CHECKING</ACCTTYPE>
<NICKNAME>FREEDOM CHECKING</NICKNAME>
</BANKACCTFROM>
<BANKTRANLIST>
<DTSTART>20190717</DTSTART><DTEND>20190917</DTEND>
<STMTTRN><TRNTYPE>POS</TRNTYPE><DTPOSTED>20190717071500</DTPOSTED><TRNAMT>-5.81</TRNAMT><FITID>3893120190717WO</FITID><NAME>DATA</NAME><MEMO>POS Withdrawal</MEMO></STMTTRN>
<STMTTRN><TRNTYPE>DIRECTDEBIT</TRNTYPE><DTPOSTED>20190717085000</DTPOSTED><TRNAMT>-728.11</TRNAMT><FITID>4649920190717WE</FITID><NAME>CHASE CREDIT CRD</NAME><MEMO>DATA</MEMO></STMTTRN>
<STMTTRN><TRNTYPE>ATM</TRNTYPE><DTPOSTED>20190717160900</DTPOSTED><TRNAMT>-201.99</TRNAMT><FITID>6674020190717WA</FITID><NAME>DATA</NAME><MEMO>ATM Withdrawal</MEMO></STMTTRN>
</BANKTRANLIST>
<LEDGERBAL><BALAMT>2024.16</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></LEDGERBAL>
<AVAILBAL><BALAMT>2020.66</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></AVAILBAL>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
I want to be able to end with data that looks or acts like the following so that each row of data can easily be added to a database:
[Image: Example Parse]
As David has already answered, it is better to parse the XML output using Java. If you are more interested in a regex to get all the information, you can use this regular expression:
<[^>]+>|\\n+
You can test it on the following sites:
https://rubular.com/
https://www.regextester.com/
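Here is a sketch of how that regular expression might be used from Java: splitting one of the STMTTRN lines on it and discarding the empty strings leaves just the values, in tag order. It assumes each transaction sits on a single line, as in the sample; the class and method names are hypothetical. Note that the tag names are discarded, so you rely on the order of the values.

```java
import java.util.ArrayList;
import java.util.List;

public class OfxRegexSketch {
    // Splits on every tag (<...>) or run of newlines; what remains are the text values.
    static List<String> values(String line) {
        List<String> result = new ArrayList<>();
        for (String part : line.split("<[^>]+>|\\n+")) {
            if (!part.isEmpty()) { // adjacent tags produce empty strings between them
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "<STMTTRN><TRNTYPE>POS</TRNTYPE><DTPOSTED>20190717071500</DTPOSTED>"
                + "<TRNAMT>-5.81</TRNAMT><FITID>3893120190717WO</FITID><NAME>DATA</NAME>"
                + "<MEMO>POS Withdrawal</MEMO></STMTTRN>";
        System.out.println(values(line));
        // prints [POS, 20190717071500, -5.81, 3893120190717WO, DATA, POS Withdrawal]
    }
}
```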
Given this is XML, I would do one of two things:
either use the Java DOM objects to marshall/unmarshall to/from Java objects (nodes and elements), or
use JAXB to achieve something similar but with better POJO representation.
Mkyong has tutorials for both; try the DOM parsing or the JAXB one. His tutorials are simple and easy to follow.
JAXB requires more work and dependencies. So try DOM first.
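A minimal DOM sketch of the first option, using only the JDK's javax.xml.parsers. It assumes that every tag in the file is closed, as it is in the sample above; OFX 1.x is SGML (note the DATA:OFXSGML header), and real files may omit closing tags, in which case this will fail. It simply drops the header lines before <OFX>. The class and method names are hypothetical; each transaction comes back as a map of tag name to text, which is easy to turn into a database row.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OfxDomSketch {
    // Returns one map (tag name -> text) per <STMTTRN> element, in document order.
    static List<Map<String, String>> parseTransactions(String fileContent) throws Exception {
        // The header lines before <OFX> are not XML, so drop them.
        String xml = fileContent.substring(fileContent.indexOf("<OFX>"));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList transactions = doc.getElementsByTagName("STMTTRN");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < transactions.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            NodeList children = transactions.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    row.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String sample = "OFXHEADER:100\nDATA:OFXSGML\n<OFX><BANKTRANLIST>"
                + "<STMTTRN><TRNTYPE>POS</TRNTYPE><TRNAMT>-5.81</TRNAMT></STMTTRN>"
                + "<STMTTRN><TRNTYPE>ATM</TRNTYPE><TRNAMT>-201.99</TRNAMT></STMTTRN>"
                + "</BANKTRANLIST></OFX>";
        System.out.println(parseTransactions(sample));
        // prints [{TRNTYPE=POS, TRNAMT=-5.81}, {TRNTYPE=ATM, TRNAMT=-201.99}]
    }
}
```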
I would propose the following approach.
Read file line by line with Files:
final List<String> lines = Files.readAllLines(Paths.get("/path/to/file"));
At this point you have all of the file's lines separated and ready to be converted into something more useful, but you should create a class for them beforehand.
Create a class for the data in a line, something like:
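import com.fasterxml.jackson.annotation.JsonProperty;

public class STMTTRN {
    @JsonProperty("TRNTYPE") // map the upper-case element names explicitly; by default Jackson expects "trntype"
    private String TRNTYPE;
    @JsonProperty("DTPOSTED")
    private String DTPOSTED;
    ...
    ...
    //constructors
    //getters and setters
}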
Now that you have the data in separate strings and a class to hold it, you can convert the lines to objects with Jackson's XmlMapper:
final XmlMapper xmlMapper = new XmlMapper();
final STMTTRN stmttrn = xmlMapper.readValue(lines.get(0), STMTTRN.class); // lines is a List, so use get(); pick a line that holds an <STMTTRN> element
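You may want to write a loop, or use a stream with a mapper and a collector, to get the list of STMTTRN objects. Only the <STMTTRN> lines hold transactions, so filter out the other lines first:
final List<STMTTRN> stmttrnData = lines.stream()
        .filter(line -> line.startsWith("<STMTTRN>"))
        .map(this::mapLine)
        .collect(Collectors.toList());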
Where the mapper might be:
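// Create the mapper once and reuse it; Jackson mappers are thread-safe once configured
private static final XmlMapper XML_MAPPER = new XmlMapper();

private STMTTRN mapLine(final String line) {
    try {
        return XML_MAPPER.readValue(line, STMTTRN.class);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}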

Parsing entire csv file vs parsing line by line in java

I have a somewhat large CSV file of approximately 80K to 120K rows (depending on the day). I'm successfully running code that parses the entire CSV file into Java objects using the @CsvBindByName annotation. Sample code:
Reader reader = Files.newBufferedReader(Paths.get(file));
CsvToBean<MyCustomClass> csvToBean = new CsvToBeanBuilder<MyCustomClass>(reader)
        .withType(MyCustomClass.class)
        .withIgnoreLeadingWhiteSpace(true)
        .build();
List<MyCustomClass> myCustomClass = csvToBean.parse();
I want to change this code to parse the CSV file line by line instead of all at once, but retain the neatness of mapping to a Java bean object. Essentially something like this:
CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(csvFileLoc)));
String[] headerRow = csvReader.readNext(); // save the headerRow
String [] nextLine = null;
MyCustomClass myCustomClass = new MyCustomClass();
while ((nextLine = csvReader.readNext())!=null) {
myCustomClass.setField1(nextLine[0]);
myCustomClass.setField2(nextLine[1]);
//.... so on
}
But the above solution ties me to knowing the column position of each field. What I would like is to map the string array I get from the CSV based on the header row, similar to what opencsv does when parsing the entire file. However, as far as I can tell, I am not able to do that with opencsv. I had assumed this would be a pretty common practice, but I am unable to find any references to it online. It could be that I am not understanding CsvToBean usage in the opencsv library correctly. I could use csvToBean.iterator() to iterate over the beans, but I think the entire CSV file is loaded into memory by the build method, which kind of defeats the purpose of reading line by line. Any suggestions are welcome.
Looking at the API docs further, I see that CsvToBean<T> implements Iterable<T> and has an iterator() method that returns an Iterator<T> that is documented as follows:
The iterator returned by this method takes one line of input at a time and returns one bean at a time.
So it looks like you could just write your loop as:
for (MyCustomClass myCustomClass : csvToBean) {
// . . . do something with the bean . . .
}
Just to clear up some potential confusion, you can see in the source code that the build() method of CsvToBeanBuilder just creates the CsvToBean object, and doesn't do the actual input, and that the parse() method and the iterator of the CsvToBean object each do perform input.
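For comparison, the header-based lookup the question describes can be sketched with only the standard library: read the header once, map each column name to its value, and hand rows on one line at a time, so nothing beyond the current line is held in memory. This is not how opencsv is implemented; it is a naive illustration that assumes no quoted fields contain commas, and the class, method, and column names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class HeaderMappedReader {
    // Reads one line at a time and passes each row to the consumer as a
    // column-name -> value map, so no field positions are hard-coded.
    static void forEachRow(BufferedReader reader, Consumer<Map<String, String>> action) throws IOException {
        String[] header = reader.readLine().split(",");
        String line;
        while ((line = reader.readLine()) != null) {
            String[] values = line.split(",", -1); // -1 keeps trailing empty fields
            Map<String, String> row = new HashMap<>();
            for (int i = 0; i < header.length && i < values.length; i++) {
                row.put(header[i].trim(), values[i].trim());
            }
            action.accept(row);
        }
    }

    public static void main(String[] args) throws IOException {
        // Columns deliberately out of order: lookup is by name, not position
        String csv = "field2,field1\nb1,a1\nb2,a2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            forEachRow(reader, row -> System.out.println(row.get("field1") + " " + row.get("field2")));
        }
        // prints "a1 b1" then "a2 b2"
    }
}
```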

How to handle importing a CSV file with differing column lengths

I'm working on a project for school and am having a really hard time figuring out how to import a CSV file and format it into a usable form. The CSV contains a movie name in the first column and showtimes in the rest of each row, so it looks something like this:
movie1, 7pm, 8pm, 9pm, 10pm
movie2, 5pm, 8pm
movie3, 3pm, 7pm, 10pm
I think I want to split each row into its own array, maybe an ArrayList of the arrays? I really don't know where to even start, so any pointers would be appreciated.
Preferably, I don't want to use any external libraries.
I would go with a Map that has the movie name as the key and the timings as a list, like this:
Map<String, List<String>> movieTimings = new HashMap<>();
Read through the CSV file and put the values into this map. If the key already exists, you just need to add the value to its list. You can use Map's computeIfAbsent method (Java 8+), which creates the list only when the entry doesn't exist yet, e.g.:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<String, List<String>> movieTimings = new HashMap<>();
        String timing = "7pm"; // It will be read from the CSV
        movieTimings.computeIfAbsent("test", s -> new ArrayList<>()).add(timing);
        System.out.println(movieTimings);
    }
}
This will populate your map once the file is read. As far as reading the file is concerned, you can use a BufferedReader or OpenCSV (if your project allows third-party libraries).
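Combining the map with a BufferedReader, a standard-library-only sketch could read each line, split it, and feed the map with computeIfAbsent. It assumes a file shaped like the sample in the question (no commas inside movie names, no quoted fields); the class and method names are hypothetical. A LinkedHashMap is used only so the movies come out in file order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MovieTimings {
    // Builds movie name -> list of showtimes; rows may have any number of showtimes.
    static Map<String, List<String>> read(BufferedReader reader) throws IOException {
        Map<String, List<String>> movieTimings = new LinkedHashMap<>(); // keeps file order
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.trim().split("\\s*,\\s*"); // also drops spaces after commas
            List<String> times = movieTimings.computeIfAbsent(fields[0], name -> new ArrayList<>());
            for (int i = 1; i < fields.length; i++) {
                times.add(fields[i]);
            }
        }
        return movieTimings;
    }

    public static void main(String[] args) throws IOException {
        String csv = "movie1, 7pm, 8pm, 9pm, 10pm\nmovie2, 5pm, 8pm\nmovie3, 3pm, 7pm, 10pm\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println(read(reader));
        }
        // prints {movie1=[7pm, 8pm, 9pm, 10pm], movie2=[5pm, 8pm], movie3=[3pm, 7pm, 10pm]}
    }
}
```

To read a real file instead, pass new BufferedReader(new FileReader(path)) to read().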
I have no affiliation with Univocity - but their Java CSV parser is amazing and free. When I had a question, one of the developers got back to me immediately. http://www.univocity.com/pages/about-parsers
You read in a line and then cycle through the fields. Since you know the movie name is always there and at least one movie time, you can set it up any way you like including an arraylist of arraylists (so both are variable length arrays).
It works well with or without quotes around the fields (quotes are necessary when there are apostrophes or commas in the movie names). In the problem I solved, all rows had the same number of columns, but I did not know the number of columns before parsing the file, and each file often had a different number of columns and column names; it worked perfectly.
You can use opencsv to read the CSV file and add each String[] to an ArrayList. There are examples in the FAQ section of opencsv's website.
Edit: If you don't want to use external libraries you can read the CSV using a BufferedReader and split the lines by commas.
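import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

List<String[]> data = new ArrayList<>();
// try-with-resources closes the reader automatically, even on errors
try (BufferedReader br = new BufferedReader(new FileReader("csvfile"))) {
    String line;
    while ((line = br.readLine()) != null) {
        // split on commas and drop the spaces after them: "movie2, 5pm, 8pm" -> [movie2, 5pm, 8pm]
        String[] lineData = line.trim().split("\\s*,\\s*");
        data.add(lineData);
    }
} catch (IOException e) {
    e.printStackTrace();
}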

How to convert exponents in a csv file from Java

I am printing some data to a CSV file using Apache Commons CSV. One of the fields contains a 15-digit number and is of type String. This field shows up as an exponential number in the CSV instead of a complete number. I know Excel does that, but is there a way in Java to print it as a complete number?
I am not doing anything special; initially I thought Commons CSV would take care of it.
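import java.io.FileWriter;
import java.io.IOException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;

public void createCSV() throws IOException {
    CSVFormat csvFileFormat = CSVFormat.EXCEL.withHeader("header1", "header2");
    // try-with-resources flushes and closes the printer and the writer
    try (FileWriter fileWriter = new FileWriter("fileName");
         CSVPrinter csvFilePrinter = new CSVPrinter(fileWriter, csvFileFormat)) {
        for (UiIntegrationDTO dto : myList) {
            String csvData = dto.getPolicyNumber();
            csvFilePrinter.printRecord(csvData);
        }
    }
}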
Prepend apostrophe
As far as I understand from the discussion in the comments, this is a question about Excel's interpretation of the CSV file; the file itself contains all the necessary data.
I think csvFilePrinter.printRecord("'" + csvData); should help: the apostrophe makes Excel interpret the field as a string, not as a number.

JSON to CSV with Java using CDL: possible to replace comma-separated with semicolon-separated values?

Everything is in the title :)
I'm using org.json.CDL to convert JSONArray into CSV data but it renders a string with ',' as separator.
I'd like to know whether it's possible to use ';' instead.
Here is a simple example of what I'm doing:
public String exportAsCsv() throws Exception {
return CDL.toString(
new JSONArray(
mapper.writeValueAsString(extractAccounts()))
);
}
Thanks in advance for any advice on that question.
Edit: I'm not looking for a string-replacement solution, of course, as that could have an impact on large data; I'd want the library itself to let me specify the field separator.
Edit 2: In the end, extracting the data as a JSONArray (and a String...) was not a good solution, especially for large data files.
So I made the following changes:
use a Java CSV library (for example: http://www.csvreader.com/java_csv_samples.php)
refactor code to stream data from json input source to csv output source
This works better for processing large data. If you have comments, don't hesitate to share them.
String output = "Hello,This,is,separated,by,a,comma";
// Simply call the replace method.
output = output.replace(',',';');
I found this in the String documentation.
Example
String value = "Hello,this,is,a,string";
value = value.replace(',', ';');
System.out.println(value);
// Outputs: Hello;this;is;a;string
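One caveat with a blanket replace: if any value itself contains a comma, that comma is rewritten too, which silently changes the data. If you control the writing step, a small sketch that emits ';'-separated rows directly could look like the one below. It applies RFC 4180-style quoting (wrap a field in double quotes when it contains the delimiter, a quote, or a line break, and double any embedded quotes). The class and method names are hypothetical, and this is not part of org.json's CDL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class DelimitedWriter {
    // Quotes a field only when needed, doubling any embedded double quotes.
    static String escape(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Joins the fields of each row with the delimiter, and rows with "\n".
    static String toDelimited(List<List<String>> rows, char delimiter) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
            for (String field : row) {
                joiner.add(escape(field, delimiter));
            }
            out.append(joiner).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("name", "city"),
                Arrays.asList("Smith, John", "Paris;France"));
        System.out.print(toDelimited(rows, ';'));
        // prints:
        // name;city
        // Smith, John;"Paris;France"
    }
}
```

Because the delimiter is a parameter, the same method writes comma-separated output with ',' and quotes "Smith, John" in that case instead.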
