Create CSV file with columns and values from HashMap - java

Be gentle,
This is my first time using Apache Commons CSV 1.7.
I am creating a service to process some CSV inputs,
add some additional information from exterior sources,
then write out this CSV for ingestion into another system.
I store the information I have gathered in a list of
HashMap<String, String>, one per row of the final output CSV.
Each HashMap maps <ColumnName, Value for column>.
I am having trouble using CSVPrinter to put the values of each HashMap into the correct columns of a row.
I can concatenate the values into a string with commas between them;
however,
this just inserts the whole string into the first column.
I cannot define or hardcode the headers since they are obtained from a config file and may change depending on which project uses the service.
Here is some of my code:
try (BufferedWriter writer = Files.newBufferedWriter(
Paths.get(OUTPUT + "/" + project + "/" + project + ".csv"));)
{
CSVPrinter csvPrinter = new CSVPrinter(writer,
CSVFormat.RFC4180.withFirstRecordAsHeader());
csvPrinter.printRecord(columnList);
for (HashMap<String, String> row : rowCollection)
{
//Need to map __record__ to column -> row.key, value -> row.value for whole map.
csvPrinter.printRecord(__record__);
}
csvPrinter.flush();
}
Thanks for your assistance.

There are actually several concerns with your technique:
How do you maintain column order?
How do you print the column names?
How do you print the column values?
Here are my suggestions.
Maintain column order.
Do not use HashMap,
because it is unordered.
Instead,
use LinkedHashMap which has a "predictable iteration order"
(i.e. maintains order).
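To see why the map type matters, here is a small sketch that fills a map with the same three columns and returns its key order. Only the LinkedHashMap result (insertion order) is guaranteed; a HashMap's order depends on the keys' hash codes and should not be relied on. The class, method and column names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnOrderDemo {

    // Returns the keys of the given map in its iteration order.
    public static List<String> keyOrder(Map<String, String> row) {
        return new ArrayList<>(row.keySet());
    }

    // Fills one output row, inserting columns in the order they should appear in the CSV.
    public static Map<String, String> buildRow(Map<String, String> target) {
        target.put("zip", "value1");
        target.put("name", "value2");
        target.put("city", "value3");
        return target;
    }

    public static void main(String[] args) {
        // LinkedHashMap: iteration order is insertion order.
        System.out.println(keyOrder(buildRow(new LinkedHashMap<>()))); // [zip, name, city]
        // HashMap: order depends on hash codes; do not rely on it for CSV columns.
        System.out.println(keyOrder(buildRow(new HashMap<>())));
    }
}
```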
Print column names.
Every row in your list contains the column names as its keys,
but you only want to print the column names once, as the first row of output.
The solution is to print the column names before you loop through the rows.
Get them from the keys of the first element of the list.
Print column values.
The "billal GHILAS" answer demonstrates a way to print the values of each row.
Here is some code:
// Plain RFC4180: withFirstRecordAsHeader() is meant for parsing, and the header row is printed explicitly below.
try (BufferedWriter writer = Files.newBufferedWriter(
        Paths.get(OUTPUT + "/" + project + "/" + project + ".csv"));
     CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.RFC4180))
{
    // This assumes that rowCollection is a List<LinkedHashMap<String, String>> that is never empty.
    // An anonymous scope block just to limit the scope of the variable names.
    {
        Map<String, String> firstRow = rowCollection.get(0);
        int valueIndex = 0;
        String[] valueArray = new String[firstRow.size()];
        for (String currentValue : firstRow.keySet())
        {
            valueArray[valueIndex++] = currentValue;
        }
        csvPrinter.printRecord(valueArray);
    }
    for (Map<String, String> row : rowCollection)
    {
        int valueIndex = 0;
        String[] valueArray = new String[row.size()];
        for (String currentValue : row.values())
        {
            valueArray[valueIndex++] = currentValue;
        }
        csvPrinter.printRecord(valueArray);
    }
    csvPrinter.flush();
}

for (HashMap<String,String> row : rowCollection) {
Object[] record = new Object[columnList.size()];
for (int i = 0; i < columnList.size(); i++) {
record[i] = row.get(columnList.get(i));
}
csvPrinter.printRecord(record);
}
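If you want to see the column-list lookup above working end to end without the Commons CSV dependency, here is a hand-rolled sketch: it writes the header from the column list, then fetches each value by column name and quotes fields RFC 4180-style. This is a substitute for CSVPrinter, not its API, and the class and method names are hypothetical. With Commons CSV you would pass each record to csvPrinter.printRecord instead and let the library do the quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowsToCsv {

    // Quotes a field when it contains a comma, double quote, CR or LF; embedded quotes are doubled.
    // A null value (column missing from the row) becomes an empty field.
    public static String escape(String value) {
        if (value == null) {
            return "";
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\r") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Header from columnList, then one line per map with values looked up by column name,
    // so the maps themselves can be plain HashMaps.
    public static String toCsv(List<String> columnList, List<? extends Map<String, String>> rows) {
        StringBuilder out = new StringBuilder();
        appendLine(out, columnList);
        for (Map<String, String> row : rows) {
            List<String> record = new ArrayList<>(columnList.size());
            for (String column : columnList) {
                record.add(row.get(column));
            }
            appendLine(out, record);
        }
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<String> fields) {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(escape(fields.get(i)));
        }
        out.append("\r\n"); // RFC 4180 line break
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Smith, John");
        row.put("id", "1");
        System.out.print(toCsv(Arrays.asList("id", "name", "note"), Collections.singletonList(row)));
        // id,name,note
        // 1,"Smith, John",
    }
}
```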


How to select random text value from specific row using java

I have three input fields.
First Name
Last Name
Date Of Birth
I would like to get random data for each input from a property file.
This is what the property file looks like. The field name and the = should be ignored.
- First Name= Robert, Brian, Shawn, Bay, John, Paul
- Last Name= Jerry, Adam ,Lu , Eric
- Date of Birth= 01/12/12,12/10/12,1/2/17
Example: for First Name, the program should randomly select one name from the following names:
Robert, Brian, Shawn, Bay, John, Paul
Also, I need to ignore anything before the =.
FileInputStream objfile = new FileInputStream(System.getProperty("user.dir") + path);
in = new BufferedReader(new InputStreamReader(objfile ));
String line = in.readLine();
while (line != null && !line.trim().isEmpty()) {
String eachRecord[]=line.trim().split(",");
Random rand = new Random();
//I need to pick first name randomly from the file from row 1.
send(firstName,(eachRecord[0]));
If you know that you're always going to have just those 3 lines in your property file, I would put each into a map with an index as the key, then randomly generate a key in the range of the map.
import java.util.HashMap;
import java.util.concurrent.ThreadLocalRandom;

// your code here to read the file in (in is a BufferedReader over the property file)
// Integer keys: the position of each value within its line
HashMap<Integer, String> firstNameMap = new HashMap<Integer, String>();
HashMap<Integer, String> lastNameMap = new HashMap<Integer, String>();
HashMap<Integer, String> dobMap = new HashMap<Integer, String>();
String line;
while ((line = in.readLine()) != null) {
    String[] parts = line.split("=");
    if (parts.length < 2) {
        continue; // skip lines without a "Field=values" pair
    }
    String field = parts[0].trim();
    String[] values = parts[1].split(",");
    if (field.equals("First Name")) {
        for (int i = 0; i < values.length; ++i) {
            firstNameMap.put(i, values[i].trim());
        }
    }
    else if (field.equals("Last Name")) {
        // do the same as FN but for lastNameMap
    }
    else if (field.equals("Date of Birth")) {
        // do the same as FN but for dobMap
    }
}
// Now you can use the size of the map and a random number to get a value
// first name for instance (the upper bound of nextInt is exclusive, so this yields 0..size-1):
int randomNum = ThreadLocalRandom.current().nextInt(0, firstNameMap.size());
System.out.println("First Name: " + firstNameMap.get(randomNum));
// and you would do the same for the other fields
The code can easily be refactored with some helper methods to make it cleaner; we'll leave that as a homework assignment :)
This way you have a cache of all your values that you can call at any time to get a random value. I realize this isn't the most optimal solution, with nested loops and 3 different maps, but if your input file only contains 3 lines and you're not expecting millions of inputs, it should be just fine.
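Here is a sketch of that cache idea with one change: each field keeps a List instead of an index-keyed map, so Random.nextInt(size) can pick an element directly. Tolerating a leading "- " on each line (as in the question's listing) is an assumption, and the class name is hypothetical. Passing the Random in makes the picks reproducible when testing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomFieldValues {

    // Cache: field name -> all candidate values, e.g. "Last Name" -> [Jerry, Adam, Lu, Eric]
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Random random;

    public RandomFieldValues(Reader source, Random random) {
        this.random = random;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("-")) {
                    trimmed = trimmed.substring(1).trim(); // tolerate a leading "- " bullet
                }
                int eq = trimmed.indexOf('=');
                if (eq < 0) {
                    continue; // not a "Field= values" line
                }
                String field = trimmed.substring(0, eq).trim(); // everything before '=' is the name
                List<String> values = new ArrayList<>();
                for (String v : trimmed.substring(eq + 1).split(",")) {
                    if (!v.trim().isEmpty()) {
                        values.add(v.trim());
                    }
                }
                cache.put(field, values);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // All cached values for a field (empty if the field was not in the file).
    public List<String> values(String field) {
        return cache.getOrDefault(field, Collections.emptyList());
    }

    // A random value for the field, or null if there is none; nextInt(n) returns 0..n-1.
    public String randomValue(String field) {
        List<String> values = values(field);
        return values.isEmpty() ? null : values.get(random.nextInt(values.size()));
    }

    public static void main(String[] args) {
        String file = "- First Name= Robert, Brian, Shawn, Bay, John, Paul\n"
                + "- Last Name= Jerry, Adam ,Lu , Eric\n"
                + "- Date of Birth= 01/12/12,12/10/12,1/2/17\n";
        RandomFieldValues picker = new RandomFieldValues(new StringReader(file), new Random());
        System.out.println("First Name: " + picker.randomValue("First Name"));
        System.out.println("Last Name: " + picker.randomValue("Last Name"));
        System.out.println("Date of Birth: " + picker.randomValue("Date of Birth"));
    }
}
```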
Haven't programmed stuff like this in a long time.
Feel free to test it, and let me know if it works.
The result of this code should be a HashMap object called values.
You can then get the specific fields you want from it using get(field_name).
For example: values.get("First Name"). Make sure to use the correct case, because "first name" won't work.
If you want it all to be lower case, call .toLowerCase() on the field name in the line that puts the field and value into the HashMap, and look fields up in lower case.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

public class Test
{
    // pass the path of the property file as the first program argument
    public static void main(String[] args) throws IOException
    {
        HashMap<String, String> values = new HashMap<String, String>();
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            // stops at the end of the file or at the first blank line
            while (((line = in.readLine()) != null) && !line.trim().isEmpty()) {
                if (!line.contains("=")) {
                    continue;
                }
                String[] lineParts = line.split("=", 2);
                String[] eachRecord = lineParts[1].split(",");
                System.out.println("adding value of field type = " + lineParts[0].trim());
                // now add the mapping to the values HashMap - values[field_name] = random_field_value
                values.put(lineParts[0].trim(), eachRecord[(int) (Math.random() * eachRecord.length)].trim());
            }
        }
        System.out.println("First Name = " + values.get("First Name"));
        System.out.println("Last Name = " + values.get("Last Name"));
        System.out.println("Date of Birth = " + values.get("Date of Birth"));
    }
}

Compare Two CSV Files and Fetch Data

I have two CSV files. The master CSV file has around 500,000 records; the daily CSV file has 50,000 records.
The daily CSV file is missing a few columns, which have to be fetched from the master CSV file.
For example
DailyCSV File
id,name,city,zip,occupation
1,Jhon,Florida,50069,Accountant
MasterCSV File
id,name,city,zip,occupation,company,exp,salary
1, Jhon, Florida, 50069, Accountant, AuditFirm, 3, $5000
What I have to do is read both files and match the records by ID; if the ID is present in the master file, I have to fetch company, exp and salary and write them to a new CSV file.
How can I achieve this?
What I have done currently:
while (true) {
line = bstream.readLine();
lineMaster = bstreamMaster.readLine();
if (line == null || lineMaster == null)
{
break;
}
else
{
while(lineMaster != null)
readlineSplit = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
String splitId = readlineSplit[4];
String[] readLineSplitMaster =lineMaster.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
String SplitIDMaster = readLineSplitMaster[13];
System.out.println(splitId + "|" + SplitIDMaster);
//System.out.println(splitId.equalsIgnoreCase(SplitIDMaster));
if (splitId.equalsIgnoreCase(SplitIDMaster)) {
String writeLine = readlineSplit[0] + "," + readlineSplit[1] + "," + readlineSplit[2] + "," + readlineSplit[3] + "," + readlineSplit[4] + "," + readlineSplit[5] + "," + readLineSplitMaster[15]+ "," + readLineSplitMaster[16] + "," + readLineSplitMaster[17];
System.out.println(writeLine);
pstream.print(writeLine + "\r\n");
}
}
}pstream.close();
fout.flush();
bstream.close();
bstreamMaster.close();
First of all, your current parsing approach will be painfully slow. Use a CSV parsing library dedicated for that to speed things up. With uniVocity-parsers you can process your 500K records in less than a second. This is how you can use it to solve your problem:
First let's define a few utility methods to read/write your files:
//opens the file for reading (using UTF-8 encoding)
private static Reader newReader(String pathToFile) {
try {
return new InputStreamReader(new FileInputStream(new File(pathToFile)), "UTF-8");
} catch (Exception e) {
throw new IllegalArgumentException("Unable to open file for reading at " + pathToFile, e);
}
}
//creates a file for writing (using UTF-8 encoding)
private static Writer newWriter(String pathToFile) {
try {
return new OutputStreamWriter(new FileOutputStream(new File(pathToFile)), "UTF-8");
} catch (Exception e) {
throw new IllegalArgumentException("Unable to open file for writing at " + pathToFile, e);
}
}
Then, we can start reading your daily CSV file, and generate a Map:
public static void main(String... args){
//First we parse the daily update file.
CsvParserSettings settings = new CsvParserSettings();
//here we tell the parser to read the CSV headers
settings.setHeaderExtractionEnabled(true);
//and to select ONLY the following columns.
//This ensures rows with a fixed size will be returned in case some records come with fewer or more columns than anticipated.
settings.selectFields("id", "name", "city", "zip", "occupation");
CsvParser parser = new CsvParser(settings);
//Here we parse all data into a list.
List<String[]> dailyRecords = parser.parseAll(newReader("/path/to/daily.csv"));
//And convert them to a map. ID's are the keys.
Map<String, String[]> mapOfDailyRecords = toMap(dailyRecords);
... //we'll get back here in a second.
This is the code to generate a Map from the list of daily records:
/* Converts a list of records to a map. Uses element at index 0 as the key */
private static Map<String, String[]> toMap(List<String[]> records) {
HashMap<String, String[]> map = new HashMap<String, String[]>();
for (String[] row : records) {
//column 0 will always have an ID.
map.put(row[0], row);
}
return map;
}
With the map of records, we can process your master file and generate the list of updates:
private static List<Object[]> processMasterFile(final Map<String, String[]> mapOfDailyRecords) {
//we'll put the updated data here
final List<Object[]> output = new ArrayList<Object[]>();
//configures the parser to process only the columns you are interested in.
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);
settings.selectFields("id", "company", "exp", "salary");
//All parsed rows will be submitted to the following RowProcessor. This way the bigger Master file won't
//have all its rows stored in memory.
settings.setRowProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
// Incoming rows from MASTER will have the ID as index 0.
// If the daily update map contains the ID, we'll get the daily row
String[] dailyData = mapOfDailyRecords.get(row[0]);
if (dailyData != null) {
//We got a match. Let's join the data from the daily row with the master row.
Object[] mergedRow = new Object[8];
for (int i = 0; i < dailyData.length; i++) {
mergedRow[i] = dailyData[i];
}
for (int i = 1; i < row.length; i++) { //starts from 1 to skip the ID at index 0
mergedRow[i + dailyData.length - 1] = row[i];
}
output.add(mergedRow);
}
}
});
CsvParser parser = new CsvParser(settings);
//the parse() method will submit all rows to the RowProcessor defined above.
parser.parse(newReader("/path/to/master.csv"));
return output;
}
Finally, we can get the merged data and write everything to another file:
... // getting back to the main method here
//Now we process the master data and get a list of updates
List<Object[]> updatedData = processMasterFile(mapOfDailyRecords);
//And write the updated data to another file
CsvWriterSettings writerSettings = new CsvWriterSettings();
writerSettings.setHeaders("id", "name", "city", "zip", "occupation", "company", "exp", "salary");
writerSettings.setHeaderWritingEnabled(true);
CsvWriter writer = new CsvWriter(newWriter("/path/to/updates.csv"), writerSettings);
//Here we write everything, and get the job done.
writer.writeRowsAndClose(updatedData);
}
This should work like a charm. Hope it helps.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I would approach the problem step by step.
First, parse the master CSV file and keep its content in a hashmap, where the key is each record's unique 'id'. For the value, you can store the remaining fields in another hash or simply create a Java class to hold them.
Example of hash:
{
'1' : { 'name': 'Jhon',
'city': 'Florida',
'zip' : 50069,
....
}
}
Next, read your daily CSV file. For each row, read the 'id' and check whether that key exists in the hashmap you created earlier.
If it exists, get the information you need from the hashmap and write it to a new CSV file.
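A plain-JDK sketch of those two steps (a hash join): index the master rows by id, then stream the daily rows and append company, exp and salary for each match. It assumes the column layout from the question's example and uses a naive comma split with no quoted-field support, so treat it as an illustration; the class name is hypothetical. Indexing the smaller daily file instead, as the uniVocity answer does, keeps less data in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MasterDailyJoin {

    // Naive split: does NOT handle quoted fields containing commas; a CSV parser should do that part.
    static String[] fields(String line) {
        String[] parts = line.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim(); // the sample master rows have spaces after the commas
        }
        return parts;
    }

    // Step 1: load the master file into a HashMap keyed by id (column 0), skipping the header line.
    static Map<String, String[]> indexById(BufferedReader master) throws IOException {
        Map<String, String[]> byId = new HashMap<>();
        master.readLine(); // header
        String line;
        while ((line = master.readLine()) != null) {
            String[] row = fields(line);
            if (row.length >= 8) { // assumed layout: id,name,city,zip,occupation,company,exp,salary
                byId.put(row[0], row);
            }
        }
        return byId;
    }

    // Step 2: read the daily file; for each id found in the master map, append company, exp, salary.
    public static List<String> join(Reader daily, Reader master) {
        try (BufferedReader dailyIn = new BufferedReader(daily);
             BufferedReader masterIn = new BufferedReader(master)) {
            Map<String, String[]> masterById = indexById(masterIn);
            List<String> out = new ArrayList<>();
            out.add("id,name,city,zip,occupation,company,exp,salary");
            dailyIn.readLine(); // skip the daily header
            String line;
            while ((line = dailyIn.readLine()) != null) {
                String[] d = fields(line);
                String[] m = masterById.get(d[0]);
                if (m != null && d.length >= 5) {
                    out.add(String.join(",", d[0], d[1], d[2], d[3], d[4], m[5], m[6], m[7]));
                }
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String daily = "id,name,city,zip,occupation\n1,Jhon,Florida,50069,Accountant\n";
        String master = "id,name,city,zip,occupation,company,exp,salary\n"
                + "1, Jhon, Florida, 50069, Accountant, AuditFirm, 3, $5000\n";
        for (String row : join(new StringReader(daily), new StringReader(master))) {
            System.out.println(row); // in a real job, write these lines to the new CSV file instead
        }
    }
}
```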
Also, you might want to consider using a third-party CSV parser to make this task easier.
If you use Maven, you can follow this example I found on the net. Otherwise, just search for an Apache Commons CSV parser example.
http://examples.javacodegeeks.com/core-java/apache/commons/csv-commons/writeread-csv-files-with-apache-commons-csv-example/

Aggregate data in CSV file using Java

I have a big CSV file with thousands of rows, and I want to aggregate some columns using Java code.
The file is in this form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Put your data into a Map-like structure, and each time a key (in your case T + "," + year) is found, add 1 to the stored value.
You can use a map like this:
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1,2012", 2);
rowMap.put("T1,2013", 1);
rowMap.put("T2,2015", 1);
Or you can define your own class with T and Year fields, overriding its hashCode and equals methods. Then you can use
Map<YourClass, Integer> map = new HashMap<>();
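A sketch of the custom-key option, assuming rows of the form id,year,T as in the question; the class names are hypothetical. A hash map only treats two keys as the same when equals() and hashCode() agree, which is why both are overridden, and Map.merge (Java 8+) does the "get or 0, then add 1" step in one call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class TypeYearCount {

    // Composite key: two keys are equal when both T and year match.
    public static final class TypeYear {
        private final String type;
        private final String year;

        public TypeYear(String type, String year) {
            this.type = type;
            this.year = year;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof TypeYear)) return false;
            TypeYear other = (TypeYear) o;
            return type.equals(other.type) && year.equals(other.year);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, year); // consistent with equals()
        }

        @Override
        public String toString() {
            return type + "," + year;
        }
    }

    // Counts rows of the form "id,year,T" per (T, year), keeping first-seen order.
    public static Map<TypeYear, Integer> count(String[] lines) {
        Map<TypeYear, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            counts.merge(new TypeYear(f[2].trim(), f[1].trim()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] rows = {"1,2012,T1", "2,2015,T2", "3,2013,T1", "4,2012,T1"};
        System.out.println("T, Year, Count");
        for (Map.Entry<TypeYear, Integer> e : count(rows).entrySet()) {
            System.out.println(e.getKey() + ", " + e.getValue());
        }
    }
}
```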
String csv =
"1,2012,T1\n"
+ "2,2015,T2\n"
+ "3,2013,T1\n"
+ "4,2012,T1\n";
Map<String, Integer> map = new TreeMap<>();
BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
String[] fields = line.split(",");
String key = fields[2] + "," + fields[1];
Integer value = map.get(key);
if (value == null)
value = 0;
map.put(key, value + 1);
}
System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); //select the columns we are going to read
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); //stores the results here
//Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
@Override
public void rowProcessed(String[] row, ParsingContext context) {
List<String> key = Arrays.asList(row); // converts the input array to a List - lists implement hashCode and equals based on their values so they can be used as keys on your map.
Integer count = results.get(key);
if (count == null) {
count = 0;
}
results.put(key, count + 1);
}
});
//creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);
String input = "1,2012,T1"
+ "\n2,2015,T2"
+ "\n3,2013,T1"
+ "\n4,2012,T1";
//the parse() method will parse and submit all rows to your RowProcessor - use a FileReader to read a file instead the String I'm using as example.
parser.parse(new StringReader(input));
//Here are the results:
for(Entry<List<String>, Integer> entry : results.entrySet()){
System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

writing json file using arrays with large data using java

I am trying to build a JSON file to feed an autocomplete textbox.
The file will have millions of elements, so I am trying to eliminate duplicates while saving memory and time. For small amounts of data the following code works, but since I am using an array, execution gets really slow as the array grows.
int i = 0;
JSONObject obj = new JSONObject();
JSONArray array = new JSONArray();
while (iter.hasNext()) {
Map<String,String>forJson = new HashMap<String, String>();
Statement stmt = iter.nextStatement();
object = stmt.getObject();
forJson.put("key", object.asResource().getLocalName());
forJson.put("value", object.asResource().getURI());
i++;
System.out.println(i);
if(!array.contains(forJson))
{
array.add(forJson);
}
}
obj.put("objects", array);
FileWriter file = new FileWriter("/homeDir/data.json");
file.write(obj.toJSONString());
file.flush();
file.close();
The array.contains control eliminates duplicates but it has a considerable negative effect on execution time.
The JSON file should contain entries like
[{"key": "exampleText1", "value": "exampleValue1"},
{"key": "exampleText2", "value": "exampleValue2"}]
Use a HashSet to contain the keys you have already added:
...
Set<String> usedKeys = new HashSet<String>();
while (iter.hasNext()) {
Map<String,String>forJson = new HashMap<String, String>();
Statement stmt = iter.nextStatement();
object = stmt.getObject();
String key = object.asResource().getLocalName();
if(!usedKeys.contains(key)) {
usedKeys.add(key);
forJson.put("key", key);
forJson.put("value", object.asResource().getURI());
array.add(forJson);
}
i++;
System.out.println(i);
}
If you need the uniqueness check to include the value, you could concatenate the two using a separator that you know cannot occur in the keys. For example:
String key = object.asResource().getLocalName();
String value = object.asResource().getURI();
String unique = key + "|#|#|" + value;
if(!usedKeys.contains(unique)) {
usedKeys.add(unique);
forJson.put("key", key);
forJson.put("value", value);
array.add(forJson);
}
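A variant that avoids choosing a separator at all: store each (key, value) pair as an AbstractMap.SimpleImmutableEntry, whose equals/hashCode compare both key and value. The String[] input below is an assumption standing in for the getLocalName()/getURI() values, and the class name is hypothetical. In the original loop this becomes `if (seen.add(new SimpleImmutableEntry<>(key, value))) { ...array.add(forJson); }`, since Set.add returns false for a duplicate, which also saves the separate contains() call.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DedupPairs {

    // Keeps the first occurrence of each (key, value) pair, in arrival order.
    // Lookups in the hash-based set are O(1) on average, unlike JSONArray.contains.
    public static List<Map.Entry<String, String>> unique(List<String[]> pairs) {
        Set<Map.Entry<String, String>> seen = new LinkedHashSet<>();
        for (String[] p : pairs) {
            seen.add(new SimpleImmutableEntry<>(p[0], p[1])); // no-op if the pair is already there
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[]{"exampleText1", "exampleValue1"});
        pairs.add(new String[]{"exampleText1", "exampleValue2"});
        pairs.add(new String[]{"exampleText1", "exampleValue1"}); // duplicate pair
        for (Map.Entry<String, String> e : unique(pairs)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```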

How do I parse a column from a CSV string using jackson CsvMapper or another csv parser?

I have a Java method that receives a CSV string of values and an integer index to reference which column in the CSV string to parse. The method returns the value associated with the integer index in the CSV string.
For example if I have a CSV string with a header and a second row with values defined as:
String csvString = "Entry #,Date Created,Date Updated, IP Address\n"
        + "165,8/22/13 14:46,,11.222.33.444";
and the integer index received by the method was 1, I'd expect the method to return the string "165"
And if the integer index received by the method was 2, I'd expect the method to return the string "8/22/13 14:46"
and so on.
I don't want to just split up the CSV string by commas as that could get ugly and I'm sure that there is a CSV parser that already does some parsing like this. From my Google searches it sounds like OpenCSV or the Jackson CsvMapper can do this.
I've been playing with the com.fasterxml.jackson.dataformat.csv.CsvMapper library to parse out the appropriate column of this CSV string, and here's what I have so far:
int csvFieldIndex; // the integer index passed into my method
String csvString = "Entry #,Date Created,Date Updated, IP Address\n"
        + "165,8/22/13 14:46,,11.222.33.444";
CsvSchema csvSchema = CsvSchema.emptySchema().withHeader();
ObjectReader mapper = new CsvMapper().reader(Map.class).with(csvSchema);
MappingIterator<Map<String, Object>> iter = null;
iter = mapper.readValues(csvString);
while (iter.hasNext()) {
Map<String, Object> row = iter.next();
System.out.println("row= " + row.toString());
}
But this iterator gives me all the CSV values, which is not what I want; I just want the one value associated with my integer index.
Here's the output I get when I run this snippet of code:
row= {Entry #=165, Date Created=8/22/13 14:46, Date Updated=, IP Address=11.222.33.444}
Is there a way I can use the Jackson CsvMapper to do this?
======== UPDATE: ========
Based on feedback from keshlam I was able to parse a column from a CSV with the following code:
CsvSchema csvSchema = CsvSchema.emptySchema().withHeader();
ObjectReader mapper = new CsvMapper().reader(Map.class).with(csvSchema);
MappingIterator<Map<String, Object>> iter = null;
iter = mapper.readValues(csvString);
// iterate over whole csv and store in a map
Map<String, Object> row = null;
while (iter.hasNext()) {
row = iter.next();
}
// now put the set of field names (keys) from this map into an array
String[] csvKeysArray = row.keySet().toArray(new String[0]);
String csvValue = null;
int j = 0;
// loop over this array of keys and compare with csvFieldIndex
for (int i = 0; i < csvKeysArray.length; i++) {
// increment j by 1 since array index starts at 0 but input index starts at 1
j = i + 1;
if (j == csvFieldIndex) {
csvValue = row.get(csvKeysArray[i]).toString();
}
}
return csvValue;
If I'm following what you're asking, and understanding your code correctly (I've never used that CsvMapper), change the final loop to
while (iter.hasNext()) {
Map<String, Object> row = iter.next();
System.out.println("Entry #= " + row.get("Entry #"));
}
If you want to access the columns numerically rather than by name (why?), set up a mapping from number to column name and use that to retrieve the key you pass to the get() operation.
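A sketch of that number-to-name mapping, assuming the row Map keeps the CSV's column order (as a LinkedHashMap does). The row here is built by hand to mirror the output shown in the question; whether Jackson's untyped Map preserves column order for your version is an assumption to verify, and otherwise you can build the name list from the header line. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnByIndex {

    // Returns the value of the 1-based column index, treating the map's iteration order as column order.
    public static Object valueAt(Map<String, Object> row, int oneBasedIndex) {
        List<String> columnNames = new ArrayList<>(row.keySet()); // position -> column name
        if (oneBasedIndex < 1 || oneBasedIndex > columnNames.size()) {
            throw new IndexOutOfBoundsException("No column " + oneBasedIndex + " in " + columnNames);
        }
        return row.get(columnNames.get(oneBasedIndex - 1));
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("Entry #", "165");
        row.put("Date Created", "8/22/13 14:46");
        row.put("Date Updated", "");
        row.put("IP Address", "11.222.33.444");
        System.out.println(valueAt(row, 2)); // 8/22/13 14:46
    }
}
```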
