Reading a huge CSV file and converting it to JSON with Java 8

I am trying to read a CSV file with many columns, where the first row is always the header. I would like to convert the CSV data into JSON. I can read the lines as Strings and convert them into JSON, but I am not able to attach the headers as keys.
For example input csv looks like:
first_name,last_name
A,A1
B,B1
C,C1
Stream<String> stream = Files.lines(Paths.get("sample.csv"))
List<String[]> readall = stream.map(l -> l.split(",")).collect(Collectors.toList());
and then
List<String> test1 = readall.stream().skip(0).map(row -> row[1]).collect(Collectors.toList());
Using com.fasterxml.jackson.databind.ObjectMapper's writeValueAsString on this only creates JSON without the headers.
I would like the output in a format like
[{"first_name":"A","last_name":"A1"},{"first_name":"B"....
How do I use streams in Java 8 to produce this JSON format?
Please help.

I'd tackle this problem in two steps: first, read the headers, then, read the rest of the lines:
static String[] headers(String path) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(path))) {
        return br.readLine().split(",");
    }
}
Now, you can use the method above as follows:
String path = "sample.csv";
// Read headers
String[] headers = headers(path);
List<Map<String, String>> result = null;
// Read data
try (Stream<String> stream = Files.lines(Paths.get(path))) {
    result = stream
            .skip(1) // skip the header line
            .map(line -> line.split(","))
            .map(data -> {
                Map<String, String> map = new HashMap<>();
                for (int i = 0; i < data.length; i++) {
                    map.put(headers[i], data[i]);
                }
                return map;
            })
            .collect(Collectors.toList());
}
You can also replace the for loop inside the second map operation with an IntStream:
try (Stream<String> stream = Files.lines(Paths.get(path))) {
    result = stream
            .skip(1) // skip the header line
            .map(line -> line.split(","))
            .map(data -> IntStream.range(0, data.length)
                    .boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> data[i])))
            .collect(Collectors.toList());
}
EDIT: If instead of collecting to a list, you want to perform an action for the maps read from each line, you can do it as follows:
try (Stream<String> stream = Files.lines(Paths.get(path))) {
    stream
            .skip(1) // skip the header line
            .map(line -> line.split(","))
            .map(data -> IntStream.range(0, data.length)
                    .boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> data[i])))
            .forEach(System.out::println);
}
(Here the action is to print each map).
This version can still be improved: it boxes the stream of ints and then unboxes each int again to use it as an index into the headers and data arrays. Readability can also be improved by extracting the creation of each map into a private method.
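For instance, a sketch of that refactoring (the method name is mine): a plain indexed loop avoids the boxing, and the map creation moves into its own method:
// No boxing; also guards against rows shorter than the header
private static Map<String, String> toMap(String[] headers, String[] data) {
    Map<String, String> map = new HashMap<>();
    for (int i = 0; i < Math.min(headers.length, data.length); i++) {
        map.put(headers[i], data[i]);
    }
    return map;
}
The second map operation in the pipeline then becomes .map(data -> toMap(headers, data)).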
Notes: Reading the file twice is probably not the best approach performance-wise, but the code is simple and expressive. Apart from this, null handling, data transformation (e.g. to numbers or dates) and edge cases (e.g. no headers, no data lines, or rows with a different number of fields) are left as an exercise for the reader ;)
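If reading the file twice bothers you, here is a single-pass sketch along the same lines: read the header with readLine() directly on a BufferedReader, then continue with lines(), which starts at the line after the one already consumed, so no skip(1) is needed:
try (BufferedReader br = new BufferedReader(new FileReader(path))) {
    String[] headers = br.readLine().split(",");
    List<Map<String, String>> result = br.lines()
            .map(line -> line.split(","))
            .map(data -> IntStream.range(0, data.length)
                    .boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> data[i])))
            .collect(Collectors.toList());
}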

I think this is what you are trying to do:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
public class App {
    public static void main(String[] args) throws JsonProcessingException, IOException {
        Stream<String> stream = Files.lines(Paths.get("src/main/resources/test1.csv"));
        List<Map<String, Object>> readall = stream.map(l -> {
            Map<String, Object> map = new HashMap<>();
            String[] values = l.split(",");
            map.put("name", values[0]);
            map.put("age", values[1]);
            return map;
        }).collect(Collectors.toList());
        ObjectMapper mapperObj = new ObjectMapper();
        String jsonResp = mapperObj.writeValueAsString(readall);
        System.out.println(jsonResp);
    }
}
This works with Java 8 streams, supplies the header names as map keys, and uses Jackson to convert the result into JSON. The CSV I used:
abc,20
bbc,30
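Running it against that CSV should print something like the following (key order within each object may vary, since HashMap does not preserve insertion order):
[{"name":"abc","age":"20"},{"name":"bbc","age":"30"}]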

Very simple:
Don't convert it into a List of Strings. Convert it into a List of HashMaps, then use the org.json library to convert that into JSON. Use Jackson's CSV data format to convert the CSV into HashMaps.
Let the input stream be
InputStream stream = new FileInputStream(new File("filename.csv"));
Example: to convert the CSV to HashMaps:
public List<Map<String, Object>> read(InputStream stream) throws JsonProcessingException, IOException {
    List<Map<String, Object>> response = new LinkedList<Map<String, Object>>();
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = CsvSchema.emptySchema().withHeader();
    MappingIterator<Map<String, String>> iterator = mapper.reader(Map.class).with(schema).readValues(stream);
    while (iterator.hasNext()) {
        response.add(Collections.<String, Object>unmodifiableMap(iterator.next()));
    }
    return response;
}
To convert the List of Maps to JSON:
JSONArray jsonArray = new JSONArray(response);
System.out.println(jsonArray.toString());
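If you'd rather not add the org.json dependency just for the last step, here is a Jackson-only sketch of the same conversion, reusing the read method above:
List<Map<String, Object>> response = read(stream);
ObjectMapper jsonMapper = new ObjectMapper();
// Serialize the list of maps straight to a JSON array string
System.out.println(jsonMapper.writeValueAsString(response));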

Related

Using Jackson to convert CSV to JSON - How to remove newlines embedded in CSV column header

After some quick Googling, I found an easy way to read and parse a CSV file to JSON using the Jackson library. All well and good, except ... some of the CSV header column names have embedded newlines. The program handles it, but I'm left with JSON keys with newlines embedded within. I'd like to remove these (or replace them with a space).
Here is the simple program I found:
import java.io.File;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
public class CSVToJSON {
    public static void main(String[] args) throws Exception {
        File input = new File("PDM_BOM.csv");
        File output = new File("output.json");
        CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
        CsvMapper csvMapper = new CsvMapper();
        // Read data from CSV file
        List<Object> readAll = csvMapper.readerFor(Map.class).with(csvSchema).readValues(input)
                .readAll();
        ObjectMapper mapper = new ObjectMapper();
        // Write JSON formatted data to output.json file
        mapper.writerWithDefaultPrettyPrinter().writeValue(output, readAll);
        // Write JSON formatted data to stdout
        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll));
    }
}
So, as an example:
PARENT\nITEM\nNUMBER
Here's an example of what is produced:
"PARENT\nITEM\nNUMBER" : "208E8840040",
I need this to be:
"PARENT ITEM NUMBER" : "208E8840040",
Is there a configuration setting on the Jackson mapper that can handle this? Or, do I need to provide some sort of custom "handler" to the mapper?
Special cases
To add some complexity, there are cases where just replacing the newline with a space will not always yield what is needed.
Example 1:
Sometimes there is a column header like this:
QTY\nORDER/\nTRANSACTION
In this case, I need the newline removed and replaced with nothing, so that the result is:
QTY ORDER/TRANSACTION
, not
QTY ORDER/ TRANSACTION
Example 2:
Sometimes, for whatever reason, a column header has a space before the newline:
EFFECTIVE \nTHRU DATE
This needs to come out as:
EFFECTIVE THRU DATE
, not
EFFECTIVE  THRU DATE (with a double space)
Any ideas on how to handle at least the main issue would be very much appreciated.
You can use the String replaceAll() method to replace all whitespace runs (including newlines) with single spaces. Note that the regex must be double-escaped in a Java string literal:
String str = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll);
str = str.trim().replaceAll("\\s+", " ");
OK, came up with a solution. It's ugly, but it works. Basically, after the CsvMapper finishes, I go through the giant ugly collection that's produced and do a String.replaceAll (thanks to https://stackoverflow.com/users/4402505/prem-kurian-philip for that suggestion) to remove the unwanted characters and then rebuild the map.
In any case here's the new code:
public class CSVToJSON {
    public static void main(String[] args) throws Exception {
        File input = new File("PDM_BOM.csv");
        File output = new File("output.json");
        CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
        CsvMapper csvMapper = new CsvMapper();
        // Read data from CSV file
        List<Object> readData = csvMapper.readerFor(Map.class).with(csvSchema).readValues(input)
                .readAll();
        for (Object mapObj : readData) {
            LinkedHashMap<String, String> map = (LinkedHashMap<String, String>) mapObj;
            List<String> deleteList = new ArrayList<>();
            LinkedHashMap<String, String> insertMap = new LinkedHashMap<>();
            for (Entry<String, String> entry : map.entrySet()) {
                String oldKey = entry.getKey();
                // Collapse any whitespace run (including newlines) to a single space
                String newKey = oldKey.replaceAll("\\s+", " ");
                String value = entry.getValue();
                deleteList.add(oldKey);
                insertMap.put(newKey, value);
            }
            // Delete the old ...
            for (String oldKey : deleteList) {
                map.remove(oldKey);
            }
            // ... and bring in the new
            map.putAll(insertMap);
        }
        ObjectMapper mapper = new ObjectMapper();
        // Write JSON formatted data to output.json file
        mapper.writerWithDefaultPrettyPrinter().writeValue(output, readData);
        // Write JSON formatted data to stdout
        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readData));
    }
}
It seems like there should be a better way to achieve this.
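For what it's worth, here is a sketch of a key-cleanup rule that also covers the two special cases from the question, assuming a newline that follows a slash should simply vanish:
// Drop whitespace that follows a slash, then collapse any remaining
// whitespace run (including newlines) to a single space.
String newKey = oldKey.replaceAll("/\\s+", "/").replaceAll("\\s+", " ").trim();
With that rule, "QTY\nORDER/\nTRANSACTION" becomes "QTY ORDER/TRANSACTION" and "EFFECTIVE \nTHRU DATE" becomes "EFFECTIVE THRU DATE".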

Apache CSV - Convert List<String> to CSVRecord

I'm inclined to use CSVRecord because it can map values against a header and retrieve the corresponding value. My application frequently uses the CSVRecord class. However, I cannot instantiate CSVRecord directly. I would prefer not to modify the source or create a new class, since the library already provides a parser that returns CSVRecords. I have a list of strings (the header as well as the values) that needs to be converted to the CSVRecord type. Is there a direct way to do this without the detour of formatting and then parsing back, like the one below?
private CSVRecord format(List<String> header, List<String> values)
{
    CSVFormat csvFormat = CSVFormat.DEFAULT.withRecordSeparator(System.lineSeparator())
            .withQuoteMode(QuoteMode.ALL);
    CSVRecord csvRecord = null;
    final StringWriter out = new StringWriter();
    try (CSVPrinter csvPrinter = new CSVPrinter(out, csvFormat))
    {
        csvPrinter.printRecord(values);
        String value = out.toString().trim();
        for (CSVRecord r : CSVParser.parse(value, csvFormat.withHeader(header.toArray(new String[header.size()]))))
            csvRecord = r;
    }
    catch (IOException e)
    {
        logger.error("Unable to format the Iterable to CSVRecord. Header: [{}]; Values: [{}]", e,
                String.join(", ", header), String.join(", ", values));
    }
    return csvRecord;
}
private void testMethod() throws Exception
{
    List<String> header = Arrays.asList("header1", "header2", "header3");
    List<String> record = Arrays.asList("val1", "val2", "val3");
    CSVRecord csvRecord = format(header, record);
    logger.info("{}", csvRecord.get("header2"));
}
You could pass the list as a string directly into the CSVParser instead of creating a writer.
CSVRecord csvr = CSVParser.parse(
        values.stream().collect(Collectors.joining(",")),
        csvFormat.withHeader(header.toArray(new String[header.size()])))
    .getRecords().get(0);
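One caveat (my note, not the answerer's): joining the raw values with "," breaks if a value itself contains a comma or a quote. Commons CSV can apply proper quoting for you via CSVFormat.format before parsing, as in this sketch:
CSVRecord csvr = CSVParser.parse(
        csvFormat.format(values.toArray()),
        csvFormat.withHeader(header.toArray(new String[0])))
    .getRecords().get(0);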
BeanIO and SimpleFlatMapper are much better at solving this problem. BeanIO uses a Map data structure and a config file to declare how the CSV file should be structured, so it is very powerful. SimpleFlatMapper will take your POJO properties as the heading names by default and output the property values as column values.
BeanIO
http://beanio.org/2.1/docs/reference/index.html#CSVStreamFormat
SimpleFlatMapper
http://simpleflatmapper.org/
CsvParser
    .mapTo(MyObject.class)
    .stream(reader)
    .forEach(System.out::println);
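For the snippet above to work, MyObject would be a plain bean whose property names match the CSV headers; here is a hypothetical version matching the header1..header3 file from this question:
public class MyObject {
    private String header1;
    private String header2;
    private String header3;
    public String getHeader1() { return header1; }
    public void setHeader1(String header1) { this.header1 = header1; }
    public String getHeader2() { return header2; }
    public void setHeader2(String header2) { this.header2 = header2; }
    public String getHeader3() { return header3; }
    public void setHeader3(String header3) { this.header3 = header3; }
}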

Decorating Java 8 Files.lines() stream with filename

Files.lines() returns a Stream<String> of each line in the file. What I want is a Stream of Map, with the key the line, and the value the filename. This must be an intermediate, not a terminal, result in a pipeline.
What is the best way of accomplishing this?
You can try the following:
final String fileName = ""; // the file name
List<String> lines = new ArrayList<>(); // the lines to decorate
Stream<Map<String, String>> decorated = lines.stream()
        .map(l -> {
            Map<String, String> map = new HashMap<>();
            map.put(l, fileName);
            return map;
        });
You can use Google Guava's ImmutableMap class to create each map (a more concise version), e.g.:
final String fileName = "";
List<String> lines = new ArrayList<>();
lines.stream()
        .map(l -> ImmutableMap.of(l, fileName));
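Wiring either variant to Files.lines(), as the question actually asks, might look like this sketch; the file name comes from the Path, and the stream stays intermediate until you attach further operations:
Path path = Paths.get("sample.csv");
String fileName = path.getFileName().toString();
try (Stream<String> lines = Files.lines(path)) {
    Stream<Map<String, String>> decorated =
            lines.map(l -> Collections.singletonMap(l, fileName));
    decorated.forEach(System.out::println); // or any downstream pipeline
}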

How to convert a CSV file to List<Map<String,String>>

I have a CSV file which has a header in the first line. I want to convert it to List<Map<String, String>>, where each Map<String, String> in the list represents a record in the file. The key of the map is the header and the value is the actual value of the field.
What I have so far:
BufferedReader br = <handle to file>;
// Get the headers to build the map.
String[] headers = br.lines().limit(1).map(line -> line.split(",")).findFirst().get();
Stream<String> recordStream = br.lines().skip(1);
What further operations can I perform on recordStream so that I can transform it to List<Map<String, String>>?
Sample CSV file is:
header1,header2,header3 ----> header line
field11,field12,field13 ----> needs to become a Map with entries like header1:field11, header2:field12, and so on
field21,field22,field23
field31,field32,field33
Finally all these Maps need to be collected to a List.
The following will work. The header line is retrieved by calling readLine directly on the BufferedReader and by splitting around ,. Then, the rest of the file is read: each line is split around , and mapped to a Map with the corresponding header.
try (BufferedReader br = new BufferedReader(...)) {
    String[] headers = br.readLine().split(",");
    // assumes static imports of Collectors.toMap and Collectors.toList
    List<Map<String, String>> records =
            br.lines().map(s -> s.split(","))
                    .map(t -> IntStream.range(0, t.length)
                            .boxed()
                            .collect(toMap(i -> headers[i], i -> t[i])))
                    .collect(toList());
    System.out.println(Arrays.toString(headers));
    System.out.println(records);
}
A very important note here is that BufferedReader.lines() does not return a fresh Stream when it is called: we must not skip 1 line after we read the header since the reader will already have advanced to the next line.
As a side note, I used a try-with-resources construct so that the BufferedReader can be properly closed.
I know this is a bit of an old question, but I ran into the same problem, and created a quick sample of the Commons CSV solution mentioned by Tagir Valeev:
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in);
List<Map<String, String>> listOfMaps = new ArrayList<>();
for (CSVRecord record : records) {
    listOfMaps.add(record.toMap());
}
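The same thing as a stream pipeline (my variant), since the Iterable returned by the parser exposes a spliterator:
List<Map<String, String>> listOfMaps =
        StreamSupport.stream(records.spliterator(), false)
                .map(CSVRecord::toMap)
                .collect(Collectors.toList());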

How to read and write a HashMap to a file?

I have the following HashMap:
HashMap<String,Object> fileObj = new HashMap<String,Object>();
ArrayList<String> cols = new ArrayList<String>();
cols.add("a");
cols.add("b");
cols.add("c");
fileObj.put("mylist",cols);
I write it to a file as follows:
File file = new File("temp");
FileOutputStream f = new FileOutputStream(file);
ObjectOutputStream s = new ObjectOutputStream(f);
s.writeObject(fileObj);
s.flush();
Now I want to read this file back into a HashMap where the Object is an ArrayList. If I simply do:
File file = new File("temp");
FileInputStream f = new FileInputStream(file);
ObjectInputStream s = new ObjectInputStream(f);
fileObj = (HashMap<String,Object>)s.readObject();
s.close();
This does not give me the object back in the format that I saved it in.
It returns a table with 15 null elements and the <mylist, [a, b, c]> pair at the 3rd element. I want it to return only one element, with the values I provided in the first place.
How can I read the same object back into a HashMap?
OK, based on Cem's note, this seems to be the correct explanation:
ObjectOutputStream serializes the object (the HashMap in this case) in whatever format ObjectInputStream will understand how to deserialize, and it does so generically for any Serializable object.
If you want it serialized in a format of your choosing, you should write your own serializer/deserializer.
In my case, I simply iterate through each of the elements in the HashMap when I read the object back from the file, get the data, and do whatever I want with it (the loop is only entered where there is data).
Thanks,
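A sketch of the iteration described above (the variable names are mine, reusing the ObjectInputStream s from the question):
@SuppressWarnings("unchecked")
HashMap<String, Object> readBack = (HashMap<String, Object>) s.readObject();
for (Map.Entry<String, Object> entry : readBack.entrySet()) {
    // Only entries that actually hold data show up here
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}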
You appear to be confusing the internal representation of a HashMap with how the HashMap behaves. The collections are the same. Here is a simple test to prove it to you:
public static void main(String... args)
        throws IOException, ClassNotFoundException {
    HashMap<String, Object> fileObj = new HashMap<String, Object>();
    ArrayList<String> cols = new ArrayList<String>();
    cols.add("a");
    cols.add("b");
    cols.add("c");
    fileObj.put("mylist", cols);
    {
        File file = new File("temp");
        FileOutputStream f = new FileOutputStream(file);
        ObjectOutputStream s = new ObjectOutputStream(f);
        s.writeObject(fileObj);
        s.close();
    }
    File file = new File("temp");
    FileInputStream f = new FileInputStream(file);
    ObjectInputStream s = new ObjectInputStream(f);
    HashMap<String, Object> fileObj2 = (HashMap<String, Object>) s.readObject();
    s.close();
    Assert.assertEquals(fileObj.hashCode(), fileObj2.hashCode());
    Assert.assertEquals(fileObj.toString(), fileObj2.toString());
    Assert.assertTrue(fileObj.equals(fileObj2));
}
I believe you're making a common mistake. You forgot to close the stream after using it!
File file = new File("temp");
FileOutputStream f = new FileOutputStream(file);
ObjectOutputStream s = new ObjectOutputStream(f);
s.writeObject(fileObj);
s.close();
You can also use a JSON file to read and write a Map object.
To write a map object into a JSON file:
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> map = new HashMap<String, Object>();
map.put("name", "Suson");
map.put("age", 26);
// write JSON to a file
mapper.writeValue(new File("c:\\myData.json"), map);
To read the map object from the JSON file:
ObjectMapper mapper = new ObjectMapper();
// read JSON from a file
Map<String, Object> map = mapper.readValue(
        new File("c:\\myData.json"),
        new TypeReference<Map<String, Object>>() {});
System.out.println(map.get("name"));
System.out.println(map.get("age"));
Import ObjectMapper from com.fasterxml.jackson.databind, and wrap the code in a try-catch block (readValue and writeValue can throw IOException).
Your first line:
HashMap<String,Object> fileObj = new HashMap<String,Object>();
gave me pause, as the values are not guaranteed to be Serializable and thus may not be written out correctly. You should really define the object as a HashMap<String, Serializable> (or, if you prefer, simply Map<String, Serializable>).
I would also consider serializing the Map in a simple text format such as JSON since you are doing a simple String -> List<String> mapping.
I believe you're getting what you're saving. Have you inspected the map before you save it? In HashMap:
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 16;
e.g. the default HashMap will start off with 16 nulls. You use one of the buckets, so you only have 15 nulls left when you save, which is what you get when you load.
Try inspecting fileObj.keySet(), .entrySet() or .values() to see what you expect.
HashMaps are designed to be fast while trading off memory. See Wikipedia's Hash table entry for more details.
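For instance, a quick inspection sketch (using the fileObj from the question): the logical contents are a single entry, regardless of the internal bucket count:
System.out.println(fileObj.size());         // 1
System.out.println(fileObj.keySet());       // [mylist]
System.out.println(fileObj.get("mylist"));  // [a, b, c]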
The same data, if you want to write it to a text file:
public void writeToFile(Map<String, List<String>> failureMessage) {
    if (file != null) {
        try {
            BufferedWriter writer = new BufferedWriter(new FileWriter(file, true));
            for (Map.Entry<String, List<String>> map : failureMessage.entrySet()) {
                writer.write(map.getKey() + "\n");
                for (String message : map.getValue()) {
                    writer.write(message + "\n");
                }
                writer.write("\n");
            }
            writer.close();
        } catch (Exception e) {
            System.out.println("Unable to write to file: " + file.getPath());
            e.printStackTrace();
        }
    }
}
