Decorating Java 8 Files.lines() stream with filename

Files.lines() returns a Stream<String> of each line in the file. What I want is a Stream of Map, with the key the line, and the value the filename. This must be an intermediate, not a terminal, result in a pipeline.
What is the best way of accomplishing this?

You can try the following:
final String fileName = ""; // the file name
List<String> lines = new ArrayList<>(); // the lines read from the file
lines.stream()
        .map(l -> {
            Map<String, String> map = new HashMap<>();
            map.put(l, fileName);
            return map;
        }); // intermediate result; chain further operations as needed
Alternatively, you can use Google Guava's ImmutableMap class to create each map more compactly, e.g.
final String fileName = "";
List<String> lines = new ArrayList<>();
lines.stream()
        .map(l -> ImmutableMap.of(l, fileName));
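If each line only needs to carry a single key/value pair, a Map.Entry may be a lighter decoration than a full Map. Here is a self-contained sketch of the whole pipeline (the file name is made up); AbstractMap.SimpleImmutableEntry is plain JDK, so no extra dependency is needed:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.Map;
import java.util.stream.Stream;

public class LinesWithFileName {
    public static void main(String[] args) throws IOException {
        String fileName = "sample.txt"; // hypothetical file
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            Stream<Map.Entry<String, String>> decorated =
                    lines.map(l -> new AbstractMap.SimpleImmutableEntry<>(l, fileName));
            // still an intermediate result; attach whatever operations come next, e.g.:
            decorated.forEach(e -> System.out.println(e.getKey() + " <- " + e.getValue()));
        }
    }
}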

Related

Reading a huge csv file and converting to JSON with Java 8

I am trying to read a CSV file with many columns, where the first row is always the header. I would like to convert the CSV data into JSON. I can read it as a String and convert it into JSON, but I am not able to assign the headers to it.
For example input csv looks like:
first_name,last_name
A,A1
B,B1
C,C1
Stream<String> stream = Files.lines(Paths.get("sample.csv"));
List<String[]> readall = stream.map(l -> l.split(",")).collect(Collectors.toList());
or
List<String> test1 = readall.stream().skip(0).map(row -> row[1]).collect(Collectors.toList());
And using com.fasterxml.jackson.databind.ObjectMapper's writeValueAsString only creates JSON with no header.
I would like the output in the format like
{
[{"first_name":"A","last_name":"A1"},{"first_name":"B"....
How do I use stream in Java to prepare this JSON format?
Please help.
I'd tackle this problem in two steps: first, read the headers, then, read the rest of the lines:
static String[] headers(String path) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(path))) {
        return br.readLine().split(",");
    }
}
Now, you can use the method above as follows:
String path = "sample.csv";

// Read headers
String[] headers = headers(path);

List<Map<String, String>> result = null;

// Read data
try (Stream<String> stream = Files.lines(Paths.get(path))) {
    result = stream
            .skip(1) // skip headers
            .map(line -> line.split(","))
            .map(data -> {
                Map<String, String> map = new HashMap<>();
                for (int i = 0; i < data.length; i++) {
                    map.put(headers[i], data[i]);
                }
                return map;
            })
            .collect(Collectors.toList());
}
You can replace the for loop inside the second map operation with an IntStream:
try (Stream<String> stream = Files.lines(Paths.get(path))) {
    result = stream
            .skip(1) // skip headers
            .map(line -> line.split(","))
            .map(data -> IntStream.range(0, data.length)
                    .boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> data[i])))
            .collect(Collectors.toList());
}
EDIT: If instead of collecting to a list, you want to perform an action for the maps read from each line, you can do it as follows:
try (Stream<String> stream = Files.lines(Paths.get(path))) {
    stream
            .skip(1) // skip headers
            .map(line -> line.split(","))
            .map(data -> IntStream.range(0, data.length)
                    .boxed()
                    .collect(Collectors.toMap(i -> headers[i], i -> data[i])))
            .forEach(System.out::println);
}
(Here the action is to print each map).
This version can still be improved: it boxes the stream of ints and then unboxes each int again to use it as an index into the headers and data arrays. Readability can also be improved by extracting the creation of each map into a private method.
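For instance, a minimal sketch of that extraction (the helper name toRecord is mine, not from the original answer):
// Hypothetical helper: builds one record map from the shared headers and one data row.
private static Map<String, String> toRecord(String[] headers, String[] data) {
    Map<String, String> map = new HashMap<>();
    for (int i = 0; i < Math.min(headers.length, data.length); i++) {
        map.put(headers[i], data[i]);
    }
    return map;
}
With it, the second map operation shrinks to .map(data -> toRecord(headers, data)).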
Notes: Reading the file twice may not be the best approach performance-wise, but the code is simple and expressive. Apart from that, null handling, data transformation (e.g. to numbers or dates) and edge cases (e.g. no headers, no data lines, or rows with different lengths) are left as an exercise for the reader ;)
I think this is what you are trying to do:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class App {
    public static void main(String[] args) throws JsonProcessingException, IOException {
        Stream<String> stream = Files.lines(Paths.get("src/main/resources/test1.csv"));
        List<Map<String, Object>> readall = stream.map(l -> {
            Map<String, Object> map = new HashMap<String, Object>();
            String[] values = l.split(",");
            map.put("name", values[0]);
            map.put("age", values[1]);
            return map;
        }).collect(Collectors.toList());
        ObjectMapper mapperObj = new ObjectMapper();
        String jsonResp = mapperObj.writeValueAsString(readall);
        System.out.println(jsonResp);
    }
}
This works with Java 8 streams, attaches the header names to the values, and uses Jackson to convert the result into JSON. The CSV used:
abc,20
bbc,30
Very simple:
Don't convert it into a List of Strings. Convert it into a List of HashMaps and then use the org.json library to convert it into JSON. Use Jackson's CSV data format to convert the CSV to HashMaps.
Let the input stream be
InputStream stream = new FileInputStream(new File("filename.csv"));
Example:
To convert CSV to HashMap
public List<Map<String, Object>> read(InputStream stream) throws JsonProcessingException, IOException {
    List<Map<String, Object>> response = new LinkedList<Map<String, Object>>();
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = CsvSchema.emptySchema().withHeader();
    MappingIterator<Map<String, String>> iterator = mapper.reader(Map.class).with(schema).readValues(stream);
    while (iterator.hasNext()) {
        response.add(Collections.<String, Object>unmodifiableMap(iterator.next()));
    }
    return response;
}
To convert List of Map to Json
JSONArray jsonArray = new JSONArray(response);
System.out.println(jsonArray.toString());
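Alternatively, since Jackson is already on the classpath for the CsvMapper, the same list can be serialized with Jackson's ObjectMapper instead of org.json; a minimal sketch, where response is the list returned by read(...) above:
ObjectMapper jsonMapper = new ObjectMapper();
String json = jsonMapper.writeValueAsString(response);
System.out.println(json);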

How to convert a CSV file to List<Map<String,String>>

I have a CSV file which has a header in the first line. I want to convert it to List<Map<String, String>>, where each Map<String, String> in the list represents a record in the file. The key of the map is the header and the value is the actual value of the field.
What I have so far:
BufferedReader br = <handle to file>;
// Get the headers to build the map.
String[] headers = br.lines().limit(1).collect(Collectors.toArray(size -> new String[size]));
Stream<String> recordStream = br.lines().skip(1);
What further operations can I perform on recordStream so that I can transform it to List<Map<String, String>>?
Sample CSV file is:
header1,header2,header3 ---- Header line
field11,field12,field13 ----> need to transform to Map where entry would be like header1:field11 header2:field12 and so on.
field21,field22,field23
field31,field32,field33
Finally all these Maps need to be collected to a List.
The following will work. The header line is retrieved by calling readLine directly on the BufferedReader and by splitting around ,. Then, the rest of the file is read: each line is split around , and mapped to a Map with the corresponding header.
try (BufferedReader br = new BufferedReader(...)) {
    String[] headers = br.readLine().split(",");
    List<Map<String, String>> records =
            br.lines().map(s -> s.split(","))
                    .map(t -> IntStream.range(0, t.length)
                            .boxed()
                            .collect(toMap(i -> headers[i], i -> t[i])))
                    .collect(toList());
    System.out.println(Arrays.toString(headers)); // printing the array directly would only show its type and hash
    System.out.println(records);
}
A very important note here is that BufferedReader.lines() does not return a fresh Stream when it is called: we must not skip 1 line after we read the header since the reader will already have advanced to the next line.
As a side note, I used a try-with-resources construct so that the BufferedReader can be properly closed.
I know this is a bit of an old question, but I ran into the same problem, and created a quick sample of the Commons CSV solution mentioned by Tagir Valeev:
Reader in = new FileReader("path/to/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in);
List<Map<String, String>> listOfMaps = new ArrayList<>();
for (CSVRecord record : records) {
    listOfMaps.add(record.toMap());
}
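A small variant of the same Commons CSV calls, just wrapped in try-with-resources so the Reader is closed:
List<Map<String, String>> listOfMaps = new ArrayList<>();
try (Reader in = new FileReader("path/to/file.csv")) {
    for (CSVRecord record : CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in)) {
        listOfMaps.add(record.toMap());
    }
}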

How to read from particular header in opencsv?

I have a CSV file and I want to extract a particular column from it. For example, say I have this CSV:
id1,caste1,salary,name1
63,Graham,101153.06,Abraham
103,Joseph,122451.02,Charlie
63,Webster,127965.91,Violet
76,Smith,156150.62,Eric
97,Moreno,55867.74,Mia
65,Reynolds,106918.14,Richard
How can I use opencsv to read only the data under the header caste1?
Magnilex and Sparky are right in that CSVReader does not support reading values by column name. But that being said there are two ways you can do this.
Given that you have the column names, and that the default CSVReader reads the header, you can search the header row for the position of the column and then use it from there on out:
private int getHeaderLocation(String[] headers, String columnName) {
    return Arrays.asList(headers).indexOf(columnName);
}
so your method would look like (leaving out a lot of error checks you will need to put in):
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
int columnPosition;
nextLine = reader.readNext();
columnPosition = getHeaderLocation(nextLine, "caste1"); // must match the header in the file exactly
while ((nextLine = reader.readNext()) != null && columnPosition > -1) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[columnPosition]);
}
I would only do the above if you were pressed for time and it was only one column you cared about. That is because openCSV can convert directly to an object that has the variables the same as the header column names using the CsvToBean class and the HeaderColumnNameMappingStrategy.
So first you would define a class that has the fields (and really you only need to put in the fields you want - extras are ignored and missing ones are null or default values).
public class CastleDTO {
    private int id1;
    private String caste1; // field names must match the header names
    private double salary;
    private String name1;
    // have all the getters and setters here....
}
Then your code would look like:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
HeaderColumnNameMappingStrategy<CastleDTO> castleStrategy = new HeaderColumnNameMappingStrategy<CastleDTO>();
castleStrategy.setType(CastleDTO.class); // tells the strategy which bean to map to
CsvToBean<CastleDTO> csvToBean = new CsvToBean<CastleDTO>();
List<CastleDTO> castleList = csvToBean.parse(castleStrategy, reader);
for (CastleDTO dto : castleList) {
    System.out.println(dto.getCaste1());
}
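For reference, recent opencsv versions (4.x and later) usually express this with CsvToBeanBuilder; a sketch under that assumption, reusing the same CastleDTO bean:
try (FileReader fileReader = new FileReader("yourfile.csv")) {
    List<CastleDTO> castleList = new CsvToBeanBuilder<CastleDTO>(fileReader)
            .withType(CastleDTO.class) // header names are mapped to fields automatically
            .build()
            .parse();
    for (CastleDTO dto : castleList) {
        System.out.println(dto.getCaste1());
    }
}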
There is no built in functionality in opencsv for reading from a column by name.
The official FAQ example has the following example on how to read from a file:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
You simply fetch the value in the second column for each row by accessing the row with nextLine[1] (remember, array indices are zero-based).
So, in your case you could simply read from the second line:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    System.out.println(nextLine[1]);
}
For a more sophisticated way of determining the column index from its header, refer to the answer from Scott Conway.
From the opencsv docs:
Starting with version 4.2, there’s another handy way of reading CSV files that doesn’t even require creating special classes. If your CSV file has headers, you can just initialize a CSVReaderHeaderAware and start reading the values out as a map:
reader = new CSVReaderHeaderAware(new FileReader("yourfile.csv"));
record = reader.readMap();
.readMap() will return a single record. You need to call .readMap() repeatedly to get all the records until you get null when it runs to the end (or to the first empty line), e.g.:
Map<String, String> values;
while ((values = reader.readMap()) != null) {
    // consume the values here
}
The class also has another constructor which allows more customization, e.g.:
CSVReaderHeaderAware reader = new CSVReaderHeaderAware(
new InputStreamReader(inputStream),
0, // skipLines
parser, // custom parser
false, // keep end of lines
true, // verify reader
0, // multiline limit
null // null for default locale
);
One downside I have found is that, since the reader is lazy, it does not offer a record count; if you need to know the total number of records (for example, to display correct progress information), you'll need another pass over the file just for counting lines.
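For example, one way to get the count up front (a sketch; it costs one extra pass over the file):
long totalRecords;
try (Stream<String> lines = Files.lines(Paths.get("yourfile.csv"))) {
    totalRecords = lines.count() - 1; // minus the header row
}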
You also have the CSVReaderHeaderAwareBuilder available.
I had a task to remove several columns from an existing CSV. Example CSV:
FirstName, LastName, City, County, Zip
Steve,Hopkins,London,Greater London,15554
James,Bond,Vilnius,Vilniaus,03250
I needed only the FirstName and LastName columns with their values, and it was very important that the order stay the same - the default rd.readMap() does not preserve the order. Code for this task:
String[] COLUMN_NAMES_TO_REMOVE = new String[]{"", "City", "County", "Zip"};
CSVReaderHeaderAware rd = new CSVReaderHeaderAware(new StringReader(oldCsv)); // oldCsv holds the contents of old.csv
CSVWriter writer = new CSVWriter(new FileWriter("new.csv"),
        CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER, CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END);
// let's get the private field
Field privateField = CSVReaderHeaderAware.class.getDeclaredField("headerIndex");
privateField.setAccessible(true);
Map<String, Integer> headerIndex = (Map<String, Integer>) privateField.get(rd);
// do ordering in natural order - 0, 1, 2 ... n
Map<String, Integer> sortedInNaturalOrder = headerIndex.entrySet().stream()
        .sorted(Map.Entry.comparingByValue(Comparator.naturalOrder()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (oldValue, newValue) -> oldValue, LinkedHashMap::new));
// let's get the headers in natural order
List<String> headers = sortedInNaturalOrder.keySet().stream().distinct().collect(Collectors.toList());
// let's remove the unwanted headers
List<String> removedColumns = new ArrayList<String>(Arrays.asList(COLUMN_NAMES_TO_REMOVE));
headers.removeAll(removedColumns);
// save the column names
writer.writeNext(headers.toArray(new String[headers.size()]));
List<String> keys = new ArrayList<>();
Map<String, String> values;
while ((values = rd.readMap()) != null) {
    for (String key : headers) {
        keys.add(values.get(key));
        if (keys.size() == headers.size()) {
            String[] itemsArray = new String[headers.size()];
            itemsArray = keys.toArray(itemsArray);
            // save the values
            writer.writeNext(itemsArray);
            keys.clear();
        }
    }
}
writer.flush();
Output:
FirstName, LastName
Steve,Hopkins
James,Bond
Looking at the javadoc:
If you create a CSVReader object, you can use the method .readAll to pull in the entire file. It returns a List of String[], with each String[] representing a line of the file. The lines are already split into tokens for you with the delimiters, and on each line you only want the second element, so:
public static void main(String[] args) {
    String data = "63,Graham,101153.06,Abraham";
    String[] result = data.split(",");
    System.out.print(result[1]);
}
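Putting that together with opencsv (a sketch; exception handling omitted, and in recent versions readAll may also throw CsvException):
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List<String[]> rows = reader.readAll(); // every line, already split into tokens
for (String[] row : rows.subList(1, rows.size())) { // skip the header row
    System.out.println(row[1]); // the second column, caste1
}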

read/write files in Java

How can I read a file easily in Java if I have following file format:
a|dip
a|dop
c|nap
a|dip
b|dop
b|sip
a|tang
c|dig
c|nap
I want to get all words that belong to "a", "b", and "c". What data structure can I use to read and store this information?
You can also suggest some good file formats (two-column) that are easy to read/write in Java.
I know some of you may be wondering what the real problem is that I want to solve: I have some complex employee-related data. The current (poor) system generates some files and I am trying to process them to add them to a database. The current files' format is a bit complex (and private), so I cannot paste it here.
If you can use Google Guava (http://code.google.com/p/guava-libraries/) then you'll get a few handy classes (you can use some or all of these):
com.google.common.io.Files
com.google.common.io.LineProcessor<T>
com.google.common.base.Charsets
com.google.common.collect.Multimap<K,V>
com.google.common.collect.ArrayListMultimap<K,V>
For example you could write:
LineProcessor<Multimap<String, String>> processor =
        new LineProcessor<Multimap<String, String>>() {
            Multimap<String, String> processed = ArrayListMultimap.create();

            public boolean processLine(String line) {
                String[] parts = line.split("\\|", 2); // the 2 keeps any | in the rest of the line
                processed.put(parts[0], parts[1]);
                return true; // keep going
            }

            public Multimap<String, String> getResult() {
                return processed;
            }
        };

Multimap<String, String> result = Files.readLines(
        new File("filename.txt"), Charsets.UTF_8, processor);
You can use Scanner to read the text file one line at a time and then you can use String.split("\\|") to separate the parts on that line. For storing the information, a Map<String,List<String>> might work.
I'd use this data structure:
Map<String, List<String>> map = new HashMap<String, List<String>>();
And parse the file like this:
File file = new File("words.txt");
Scanner scanner = new Scanner(file);
while (scanner.hasNext()) {
    String next = scanner.next();
    String[] parts = next.split("\\|");
    String group = parts[0];
    String word = parts[1];
    List<String> list = map.get(group);
    if (list == null) {
        list = new ArrayList<String>();
        map.put(group, list);
    }
    list.add(word);
}
So you could get the list of words for "a" like so:
for (String word : map.get("a")) {
    System.out.println(word);
}
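As a side note, with Java 8 the grouping step can be shortened with Map.computeIfAbsent; a variant of the parsing loop above with the same behavior:
while (scanner.hasNext()) {
    String[] parts = scanner.next().split("\\|");
    // creates the list the first time a group is seen, then appends the word
    map.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
}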

How to read and write a HashMap to a file?

I have the following HashMap:
HashMap<String,Object> fileObj = new HashMap<String,Object>();
ArrayList<String> cols = new ArrayList<String>();
cols.add("a");
cols.add("b");
cols.add("c");
fileObj.put("mylist",cols);
I write it to a file as follows:
File file = new File("temp");
FileOutputStream f = new FileOutputStream(file);
ObjectOutputStream s = new ObjectOutputStream(f);
s.writeObject(fileObj);
s.flush();
Now I want to read this file back to a HashMap where the Object is an ArrayList.
If I simply do:
File file = new File("temp");
FileInputStream f = new FileInputStream(file);
ObjectInputStream s = new ObjectInputStream(f);
fileObj = (HashMap<String,Object>)s.readObject();
s.close();
This does not give me the object in the format that I saved it in.
It returns a table with 15 null elements and the < mylist,[a,b,c] > pair at the 3rd element. I want it to return only one element with the values I had provided to it in the first place.
How can I read the same object back into a HashMap?
OK, based on Cem's note, this seems to be the correct explanation:
ObjectOutputStream serializes the objects (HashMap in this case) in whatever format that ObjectInputStream will understand to deserialize and does so generically for any Serializable object.
If you want it to serialize in the format that you desire you should write your own serializer/deserializer.
In my case: I simply iterate through each of those elements in the HashMap when I read the Object back from the file and get the data and do whatever I want with it. (it enters the loop only at the point where there is data).
Thanks,
You appear to be confusing the internal representation of a HashMap with how the HashMap behaves. The collections are the same. Here is a simple test to prove it to you:
public static void main(String... args)
        throws IOException, ClassNotFoundException {
    HashMap<String, Object> fileObj = new HashMap<String, Object>();
    ArrayList<String> cols = new ArrayList<String>();
    cols.add("a");
    cols.add("b");
    cols.add("c");
    fileObj.put("mylist", cols);

    {
        File file = new File("temp");
        FileOutputStream f = new FileOutputStream(file);
        ObjectOutputStream s = new ObjectOutputStream(f);
        s.writeObject(fileObj);
        s.close();
    }

    File file = new File("temp");
    FileInputStream f = new FileInputStream(file);
    ObjectInputStream s = new ObjectInputStream(f);
    HashMap<String, Object> fileObj2 = (HashMap<String, Object>) s.readObject();
    s.close();

    // Assert here is org.junit.Assert
    Assert.assertEquals(fileObj.hashCode(), fileObj2.hashCode());
    Assert.assertEquals(fileObj.toString(), fileObj2.toString());
    Assert.assertTrue(fileObj.equals(fileObj2));
}
I believe you're making a common mistake. You forgot to close the stream after using it!
File file = new File("temp");
FileOutputStream f = new FileOutputStream(file);
ObjectOutputStream s = new ObjectOutputStream(f);
s.writeObject(fileObj);
s.close();
You can also use a JSON file to read and write a Map object.
To write a map object into a JSON file:
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> map = new HashMap<String, Object>();
map.put("name", "Suson");
map.put("age", 26);
// write JSON to a file
mapper.writeValue(new File("c:\\myData.json"), map);
To read a map object from a JSON file:
ObjectMapper mapper = new ObjectMapper();
// read JSON from a file
Map<String, Object> map = mapper.readValue(
new File("c:\\myData.json"),
new TypeReference<Map<String, Object>>() {
});
System.out.println(map.get("name"));
System.out.println(map.get("age"));
Import ObjectMapper from com.fasterxml.jackson.databind and put the code in a try/catch block.
Your first line:
HashMap<String,Object> fileObj = new HashMap<String,Object>();
gave me pause, as the values are not guaranteed to be Serializable and thus may not be written out correctly. You should really define the object as a HashMap<String, Serializable> (or, if you prefer, simply Map<String, Serializable>).
I would also consider serializing the Map in a simple text format such as JSON since you are doing a simple String -> List<String> mapping.
I believe you're getting what you're saving. Have you inspected the map before you save it? In HashMap:
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 16;
e.g. the default HashMap will start off with 16 nulls. You use one of the buckets, so you only have 15 nulls left when you save, which is what you get when you load.
Try inspecting fileObj.keySet(), .entrySet() or .values() to see what you expect.
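For example, with the map from the question, inspecting the entry set shows the single logical entry rather than the 16 internal buckets:
System.out.println(fileObj.entrySet()); // prints [mylist=[a, b, c]]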
HashMaps are designed to be fast while trading off memory. See Wikipedia's Hash table entry for more details.
If you want to write the same data to a plain text file:
public void writeToFile(Map<String, List<String>> failureMessage) {
    if (file != null) {
        // try-with-resources closes the writer even if an exception is thrown
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file, true))) {
            for (Map.Entry<String, List<String>> entry : failureMessage.entrySet()) {
                writer.write(entry.getKey() + "\n");
                for (String message : entry.getValue()) {
                    writer.write(message + "\n");
                }
                writer.write("\n");
            }
        } catch (Exception e) {
            System.out.println("Unable to write to file: " + file.getPath());
            e.printStackTrace();
        }
    }
}
