How to collect CSV row as array of strings using simpleflatmapper - java

I'm trying to collect each CSV row as an array of strings using simpleflatmapper:
try (Reader in = Files.newBufferedReader(Paths.get("path"))) {
return org.simpleflatmapper.csv.CsvParser
// .mapTo(String[].class)
.stream(in)
// .parallel()
// .flatMap(Arrays::stream)
.map(line -> {return new ArrayList<>(Arrays.asList(line));})
// .map(Arrays::asList)
.collect(Collectors.toList());
} catch (Exception e) {
e.printStackTrace();
}
When I debug, line is a String[], but it holds the entire row as a single element instead of one string per cell. How can I get the array of cells?
The CSV file is nothing special, e.g.:
a\t b\t 1\t 2
x\t y\t 3\t 4
The issue, as I see it, is in .map(line -> {return new ArrayList<>(Arrays.asList(line));}): line contains a single string holding the whole line (with tabs, spaces, ...) instead of one string per cell.
The result I want is a List<List<String>> (a list of lines), where each line is a List<String> (a list of cells). The current result is a list of lines (rows) where each line/row is one whole string.

Since the file is plain CSV, you don't need an external library; you can read it like any text file and split each line yourself:
Scanner scanner=new Scanner(new File("MyFile.csv"));
while(scanner.hasNextLine()){
myArray=scanner.nextLine().split(",");
}
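For the tab-separated sample in the question, a JDK-only version of this idea might look like the sketch below. The names `TsvRows` and `readRows` are made up for illustration, and the plain `split` assumes that no cell contains a tab or quoted content; a real CSV parser handles those cases.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TsvRows {
    // Splits every line on tabs; the -1 limit keeps trailing empty cells.
    // The caller owns the reader and is responsible for closing it.
    static List<List<String>> readRows(BufferedReader reader) {
        return reader.lines()
                .map(line -> Arrays.asList(line.split("\t", -1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the file from the question
        try (BufferedReader reader = new BufferedReader(new StringReader("a\tb\t1\t2\nx\ty\t3\t4\n"))) {
            System.out.println(readRows(reader)); // prints [[a, b, 1, 2], [x, y, 3, 4]]
        }
    }
}
```

For a real file, the reader could come from `Files.newBufferedReader(Paths.get(...))` instead of the `StringReader`.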

I have found the solution:
return org.simpleflatmapper.csv.CsvParser
.separator('\t') //<-- solution
.stream(in)
.map(Arrays::asList)
.collect(Collectors.toList());
Thanks all!

Related

Java: Processing Stream line by line without forEach?

I am new to Java and trying out Streams for the first time.
I have a large input file where there is a string on each line like:
cart
dumpster
apple
cherry
tank
laptop
...
I'm trying to read the file in as a Stream and doing some analysis on the data. For example, to count all the occurrences of a particular string, I might think to do something like:
Stream<String> lines = Files.lines(Path.of("/path/to/input/file.txt"));
int count = 0;
lines.forEach((line) -> {
if (line.equals("tank")) {
count++;
}
});
But, Java doesn't allow mutation of variables within the lambda.
I'm not sure if there's another way to read from the stream line by line. How would I do this properly?
You don't need a variable external to the stream. For a really big file, long is the better type for the count, and it is what count() returns:
long tanks = lines
.filter(s -> s.equals("tank"))
.count();
To iterate a stream using a regular loop, you can get an iterator from your stream and use a for-loop:
Iterable<String> iterable = lines::iterator;
for (String line : iterable) {
if (line.equals("tank")) {
++count;
}
}
But in this particular case, you could just use the stream's count method:
int count = (int) lines.filter("tank"::equals).count();
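Putting the counting approach into a self-contained form, here is a minimal sketch. The class and method names (`LineCounter`, `count`, `countInFile`) are hypothetical. The file variant closes the stream with try-with-resources, because `Files.lines` keeps the file open until the stream is closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class LineCounter {
    // Counts the elements equal to target. A stream can only be consumed once,
    // so the caller must not reuse `lines` afterwards.
    static long count(Stream<String> lines, String target) {
        return lines.filter(target::equals).count();
    }

    // Same count for a file; try-with-resources closes the underlying file handle.
    static long countInFile(Path file, String target) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return count(lines, target);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(Stream.of("cart", "tank", "apple", "tank"), "tank")); // prints 2
        if (args.length > 0) {
            // Optional: pass a file path as the first argument
            System.out.println(countInFile(Paths.get(args[0]), "tank"));
        }
    }
}
```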
You can also read the file line by line as a stream and process each line in the pipeline:
try (Stream<String> lines = Files.lines(Path.of("/path/to/input/file.txt"))) {
list = lines
.filter(line -> !line.startsWith("abc"))
.map(String::toUpperCase)
.collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
}

How can I sort lines in a text file based on one of the comma-separated values?

Each line in my movies.txt file looks like this:
id,title,rating,year,genre (rating is an integer from 1 to 5)
1,The Godfather,5,1972,Drama
2,Pulp Fiction,4,1994,Crime
I want to list the movies sorted by their rating. I was able to sort the ratings but I don't know how to preserve the connection between ratings and lines and I couldn't sort the lines based on ratings.
BufferedReader b = new BufferedReader(new FileReader("movies.txt"));
String line = null;
int[] ratings = null;
int i;
try{
while((line = b.readLine()) != null)
{
String[] data = line.split(",");
int rating = Integer.parseInt(data[2]);
ratings[i] = rating;
i++;
}
b.close();
Arrays.sort(ratings);
}catch(IOException ex){
System.out.println("Error");
}
Is there any way I can do this by using arrays or something else, without creating a class and using a Movie object?
Instead of keeping only data[2], store the whole split line and sort by element [2]. Note that the comparator compares the ratings as strings, not as integers, which works here because the ratings are single digits from 1 to 5.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<String[]> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader b = new BufferedReader(new FileReader("movies.txt"))) {
            String line;
            while ((line = b.readLine()) != null) {
                // keep every field of the line, not just the rating
                lines.add(line.split(","));
            }
            lines.sort(new CustomComparator());
            lines.forEach(o -> System.out.println(Arrays.toString(o)));
        } catch (IOException ex) {
            System.out.println("Error");
        }
    }

    public static class CustomComparator implements Comparator<String[]> {
        @Override
        public int compare(String[] o1, String[] o2) {
            // compares the rating column as text; fine for single-digit ratings
            return o1[2].compareTo(o2[2]);
        }
    }
}
Create all lines in the list and sort with stream:
List<String> lines = new ArrayList<String>();
lines.add("1,The Godfather,5,1972,Drama");
lines.add("2,Pulp Fiction,4,1994,Crime");
lines.add("2,Pulp Fiction,33,1994,Crime");
lines.add("2,Pulp Fiction,1,1994,Crime");
List<String> collect = lines.stream().sorted(new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        // compare the third field (the rating) numerically
        return Integer.compare(Integer.parseInt(o1.split(",")[2]), Integer.parseInt(o2.split(",")[2]));
    }
}).collect(Collectors.toList());
Then write the sorted collection back to the file.
Use a CSV Parser to load the data into a List, then sort the list.
E.g. if using Apache Commons CSV, you can do it like this:
// Load data into memory
List<CSVRecord> records = new ArrayList<>();
try (Reader in = Files.newBufferedReader(Paths.get("movies.txt"));
CSVParser parser = CSVFormat.RFC4180.parse(in)) {
parser.forEach(records::add);
}
// Sort data
records.sort(Comparator.comparingInt(r -> Integer.parseInt(r.get(2))));
// Print result
try (CSVPrinter printer = CSVFormat.RFC4180.printer()) {
printer.printRecords(records);
}
Output
2,Pulp Fiction,4,1994,Crime
1,The Godfather,5,1972,Drama
If you just want to sort the lines by the 3rd element you should first read the lines, then sort them and write them back (that's what I assume you want to do). A naive approach would be to write a comparator that splits each line, parses the 3rd element to an int and compares the values, e.g. like this:
List<String> lines = ... //read, e.g. using Files.readAllLines(pathToFile)
Collections.sort(lines, Comparator.comparing( line -> {
String[] elements = line.split(","); //split
return Integer.parseInt(elements[2]); //parse and return
}));
This, however, is very inefficient so you might try and use a couple of optimizations:
split the lines into arrays when reading and join them when writing
sort "integer" strings with a little trick: sort by length and then lexically
Example:
List<String[]> splitLines = ... //read and split
Collections.sort(splitLines,
Comparator.comparing( (String[] e)-> e[2].length()) //help the compiler with the lambda parameter type, at least in my tests it couldn't infer the type otherwise
.thenComparing( e -> e[2] ));
splitLines.forEach( elements -> writeToFile(String.join(",", elements)));
This could even be done in a single stream:
Files.lines(pathToFile)
.map(line -> line.split(",")) //split each line
.sorted(Comparator.comparing( (String[] e)-> e[2].length()) //sort by length and then by natural order
.thenComparing( e -> e[2] ))
.map( elements -> String.join(",", elements) ) //join back to a single string
.forEach(line -> writeToFile(line)); //write to line
This is based on a couple of assumptions:
all lines have the same format
no title contains a comma or the split is able to handle escaped values
lines don't have leading or trailing whitespace
integers don't have leading zeros
How does the sorting "trick" work?
Basically it first sorts integer strings by order of magnitude. The higher the length the larger the number should be, i.e. "2" is shorter than "10" and thus smaller.
Within the same order of magnitude you'd then sort normally taking the order of digits into account. Thus "100" is smaller than "123" etc.
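To see the trick in isolation, here is a small sketch that sorts plain numeric strings by length and then lexically. `NumericStringOrder` and `sorted` are illustrative names, and the comparator is only valid under the assumptions listed above (non-negative integers, no leading zeros, no surrounding whitespace).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NumericStringOrder {
    // Shorter strings first, then lexical order within the same length.
    static final Comparator<String> BY_LENGTH_THEN_LEXICAL =
            Comparator.comparingInt(String::length)
                    .thenComparing(Comparator.<String>naturalOrder());

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(BY_LENGTH_THEN_LEXICAL);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("10", "2", "123", "100", "9")));
        // prints [2, 9, 10, 100, 123]
    }
}
```

A purely lexical sort of the same input would put "10" and "100" before "2", which is why the length comparison has to come first.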
Final notes:
It would still be better to actually convert lines into Movie elements, especially if you have more complex requirements or data.
Use a proper CSV parser instead of regex and basic string operations.
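As a rough illustration of the first note, converting each line into an object could look like the sketch below. The `Movie` class, its field names, and `sortByRating` are hypothetical, and the parsing is deliberately naive (it assumes exactly five fields and no commas inside the title), so a CSV parser would still be the safer choice for real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MovieSort {
    // Hypothetical value class for one line of movies.txt
    static final class Movie {
        final int id;
        final String title;
        final int rating;
        final int year;
        final String genre;

        Movie(int id, String title, int rating, int year, String genre) {
            this.id = id;
            this.title = title;
            this.rating = rating;
            this.year = year;
            this.genre = genre;
        }

        // Naive parse: id,title,rating,year,genre with no embedded commas
        static Movie parse(String line) {
            String[] f = line.split(",");
            return new Movie(Integer.parseInt(f[0]), f[1], Integer.parseInt(f[2]),
                    Integer.parseInt(f[3]), f[4]);
        }

        @Override
        public String toString() {
            return id + "," + title + "," + rating + "," + year + "," + genre;
        }
    }

    // Parses every line and returns the movies ordered by ascending rating.
    static List<Movie> sortByRating(List<String> lines) {
        List<Movie> movies = new ArrayList<>();
        for (String line : lines) {
            movies.add(Movie.parse(line));
        }
        movies.sort(Comparator.comparingInt((Movie m) -> m.rating));
        return movies;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,The Godfather,5,1972,Drama",
                "2,Pulp Fiction,4,1994,Crime");
        sortByRating(lines).forEach(System.out::println);
    }
}
```

With typed fields, sorting by another column (for example year) only needs a different comparator, not another string trick.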

Java File Parsing - Go word by word

I have a file content as Follows:
Sample.txt:
Hi my name is john
and I am an engineer. How are you
The output I want is an ArrayList of strings like [Hi,my,name,is,john,and,I,am,an,engineer,.,How,are,you]
The standard Java functions read the file line by line, so I would get an array containing the lines. I am not sure which approach I should use to get the output above.
Any help is appreciated.
.nextLine() will get one whole line but .next() will go word by word
You could check out using the Scanner class with the .next() method.
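A minimal sketch of the Scanner approach, reading from an in-memory string so it runs without a file. `WordReader` and `words` are made-up names. Note that `next()` splits on whitespace only, so punctuation stays attached to the word ("engineer." rather than "engineer" and "."); splitting punctuation into separate tokens needs extra handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordReader {
    // Collects whitespace-delimited tokens from any Readable source.
    static List<String> words(Readable source) {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "Hi my name is john\nand I am an engineer. How are you";
        System.out.println(words(new StringReader(sample)));
        // prints [Hi, my, name, is, john, and, I, am, an, engineer., How, are, you]
    }
}
```

For a file, `new Scanner(new File("Sample.txt"))` can be used in the same loop; that constructor throws `FileNotFoundException`, which then has to be handled.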
This will read the file and collect all words into a list of strings.
Edit: Updated so as to handle punctuation and the likes as distinct words:
try {
List<String> words = Files.lines(Paths.get("/path/to/sample.txt"))
.map(line -> line.split("\\b"))
.flatMap(Arrays::stream)
.filter(w -> !w.trim().isEmpty())
.collect(Collectors.toList());
return words;
} catch (IOException e) {
// handle error
}
If you are getting the strings as whole lines but just want the words, you could call .split(" ") on each line, which returns an array of the individual words without the spaces. If you want to do this while reading the file, you could use something like the following...
public ArrayList<String> readWords(File file) throws IOException {
ArrayList<String> words = new ArrayList<String>();
String cLine = "";
BufferedReader reader = new BufferedReader(new FileReader(file));
while ((cLine = reader.readLine()) != null) {
for (String word : cLine.split(" ")) {words.add(word);}
}
reader.close();
return words;
}
which would return an ArrayList<String> containing all of the individual words in the file.
Hope this helps.

Grouping of words from a text file to Arraylist on the basis of length

public class JavaApplication13 {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
BufferedReader br;
String strLine;
ArrayList<String> arr =new ArrayList<>();
HashMap<Integer,ArrayList<String>> hm = new HashMap<>();
try {
br = new BufferedReader( new FileReader("words.txt"));
while( (strLine = br.readLine()) != null){
arr.add(strLine);
}
} catch (FileNotFoundException e) {
System.err.println("Unable to find the file: fileName");
} catch (IOException e) {
System.err.println("Unable to read the file: fileName");
}
ArrayList<Integer> lengths = new ArrayList<>(); //List to keep lengths information
System.out.println("Total Words: "+arr.size()); //Total words read from file
int i=0;
while(i<arr.size()) //this loop iterates over all the words of the text file, which are now stored in arr
{
boolean already=false;
String s = arr.get(i);
//following for loop will check if that length is already in lengths list.
for(int x=0;x<lengths.size();x++)
{
if(s.length()==lengths.get(x))
already=true;
}
//already == true means the map already has an ArrayList for the current string length
if(already==true)
{
hm.get(s.length()).add(s); //adding that string according to its length in hm(hashmap)
}
else
{
hm.put(s.length(),new ArrayList<>()); //create a new element in hm and the adding the new length string
hm.get(s.length()).add(s);
lengths.add(s.length());
}
i++;
}
//Now Print the whole map
for(int q=0;q<hm.size();q++)
{
System.out.println(hm.get(q));
}
}
}
Is this approach right?
Explanation:
Load all the words into an ArrayList.
Then iterate over each index, check the length of the word, and add it to the ArrayList of strings of that length; these ArrayLists are stored in a HashMap keyed by word length.
Firstly, your code only works for files that contain one word per line, because you're processing whole lines as words. To make it more general, split each line into words:
String[] words = strLine.split("\\s+");
Secondly, you don't need any temporary data structures. You can add your words to the map right after you read the line from file. arr and lengths lists are actually useless here as they do not contain any logic except temporary storing. You're using lengths list just to store the lengths which has already been added to the hm map. The same can be reached by invoking hm.containsKey(s.length()).
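The two points above can be combined into a short sketch that fills the map while reading, without the arr and lengths lists. It uses `Map.computeIfAbsent` (Java 8+) in place of an explicit `containsKey` check; that is a substitution on my part with the same effect. `GroupByLength` and `group` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupByLength {
    // Splits each line into words and files every word under its length.
    static Map<Integer, List<String>> group(Iterable<String> lines) {
        Map<Integer, List<String>> byLength = new HashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields one empty token
                }
                // creates the list on first use of this length, then appends
                byLength.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
            }
        }
        return byLength;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> grouped = group(Arrays.asList("a ab", "cd  xyz", ""));
        // iterate over the entries rather than over indexes 0..size-1,
        // because the keys are word lengths, not consecutive positions
        for (Map.Entry<Integer, List<String>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The lines could just as well come straight from `BufferedReader.readLine()` in the original loop; the `Iterable` parameter only keeps the example runnable without a file.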
And an additional comment on your code:
for(int x=0;x<lengths.size();x++) {
if(s.length()==lengths.get(x))
already=true;
}
When a loop like this only needs to find out whether some condition holds for any element, there is no need to keep looping once the condition has been met. Put a break inside the if block to terminate the loop, e.g.
for(int x=0;x<lengths.size();x++) {
    if(s.length()==lengths.get(x)) {
        already=true;
        break; // terminates the loop after setting the flag to true
    }
}
But as I already mentioned you don't need it at all. That is just for educational purposes.
Your approach is long, confusing, hard to debug and from what I see it's not good performance-wise (check out the contains method). Check this:
String[] words = {"a", "ab", "ad", "abc", "af", "b", "dsadsa", "c", "ghh", "po"};
Map<Integer, List<String>> groupByLength =
Arrays.stream(words).collect(Collectors.groupingBy(String::length));
System.out.println(groupByLength);
This is just an example, but you get the point. I have an array of words, and then I use streams and Java8 magic to group them in a map by length (exactly what you're trying to do). You get the stream, then collect it to a map, grouping by length of the words, so it's gonna put every 1 letter word in a list under key 1 etc.
You can use the same approach, but you have your words in a list so remember to not use Arrays.stream() but just .stream() on your list.

How to only get the lines you want from an arraylist depending on how they start, IN JAVA

I have a very long string containing GPS data but this is not important. What I need to do is separate the string which is in an arraylist (one big string) into multiple pieces.
The tricky part is that the string is made up of multiple 'gps sentences' and I only require two types of these sentences.
The types I need start with $GPGSV and $GPGGA. Basically I need to dump ONLY THESE sentences into another arraylist while leaving all the rest behind.
The new arraylist must be in line-by-line form so that each sentence is followed by a new line.
Each sentence also ends in one white space which could be helpful when splitting up. The arraylist data is shown below. - This is printed from the arraylist.
[$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A,
$GPRMC,151018.000,A,5225.9627,N,00401.1624,W,0.11,104.71,210214,,*14,
$GPGGA,151019.000,5225.9627,N,00401.1624,W,1,09,1.0,38.9,M,51.1,M,,0000*72,
$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A,
$GPGSV,3,1,12,26,80,302,44,09,55,063,40,05,53,191,39,08,51,059,37*79,
$GPGSV,3,2,12,28,43,112,34,15,40,284,42,21,18,305,33,07,18,057,27*7E,
$GPGSV,3,3,12,10,05,153,,24,05,234,38,18,05,318,22,19,05,035,*79,
$GPRMC,151019.000,A,5225.9627,N,00401.1624,W,0.10,105.97,210214,,*1D,
$GPGGA,151020.000,5225.9627,N,00401.1624,W,1,09,1.0,38.9,M,51.1,M,,0000*78,
$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A,
$GPRMC,151020.000,A,5225.9627,N,00401.1624,W,0.12,105.18,210214,,*12,
$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A,
$GPRMC,151021.000,A,5225.9626,N,00401.1624,W,0.11,99.26,210214,,*28,
$GPGGA,151022.000,5225.9626,N,00401.1623,W,1,09,1.0,38.9,M,51.1,M,,0000*7C,
$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A,
$GPRMC,151022.000,A,5225.9626,N,00401.1623,W,0.11,109.69,210214,,*1F,
The data continues up to 2000 sentences.
Any help would be great. Thanks
EDITS ------
Looking back at what I have.. It may be best if I just read in the lines (as the file is formatted to be one sentence per line) which start with either the GSV or the GGA tag. In the buffered reader section of the method, how could I go about doing that? Here is some of my code ....
try {
File gpsioFile = new File(gpsFile);
FileReader file = new FileReader(gpsFile);
BufferedReader buffer = new BufferedReader(file);
StringBuffer stringbuff = new StringBuffer();
String ans;
while ((ans = buffer.readLine()) != null) {
gps.add(ans);
stringbuff.append(ans);
stringbuff.append("\n");
}
} catch (Exception e) {
e.printStackTrace();
}
From this could I get an Arraylist with just the GGA and GSV sentences/lines but in the same order that they were from the file?
Thanks
OK, I'd first start by splitting your string into individual lines with split():
String[] split = "$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A,".split(",");
You can also use "\n" as the split delimiter instead of ",". This will give you an array over which you can iterate.
List<String> filtered = new ArrayList<String>();
for (String item : split) {
if (item.startsWith("$GPGSA")) {
filtered.add(item);
}
}
filtered is then a new list containing only the items you want to keep.
This approach works with JDK 6+. In JDK 8, this kind of problem can be solved more elegantly with the stream API.
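For reference, a stream-based version might look like the sketch below. It assumes the input is already a list with one sentence per element, and it uses the tag spellings from the sample data ($GPGGA, $GPGSV); `SentenceFilter` and `keep` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SentenceFilter {
    // Keeps the sentences that start with any of the given prefixes, in their original order.
    static List<String> keep(List<String> sentences, String... prefixes) {
        return sentences.stream()
                .filter(s -> Arrays.stream(prefixes).anyMatch(p -> s.startsWith(p)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "$GPGSA,A,3,28,09", "$GPGGA,151019.000", "$GPRMC,151019.000", "$GPGSV,3,1,12");
        keep(sentences, "$GPGSV", "$GPGGA").forEach(System.out::println);
    }
}
```

Because the stream is sequential and ordered, the kept sentences come out in the same order as in the input list.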
My understanding is that you've got an ArrayList with a single String element. That String is a comma-separated list of values. So step one is to extract the string and split it into its constituent parts. Once you've done that you can process each item in turn.
private static List<List<String>> splitData(final ArrayList<String> data) {
final List<List<String>> filteredData = new ArrayList<List<String>>();
String fullText = data.get(0);
String[] splitData = fullText.split(",");
List<String> currentList = null;
for (int i = 0;i < splitData.length; i++) {
final String next = splitData[i];
if (startTags.contains(next)) {
if (interestingStartTags.contains(next)) {
currentList = new ArrayList<String>();
filteredData.add(currentList);
} else {
currentList = null;
}
}
if (currentList != null) {
currentList.add(next);
}
}
return filteredData;
}
The two static Set<String> provide the set of all 'gps sentence' start tags and also the set of ones you're interested in. The split data method uses startTags to determine if it has reached the start of a new sentence. If the new tag is also interesting, then a new list is created and added to the List<List<String>>. It is this list of lists that is returned.
If you don't know all of the strings you want to use as 'startTag' then you could next.startsWith("$GP") or similar.
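The two sets are not shown in the answer, so here is one way they could be defined. The contents are an assumption based only on the sentence types visible in the sample data; a real GPS feed may emit more types. `GpsTags` and `looksLikeStartTag` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class GpsTags {
    // Every sentence type that appears in the sample data (assumed, not exhaustive)
    static final Set<String> startTags = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("$GPGSA", "$GPRMC", "$GPGGA", "$GPGSV")));

    // The two sentence types that should be kept
    static final Set<String> interestingStartTags = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("$GPGGA", "$GPGSV")));

    // Fallback when the full tag list is unknown: treat any "$GP..." field as a start tag
    static boolean looksLikeStartTag(String field) {
        return field.startsWith("$GP");
    }

    public static void main(String[] args) {
        System.out.println(startTags.contains("$GPRMC"));            // prints true
        System.out.println(interestingStartTags.contains("$GPRMC")); // prints false
        System.out.println(looksLikeStartTag("151019.000"));         // prints false
    }
}
```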
Reading the file
Looking at the updated question of how to read the file you could remove the StringBuffer and instead simply add each line you read to an ArrayList. The code below will step over any lines that do not start with the two tags you are interested in. The order of the lines within lineList will match the order they are found in the file.
FileReader file = new FileReader(gpsFile);
BufferedReader buffer = new BufferedReader(file);
String ans;
ArrayList<String> lineList = new ArrayList<String>();
while ((ans = buffer.readLine()) != null) {
if (ans.startsWith("$GPGSV")||ans.startsWith("$GPGGA")) { // tag spellings as they appear in the sample data
lineList.add(ans);
}
}
