Process string using Java streams - java

I need some guidance. I am not sure how to read the sample text file into an array of objects using Java streams. Do streams provide a way to get the position of each character in a string read from the file?
I am currently reading the file with Java I/O and passing its content as a string to this function, which creates an array of Squares....
Can this array of objects be created with Java 8 streams? If so, how? Thank you.

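Using Java streams you could do something like this (the shared counters rely on the lines being processed sequentially and in order, which is the default for the stream returned by Files.lines):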
AtomicInteger row = new AtomicInteger(-1);
// count specific characters with this:
AtomicInteger someCount = new AtomicInteger();
try (Stream<String> stringStream = Files.lines(Paths.get("yourFile.txt"))) { // read all lines from file into a stream of strings
// This Function makes an array of Square objects of each line
Function<String, Square[]> mapper = (s) -> {
AtomicInteger col = new AtomicInteger();
row.incrementAndGet();
return s.chars()
.mapToObj(i -> {
// increment counter if the char fulfills condition
if((char)i == 'M')
someCount.incrementAndGet();
return new Square(row.get(), col.getAndIncrement(), (char)i);
})
.toArray(Square[]::new);
};
// Now stream all lines through the mapper function from above and collect the resulting Square[] rows directly into a Square[][] array
Square[][] squares = stringStream
.map(mapper)
.toArray(Square[][]::new);
}
Answering your second question: If you have an array of Square[] and want to find the first Square with val == 'M' you could do this:
Optional<Square> optSquare = Stream.of(squares).flatMap(Stream::of).filter(s -> s.getVal() == 'M').findFirst();
// mySquare will be null if no Square was matching condition
Square mySquare = optSquare.orElse(null);
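The row and column positions can also be taken from index streams instead of shared counters. The following is only a sketch under assumptions: the Square class here is a hypothetical stand-in for the one in the question (which is not shown), and the lines are assumed to be already loaded into a List<String>, for example with Files.readAllLines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class SquareGridSketch {

    // Hypothetical stand-in for the Square class from the question.
    static final class Square {
        final int row;
        final int col;
        final char val;

        Square(int row, int col, char val) {
            this.row = row;
            this.col = col;
            this.val = val;
        }
    }

    // Builds one Square per character. Row and column come from the index
    // streams, so no shared mutable counter is needed.
    static Square[][] toGrid(List<String> lines) {
        return IntStream.range(0, lines.size())
                .mapToObj(r -> {
                    String line = lines.get(r);
                    return IntStream.range(0, line.length())
                            .mapToObj(c -> new Square(r, c, line.charAt(c)))
                            .toArray(Square[]::new);
                })
                .toArray(Square[][]::new);
    }

    // Counts the squares that hold the given character.
    static long count(Square[][] grid, char wanted) {
        return Arrays.stream(grid)
                .flatMap(Arrays::stream)
                .filter(s -> s.val == wanted)
                .count();
    }

    public static void main(String[] args) {
        Square[][] grid = toGrid(Arrays.asList("AMB", "CM"));
        Square s = grid[1][1];
        System.out.println(s.val + " at row " + s.row + ", col " + s.col);
        System.out.println("Number of M squares: " + count(grid, 'M'));
    }
}
```

Because nothing is mutated from inside the lambdas, this variant does not depend on the order in which the lines are processed.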

Related

Java: Processing Stream line by line without forEach?

I am new to Java and trying out Streams for the first time.
I have a large input file where there is a string on each line like:
cart
dumpster
apple
cherry
tank
laptop
...
I'm trying to read the file in as a Stream and doing some analysis on the data. For example, to count all the occurrences of a particular string, I might think to do something like:
Stream<String> lines = Files.lines(Path.of("/path/to/input/file.txt"));
int count = 0;
lines.forEach((line) -> {
if (line.equals("tank")) {
count++;
}
});
But Java doesn't allow a lambda to modify a local variable: any local variable it captures must be effectively final.
I'm not sure if there's another way to read from the stream line by line. How would I do this properly?
You don't need a variable external to the stream. Also, count() returns a long, which is the better choice if the file is really big:
long tanks = lines
.filter(s -> s.equals("tank"))
.count();
To iterate a stream using a regular loop, you can get an iterator from your stream and use a for-loop:
Iterable<String> iterable = lines::iterator;
for (String line : iterable) {
if (line.equals("tank")) {
++count;
}
}
But in this particular case, you could just use the stream's count method:
int count = (int) lines.filter("tank"::equals).count();
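If the analysis needs the count of every distinct line rather than one particular string, the same idea extends to a grouping collector. This is a minimal sketch, not part of the original answers; the class and method names are made up for illustration, and the method takes any Stream<String>, such as the one returned by Files.lines. Keep in mind that a stream can be traversed only once, so each of the snippets above needs a fresh stream.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LineCountSketch {

    // Counts how often each distinct line occurs in the stream.
    static Map<String, Long> frequencies(Stream<String> lines) {
        return lines.collect(
                Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                frequencies(Stream.of("cart", "tank", "apple", "tank"));
        // Lines that never occur are simply absent from the map.
        System.out.println("tank: " + counts.getOrDefault("tank", 0L));
        System.out.println("laptop: " + counts.getOrDefault("laptop", 0L));
    }
}
```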
You can read the file line by line and process each line in the stream:
try (Stream<String> lines = Files.lines(Path.of("/path/to/input/file.txt"))) {
list = lines
.filter(line -> !line.startsWith("abc"))
.map(String::toUpperCase)
.collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
}

Read specific columns from a file in Java 8 using streams, and put them in a 2D array

I have an input file that looks like this
#id1 1.2 3.4
#id2 6.8 8.1
#id3 1.5 9.4
#id4 5.9 2.7
I would like to store only the numbers in a 2D array and drop the first column, which contains the #id.
I also want to use only streams for that operation.
So far I have written two methods.
The first method reads the input file and stores each line in a List, as an array of strings:
private List<String[]> savefromfile(String filePath) throws IOException {
List<String[]> rowsOfFile = new LinkedList<String[]>();
try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
lines.forEach(line -> {
String rows[] = line.trim().split("\\s+");
rowsOfFile.add(rows);
});
lines.close();
}
return rowsOfFile;
}
The second method takes the List as input and returns a 2D array that contains only the number columns:
private double[][] storeAllID(List<String[]> rowsOfFile) {
int numberOfID = rowsOfFile.size();
double[][] allID = new double[numberOfID][2];
int i = 0;
for (String[] line : rowsOfFile) {
double id[] = new double[2];
id[0] = Double.parseDouble(line[1]);
id[1] = Double.parseDouble(line[2]);
allID[i++] = id;
}
return allID;
}
Is there a way to make this code more efficient? I want a single, short method that reads the input file and returns a 2D array containing only the numbers.
I don't think it should take 20 lines of code to do this.
You aren't really gaining any benefit on your use of a stream in savefromfile, since you are using it exactly like it was a plain for-loop. To make the code a bit cleaner, you could get rid of the local variable completely, and also the call to close() is unnecessary as you are using try-with-resources already.
private List<String[]> savefromfile(String filePath) throws IOException {
try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
return lines
.map(line -> line.trim().split("\\s+"))
.collect(Collectors.toCollection(LinkedList::new));
}
}
I don't know why you want to separate the parsing to double[][] into a separate method, as you could do it within your stream with a map:
private double[][] loadFromFile(String filePath) throws IOException {
try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
return lines
.map(line -> line.trim().split("\\s+"))
.map(line -> new double[] {
Double.parseDouble(line[1]),
Double.parseDouble(line[2])
})
.toArray(double[][]::new);
}
}
For performance, you'll just have to measure for yourself if using lower-level data types and loops would be worth the added complexity.
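If the number of numeric columns is not fixed at two, the per-line mapping can skip the first token and parse whatever follows. This is only a sketch under assumptions: the class and method names are hypothetical, every token after the #id is assumed to be a valid double, and blank lines are skipped.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ColumnParseSketch {

    // Parses every whitespace-separated token after the first one (the #id)
    // as a double.
    static double[] parseRow(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                .skip(1)
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    // Skips blank lines, then converts each remaining line into one row.
    static double[][] parse(Stream<String> lines) {
        return lines
                .filter(line -> !line.trim().isEmpty())
                .map(ColumnParseSketch::parseRow)
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        double[][] rows = parse(Stream.of("#id1 1.2 3.4", "#id2 6.8 8.1"));
        System.out.println(Arrays.deepToString(rows));
    }
}
```

Passing the stream in as a parameter keeps the parsing separate from file handling; the caller would still open Files.lines inside try-with-resources as shown above.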

Java - Basic streams usage of forEach

I have a class called Data which has only one method:
public boolean isValid()
I have a List of Data and I want to loop through them via a Java 8 stream. I need to count how many valid Data objects there are in this List and print out only the valid entries.
Below is how far I've gotten, but I don't understand how to finish it.
List<Data> ar = new ArrayList<>();
...
// ar is now full of Data objects.
...
int count = ar.stream()
.filter(Data::isValid)
.forEach(System.out::println)
.count(); // Compiler error, forEach() does not return type stream.
My second attempt: (horrible code)
List<Data> ar = new ArrayList<>();
...
// Must be final or compiler error will happen via inner class.
final AtomicInteger counter = new AtomicInteger();
ar.stream()
.filter(Data::isValid)
.forEach(d ->
{
System.out.println(d);
counter.incrementAndGet();
});
System.out.printf("There are %d/%d valid Data objects.%n", counter.get(), ar.size());
If you don’t need the original ArrayList (with its mixture of valid and invalid objects) later on, you can simply perform a Collection operation instead of a Stream operation:
ar.removeIf(d -> !d.isValid());
ar.forEach(System.out::println);
int count = ar.size();
Otherwise, you can implement it like
List<Data> valid = ar.stream().filter(Data::isValid).collect(Collectors.toList());
valid.forEach(System.out::println);
int count = valid.size();
Having storage for something you need multiple times is not so bad. If the list is really large, you can reduce the storage memory by a factor of (typically) 32, using
BitSet valid = IntStream.range(0, ar.size())
.filter(index -> ar.get(index).isValid())
.collect(BitSet::new, BitSet::set, BitSet::or);
valid.stream().mapToObj(ar::get).forEach(System.out::println);
int count = valid.cardinality();
Though, of course, you can also use
int count = 0;
for(Data d: ar) {
if(d.isValid()) {
System.out.println(d);
count++;
}
}
peek is similar to forEach, except that it is an intermediate operation, so the stream can continue afterwards:
long count = ar.stream().filter(Data::isValid)
.peek(System.out::println)
.count();
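One caveat with this approach: the Javadoc describes peek as existing mainly to support debugging, and since Java 9 count() may skip the pipeline entirely when it can compute the size directly from the source. With a filter in front, the size is not known in advance, so the elements are traversed and peek does run. The sketch below illustrates this; the Data class is a hypothetical stand-in for the one in the question, and the seen list replaces printing so the effect can be checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PeekCountSketch {

    // Hypothetical stand-in for the Data class from the question.
    static final class Data {
        private final boolean valid;

        Data(boolean valid) {
            this.valid = valid;
        }

        boolean isValid() {
            return valid;
        }
    }

    // Returns the number of valid entries and records each one in 'seen'
    // as it passes through peek.
    static long countValid(List<Data> all, List<Data> seen) {
        return all.stream()
                .filter(Data::isValid)
                .peek(seen::add)
                .count();
    }

    public static void main(String[] args) {
        List<Data> seen = new ArrayList<>();
        long count = countValid(
                Arrays.asList(new Data(true), new Data(false), new Data(true)), seen);
        System.out.println(count + " valid, " + seen.size() + " seen by peek");
    }
}
```

If the filter were removed, whether peek runs would depend on the Java version and the stream source, which is a reason to prefer the collect-then-print variants above when the side effect matters.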

filtering null values from an RDD<Vector> spark

I have a dataset of doubles in the form of a JavaRDD. I want to remove the rows (vectors) containing null values. I was going to use the filter function to do that, but I cannot figure out how to do it. I am pretty new to Spark and MLlib and would really appreciate your help. This is what my parsed data looks like:
String path = "data.txt";
JavaRDD<String> data = sc.textFile(path);
JavaRDD<Vector> parsedData = data.map(
new Function<String, Vector>() {
public Vector call(String s) {
String[] sarray = s.split(" ");
double[] values = new double[sarray.length];
for (int i = 0; i < sarray.length; i++)
values[i] = Double.parseDouble(sarray[i]);
return Vectors.dense(values);
}
}
);
Would checking each vector[i] element against null be enough?
You could then perform an operation similar to vector.remove(n), where n is the element to be removed from the vector.
Vector values = Vectors.dense(new double[vector_length]);
parsedData = parsedData.filter((Vector s) -> {
return !s.equals(Vectors.dense(new double[vector_length]));
});
As mentioned in the comments, an RDD vector can't be null. However, you might want to get rid of empty (all-zero) vectors using the filter method. This can be done by creating an all-zero vector and filtering out every vector that equals it.

How can I use Java 8 Streams with an InputStream?

I would like to wrap a java.util.stream.Stream around an InputStream to process one Byte or one Character at a time. I didn't find any simple way of doing this.
Consider the following exercise: We wish to count the number of times each letter appears in a text file. We can store this in an array so that tally[0] will store the number of times a appears in the file, tally[1] stores the number of time b appears and so on. Since I couldn't find a way of streaming the file directly, I did this:
int[] tally = new int[26];
Stream<String> lines = Files.lines(Paths.get(aFile)).map(s -> s.toLowerCase());
Consumer<String> charCount = new Consumer<String>() {
public void accept(String t) {
for(int i=0; i<t.length(); i++)
if(Character.isLetter(t.charAt(i)))
tally[t.charAt(i) - 'a']++;
}
};
lines.forEach(charCount);
Is there a way of accomplishing this without using the lines method? Can I just process each character directly as a Stream<Byte> or Stream<Character> instead of creating Strings for each line in the text file?
Can I more directly convert a java.io.InputStream into a java.util.stream.Stream?
First, you have to redefine your task. You are reading characters, hence you do not want to convert an InputStream but a Reader into a Stream.
You can’t re-implement the charset conversion that happens, e.g. in an InputStreamReader, with Stream operations as there can be n:m mappings between the bytes of the InputStream and the resulting chars.
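The n:m relationship between bytes and chars is easy to see with UTF-8. The sketch below is only an illustration, with a made-up class name: it compares the number of Java chars in a string with the number of bytes the same text occupies once encoded.

```java
import java.nio.charset.StandardCharsets;

class ByteCharSketch {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // One char each, but one, two and three bytes respectively.
        String[] samples = {"a", "\u00e9", "\u20ac"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s), " + utf8Bytes(s) + " byte(s)");
        }
        // A code point outside the BMP: two chars (a surrogate pair), four bytes.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length() + " char(s), " + utf8Bytes(emoji) + " byte(s)");
    }
}
```

A byte-level stream operation would therefore have to carry decoder state from one element to the next, which is the work an InputStreamReader already does.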
Creating a stream out of a Reader is a bit tricky. You will need an iterator to specify a method for getting an item and an end condition:
PrimitiveIterator.OfInt it=new PrimitiveIterator.OfInt() {
int last=-2;
public int nextInt() {
if(last==-2 && !hasNext())
throw new NoSuchElementException();
try { return last; } finally { last=-2; }
}
public boolean hasNext() {
if(last==-2)
try { last=reader.read(); }
catch(IOException ex) { throw new UncheckedIOException(ex); }
return last>=0;
}
};
Once you have the iterator you can create a stream using the detour of a spliterator and perform your desired operation:
int[] tally = new int[26];
StreamSupport.intStream(Spliterators.spliteratorUnknownSize(
it, Spliterator.ORDERED | Spliterator.IMMUTABLE | Spliterator.NONNULL), false)
// now you have your stream and you can operate on it:
.map(Character::toLowerCase)
.filter(c -> c>='a'&&c<='z')
.map(c -> c-'a')
.forEach(i -> tally[i]++);
Note that while iterators are more familiar, implementing the new Spliterator interface directly simplifies the operation, as it doesn’t require maintaining state between two methods that could be called in arbitrary order. Instead, we have just one tryAdvance method, which can be mapped directly to a read() call:
Spliterator.OfInt sp = new Spliterators.AbstractIntSpliterator(1000L,
Spliterator.ORDERED | Spliterator.IMMUTABLE | Spliterator.NONNULL) {
public boolean tryAdvance(IntConsumer action) {
int ch;
try { ch=reader.read(); }
catch(IOException ex) { throw new UncheckedIOException(ex); }
if(ch<0) return false;
action.accept(ch);
return true;
}
};
StreamSupport.intStream(sp, false)
// now you have your stream and you can operate on it:
…
However, note that if you change your mind and are willing to use Files.lines you can have a much easier life:
int[] tally = new int[26];
Files.lines(Paths.get(file))
.flatMapToInt(CharSequence::chars)
.map(Character::toLowerCase)
.filter(c -> c>='a'&&c<='z')
.map(c -> c-'a')
.forEach(i -> tally[i]++);
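For completeness, the Reader-based approach can be packaged as a small helper and combined with collect, so that the tally array is built by the stream rather than mutated from forEach. This is a sketch under assumptions rather than a definitive implementation: the class and method names are hypothetical, the size estimate is Long.MAX_VALUE (the documented value for an unknown size), and only the ORDERED and NONNULL characteristics are declared.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class ReaderTallySketch {

    // Wraps a Reader as a sequential IntStream of chars, one read() per element.
    static IntStream chars(Reader reader) {
        Spliterator.OfInt sp = new Spliterators.AbstractIntSpliterator(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(IntConsumer action) {
                int ch;
                try {
                    ch = reader.read();
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
                if (ch < 0) {
                    return false; // end of input
                }
                action.accept(ch);
                return true;
            }
        };
        return StreamSupport.intStream(sp, false);
    }

    // Tallies the letters a-z, case-insensitively, into a 26-slot array.
    static int[] tally(Reader reader) {
        return chars(reader)
                .map(Character::toLowerCase)
                .filter(c -> c >= 'a' && c <= 'z')
                .map(c -> c - 'a')
                .collect(() -> new int[26],
                        (counts, i) -> counts[i]++,
                        (left, right) -> {
                            // Combiner: only used if the stream runs in parallel.
                            for (int k = 0; k < left.length; k++) {
                                left[k] += right[k];
                            }
                        });
    }

    public static void main(String[] args) {
        int[] counts = tally(new StringReader("Banana, Abc!"));
        System.out.println("a=" + counts[0] + ", b=" + counts[1] + ", n=" + counts[13]);
    }
}
```

The caller remains responsible for closing the Reader, for example by opening it in try-with-resources around the call to tally.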
