How can I use Java 8 Streams with an InputStream? - java

I would like to wrap a java.util.stream.Stream around an InputStream to process one Byte or one Character at a time. I didn't find any simple way of doing this.
Consider the following exercise: We wish to count the number of times each letter appears in a text file. We can store this in an array so that tally[0] will store the number of times a appears in the file, tally[1] stores the number of times b appears and so on. Since I couldn't find a way of streaming the file directly, I did this:
int[] tally = new int[26];
Stream<String> lines = Files.lines(Paths.get(aFile)).map(s -> s.toLowerCase());
Consumer<String> charCount = new Consumer<String>() {
public void accept(String t) {
for(int i=0; i<t.length(); i++)
if(Character.isLetter(t.charAt(i)))
tally[t.charAt(i) - 'a']++;
}
};
lines.forEach(charCount);
Is there a way of accomplishing this without using the lines method? Can I just process each character directly as a Stream<Byte> or Stream<Character> instead of creating Strings for each line in the text file?
Can I more directly convert a java.io.InputStream into a java.util.stream.Stream?

First, you have to redefine your task. You are reading characters, hence you do not want to convert an InputStream but a Reader into a Stream.
You can’t re-implement the charset conversion that happens, e.g. in an InputStreamReader, with Stream operations as there can be n:m mappings between the bytes of the InputStream and the resulting chars.
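To illustrate the n:m point, here is a minimal sketch that decodes a few UTF-8 byte sequences through an InputStreamReader and counts the resulting chars. The class and method names (DecodingDemo, countDecodedChars) are made up for this example and are not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class DecodingDemo {
    /** Counts the chars a Reader yields when the given bytes are decoded as UTF-8. */
    static int countDecodedChars(byte[] bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.read() >= 0) {
                count++;
            }
            return count;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // U+20AC (euro sign): three bytes in UTF-8, one char
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        // U+1F600 (an emoji): four bytes in UTF-8, two chars (a surrogate pair)
        byte[] emoji = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        System.out.println(euro.length + " bytes -> " + countDecodedChars(euro) + " char(s)");
        System.out.println(emoji.length + " bytes -> " + countDecodedChars(emoji) + " char(s)");
    }
}
```

Since the byte-to-char ratio changes from one character to the next, a per-byte map step has no way of knowing where a character starts or ends, which is why the decoding is best left to the Reader.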
Creating a stream out of a Reader is a bit tricky. You will need an iterator to specify a method for getting an item and an end condition:
PrimitiveIterator.OfInt it=new PrimitiveIterator.OfInt() {
int last=-2;
public int nextInt() {
if(last==-2 && !hasNext())
throw new NoSuchElementException();
try { return last; } finally { last=-2; }
}
public boolean hasNext() {
if(last==-2)
try { last=reader.read(); }
catch(IOException ex) { throw new UncheckedIOException(ex); }
return last>=0;
}
};
Once you have the iterator you can create a stream using the detour of a spliterator and perform your desired operation:
int[] tally = new int[26];
StreamSupport.intStream(Spliterators.spliteratorUnknownSize(
it, Spliterator.ORDERED | Spliterator.IMMUTABLE | Spliterator.NONNULL), false)
// now you have your stream and you can operate on it:
.map(Character::toLowerCase)
.filter(c -> c>='a'&&c<='z')
.map(c -> c-'a')
.forEach(i -> tally[i]++);
Note that while iterators are more familiar, implementing the new Spliterator interface directly simplifies the operation, as it doesn’t require maintaining state between two methods that could be called in arbitrary order. Instead, we have just one tryAdvance method which can be mapped directly to a read() call:
Spliterator.OfInt sp = new Spliterators.AbstractIntSpliterator(1000L,
Spliterator.ORDERED | Spliterator.IMMUTABLE | Spliterator.NONNULL) {
public boolean tryAdvance(IntConsumer action) {
int ch;
try { ch=reader.read(); }
catch(IOException ex) { throw new UncheckedIOException(ex); }
if(ch<0) return false;
action.accept(ch);
return true;
}
};
StreamSupport.intStream(sp, false)
// now you have your stream and you can operate on it:
…
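For reference, the spliterator approach can be put together into something you can run as-is. This is only a sketch: the class name ReaderStreams and the helper names chars and tally are invented here, the input comes from a StringReader rather than a file, and Long.MAX_VALUE is used as the size estimate to signal an unknown size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

final class ReaderStreams {
    /** Wraps a Reader in a sequential IntStream of its chars (hypothetical helper). */
    static IntStream chars(Reader reader) {
        Spliterator.OfInt sp = new Spliterators.AbstractIntSpliterator(
                Long.MAX_VALUE, // size is not known in advance
                Spliterator.ORDERED | Spliterator.IMMUTABLE | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(IntConsumer action) {
                int ch;
                try { ch = reader.read(); }
                catch (IOException ex) { throw new UncheckedIOException(ex); }
                if (ch < 0) return false; // end of input
                action.accept(ch);
                return true;
            }
        };
        return StreamSupport.intStream(sp, false);
    }

    /** Counts the letters a to z, ignoring case; index 0 holds the count for 'a'. */
    static int[] tally(Reader reader) {
        int[] counts = new int[26];
        chars(reader)
            .map(Character::toLowerCase)
            .filter(c -> c >= 'a' && c <= 'z')
            .map(c -> c - 'a')
            .forEach(i -> counts[i]++);
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = tally(new StringReader("Hello, World!"));
        // prints "l = 3, o = 2"
        System.out.println("l = " + counts['l' - 'a'] + ", o = " + counts['o' - 'a']);
    }
}
```

For a real file you would pass something like Files.newBufferedReader(path) and close the reader yourself, since the stream built this way does not close it.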
However, note that if you change your mind and are willing to use Files.lines you can have a much easier life:
int[] tally = new int[26];
Files.lines(Paths.get(file))
.flatMapToInt(CharSequence::chars)
.map(Character::toLowerCase)
.filter(c -> c>='a'&&c<='z')
.map(c -> c-'a')
.forEach(i -> tally[i]++);

Related

Java: Processing Stream line by line without forEach?

I am new to Java and trying out Streams for the first time.
I have a large input file where there is a string on each line like:
cart
dumpster
apple
cherry
tank
laptop
...
I'm trying to read the file in as a Stream and doing some analysis on the data. For example, to count all the occurrences of a particular string, I might think to do something like:
Stream<String> lines = Files.lines(Path.of("/path/to/input/file.txt"));
int count = 0;
lines.forEach((line) -> {
if (line.equals("tank")) {
count++;
}
});
But Java doesn't allow a lambda to modify a local variable captured from the enclosing scope.
I'm not sure if there's another way to read from the stream line by line. How would I do this properly?
You don't need a variable external to the stream. And if you have a really big file to count, long would be preferred:
long tanks = lines
.filter(s -> s.equals("tank"))
.count();
To iterate a stream using a regular loop, you can get an iterator from your stream and use a for-loop:
Iterable<String> iterable = lines::iterator;
for (String line : iterable) {
if (line.equals("tank")) {
++count;
}
}
But in this particular case, you could just use the stream's count method:
int count = (int) lines.filter("tank"::equals).count();
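One thing to keep in mind is that a stream can only be consumed once, so the lines variable can't be reused for a second count. A sketch of a reusable helper, under the assumption that it is fine to reopen the file for every call; LineCounter and countMatches are names made up for this example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class LineCounter {
    /** Counts the lines equal to the wanted string; consumes the given stream. */
    static long countMatches(Stream<String> lines, String wanted) {
        return lines.filter(wanted::equals).count();
    }

    /** Opens a fresh stream over the file for each call and closes it afterwards. */
    static long countMatches(Path file, String wanted) {
        try (Stream<String> lines = Files.lines(file)) {
            return countMatches(lines, wanted);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // In-memory data stands in for the file here.
        long tanks = countMatches(Stream.of("cart", "tank", "apple", "tank"), "tank");
        System.out.println(tanks); // prints 2
    }
}
```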
You can also read the file line by line inside a try-with-resources block, which closes the file once the stream has been processed:
List<String> list = new ArrayList<>();
try (Stream<String> lines = Files.lines(Path.of("/path/to/input/file.txt"))) {
    list = lines
        .filter(line -> !line.startsWith("abc"))
        .map(String::toUpperCase)
        .collect(Collectors.toList());
} catch (IOException e) {
    e.printStackTrace();
}

How can I sort lines in a text file based on one of the comma-separated values?

Each line in my movies.txt file looks like:
id,title,rating,year,genre (rating is an integer from 1 to 5)
1,The Godfather,5,1972,Drama
2,Pulp Fiction,4,1994,Crime
I want to list the movies sorted by their rating. I was able to sort the ratings but I don't know how to preserve the connection between ratings and lines and I couldn't sort the lines based on ratings.
BufferedReader b = new BufferedReader(new FileReader("movies.txt"));
String line = null;
int[] ratings = null;
int i;
try{
while((line = b.readLine()) != null)
{
String[] data = line.split(",");
int rating = Integer.parseInt(data[2]);
ratings[i] = rating;
i++;
}
b.close();
Arrays.sort(ratings);
}catch(IOException ex){
System.out.println("Error");
}
Is there any way I can do this by using arrays or something else, without creating a class and using a Movie object?
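Instead of keeping only data[2], store the whole split line and sort the lines by their third field (index 2). Because the field stays a String, the comparator compares text rather than integers, which gives the right order only while every rating is a single digit, as with ratings from 1 to 5: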
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
public class Main {
    public static void main(String[] args) {
        List<String[]> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader b = new BufferedReader(new FileReader("movies.txt"))) {
            String line;
            while ((line = b.readLine()) != null) {
                // keep the whole split line so the rating stays attached to its movie
                lines.add(line.split(","));
            }
            lines.sort(new CustomComparator());
            lines.forEach(o -> System.out.println(Arrays.toString(o)));
        } catch (IOException ex) {
            System.out.println("Error");
        }
    }
    public static class CustomComparator implements Comparator<String[]> {
        @Override
        public int compare(String[] o1, String[] o2) {
            // compares the rating column as text
            return o1[2].compareTo(o2[2]);
        }
    }
}
Create all lines in the list and sort with stream:
List<String> lines = new ArrayList<String>();
lines.add("1,The Godfather,5,1972,Drama");
lines.add("2,Pulp Fiction,4,1994,Crime");
lines.add("2,Pulp Fiction,33,1994,Crime");
lines.add("2,Pulp Fiction,1,1994,Crime");
List<String> collect = lines.stream().sorted(new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        // Integer.compare avoids the overflow risk of subtracting the two values
        return Integer.compare(Integer.parseInt(o1.split(",")[2]), Integer.parseInt(o2.split(",")[2]));
    }
}).collect(Collectors.toList());
Then write the sorted collection back to the file.
Use a CSV Parser to load the data into a List, then sort the list.
E.g. if using Apache Commons CSV, you can do it like this:
// Load data into memory
List<CSVRecord> records = new ArrayList<>();
try (Reader in = Files.newBufferedReader(Paths.get("movies.txt"));
CSVParser parser = CSVFormat.RFC4180.parse(in)) {
parser.forEach(records::add);
}
// Sort data
records.sort(Comparator.comparingInt(r -> Integer.parseInt(r.get(2))));
// Print result
try (CSVPrinter printer = CSVFormat.RFC4180.printer()) {
printer.printRecords(records);
}
Output
2,Pulp Fiction,4,1994,Crime
1,The Godfather,5,1972,Drama
If you just want to sort the lines by the 3rd element you should first read the lines, then sort them and write them back (that's what I assume you want to do). A naive approach would be to write a comparator that splits each line, parses the 3rd element to an int and compares the values, e.g. like this:
List<String> lines = ... //read, e.g. using Files.readAllLines(pathToFile)
Collections.sort(lines, Comparator.comparing( line -> {
String[] elements = line.split(","); //split
return Integer.parseInt(elements[2]); //parse and return
}));
This, however, is very inefficient so you might try and use a couple of optimizations:
split the lines into arrays when reading and join them when writing
sort "integer" strings with a little trick: sort by length and then lexically
Example:
List<String[]> splitLines = ... //read and split
Collections.sort(splitLines,
Comparator.comparing( (String[] e)-> e[2].length()) //help the compiler with the lambda parameter type, at least in my tests it couldn't infer the type otherwise
.thenComparing( e -> e[2] ));
splitLines.forEach( elements -> writeToFile(String.join(",", elements)));
This could even be done in a single stream:
Files.lines(pathToFile)
.map(line -> line.split(",")) //split each line
.sorted(Comparator.comparing( (String[] e)-> e[2].length()) //sort by length and then by natural order
.thenComparing( e -> e[2] ))
.map( elements -> String.join(",", elements) ) //join back to a single string
.forEach(line -> writeToFile(line)); //write to file
This is based on a couple of assumptions:
all lines have the same format
no title contains a comma or the split is able to handle escaped values
lines don't have leading or trailing whitespace
integers don't have leading zeros
How does the sorting "trick" work?
Basically it first sorts integer strings by order of magnitude. The higher the length the larger the number should be, i.e. "2" is shorter than "10" and thus smaller.
Within the same order of magnitude you'd then sort normally taking the order of digits into account. Thus "100" is smaller than "123" etc.
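A small self-contained sketch of the trick, assuming non-negative integer strings without leading zeros or whitespace. The names NumericStringOrder, BY_LENGTH_THEN_TEXT and sorted are made up for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NumericStringOrder {
    /** Shorter strings first; strings of equal length in lexicographic order. */
    static final Comparator<String> BY_LENGTH_THEN_TEXT =
            Comparator.comparingInt(String::length)
                      .thenComparing(Comparator.naturalOrder());

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<String> sorted(List<String> numbers) {
        List<String> copy = new ArrayList<>(numbers);
        copy.sort(BY_LENGTH_THEN_TEXT);
        return copy;
    }

    public static void main(String[] args) {
        List<String> ratings = Arrays.asList("10", "2", "100", "33", "4");
        System.out.println(sorted(ratings)); // prints [2, 4, 10, 33, 100]
    }
}
```

A plain lexicographic sort of the same list would yield [10, 100, 2, 33, 4], which is why the length comparison has to come first.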
Final notes:
It would still be better to actually convert lines into Movie elements, especially if you have more complex requirements or data.
Use a proper CSV parser instead of regex and basic string operations.
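For comparison, here is roughly what the Movie variant could look like. This is a sketch only: the class layout, field names and the parse and sortedByRating helpers are assumptions, and the parsing still relies on a plain split, so it shares the "no commas inside a field" assumption from above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical value class for one line of movies.txt (id,title,rating,year,genre). */
final class Movie {
    final int id;
    final String title;
    final int rating;
    final int year;
    final String genre;

    Movie(int id, String title, int rating, int year, String genre) {
        this.id = id;
        this.title = title;
        this.rating = rating;
        this.year = year;
        this.genre = genre;
    }

    /** Parses one line; assumes exactly five fields and no commas inside a field. */
    static Movie parse(String line) {
        String[] f = line.split(",");
        return new Movie(Integer.parseInt(f[0].trim()), f[1].trim(),
                Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()), f[4].trim());
    }

    /** Parses all lines and returns the movies ordered by ascending rating. */
    static List<Movie> sortedByRating(List<String> lines) {
        return lines.stream()
                .map(Movie::parse)
                .sorted(Comparator.comparingInt((Movie m) -> m.rating))
                .collect(Collectors.toList());
    }

    @Override
    public String toString() {
        return id + "," + title + "," + rating + "," + year + "," + genre;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,The Godfather,5,1972,Drama",
                "2,Pulp Fiction,4,1994,Crime");
        // prints the Pulp Fiction line (rating 4) before The Godfather (rating 5)
        sortedByRating(lines).forEach(System.out::println);
    }
}
```

With the rating parsed once into an int field, the comparator no longer has to split or parse anything during the sort.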

Process string using Java streams

I am in need of some guidance. I am not sure how to go about reading in the sample text file into an array of objects using Java Streams. Does stream provide a functionality to correctly output a position of the character from a string that it reads from the file?
I am reading a file using Java I/O and then passing the content as string to this function to create array of Squares....
Can this creation of Array of Objects be done using Java 8 Stream ? If so How please. Thank you.
Using java streams you could do something like this:
AtomicInteger row = new AtomicInteger(-1);
// count specific characters with this:
AtomicInteger someCount = new AtomicInteger();
try (Stream<String> stringStream = Files.lines(Paths.get("yourFile.txt"))) { // read all lines from file into a stream of strings
// This Function makes an array of Square objects of each line
Function<String, Square[]> mapper = (s) -> {
AtomicInteger col = new AtomicInteger();
row.incrementAndGet();
return s.chars()
.mapToObj(i -> {
// increment counter if the char fulfills condition
if((char)i == 'M')
someCount.incrementAndGet();
return new Square(row.get(), col.getAndIncrement(), (char)i);
})
.toArray(Square[]::new);
};
// Now streaming all lines using the mapper function from above you can collect them into a List<Square[]> and convert this List into an Array of Square objects
Square[][] squares = stringStream
.map(mapper)
.collect(Collectors.toList()).toArray(new Square[0][]);
}
Answering your second question: If you have an array of Square[] and want to find the first Square with val == 'M' you could do this:
Optional<Square> optSquare = Stream.of(squares).flatMap(Stream::of).filter(s -> s.getVal() == 'M').findFirst();
// mySquare will be null if no Square was matching condition
Square mySquare = optSquare.orElse(null);
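If the lines fit into memory, a variation that avoids the AtomicInteger counters is to stream over the row and column indices instead of over the characters. This is a different technique from the one above, shown only as a sketch: the Square class below is a stand-in with an assumed (row, col, val) constructor, and Grid, toSquares and count are names made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class Grid {
    /** Hypothetical stand-in for the Square class from the question. */
    static final class Square {
        final int row;
        final int col;
        final char val;

        Square(int row, int col, char val) {
            this.row = row;
            this.col = col;
            this.val = val;
        }

        char getVal() { return val; }
    }

    /** Builds one Square per character; row and column come from the stream indices. */
    static Square[][] toSquares(List<String> lines) {
        return IntStream.range(0, lines.size())
                .mapToObj(row -> {
                    String line = lines.get(row);
                    return IntStream.range(0, line.length())
                            .mapToObj(col -> new Square(row, col, line.charAt(col)))
                            .toArray(Square[]::new);
                })
                .toArray(Square[][]::new);
    }

    /** Counts the squares holding the wanted character. */
    static long count(Square[][] squares, char wanted) {
        return Arrays.stream(squares)
                .flatMap(Arrays::stream)
                .filter(s -> s.getVal() == wanted)
                .count();
    }

    public static void main(String[] args) {
        // In practice the list could come from Files.readAllLines(path).
        Square[][] squares = toSquares(Arrays.asList("AM", "MB M"));
        System.out.println(count(squares, 'M')); // prints 3
    }
}
```

Because each Square gets its position from the index rather than from shared mutable state, the mapping has no side effects and does not depend on the order in which lines are processed.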

Java - Basic streams usage of forEach

I have a class called Data which has only one method:
public boolean isValid()
I have a List of Data and I want to loop through them via a Java 8 stream. I need to count how many valid Data objects there are in this List and print out only the valid entries.
Below is how far I've gotten but I don't understand how.
List<Data> ar = new ArrayList<>();
...
// ar is now full of Data objects.
...
int count = ar.stream()
.filter(Data::isValid)
.forEach(System.out::println)
.count(); // Compiler error, forEach() does not return type stream.
My second attempt: (horrible code)
List<Data> ar = new ArrayList<>();
...
// Must be final or compiler error will happen via inner class.
final AtomicInteger counter = new AtomicInteger();
ar.stream()
.filter(Data::isValid)
.forEach(d ->
{
System.out.println(d);
counter.incrementAndGet();
});
System.out.printf("There are %d/%d valid Data objects.%n", counter.get(), ar.size());
If you don’t need the original ArrayList, containing a mixture of valid and invalid objects, later on, you might simply perform a Collection operation instead of the Stream operation:
ar.removeIf(d -> !d.isValid());
ar.forEach(System.out::println);
int count = ar.size();
Otherwise, you can implement it like
List<Data> valid = ar.stream().filter(Data::isValid).collect(Collectors.toList());
valid.forEach(System.out::println);
int count = valid.size();
Having storage for something you need multiple times is not so bad. If the list is really large, you can reduce the storage memory by a factor of (typically) 32, using
BitSet valid = IntStream.range(0, ar.size())
.filter(index -> ar.get(index).isValid())
.collect(BitSet::new, BitSet::set, BitSet::or);
valid.stream().mapToObj(ar::get).forEach(System.out::println);
int count = valid.cardinality();
Though, of course, you can also use
int count = 0;
for(Data d: ar) {
if(d.isValid()) {
System.out.println(d);
count++;
}
}
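peek is similar to forEach, except that it lets you continue the stream. It is intended mainly for debugging, and since Java 9 count() may skip the pipeline entirely when it can compute the size directly from the source; the filter step here prevents that shortcut.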
ar.stream().filter(Data::isValid)
.peek(System.out::println)
.count();

Limit a stream and find out if there are pending elements

I have the following code that I want to translate to Java 8 streams:
public ReleaseResult releaseReources() {
List<String> releasedNames = new ArrayList<>();
Stream<SomeResource> stream = this.someResources();
Iterator<SomeResource> it = stream.iterator();
while (it.hasNext() && releasedNames.size() < MAX_TO_RELEASE) {
SomeResource resource = it.next();
if (!resource.isTaken()) {
resource.release();
releasedNames.add(resource.getName());
}
}
return new ReleaseResult(releasedNames, it.hasNext(), MAX_TO_RELEASE);
}
Method someResources() returns a Stream<SomeResource> and ReleaseResult class is as follows:
public class ReleaseResult {
private int releasedCount;
private List<String> releasedNames;
private boolean hasMoreItems;
private int releaseLimit;
public ReleaseResult(List<String> releasedNames,
boolean hasMoreItems, int releaseLimit) {
this.releasedNames = releasedNames;
this.releasedCount = releasedNames.size();
this.hasMoreItems = hasMoreItems;
this.releaseLimit = releaseLimit;
}
// getters & setters
}
My attempt so far:
public ReleaseResult releaseReources() {
List<String> releasedNames = this.someResources()
.filter(resource -> !resource.isTaken())
.limit(MAX_TO_RELEASE)
.peek(SomeResource::release)
.map(SomeResource::getName)
.collect(Collectors.toList());
return new ReleaseResult(releasedNames, ???, MAX_TO_RELEASE);
}
The problem is that I can't find a way to know if there are pending resources to process. I've thought of using releasedNames.size() == MAX_TO_RELEASE, but this doesn't take into account the case where the stream of resources has exactly MAX_TO_RELEASE elements.
Is there a way to do the same with Java 8 streams?
Note: I'm not looking for answers like "you don't have to do everything with streams" or "using loops and iterators is fine". I'm OK if using an iterator and a loop is the only way or just the best way. It's just that I'd like to know if there's a non-murky way to do the same.
Since you don’t wanna hear that you don’t need streams for everything and loops and iterators are fine, let’s demonstrate it by showing a clean solution, not relying on peek:
public ReleaseResult releaseReources() {
return this.someResources()
.filter(resource -> !resource.isTaken())
.limit(MAX_TO_RELEASE+1)
.collect(
() -> new ReleaseResult(new ArrayList<>(), false, MAX_TO_RELEASE),
(result, resource) -> {
List<String> names = result.getReleasedNames();
if(names.size() == MAX_TO_RELEASE) result.setHasMoreItems(true);
else {
resource.release();
names.add(resource.getName());
}
},
(r1, r2) -> {
List<String> names = r1.getReleasedNames();
names.addAll(r2.getReleasedNames());
if(names.size() > MAX_TO_RELEASE) {
r1.setHasMoreItems(true);
names.remove(MAX_TO_RELEASE);
}
}
);
}
This assumes that // getters & setters includes getters and setters for all non-final fields of your ReleaseResult. And that getReleasedNames() returns the list by reference. Otherwise you would have to rewrite it to provide a specialized Collector having special non-public access to ReleaseResult (implementing another builder type or temporary storage would be an unnecessary complication, it looks like ReleaseResult is already designed exactly for that use case).
We could conclude that for any nontrivial loop code that doesn’t fit into the stream’s intrinsic operations, you can find a collector solution that basically does the same as the loop in its accumulator function, but suffers from the requirement of always having to provide a combiner function. Ok, in this case we can prepend a filter(…).limit(…) so it’s not that bad…
I just noticed, if you ever dare to use that with a parallel stream, you need a way to reverse the effect of releasing the last element in the combiner in case the combined size exceeds MAX_TO_RELEASE. Generally, limits and parallel processing never play well.
I don't think there's a nice way to do this. I've found a hack that does it lazily. What you can do is convert the Stream to an Iterator, convert the Iterator back to another Stream, do the Stream operations, then finally test the Iterator for a next element!
Iterator<SomeResource> it = this.someResources().iterator();
List<String> list = StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false)
.filter(resource -> !resource.isTaken())
.limit(MAX_TO_RELEASE)
.peek(SomeResource::release)
.map(SomeResource::getName)
.collect(Collectors.toList());
return new ReleaseResult(list, it.hasNext(), MAX_TO_RELEASE);
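Here is the same hack as a generic, runnable sketch using plain integers instead of SomeResource. LimitedTake, Result and take are names invented for the example. Note the assumption it rests on: the sequential pipeline has to pull elements from the iterator one at a time and stop as soon as limit is satisfied, which the Stream documentation does not promise.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LimitedTake {
    /** Hypothetical result holder: the taken elements and whether the source had more. */
    static final class Result<T> {
        final List<T> taken;
        final boolean hasMore;

        Result(List<T> taken, boolean hasMore) {
            this.taken = taken;
            this.hasMore = hasMore;
        }
    }

    static <T> Result<T> take(Stream<T> source, Predicate<? super T> eligible, int max) {
        Iterator<T> it = source.iterator();
        // Re-wrap the iterator so the stream operations consume it lazily.
        List<T> taken = StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false)
                .filter(eligible)
                .limit(max)
                .collect(Collectors.toList());
        // Whatever the pipeline did not pull is still in the iterator.
        return new Result<>(taken, it.hasNext());
    }

    public static void main(String[] args) {
        Result<Integer> r = take(Stream.of(1, 2, 3, 4, 5, 6), n -> n % 2 == 0, 2);
        System.out.println(r.taken + " more=" + r.hasMore);
    }
}
```

As in the original loop, hasMore only says that the source has further elements, not that any of them would pass the filter.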
The only thing I can think of is
List<SomeResource> list = someResources(); // A List, rather than a Stream, is required
List<Integer> indices = IntStream.range(0, list.size())
.filter(i -> !list.get(i).isTaken())
.limit(MAX_TO_RELEASE)
.boxed() // IntStream has no collect(Collector), so box to Stream<Integer> first
.collect(Collectors.toList());
List<String> names = indices.stream()
.map(list::get)
.peek(SomeResource::release)
.map(SomeResource::getName)
.collect(Collectors.toList());
Then (I think) there are unprocessed elements if
names.size() == MAX_TO_RELEASE
&& (indices.isEmpty() || indices.get(indices.size() - 1) < list.size() - 1)
