Recursive Streaming API - java

I'm currently working on a small graph theory algorithm that uses a recursive depth-first search.
Since it's recursive, I'm wondering whether I should use the Stream API for this task or stick with iterators and for-each loops.
This is my code:
private void processNext(Node node) {
    // METHOD A
    for (Node neighbour : node) {
        if (!connectedNodes.contains(neighbour)) {
            connectedNodes.add(neighbour);
            processNext(neighbour);
        }
    }
    // OR METHOD B
    node.getNodes().stream().filter(not(connectedNodes::contains)).forEach(e -> {
        connectedNodes.add(e);
        processNext(e);
    });
    // OR METHOD C
    node.getNodes().stream().forEach(e -> {
        if (!connectedNodes.contains(e)) {
            connectedNodes.add(e);
            processNext(e);
        }
    });
}
Methods A and C work exactly as intended, but I'm not sure about B.
Does filter in the Stream API remove the non-matching objects before forEach starts, or while forEach is running? (Do B and C do exactly the same thing?)
And which method will be the fastest?
Any help is appreciated!

OK, methods B and C work exactly the same!
I'm not sure whether the iterator approach is faster, but since B takes up less space in the code, I'm going with that one!
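The reason B and C behave the same can be made visible with a small sketch (not part of the original post; the class and method names are hypothetical). A sequential stream pushes each element through filter() and then straight into forEach() before it looks at the next element, so the filter in method B sees any nodes that deeper recursive calls have already added to connectedNodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: records the order in which filter() and forEach() run.
class LazyFilterDemo {

    // Returns a log of the calls made while streaming over the input.
    static List<String> trace(List<Integer> input) {
        List<String> events = new ArrayList<>();
        input.stream()
             .filter(n -> {
                 events.add("filter " + n); // runs only when the terminal operation pulls an element
                 return n % 2 == 0;
             })
             .forEach(n -> events.add("forEach " + n));
        return events;
    }

    public static void main(String[] args) {
        // Prints [filter 1, filter 2, forEach 2, filter 3, filter 4, forEach 4]
        System.out.println(trace(Arrays.asList(1, 2, 3, 4)));
    }
}
```

The filter is not applied to the whole collection up front; the calls are interleaved element by element. This holds for a sequential stream without stateful intermediate operations such as sorted().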

Related

Simple data stream: Go being super slow compared to Java

As a Java dev, I'm currently looking at Go because I think it's an interesting language.
To get started, I decided to take a simple Java project I wrote months ago and rewrite it in Go to compare performance and (mainly, actually) code readability/complexity.
The Java code sample is the following:
public static void main(String[] args) {
    long start = System.currentTimeMillis();
    Stream<Container> s = Stream.from(new Iterator<Container>() {
        int i = 0;
        @Override
        public boolean hasNext() {
            return i < 10000000;
        }
        @Override
        public Container next() {
            return new Container(i++);
        }
    });
    s = s.map((Container _source) -> new Container(_source.value * 2));
    int j = 0;
    while (s.hasNext()) {
        s.next();
        j++;
    }
    System.out.println(System.currentTimeMillis() - start);
    System.out.println("j:" + j);
}
public static class Container {
    int value;
    public Container(int v) {
        value = v;
    }
}
Where the map function is:
return new Stream<R>() {
    @Override
    public boolean hasNext() {
        return Stream.this.hasNext();
    }
    @Override
    public R next() {
        return _f.apply(Stream.this.next());
    }
};
The Stream class is just an extension of java.util.Iterator that adds custom methods to it. Its methods other than map differ from the standard Java Stream API.
Anyway, to reproduce this, I wrote the following Go code:
package main
import (
"fmt"
)
type Iterator interface {
HasNext() bool
Next() interface{}
}
type Stream interface {
HasNext() bool
Next() interface{}
Map(transformer func(interface{}) interface{}) Stream
}
///////////////////////////////////////
type incremetingIterator struct {
i int
}
type SampleEntry struct {
value int
}
func (s *SampleEntry) Value() int {
return s.value
}
func (s *incremetingIterator) HasNext() bool {
return s.i < 10000000
}
func (s *incremetingIterator) Next() interface{} {
s.i = s.i + 1
return &SampleEntry{
value: s.i,
}
}
func CreateIterator() Iterator {
return &incremetingIterator{
i: 0,
}
}
///////////////////////////////////////
type stream struct {
source Iterator
}
func (s *stream) HasNext() bool {
return s.source.HasNext()
}
func (s *stream) Next() interface{} {
return s.source.Next()
}
func (s *stream) Map(tr func(interface{}) interface{}) Stream {
return &stream{
source: &mapIterator{
source: s,
transformer: tr,
},
}
}
func FromIterator(it Iterator) Stream {
return &stream{
source: it,
}
}
///////////////////////////////////////
type mapIterator struct {
source Iterator
transformer func(interface{}) interface{}
}
func (s *mapIterator) HasNext() bool {
return s.source.HasNext()
}
func (s *mapIterator) Next() interface{} {
return s.transformer(s.source.Next())
}
///////////////////////////////////////
func main() {
it := CreateIterator()
ss := FromIterator(it)
ss = ss.Map(func(in interface{}) interface{} {
return &SampleEntry{
value: 2 * in.(*SampleEntry).value,
}
})
fmt.Println("Start")
for ss.HasNext() {
ss.Next()
}
fmt.Println("Over")
}
Both produce the same result, but while Java takes about 20 ms, Go takes 1050 ms (with 10M items, test run several times).
I'm very new to Go (I started a couple of hours ago), so please be indulgent if I did something really bad :-)
Thank you!
The other answer changed the original task quite "dramatically", and reverted to a simple loop. I consider it to be different code, and as such, it cannot be used to compare execution times (that loop could be written in Java as well, which would give smaller execution time).
Now let's try to keep the "streaming manner" of the problem at hand.
Note beforehand:
In Java, the granularity of System.currentTimeMillis() can be around 10 ms (!!), which is the same order of magnitude as the result! This means the error in Java's 20 ms could be huge, so you should use System.nanoTime() to measure code execution times instead. For details, see Measuring time differences using System.currentTimeMillis().
Also, timing a single run is not a correct way to measure execution time, because code that runs for the first time may be several times slower. For details, see Order of the code and performance.
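Both points can be combined in a small timing helper. This is only a rough sketch under stated assumptions: NanoTimer and measure are hypothetical names, the warm-up and run counts are arbitrary, and the loop in main is a stand-in workload rather than the asker's stream code. For serious measurements a dedicated benchmark harness such as JMH is the safer choice.

```java
// Hypothetical helper: untimed warm-up runs first, then timing with System.nanoTime().
class NanoTimer {

    // Runs `task` untimed `warmups` times, then times `runs` further executions.
    // Returns the elapsed time of each measured run in nanoseconds.
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // written by the task so the work is not trivially dead code
        long[] times = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += 2L * i;
            }
            sink[0] = sum;
        }, 5, 10);
        for (long t : times) {
            System.out.println(t / 1_000_000.0 + " ms");
        }
    }
}
```

Reporting every measured run, instead of a single number, makes it easy to see whether the timings have settled after warm-up.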
Genesis
Your original Go proposal takes roughly 1.1 seconds on my computer, which is about the same as yours.
Removing interface{} item type
At the time of writing, Go doesn't have generics (type parameters were only added later, in Go 1.18). Trying to mimic this behavior with interface{} is not the same, and it has a serious performance impact if the value you want to work with is a primitive type (e.g. int) or a simple struct (like the Go equivalent of your Java Container type). See: The Laws of Reflection #The representation of an interface. Wrapping an int (or any other concrete type) in an interface requires creating a (type;value) pair holding the dynamic type and the value to be wrapped (creating this pair also involves copying the value being wrapped; see an analysis of this in the answer How can a slice contain itself?). Moreover, when you want to access the value, you have to use a type assertion, which is a runtime check, so the compiler can't help optimize it (and the check adds to the execution time)!
So let's not use interface{} for our items, but instead use a concrete type for our case:
type Container struct {
value int
}
We will use this in the iterator's and stream's next method: Next() Container, and in the mapper function:
type Mapper func(Container) Container
Also we may utilize embedding, as the method set of Iterator is a subset of that of Stream.
Without further ado, here is the complete, runnable example:
package main
import (
"fmt"
"time"
)
type Container struct {
value int
}
type Iterator interface {
HasNext() bool
Next() Container
}
type incIter struct {
i int
}
func (it *incIter) HasNext() bool {
return it.i < 10000000
}
func (it *incIter) Next() Container {
it.i++
return Container{value: it.i}
}
type Mapper func(Container) Container
type Stream interface {
Iterator
Map(Mapper) Stream
}
type iterStream struct {
Iterator
}
func NewStreamFromIter(it Iterator) Stream {
return iterStream{Iterator: it}
}
func (is iterStream) Map(f Mapper) Stream {
return mapperStream{Stream: is, f: f}
}
type mapperStream struct {
Stream
f Mapper
}
func (ms mapperStream) Next() Container {
return ms.f(ms.Stream.Next())
}
func (ms mapperStream) Map(f Mapper) Stream {
return nil // Not implemented / needed
}
func main() {
s := NewStreamFromIter(&incIter{})
s = s.Map(func(in Container) Container {
return Container{value: in.value * 2}
})
fmt.Println("Start")
start := time.Now()
j := 0
for s.HasNext() {
s.Next()
j++
}
fmt.Println(time.Since(start))
fmt.Println("j:", j)
}
Execution time: 210 ms. Nice, we've already sped it up 5 times, yet we're still far from the performance of the Java Stream version.
"Removing" Iterator and Stream types
Since we can't use generics, the interface types Iterator and Stream don't really need to be interfaces: we would need new types anyway if we wanted to define iterators and streams of other element types.
So the next thing we do is remove Stream and Iterator and use their concrete types, the implementations above. This will not hurt readability at all; in fact, the solution is shorter:
package main
import (
"fmt"
"time"
)
type Container struct {
value int
}
type incIter struct {
i int
}
func (it *incIter) HasNext() bool {
return it.i < 10000000
}
func (it *incIter) Next() Container {
it.i++
return Container{value: it.i}
}
type Mapper func(Container) Container
type iterStream struct {
*incIter
}
func NewStreamFromIter(it *incIter) iterStream {
return iterStream{incIter: it}
}
func (is iterStream) Map(f Mapper) mapperStream {
return mapperStream{iterStream: is, f: f}
}
type mapperStream struct {
iterStream
f Mapper
}
func (ms mapperStream) Next() Container {
return ms.f(ms.iterStream.Next())
}
func main() {
s0 := NewStreamFromIter(&incIter{})
s := s0.Map(func(in Container) Container {
return Container{value: in.value * 2}
})
fmt.Println("Start")
start := time.Now()
j := 0
for s.HasNext() {
s.Next()
j++
}
fmt.Println(time.Since(start))
fmt.Println("j:", j)
}
Execution time: 50 ms, so we've again sped it up 4 times compared to our previous solution! That's now the same order of magnitude as the Java solution, and we've lost nothing of the "streaming manner". Overall gain over the asker's proposal: 22 times faster.
Given the fact that in Java you used System.currentTimeMillis() to measure execution, this may even be the same as Java's performance. Asker confirmed: it's the same!
Regarding the same performance
Now we're talking about roughly the "same" code which does pretty simple, basic tasks, in different languages. If they're doing basic tasks, there is not much one language could do better than the other.
Also keep in mind that Java is a mature adult (over 21 years old) and has had an enormous amount of time to evolve and be optimized; Java's JIT (just-in-time) compiler actually does a pretty good job for long-running processes such as yours. Go is much younger, still just a kid (it will be 5 years old 11 days from now), and will probably see bigger performance improvements in the foreseeable future than Java.
Further improvements
This "streamy" way may not be the "Go" way to approach the problem you're trying to solve. It is merely the "mirror" code of your Java solution, using more idiomatic constructs of Go.
Instead you should take advantage of Go's excellent support for concurrency, namely goroutines (see go statement) which are much more efficient than Java's threads, and other language constructs such as channels (see answer What are golang channels used for?) and select statement.
If you properly chunk / partition your originally big task into smaller ones, a goroutine worker pool can be quite powerful for processing large amounts of data. See Is this an idiomatic worker thread pool in Go?
Also, you claimed in your comment that "I don't have 10M items to process but more 10G which won't fit in memory". If this is the case, think about IO time and the delay of the external system you're fetching the data from. If that takes significant time, it might outweigh the processing time in the app, and the app's execution time might not matter (at all).
Go is not about squeezing every nanosecond out of execution time, but rather providing you a simple, minimalist language and tools, by which you can easily (by writing simple code) take control of and utilize your available resources (e.g. goroutines and multi-core CPU).
(Try to compare the Go language spec and the Java language spec. Personally I've read Go's lang spec multiple times, but could never get to the end of Java's.)
This is, I think, an interesting question, as it gets to the heart of the differences between Java and Go and highlights the difficulties of porting code. Here is the same thing in Go minus all the machinery (time ~50ms here):
values := make([]int64, 10000000)
start := time.Now()
fmt.Println("Start")
for i := int64(0); i < 10000000; i++ {
values[i] = 2 * i
}
fmt.Println("Over after:", time.Now().Sub(start))
More seriously here is the same thing with a map over a slice of entries which is a more idiomatic version of what you have above and could work with any sort of Entry struct. This actually works out at a faster time on my machine of 30ms than the for loop above (anyone care to explain why?), so probably similar to your Java version:
package main
import (
"fmt"
"time"
)
type Entry struct {
Value int64
}
type EntrySlice []*Entry
func New(l int64) EntrySlice {
entries := make(EntrySlice, l)
for i := int64(0); i < l; i++ {
entries[i] = &Entry{Value: i}
}
return entries
}
func (entries EntrySlice) Map(fn func(i int64) int64) {
for _, e := range entries {
e.Value = fn(e.Value)
}
}
func main() {
entries := New(10000000)
start := time.Now()
fmt.Println("Start")
entries.Map(func(v int64) int64 {
return 2 * v
})
fmt.Println("Over after:", time.Now().Sub(start))
}
Things that will make operations more expensive:
Passing around interface{}: don't do this
Building a separate iterator type: use range or for loops
Allocations: building new types to store answers; transform in place instead
Re using interface{}: I would avoid it. This means you have to write a separate map (say) for each type, which is not a great hardship. Instead of building an iterator, a range loop is probably more appropriate. Re transforming in place: if you allocate new structs for each result, it'll put pressure on the garbage collector. Using a Map func like this is an order of magnitude slower:
entries.Map(func(e *Entry) *Entry {
return &Entry{Value: 2 * e.Value}
})
To stream, split the data into chunks and do the same as above (keeping a memo of the last object if you depend on previous calculations). If you have independent calculations (not as here), you could also fan out to a bunch of goroutines doing the work and get it done faster if there is a lot of it (this has overhead, so in simple examples it won't be faster).
Finally, if you're interested in data processing with go, I'd recommend visiting this new site: http://gopherdata.io/
Just as a complement to the previous comments, I changed the code of both the Java and Go implementations to run the test 100 times.
What's interesting here is that Go takes a constant time of between 69 and 72 ms.
However, Java takes 71 ms the first time (71 ms, 19 ms, 12 ms for the first runs) and then between 5 and 7 ms.
From my tests and understanding, this comes from the fact that the JVM takes a bit of time to properly load the classes and do some optimization.
In the end I still have this 10x performance difference, but I'm not giving up, and I'll try to get a better understanding of how Go works to make it faster :)

Retaining the stack position of a recursive function between calls

This question is general, but I feel it is best explained with a specific example. Let's say I have a directory with many nested sub directories and in some of those sub directories there are text files ending with ".txt". A sample structure could be:
dir1
dir2
file1.txt
dir3
file2.txt
file3.txt
I'd be interested if there were a way in Java to build a method that could be called to return the successive text files:
TextCrawler crawler = new TextCrawler(new File("dir1"));
File textFile;
textFile = crawler.nextFile(); // value is file1.txt
textFile = crawler.nextFile(); // value is file2.txt
textFile = crawler.nextFile(); // value is file3.txt
Here is the challenge: no internal list of all the text files can be saved in the crawler object. That would be trivial: you'd simply build into the initialization a method that recursively builds the list of files.
Is there a general way of pausing a recursive method so that when it is called again it returns to the specific point in the stack where it left? Or will we have to write something that is specific to each situation and solutions necessarily have to vary for file crawlers, org chart searches, recursive prime finders, etc.?
If you want a solution that works on any recursive function, you can accept a Consumer object. It may look something like this:
public void recursiveMethod(Consumer<TreeNode> func, TreeNode node){
if(node.isLeafNode()){
func.accept(node);
} else{
//Perform recursive call
}
}
For a bunch of files, it might look like this:
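public void recursiveMethod(Consumer<File> func, File curFile) {
    if (curFile.isFile()) {
        func.accept(curFile);
    } else {
        File[] children = curFile.listFiles(); // null if curFile is not a readable directory
        if (children == null) {
            return;
        }
        for (File f : children) {
            recursiveMethod(func, f);
        }
    }
}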
You can then call it with:
// Example only: point startingFile at the directory to start from ("dir1" is taken from the question)
File startingFile = new File("dir1");
recursiveMethod((File file) -> {
    // Do something with file
}, startingFile);
Adapt as necessary.
I think the state should be saved while you return from your recursive function, then you need to restore the state as you call that recursive function again. There is no generic way to save such a state, however a template can probably be created. Something like this:
class Crawler<T> {
LinkedList<T> innerState;
Callback<T> callback;
constructor Crawler(T base,Callback<T> callback) {
innerState=new LinkedList<T>();
innerState.push(base);
this.callback=callback; // I want functions passed here
}
T recursiveFunction() {
T base=innerState.pop();
T result=recursiveInner(base);
if (!result) innerState.push(base); // full recursion complete
return result;
}
private T recursiveInner(T element) {
ArrayList<T> c=callback.getAllSubElements(element);
T d;
for each (T el in c) {
if (innerState.length()>0) {
d=innerState.pop();
c.skipTo(d);
el=d;
if (innerState.length()==0) el=c.getNext();
// we have already processed "d", if full inner state is restored
}
T result=null;
if (callback.testFunction(el)) result=el;
if ((!result) && (callback.recursiveFunction(el))) result=recursiveInner(el); // if we can recurse on this element, go for it
if (result) {
// returning true, go save state
innerState.push(el); // push current local state to "stack"
return result;
}
} // end foreach
return null;
}
}
interface Callback<T> {
boolean testFunction(T element);
boolean recursiveFunction(T element);
ArrayList<T> getAllSubElements(T element);
}
Here, skipTo() is a method that moves the iterator on c to point to the provided element. Callback<T> is a means of passing functions to the class to be used as condition checkers: say, "Is T a folder" for the recursion check and "Is T a *.txt" for the return check; getAllSubElements should also belong here. The for each loop is pseudocode, written from a lack of knowledge of how to work with modifiable iterators in Java; please adapt it to actual code.
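As a compilable counterpart to the template above, here is a minimal sketch of the same idea: keep the "paused" position as an explicit stack of iterators, one per level of the tree. It is a different, simpler design than the pseudocode, not a translation of it. LazyTreeIterator and its two function parameters are hypothetical names, and elements are assumed to be non-null.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical lazy depth-first iterator: the stack of child iterators plays the
// role of the call stack, so the walk can stop and resume between next() calls.
class LazyTreeIterator<T> implements Iterator<T> {
    private final Deque<Iterator<T>> stack = new ArrayDeque<>();
    private final Function<T, List<T>> children; // sub-elements of a node (empty list for leaves)
    private final Predicate<T> wanted;           // which elements to return
    private T lookahead;                         // next match, or null if not yet found

    LazyTreeIterator(T root, Function<T, List<T>> children, Predicate<T> wanted) {
        this.children = children;
        this.wanted = wanted;
        stack.push(Collections.singletonList(root).iterator());
    }

    @Override
    public boolean hasNext() {
        // Advance only as far as needed to find the next matching element.
        while (lookahead == null && !stack.isEmpty()) {
            Iterator<T> top = stack.peek();
            if (!top.hasNext()) {
                stack.pop(); // this level is exhausted, resume the parent level
                continue;
            }
            T element = top.next();
            if (wanted.test(element)) {
                lookahead = element;
            }
            List<T> kids = children.apply(element);
            if (!kids.isEmpty()) {
                stack.push(kids.iterator()); // descend on the next step
            }
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = lookahead;
        lookahead = null;
        return result;
    }

    public static void main(String[] args) {
        // The directory layout from the question, modelled as a map for illustration.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("dir1", Arrays.asList("dir2", "dir3", "file3.txt"));
        tree.put("dir2", Arrays.asList("file1.txt"));
        tree.put("dir3", Arrays.asList("file2.txt"));
        Iterator<String> it = new LazyTreeIterator<String>("dir1",
                n -> tree.getOrDefault(n, Collections.emptyList()),
                n -> n.endsWith(".txt"));
        while (it.hasNext()) {
            System.out.println(it.next()); // file1.txt, file2.txt, file3.txt
        }
    }
}
```

Memory use is proportional to the depth of the tree, not to the number of matches. For real files, the children function could wrap File.listFiles() (treating a null result as an empty list) and the predicate could test the ".txt" suffix.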
The only way I can think of that would meet your exact requirement would be to perform the recursive tree walk in a separate thread, and have that thread deliver results back to the main thread one at a time. (For simplicity you could use a bounded queue for the delivery, but it is also possible to implement it using wait / notify, a lock object and a single shared reference variable.)
In Python, for example, this would be a good fit for coroutines. Unfortunately, Java doesn't have a direct equivalent.
I should add that using threads is likely to incur significant overhead in synchronization and thread context switching. Using a queue will reduce that overhead to a degree, provided that the rates of "producing" and "consuming" are well matched.
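A minimal sketch of the bounded-queue variant, under stated assumptions: ThreadedWalker is a hypothetical name, a node counts as a leaf when its children list is empty, and a private sentinel object marks the end of the walk. The recursive walk itself stays an ordinary recursive method; only the hand-over goes through the queue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical sketch: a producer thread runs the recursion and hands leaves
// to the caller through a bounded queue, one call to next() at a time.
class ThreadedWalker<T> {
    private static final Object END = new Object(); // sentinel: the walk has finished
    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);
    private final Function<T, List<T>> children;
    private boolean done;

    ThreadedWalker(T root, Function<T, List<T>> children) {
        this.children = children;
        Thread producer = new Thread(() -> {
            try {
                walk(root);
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive if the caller stops early
        producer.start();
    }

    // Plain recursive walk; put() blocks while the queue is full.
    private void walk(T node) throws InterruptedException {
        List<T> kids = children.apply(node);
        if (kids.isEmpty()) {
            queue.put(node);
            return;
        }
        for (T kid : kids) {
            walk(kid);
        }
    }

    // Returns the next leaf, or null once the walk is complete.
    @SuppressWarnings("unchecked")
    T next() {
        if (done) {
            return null;
        }
        try {
            Object item = queue.take();
            if (item == END) {
                done = true;
                return null;
            }
            return (T) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the next element", e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("dir1", Arrays.asList("dir2", "dir3", "file3.txt"));
        tree.put("dir2", Arrays.asList("file1.txt"));
        tree.put("dir3", Arrays.asList("file2.txt"));
        ThreadedWalker<String> walker = new ThreadedWalker<String>("dir1",
                n -> tree.getOrDefault(n, Collections.emptyList()));
        for (String leaf = walker.next(); leaf != null; leaf = walker.next()) {
            System.out.println(leaf);
        }
    }
}
```

With a queue capacity of 1 the producer runs at most one element ahead of the consumer; a larger capacity trades memory for fewer context switches.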

Java 8 implementation for recursive method

What is the correct way of using lambdas for a recursive method? I have been trying to write a recursive depth-first-search function for a Graph. I have tried implementing a lambda version, but I am not sure if my implementation is the correct way of using lambdas in a recursive function.
Outline of the code:
a) Old fashioned way
private void depthFirstSearch(final Graph graph, final int sourceVertex){
count++;
marked[sourceVertex]= true;
for(int vertex:graph.getAllVerticesConnectedTo(sourceVertex)){
if(marked[vertex]!=true){
edgeTo[vertex]=sourceVertex;
depthFirstSearch(graph,vertex);
}
}
}
b) Java 8 Lambdas way:
private void depthFirstSearchJava8(final Graph graph, final int sourceVertex){
    count++;
    marked[sourceVertex] = true;
    StreamSupport.stream(graph.getAllVerticesConnectedTo(sourceVertex).spliterator(), false)
        .forEach(vertex -> {
            if (!marked[vertex]) {
                edgeTo[vertex] = sourceVertex;
                // recurse into the neighbour; passing sourceVertex here would never terminate
                depthFirstSearchJava8(graph, vertex);
            }
        });
}
I have tried to write a lambda version as above but could not figure out the advantage it is providing as compared to the traditional way.
Thanks
Just because lambdas exist, this doesn't mean you have to use them everywhere.
You are looping over an iterable, without filtering or mapping or transforming anything (which are the typical use cases for lambdas).
The for loop does what you want in a one-liner. Therefore, lambdas should not be used here.
That's because there is no advantage, at least not in this case. Lambdas are useful when you want to create a small function to be used in just one place in the program, e.g. when passing the lambda as an argument for another function. If your lambda takes more than one line of code, you should reconsider the idea of using it.
You could rewrite your depthFirstSearch method as follows:
private void depthFirstSearchJava8(Graph graph, int sourceVertex){
count++;
marked[sourceVertex] = true;
graph.getAllVerticesConnectedTo(sourceVertex).stream()
.filter(vertex -> !marked[vertex])
.peek(vertex -> edgeTo[vertex] = sourceVertex)
.forEach(vertex -> depthFirstSearchJava8(graph, vertex));
}
This code assumes getAllVerticesConnectedTo() method returns a collection of integers. If it returns an array of integers instead, then use the following code:
private void depthFirstSearchJava8(Graph graph, int sourceVertex){
count++;
marked[sourceVertex] = true;
Arrays.stream(graph.getAllVerticesConnectedTo(sourceVertex))
.filter(vertex -> !marked[vertex])
.peek(vertex -> edgeTo[vertex] = sourceVertex)
.forEach(vertex -> depthFirstSearchJava8(graph, vertex));
}
In the first solution, I've used the Collection.stream() method to get a stream of connected vertices, while in the second one, I've used the Arrays.stream() method. Then, in both solutions, I've first used filter() to keep only the unmarked vertices and peek() to modify the edgeTo array. Finally, forEach() is used to terminate the stream by invoking the depthFirstSearchJava8() method recursively.
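One possible variation, shown here only as a sketch: the Javadoc for Stream.peek() says the method exists mainly to support debugging, so the edgeTo update can instead live inside forEach() next to the recursive call. The StreamDfs class and its adjacency-list representation are assumptions made to keep the example self-contained; they are not the asker's Graph type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical self-contained DFS over adjacency lists, using filter() + forEach() only.
class StreamDfs {
    private final List<List<Integer>> adjacency; // adjacency.get(v) = neighbours of vertex v
    final boolean[] marked;
    final int[] edgeTo;                          // edgeTo[v] = vertex from which v was reached, -1 if none
    int count;

    StreamDfs(List<List<Integer>> adjacency) {
        this.adjacency = adjacency;
        this.marked = new boolean[adjacency.size()];
        this.edgeTo = new int[adjacency.size()];
        Arrays.fill(edgeTo, -1);
    }

    void depthFirstSearch(int source) {
        count++;
        marked[source] = true;
        adjacency.get(source).stream()
                // evaluated lazily, so vertices marked by deeper calls are skipped
                .filter(vertex -> !marked[vertex])
                .forEach(vertex -> {
                    edgeTo[vertex] = source;
                    depthFirstSearch(vertex);
                });
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 plus an isolated vertex 3.
        List<List<Integer>> adjacency = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(0, 2),
                Arrays.asList(0, 1),
                Collections.<Integer>emptyList());
        StreamDfs dfs = new StreamDfs(adjacency);
        dfs.depthFirstSearch(0);
        System.out.println(dfs.count + " vertices reached, edgeTo = " + Arrays.toString(dfs.edgeTo));
    }
}
```

This relies on a sequential stream; with a parallel stream the shared marked and edgeTo arrays would be mutated concurrently and the traversal would no longer be a well-defined depth-first search.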

Is it possible to write a loop in Java that does not actually use a loop method?

I was curious if, in Java, you could create a piece of code that keeps iterating a piece of code without the use of a for or while loop, and if so, what methods could be used to solve this?
Look at recursion. A recursive function is a function which calls itself until a base case is reached. An example is the factorial function:
int fact(int n)
{
    // Base case: n <= 1 also covers 0 and stops negative input from recursing forever
    if (n <= 1)
        return 1;
    return fact(n - 1) * n;
}
You could use the Java 8 Streams methods for iterating over the elements of a Collection. Among the methods you can use are filtering methods (get all the elements of a collection that satisfy some conditions), mapping methods (map a Collection of one type to a Collection of another type) and aggregation methods (like computing the sum of all the elements in a Collection, based on some integer member of the Element stored in the collection).
For example - Stream forEach :
List<Element> list = new ArrayList<Element>();
...
list.stream().forEach (element -> System.out.println(element));
Or you can do it without a Stream :
List<Element> list = new ArrayList<Element>();
...
list.forEach (element -> System.out.println(element));
Another variant of recursion:
public class LoopException extends Exception {
public LoopException(int i, int max) throws LoopException {
System.out.println( "Loop variable: "+i);
if (i < max)
throw new LoopException( i+1, max );
}
}
Of course this is just a bit of fun, don't ever do it for real.
Java does not have a goto statement (that's a lie), so that way is a dead end.
But you could always make a piece of code endlessly iterate using recursion. Old factorial function seems to be the favorite, but since it is not an infinite loop, I will go for this simple function:
int blowMyStack(int a) {
return blowMyStack(a + 1);
}
There will be many ways to do this using various features of the language. But it always falls to an underlying recursion.
In case you're referring to something like C's goto, the answer is no.
In other cases, you can use recursive functions.

Implementing Stack's Pop method with Recursion

I am self-studying Java. I have been studying data structures for the past couple of days, reading the book "Data Structures and Algorithms in Java". There is an exercise that I have a problem with. It asks for the pop method to be implemented with recursion so that, when the method is called, it deletes all the items at once. Can anyone help with this? A pointer on how to do it would be much appreciated. Thanks. (The pop method as currently implemented follows.)
public double pop() // take item from top of stack
{
return stackArray[top--]; // access item, decrement top
}
First IMO you should understand how to implement a non-recursive counterpart of this method.
It can be something like this:
public void popAll() {
while(!stack.isEmpty()) {
stack.pop();
}
}
Once you understand this, the recursive version should be easy:
public void popAllRecursive() {
if(stack.isEmpty()) {
//nothing to remove, return
return;
}
stack.pop(); // remove one stack element
popAllRecursive(); // recursive invocation of your method
}
Since it's an exercise, I'm just giving you the idea and leaving the implementation to you (you could provide the method in the Stack class itself and use the top counter and stackArray, the internals of your stack implementation).
Hope this helps
You need to think about the base case where there is nothing left in the stack, i.e. the stack is empty (with the array-based stack from the question, pop() returns a primitive double, so test for emptiness rather than comparing the result to null).
The recursive case is quite intuitive, as you just need to recursively call pop() until the base case is met.
Call pop() repeatedly until the end of the stack.
As you have not mentioned how the data is stored, I can't help you by providing code.
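Thanks everyone, I solved the problem. I don't know if it's efficient, but I did it like this:
public void pop()
{
    if (isEmpty()) {
        return; // base case: nothing left to remove
    }
    if (top >= 0) {
        top--; // discard the top item (the original self-assignment had the same effect)
        pop(); // recurse until the stack is empty
    }
}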
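For reference, the same idea on a complete array-based stack might look like the sketch below. DoubleStack is a hypothetical stand-in for the book's stack class (the field names top and stackArray follow the question), and the recursive method is named popAll so that the ordinary pop() keeps its usual meaning.

```java
// Hypothetical array-based stack with a recursive "remove everything" method.
class DoubleStack {
    private final double[] stackArray;
    private int top = -1; // index of the top item, -1 when the stack is empty

    DoubleStack(int maxSize) {
        stackArray = new double[maxSize];
    }

    void push(double value) {
        stackArray[++top] = value;
    }

    double pop() {
        return stackArray[top--]; // access item, decrement top
    }

    boolean isEmpty() {
        return top == -1;
    }

    // Removes all items: the base case is the empty stack,
    // the recursive case pops one item and repeats.
    void popAll() {
        if (isEmpty()) {
            return;
        }
        pop();
        popAll();
    }

    public static void main(String[] args) {
        DoubleStack stack = new DoubleStack(10);
        stack.push(1.0);
        stack.push(2.0);
        stack.push(3.0);
        stack.popAll();
        System.out.println(stack.isEmpty()); // true
    }
}
```

The recursion depth equals the number of items on the stack, so for very large stacks the loop version from the answer above is the safer choice.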
