As a Java dev, I'm currently looking at Go because I think it's an interesting language.
To start with it, I decided to take a simple Java project I wrote months ago and rewrite it in Go, to compare performance and (mainly, actually) to compare code readability/complexity.
The Java code sample is the following:
public static void main(String[] args) {
long start = System.currentTimeMillis();
Stream<Container> s = Stream.from(new Iterator<Container>() {
int i = 0;
@Override
public boolean hasNext() {
return i < 10000000;
}
@Override
public Container next() {
return new Container(i++);
}
});
s = s.map((Container _source) -> new Container(_source.value * 2));
int j = 0;
while (s.hasNext()) {
s.next();
j++;
}
System.out.println(System.currentTimeMillis() - start);
System.out.println("j:" + j);
}
public static class Container {
int value;
public Container(int v) {
value = v;
}
}
Where the map function is:
return new Stream<R>() {
@Override
public boolean hasNext() {
return Stream.this.hasNext();
}
@Override
public R next() {
return _f.apply(Stream.this.next());
}
};
And the Stream class is just an extension of java.util.Iterator that adds custom methods to it. Methods other than map differ from the standard Java Stream API.
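The original Stream class isn't shown in full, so for context here is a minimal sketch of what such a class might look like; the names from(), map(), _f and the Stream.this references are taken from the snippets above, the rest is an assumption:
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical reconstruction of the custom Stream type described above:
// an Iterator with a static from() factory and a lazy map() method.
public abstract class Stream<T> implements Iterator<T> {

    // wrap an existing Iterator into a Stream
    public static <T> Stream<T> from(Iterator<T> it) {
        return new Stream<T>() {
            @Override
            public boolean hasNext() { return it.hasNext(); }
            @Override
            public T next() { return it.next(); }
        };
    }

    // lazily apply _f to each element as it is pulled from this stream
    public <R> Stream<R> map(Function<T, R> _f) {
        return new Stream<R>() {
            @Override
            public boolean hasNext() { return Stream.this.hasNext(); }
            @Override
            public R next() { return _f.apply(Stream.this.next()); }
        };
    }
}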
Anyway, to reproduce this, I wrote the following Go code:
package main
import (
"fmt"
)
type Iterator interface {
HasNext() bool
Next() interface{}
}
type Stream interface {
HasNext() bool
Next() interface{}
Map(transformer func(interface{}) interface{}) Stream
}
///////////////////////////////////////
type incremetingIterator struct {
i int
}
type SampleEntry struct {
value int
}
func (s *SampleEntry) Value() int {
return s.value
}
func (s *incremetingIterator) HasNext() bool {
return s.i < 10000000
}
func (s *incremetingIterator) Next() interface{} {
s.i = s.i + 1
return &SampleEntry{
value: s.i,
}
}
func CreateIterator() Iterator {
return &incremetingIterator{
i: 0,
}
}
///////////////////////////////////////
type stream struct {
source Iterator
}
func (s *stream) HasNext() bool {
return s.source.HasNext()
}
func (s *stream) Next() interface{} {
return s.source.Next()
}
func (s *stream) Map(tr func(interface{}) interface{}) Stream {
return &stream{
source: &mapIterator{
source: s,
transformer: tr,
},
}
}
func FromIterator(it Iterator) Stream {
return &stream{
source: it,
}
}
///////////////////////////////////////
type mapIterator struct {
source Iterator
transformer func(interface{}) interface{}
}
func (s *mapIterator) HasNext() bool {
return s.source.HasNext()
}
func (s *mapIterator) Next() interface{} {
return s.transformer(s.source.Next())
}
///////////////////////////////////////
func main() {
it := CreateIterator()
ss := FromIterator(it)
ss = ss.Map(func(in interface{}) interface{} {
return &SampleEntry{
value: 2 * in.(*SampleEntry).value,
}
})
fmt.Println("Start")
for ss.HasNext() {
ss.Next()
}
fmt.Println("Over")
}
Both produce the same result, but where Java takes about 20 ms, Go takes about 1050 ms (with 10M items; the test was run several times).
I'm very new to Go (started couple of hours ago) so please be indulgent if I did something really bad :-)
Thank you!
The other answer changed the original task quite "dramatically" and reverted to a simple loop. I consider it to be different code, and as such it cannot be used to compare execution times (that loop could be written in Java as well, which would give a smaller execution time).
Now let's try to keep the "streaming manner" of the problem at hand.
Note beforehand:
One thing to note beforehand: in Java, the granularity of System.currentTimeMillis() can be around 10 ms (!!), which is the same order of magnitude as the result! This means the error in Java's 20 ms could be huge! So instead you should use System.nanoTime() to measure code execution times! For details, see Measuring time differences using System.currentTimeMillis().
Also, this is not the correct way to measure execution times, as running things for the first time might be several times slower. For details, see Order of the code and performance.
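For illustration, a minimal, hypothetical measurement sketch (not the asker's benchmark) that uses System.nanoTime() and repeats the work a few times, so the warmed-up runs can be compared with the first one:
public class TimingSketch {
    public static void main(String[] args) {
        for (int run = 1; run <= 5; run++) {
            long start = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += 2L * i; // stand-in for the streaming work being measured
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("run " + run + ": " + elapsedMs + " ms (sum = " + sum + ")");
        }
    }
}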
Genesis
Your original Go proposal runs for roughly 1.1 seconds on my computer, which is about the same as your measurement.
Removing interface{} item type
Go doesn't have generics; trying to mimic them with interface{} is not the same thing and has a serious performance impact if the value you want to work with is a primitive type (e.g. int) or a simple struct (like the Go equivalent of your Java Container type). See: The Laws of Reflection #The representation of an interface. Wrapping an int (or any other concrete type) in an interface requires creating a (type; value) pair holding the dynamic type and the value to be wrapped (creating this pair also involves copying the value being wrapped; see an analysis of this in the answer How can a slice contain itself?). Moreover, when you want to access the value, you have to use a type assertion, which is a runtime check, so the compiler can't help optimize it (and the check adds to the execution time)!
So let's not use interface{} for our items, but instead use a concrete type for our case:
type Container struct {
value int
}
We will use this in the iterator's and stream's Next() method, which becomes Next() Container, and in the mapper function:
type Mapper func(Container) Container
Also we may utilize embedding, as the method set of Iterator is a subset of that of Stream.
Without further ado, here is the complete, runnable example:
package main
import (
"fmt"
"time"
)
type Container struct {
value int
}
type Iterator interface {
HasNext() bool
Next() Container
}
type incIter struct {
i int
}
func (it *incIter) HasNext() bool {
return it.i < 10000000
}
func (it *incIter) Next() Container {
it.i++
return Container{value: it.i}
}
type Mapper func(Container) Container
type Stream interface {
Iterator
Map(Mapper) Stream
}
type iterStream struct {
Iterator
}
func NewStreamFromIter(it Iterator) Stream {
return iterStream{Iterator: it}
}
func (is iterStream) Map(f Mapper) Stream {
return mapperStream{Stream: is, f: f}
}
type mapperStream struct {
Stream
f Mapper
}
func (ms mapperStream) Next() Container {
return ms.f(ms.Stream.Next())
}
func (ms mapperStream) Map(f Mapper) Stream {
return nil // Not implemented / needed
}
func main() {
s := NewStreamFromIter(&incIter{})
s = s.Map(func(in Container) Container {
return Container{value: in.value * 2}
})
fmt.Println("Start")
start := time.Now()
j := 0
for s.HasNext() {
s.Next()
j++
}
fmt.Println(time.Since(start))
fmt.Println("j:", j)
}
Execution time: 210 ms. Nice, we've already sped it up 5 times, yet we're still far from Java's Stream performance.
"Removing" Iterator and Stream types
Since we can't use generics, the interface types Iterator and Stream don't really need to be interfaces: we would need new types anyway if we wanted to define iterators and streams of other element types.
So the next thing we do is remove Stream and Iterator and use their concrete implementations from above. This doesn't hurt readability at all; in fact, the solution is shorter:
package main
import (
"fmt"
"time"
)
type Container struct {
value int
}
type incIter struct {
i int
}
func (it *incIter) HasNext() bool {
return it.i < 10000000
}
func (it *incIter) Next() Container {
it.i++
return Container{value: it.i}
}
type Mapper func(Container) Container
type iterStream struct {
*incIter
}
func NewStreamFromIter(it *incIter) iterStream {
return iterStream{incIter: it}
}
func (is iterStream) Map(f Mapper) mapperStream {
return mapperStream{iterStream: is, f: f}
}
type mapperStream struct {
iterStream
f Mapper
}
func (ms mapperStream) Next() Container {
return ms.f(ms.iterStream.Next())
}
func main() {
s0 := NewStreamFromIter(&incIter{})
s := s0.Map(func(in Container) Container {
return Container{value: in.value * 2}
})
fmt.Println("Start")
start := time.Now()
j := 0
for s.HasNext() {
s.Next()
j++
}
fmt.Println(time.Since(start))
fmt.Println("j:", j)
}
Execution time: 50 ms; we've again sped it up 4 times compared to the previous solution! That's the same order of magnitude as the Java solution, and we've lost nothing of the "streaming manner". Overall gain over the asker's proposal: 22 times faster.
Given the fact that in Java you used System.currentTimeMillis() to measure execution, this may even be the same as Java's performance. The asker confirmed: it's the same!
Regarding the same performance
Now we're talking about roughly the "same" code which does pretty simple, basic tasks in different languages. If they're doing basic tasks, there is not much one language can do better than the other.
Also keep in mind that Java is a mature adult (over 21 years old) and has had an enormous amount of time to evolve and be optimized; Java's JIT (just-in-time compilation) is actually doing a pretty good job for long-running processes such as yours. Go is much younger, still just a kid (it will be 5 years old 11 days from now), and will probably see bigger performance improvements in the foreseeable future than Java.
Further improvements
This "streamy" way may not be the "Go" way to approach the problem you're trying to solve. This is merely the "mirror" code of your Java's solution, using more idiomatic constructs of Go.
Instead you should take advantage of Go's excellent support for concurrency, namely goroutines (see go statement) which are much more efficient than Java's threads, and other language constructs such as channels (see answer What are golang channels used for?) and select statement.
By properly chunking/partitioning your originally big task into smaller ones, a goroutine worker pool can be quite powerful for processing large amounts of data. See
Is this an idiomatic worker thread pool in Go?
Also, you claimed in your comment that "I don't have 10M items to process but more 10G which won't fit in memory". If this is the case, think about the I/O time and the delay of the external system you're fetching the data from. If that takes significant time, it might outweigh the processing time in the app, and the app's execution time might not matter (at all).
Go is not about squeezing every nanosecond out of execution time, but rather providing you a simple, minimalist language and tools, by which you can easily (by writing simple code) take control of and utilize your available resources (e.g. goroutines and multi-core CPU).
(Try to compare the Go language spec and the Java language spec. Personally I've read Go's lang spec multiple times, but could never get to the end of Java's.)
This is, I think, an interesting question, as it gets to the heart of the differences between Java and Go and highlights the difficulties of porting code. Here is the same thing in Go minus all the machinery (time ~50 ms here):
values := make([]int64, 10000000)
start := time.Now()
fmt.Println("Start")
for i := int64(0); i < 10000000; i++ {
values[i] = 2 * i
}
fmt.Println("Over after:", time.Now().Sub(start))
More seriously, here is the same thing with a map over a slice of entries, which is a more idiomatic version of what you have above and could work with any sort of Entry struct. This actually runs faster on my machine (30 ms) than the for loop above (anyone care to explain why?), so it is probably similar to your Java version:
package main
import (
"fmt"
"time"
)
type Entry struct {
Value int64
}
type EntrySlice []*Entry
func New(l int64) EntrySlice {
entries := make(EntrySlice, l)
for i := int64(0); i < l; i++ {
entries[i] = &Entry{Value: i}
}
return entries
}
func (entries EntrySlice) Map(fn func(i int64) int64) {
for _, e := range entries {
e.Value = fn(e.Value)
}
}
func main() {
entries := New(10000000)
start := time.Now()
fmt.Println("Start")
entries.Map(func(v int64) int64 {
return 2 * v
})
fmt.Println("Over after:", time.Now().Sub(start))
}
Things that will make operations more expensive -
Passing around interface{}, don't do this
Building a separate iterator type - use range or for loops
Allocations - so building new types to store answers, transform in place
Re: using interface{} - I would avoid this; it means you have to write a separate map (say) for each type, which is not a great hardship. Instead of building an iterator, a range is probably more appropriate. Re: transforming in place - if you allocate new structs for each result, it will put pressure on the garbage collector; using a Map func like this is an order of magnitude slower:
entries.Map(func(e *Entry) *Entry {
return &Entry{Value: 2 * e.Value}
})
To stream, split the data into chunks and do the same as above (keeping a memo of the last object if you depend on previous calculations). If you have independent calculations (not the case here), you could also fan out to a bunch of goroutines doing the work and get it done faster if there is a lot of it (this has overhead; in simple examples it won't be faster).
Finally, if you're interested in data processing with go, I'd recommend visiting this new site: http://gopherdata.io/
Just as a complement to the previous answers, I changed the code of both the Java and Go implementations to run the test 100 times.
What's interesting here is that Go takes a constant time of between 69 and 72 ms.
However, Java takes longer for the first few runs (71 ms, 19 ms, 12 ms) and then settles between 5 and 7 ms.
From my tests and understanding, this comes from the fact that the JVM takes a bit of time to properly load the classes and do some optimization.
In the end I still have this 10x performance difference, but I'm not giving up; I'll try to get a better understanding of how Go works and make it faster :)
Related
Java's assert mechanism allows putting in assertions which have essentially no run-time cost (aside from a bigger class file) if assertions are disabled. But this may not cover all situations.
For instance, many of Java's collections feature "fail-fast" iterators that attempt to detect when you're using them in a thread-unsafe way. But this requires both the collection and the iterator itself to maintain extra state that would not be needed if these checks weren't there.
Suppose someone wanted to do something similar, but allow the checks to be disabled and if they are disabled, it saves a few bytes in the iterator and likewise a few more bytes in the ArrayList, or whatever.
Alternatively, suppose we're doing some sort of object pooling that we want to be able to turn on and off at runtime; when it's off, it should just use Java's garbage collection and take no room for reference counts, like this (note that the code as written is very broken):
class MyClass {
static final boolean useRefCounts = my.global.Utils.useRefCounts();
static {
if(useRefCounts)
int refCount; // want instance field, not local variable
}
void incrementRefCount(){
if(useRefCounts) refCount++; // only use field if it exists;
}
/**return true if ready to be collected and reused*/
boolean decrementAndTestRefCount(){
// rely on Java's garbage collector if ref counting is disabled.
return useRefCounts && --refCount == 0;
}
}
The trouble with the above code is that the static block makes no sense. But is there some trick using low-powered magic to make something along these lines work? (If high-powered magic is allowed, the nuclear option is to generate two versions of MyClass and arrange to put the correct one on the class path at start time.)
NOTE: You might not need to do this at all. The JIT is very good at inlining constants known at runtime especially boolean and optimising away the code which isn't used.
The int field is not ideal; however, if you are using a 64-bit JVM, the object size might not change.
On the OpenJDK/Oracle JVM (64-bit), the header is 12 bytes by default. The object alignment is 8 bytes, so the object will use 16 bytes. The field adds 4 bytes, which after alignment is still 16 bytes.
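If you want to verify such numbers yourself, the OpenJDK JOL tool can print an object's memory layout. A minimal sketch, assuming the jol-core library is on the classpath (the class names here are made up):
import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    static class WithoutField { }
    static class WithField { int refCount; } // hypothetical variant carrying the extra int

    public static void main(String[] args) {
        // prints header size, field offsets, padding and total instance size for each class
        System.out.println(ClassLayout.parseClass(WithoutField.class).toPrintable());
        System.out.println(ClassLayout.parseClass(WithField.class).toPrintable());
    }
}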
To answer the question, you need two classes (unless you use generated code or hacks)
class MyClass {
static final boolean useRefCounts = my.global.Utils.useRefCounts();
public static MyClass create() {
return useRefCounts ? new MyClassPlus() : new MyClass();
}
void incrementRefCount() {
}
boolean decrementAndTestRefCount() {
return false;
}
}
class MyClassPlus extends MyClass {
int refCount; // want instance field, not local variable
void incrementRefCount() {
refCount++; // only use field if it exists;
}
boolean decrementAndTestRefCount() {
return --refCount == 0;
}
}
If you accept a slightly higher overhead in the case you’re using your ref count, you may resort to external storage, i.e.
class MyClass {
static final WeakHashMap<MyClass,Integer> REF_COUNTS
= my.global.Utils.useRefCounts()? new WeakHashMap<>(): null;
void incrementRefCount() {
if(REF_COUNTS != null) REF_COUNTS.merge(this, 1, Integer::sum);
}
/**return true if ready to be collected and reused*/
boolean decrementAndTestRefCount() {
return REF_COUNTS != null
&& REF_COUNTS.compute(this, (me, i) -> --i == 0? null: i) == null;
}
}
There is a behavioral difference for the case that someone invokes decrementAndTestRefCount() more often than incrementRefCount(). While your original code silently runs into a negative ref count, this code will throw a NullPointerException. I prefer failing with an exception in this case…
The code above will leave you with the overhead of a single static field in case you’re not using the feature. Most JVMs should have no problems eliminating the conditionals regarding the state of a static final variable.
Note further that the code allows MyClass instances to get garbage collected while having a non-zero ref count, just like when it was an instance field, but also actively removes the mapping when the count reaches the initial state of zero again, to minimize the work needed for cleanup.
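A small, hypothetical usage sketch of the class above, just to illustrate the behavior described, assuming my.global.Utils.useRefCounts() returned true:
MyClass obj = new MyClass();
obj.incrementRefCount();                           // mapping created, count becomes 1
boolean reusable = obj.decrementAndTestRefCount(); // true: count is back to zero, mapping removed
// calling decrementAndTestRefCount() again without a matching increment would now
// trigger the NullPointerException mentioned above (with the feature disabled,
// both methods are cheap no-ops and every call simply returns false)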
I wrote a minimal somewhat-lazy (int) sequence class, GarbageTest.java, as an experiment, to see if I could process very long, lazy sequences in Java, the way I can in Clojure.
Given a naturals() method that returns the lazy, infinite, sequence of natural numbers; a drop(n,sequence) method that drops the first n elements of sequence and returns the rest of the sequence; and an nth(n,sequence) method that returns simply: drop(n, lazySeq).head(), I wrote two tests:
static int N = (int)1e6;
// succeeds @ N = (int)1e8 with java -Xmx10m
@Test
public void dropTest() {
assertThat( drop(N, naturals()).head(), is(N+1));
}
// fails with OutOfMemoryError @ N = (int)1e6 with java -Xmx10m
@Test
public void nthTest() {
assertThat( nth(N, naturals()), is(N+1));
}
Note that the body of dropTest() was generated by copying the body of nthTest() and then invoking IntelliJ's "inline" refactoring on the nth(N, naturals()) call. So it seems to me that the behavior of dropTest() should be identical to the behavior of nthTest().
But it isn't identical! dropTest() runs to completion with N up to 1e8 whereas nthTest() fails with OutOfMemoryError for N as small as 1e6.
I've avoided inner classes. And I've experimented with a variant of my code, ClearingArgsGarbageTest.java, that nulls method parameters before calling other methods. I've applied the YourKit profiler. I've looked at the byte code. I just cannot find the leak that causes nthTest() to fail.
Where's the "leak"? And why does nthTest() have the leak while dropTest() does not?
Here's the rest of the code from GarbageTest.java in case you don't want to click through to the Github project:
/**
* a not-perfectly-lazy lazy sequence of ints. see LazierGarbageTest for a lazier one
*/
static class LazyishSeq {
final int head;
volatile Supplier<LazyishSeq> tailThunk;
LazyishSeq tailValue;
LazyishSeq(final int head, final Supplier<LazyishSeq> tailThunk) {
this.head = head;
this.tailThunk = tailThunk;
tailValue = null;
}
int head() {
return head;
}
LazyishSeq tail() {
if (null != tailThunk)
synchronized(this) {
if (null != tailThunk) {
tailValue = tailThunk.get();
tailThunk = null;
}
}
return tailValue;
}
}
static class Incrementing implements Supplier<LazyishSeq> {
final int seed;
private Incrementing(final int seed) { this.seed = seed;}
public static LazyishSeq createSequence(final int n) {
return new LazyishSeq( n, new Incrementing(n+1));
}
@Override
public LazyishSeq get() {
return createSequence(seed);
}
}
static LazyishSeq naturals() {
return Incrementing.createSequence(1);
}
static LazyishSeq drop(
final int n,
final LazyishSeq lazySeqArg) {
LazyishSeq lazySeq = lazySeqArg;
for( int i = n; i > 0 && null != lazySeq; i -= 1) {
lazySeq = lazySeq.tail();
}
return lazySeq;
}
static int nth(final int n, final LazyishSeq lazySeq) {
return drop(n, lazySeq).head();
}
In your method
static int nth(final int n, final LazyishSeq lazySeq) {
return drop(n, lazySeq).head();
}
the parameter variable lazySeq holds a reference to the first element of your sequence during the entire drop operation. This prevents the entire sequence from getting garbage collected.
In contrast, with
public void dropTest() {
assertThat( drop(N, naturals()).head(), is(N+1));
}
the first element of your sequence is returned by naturals() and passed directly to the invocation of drop, so it is removed from the operand stack and does not exist during the execution of drop.
Your attempt to set the parameter variable to null, i.e.
static int nth(final int n, /*final*/ LazyishSeq lazySeqArg) {
final LazyishSeq lazySeqLocal = lazySeqArg;
lazySeqArg = null;
return drop(n,lazySeqLocal).head();
}
does not help: now the lazySeqArg variable is null, but lazySeqLocal holds a reference to the first element.
In general, a local variable does not prevent garbage collection; collecting otherwise unused objects is permitted, but that doesn't imply that a particular implementation is capable of doing it.
In the case of the HotSpot JVM, only optimized code will get rid of such unused references. But here, nth is not a hot spot, as the heavy lifting happens within the drop method.
This is the reason why the same issue does not appear for the drop method, even though it also holds a reference to the first element in its parameter variable. The drop method contains the loop doing the actual work, hence it is very likely to get optimized by the JVM, which may cause it to eliminate unused variables, allowing the already processed part of the sequence to be collected.
There are many factors which may affect the JVM's optimizations. Besides the different shape of the code, it seems that rapid memory allocation during the unoptimized phase may also reduce the optimizer's improvements. Indeed, when I run with -Xcomp, to forbid interpreted execution altogether, both variants run successfully; even int N = (int)1e9 is no problem anymore. Of course, forcing compilation raises the startup time.
I have to admit that I do not understand why the mixed mode performs that much worse and I’ll investigate further. But generally, you have to be aware that the efficiency of the garbage collector is implementation dependent, so objects collected in one environment may stay in memory in another.
Clojure implements a strategy for dealing with this sort of scenario which it calls "locals clearing". There's support for it in the compiler that makes it kick in automatically where required in pure Clojure code (unless disabled at compilation time – this is sometimes useful for debugging). Clojure does also clear locals in various places in its Java runtime, however, and the way it does that could be used in Java libraries and possibly even application code, though it would undoubtedly be somewhat cumbersome.
Before I get into what Clojure does, here's a short summary of what is going on in this example:
nth(int, LazyishSeq) is implemented in terms of drop(int, LazyishSeq) and LazyishSeq.head().
nth passes both its arguments to drop and has no further use for them.
drop can easily be implemented so as to avoid holding on to the head of the passed-in sequence.
Here nth still holds on to the head of its sequence argument. The runtime may potentially discard that reference, but it is not guaranteed that it will.
The way Clojure deals with this is by clearing the reference to the sequence explicitly before control is handed off to drop. This is done using a rather elegant trick (link to the below snippet on GitHub as of Clojure 1.9.0):
// clojure/src/jvm/clojure/lang/Util.java
/**
* Copyright (c) Rich Hickey. All rights reserved.
* The use and distribution terms for this software are covered by the
* Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
* which can be found in the file epl-v10.html at the root of this distribution.
* By using this software in any fashion, you are agreeing to be bound by
* the terms of this license.
* You must not remove this notice, or any other, from this software.
**/
// … beginning of the file omitted …
// the next line is the 190th in the file as of Clojure 1.9.0
static public Object ret1(Object ret, Object nil){
return ret;
}
static public ISeq ret1(ISeq ret, Object nil){
return ret;
}
// …
Given the above, the call to drop inside nth can be changed to
drop(n, ret1(lazySeq, lazySeq = null))
Here lazySeq = null is evaluated as an expression before control is transferred to ret1; the value is null and there is also the side effect of setting the lazySeq reference to null. The first argument to ret1 will have been evaluated by this point, however, so ret1 receives the reference to the sequence in its first argument and returns it as expected, and that value is then passed to drop.
Thus drop receives the original value held by the lazySeq local, but the local itself is cleared before control is transferred to drop.
Consequently nth no longer holds on to the head of the sequence.
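Applied to the GarbageTest code from the question, nth could look like this; a sketch that defines a local copy of the ret1 helper next to the LazyishSeq and drop definitions shown earlier (note that the lazySeq parameter can no longer be declared final):
static LazyishSeq ret1(LazyishSeq ret, Object nil) {
    return ret;
}

static int nth(final int n, LazyishSeq lazySeq) {
    // lazySeq is cleared before control is transferred to drop, so nth no longer
    // pins the head of the sequence while drop walks through it
    return drop(n, ret1(lazySeq, lazySeq = null)).head();
}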
Java.
Is it normal that I get a StackOverflowError after 10,000 recursive void function calls with a reference and two integers as arguments?
I have 6 GB of RAM and tried running it through the IDE and from the command line. I'm pretty sure the code is correct and the recursion should finish.
It's about a Fill tool for a tile map editor. It starts at a certain tile and goes up, down, right and left if the adjacent tile is of the same type, and it doesn't go back to tiles it has already visited.
I tried different approaches; here is the one with an additional boolean table indicating whether the [x][y] tile was visited, replacing the marked tiles after the recursion is done:
public void fillRec(Tile t, int column, int row) {
if (affected[column][row] || t.getName() != pattern)
return;
/*t.replaceMe(editor.currentTileButton.spawnTile(column, row,
editor.tileMap));*/
affected[column][row] = true;
if (column < editor.tileMap.tilesX - 1) {
fillRec(editor.tileMap.tiles[column + 1][row], column + 1, row);
}
if (column > 0) {
fillRec(editor.tileMap.tiles[column - 1][row], column - 1, row);
}
if (row < editor.tileMap.tilesY - 1) {
fillRec(editor.tileMap.tiles[column][row + 1], column, row + 1);
}
if (row > 0) {
fillRec(editor.tileMap.tiles[column][row - 1], column, row - 1);
}
}
This works fine with a ~75x75 map, as did functions replacing the tile and doing other heavy stuff in their bodies.
Yes, each method call uses up a stack frame. If you want to use large-scale recursion in Java, you'll need to use a trampoline, which trades stack space for heap space. A trampoline typically has two states:
completed
more work to do
The completed state holds the final result, and the more-work-to-do state can be implemented with a Supplier (in Java 8) or a similar construct that makes the next recursive call. The Trampoline implementation manages the calls to your method and iterates rather than recurses.
Here is a simple looping example with a Trampoline.
Trampoline<Integer> loop(int times,int sum){
if(times==0)
return Trampoline.done(sum);
else
return Trampoline.more(()->loop(times-1,sum+times));
}
To make the call to loop
loop(100,10).result();
Note the method returns a lazy Trampoline Object immediately (i.e. it doesn't perform the summing), and the Trampoline runs through the simple summing algorithm when result is called - in an iterative, rather than recursive fashion.
There is a Trampoline implementation in a library I wrote called cyclops-trampoline that you can use. Or, if you prefer, here is how to roll your own (this implementation makes use of a nice technique by Mario Fusco: managing the trampoline iteration in a Java 8 Stream).
public interface Trampoline<T> {
default Trampoline<T> bounce(){
return this;
}
T result();
default boolean complete() {
return true;
}
public static <T> Trampoline<T> done(T result) {
return () -> result;
}
public static <T> Trampoline<T> more(Trampoline<Trampoline<T>> trampoline) {
return new Trampoline<T>() {
@Override
public boolean complete() {
return false;
}
@Override
public Trampoline<T> bounce() {
return trampoline.result();
}
public T result() {
return trampoline(this);
}
T trampoline(Trampoline<T> trampoline) {
return Stream.iterate(trampoline,Trampoline::bounce)
.filter(Trampoline::complete)
.findFirst()
.get()
.result();
}
};
}
}
It depends on how much data these functions place on the stack in relation to the configured (or default) stack size, so it's not only the stack space used by the arguments to the function call.
So yes, it does not sound abnormal. You should play with the stack size or implement it differently.
That sounds normal. If you don't specify otherwise, the default Java stack size is 1 MB or less, depending on your JVM and execution platform. A stack overflow at ~10,000 recursive calls sounds quite plausible for a stack of the default size.
You can change the JVM's default stack size with the -Xss option; e.g. -Xss10m sets the default size to 10MB.
You can also specify a thread stack size directly via the Thread constructor.
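For the fill tool from the question above, the second option could look roughly like this; a hedged sketch meant to live in the same class as fillRec, where the 64 MB figure is arbitrary and, per the Thread javadoc, the stackSize value is only a hint the VM may ignore:
void fillWithBigStack(Tile startTile, int startColumn, int startRow) throws InterruptedException {
    // Thread(ThreadGroup, Runnable, String, long stackSize): request a larger stack for the recursion
    Thread filler = new Thread(null,
            () -> fillRec(startTile, startColumn, startRow),
            "fill-tool", 64L * 1024 * 1024);
    filler.start();
    filler.join(); // block until the recursive fill completes
}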
However, this does illustrate a point that is not obvious to new Java programmers. Unlike typical functional programming languages (and many others) the standard Java implementations do not do "tail call optimization". This means that a recursive call sequence always needs stack space that is proportional to the maximum recursion depth.
This is a potential problem for programmers who prefer to use recursion rather than iteration. Unfortunately, if your data is such that deep recursion is a possibility, you really need to convert to an iterative solution. (Or find some other way to move the "recursion state" off the stack.)
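For the flood-fill question above, the usual conversion is to keep an explicit stack (or queue) of coordinates instead of using the call stack. A minimal sketch reusing the question's fields (affected, pattern, editor.tileMap); treat it as an illustration rather than drop-in code:
// needs: import java.util.ArrayDeque; import java.util.Deque;
void fillIterative(int startColumn, int startRow) {
    Deque<int[]> pending = new ArrayDeque<>();
    pending.push(new int[] { startColumn, startRow });
    while (!pending.isEmpty()) {
        int[] pos = pending.pop();
        int column = pos[0], row = pos[1];
        Tile t = editor.tileMap.tiles[column][row];
        if (affected[column][row] || t.getName() != pattern)
            continue; // same guard as in fillRec
        affected[column][row] = true;
        // push the neighbours instead of recursing into them
        if (column < editor.tileMap.tilesX - 1) pending.push(new int[] { column + 1, row });
        if (column > 0)                         pending.push(new int[] { column - 1, row });
        if (row < editor.tileMap.tilesY - 1)    pending.push(new int[] { column, row + 1 });
        if (row > 0)                            pending.push(new int[] { column, row - 1 });
    }
}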
I was curious if, in Java, you could create a piece of code that keeps iterating a piece of code without the use of a for or while loop, and if so, what methods could be used to solve this?
Look at recursion. A recursive function is a function which calls itself until a base case is reached. An example is the factorial function:
int fact(int n)
{
int result;
if(n==1)
return 1;
result = fact(n-1) * n;
return result;
}
You could use the Java 8 Streams methods for iterating over the elements of a Collection. Among the methods you can use are filtering methods (get all the elements of a collection that satisfy some conditions), mapping methods (map a Collection of one type to a Collection of another type) and aggregation methods (like computing the sum of all the elements in a Collection, based on some integer member of the Element stored in the collection).
For example - Stream forEach:
List<Element> list = new ArrayList<Element>();
...
list.stream().forEach(element -> System.out.println(element));
Or you can do it without a Stream:
List<Element> list = new ArrayList<Element>();
...
list.forEach(element -> System.out.println(element));
Another variant of recursion:
public class LoopException extends Exception {
public LoopException(int i, int max) throws LoopException {
System.out.println( "Loop variable: "+i);
if (i < max)
throw new LoopException( i+1, max );
}
}
Of course this is just a bit of fun, don't ever do it for real.
Java does not have a goto statement (well, that's a lie: goto is a reserved word, but you can't actually use it), so that way is a dead end.
But you could always make a piece of code iterate endlessly using recursion. The old factorial function seems to be the favorite, but since it is not an infinite loop, I will go for this simple function:
int blowMyStack(int a) {
return blowMyStack(a + 1);
}
There will be many ways to do this using various features of the language, but it always comes down to an underlying recursion.
In case you're referring to something like C's goto, the answer is no.
In other cases, you can use recursive functions.
Consider the case where an if condition needs to evaluate an array or a List. A simple example: check whether all elements are true. But I'm looking for a generic way to do it.
Normally I'd do it like that:
boolean allTrue = true;
for (Boolean bool : bools){
if (!bool) {
allTrue = false;
break;
}
}
if (allTrue){
// do Something
}
But now I'd like to hide it in my if condition. I tried using lambda expressions for this, but it's not working:
if (() -> {
for (Boolean bool : bools)
if (!bool)
return false;
return true;
}){
// do something
}
If this were working I could do something more complicated like
if (() -> {
int number = 0;
for (MyObject myobject : myobjects)
if (myobject.getNumber() != 0)
number++;
if (number > 2)
return false;
return true;
}){
//do something
}
Is there a better way to do it, or is it just a syntax error?
UPDATE
I'm not only talking about the boolean array; rather, I'm looking for a generic way to achieve this.
You can write, given for instance a List<Boolean>:
if (!list.stream().allMatch(x -> x)) {
// not every member is true
}
Or:
if (list.stream().anyMatch(x -> !x)) {
// at least one member is false
}
If you have an array of booleans, then use Arrays.stream() to obtain a stream out of it instead.
More generally, for a Stream providing elements of (generic) type X, you have to provide a Predicate<? super X> to .{all,any}Match() (either a "full" predicate, a lambda, or a method reference -- many things work). The return values of these methods are self-explanatory -- I think.
Now, to count the elements which obey a certain predicate, you have .count(), which you can combine with .filter() -- which also takes a Predicate as an argument. For instance, to check whether you have more than 2 elements in a List<String> whose length is greater than 5, you'd do:
if (list.stream().filter(s -> s.length() > 5).count() > 2L) {
// Yup...
}
Your problem
Your current problem is that you use a lambda expression directly. Lambdas are instances of functional interfaces. Your lambda does not have the type boolean, which is why your if does not accept it.
This special case's solution
You can use a stream from your collections of booleans here.
if (bools.stream().allMatch((Boolean b)->b)) {
// do something
}
It is actually much more powerful than this, but this does the trick I believe.
General hint
Basically, since you want an if condition, you want a boolean result.
Since your result depends on a collection, you can use Java 8 streams on collections.
Java 8 streams allow you to do many operations on a collection, and finish with a terminal operation. You can do whatever complicated stuff you want with Stream's non-terminal operations. In the end you need one of 2 things:
use a terminal operation that returns a boolean (such as allMatch, anyMatch...), and you're done
use any terminal operation, but use it in a boolean expression, such as myStream.filter(...).limit(...).count() > 2
You should have a look at your possibilities in this Stream documentation or this one.
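For instance, the asker's second example (do something only if at most 2 of the myobjects have a non-zero getNumber()) could be sketched like this, assuming a List<MyObject> myobjects as in the question:
if (myobjects.stream().filter(o -> o.getNumber() != 0).count() <= 2) {
    // do something
}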