I have read a lot about Java 8 streams lately, and several articles about lazy loading with Java 8 streams specifically: here and over here. I can't seem to shake the feeling that lazy loading is COMPLETELY useless (or at best, a minor syntactic convenience offering zero performance value).
Let's take this code as an example:
int[] myInts = new int[]{1,2,3,5,8,13,21};
IntStream myIntStream = IntStream.of(myInts);
int[] myChangedArray = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n))
.toArray();
This will log to the console, because the terminal operation, in this case toArray(), is called; the stream is lazy and executes only when the terminal operation is called. Of course I can also do this:
IntStream myChangedInts = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n));
And nothing will be printed, because the map hasn't happened yet, since I don't need the data. Until I call this:
int[] myChangedArray = myChangedInts.toArray();
And voila, I get my mapped data, and my console logs. Except I see zero benefit to it whatsoever. I realize I can define the filter code long before I call toArray(), and I can pass this "not-really-filtered" stream around, but so what? Is this the only benefit?
The articles seem to imply there is a performance gain associated with laziness, for example:
In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimized to make it being capable of processing the large amount of data with high performance.
and
Java 8 Streams API optimizes stream processing with the help of short circuiting operations. Short Circuit methods ends the stream processing as soon as their conditions are satisfied. In normal words short circuit operations, once the condition is satisfied just breaks all of the intermediate operations, lying before in the pipeline. Some of the intermediate as well as terminal operations have this behavior.
It sounds literally like breaking out of a loop, and not associated with laziness at all.
Finally, there is this perplexing line in the second article:
Lazy operations achieve efficiency. It is a way not to work on stale data. Lazy operations might be useful in the situations where input data is consumed gradually rather than having whole complete set of elements beforehand. For example consider the situations where an infinite stream has been created using Stream#generate(Supplier<T>) and the provided Supplier function is gradually receiving data from a remote server. In those kind of the situations server call will only be made at a terminal operation when it's needed.
Not working on stale data? What? How does lazy loading keep someone from working on stale data?
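For what it's worth, here is roughly what I understand that scenario to mean; fetchFromServer() is a hypothetical remote call of my own invention, not anything from the article:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazySupplierSketch {
    // Hypothetical remote call; each invocation would fetch one fresh value.
    static String fetchFromServer() {
        return "latest-value";
    }

    public static void main(String[] args) {
        // Nothing is fetched here: generate() only remembers the Supplier.
        Stream<String> remote = Stream.generate(LazySupplierSketch::fetchFromServer);

        // Only now, at the terminal operation, are exactly three calls made,
        // each returning whatever the server holds at that moment.
        List<String> freshest = remote.limit(3).collect(Collectors.toList());
        System.out.println(freshest);
    }
}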
TLDR: Is there any benefit to lazy loading besides being able to run the filter/map/reduce/whatever operation at a later time (which offers zero performance benefit)?
If so, what's a real-world use case?
Your terminal operation, toArray(), perhaps supports your argument given that it requires all elements of the stream.
Some terminal operations don't. And for these, it would be a waste if streams weren't lazily executed. Two examples:
//example 1: print first element of 1000 after transformations
IntStream.range(0, 1000)
.peek(System.out::println)
.mapToObj(String::valueOf)
.peek(System.out::println)
.findFirst()
.ifPresent(System.out::println);
//example 2: check if any value has an even key
boolean valid = records.stream()
        .map(this::heavyConversion)
        .filter(this::checkWithWebService)
        .mapToInt(Record::getKey)
        .anyMatch(i -> i % 2 == 0);
The first stream will print:
0
0
0
That is, the intermediate operations are run on just one element. This is an important optimization. If the stream weren't lazy, all the peek() calls would have to run on all elements (absolutely unnecessary, since you're interested in just one element). Intermediate operations can be expensive (as in the second example).
Short-circuiting terminal operations (of which toArray() isn't one) make this optimization possible.
Laziness can be very useful for the users of your API, especially when the final result of the Stream pipeline evaluation might be very large!
A simple example is the Files.lines method in the Java API itself. If you don't want to read the whole file into memory and you only need the first N lines, then just write:
Stream<String> stream = Files.lines(path); // lazy operation
List<String> result = stream.limit(N).collect(Collectors.toList()); // read and collect
You're right that there won't be a benefit from map().reduce() or map().collect(), but there's a pretty obvious benefit with findAny(), findFirst(), anyMatch(), allMatch(), etc. Basically, any operation that can short-circuit.
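To make that concrete, here's a small sketch (not from the original examples) showing allMatch() giving up early; isPrime is just a stand-in for any expensive check:

import java.util.stream.IntStream;

class AllMatchShortCircuit {
    public static void main(String[] args) {
        // allMatch() gives up as soon as one element fails the predicate,
        // so the expensive check runs only until the first non-prime.
        boolean allPrime = IntStream.of(2, 3, 4, 5, 1_000_003)
                .peek(n -> System.out.println("testing " + n))
                .allMatch(AllMatchShortCircuit::isPrime);
        System.out.println(allPrime); // false, printed after testing 2, 3, 4 only
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }
}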
Good question.
Assuming you write textbook-perfect code, the difference in performance between a properly optimized for loop and a stream is not noticeable (streams tend to be slightly better class-loading-wise, but the difference should not be noticeable in most cases).
Consider the following example.
// Some lengthy computation
private static int doStuff(int i) {
try { Thread.sleep(1000); } catch (InterruptedException e) { }
return i;
}
public static OptionalInt findFirstGreaterThanStream(int value) {
return IntStream
.of(MY_INTS)
.map(Main::doStuff)
.filter(x -> x > value)
.findFirst();
}
public static OptionalInt findFirstGreaterThanFor(int value) {
for (int i = 0; i < MY_INTS.length; i++) {
int mapped = Main.doStuff(MY_INTS[i]);
if(mapped > value){
return OptionalInt.of(mapped);
}
}
return OptionalInt.empty();
}
Given the above methods, the next test should show they execute in about the same time.
public static void main(String[] args) {
long begin;
long end;
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanStream(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanFor(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
}
OptionalInt[8]
5119
OptionalInt[8]
5001
Anyway, we spend most of the time in the doStuff method. Let's say we want to add more threads to the mix.
Adjusting the stream method is trivial (assuming your operations meet the preconditions of parallel streams).
public static OptionalInt findFirstGreaterThanParallelStream(int value) {
return IntStream
.of(MY_INTS)
.parallel()
.map(Main::doStuff)
.filter(x -> x > value)
.findFirst();
}
Achieving the same behavior without streams can be tricky.
public static OptionalInt findFirstGreaterThanParallelFor(int value, Executor executor) {
AtomicInteger counter = new AtomicInteger(0);
    // Fallback result: busy-wait until all but one element have been checked
    // without a match, then yield "empty" (crude, but enough for this comparison).
    CompletableFuture<OptionalInt> cf = CompletableFuture.supplyAsync(() -> {
        while (counter.get() != MY_INTS.length - 1);
        return OptionalInt.empty();
    });
for (int i = 0; i < MY_INTS.length; i++) {
final int current = MY_INTS[i];
executor.execute(() -> {
int mapped = Main.doStuff(current);
if(mapped > value){
cf.complete(OptionalInt.of(mapped));
} else {
counter.incrementAndGet();
}
});
}
try {
return cf.get();
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
return OptionalInt.empty();
}
}
The tests execute in about the same time again.
public static void main(String[] args) throws InterruptedException {
long begin;
long end;
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanParallelStream(5));
end = System.currentTimeMillis();
System.out.println(end-begin);
ExecutorService executor = Executors.newFixedThreadPool(10);
begin = System.currentTimeMillis();
System.out.println(findFirstGreaterThanParallelFor(5, executor));
end = System.currentTimeMillis();
System.out.println(end-begin);
executor.shutdown();
executor.awaitTermination(10, TimeUnit.SECONDS);
executor.shutdownNow();
}
OptionalInt[8]
1004
OptionalInt[8]
1004
In conclusion, although we don't squeeze a big performance benefit out of streams (considering you write excellent multi-threaded code in your for alternative), the code itself tends to be more maintainable.
A (slightly off-topic) final note:
As with programming languages, higher-level abstractions (streams relative to for loops) make things easier to develop at the cost of performance. We did not move away from assembly to procedural languages to object-oriented languages because the latter offered greater performance. We moved because it made us more productive (develop the same thing at a lower cost). If you are able to get the same performance out of a stream as you would with a for loop and properly written multi-threaded code, I would say it's already a win.
I have a real example from our code base; since I'm going to simplify it, I'm not entirely sure you'll like it or fully grasp it...
We have a service that needs a List<CustomService>, and I am supposed to call it. Now, in order to call it, I go to a database (much simpler than in reality) and obtain a List<DBObject>; to get a List<CustomService> from that, some heavy transformations need to be done.
And here are my choices: transform in place and pass the list. Simple, yet probably not that optimal. Second option: refactor the service to accept a List<DBObject> and a Function<DBObject, CustomService>. This sounds trivial, but it enables laziness (among other things). The service might sometimes need only a few elements from that list, or sometimes a max by some property, etc., so there is no need for me to do the heavy transformation for all elements; this is where the Stream API's pull-based laziness is a winner.
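A minimal sketch of that refactoring, with generic placeholders instead of our real CustomService and DBObject types:

import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class LazyServiceSketch {
    // The service decides how many elements it actually needs; because map()
    // is lazy and limit() short-circuits, only n transformations ever run.
    static <T, R> List<R> firstN(List<T> rows, Function<T, R> transform, int n) {
        return rows.stream()
                   .map(transform)
                   .limit(n)
                   .collect(Collectors.toList());
    }
}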
Before Streams existed, we used Guava. It has Lists.transform(list, function), which is lazy too.
It's not a fundamental feature of streams as such; it could have been done even without Guava, but it's a lot simpler that way. The findFirst example provided here is great and the simplest to understand; this is the entire point of laziness: elements are pulled only when needed, and they are not passed from one intermediate operation to another in chunks, but move from one stage to the next one at a time.
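A quick way to see that one-element-at-a-time behaviour (a minimal sketch):

import java.util.stream.Stream;

class OneAtATime {
    public static void main(String[] args) {
        Stream.of("a", "b")
              .peek(s -> System.out.println("stage 1: " + s))
              .map(String::toUpperCase)
              .peek(s -> System.out.println("stage 2: " + s))
              .forEach(s -> System.out.println("terminal: " + s));
        // Prints stage 1 / stage 2 / terminal for "a" before anything is
        // printed for "b": each element traverses the whole pipeline before
        // the next one starts.
    }
}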
One interesting use case that hasn't been mentioned is arbitrary composition of operations on streams, coming from different parts of the code base, responding to different sorts of business or technical requisites.
For example, say you have an application where certain users can see all the data but certain other users can only see part of it. The part of the code that checks user permissions can simply impose a filter on whatever stream is being handed about.
Without lazy streams, that same part of the code could be filtering the already realized full collection, but that may have been expensive to obtain, for no real gain.
Alternatively, that same part of the code might want to attach its filter to the data source itself, but then it has to know whether the data comes from a database (so it can add a WHERE clause) or from some other source.
With lazy streams, it's a filter that can be implemented every which way. Filters imposed on streams backed by the database can translate into the aforementioned WHERE clause, with obvious performance gains over filtering in-memory collections resulting from whole-table reads.
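A rough sketch of the idea; User, Record, canSeeAll and ownerId are made-up names here, not an actual API:

import java.util.stream.Stream;

class PermissionFilterSketch {
    // The permission-checking code only knows about streams; it neither
    // realizes the full collection nor cares where the data comes from.
    static Stream<Record> visibleTo(Stream<Record> records, User user) {
        return user.canSeeAll() ? records
                                : records.filter(r -> r.ownerId() == user.id());
    }

    // Minimal hypothetical types, just so the sketch is self-contained.
    static class Record { long ownerId() { return 0L; } }
    static class User   { boolean canSeeAll() { return false; } long id() { return 0L; } }
}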
So, a better abstraction, better performance, better code readability and maintainability, sounds like a win to me. :)
A non-lazy implementation would process all input and collect the output into a new collection after each operation. That is obviously impossible for unbounded or large enough sources, memory-consuming otherwise, and needlessly memory-consuming in the case of reducing and short-circuiting operations, so laziness brings real benefits.
Check the following example:
Stream.of("0","0","1","2","3","4")
.distinct()
.peek(a->System.out.println("after distinct: "+a))
.anyMatch("1"::equals);
If it were not lazy, you would expect all elements to pass through the distinct filtering first. But because of lazy execution it behaves differently: it streams the minimum number of elements needed to compute the result.
The above example will print
after distinct: 0
after distinct: 1
How it works analytically:
First "0" goes until the terminal operation but does not satisfy it. Another element must be streamed.
Second "0" is filtered through .distinct() and never reaches terminal operation.
Since the terminal operation is not satisfied yet, next element is streamed.
"1" goes through terminal operation and satisfies it.
No more elements need to be streamed.
I have a very large file (10^8 lines) with counts of events as follows,
A 10
B 11
C 23
A 11
I need to accumulate the counts for each event, so that my map contains
A 21
B 11
C 23
My current approach:
Read the lines, maintain a map, and update the counts in the map as follows
void updateCount(Map<String, Long> countMap, String key, Long c) {
if (countMap.containsKey(key)) {
Long val = countMap.get(key);
countMap.put(key, val + c);
} else {
countMap.put(key, c);
}
}
Currently this is the slowest part of the code (it takes around 25 ms).
Note that the map is based on MapDB, but I doubt that updates are slow because of that (are they?)
These are the MapDB configs for the map:
DBMaker.newFileDB(dbFile).freeSpaceReclaimQ(3)
.mmapFileEnablePartial()
.transactionDisable()
.cacheLRUEnable()
.closeOnJvmShutdown();
Are there ways to speed this up?
EDIT:
The number of unique keys is of the order of the pages in wikipedia. The data is actually page traffic data from here.
You might try
class Counter {
long count;
}
void updateCount(Map<String, Counter> countMap, String key, int c) {
Counter counter = countMap.get(key);
if (counter == null) {
counter = new Counter();
countMap.put(key, counter);
counter.count = c;
} else {
counter.count += c;
}
}
This does not create many Long wrappers; it just allocates as many Counters as there are keys.
Note: do not create Longs. Above I made c an int so the long/Long boxing issue doesn't slip in unnoticed.
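With Java 8 the same idea can be written a bit more compactly using computeIfAbsent; a sketch with a plain HashMap (whether the in-place mutation carries over unchanged to a MapDB-backed map is a separate question):

import java.util.HashMap;
import java.util.Map;

class CounterSketch {
    static class Counter { long count; }

    static void updateCount(Map<String, Counter> countMap, String key, int c) {
        // One Counter per key; later updates mutate it in place, no boxing.
        countMap.computeIfAbsent(key, k -> new Counter()).count += c;
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = new HashMap<>();
        updateCount(counts, "A", 10);
        updateCount(counts, "A", 11);
        System.out.println(counts.get("A").count); // 21
    }
}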
As a starting point, I'd suggest thinking about:
What is the yardstick by which you're saying that 25 ms is actually an unreasonable amount of time for the amount of data involved and for a generic map implementation? If you quantify that, it might help you work out whether there is anything wrong.
How much time is being spent re-hashing the map versus other operations (e.g. calculation of hash codes on each put)?
What do your "events" as you call them consist of? How many unique events-- and hence unique keys-- are there? How are keys to the map being generated, and is there a more efficient way to do so? (In a standard hash map, for example, you create additional objects for each association, and actually store the key objects increasing the memory footprint.)
Depending on the answers to the previous, you could potentially roll a more efficient map structure yourself (see this example that you might be able to adapt). Essentially, you need to look specifically at what is taking the time (e.g. hash code calculation per put / cost of rehashing) and try and optimise that part.
If you are using a TreeMap, there are performance tuning options like:
The number of entries in each node.
You could also use specific key and value serializers, which will speed up serialization and deserialization.
You could use Pump mode to build the tree, which is very fast. One caveat is that this is only useful when you are building a new map from scratch. You can find the full example here:
https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/Huge_Insert.java
Context: I'm working on an analytics system for an ordering system. There are about 100,000 orders per day and the analytics need to run for the last N (say, 100) days. The relevant data fits in memory. After N days, orders are evicted from the memory cache, an entire day in the past being evicted at a time. Orders can be created or updated.
A traditional approach would use a ConcurrentHashMap<Date, Queue<Order>>. Every day, values for keys representing dates more than N days in the past will be deleted. But, of course, the whole point of using Guava is to avoid this. EDIT: changed Map to ConcurrentHashMap, see the end of the question for rationale.
With Guava collections, a Multimap<Date, Order> would be simpler. Eviction is similar, implemented explicitly.
While the Cache implementation looks appealing (after all, I am implementing a cache), I'm not sure about the eviction options. Eviction only happens once a day and it's best initiated from outside the cache; I don't want the cache to have to check the age of an order. I'm not even sure the cache would use a Multimap, which I think is a suitable data structure in this case.
Thus, my question is: is it possible to use a Cache that uses and exposes the semantics of a Multimap and allows evictions controlled from outside itself, in particular with the rule I need ("delete all orders older than N days")?
As an important clarification, I'm not interested in a LoadingCache but I do need bulk loads (if the application needs to be restarted, the cache has to be populated, from the database, with the last N days of orders).
EDIT: Forgot to mention that the map needs to be concurrent, as orders come in they are evaluated live against the previous orders for the same customer or location etc.
EDIT2: Just stumbled over Guava issue 135. It looks like the MultiMap is not concurrent.
I would use neither a Cache nor a Multimap here. While I like and use both of them, there's not much to gain here.
You want to evict your entries manually, so the features of Cache don't really get used here.
You're considering ConcurrentHashMap<Date, Queue<Order>>, which is in a sense more powerful than a Multimap<Date, Order>.
I'd use a Cache if I were considering different eviction criteria and if I felt that losing any of its entries at any time1 is fine.
You may find out that you need a ConcurrentMap<Date, Dequeue<Order>> or maybe ConcurrentMap<Date, YouOwnQueueFastSearchList<Order>> or whatever. This could probably be managed somehow by the Multimap, but IMHO it gets more complicated instead of simpler.
I'd ask myself "what do I gain by using Cache or Multimap here?". To me it looks like the plain old ConcurrentMap offers about everything you need.
1 By no means am I suggesting this would happen with Guava. On the contrary, without an eviction reason (capacity, expiration, ...) it works just like a ConcurrentMap. It's just that what you've described feels more like a Map than a Cache.
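A minimal sketch of that plain-ConcurrentMap approach (LocalDate as the key and a bare-bones Order are just illustrative choices):

import java.time.LocalDate;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

class OrderCacheSketch {
    static class Order { /* fields omitted */ }

    private final ConcurrentMap<LocalDate, Queue<Order>> byDay = new ConcurrentHashMap<>();

    void add(LocalDate day, Order order) {
        byDay.computeIfAbsent(day, d -> new ConcurrentLinkedQueue<>()).add(order);
    }

    // Called once a day from outside, which is exactly the externally
    // controlled eviction rule the question asks for.
    void evictOlderThan(LocalDate cutoff) {
        byDay.keySet().removeIf(day -> day.isBefore(cutoff));
    }
}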
IMHO the simplest thing to do is to include the date of the order in the order record (I would expect it is a field already). As you only need to clean the cache once per day, it doesn't have to be very efficient, just reasonably timely.
e.g.
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {
static class Order {
final long time;
Order(long time) {
this.time = time;
}
public long getTime() {
return time;
}
}
final Map<String, Order> orders = new LinkedHashMap<String, Order>();
public void expireOrdersOlderThan(long dateTime) {
for (Iterator<Order> iter = orders.values().iterator(); iter.hasNext(); )
if (iter.next().getTime() < dateTime)
iter.remove();
}
private void generateOrders() {
for (int i = 0; i < 120000; i++) {
orders.put("order-" + i, new Order(i));
}
}
public static void main(String... args) {
for (int t = 0; t < 3; t++) {
Main m = new Main();
m.generateOrders();
long start = System.nanoTime();
for (int i = 0; i < 20; i++)
m.expireOrdersOlderThan(i * 1000);
long time = System.nanoTime() - start;
System.out.printf("Took an average of %.3f ms to expire 1%% of entries%n", time / 20 / 1e6);
}
}
}
prints
Took an average of 9.164 ms to expire 1% of entries
Took an average of 8.345 ms to expire 1% of entries
Took an average of 7.812 ms to expire 1% of entries
For 100,000 orders, I would expect this to take ~10 ms, which is not much to incur during a quiet period in the middle of the night.
BTW: You can make this more efficient if your OrderIds are sorted by time. ;)
Have you considered using a sorted list of some sort? It would allow you to pull entries until you hit one that's fresh enough to stay. Of course this assumes that's your primary function. If what you most need is the O(1) access of a hash map, my answer doesn't apply.
From the document of ConcurrentHashMap:
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates.
Can we fully trust ConcurrentHashMap to perform thread-safe operations?
I am using a ConcurrentHashMap for mapping keys to their values. My key-value pair is:
Map<Integer, ArrayList<Double>> map1 = new ConcurrentHashMap<>();
The keys range over [0, 1000000]. I have 20 threads which can access/modify the value corresponding to a key at a time. This is not so frequent, but that condition is possible. I am getting an infinity from the following method:
Double sum = 0.0;
sum = sum + Math.exp(getScore(contextFeatureVector, entry.getValue()) + constant);
contextFeatureVector and entry.getValue() are the ArrayLists associated with a key.
[EDIT]
constant = 0.0001
private double getScore(List<Double> featureVector, List<Double> weightVector) throws NullPointerException {
    double score = 0.0;
    int length = featureVector.size();
    for (int i = 0; i < length; i++) {
        score = score + (featureVector.get(i) * weightVector.get(i));
    }
    return score;
}
Both featureVector and weightVector look like
[-0.005554038592516575, 0.0048966974158881175, -0.05315976588195846, -0.030837804373964654, 0.014483064988148562, -0.018962129117649, -0.015221386014208877, 0.015825702365331477, -0.11363620479662287, 0.00802609847263844, -0.062106636476812194, 0.008108854471293185, -0.03193255218671684, 0.04949650992670292, -0.0545583154094599, -0.04873314092706468, 0.013534731656877033, 0.08433117163682455, 0.050310355477044114, -0.002420513353516017, -0.02708299928442614, -0.023489187394176294, -0.1277699782685597, -0.10071004855129333, 0.08649040730064464, -0.04940329664431305, -0.027481729446035053, -0.0571846057609884, -0.036738550618481455, -0.035608113682344365]
so the value returned from getScore does not get exceptionally large; it will be in the thousands at most.
It is thread safe, but you can use it in a manner which is not thread safe.
I suspect you haven't investigated the problem enough to determine that there is a bug in a JDK library which has been used for more than a decade.
The data structure you use makes me believe there must be some bug in your code. Most likely you are fetching the list from the map and updating it:
map1.get(42).add(5);
Note that add(5) is not thread-safe as it operates on an ordinary ArrayList. You either need a thread-safe list or the replace(K key, V oldValue, V newValue) method.
If you read carefully through the guarantees ConcurrentHashMap is giving, you can use it effectively.
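For example, one way to make the whole read-modify-write atomic is to do the update inside compute(); a sketch, assuming the lists are only ever touched inside the compute lambda:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicListUpdate {
    static final ConcurrentMap<Integer, List<Double>> map1 = new ConcurrentHashMap<>();

    // The remapping function runs atomically per key, so two threads cannot
    // interleave their modifications of the same ArrayList.
    static void append(int key, double value) {
        map1.compute(key, (k, list) -> {
            if (list == null) {
                list = new ArrayList<>();
            }
            list.add(value);
            return list;
        });
    }
}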
If you call Math.exp(...) on an input that is too large you will get an Infinity. That is the probable cause of your problems ... not some imagined problem with thread safety.
I suggest that you add some trace code to see what
getScore(contextFeatureVector, entry.getValue())
is returning when sum becomes an Infinity. Beyond that, I don't think we'll be able to help without seeing more of your code.
The largest number that can be stored in a Java double is approximately exp(709). So if you pass anything larger than 709 into exp(), you should expect the result to overflow to Infinity.
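A quick check of that boundary (not from the original answer):

public class ExpOverflow {
    public static void main(String[] args) {
        System.out.println(Double.MAX_VALUE); // ~1.8e308
        System.out.println(Math.exp(709));    // ~8.2e307, still finite
        System.out.println(Math.exp(710));    // Infinity: exceeds Double.MAX_VALUE
    }
}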