Is it safe to share a stream instance among multiple threads? - java

I have a set of keys.
class X {
private static String[] keys = {"k1", "k2", ... };
I have to extract values for the keys from a request. I think I can use map to extract the values and create a list of the necessary objects in some request-processing method like this:
public void processReq(Request req) {
...
Stream.of(keys).map(k-> new Pack(k, req.getHeader(k)));
But creating a Stream for every request looks like unnecessary work. If sharing a Stream instance among multiple threads is safe, I think I can modify the code like this:
class X {
private static Stream<String> keys = Stream.of("k1", "k2", ...);
...
public void processReq(Request req) {
...
keys.map(k-> new Pack(k, req.getHeader(k)));
So, is sharing Stream instance among multiple threads like this safe?

Streams are not intended to be used more than once, even in the same thread. If you want a reusable collection, use a List (or an array):
private static final List<String> keys = Arrays.asList("k1", "k2", ...);
This can be used multiple times.
List<Pack> packs = keys.stream()
.map(k-> new Pack(k, req.getHeader(k)))
.collect(Collectors.toList());
In your code, new Pack and req.getHeader are where most of the time is spent; creating the stream itself is cheap.
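If the goal is just to avoid repeating the pipeline setup, one alternative (a sketch, not part of the answer above; Pack and Request are the asker's types) is to hold a Supplier that hands out a fresh, single-use stream on each call:

import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class X {
    // Each get() returns a brand-new Stream, so nothing is shared across threads.
    private static final Supplier<Stream<String>> keyStream =
            () -> Stream.of("k1", "k2");

    public void processReq(Request req) {
        List<Pack> packs = keyStream.get() // fresh stream per request
                .map(k -> new Pack(k, req.getHeader(k)))
                .collect(Collectors.toList());
        // ... use packs ...
    }
}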

No; it's not generally safe. "Unless the source was explicitly designed for concurrent modification, unpredictable or erroneous behavior may result from modifying the stream source while it is being queried... A stream should be operated on only once."

Related

Aggregate values and convert into single type within the same Java stream

I have a class with a collection of Seed elements. One of Seed's methods returns Optional<Pair<Boolean, String>>.
I'm trying to loop over all seeds, find whether any boolean value is true and, at the same time, collect a set of all the String values. So if my input is in the form Optional<Pair<Boolean, String>>, the output should be Optional<Signal>, where Signal is like:
class Signal {
public boolean exposure;
public Set<String> alarms;
// constructor and getters (can add anything to this class, it's just a bag)
}
This is what I currently have that works:
// Seed::hadExposure yields Optional<Pair<Boolean, String>> where Pair have key/value or left/right
public Optional<Signal> withExposure() {
if (seeds.stream().map(Seed::hadExposure).flatMap(Optional::stream).findAny().isEmpty()) {
return Optional.empty();
}
final var exposure = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.anyMatch(Pair::getLeft);
final var alarms = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.map(Pair::getRight)
.filter(Objects::nonNull)
.collect(Collectors.toSet());
return Optional.of(new Signal(exposure, alarms));
}
Now I have time to make it better, because Seed::hadExposure could become an expensive call, so I was trying to see if I could do all of this in only one pass. I've tried (some suggestions from previous questions) with reduce, using collectors (Collectors.collectingAndThen, Collectors.partitioningBy, etc.), but nothing has worked so far.
It's possible to do this in a single stream() expression using map to convert the non-empty exposure to a Signal and then a reduce to combine the signals:
Signal signal = exposures.stream()
.map(exposure ->
new Signal(
exposure.getLeft(),
exposure.getRight() == null
? Collections.emptySet()
: Collections.singleton(exposure.getRight())))
.reduce(
new Signal(false, new HashSet<>()),
(leftSig, rightSig) -> {
HashSet<String> alarms = new HashSet<>();
alarms.addAll(leftSig.alarms);
alarms.addAll(rightSig.alarms);
return new Signal(
leftSig.exposure || rightSig.exposure, alarms);
});
However, if you have a lot of alarms it would be expensive because it creates a new Set and adds the new alarms to the accumulated alarms for each exposure in the input.
In a language that was designed from the ground-up to support functional programming, like Scala or Haskell, you'd have a Set data type that would let you efficiently create a new set that's identical to an existing set but with an added element, so there'd be no efficiency worries:
filteredSeeds.foldLeft((false, Set[String]())) { (result, exposure) =>
(result._1 || exposure.getLeft, result._2 + exposure.getRight)
}
But Java doesn't come with anything like that out of the box.
You could create just a single Set for the result and mutate it in your stream's reduce expression, but some would regard that as poor style because you'd be mixing a functional paradigm (map/reduce over a stream) with a procedural one (mutating a set).
Personally, in Java, I'd just ditch the functional approach and use a for loop in this case. It'll be less code, more efficient, and IMO clearer.
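A sketch of that loop, assuming the asker's Seed, Pair, and Signal types and preserving the original semantics (empty Optional when no seed reports an exposure; null alarm strings skipped):

public Optional<Signal> withExposure() {
    boolean sawAny = false;
    boolean exposure = false;
    Set<String> alarms = new HashSet<>();
    for (Seed seed : seeds) {
        // hadExposure() is called exactly once per seed
        Optional<Pair<Boolean, String>> maybe = seed.hadExposure();
        if (maybe.isPresent()) {
            sawAny = true;
            exposure |= maybe.get().getLeft();
            String alarm = maybe.get().getRight();
            if (alarm != null) {
                alarms.add(alarm);
            }
        }
    }
    return sawAny ? Optional.of(new Signal(exposure, alarms)) : Optional.empty();
}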
If you have enough space to store an intermediate result, you could do something like:
List<Pair<Boolean, String>> exposures =
seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.collect(Collectors.toList());
Then you'd only be calling the expensive Seed::hadExposure method once per item in the input list.

java 8 parallel stream Issue

_logger.info("data size : " + saleData.size());
saleData.parallelStream().forEach(data -> {
SaleAggrData saleAggrData = new SaleAggrData() {
{
setCatId(data.getCatId());
setRevenue(RoundUpUtil.roundUpDouble(data.getRevenue()));
setMargin(RoundUpUtil.roundUpDouble(data.getMargin()));
setUnits(data.getUnits());
setMarginRate(ComputeUtil.marginRate(data.getRevenue(), data.getMargin()));
setOtd(ComputeUtil.OTD(data.getRevenue(), data.getUnits()));
setSaleDate(data.getSaleDate());
setDiscountDepth(ComputeUtil.discountDepth(data.getRegularPrice(), data.getRevenue()));
setTransactions(data.getTransactions());
setUpt(ComputeUtil.UPT(data.getUnits(), data.getTransactions()));
}
};
salesAggrData.addSaleAggrData(saleAggrData);
});
The issue with the code is that when I get a response from the DB and iterate over it using a parallel stream, the data size is different every time, while with a sequential stream it works fine.
I can't use a sequential stream because the data is huge and it takes too long.
Any lead would be helpful.
You are adding elements in parallel to salesAggrData which I'm assuming is some Collection. If it's not a thread-safe Collection, no wonder you get inconsistent results.
Instead of forEach, why don't you use map() and then collect the result into some Collection?
List<SaleAggrData> salesAggrData =
saleData.parallelStream()
.map(data -> {
SaleAggrData saleAggrData = new SaleAggrData() {
{
setCatId(data.getCatId());
setRevenue(RoundUpUtil.roundUpDouble(data.getRevenue()));
setMargin(RoundUpUtil.roundUpDouble(data.getMargin()));
setUnits(data.getUnits());
setMarginRate(ComputeUtil.marginRate(data.getRevenue(), data.getMargin()));
setOtd(ComputeUtil.OTD(data.getRevenue(), data.getUnits()));
setSaleDate(data.getSaleDate());
setDiscountDepth(ComputeUtil.discountDepth(data.getRegularPrice(), data.getRevenue()));
setTransactions(data.getTransactions());
setUpt(ComputeUtil.UPT(data.getUnits(), data.getTransactions()));
}
};
return saleAggrData;
})
.collect(Collectors.toList());
BTW, I'd probably change that anonymous class instance creation, and use a constructor of a named class to create the SaleAggrData instances.
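For example (a sketch; the SaleAggrData(SaleData) constructor shown is hypothetical and would hold the same assignments as the initializer block above):

List<SaleAggrData> salesAggrData =
        saleData.parallelStream()
                .map(SaleAggrData::new) // hypothetical constructor taking one data row
                .collect(Collectors.toList());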

Iterate efficiently through 2 different List with same Type of Object(Java8)

I have two lists, each containing a large number (N) of objects of the same type:
List<Foo> objectsFromDB = {{MailId=100, Status=""}, {MailId=200, Status=""}, {MailId=300, Status=""} ... {MailId=N, Status=N}}
List<Foo> feedBackStatusFromCsvFiles = {{MailId=100, Status="OPENED"}, {MailId=200, Status="CLICKED"}, {MailId=300, Status="HARDBOUNCED"} ... {MailId=N, Status=N}}
Little Insights:
objectsFromDB retrieves rows from my database by calling a Hibernate method.
feedBackStatusFromCsvFiles calls a CSV parser method and unmarshals to Java objects.
My entity class Foo has all setters and getters. So I know that the basic idea is to use a foreach like this:
for (Foo fooDB : objectsFromDB) {
for(Foo fooStatus: feedBackStatusFromCsvFiles){
if(fooDB.getMailId().equals(fooStatus.getMailId())){
fooDB.setStatus(fooStatus.getStatus());
}
}
}
As far as my modest junior-developer knowledge goes, I think this is very bad practice. Should I implement a Comparator and use it for iterating over my lists of objects? Should I also check for null cases?
Thanks to all of you for your answers!
Assuming Java 8, and considering the fact that feedBackStatusFromCsvFiles may contain more than one element with the same ID:
Transform the list into a Map using ID as key and having a list of elements.
Iterate the list and use the Map to find all messages.
The code would be:
final Map<String, List<Foo>> listMap =
objectsFromDB.stream().collect(
Collectors.groupingBy(item -> item.getMailId())
);
for (final Foo feedBackStatus : feedBackStatusFromCsvFiles) {
listMap.getOrDefault(feedBackStatus.getMailId(), Collections.emptyList())
       .forEach(item -> item.setStatus(feedBackStatus.getStatus()));
}
Use maps from collections to avoid the nested loops.
List<Foo> aList = new ArrayList<>();
List<Foo> bList = new ArrayList<>();
for(int i = 0;i<5;i++){
Foo foo = new Foo();
foo.setId((long) i);
foo.setValue("FooA"+String.valueOf(i));
aList.add(foo);
foo = new Foo();
foo.setId((long) i);
foo.setValue("FooB"+String.valueOf(i));
bList.add(foo);
}
final Map<Long,Foo> bMap = bList.stream().collect(Collectors.toMap(Foo::getId, Function.identity()));
aList.stream().forEach(it->{
Foo bFoo = bMap.get(it.getId());
if( bFoo != null){
it.setValue(bFoo.getValue());
}
});
The only other solution would be to have the DTO layer return a map of MailId -> Foo, as you could then stream over the CSV list and simply look up the DB Foo object. Otherwise, the expense of sorting or iterating over both of the lists is not worth the trade-off in performance. The previous statement holds true until it definitively causes a memory constraint on the platform; until then, let the garbage collector do its job, and do yours as simply as possible.
Given that your lists may contain tens of thousands of elements, you should be concerned that your simple nested-loop approach will be too slow. It will certainly perform a lot more comparisons than it needs to.
If memory is comparatively abundant, then the fastest suitable approach would probably be to form a Map from mailId to (a list of) corresponding Foo from one of your lists, somewhat as @MichaelH suggested, and to use that to match mailIds. If mailId values are not certain to be unique in one or both lists, however, then you'll need something a bit different from Michael's specific approach. Even if mailIds are sure to be unique within both lists, it will be a bit more efficient to form only one map.
For the most general case, you might do something like this:
// The initial capacity is set (more than) large enough to avoid any rehashing
Map<Long, List<Foo>> dbMap = new HashMap<>(3 * objectsFromDB.size() / 2);
// Populate the map
// This could be done more efficiently if the objects were ordered by mailId,
// which perhaps the DB could be enlisted to ensure.
for (Foo foo : objectsFromDB) {
Long mailId = foo.getMailId();
List<Foo> foos = dbMap.get(mailId);
if (foos == null) {
foos = new ArrayList<>();
dbMap.put(mailId, foos);
}
foos.add(foo);
}
// Use the map
for (Foo fooStatus: feedBackStatusFromCsvFiles) {
List<Foo> dbFoos = dbMap.get(fooStatus.getMailId());
if (dbFoos != null) {
String status = fooStatus.getStatus();
// Iterate over only the Foos that we already know have matching Ids
for (Foo fooDB : dbFoos) {
fooDB.setStatus(status);
}
}
}
On the other hand, if you are space-constrained, so that creating the map is not viable, yet it is acceptable to reorder your two lists, then you should still get a performance improvement by sorting both lists first. Presumably you would use Collections.sort() with an appropriate Comparator for this purpose. Then you would obtain an Iterator over each list, and use them to iterate cooperatively over the two lists. I present no code, but it would be reminiscent of the merge step of a merge sort (but the two lists are not actually merged; you only copy status information from one to the other). But this makes sense only if the mailIds from feedBackStatusFromCsvFiles are all distinct, for otherwise the expected result of the whole task is not well determined.
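A sketch of that cooperative iteration, under the stated assumptions (both lists sorted by mailId, CSV mailIds distinct):

// Sort both lists by mailId first, e.g.:
// Collections.sort(objectsFromDB, Comparator.comparing(Foo::getMailId));
// Collections.sort(feedBackStatusFromCsvFiles, Comparator.comparing(Foo::getMailId));
Iterator<Foo> dbIt = objectsFromDB.iterator();
Iterator<Foo> csvIt = feedBackStatusFromCsvFiles.iterator();
Foo db = dbIt.hasNext() ? dbIt.next() : null;
Foo csv = csvIt.hasNext() ? csvIt.next() : null;
while (db != null && csv != null) {
    int cmp = db.getMailId().compareTo(csv.getMailId());
    if (cmp < 0) {
        db = dbIt.hasNext() ? dbIt.next() : null;    // DB row with no feedback
    } else if (cmp > 0) {
        csv = csvIt.hasNext() ? csvIt.next() : null; // feedback with no DB row
    } else {
        db.setStatus(csv.getStatus());               // ids match: copy the status
        db = dbIt.hasNext() ? dbIt.next() : null;    // keep csv: several DB rows may share an id
    }
}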
Your problem is merging each Foo's last status into the database objects, so you can do it in two steps, which makes it clearer and more readable:
filtering the Foos that need to merge.
merging those Foos with the last status.
// because the status is always the last one, you needn't use groupingBy to create a complex Map.
Map<String, String> lastStatus = feedBackStatusFromCsvFiles.stream()
    .collect(Collectors.toMap(Foo::getMailId, Foo::getStatus,
        (previous, current) -> current));
//find out Foos in Database that need to merge
Predicate<Foo> fooThatNeedMerge = it -> lastStatus.containsKey(it.getMailId());
//merge Foo's last status from cvs.
Consumer<Foo> mergingFoo = it -> it.setStatus(lastStatus.get(it.getMailId()));
objectsFromDB.stream().filter(fooThatNeedMerge).forEach(mergingFoo);

How can we have entry-level locking in HashMap?

Since Stack Overflow does not allow adding more to your question after the fact (you can only add comments, not code), I am asking a follow-up to my original question here:
Can we use Synchronized for each entry instead of ConcurrentHashMap?
The problem is very simple, and I don't know why I should have to spend this much time on such a simple problem that many people have probably encountered before me. :/
The problem is: I have a HashMap, and I want that while one thread is working on one of the entries of the HashMap, no other thread accesses that object; and I don't want to lock the whole HashMap.
I know that Java provides ConcurrentHashMap, but ConcurrentHashMap does not solve the problem when you want to do things more complex than a simple put or get. Even the newly added functions (in Java 8) like merge are not enough for complex scenarios.
For example:
Suppose I want a hash map that maps strings to ArrayLists. Then for example suppose I want to do this:
For key k, if there is any entry, add newString to its ArrayList, but if there is no entry for k, create the entry for k such that its ArrayList has newString.
I was thinking I can do it as follows:
ArrayList<String> tm =new ArrayList<String>();
tm.add(newString);
Object result = map.putIfAbsent(k, tm);
if (result != null)
{
map.get(k).add(newString);
}
But it does not work. Why? Suppose putIfAbsent returns something other than null: that means the map already has an entry with key k, so I will try to add newString to the ArrayList of the already existing entry. But right before the add, another thread may remove the entry, and then I will get a NullPointerException!
So, I found it very difficult to code such things properly.
But I was thinking that if I can simply lock that entry, life will be wonderful!
In my previous post I suggested something very simple that in fact eliminates the need for ConcurrentHashMap and provides entry-level locking, but some said that is not true because Long is not immutable... which I didn't quite get.
Now, I implemented and tested it, it looks good to me, but I don't know why other more experienced developers here told me it is not thread-safe :(
This is the exact code that I tested:
MainThread:
import java.util.HashMap;
public class mainThread {
public static HashMap<String, Long> map = new HashMap<String, Long>();
public static void main (String args[])
{
map.put("k1", new Long(32));
synchronized(map.get("k1"))
{
Thread t = new Thread(new ThreadA());
t.start();
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
ThreadA:
public class ThreadA implements Runnable {
@Override
public void run() {
mainThread.map.put("k2", new Long(21));
System.out.println(mainThread.map.get("k2"));
synchronized (mainThread.map.get("k1")) {
System.out.println("Insdie synchronized of threadA");
}
}
}
It works fine! It prints 21, and after 5 seconds, when the main thread releases the lock on map.get("k1"), it prints "Inside synchronized of ThreadA".
So, why can't this simple approach provide entry-level locking?! Why does concurrency have to be so complicated, lol (just kidding)?
First of all, there is no standard map implementation that I am aware of that provides entry level locking.
But I think you can avoid the need for that. For example
UPDATE ... corrected mistake
ArrayList<String> tm = new ArrayList<String>();
ArrayList<String> old = map.putIfAbsent(k, tm);
if (old != null) {
tm = old;
}
synchronized (tm) {
// can now add / remove entries and this will appear as an atomic
// actions to other threads that are using `synchronized` to
// access or update the list
tm.add(string1);
tm.add(string2);
}
Yes it is possible that another thread will update the list in the hashmap entry between this thread (possibly) inserting it, and this thread locking it. However, that doesn't matter. The (corrected) putIfAbsent and the test that follows ensures that everyone will use and lock the same list.
(Assumption: that all threads use this logic when inserting / updating an entry.)
Atomically removing the list if it becomes empty is difficult, but I would argue that it is usually unnecessary to do that.
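That said, if every update goes through the compute-family methods rather than the synchronized block above, a sketch of remove-when-empty is possible (assumes a ConcurrentHashMap<String, List<String>>; k and value are placeholder variables):

// Returning null from the remapping function removes the entry atomically.
map.computeIfPresent(k, (key, list) -> {
    list.remove(value);
    return list.isEmpty() ? null : list;
});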
UPDATE 2
There is a better way:
ArrayList<String> tm = map.computeIfAbsent(k, key -> new ArrayList<>());
synchronized (tm) {
...
}
(Thanks Stuart)
UPDATE 3
We can do it with merge too.
Maybe, yes. Something like this:
ArrayList<String> tm = new ArrayList<String>();
tm.add(...);
...
map.merge(key, tm, (oldV, newV) -> { oldV.addAll(newV); return oldV; });
The downside is that you are double-handling all the elements of tm; i.e. adding to 2 separate lists (one of which you throw way).
But you could also do this:
map.merge(key, tm, (oldV, newV) -> {
    oldV.removeAll(newV);
    return oldV.size() == 0 ? null : oldV;
});
The thing that concerns me is that the javadoc does not state explicitly that the value oldV will be locked while this is happening. It says:
"The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress ..."
... but it does not explicitly state that there is mutual exclusion on the value while this is happening. (For instance, mixing this approach with putIfAbsent / computeIfAbsent and an explicit synchronized block would most likely be hazardous. The locking would most likely be on different objects.)
Well, the first huge problem is that you don't even attempt to do any locking for the put calls. Those aren't automatically thread-safe for a regular HashMap. You seem to be under the impression that separate HashMap entries are completely independent automatically, but HashMaps don't work that way.
Even if you fix the put problem (probably requiring ConcurrentHashMap or a whole-map lock anyway), the parts you actually are locking for aren't locking safely.
Say thread 1 puts the entry "k1": 1, and thread 2 tries to get("k1"). What will thread 2 see?
Well, thread 2 doesn't even try to acquire any locks until the get call is already done. The get call is completely unprotected! Without any happens-before relation between the put and the get, the get call might not see the entry, or it might see the entry, or it might see the map in an inconsistent intermediate state and crash horribly.
Synchronizing on the result of the get call is synchronizing far too late.
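For comparison, a sketch of entry-level atomicity done safely: ConcurrentHashMap.compute performs the remapping function atomically for that key, so the insert-if-absent and the value update happen together (this is essentially what the computeIfAbsent approach in the answer above relies on):

ConcurrentHashMap<String, List<String>> map = new ConcurrentHashMap<>();
// The remapping function runs atomically for key "k"; other updates to the
// same key may block until it completes (per the ConcurrentHashMap javadoc).
map.compute("k", (key, list) -> {
    if (list == null) {
        list = new ArrayList<>();
    }
    list.add("newString");
    return list;
});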
I think I have finally found the solution using the merge function. I provide an example; I will edit this post to make it easier for others to read, but I am posting now to get your feedback.
Here is an example of a ConcurrentHashMap that has ConcurrentHashMaps as its values (23 and 1 are just two arbitrary keys for the sake of the example):
Long initialValue = new Long(3);
Long addedValue = new Long(10);
Long removingValue = new Long (5);
ConcurrentHashMap<Integer, ConcurrentHashMap<Integer, Long>> map = new ConcurrentHashMap<>();
//Initialization....
ConcurrentHashMap<Integer, Long> i = new ConcurrentHashMap<Integer, Long>();
i.put(1, initialValue);
map.put(23, i);
//......
//addition
ConcurrentHashMap<Integer, Long> c = new ConcurrentHashMap<Integer, Long>();
c.put(1, addedValue);
map.merge(23, c, (oldHashMap, newHashMap) -> {
oldHashMap.merge(1, c.get(1), (oldV, newV) -> {
if (oldV < newV) return newV; else return oldV;
});
return oldHashMap;
});
//removal
// we want to remove entry 1 from the inner HashMap if its value is less than 2, and if the entry is empty remove the entry from the outer HashMap
ConcurrentHashMap<Integer, Long> r = new ConcurrentHashMap<Integer, Long>();
r.put(1, removingValue);
map.merge (23, r, (oldHashMap, newHashMap) -> {
oldHashMap.merge(1, newHashMap.get(1), (oldV, newV) -> {if (oldV < newV) return newV; else return oldV;});
return oldHashMap;
});
map.remove(23, r);
if (map.containsKey(23))
{
System.out.println("Map contains key 23");
if (map.get(23).containsKey(1))
{
System.out.println("The value for <23,1> is " + map.get(23).get(1));
}
}
This is what the code does:
initialization: first creates the map and puts another map into it for key 23 which has value initialValue for key 1.
addition: then it checks: 1) if there is no value for key 23, it puts a map that has addedValue for key 1; otherwise 2) if key 23 already has a value, it checks whether the value for key 1 is less than addedValue; if so, it overwrites it with addedValue, otherwise it leaves it alone.
removal: finally, it checks whether the value for key 1 inside the map for key 23 is less than removingValue; if so, it removes it, and if the inner map for key 23 is empty after this removal, it removes key 23 from the main map.
I tested this code. So for example:
for 3, 10, 5, the final value for <23,1> is 10.
for 20, 10, 11, the final value is 20.
for 3, 10, 11, the final value is nothing, because entry 23 is removed.
I hope it is thread-safe, as I just used the merge method. One disadvantage of this code is that I am adding something to the map and then removing it, just because ConcurrentHashMap does not have a remove method similar to merge. I wish I had this method:
map.remove (keyToRemove, condition)
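That exact method doesn't exist, but computeIfPresent gets close, since returning null from the remapping function removes the entry atomically. A sketch (condition is a hypothetical Predicate over the value):

// Removes the mapping for keyToRemove only if condition holds for its value.
map.computeIfPresent(keyToRemove, (k, v) -> condition.test(v) ? null : v);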

Conditional mapping to new objects with a Java Stream

I have a stream of objects (a List) and want to create new objects from that stream, to be inserted into a Set. However, two or more objects in the incoming List may hash to the same Key in the Set, in which case I want to append a String from the nth List object to the one already in the Set instead of creating a new one.
Something like this, but in functional form:
HashSet<ClassB> mySet = new HashSet<>();
for (ClassA instanceA : classAList) {
if (mySet.contains(ClassB.key(instanceA))) { //static method call to find the key
mySet.get(instanceA).appendFieldA(instanceA.getFieldA());
} else {
mySet.add(new ClassB(instanceA));
}
}
return mySet;
In functional form I though of creating something like this:
List<ClassA> classAList = new ArrayList<>();
classAList.stream()
.map(instanceA -> new ClassB(instanceA))
.collect(Collectors.toSet());
But then of course that ignores the keying and I don't get to combine the fields of multiple instances of ClassA that would all resolve to the same ClassB. I'm not sure how to fit that in. Do I need to drop the map() call and create a custom collector instead to do this job? There seems to be more than one way to do this, but I'm new to Streams.
It’s hard to understand what you actually want as your code example does not work at all. The problem is that a Set does not work like a Map, you can’t ask it for the contained equivalent object. Besides that, you are using different objects for your contains(…) and get(…) call. Also, it’s not clear what the difference between ClassB.key(instanceA) and new ClassB(instanceA) is.
Let’s try to redefine it:
Suppose we have a key type Key and a method Key.key(instanceA) to define the group candidates. Then we have a ClassB which is the resulting type, created via new ClassB(instanceA) for a single (or primary) ClassA instance, having an .appendFieldA(…) method to receive a value of another ClassA instance when merging two group members. Then, the original (pre Java 8) code will look as follows:
HashMap<Key, ClassB> myMap = new HashMap<>();
for(ClassA instanceA: classAList) {
Key key=Key.key(instanceA);
if(myMap.containsKey(key)) {
myMap.get(key).appendFieldA(instanceA.getFieldA());
} else {
myMap.put(key, new ClassB(instanceA));
}
}
Then, myMap.values() provides you a collection of the ClassB instances. If it has to be a Set, you may create it via
Set<ClassB> result=new HashSet<>(myMap.values());
Note that this also works when Key and ClassB are identical, as they seem to be in your code, but you may ask yourself whether you really need both the instance created via .key(instanceA) and the one created via new ClassB(instanceA)…
This can be simplified via the Java 8 API as:
for(ClassA instanceA: classAList) {
myMap.compute(Key.key(instanceA), (k,b)-> {
if(b==null) b=new ClassB(instanceA);
else b.appendFieldA(instanceA.getFieldA());
return b;
});
}
or, if you want it look even more function-stylish:
classAList.forEach(instanceA ->
myMap.compute(Key.key(instanceA), (k,b)-> {
if(b==null) b=new ClassB(instanceA);
else b.appendFieldA(instanceA.getFieldA());
return b;
})
);
For a stream solution, there is the problem, that a merge function will get two instances of the same type, here ClassB, and can’t access the ClassA instance via the surrounding context like we did with the compute solution above. For a stream solution, we need a method in ClassB which returns that first ClassA instance, which we passed to its constructor, say getFirstInstanceA(). Then we can use:
Map<Key, ClassB> myMap = classAList.stream()
.collect(Collectors.toMap(Key::key, ClassB::new, (b1,b2)->{
b1.appendFieldA(b2.getFirstInstanceA().getFieldA());
return b1;
}));
You can group the entries into a map from the key to the list of elements that share it, and then map each group to a single ClassB, combining fields as you go. Something like this:
List<ClassA> classAList = new ArrayList<>();
Set<ClassB> mySet = classAList.stream()
    .collect(Collectors.groupingBy(instanceA -> ClassB.key(instanceA)))
    .values().stream()
    .map(group -> {
        ClassB b = new ClassB(group.get(0));
        group.stream().skip(1).forEach(a -> b.appendFieldA(a.getFieldA()));
        return b;
    })
    .collect(Collectors.toSet());
