Map merge-function (shouldn't be called!?) - java

I don't understand why the merger() function is called in the rather short method posted below (the merge function determines what happens to values that are associated with the same key).
The method is supposed to group the list of search configurations by their application and to sort both the map keys (the applications, by name) and the map values (the search configurations, by name). Maybe the second stream isn't straightforward and I could or should use another approach, but nonetheless I want to understand what's happening.
Output is something along the lines:
App1
Search Config Title1
Search Config Title2
App2
Search Config Title
App3
Search Config Title1
Search Config Title2
Search Config Title3
The ApplicationInfo implementation overrides neither int hashCode() nor boolean equals(Object).
I would have thought that the map keys in the second stream are always different, one per list of search configurations. However, in one particular situation the merge function is called, and I don't understand why.
public SortedMap<ApplicationInfo, List<SearchConfigInfo>> groupByApplications(final BusinessLogicProcessingContext ctx,
final List<SearchConfigInfo> searchConfigInfos) {
requireNonNull(ctx, "The processing context must not be null.");
requireNonNull(searchConfigInfos, "The search configuration informations must not be null.");
final String lang;
final RtInfoWithTitleComparator appComp;
lang = ContextLanguage.get(ctx);
appComp = new RtInfoWithTitleComparator(lang);
final Map<ApplicationInfo, List<SearchConfigInfo>> appToSearchConfigs;
appToSearchConfigs = searchConfigInfos.stream()
.collect(groupingBy(searchConfig -> RtCache.getApplication(searchConfig.getApplicationGuid())));
return appToSearchConfigs.entrySet()
.stream()
.collect(toMap(Map.Entry::getKey,
p_entry -> _sortValueList(p_entry.getValue()),
merger(),
() -> new TreeMap<>(appComp)));
}
The general contract of a map is:
"An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value."
That's why I really wonder what happens in this case.
private static BinaryOperator<List<SearchConfigInfo>> merger() {
return (list1, list2) -> { System.out.println(RtCache.getApplication(list1.get(0).getApplicationGuid()).hashCode());
System.out.println(RtCache.getApplication(list2.get(0).getApplicationGuid()).hashCode());
System.out.println(list1.get(0).getApplicationGuid().equals(list2.get(0).getApplicationGuid()));
list1.addAll(list2);
return list1;
};
}
As the simple STDOUT debugging statements show, the two applications have different hash codes and their GUIDs are not equal to each other.

Note that you're supplying a TreeMap as the result of the supplier function given to the Collectors.toMap() method (that's the last argument):
toMap(Map.Entry::getKey,
p_entry -> _sortValueList(p_entry.getValue()),
merger(),
() -> new TreeMap<>(appComp)));
(A supplier function provides the collection that the collector will use to contain the results - so in this case it always supplies a TreeMap.)
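A TreeMap does not use equals() or hashCode() to tell keys apart. It compares keys with the Comparator passed to its constructor (appComp here), or with compareTo() if no comparator is given, and it treats two keys as the same key whenever that comparison returns 0. That is why you can get a key collision in this case: collisions are determined by the map supplied to the collector, not by the map the entries come from. Presumably two different applications have the same title in the context language, so appComp considers them equal.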
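The effect can be reproduced in isolation. The following is a minimal sketch, not the original code: App is a hypothetical stand-in for ApplicationInfo, and the title-based comparator is an assumption about what RtInfoWithTitleComparator does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TreeMapMergeDemo {
    // Hypothetical stand-in for ApplicationInfo: equals() and hashCode() are not overridden.
    static final class App {
        final String guid;
        final String title;
        App(String guid, String title) { this.guid = guid; this.title = title; }
    }

    /** Collects into a TreeMap ordered by keyOrder and returns how often the merge function ran. */
    static int countMerges(List<App> apps, Comparator<App> keyOrder) {
        int[] merges = {0};
        SortedMap<App, List<String>> result = apps.stream().collect(Collectors.toMap(
                app -> app,
                app -> new ArrayList<String>(Arrays.asList(app.guid)),
                (a, b) -> { merges[0]++; a.addAll(b); return a; },
                () -> new TreeMap<App, List<String>>(keyOrder)));
        return merges[0];
    }

    public static void main(String[] args) {
        // Two distinct objects (different identity hash codes) that share a title.
        List<App> apps = Arrays.asList(new App("guid-1", "Orders"), new App("guid-2", "Orders"));
        Comparator<App> byTitle = Comparator.comparing((App a) -> a.title);
        System.out.println(countMerges(apps, byTitle));                            // prints 1
        System.out.println(countMerges(apps, byTitle.thenComparing(a -> a.guid))); // prints 0
    }
}
```

If the goal is to keep both applications as separate keys, one option is a tie-breaker on a unique field such as the GUID, as in the second call; whether that fits depends on how the real comparator is used elsewhere.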

Related

Java Stream - Retrieving repeated records from CSV

I searched the site and didn't find anything similar. I'm new to Java streams, but I understand that a stream can replace a loop. I would like to know if there is a way to filter a CSV file using a stream, as shown below, so that only the repeated records are included in the result, grouped by the Center field.
Initial CSV file
Final result
In addition, the same pair cannot appear in the final result inversely, as shown in the table below:
This shouldn't happen
Is there a way to do it using stream and grouping at the same time, since theoretically, two loops would be needed to perform the task?
Thanks in advance.
You can do it in one pass as a stream with O(n) efficiency:
class PersonKey {
// have a field for every column that is used to detect duplicates
String center, name, mother, birthdate;
public PersonKey(String line) {
// implement String constructor
}
// implement equals and hashCode using all fields
}
List<String> lines; // the input
Set<PersonKey> seen = new HashSet<>();
List<String> unique = lines.stream()
.filter(p -> !seen.add(new PersonKey(p))) // keeps only lines whose key was already seen
.distinct()
.collect(toList());
The trick here is that a HashSet has constant time operations and its add() method returns false if the value being added is already in the set, true otherwise.
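As a runnable illustration of the add() trick, here is a small sketch. It assumes a column layout of id,center,name,mother,birthdate (the real CSV layout is not shown in the question) and uses the text after the first comma as the key instead of a PersonKey class. The stateful filter is only safe on a sequential stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class RepeatedLines {
    /** Returns the lines whose key already occurred earlier in the input. */
    static List<String> repeated(List<String> lines) {
        Set<String> seen = new HashSet<>();
        return lines.stream()
                .filter(line -> !seen.add(keyOf(line))) // add() returns false on a repeat
                .collect(Collectors.toList());
    }

    // Assumption: the first column is an ID that is ignored when comparing records.
    static String keyOf(String line) {
        return line.substring(line.indexOf(',') + 1);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,A,Ann,Mary,2000-01-01",
                "2,A,Bob,Sue,1999-05-05",
                "3,A,Ann,Mary,2000-01-01");
        System.out.println(repeated(lines)); // prints [3,A,Ann,Mary,2000-01-01]
    }
}
```

Note that only the second and later occurrences are returned; the first occurrence of each repeated record is not part of the result.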
What I understood from your examples is that you consider an entry a duplicate if all of its attributes except the ID have the same values as those of another entry. You can use anyMatch for this:
list.stream().filter(x ->
list.stream().anyMatch(y -> isDuplicate(x, y))).collect(Collectors.toList())
So what does the isDuplicate(x,y) do?
This returns a boolean. You can check whether all the entries have same value except the id in this method:
private boolean isDuplicate(CsvEntry x, CsvEntry y) {
return !x.getId().equals(y.getId())
&& x.getName().equals(y.getName())
&& x.getMother().equals(y.getMother())
&& x.getBirth().equals(y.getBirth());
}
I've assumed that all the fields are Strings; change the checks according to the actual types. This gives you the duplicate entries with their corresponding IDs.

Aggregate values and convert into single type within the same Java stream

I have a class with a collection of Seed elements. One of Seed's methods has the return type Optional<Pair<Boolean, String>>.
I'm trying to loop over all seeds, find out whether any boolean value is true and, at the same time, create a set with all the String values. For instance, my input is in the form Optional<Pair<Boolean, String>> and the output should be Optional<Signal>, where Signal is like:
class Signal {
public boolean exposure;
public Set<String> alarms;
// constructor and getters (can add anything to this class, it's just a bag)
}
This is what I currently have that works:
// Seed::hadExposure yields Optional<Pair<Boolean, String>> where Pair have key/value or left/right
public Optional<Signal> withExposure() {
if (seeds.stream().map(Seed::hadExposure).flatMap(Optional::stream).findAny().isEmpty()) {
return Optional.empty();
}
final var exposure = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.anyMatch(Pair::getLeft);
final var alarms = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.map(Pair::getRight)
.filter(Objects::nonNull)
.collect(Collectors.toSet());
return Optional.of(new Signal(exposure, alarms));
}
Now I have time to make it better. Seed::hadExposure could become an expensive call, so I was trying to see if I could do all of this in only one pass. I've tried reduce and collectors (Collectors.collectingAndThen, Collectors.partitioningBy, etc.), following some suggestions from previous questions, but nothing has worked so far.
It's possible to do this in a single stream() expression using map to convert the non-empty exposure to a Signal and then a reduce to combine the signals:
Signal signal = exposures.stream()
.map(exposure ->
new Signal(
exposure.getLeft(),
exposure.getRight() == null
? Collections.emptySet()
: Collections.singleton(exposure.getRight())))
.reduce(
new Signal(false, new HashSet<>()),
(leftSig, rightSig) -> {
HashSet<String> alarms = new HashSet<>();
alarms.addAll(leftSig.alarms);
alarms.addAll(rightSig.alarms);
return new Signal(
leftSig.exposure || rightSig.exposure, alarms);
});
However, if you have a lot of alarms it would be expensive because it creates a new Set and adds the new alarms to the accumulated alarms for each exposure in the input.
In a language that was designed from the ground-up to support functional programming, like Scala or Haskell, you'd have a Set data type that would let you efficiently create a new set that's identical to an existing set but with an added element, so there'd be no efficiency worries:
filteredSeeds.foldLeft((false, Set[String]())) { (result, exposure) =>
(result._1 || exposure.getLeft, result._2 + exposure.getRight)
}
But Java doesn't come with anything like that out of the box.
You could create just a single Set for the result and mutate it in your stream's reduce expression, but some would regard that as poor style because you'd be mixing a functional paradigm (map/reduce over a stream) with a procedural one (mutating a set).
Personally, in Java, I'd just ditch the functional approach and use a for loop in this case. It'll be less code, more efficient, and IMO clearer.
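A minimal sketch of that loop, under stated assumptions: Pair and Signal here are hypothetical stand-ins for the classes in the question, and the input is the list of results of calling hadExposure once per seed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class SignalLoop {
    // Hypothetical stand-in for the question's Pair<Boolean, String>.
    static final class Pair {
        final boolean left;
        final String right;
        Pair(boolean left, String right) { this.left = left; this.right = right; }
    }

    // Hypothetical stand-in for the question's Signal class.
    static final class Signal {
        final boolean exposure;
        final Set<String> alarms;
        Signal(boolean exposure, Set<String> alarms) { this.exposure = exposure; this.alarms = alarms; }
    }

    /** One pass: empty if no result is present, otherwise the combined signal. */
    static Optional<Signal> withExposure(List<Optional<Pair>> results) {
        boolean anyPresent = false;
        boolean exposure = false;
        Set<String> alarms = new HashSet<>();
        for (Optional<Pair> result : results) {
            if (!result.isPresent()) {
                continue;
            }
            anyPresent = true;
            Pair pair = result.get();
            exposure |= pair.left;
            if (pair.right != null) {
                alarms.add(pair.right); // null alarm strings are skipped, as in the question
            }
        }
        return anyPresent ? Optional.of(new Signal(exposure, alarms)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Optional<Pair>> results = Arrays.asList(
                Optional.<Pair>empty(),
                Optional.of(new Pair(false, "a")),
                Optional.of(new Pair(true, null)));
        Signal signal = withExposure(results).get();
        System.out.println(signal.exposure + " " + signal.alarms); // prints true [a]
    }
}
```

In the real class the loop would iterate over the seeds and call hadExposure inside the loop body, so the expensive method still runs only once per seed.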
If you have enough space to store an intermediate result, you could do something like:
List<Pair<Boolean, String>> exposures =
seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.collect(Collectors.toList());
Then you'd only be calling the expensive Seed::hadExposure method once per item in the input list.

Add object X to collection only if collection does not contain object with one matching property

I just started messing around with Java streams and I wrote something like this:
List<Device> devicesToDelete = new ArrayList<>();
List<Device> oldDeviceList = getCurrentDevices();
for (Device deviceFromOldList : oldDeviceList)
{
// part to simplify
boolean deviceNotExistOnDeleteList =
devicesToDelete.stream().noneMatch(nd -> nd.id == deviceFromOldList.id);
if (deviceNotExistOnDeleteList) {
devicesToDelete.add(deviceFromOldList);
}
// part to simplify end
}
Can it be simplified even more?
I'm not using Set because my Device class .equals() implementation compares all fields in that class. And here I need to compare only id field.
Just use a Map
Map<Object, Device> devicesToDelete = new HashMap<>();
List<Device> oldDeviceList = getCurrentDevices();
for(Device deviceFromOldList: oldDeviceList) {
devicesToDelete.putIfAbsent(deviceFromOldList.id, deviceFromOldList);
}
// in case you need a Collection:
Collection<Device> allDevicesToDelete = devicesToDelete.values();
putIfAbsent will only store the mapping if the key is not already present. This will get you the performance of hashing while only considering the ID.
You may change the type argument Object in Map<Object,Device> to whatever type your ID has, though it doesn’t matter for the operation, if all you need at the end, is the Collection<Device>.
You can use a Stream, e.g.
Map<Object, Device> devicesToDelete = getCurrentDevices().stream()
.collect(Collectors.toMap(
deviceFromOldList -> deviceFromOldList.id, Function.identity(), (a,b) -> a));
though, it’s debatable whether this is a necessary change. The loop is not bad.
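One difference from the original loop is worth a sketch: a HashMap does not keep the devices in list order. Assuming the order matters, a LinkedHashMap (my substitution, not part of the answer above) keeps the first device per id in encounter order. Device here is a hypothetical minimal class with an int id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstPerId {
    // Hypothetical minimal Device; the real class has more fields.
    static final class Device {
        final int id;
        final String name;
        Device(int id, String name) { this.id = id; this.name = name; }
    }

    /** Keeps the first device seen for each id, in the order of the input list. */
    static List<Device> firstPerId(List<Device> devices) {
        Map<Integer, Device> byId = new LinkedHashMap<>();
        for (Device device : devices) {
            byId.putIfAbsent(device.id, device); // no effect if the id is already present
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Device> devices = Arrays.asList(
                new Device(1, "a"), new Device(2, "b"), new Device(1, "c"));
        for (Device device : firstPerId(devices)) {
            System.out.println(device.id + " " + device.name); // prints "1 a" then "2 b"
        }
    }
}
```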

I need a fast approach to search substrings

I'm reworking a framework and I need a fast algorithm to search for a substring in a collection of strings.
In short, a class is alerted when any event from a child association is triggered.
The event contains a path which is the path from the current class to the event that was triggered (usually a property change).
Each class has static bindings to paths that are loaded in a collection.
A binding consists of the actual path and a set of property names that are bound to that path.
When a class receives an event, it needs to check whether any property name is bound to the event's path and trigger something on every property that has a binding.
Now, I'm only looking for the best collection type to store these bindings and the best way to search for the event's path within the static bindings.
Right now my implementation is really basic. I am using a HashMap whose keys are the possible paths and whose values are the sets of properties bound to each path.
I am looping through the keyset and I use startsWith with the event's path. (The event's path needs to be a substring of a binding starting from index 0)
For example, a path would look like this: "association1.association2.propertyInAssociation2" or "association1.association2.association3"
The binding map would look like this (it is not actually initialised like this, it's just an example):
HashMap<String, Set<String>> bindings = new HashMap<>();
{
bindings.put("association1.association2.propertyInAssociation2", new HashSet<>());
bindings.get("association1.association2.propertyInAssociation2").add("property1");
bindings.get("association1.association2.propertyInAssociation2").add("property2");
bindings.get("association1.association2.propertyInAssociation2").add("property3");
bindings.put("association1.association2.association3.propertyInAssociation3", new HashSet<>());
bindings.get("association1.association2.association3.propertyInAssociation3").add("property4");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property5");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property6");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property7");
}
So for a class with these bindings, an event with the path "association1.association2.association3.propertyInAssociation3" and an event with the path "association1.association2.association3"
would both need to trigger something on property4, property5, property6 and property7.
As I said, what I need is the most efficient way to find which properties (if any) are bound to an event's path.
I use Java 8 so I don't mind using lambdas or whatever is available.
Reworking the bindings as a collection of strings in any other format is not out of the question either, if it helps.
Thanks a lot!
Since you say
I am looping through the keyset and I use startsWith with the event's path. (The event's path needs to be a substring of a binding starting from index 0)
You should consider using a different data structure. A HashMap provides for efficient whole-key lookups, but it doesn't help much at all for partial-key lookups. You could consider instead using a SortedMap such as TreeMap. For String keys, SortedMap.tailMap() or SortedMap.subMap() will help you navigate directly to the keys you're looking for, if they are present.
Of course, insertions, deletions, and whole-key lookups are less efficient in a TreeMap than in a HashMap (on average); this is a tradeoff against the much better efficiency of key substring searching.
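A sketch of the subMap() idea with natural String ordering: all keys that start with a given prefix form one contiguous range in a sorted map. The class and method names are hypothetical, and the upper bound assumes that no path contains the character U+FFFF.

```java
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class PrefixLookup {
    /** Collects the properties of every binding whose path starts with eventPath. */
    static Set<String> boundProperties(NavigableMap<String, Set<String>> bindings, String eventPath) {
        // Range from eventPath (inclusive) to eventPath + '\uffff' (exclusive).
        SortedMap<String, Set<String>> range =
                bindings.subMap(eventPath, eventPath + Character.MAX_VALUE);
        Set<String> result = new TreeSet<>();
        for (Set<String> properties : range.values()) {
            result.addAll(properties);
        }
        return result;
    }

    /** The example bindings from the question. */
    static NavigableMap<String, Set<String>> sample() {
        NavigableMap<String, Set<String>> bindings = new TreeMap<>();
        bindings.put("association1.association2.propertyInAssociation2",
                new TreeSet<>(Arrays.asList("property1", "property2", "property3")));
        bindings.put("association1.association2.association3.propertyInAssociation3",
                new TreeSet<>(Arrays.asList("property4", "property5", "property6", "property7")));
        return bindings;
    }

    public static void main(String[] args) {
        // prints [property4, property5, property6, property7]
        System.out.println(boundProperties(sample(), "association1.association2.association3"));
    }
}
```

The lookup costs O(log n) to find the start of the range plus the number of matching entries, instead of a startsWith check on every key.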
I would suggest a Stream API approach:
String path = "association1.association2.association3";
List<Map.Entry<String, Set<String>>> result =
bindings.entrySet()
.stream()
.filter(e -> e.getKey().contains(path))
.collect(Collectors.toList());
Thanks for all the replies, but I've changed my approach.
I will still use a HashMap, but instead of adding:
"association1.association2.property"
and trying to match partial keys, I will add:
"association1"
"association1.association2"
"association1.association2.property"
This way I can efficiently use the hash and since the bindings are static and generated only once for each class type, changing the algorithm of the generation has no performance cost at all.
Thanks again for all your answers.
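The prefix-expansion approach described above could be sketched like this. PrefixIndex, bind and lookup are hypothetical names; the sketch registers each binding under its full path and under every dot-separated prefix, so a lookup is a single hash access.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PrefixIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Registers the properties under the full path and under each dot-separated prefix. */
    void bind(String path, String... properties) {
        int end = path.indexOf('.');
        while (true) {
            String prefix = end < 0 ? path : path.substring(0, end);
            index.computeIfAbsent(prefix, key -> new HashSet<>()).addAll(Arrays.asList(properties));
            if (end < 0) {
                break; // the full path has been registered
            }
            end = path.indexOf('.', end + 1);
        }
    }

    /** Returns the properties bound to the event path, or an empty set. */
    Set<String> lookup(String eventPath) {
        return index.getOrDefault(eventPath, Collections.emptySet());
    }

    public static void main(String[] args) {
        PrefixIndex idx = new PrefixIndex();
        idx.bind("association1.association2.propertyInAssociation2", "property1");
        idx.bind("association1.association2.association3.propertyInAssociation3", "property4");
        System.out.println(idx.lookup("association1.association2.association3")); // prints [property4]
    }
}
```

Unlike startsWith, this matches whole path segments only: an event path such as "association1.assoc" finds nothing.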
My suggestion is to use Parallel Stream or to implement your own Map.
Here are the tests:
Solution proposed by John (TreeMap)
The best: 6 milliseconds
String path = "association1.association2.association3";
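// Note: for two strings where neither is a prefix of the other, this comparator returns -1
// in both directions, so it is not a consistent ordering.
TreeMap<String, Set<String>> bindings2 = new TreeMap<String, Set<String>>(new Comparator<String>() {
@Override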
public int compare(String o1, String o2) {
if (o1.equals(o2))
return 0;
if (o1.startsWith(o2))
return 1;
return -1;
}
});
{
bindings2.put("association1.association2.propertyInAssociation2", new HashSet<>());
bindings2.get("association1.association2.propertyInAssociation2").add("property1");
bindings2.get("association1.association2.propertyInAssociation2").add("property2");
bindings2.get("association1.association2.propertyInAssociation2").add("property3");
bindings2.put("association1.association2.association3.propertyInAssociation3", new HashSet<>());
bindings2.get("association1.association2.association3.propertyInAssociation3").add("property4");
bindings2.get("association1.association2.association3.propertyInAssociation3").add("property5");
bindings2.get("association1.association2.association3.propertyInAssociation3").add("property6");
bindings2.get("association1.association2.association3.propertyInAssociation3").add("property7");
}
// test 1
long time = System.currentTimeMillis();
Object result1 = bindings2.tailMap(path).entrySet().stream().filter(e -> e.getKey().contains(path))
.collect(Collectors.toList());
System.out.println(System.currentTimeMillis() - time);
System.out.println(result1);
Solution proposed by Stefan (Stream)
The best: 16 milliseconds
HashMap<String, Set<String>> bindings = new HashMap<>();
{
bindings.put("association1.association2.propertyInAssociation2", new HashSet<>());
bindings.get("association1.association2.propertyInAssociation2").add("property1");
bindings.get("association1.association2.propertyInAssociation2").add("property2");
bindings.get("association1.association2.propertyInAssociation2").add("property3");
bindings.put("association1.association2.association3.propertyInAssociation3", new HashSet<>());
bindings.get("association1.association2.association3.propertyInAssociation3").add("property4");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property5");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property6");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property7");
}
// test 1
long time = System.currentTimeMillis();
String path = "association1.association2.association3";
List<Map.Entry<String, Set<String>>> result = bindings.entrySet().stream()
.filter(e -> e.getKey().contains(path)).collect(Collectors.toList());
System.out.println(System.currentTimeMillis() - time);
result.forEach(System.out::println);
Solution proposed by Me (parallel Stream)
The best: 9 milliseconds
HashMap<String, Set<String>> bindings = new HashMap<>();
{
bindings.put("association1.association2.propertyInAssociation2", new HashSet<>());
bindings.get("association1.association2.propertyInAssociation2").add("property1");
bindings.get("association1.association2.propertyInAssociation2").add("property2");
bindings.get("association1.association2.propertyInAssociation2").add("property3");
bindings.put("association1.association2.association3.propertyInAssociation3", new HashSet<>());
bindings.get("association1.association2.association3.propertyInAssociation3").add("property4");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property5");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property6");
bindings.get("association1.association2.association3.propertyInAssociation3").add("property7");
}
// test 1
long time = System.currentTimeMillis();
String path = "association1.association2.association3";
List<Map.Entry<String, Set<String>>> result = bindings.entrySet().stream().parallel()
.filter(e -> e.getKey().contains(path)).collect(Collectors.toList());
System.out.println(System.currentTimeMillis() - time);
result.forEach(System.out::println);
Tests are not reliable with so little data. Personally I prefer the solution proposed by Fred.
UPDATE: as suggested by Dodgy, here you can find a more formal test using JMH:
https://github.com/venergiac/benchmark-jmh
git clone https://github.com/venergiac/benchmark-jmh.git
mvn install
java -jar target\benchmark-0.0.1-SNAPSHOT.jar
The tests revealed better throughput for the parallel stream with a HashMap, but these tests should be run in a more formal environment and for a longer time.

Removing duplicates from list where duplication logic is based on custom field

I have a list of the following objects:
public class TheInfo {
private int id;
private String fieldOne;
private String fieldTwo;
private String fieldThree;
private String fieldFour;
//Standard Getters, Setters, Equals, Hashcode, ToString methods
}
The list needs to be processed in such a way that:
Among duplicates, the one with the minimum ID is selected and the others are removed. In this particular case, entries are considered duplicates when their values of fieldOne and fieldTwo are equal.
The concatenated value of fieldThree and fieldFour is obtained.
I want to process this list using Java 8 Streams. Currently I don't know how to remove duplicates based on custom fields. I think I can't use distinct() because I can't change the equals/hashCode methods, as this logic applies only to this specific case.
How can I achieve this?
Assuming you have
List<TheInfo> list;
you can use
List<TheInfo> result = new ArrayList<>(list.stream().collect(
Collectors.groupingBy(info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
Collectors.collectingAndThen(
Collectors.minBy(Comparator.comparingInt(TheInfo::getId)),
Optional::get))).values());
The groupingBy collector produces groups according to a function whose results determine the equality. A List already implements equals and hashCode for a sequence of values, so Arrays.asList(info.getFieldOne(), info.getFieldTwo()) produces a suitable key. In Java 9, you would most probably use List.of(info.getFieldOne(), info.getFieldTwo()) instead.
The second argument to groupingBy is another collector determining how to process the groups, Collectors.minBy(…) will fold them to the minimum element according to a comparator and Comparator.comparingInt(TheInfo::getId) is the right comparator for getting the element with the minimum id.
Unfortunately, the minBy collector produces an Optional that would be empty if there are no elements, but since we know that the groups can’t be empty (groups without elements wouldn’t be created in the first place), we can unconditionally call get on the optional to retrieve the actual value. This is what wrapping this collector in Collectors.collectingAndThen(…, Optional::get) does.
Now, the result of the grouping is a Map mapping from the keys created by the function to the TheInfo instance with the minimum id. Calling values() on the Map gives us a Collection<TheInfo>, and since you want a List, a final new ArrayList<>(collection) will produce it.
Thinking about it, this might be one of the cases, where the toMap collector is simpler to use, especially as the merging of the group elements doesn’t benefit from mutable reduction:
List<TheInfo> result = new ArrayList<>(list.stream().collect(
Collectors.toMap(
info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
Function.identity(),
BinaryOperator.minBy(Comparator.comparingInt(TheInfo::getId)))).values());
This uses the same function for determining the key, an identity function for determining the single value, and a reduction function that is called when a group has more than one element. This is again a function returning the minimum according to the ID comparator.
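For reference, a self-contained version of the toMap variant might look like the sketch below. TheInfo is trimmed to the fields the sketch needs, and the LinkedHashMap supplier is my addition (an assumption that the original encounter order of the groups should be kept).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class DedupByFields {
    // Hypothetical trimmed-down TheInfo with only the fields this sketch needs.
    static final class TheInfo {
        final int id;
        final String fieldOne;
        final String fieldTwo;
        TheInfo(int id, String fieldOne, String fieldTwo) {
            this.id = id;
            this.fieldOne = fieldOne;
            this.fieldTwo = fieldTwo;
        }
        int getId() { return id; }
    }

    /** Keeps, per (fieldOne, fieldTwo) pair, the element with the smallest id. */
    static List<TheInfo> dedup(List<TheInfo> list) {
        Map<List<String>, TheInfo> byKey = list.stream().collect(Collectors.toMap(
                info -> Arrays.asList(info.fieldOne, info.fieldTwo),
                Function.identity(),
                BinaryOperator.minBy(Comparator.comparingInt(TheInfo::getId)),
                LinkedHashMap::new)); // keeps groups in first-seen order
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<TheInfo> list = Arrays.asList(
                new TheInfo(3, "a", "b"), new TheInfo(1, "a", "b"), new TheInfo(2, "a", "c"));
        for (TheInfo info : dedup(list)) {
            System.out.println(info.id); // prints 1 then 2
        }
    }
}
```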
Using streams, you can process it using just the collector, if you provide it with proper classifier:
private static <T> T min(T first, T second, Comparator<? super T> cmp) {
return cmp.compare(first, second) <= 0 ? first : second;
}
private static void process(Collection<TheInfo> data) {
Comparator<TheInfo> cmp = Comparator.comparing(info -> info.id);
data.stream()
.collect(Collectors.toMap(
info -> Arrays.asList(info.fieldOne, info.fieldTwo), // Your classifier uses a tuple. Closest thing in JDK currently would be a list or some custom class. I chose List for brevity.
info -> info, // or Function.identity()
(a, b) -> min(a, b, cmp) // what do we do with duplicates. Currently we take min according to Comparator.
));
}
The above stream will be collected into a Map<List<String>, TheInfo>, which contains the minimal element for each key, the keys being lists of two strings. You can extract map.values() and return them in a new collection or use them however you need.
