Java Stream - Retrieving repeated records from CSV


I searched the site and didn't find anything similar. I'm new to Java streams, but I understand that they can replace a loop. However, I would like to know whether there is a way to filter a CSV file using a stream, as shown below, so that only the repeated records are included in the result, grouped by the Center field.
Initial CSV file
Final result
In addition, the same pair must not appear in the final result in reverse order, as shown in the table below:
This shouldn't happen
Is there a way to do it using stream and grouping at the same time, since theoretically, two loops would be needed to perform the task?
Thanks in advance.

You can do it in a single pass over the stream, in O(n) time:
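import java.util.*;
import java.util.stream.Collectors;
class PersonKey {
    // one field for every column used to detect duplicates (the id is deliberately left out)
    final String center, name, mother, birthdate;
    PersonKey(String line) {
        // assumes the column order id,center,name,mother,birthdate and no quoted commas
        String[] f = line.split(",");
        center = f[1]; name = f[2]; mother = f[3]; birthdate = f[4];
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof PersonKey)) return false;
        PersonKey k = (PersonKey) o;
        return center.equals(k.center) && name.equals(k.name)
                && mother.equals(k.mother) && birthdate.equals(k.birthdate);
    }
    @Override public int hashCode() { return Objects.hash(center, name, mother, birthdate); }
}
List<String> lines; // the input
Set<PersonKey> seen = new HashSet<>();
List<String> repeated = lines.stream()
        .filter(line -> !seen.add(new PersonKey(line))) // keep lines whose key was already seen
        .collect(Collectors.toList());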
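The trick here is that a HashSet offers constant-time add() (assuming a well-distributed hashCode), and add() returns false if the value being added is already in the set, true otherwise. Note that the filter therefore keeps only the second and later occurrences of each repeated record, not the first one, and that this stateful filter is only safe on a sequential stream.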
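A minimal runnable sketch of the same HashSet trick follows. It assumes (this is not stated in the question) that each CSV line has the layout id,center,name,mother,birthdate with no quoted commas, and it uses a List of the non-id columns as the key instead of a dedicated PersonKey class, since List already implements equals and hashCode. RepeatedRecords, keyOf and laterOccurrences are hypothetical names.

```java
import java.util.*;
import java.util.stream.Collectors;

class RepeatedRecords {

    // Hypothetical key: every column except the id (column 0).
    // List implements equals/hashCode element by element, so it works as a HashSet key.
    static List<String> keyOf(String line) {
        String[] f = line.split(",");
        return Arrays.asList(f).subList(1, f.length);
    }

    // Keeps every line whose key has already been seen (2nd, 3rd, ... occurrence).
    // The filter mutates 'seen', so use this on a sequential stream only.
    static List<String> laterOccurrences(List<String> lines) {
        Set<List<String>> seen = new HashSet<>();
        return lines.stream()
                .filter(line -> !seen.add(keyOf(line))) // add() returns false for a repeat
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,C1,Ann,Mary,1990-01-01",
                "2,C1,Ann,Mary,1990-01-01",
                "3,C2,Bob,Jane,1985-05-05");
        System.out.println(laterOccurrences(lines)); // prints [2,C1,Ann,Mary,1990-01-01]
    }
}
```

If you need every copy of a repeated record, including the first one, a single pass is not enough: one option is to count the keys first and then keep the lines whose key count is greater than 1.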

From your examples, I understand that you consider an entry a duplicate if all its attributes have the same values except the ID. You can use anyMatch for this:
List<CsvEntry> duplicates = list.stream()
        .filter(x -> list.stream().anyMatch(y -> isDuplicate(x, y)))
        .collect(Collectors.toList());
So what does the isDuplicate(x,y) do?
It returns a boolean: true when the two entries have different IDs but the same value in every other field:
private boolean isDuplicate(CsvEntry x, CsvEntry y) {
return !x.getId().equals(y.getId())
&& x.getName().equals(y.getName())
&& x.getMother().equals(y.getMother())
&& x.getBirth().equals(y.getBirth());
}
I've assumed all the fields are Strings; change the checks according to their actual types. This gives you all the duplicate entries together with their IDs. Note that every entry is compared against the whole list, so this approach is O(n²).
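The question also asks for the result to be grouped by Center without the same pair appearing twice in reverse order. One way to sketch that, assuming a hypothetical CsvEntry that also has a center field (the isDuplicate above does not compare centers), is to visit each unordered pair only once by requiring i < j, and then group the pairs by center. CsvEntry, DuplicatePairs and pairsByCenter are illustrative names, and the fields are read directly instead of through getters for brevity.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical record type; "center" is an assumed extra field.
class CsvEntry {
    final String id, center, name, mother, birth;
    CsvEntry(String id, String center, String name, String mother, String birth) {
        this.id = id; this.center = center; this.name = name; this.mother = mother; this.birth = birth;
    }
}

class DuplicatePairs {
    // Same idea as isDuplicate above, but also requiring the same center.
    static boolean isDuplicate(CsvEntry x, CsvEntry y) {
        return !x.id.equals(y.id)
                && x.center.equals(y.center)
                && x.name.equals(y.name)
                && x.mother.equals(y.mother)
                && x.birth.equals(y.birth);
    }

    // Each unordered pair is visited once (i < j), so "1-2" is never also reported as "2-1".
    // The pairs are then grouped by center; still O(n^2) comparisons.
    static Map<String, List<String>> pairsByCenter(List<CsvEntry> list) {
        return IntStream.range(0, list.size()).boxed()
                .flatMap(i -> IntStream.range(i + 1, list.size())
                        .filter(j -> isDuplicate(list.get(i), list.get(j)))
                        .mapToObj(j -> new CsvEntry[] {list.get(i), list.get(j)}))
                .collect(Collectors.groupingBy(p -> p[0].center, TreeMap::new,
                        Collectors.mapping(p -> p[0].id + "-" + p[1].id, Collectors.toList())));
    }
}
```

If a record occurs three times, this reports three pairs (1-2, 1-3, 2-3); whether that is the desired output depends on the expected result table, which is not recoverable here.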

Related

Set value to a string variable, from an object based on the condition using Java Streams

I am new to Java streams. I have a code snippet that I need to rewrite using Java streams. I am trying to set a value to a string based on a condition. I looked for solutions and experimented with anyMatch, but could not get anywhere.
String loadAGENTID = "";
for(ReportGenerationParameter rgp : loadReportTableExt.getReportGenerationParameters()) {
if (rgp.getKey().equalsIgnoreCase(RapFilter.IDUSR)) {
loadAGENTID = rgp.getValue();
}
}
String loadAGENTID is to be used in the code. Any suggestion is welcome. Thank you.
I have tried using Arrays.stream and anyMatch, but no luck so far:
boolean todoName =
Arrays.stream(loadReportTableExt.getReportGenerationParameters())
.anyMatch(item -> item.getKey().equalsIgnoreCase(RapFilter.IDUSR));
if (todoName) {
// want to set the value of the respective object.
loadAGENTID = item.getValue(); // does not compile: item is not in scope here
}

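Use filter to find the matching objects and then findFirst, which returns an Optional holding the first matching element. (If getReportGenerationParameters() returns an array rather than a collection, start with Arrays.stream(...) instead of .stream().) Note that the original loop keeps the last match, whereas this returns the first; the two agree when at most one key matches: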
String loadAGENTID = loadReportTableExt.getReportGenerationParameters()
.stream()
.filter(rgp-> rgp.getKey().equalsIgnoreCase(RapFilter.IDUSR))
.findFirst()
.map(rgp->rgp.getValue()) // returns value from first matching element
.orElse(""); // none of them matched returns default value
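For comparison, here is a self-contained sketch with a hypothetical Param class standing in for ReportGenerationParameter. firstValue mirrors the answer above; lastValue uses reduce((first, second) -> second) to reproduce what the original for-loop computes when several keys match, since the loop keeps overwriting the variable.

```java
import java.util.*;

// Hypothetical stand-in for ReportGenerationParameter: a simple key/value pair.
class Param {
    final String key, value;
    Param(String key, String value) { this.key = key; this.value = value; }
    String getKey() { return key; }
    String getValue() { return value; }
}

class FindValue {
    // First matching value, like the answer; "" when nothing matches.
    static String firstValue(List<Param> params, String wanted) {
        return params.stream()
                .filter(p -> p.getKey().equalsIgnoreCase(wanted))
                .findFirst()
                .map(Param::getValue)
                .orElse("");
    }

    // Last matching value, which is what the original loop ends up with.
    static String lastValue(List<Param> params, String wanted) {
        return params.stream()
                .filter(p -> p.getKey().equalsIgnoreCase(wanted))
                .map(Param::getValue)
                .reduce((first, second) -> second)
                .orElse("");
    }
}
```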

Check if all object entities are equal using Java Streams [duplicate]

I am new to Java 8. I have a list of custom objects of type A, where A is like below:
class A {
int id;
String name;
}
I would like to determine if all the objects in that list have same name. I can do it by iterating over the list and capturing previous and current value of names. In that context, I found How to count number of custom objects in list which have same value for one of its attribute. But is there any better way to do the same in java 8 using stream?

You can map from A to String, apply the distinct intermediate operation, use limit(2) to enable short-circuiting where possible, and then check whether the count is less than or equal to 1. If it is, all objects have the same name; otherwise, they do not.
boolean result = myList.stream()
.map(A::getName)
.distinct()
.limit(2)
.count() <= 1;
With the example shown above, we leverage the limit(2) operation so that we stop as soon as we find two distinct object names.
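A runnable sketch of the distinct/limit(2) check, using a hypothetical A with a getName() getter (the question's A has only fields). Because the count of an empty stream is 0, this version reports true for an empty list.

```java
import java.util.*;

class SameName {
    // Hypothetical version of the question's A, with the getter the answer assumes.
    static class A {
        final int id;
        final String name;
        A(int id, String name) { this.id = id; this.name = name; }
        String getName() { return name; }
    }

    // true when there is at most one distinct name (an empty list counts as true)
    static boolean allSameName(List<A> list) {
        return list.stream()
                .map(A::getName)
                .distinct()
                .limit(2)   // stop once a second distinct name is found
                .count() <= 1;
    }
}
```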

One way is to get the name of the first element and use allMatch to check every element against it:
String firstName = yourListOfAs.get(0).name;
boolean allSameName = yourListOfAs.stream().allMatch(x -> x.name.equals(firstName));
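Note that yourListOfAs.get(0) throws IndexOutOfBoundsException on an empty list, and x.name.equals(...) throws NullPointerException for a null name. A sketch with both cases guarded follows; returning true for an empty list is a design choice that matches allMatch itself, which returns true on an empty stream. AllMatchName is a hypothetical class name.

```java
import java.util.*;

class AllMatchName {
    // Hypothetical version of the question's A.
    static class A {
        final int id;
        final String name;
        A(int id, String name) { this.id = id; this.name = name; }
    }

    static boolean allSameName(List<A> list) {
        if (list.isEmpty()) {
            return true; // get(0) would throw on an empty list
        }
        String firstName = list.get(0).name;
        // Objects.equals also copes with null names
        return list.stream().allMatch(x -> Objects.equals(x.name, firstName));
    }
}
```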

Another way is to count the distinct names:
boolean result = myList.stream().map(A::getName).distinct().count() == 1;
Of course, you need to add a getter for the name field. Note that this returns false for an empty list, because the count is then 0.

One more option is partitioning. Partitioning is a special kind of grouping in which the resulting map always contains exactly two groups, one for true and one for false (either may be empty).
This gives you the matching and the non-matching elements, and therefore their counts:
String firstName = yourListOfAs.get(0).name;
Map<Boolean, List<A>> partitioned = yourListOfAs.stream()
        .collect(partitioningBy(a -> firstName.equals(a.name))); // compare Strings with equals(), not ==
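A runnable sketch of the partitioning approach, again with a hypothetical A. Since partitioningBy always produces both the true and the false key, all names are equal exactly when the false partition is empty. It assumes a non-empty list and a non-null first name.

```java
import java.util.*;
import java.util.stream.Collectors;

class PartitionByName {
    // Hypothetical version of the question's A.
    static class A {
        final int id;
        final String name;
        A(int id, String name) { this.id = id; this.name = name; }
    }

    // true -> same name as the first element, false -> different name
    static Map<Boolean, List<A>> partition(List<A> list) {
        String firstName = list.get(0).name; // assumes a non-empty list, non-null first name
        return list.stream()
                .collect(Collectors.partitioningBy(a -> firstName.equals(a.name)));
    }

    static boolean allSameName(List<A> list) {
        return list.isEmpty() || partition(list).get(false).isEmpty();
    }
}
```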
In Java 9+, you can use takeWhile, which takes elements until the predicate first returns false; this is similar to a break statement in a while loop:
String firstName = yourListOfAs.get(0).name;
List<A> filterList = yourListOfAs.stream()
        .takeWhile(a -> firstName.equals(a.name))
        .collect(Collectors.toList());
if (filterList.size() == yourListOfAs.size()) {
    // all objects have the same name
}
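The takeWhile idea can also be written without collecting into a list, by counting the matching prefix. This is a sketch requiring Java 9+, with a hypothetical A and a guard for the empty list.

```java
import java.util.*;

// Requires Java 9+ (Stream.takeWhile).
class TakeWhileName {
    // Hypothetical version of the question's A.
    static class A {
        final int id;
        final String name;
        A(int id, String name) { this.id = id; this.name = name; }
    }

    static boolean allSameName(List<A> list) {
        if (list.isEmpty()) return true;
        String firstName = list.get(0).name;
        // takeWhile stops at the first element whose name differs
        long prefix = list.stream()
                .takeWhile(a -> firstName.equals(a.name))
                .count();
        return prefix == list.size();
    }
}
```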

Or use groupingBy then check entrySet size.
boolean b = list.stream()
.collect(Collectors.groupingBy(A::getName,
Collectors.toList())).entrySet().size() == 1;

Map merge-function (shouldn't be called!?)

I don't understand why, in the rather short method posted below, the merger() function (which determines what happens to values associated with the same key) is called.
The method is supposed to group the list of search configurations by their application and sort both the map keys (the applications, by name) and the map values (the search configurations, by name). Maybe the second stream isn't straightforward and I could or should use another approach, but nonetheless I want to understand what's happening.
The output is something along these lines:
App1
Search Config Title1
Search Config Title2
App2
Search Config Title
App3
Search Config Title1
Search Config Title2
Search Config Title3
The ApplicationInfo implementation isn't overriding int hashCode() nor boolean equals(Object).
I would have thought that the map keys in the second stream are always different for each list of search configurations. However, in one particular situation the merge function is called, and I don't understand why it is called at all.
public SortedMap<ApplicationInfo, List<SearchConfigInfo>> groupByApplications(final BusinessLogicProcessingContext ctx,
final List<SearchConfigInfo> searchConfigInfos) {
requireNonNull(ctx, "The processing context must not be null.");
requireNonNull(searchConfigInfos, "The search configuration informations must not be null.");
final String lang;
final RtInfoWithTitleComparator appComp;
lang = ContextLanguage.get(ctx);
appComp = new RtInfoWithTitleComparator(lang);
final Map<ApplicationInfo, List<SearchConfigInfo>> appToSearchConfigs;
appToSearchConfigs = searchConfigInfos.stream()
.collect(groupingBy(searchConfig -> RtCache.getApplication(searchConfig.getApplicationGuid())));
return appToSearchConfigs.entrySet()
.stream()
.collect(toMap(Map.Entry::getKey,
p_entry -> _sortValueList(p_entry.getValue()),
merger(),
() -> new TreeMap<>(appComp)));
}
The general contract of a map is:
"An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value."
That's why I really wonder what happens in this case.
private static BinaryOperator<List<SearchConfigInfo>> merger() {
return (list1, list2) -> { System.out.println(RtCache.getApplication(list1.get(0).getApplicationGuid()).hashCode());
System.out.println(RtCache.getApplication(list2.get(0).getApplicationGuid()).hashCode());
System.out.println(list1.get(0).getApplicationGuid().equals(list2.get(0).getApplicationGuid()));
list1.addAll(list2);
return list1;
};
}
As the simple STDOUT debugging statements show, the hash codes are different and the objects are not equal to each other.

Note that you're supplying a TreeMap as the result of the supplier function given to the Collectors.toMap() method (that's the last argument):
toMap(Map.Entry::getKey,
p_entry -> _sortValueList(p_entry.getValue()),
merger(),
() -> new TreeMap<>(appComp)));
(A supplier function provides the collection that the collector will use to contain the results - so in this case it always supplies a TreeMap.)
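A TreeMap decides whether two keys are the same by comparing them (with the supplied comparator, here appComp, or with compareTo() when no comparator is given), not with equals() and hashCode(). That is why you can get a key collision in this case: if appComp returns 0 for two different ApplicationInfo objects (for example, two applications whose titles compare as equal), the TreeMap treats them as one key and toMap calls the merge function. The collisions are determined by the supplier's map, not by the map the entries originate from.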
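The effect can be reproduced with plain String keys. In this sketch, String.CASE_INSENSITIVE_ORDER stands in for appComp: "app" and "APP" are not equal according to equals(), but the comparator returns 0 for them, so toMap calls the merge function when both are put into the supplied TreeMap. TreeMapMergeDemo is a hypothetical name.

```java
import java.util.*;
import java.util.stream.Collectors;

class TreeMapMergeDemo {
    // The comparator of the supplied TreeMap, not equals()/hashCode(), decides key identity.
    static TreeMap<String, List<String>> collect(Map<String, List<String>> source) {
        return source.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> new ArrayList<String>(e.getValue()),
                        (l1, l2) -> { l1.addAll(l2); return l1; }, // called for "app" vs "APP"
                        () -> new TreeMap<String, List<String>>(String.CASE_INSENSITIVE_ORDER)));
    }
}
```

Usage: with a source map (in iteration order) {"app" -> [a1], "APP" -> [a2], "other" -> [o1]}, the result has only two keys, and the value for "app" is [a1, a2].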

Removing duplicates from list where duplication logic is based on custom field

I have a list of objects of the following class:
public class TheInfo {
private int id;
private String fieldOne;
private String fieldTwo;
private String fieldThree;
private String fieldFour;
//Standard Getters, Setters, Equals, Hashcode, ToString methods
}
The list is required to be processed in such a way that
Among duplicates, select the one with minimum ID, and remove others. In this particular case, entries are considered duplicate when their values of fieldOne and fieldTwo are equal.
Get concatenated value of fieldThree and fieldFour.
I want to process this list with Java 8 streams. Currently, I don't know how to remove duplicates based on custom fields. I think I can't use distinct() because I can't change the equals/hashCode methods, as this logic applies only to this specific case.
How can I achieve this?

Assuming you have
List<TheInfo> list;
you can use
List<TheInfo> result = new ArrayList<>(list.stream().collect(
    Collectors.groupingBy(info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
        Collectors.collectingAndThen(
            Collectors.minBy(Comparator.comparingInt(TheInfo::getId)),
            Optional::get))).values());
The groupingBy collector produces groups according to a function whose results determine equality. A list already implements equality for a sequence of values, so Arrays.asList(info.getFieldOne(), info.getFieldTwo()) produces a suitable key. In Java 9+, you would most probably use List.of(info.getFieldOne(), info.getFieldTwo()) instead (note that List.of rejects null elements).
The second argument to groupingBy is another collector determining how to process the groups, Collectors.minBy(…) will fold them to the minimum element according to a comparator and Comparator.comparingInt(TheInfo::getId) is the right comparator for getting the element with the minimum id.
Unfortunately, the minBy collector produces an Optional that would be empty if there are no elements, but since we know that the groups can’t be empty (groups without elements wouldn’t be created in the first place), we can unconditionally call get on the optional to retrieve the actual value. This is what wrapping this collector in Collectors.collectingAndThen(…, Optional::get) does.
Now the result of the grouping is a Map from the keys created by the function to the TheInfo instance with the minimum id. Calling values() on the Map gives us a Collection<TheInfo>, and since you want a List, a final new ArrayList<>(collection) produces it.
Thinking about it, this might be one of the cases, where the toMap collector is simpler to use, especially as the merging of the group elements doesn’t benefit from mutable reduction:
List<TheInfo> result = new ArrayList<>(list.stream().collect(
    Collectors.toMap(
        info -> Arrays.asList(info.getFieldOne(), info.getFieldTwo()),
        Function.identity(),
        BinaryOperator.minBy(Comparator.comparingInt(TheInfo::getId)))).values());
This uses the same function to determine the key, the identity function to determine the value, and a reduction (merge) function that is called only when a group has more than one element. That merge function again returns the minimum according to the id comparator.
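The question's second requirement (getting the concatenated value of fieldThree and fieldFour) can be chained onto the toMap result. This sketch assumes that "concatenated" means plain string concatenation, uses a hypothetical minimal TheInfo, and supplies a LinkedHashMap (an extra choice, not part of the answer) so that the results keep the order in which each key was first seen.

```java
import java.util.*;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class DedupInfo {
    // Hypothetical minimal TheInfo with only the fields used here.
    static class TheInfo {
        final int id;
        final String fieldOne, fieldTwo, fieldThree, fieldFour;
        TheInfo(int id, String f1, String f2, String f3, String f4) {
            this.id = id; fieldOne = f1; fieldTwo = f2; fieldThree = f3; fieldFour = f4;
        }
        int getId() { return id; }
    }

    static List<String> process(List<TheInfo> list) {
        // Step 1: keep the element with the minimum id per (fieldOne, fieldTwo) key.
        Map<List<String>, TheInfo> minIdPerKey = list.stream()
                .collect(Collectors.toMap(
                        info -> Arrays.asList(info.fieldOne, info.fieldTwo),
                        Function.identity(),
                        BinaryOperator.minBy(Comparator.comparingInt(TheInfo::getId)),
                        LinkedHashMap::new)); // keeps first-seen key order
        // Step 2 (assumed meaning): concatenate fieldThree and fieldFour.
        return minIdPerKey.values().stream()
                .map(info -> info.fieldThree + info.fieldFour)
                .collect(Collectors.toList());
    }
}
```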

With streams, you can do this with just the collector, if you provide it with a proper classifier:
private static <T> T min(T first, T second, Comparator<? super T> cmp) {
    return cmp.compare(first, second) <= 0 ? first : second;
}
private static Map<List<String>, TheInfo> process(Collection<TheInfo> data) {
    Comparator<TheInfo> cmp = Comparator.comparingInt(info -> info.id);
    return data.stream()
        .collect(Collectors.toMap(
            info -> Arrays.asList(info.fieldOne, info.fieldTwo), // Your classifier uses a tuple. The closest thing in the JDK is a List or a custom class; I chose List for brevity.
            info -> info, // or Function.identity()
            (a, b) -> min(a, b, cmp) // what to do with duplicates: here we take the min according to the comparator
        ));
}
The above stream is collected into a Map<List<String>, TheInfo>, which maps each two-string key to its minimal element. You can extract map.values() and return them in a new collection, or use them however you need.

How to iterate nested for loops referring to parent elements using Java 8 streams?

I want to iterate nested lists using java8 streams, and extract some results of the lists on first match.
Unfortunately, I also have to get values from the parent element if a child element matches the filter.
How could I do this?
java7
Result result = new Result();
//find the first match and populate the result object.
for (FirstNode first : response.getFirstNodes()) {
for (SndNode snd : first.getSndNodes()) {
if (snd.isValid()) {
result.setKey(first.getKey());
result.setContent(snd.getContent());
return;
}
}
}
java8
response.getFirstNodes().stream()
.flatMap(first -> first.getSndNodes().stream())
.filter(snd -> snd.isValid())
.findFirst()
.ifPresent(???); //cannot access snd.getContent() here

When you need both values and want to use flatMap (as required when you want to perform a short-circuit operation like findFirst), you have to map to an object holding both values:
response.getFirstNodes().stream()
.flatMap(first->first.getSndNodes().stream()
.map(snd->new AbstractMap.SimpleImmutableEntry<>(first, snd)))
.filter(e->e.getValue().isValid())
.findFirst().ifPresent(e-> {
result.setKey(e.getKey().getKey());
result.setContent(e.getValue().getContent());
});
In order to use standard classes only, I use a Map.Entry as Pair type whereas a real Pair type might look more concise.
In this specific use case, you can move the filter operation into the inner stream:
response.getFirstNodes().stream()
.flatMap(first->first.getSndNodes().stream()
.filter(snd->snd.isValid())
.map(snd->new AbstractMap.SimpleImmutableEntry<>(first, snd)))
.findFirst().ifPresent(e-> {
result.setKey(e.getKey().getKey());
result.setContent(e.getValue().getContent());
});
which has the neat effect that a Map.Entry instance is created only for the one matching item. (At least it should be: the current flatMap implementation is not as lazy as it should be, but even then it creates fewer objects than the first variant.)
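A self-contained sketch of the second variant, with hypothetical minimal FirstNode and SndNode classes standing in for the question's types. It returns the (parent, child) pair of the first valid child instead of filling a Result object, so the caller can decide what to do with it.

```java
import java.util.*;
import java.util.AbstractMap.SimpleImmutableEntry;

class NestedFirstMatch {
    // Hypothetical minimal versions of the question's node types.
    static class SndNode {
        final String content;
        final boolean valid;
        SndNode(String content, boolean valid) { this.content = content; this.valid = valid; }
        String getContent() { return content; }
        boolean isValid() { return valid; }
    }

    static class FirstNode {
        final String key;
        final List<SndNode> sndNodes;
        FirstNode(String key, List<SndNode> sndNodes) { this.key = key; this.sndNodes = sndNodes; }
        String getKey() { return key; }
        List<SndNode> getSndNodes() { return sndNodes; }
    }

    // Filters in the inner stream, so an entry is built only for matching children.
    static Optional<SimpleImmutableEntry<FirstNode, SndNode>> firstValid(List<FirstNode> firstNodes) {
        return firstNodes.stream()
                .flatMap(first -> first.getSndNodes().stream()
                        .filter(SndNode::isValid)
                        .map(snd -> new SimpleImmutableEntry<>(first, snd)))
                .findFirst();
    }
}
```

Usage: firstValid(nodes).ifPresent(e -> { result.setKey(e.getKey().getKey()); result.setContent(e.getValue().getContent()); }) fills the result exactly as in the answer.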

It should be like this:
Edit: Thanks Holger for pointing out that the code won't stop at the first valid FirstNode
response.getFirstNodes().stream()
.filter(it -> it.getSndNodes().stream().anyMatch(SndNode::isValid))
.findFirst()
.ifPresent(first -> first.getSndNodes().stream().filter(SndNode::isValid).findFirst().ifPresent(snd -> {
result.setKey(first.getKey());
result.setContent(snd.getContent());
}));
