How to de-dupe a List of Objects?

A Rec object has a member variable called tag which is a String.
If I have a List of Recs, how could I de-dupe the list based on the tag member variable?
I just need to make sure that the List contains only one Rec with each tag value.
Something like the following, but I'm not sure what the best algorithm is to keep track of counts, etc.:
private List<Rec> deDupe(List<Rec> recs) {
for(Rec rec : recs) {
// How to check whether rec.tag exists in another Rec in this List
// and delete any duplicates from the List before returning it to
// the calling method?
}
return recs;
}

Store it temporarily in a HashMap<String,Rec>.
Create a HashMap<String,Rec>. Loop through all of your Rec objects. For each one, if the tag already exists as a key in the HashMap, then compare the two and decide which one to keep. If not, then put it in.
When you're done, the HashMap.values() method will give you all of your unique Rec objects.
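A minimal sketch of the map-based approach. It assumes a simple `Rec` class with a `getTag()` accessor and an extra `payload` field (both hypothetical; the question only says `tag` is a String). The "decide which one to keep" step is a placeholder that keeps the first Rec seen, and a `LinkedHashMap` is used instead of a plain `HashMap` so that `values()` comes back in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TagMapDedupe {
    // Hypothetical stand-in for the question's Rec class.
    static class Rec {
        private final String tag;
        private final String payload;
        Rec(String tag, String payload) { this.tag = tag; this.payload = payload; }
        String getTag() { return tag; }
        String getPayload() { return payload; }
    }

    // Placeholder policy for "decide which one to keep": keep the Rec seen first.
    static Rec choose(Rec existing, Rec candidate) {
        return existing;
    }

    static List<Rec> deDupe(List<Rec> recs) {
        // LinkedHashMap keeps keys in first-insertion order; HashMap works too if order is irrelevant.
        Map<String, Rec> byTag = new LinkedHashMap<>();
        for (Rec rec : recs) {
            Rec existing = byTag.get(rec.getTag());
            if (existing == null) {
                byTag.put(rec.getTag(), rec);
            } else {
                byTag.put(rec.getTag(), choose(existing, rec));
            }
        }
        return new ArrayList<>(byTag.values());
    }

    public static void main(String[] args) {
        List<Rec> recs = Arrays.asList(new Rec("a", "1"), new Rec("b", "2"), new Rec("a", "3"));
        for (Rec r : deDupe(recs)) {
            System.out.println(r.getTag() + "=" + r.getPayload()); // prints a=1, then b=2
        }
    }
}
```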

Try this:
private List<Rec> deDupe(List<Rec> recs) {
    Set<String> tags = new HashSet<String>();
    List<Rec> result = new ArrayList<Rec>();
    for (Rec rec : recs) {
        if (!tags.contains(rec.tag)) {
            result.add(rec);
            tags.add(rec.tag);
        }
    }
    return result;
}
This checks each Rec against a Set of tags. If the set contains the tag already, it is a duplicate and we skip it. Otherwise we add the Rec to our result and add the tag to the set.

This becomes easier if Rec's equals() is based on its tag value (with a matching hashCode()). Then you could write something like:
private List<Rec> deDupe( List<Rec> recs )
{
List<Rec> retList = new ArrayList<Rec>( recs.size() );
for ( Rec rec : recs )
{
if (!retList.contains(rec))
{
retList.add(rec);
}
}
return retList;
}
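For the `contains()` loop to work, `Rec` has to define equality by tag alone. A minimal sketch of such a class, where the field layout and the extra `value` field are assumptions; `hashCode()` is overridden together with `equals()` so the same class also behaves correctly in a `HashSet` or as a `HashMap` key. Note that `retList.contains()` scans the whole list each time, so this loop is O(n²), which is fine for small lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class TagEqualityDedupe {
    // Hypothetical Rec whose identity is defined solely by its tag.
    static final class Rec {
        final String tag;
        final int value; // deliberately ignored by equals/hashCode

        Rec(String tag, int value) { this.tag = tag; this.value = value; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Rec)) return false;
            return Objects.equals(tag, ((Rec) o).tag);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(tag); // consistent with equals: same tag, same hash
        }
    }

    // Same loop as the answer: keep a Rec only if no equal Rec was kept before.
    static List<Rec> deDupe(List<Rec> recs) {
        List<Rec> retList = new ArrayList<>(recs.size());
        for (Rec rec : recs) {
            if (!retList.contains(rec)) {
                retList.add(rec);
            }
        }
        return retList;
    }

    public static void main(String[] args) {
        List<Rec> recs = Arrays.asList(new Rec("x", 1), new Rec("y", 2), new Rec("x", 3));
        System.out.println(deDupe(recs).size()); // prints 2
    }
}
```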

I would do that with Google Collections (now part of Guava). You can use the filter function with a predicate that remembers previous tags and filters out Recs whose tag has been seen before.
Something like this:
private Iterable<Rec> deDupe(List<Rec> recs)
{
Predicate<Rec> filterDuplicatesByTagPredicate = new FilterDuplicatesByTagPredicate();
return Iterables.filter(recs, filterDuplicatesByTagPredicate);
}
private static class FilterDuplicatesByTagPredicate implements Predicate<Rec>
{
private Set<String> existingTags = Sets.newHashSet();
@Override
public boolean apply(Rec input)
{
String tag = input.getTag();
return existingTags.add(tag);
}
}
I slightly changed the method to return an Iterable instead of a List, but of course you can change that if it's important.
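One caveat: `Iterables.filter` returns a lazy view, so the stateful predicate runs again every time the result is iterated, and a second iteration yields nothing because every tag is already in `existingTags`. A sketch that avoids this by collecting eagerly, using plain Java 8 streams instead of Guava (the minimal `Rec` class with `getTag()` is an assumption):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamTagDedupe {
    // Hypothetical Rec with a tag accessor.
    static class Rec {
        private final String tag;
        Rec(String tag) { this.tag = tag; }
        String getTag() { return tag; }
    }

    static List<Rec> deDupe(List<Rec> recs) {
        Set<String> seenTags = new HashSet<>();
        // Set.add returns false for a tag already seen, so only the first Rec per tag passes.
        // The filter is stateful, so the stream must stay sequential (no parallel()).
        return recs.stream()
                .filter(rec -> seenTags.add(rec.getTag()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rec> result = deDupe(Arrays.asList(new Rec("a"), new Rec("b"), new Rec("a")));
        System.out.println(result.size()); // prints 2
    }
}
```

Because the result is a real list, it can be iterated as often as needed.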

If you don't care about the elements being shuffled around (a HashSet does not preserve order), you can do this:
private <T> List<T> deDupe(List<T> thisListHasDupes) {
    Set<T> tempSet = new HashSet<T>();
    for (T t : thisListHasDupes) {
        tempSet.add(t);
    }
    List<T> deDupedList = new ArrayList<T>();
    deDupedList.addAll(tempSet);
    return deDupedList;
}
Remember that Set implementations need consistent and valid equals() and hashCode() methods. So if you have a custom object, make sure those are taken care of.
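If the order should be kept after all, swapping `HashSet` for `LinkedHashSet` is usually enough: it iterates in insertion order and keeps the first occurrence of each element. A minimal generic sketch, which relies on the element type's `equals()`/`hashCode()` exactly like the version above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderPreservingDedupe {
    // Copying into a LinkedHashSet drops later duplicates but keeps first-seen order.
    static <T> List<T> deDupe(List<T> listWithDupes) {
        return new ArrayList<>(new LinkedHashSet<>(listWithDupes));
    }

    public static void main(String[] args) {
        System.out.println(deDupe(Arrays.asList("c", "a", "c", "b", "a"))); // prints [c, a, b]
    }
}
```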

Related

order List<Object> with the order of List<Long> java

After reading several questions and examples, I came up with this example, which I modified a bit to make it work as expected:
Collections.sort(listToOrder, Comparator.comparing(item -> someObject.getListOfLongs().indexOf(item.getId())));
listToOrder is a list of MyObject, which has a name and an id, and I need to order listToOrder in the same order as listOfLongs.
With the code given it works as expected; however, if listToOrder differs in size from the list of ids, it fails. How can I make it work even if the sizes are different?
Edit:
I misread: the error I was getting was an IndexOutOfBoundsException, which wasn't triggered by the line of code above but by a manual log statement.

List.indexOf() returns -1 if the element is not found, which means such items will be ordered first in the resulting sorted list.
Without ordering data, the only other sensible way to handle such elements is to order them last:
Collections.sort(listToOrder, Comparator.comparing(item -> someObject.getListOfLongs().contains(item.getId()) ? someObject.getListOfLongs().indexOf(item.getId()) : Integer.MAX_VALUE));
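Calling `contains()`/`indexOf()` inside the comparator scans the id list on every comparison. A sketch that builds a map from id to position once and sends unknown ids to the end; the minimal `MyObject` with a `long getId()` is an assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderByIdList {
    // Hypothetical minimal MyObject with an id and a name.
    static class MyObject {
        private final long id;
        private final String name;
        MyObject(long id, String name) { this.id = id; this.name = name; }
        long getId() { return id; }
        String getName() { return name; }
    }

    static void orderLike(List<MyObject> listToOrder, List<Long> order) {
        // Position of each id in the reference list, computed once (first occurrence wins).
        Map<Long, Integer> position = new HashMap<>();
        for (int i = 0; i < order.size(); i++) {
            position.putIfAbsent(order.get(i), i);
        }
        // Ids missing from the reference list get Integer.MAX_VALUE and sort last.
        listToOrder.sort(Comparator.comparingInt(
                (MyObject item) -> position.getOrDefault(item.getId(), Integer.MAX_VALUE)));
    }

    public static void main(String[] args) {
        List<MyObject> items = new ArrayList<>(Arrays.asList(
                new MyObject(9, "nine"), new MyObject(1, "one"), new MyObject(3, "three")));
        orderLike(items, Arrays.asList(3L, 1L, 2L));
        for (MyObject o : items) {
            System.out.println(o.getName()); // prints three, one, nine (one per line)
        }
    }
}
```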

This has nothing to do with sorting, but with ordering. Given the following object, with an all-args constructor and getters:
public static class MyObject {
private final long id;
private final String name;
}
... and the following data in a random order ...
List<Integer> ids = Arrays.asList(5,4,7,0,2,1,3,8,6);
List<MyObject> list = Arrays.asList(
new MyObject(1, "one"),
new MyObject(3, "three"),
...
new MyObject(6, "six"),
new MyObject(8, "eight")
);
The solution you are looking for is this:
List<MyObject> newList = new ArrayList<>(list);
for (int i=0; i<ids.size(); i++) {
int id = ids.get(i);
for (MyObject myObject: list) {
if (myObject.getId() == id) {
newList.set(i, myObject);
break;
}
}
}
Simply find the object with the matching ID and put it at the same position in the new list. There is no dedicated method for that.
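The nested loop is O(n·m). A sketch of the same idea with a map from id to object built once, so each lookup is constant time on average. It assumes ids are unique within the list, uses a hypothetical minimal `MyObject`, skips ids that have no matching object, and appends objects whose id is not in `ids` at the end instead of dropping them (that last choice is an assumption):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReorderByIds {
    // Hypothetical minimal MyObject.
    static class MyObject {
        private final long id;
        private final String name;
        MyObject(long id, String name) { this.id = id; this.name = name; }
        long getId() { return id; }
        String getName() { return name; }
    }

    static List<MyObject> reorder(List<MyObject> list, List<Long> ids) {
        // Index objects by id once; LinkedHashMap keeps leftovers in their original order.
        Map<Long, MyObject> byId = new LinkedHashMap<>();
        for (MyObject o : list) {
            byId.put(o.getId(), o);
        }
        List<MyObject> result = new ArrayList<>(list.size());
        for (Long id : ids) {
            MyObject o = byId.remove(id); // removed so it is not appended again below
            if (o != null) {
                result.add(o);
            }
        }
        // Objects whose id was not in ids go at the end.
        result.addAll(byId.values());
        return result;
    }

    public static void main(String[] args) {
        List<MyObject> list = Arrays.asList(new MyObject(1, "one"), new MyObject(3, "three"),
                new MyObject(6, "six"), new MyObject(8, "eight"), new MyObject(7, "seven"));
        for (MyObject o : reorder(list, Arrays.asList(8L, 6L, 3L, 1L))) {
            System.out.println(o.getName()); // prints eight, six, three, one, seven (one per line)
        }
    }
}
```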

Filter List of labels

I have a list of labels which I want to filter, keeping only the labels "low", "lowest" and "high".
I tried to implement this:
private List<Label> filterPriorityLabels(List<Label> labels)
{
for (ListIterator<Label> iter = labels.listIterator(); iter.hasNext();)
{
Label a = iter.next();
if (a.getName() != "low" | "lowest" | "high")
{
iter.remove();
}
}
return labels;
}
But I can't get it to work. How can I fix this code?

Don't compare Strings with !=; use !equals() instead.
A neater solution would be to use the contains() method of List:
List<String> acceptableNames = Arrays.asList("low","lowest","high");
if (!acceptableNames.contains(a.getName()))

Here's a complete example based on the answer by @davidxxx:
private static final List<String> acceptableNames =
Arrays.asList("low", "lowest", "high");
private List<Label> filterPriorityLabels(List<Label> labels)
{
for (ListIterator<Label> iter = labels.listIterator(); iter.hasNext();)
{
final Label a = iter.next();
if (!acceptableNames.contains(a.getName()))
{
iter.remove();
}
}
return labels;
}
If you're using Java 8, there's a nicer way using streams:
private static final List<String> acceptableNames =
Arrays.asList("low", "lowest", "high");
private List<Label> filterPriorityLabels(List<Label> labels)
{
return labels.stream()
.filter( p -> acceptableNames.contains(p.getName()) )
.collect(Collectors.toList());
}
Note, though, that unlike davidxxx's answer, this does not modify the original list and return it. Instead, it leaves the original list unchanged and returns a new list.
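Since Java 8 there is also a middle ground that keeps the in-place behaviour of the iterator version without the explicit loop: `Collection.removeIf`. A sketch, assuming a `Label` class with `getName()` (the minimal class below is hypothetical) and a modifiable list such as an `ArrayList`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PriorityLabelFilter {
    // Hypothetical minimal Label.
    static class Label {
        private final String name;
        Label(String name) { this.name = name; }
        String getName() { return name; }
    }

    private static final List<String> ACCEPTABLE_NAMES = Arrays.asList("low", "lowest", "high");

    // Removes, in place, every label whose name is not acceptable, and returns the same list.
    static List<Label> filterPriorityLabels(List<Label> labels) {
        labels.removeIf(label -> !ACCEPTABLE_NAMES.contains(label.getName()));
        return labels;
    }

    public static void main(String[] args) {
        List<Label> labels = new ArrayList<>(Arrays.asList(
                new Label("low"), new Label("bug"), new Label("high")));
        System.out.println(filterPriorityLabels(labels).size()); // prints 2
    }
}
```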

if (! (a.getName().equals("low") || a.getName().equals("lowest") || a.getName().equals("high")))
In Java, you compare Strings (and objects in general) with equals(), not == or !=.
The logical OR operator is ||, not |.
It expects boolean expressions as operands, not Strings.
Also, the signature of your method suggests that it creates another list containing the filtered elements of the original list, whereas it actually modifies the list passed as an argument. It should either return void, or create and return a copy.

Is there any null free list data structure?

I am using a LinkedList, serverList, to store the elements. As of now, null can also be inserted into serverList, which is not what I want. Is there any other data structure I can use that will not add null elements to serverList but will maintain insertion order?
public List<String> getServerNames(ProcessData dataHolder) {
// some code
String localIP = getLocalIP(localPath, clientId);
String localAddress = getLocalAddress(localPath, clientId);
// some code
List<String> serverList = new LinkedList<String>();
serverList.add(localIP);
if (ppFlag) {
serverList.add(localAddress);
}
if (etrFlag) {
for (String remotePath : holderPath) {
String remoteIP = getRemoteIP(remotePath, clientId);
String remoteAddress = getRemoteAddress(remotePath, clientId);
serverList.add(remoteIP);
if (ppFlag) {
serverList.add(remoteAddress);
}
}
}
return serverList;
}
This method returns a List, which I iterate over with a normal for loop. If everything is null, I would rather have an empty serverList than four null values in it. In the code above, getLocalIP, getLocalAddress, getRemoteIP and getRemoteAddress can return null, and then a null element is added to the linked list. I know I can add an if check, but then I need to add it four times, once before each add. Is there a better data structure I can use here?
One constraint: this library is used under very heavy load, so this code has to be fast, since it will be called many times.

I am using LinkedList data structure serverList to store the elements in it.
That's most probably wrong, given that you're aiming for speed. An ArrayList is much faster unless you're using it as a Queue or similar.
I know I can add a if check but then I need to add if check four time just before adding to Linked List. Is there any better data structure which I can use here?
A collection silently ignoring nulls would be a bad idea. It may be useful sometimes and very surprising at other times. Moreover, it'd violate the List.add contract. So you won't find it in any serious library and you shouldn't implement it.
Just write a method
<E> void addIfNotNullTo(Collection<E> collection, E e) {
if (e != null) {
collection.add(e);
}
}
and use it. It won't make your code really shorter, but it'll make it clearer.
One constraint I have is - This library is use under very heavy load so this code has to be fast since it will be called multiple times.
Note that any IO is many orders of magnitude slower than simple list operations.
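A sketch of the helper in use. `getServerNames` is trimmed down here and the IP/address values are passed in as parameters, a hypothetical stand-in for the question's `getLocalIP`/`getRemoteIP` lookups; it also uses an `ArrayList`, per the advice above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class NullSkippingAdd {
    // Adds e only when it is non-null; the list itself keeps the normal List.add contract.
    static <E> void addIfNotNullTo(Collection<E> collection, E e) {
        if (e != null) {
            collection.add(e);
        }
    }

    // Hypothetical trimmed-down getServerNames: lookups replaced by parameters.
    static List<String> getServerNames(String localIP, String localAddress, boolean ppFlag,
                                       List<String> remoteIPs) {
        List<String> serverList = new ArrayList<>();
        addIfNotNullTo(serverList, localIP);
        if (ppFlag) {
            addIfNotNullTo(serverList, localAddress);
        }
        for (String remoteIP : remoteIPs) {
            addIfNotNullTo(serverList, remoteIP);
        }
        return serverList;
    }

    public static void main(String[] args) {
        System.out.println(getServerNames(null, "host-a", true, Arrays.asList("10.0.0.2", null)));
        // prints [host-a, 10.0.0.2]
    }
}
```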

Use Apache Commons Collections:
ListUtils.predicatedList(new ArrayList(), PredicateUtils.notNullPredicate());
Adding null to this list throws an IllegalArgumentException. Furthermore, you can back it with any List implementation you like and, if necessary, add more Predicates to be checked.
The same exists for collections in general.

There are data structures that do not allow null elements, such as ArrayDeque, but these will throw an exception rather than silently ignore a null element, so you'd have to check for null before insertion anyway.
If you're dead set against adding null checks before insertion, you could instead iterate over the list and remove null elements before you return it.
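With Java 8 or later, removing the nulls before returning is a one-liner via `removeIf`. A sketch (the sample values are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class StripNulls {
    // Removes every null in place; the remaining elements keep their insertion order.
    static List<String> withoutNulls(List<String> serverList) {
        serverList.removeIf(Objects::isNull);
        return serverList;
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>(Arrays.asList(null, "10.0.0.1", null, "host-b"));
        System.out.println(withoutNulls(servers)); // prints [10.0.0.1, host-b]
    }
}
```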

The simplest way would be to override LinkedList#add() in an anonymous subclass inside your getServerNames() method.
List<String> serverList = new LinkedList<String>() {
public boolean add(String item) {
if (item != null) {
super.add(item);
return true;
} else
return false;
}
};
serverList.add(null);
serverList.add("NotNULL");
System.out.println(serverList.size()); // prints 1
If you find yourself using this in several places, you can turn it into a named class. Note that LinkedList.addAll() does not go through add(), so nulls added that way still get in.

You can use a plain Java Set to store your paths. The null value may be added multiple times, but it will only ever appear once in the Set, so you can remove null from the Set and then convert it to an ArrayList before returning. Use a LinkedHashSet to keep insertion order (a HashSet does not), and note that a Set also collapses duplicate addresses.
Set<String> serverSet = new LinkedHashSet<String>();
serverSet.add(localIP);
if (ppFlag) {
serverSet.add(localAddress);
}
if (etrFlag) {
for (String remotePath : holderPath) {
String remoteIP = getRemoteIP(remotePath, clientId);
String remoteAddress = getRemoteAddress(remotePath, clientId);
serverSet.add(remoteIP);
if (ppFlag) {
serverSet.add(remoteAddress);
}
}
}
serverSet.remove(null); // remove null from your set - no exception if null not present
List<String> serverList = new ArrayList<String>(serverSet);
return serverList;

Since you use Guava (the question is tagged with it), here's an alternative if you have the luxury of returning a Collection instead of a List.
Why Collection? Because the List.add contract forces you to either return true or throw an exception, whereas Collection.add allows you to return false if nothing was added.
class MyVeryOwnList<T> extends ForwardingCollection<T> { // Note: not ForwardingList
private final List<T> delegate = new LinkedList<>(); // Keep a linked list
@Override protected Collection<T> delegate() { return delegate; }
@Override public boolean add(T element) {
if (element == null) {
return false;
} else {
return delegate.add(element);
}
}
@Override public boolean addAll(Collection<? extends T> elements) {
return standardAddAll(elements);
}
}

how to search an arraylist for duplicate objects (determined by one field) and merge them

I have a class called PriceList
class PriceList {
Integer priceListID;
...
}
and I have extended it in another class to accommodate some user functionality
class PriceListManager extends PriceList{
boolean user;
boolean manager;
}
One user can have an ArrayList of PriceListManager objects that can contain duplicates (same priceListID), so I would like to find these duplicates and compare their fields to create one entry,
e.g.:
{ PriceListID = 5; user = false; manager = true;
PriceListID = 5; user = true; manager = false; }
should become
PriceListID = 5; user = true; manager = true;
What would be the best approach to that?
I already have equals methods for both classes: PriceList compares two objects by just checking their IDs, while PriceListManager does that AND checks whether both boolean fields are the same.
Edit: I need to find any objects with the same ID, so I can merge their attributes and leave only one object.

How about something like this:
Map<Integer, PriceListManager> map = new HashMap<Integer, PriceListManager>();
for (PriceListManager manager : yourArrayList) {
if (!map.containsKey(manager.getPriceListID())) {
map.put(manager.getPriceListID(), manager);
}
if (manager.isUser()) {
map.get(manager.getPriceListID()).setIsUser(true);
}
if (manager.isManager()) {
map.get(manager.getPriceListID()).setIsManager(true);
}
}
List<PriceListManager> newList = new ArrayList<PriceListManager>();
newList.addAll(map.values());
// Do stuff with newList....

You can iterate through the list and convert it into a HashMap, with priceListID as the key and the PriceListManager as the value. While iterating over the ArrayList, check whether the HashMap already has a value for that priceListID:
1. If it does, compare it with the current one: if they are not equal, update the stored one according to your logic; if they are equal, no update is needed.
2. If it doesn't, add the current one to the HashMap.
I hope this helps.

You have implemented equals and hashCode on PriceListManager to use all fields, but for this particular purpose you need them to match on priceListID alone, right? Maybe give this design one more thought: what is your entity here? Does priceListID alone already determine a PriceListManager? In any case, if you want a local solution, I'd use a Map and do something like this:
Map<Integer, PriceListManager> lookup = new HashMap<Integer, PriceListManager>();
for (PriceListManager item: priceListManagers) {
PriceListManager manager = lookup.get(item.getPriceListID());
if (manager == null) {
manager = new PriceListManager();
manager.setPriceListID(item.getPriceListID());
manager.setUser(false);
manager.setManager(false);
lookup.put(manager.getPriceListID(), manager);
}
manager.setUser(manager.getUser() || item.getUser());
manager.setManager(manager.getManager() || item.getManager());
}

If you don't want duplicates, isn't it a good idea to work with a Set instead of an ArrayList?
Uniqueness is guaranteed in a Set.
You gain performance and have to write less code, since you don't have to do the duplicate check afterwards...

Go through the original list, and if you find an object that was already there, then merge those two into one:
Collection<? extends PriceList> convertToMergedList(Iterable<? extends PriceList> listWithDuplicates) {
    Map<Integer, PriceList> idToMergedObject = new LinkedHashMap<>();
    for (PriceList price : listWithDuplicates) {
        PriceList priceSoFar = idToMergedObject.get(price.priceListID);
        if (priceSoFar == null) {
            // first object with this id: store it as-is
            idToMergedObject.put(price.priceListID, price);
        } else {
            // id seen before: replace the stored object with the merged one
            idToMergedObject.put(price.priceListID, merge(priceSoFar, price));
        }
    }
    return idToMergedObject.values();
}
PriceList merge(PriceList price1, PriceList price2) {
// merging logic
}
I use LinkedHashMap, so that the original order of elements is preserved.
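For the booleans in the question, the merge step is just a logical OR of the flags. A sketch using Java 8's `Map.merge` with a `LinkedHashMap`; the flattened `PriceListManager` with a constructor and public-ish fields, and the choice to build a fresh merged object rather than mutate the inputs, are assumptions since the question does not show them:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PriceListMerge {
    // Hypothetical flattened PriceListManager: id plus the two flags.
    static final class PriceListManager {
        final Integer priceListID;
        final boolean user;
        final boolean manager;

        PriceListManager(Integer priceListID, boolean user, boolean manager) {
            this.priceListID = priceListID;
            this.user = user;
            this.manager = manager;
        }
    }

    // A flag is set in the merged entry if it was set in either input.
    static PriceListManager merge(PriceListManager a, PriceListManager b) {
        return new PriceListManager(a.priceListID, a.user || b.user, a.manager || b.manager);
    }

    static List<PriceListManager> mergeById(List<PriceListManager> items) {
        // LinkedHashMap keeps ids in first-seen order.
        Map<Integer, PriceListManager> byId = new LinkedHashMap<>();
        for (PriceListManager item : items) {
            // Inserts item if the id is new, otherwise stores merge(existing, item).
            byId.merge(item.priceListID, item, PriceListMerge::merge);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<PriceListManager> merged = mergeById(Arrays.asList(
                new PriceListManager(5, false, true), new PriceListManager(5, true, false)));
        for (PriceListManager m : merged) {
            System.out.println(m.priceListID + " " + m.user + " " + m.manager); // prints 5 true true
        }
    }
}
```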

Grouping objects by two fields in java

I have a bunch of log files and I want to process them in Java, but I want to sort them first so I get more human-readable results.
My Log class:
public class Log{
//only relevant fields here
private String countryCode;
private AccessType accessType;
...etc..
}
AccessType is Enum, which has values WEB, API, OTHER.
I'd like to group Log objects by both countryCode and accessType, so that the end product would be a list of logs.
I got this working for grouping Logs into a log list by countryCode like this:
public List<Log> groupByCountryCode(String countryCode) {
Map<String, List<Log>> map = new HashMap<String, List<Log>>();
for (Log log : logList) {
String key = log.getCountryCode();
if (map.get(key) == null) {
map.put(key, new ArrayList<Log>());
}
map.get(key).add(log);
}
List<Log> sortedByCountryCodeLogList = map.get(countryCode);
return sortedByCountryCodeLogList;
}
based on this example by @Kaleb Brasee:
Group by field name in Java
Here is what I've been trying for some time now, and I'm really stuck:
public List<Log> groupByCountryCode(String countryCode) {
Map<String, Map<AccessType, List<Log>>> map = new HashMap<String, Map<AccessType, List<Log>>>();
AccessType mapKey = null;
List<Log> innerList = null;
Map<AccessType, List<Log>> innerMap = null;
// inner sort
for (Log log : logList) {
String key = log.getCountryCode();
if (map.get(key) == null) {
map.put(key, new HashMap<AccessType, List<Log>>());
innerMap = new HashMap<AccessType, List<Log>>();
}
AccessType innerMapKey = log.getAccessType();
mapKey = innerMapKey;
if (innerMap.get(innerMapKey) == null) {
innerMap.put(innerMapKey, new ArrayList<Log>());
innerList = new ArrayList<Log>();
}
innerList.add(log);
innerMap.put(innerMapKey, innerList);
map.put(key, innerMap);
map.get(key).get(log.getAccessType()).add(log);
}
List<Log> sortedByCountryCodeLogList = map.get(countryCode).get(mapKey);
return sortedByCountryCodeLogList;
}
I'm not sure I know what I'm doing anymore

Your question is confusing. You want to sort the list, but you are creating many new lists, then discarding all but one of them?
Here is a method to sort the list. Note that Collections.sort() uses a stable sort. (This means that the original order of items within a group of country code and access type is preserved.)
class MyComparator implements Comparator<Log> {
    public int compare(Log a, Log b) {
        if (a.getCountryCode().equals(b.getCountryCode())) {
            /* Country code is the same; compare by access type. */
            return a.getAccessType().ordinal() - b.getAccessType().ordinal();
        } else {
            return a.getCountryCode().compareTo(b.getCountryCode());
        }
    }
}
Collections.sort(logList, new MyComparator());
If you really want to do what your code is currently doing, at least skip the creation of unnecessary lists:
public List<Log> getCountryAndAccess(String cc, AccessType access) {
List<Log> sublist = new ArrayList<Log>();
for (Log log : logList)
if (cc.equals(log.getCountryCode()) && (log.getAccessType() == access))
sublist.add(log);
return sublist;
}
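On Java 8 or later, the same two-level comparator can be written with `Comparator.comparing(...).thenComparing(...)`; enums compare by declaration order, so this matches the `ordinal()` version. A sketch with a minimal, hypothetical `Log` class (the `AccessType` values are from the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class LogSorting {
    enum AccessType { WEB, API, OTHER }

    // Hypothetical minimal Log.
    static class Log {
        private final String countryCode;
        private final AccessType accessType;
        Log(String countryCode, AccessType accessType) {
            this.countryCode = countryCode;
            this.accessType = accessType;
        }
        String getCountryCode() { return countryCode; }
        AccessType getAccessType() { return accessType; }
    }

    // Country code first, then access type in enum declaration order (WEB, API, OTHER).
    static final Comparator<Log> BY_COUNTRY_THEN_ACCESS =
            Comparator.comparing(Log::getCountryCode).thenComparing(Log::getAccessType);

    static void sortLogs(List<Log> logList) {
        Collections.sort(logList, BY_COUNTRY_THEN_ACCESS); // stable, as noted above
    }

    public static void main(String[] args) {
        List<Log> logs = new ArrayList<>(Arrays.asList(
                new Log("US", AccessType.API), new Log("DE", AccessType.OTHER), new Log("US", AccessType.WEB)));
        sortLogs(logs);
        for (Log log : logs) {
            System.out.println(log.getCountryCode() + " " + log.getAccessType()); // DE OTHER, US WEB, US API
        }
    }
}
```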

If you're able to use it, Google's Guava library has an Ordering class that might be able to help simplify things. Something like this might work:
Ordering<Log> byCountryCode = new Ordering<Log>() {
@Override
public int compare(Log left, Log right) {
return left.getCountryCode().compareTo(right.getCountryCode());
}
};
Ordering<Log> byAccessType = new Ordering<Log>() {
@Override
public int compare(Log left, Log right) {
return left.getAccessType().compareTo(right.getAccessType());
}
};
Collections.sort(logList, byCountryCode.compound(byAccessType));

You should create the new inner map first, then add it to the outer map:
if (map.get(key) == null) {
innerMap = new HashMap<AccessType, List<Log>>();
map.put(key, innerMap);
}
and similarly for the list element. This avoids creating unnecessary map elements which will then be overwritten later.
Overall, the simplest is to use the same logic as in your first method, i.e. if the element is not present in the map, insert it, then just get it from the map:
for (Log log : logList) {
String key = log.getCountryCode();
if (map.get(key) == null) {
map.put(key, new HashMap<AccessType, List<Log>>());
}
innerMap = map.get(key);
AccessType innerMapKey = log.getAccessType();
if (innerMap.get(innerMapKey) == null) {
innerMap.put(innerMapKey, new ArrayList<Log>());
}
innerMap.get(innerMapKey).add(log);
}
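On Java 8 or later, the nested map can also be built by nesting one `Collectors.groupingBy` inside another. A sketch with a hypothetical minimal `Log` class; since `groupingBy` makes no ordering promise for its default maps, the flattening helper walks `AccessType.values()` so the result comes out in enum declaration order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LogGrouping {
    enum AccessType { WEB, API, OTHER }

    // Hypothetical minimal Log.
    static class Log {
        private final String countryCode;
        private final AccessType accessType;
        Log(String countryCode, AccessType accessType) {
            this.countryCode = countryCode;
            this.accessType = accessType;
        }
        String getCountryCode() { return countryCode; }
        AccessType getAccessType() { return accessType; }
    }

    // countryCode -> (accessType -> logs, in their original encounter order)
    static Map<String, Map<AccessType, List<Log>>> group(List<Log> logList) {
        return logList.stream().collect(Collectors.groupingBy(
                Log::getCountryCode,
                Collectors.groupingBy(Log::getAccessType)));
    }

    // Flattened list for one country, grouped by access type; empty if the country is absent.
    static List<Log> logsFor(Map<String, Map<AccessType, List<Log>>> grouped, String countryCode) {
        List<Log> result = new ArrayList<>();
        Map<AccessType, List<Log>> byAccess = grouped.getOrDefault(countryCode, Collections.emptyMap());
        for (AccessType type : AccessType.values()) {
            result.addAll(byAccess.getOrDefault(type, Collections.emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Log> logs = Arrays.asList(new Log("US", AccessType.API), new Log("US", AccessType.WEB));
        System.out.println(logsFor(group(logs), "US").size()); // prints 2
    }
}
```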

Firstly, it looks like you're adding each log entry twice with the final line map.get(key).get(log.getAccessType()).add(log); inside your for loop. I think you can do without that, given the code above it.
After fixing that, to return your List<Log> you can do:
List<Log> sortedByCountryCodeLogList = new ArrayList<Log>();
for (List<Log> nextLogs : map.get(countryCode).values()) {
sortedByCountryCodeLogList.addAll(nextLogs);
}
I think that code above should flatten it down into one list, still grouped by country code and access type (not in insertion order though, since you used HashMap and not LinkedHashMap), which I think is what you want.
