Properly delete duplicates in a list

Given the following datatype Testcase (XQuery, Testpath, FirstInputFile, SecondInputFile, Expected),
how can I properly delete duplicates?
Definition of duplicates:
An entry is a duplicate if its FirstInputFile is already in the list as a SecondInputFile, or vice versa.
Here is the test data:
tcs.add(new HeaderAndBodyTestcase("XQ 1", "/1", "FAIL", "FAIL2", "FAILED"));
tcs.add(new HeaderAndBodyTestcase("XQ 1", "/1", "FAIL2", "FAIL", "FAILED"));
tcs.add(new HeaderAndBodyTestcase("XQ 2", "/2", "FAIL4", "FAIL3", "FAILED2"));
tcs.add(new HeaderAndBodyTestcase("XQ 2", "/2", "FAIL3", "FAIL4", "FAILED2"));
and here is the function
protected void deleteExistingDuplicatesInArrayList(final ArrayList<HeaderAndBodyTestcase> list) {
for (int idx = 0; idx < list.size() - 1; idx++) {
if (list.get(idx).firstInputFile.equals(list.get(idx).secondInputFile)
|| (list.get(idx + 1).firstInputFile.equals(list.get(idx).firstInputFile)
&& list.get(idx).secondInputFile.equals(list.get(idx + 1).secondInputFile)
|| (list.get(idx).firstInputFile.equals(list.get(idx + 1).secondInputFile)
&& list.get(idx).secondInputFile.equals(list.get(idx + 1).firstInputFile)))) {
list.remove(idx);
}
}
}
This solution is already working, but seems very crappy, so is there a better solution to this?

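Put everything in a Set (using a comparator if necessary), and create a list from this set if you really need a List (and not a Collection):
Set<HeaderAndBodyTestcase> set = new HashSet<>(list);
Note that a HashSet only removes these duplicates if HeaderAndBodyTestcase overrides equals and hashCode accordingly.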
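One way to read the "comparator" suggestion is a TreeSet whose comparator ignores the order of the two input files. The following is a minimal sketch under that assumption. Testcase is a hypothetical stand-in for HeaderAndBodyTestcase, reduced to the two compared fields, and PairDedup, low, high and dedup are illustrative names that do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PairDedup {
    // Hypothetical stand-in for HeaderAndBodyTestcase (assumption: only these two fields matter).
    static final class Testcase {
        final String first;
        final String second;

        Testcase(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return first + "/" + second;
        }
    }

    // Smaller and larger of the two file names, so (a, b) and (b, a) yield the same key.
    static String low(Testcase t) {
        return t.first.compareTo(t.second) <= 0 ? t.first : t.second;
    }

    static String high(Testcase t) {
        return t.first.compareTo(t.second) <= 0 ? t.second : t.first;
    }

    // Keeps the first occurrence of every unordered pair, in the original order.
    static List<Testcase> dedup(List<Testcase> in) {
        Comparator<Testcase> unordered =
                Comparator.comparing(PairDedup::low).thenComparing(PairDedup::high);
        Set<Testcase> seen = new TreeSet<>(unordered);
        List<Testcase> out = new ArrayList<>();
        for (Testcase t : in) {
            if (seen.add(t)) { // add returns false if an equal pair was already seen
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Testcase> tcs = new ArrayList<>();
        tcs.add(new Testcase("FAIL", "FAIL2"));
        tcs.add(new Testcase("FAIL2", "FAIL"));
        tcs.add(new Testcase("FAIL4", "FAIL3"));
        tcs.add(new Testcase("FAIL3", "FAIL4"));
        System.out.println(dedup(tcs)); // prints [FAIL/FAIL2, FAIL4/FAIL3]
    }
}
```

Unlike the loop in the question, this also catches duplicates that are not adjacent, which may or may not be what you want.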

Given your rather peculiar "equality" constraints, I think the best way would be to maintain two sets of already seen first- and second input files and a loop:
Set<String> first = new HashSet<>();
Set<String> second = new HashSet<>();
for (HeaderAndBodyTestcase tc : tcs) {
if (! first.contains(tc.getSecondInputFile()) &&
! second.contains(tc.getFirstInputFile())) {
first.add(tc.getFirstInputFile());
second.add(tc.getSecondInputFile());
System.out.println(tc); // or add to result list
}
}
This will also work if "equal" elements do not appear right after each other in the original list.
Also note that removing elements from a list while iterating the same list, while working sometimes, will often yield unexpected results. It is better to create a new, filtered list or, if you have to remove, to create an Iterator from that list and use its remove method.
On closer inspection (yes, it took me that long to understand your code), the conditions in your current working code are in fact quite different from what I understood from your question, namely:
remove element if first and second is the same (actually never checked for the last element in the list)
remove element if first is the same as first on last, and second the same as second on last
remove if first is same as last second and vice versa
only consider consecutive elements (from comments)
Given those constraints, the sets are not needed and also would not work properly considering that both the elements have to match (either 'straight' or 'crossed'). Instead you can use pretty much your code as-is, but I would still use an Iterator and keep track of the last element, and also split the different checks to make the whole code much easier to understand.
HeaderAndBodyTestcase last = null;
for (Iterator<HeaderAndBodyTestcase> iter = list.iterator(); iter.hasNext();) {
HeaderAndBodyTestcase curr = iter.next();
if (curr.firstInputFile.equals(curr.secondInputFile)) {
iter.remove();
} else if (last != null) {
// else-if: a second iter.remove() for the same element would throw IllegalStateException
boolean bothEqual = curr.firstInputFile.equals(last.firstInputFile)
&& curr.secondInputFile.equals(last.secondInputFile);
boolean crossedEqual = curr.secondInputFile.equals(last.firstInputFile)
&& curr.firstInputFile.equals(last.secondInputFile);
if (bothEqual || crossedEqual) {
iter.remove();
}
}
last = curr;
}

Related

Is removing elements from an ArrayList or LinkedList while iterating through it with a for loop in Java bad? If so, why?

I was showing my code to someone and they said that it would cause undefined behavior. Being a Java programmer, that's not something I understand well. In the following code block I am iterating through scenes, which is an ArrayList, and removing elements from it.
for(int i = 0; i < scenes.size() - 1; i++)
{
if(!(Double.valueOf(scenes.get(i + 1)) - Double.valueOf(scenes.get(i)) > 10))
{
scenes.remove(i + 1);
i--;
}
}
This compiles and doesn't throw an exception at runtime, but I'm still not sure if it's a programming no-no, why it's a programming no-no, and what is the right way to do it. I've heard about using Iterator.remove() and about just creating a whole new List.

In an ArrayList, removing an element from the middle of the list requires you to shift all of the elements with a higher index down by one. This is fine if you do it once (or a small number of times), but inefficient if you do it repeatedly.
You don't really want to use an Iterator for this either, because Iterator.remove() suffers from the same issue.
A better approach to this is to go through the list, moving the elements you want to keep to their new positions; and then just remove the tail of the list at the end:
int dst = 0;
for (int src = 0; src < scenes.size(); ++dst) {
// You want to keep this element.
scenes.set(dst, scenes.get(src++));
// Now walk along the list until you find the element you want to keep.
while (src < scenes.size()
&& Double.parseDouble(scenes.get(src)) - Double.parseDouble(scenes.get(dst)) <= 10) {
// Increment the src pointer, so you won't keep the element.
++src;
}
}
// Remove the tail of the list in one go.
scenes.subList(dst, scenes.size()).clear();
(This "shift and clear" approach is what is used by ArrayList.removeIf; you can't use that directly here because you can't inspect adjacent elements in the list, you only have access to the current element).
You can take a similar approach which will also work efficiently with non-random access lists such as LinkedList. You need to avoid repeatedly calling get and set, since these are e.g. O(size) in the case of LinkedList.
In that case, you would use ListIterator instead of plain indexes:
ListIterator<String> dst = scenes.listIterator();
for (ListIterator<String> src = scenes.listIterator(); src.hasNext();) {
dst.next();
String curr = src.next();
dst.set(curr);
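while (src.hasNext()) {
if (Double.parseDouble(src.next()) - Double.parseDouble(curr) > 10) {
// This element must be kept: step back so the outer loop reads it again.
src.previous();
break;
}
}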
}
scenes.subList(dst.nextIndex(), scenes.size()).clear();
Or something like this. I've not tested it, and ListIterator is always pretty confusing to use.

This is straightforward and will work for either ArrayList or LinkedList:
Iterator<String> iterator = list.iterator();
double current = 0;
double next;
boolean firstTime = true;
while (iterator.hasNext()) {
if (firstTime) {
current = Double.parseDouble(iterator.next());
firstTime = false;
} else {
next = Double.parseDouble(iterator.next());
if (next - current > 10) {
current = next;
} else {
iterator.remove();
}
}
}
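For comparison, the "whole new List" option mentioned in the question can be sketched as follows. This is only an illustration under the same assumption as the answers above (an element is kept when it is more than 10 above the last kept element); SceneFilter, filter and minGap are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SceneFilter {
    // Keeps the first element, then every element more than minGap above the last kept one.
    static List<String> filter(List<String> scenes, double minGap) {
        List<String> kept = new ArrayList<>();
        double lastKept = 0;
        for (String s : scenes) {
            double value = Double.parseDouble(s);
            if (kept.isEmpty() || value - lastKept > minGap) {
                kept.add(s);
                lastKept = value;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> scenes = Arrays.asList("0", "5", "20", "25", "40");
        System.out.println(filter(scenes, 10)); // prints [0, 20, 40]
    }
}
```

The original list is never modified, so there is nothing to get wrong with indexes or iterators, at the cost of allocating a second list.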

ConcurrentModificationException When removing element using list iterator java

I have an issue removing the 1st and 2nd element of my list even by using the iterator.
I have read the following threads but can't fix my issue (those were the most relevant but I checked other material as well):
ConcurrentModificationException when trying remove element from list
Iterating through a Collection, avoiding ConcurrentModificationException when removing objects in a loop
So my code looks like this:
List<List<String>> list = cnf.read();
List<List<String>> nlist = new ArrayList<>();
for (List<String> l : list) {
if (l.size() <= 3) {
nlist.add(l);
} else {
int size = l.size();
while (size > 3) {
List<String> three = l.subList(0, 2);
three.add("Y" + (count++));
//Iterator itr = l.iterator();
ListIterator itr = l.listIterator();
int v = 0;
while (itr.hasNext()) {
itr.next();
if (v == 0 || v == 1) {
itr.remove();
v++;
}
}
l.add(0, "Y" + (count++));
size--;
nlist.add(three);
}
nlist.add(l);
}
}
for (List<String> l : nlist) {
System.out.println(l.toString());
System.out.println(l.size());
}
I get a ConcurrentModificationException at the print statement here :
System.out.println(l.toString());
I tried using iterators for my 2 for loops as well, but it doesn't seem to make a difference!
I am new to posting questions, so let me know if I am doing it right!
Thank you.

After a long debugging session, here is the solution.
subList does not copy anything: the list returned by ArrayList.subList is a view that keeps a reference to the original list and accesses its elementData array directly. Once the original list is structurally modified by other means (here via itr.remove() and l.add(...)), any later access to the view throws a ConcurrentModificationException.
For this reason, when we add an element to the "three" list, we alter the state of the original list. This happens here:
three.add("Y" + (count++));
A way of fixing it for this specific case is to create and initialize the "three" list the following way:
String one = l.get(0);
String two = l.get(1);
List<String> three = new ArrayList<>();
three.add(one);
three.add(two);
three.add("Y" + (count));
This allows us to manipulate our lists without getting a ConcurrentModificationException. However, if you are manipulating big lists, I would suggest a less hard-coded way of creating the list.
I will mark this thread as answered and hope it helps people.
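A less hard-coded variant of the fix above is to copy the sublist into a new ArrayList, which turns the view into an independent list. A minimal sketch follows; SubListCopy and firstN are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubListCopy {
    // Returns an independent copy of the first n elements instead of a view on source.
    static List<String> firstN(List<String> source, int n) {
        return new ArrayList<>(source.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> three = firstN(l, 2);
        three.add("Y0"); // only changes the copy
        l.remove(0);     // structural change to l; the copy stays valid
        System.out.println(three); // prints [a, b, Y0]
        System.out.println(l);     // prints [b, c, d]
    }
}
```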

How can I test if an array contains each value from map?

I have a map:
Map<String, String> abc = new HashMap<>();
abc.put("key1", "value1");
abc.put("key2", "value2");
And an array:
String[] options = {"value1", "value2", "value3"};
I am creating this array as following (I am using following method to do something else which is not relevant to the question that I am asking here):
public String[] getOptions() {
List<String> optionsList = getOptionsFromAMethod(WebElementA);
String[] options = new String[optionsList.size()];
options = optionsList.toArray(options);
return options;
}
What is the best way to verify if String[] contains each value from Map?
I am thinking about doing this:
for (Object value : abc.values()) {
Arrays.asList(options).contains(value);
}

Explanation
Your current approach creates an ArrayList (the one nested in java.util.Arrays, not to be confused with the regular ArrayList from java.util) wrapping the given array.
You then call, for each value of the map, the ArrayList#contains method. However this method is very slow. It walks through the whole list in order to search for something.
Your current approach thus yields O(n^2) which doesn't scale very well.
Solution
We can do better by using a data-structure which is designed for a fast contains query, namely a HashSet.
So instead of putting all your values into an ArrayList we will put them into a HashSet whose contains method is fast:
boolean doesContainAll = true;
HashSet<String> valuesFromArray = new HashSet<>(Arrays.asList(options));
for (String value : abc.values()) {
if (!valuesFromArray.contains(value)) {
doesContainAll = false;
break;
}
}
// doesContainAll now is correctly set to 'true' or 'false'
The code now works in O(n) which is far better and also optimal in terms of complexity.
Of course you can optimize further to speed things up by constant factors. For example, you can first compare sizes: if abc.values() contains more distinct values than options has elements, the array cannot contain all of them and you can directly return false.
Stream solution
You can also use Java 8 and Streams to simplify the above code, the result and also the procedure behind the scenes is the same:
HashSet<String> valuesFromArray = new HashSet<>(Arrays.asList(options));
boolean doesContainAll = abc.values().stream()
.allMatch(valuesFromArray::contains);
Insights of ArrayList#contains
Let's take a closer look at java.util.Arrays.ArrayList.
Here is its code for the contains method:
public boolean contains(Object o) {
return indexOf(o) != -1;
}
Let's see how indexOf is implemented:
public int indexOf(Object o) {
E[] a = this.a;
if (o == null) {
for (int i = 0; i < a.length; i++)
if (a[i] == null)
return i;
} else {
for (int i = 0; i < a.length; i++)
if (o.equals(a[i]))
return i;
}
return -1;
}
So indeed, in all cases the method will traverse from left to right through the source array in order to find the object. There is no fancy method that is able to directly access the information whether the object is contained or not, it runs in O(n) and not in O(1).
Note on duplicates
If either of your data sets may contain duplicates and you want to count them individually, you will need a slightly different approach, since contains does not care how often an element occurs.
For this you may, for example, collect your abc.values() into a List first. Then, every time you check an element, remove the matched element from the List.
Alternatively you can set up a HashMap<String, Integer> which counts the occurrences of every element. Then, every time you check an element, decrease its counter by one.
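The counting idea can be sketched roughly like this. It is one possible reading of the paragraph above, assuming that each element of the array may be matched by at most one map value; MultisetContains and containsAllCounting are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class MultisetContains {
    // True if every value in needed can be matched by its own element of options.
    static boolean containsAllCounting(String[] options, Collection<String> needed) {
        // Count how often each element occurs in the array.
        Map<String, Integer> remaining = new HashMap<>();
        for (String option : options) {
            remaining.merge(option, 1, Integer::sum);
        }
        // Use up one occurrence per needed value.
        for (String value : needed) {
            Integer left = remaining.get(value);
            if (left == null || left == 0) {
                return false;
            }
            remaining.put(value, left - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] options = {"value1", "value2", "value3"};
        System.out.println(containsAllCounting(options, Arrays.asList("value1", "value2"))); // prints true
        System.out.println(containsAllCounting(options, Arrays.asList("value1", "value1"))); // prints false
    }
}
```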

You can use https://docs.oracle.com/javase/7/docs/api/java/util/List.html#containsAll(java.util.Collection)
Arrays.asList("value1", "value2", "value3").containsAll(abc.values())

I would recommend using a stream:
final List<String> optionsList = Arrays.asList(options);
abc.values().stream().allMatch(optionsList::contains);

Finding duplicate and non duplicate in Java

I know this question has been answered on "how to find" many times, however I have a few additional questions. Here is the code I have
public static void main (String [] args){
List<String> l1= new ArrayList<String>();
l1.add("Apple");
l1.add("Orange");
l1.add("Apple");
l1.add("Milk");
//List<String> l2=new ArrayList<String>();
//HashSet is a good choice as it does not allow duplicates
HashSet<String> set = new HashSet<String>();
for( String e: l1){
//if(!(l2).add(e)) -- did not work
if(!(set).add(e)){
System.out.println(e);
}
}
}
Question 1:The list did not work because List allows Duplicate while HashSet does not- is that correct assumption?
Question 2: What does this line mean: if(!(set).add(e))
In the for loop we are checking if String e is in the list l1 and then what does this line validates if(!(set).add(e))
This code will print apple as output as it is the duplicate value.
Question 3: How can i have it print non Duplicate values, just Orange and Milk but not Apple? I tried this approach but it still prints Apple.
List unique= new ArrayList(new HashSet(l1));
Thanks in advance for your time.

1) Yes that is correct. We often use sets to remove duplicates.
2) The add method of HashSet returns false when the item is already in the set. That's why it is used to check whether the item exists in the set.
3) To do this, you need to count the number of occurrences of each item in the list, store the counts in a hash map, then print the items that have a count of 1. Or you could just do this (it is a little dirty and slower, but it takes a little less space than using a hash map):
List<String> l1= new ArrayList<>();
l1.add("Apple");
l1.add("Orange");
l1.add("Apple");
l1.add("Milk");
HashSet<String> set = new HashSet<>(l1);
for (String item : set) {
if (l1.stream().filter(x -> !x.equals(item)).count() == l1.size() - 1) {
System.out.println(item);
}
}
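The hash map variant described in point 3 could look roughly like this. It is a sketch rather than the answer's own code; UniqueOnly and uniques are hypothetical names, and a LinkedHashMap is used here only so that the output keeps the order of the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UniqueOnly {
    // Returns the items that occur exactly once, in order of first appearance.
    static List<String> uniques(List<String> items) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum); // count each occurrence
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("Apple", "Orange", "Apple", "Milk");
        System.out.println(uniques(l1)); // prints [Orange, Milk]
    }
}
```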

You're right.
Well... adding to a collection doesn't necessarily need to return anything. Fortunately, the people at Sun (now Oracle) decided to return whether the item was successfully added to the collection. This is indicated by the true/false return value, with true meaning success.
You can extend your current code with the following logic: if an element wasn't added successfully to the set, it was a duplicate, so add it to another set (Set<String> duplicates) and later remove all duplicates from the first set.
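A minimal sketch of that two-set logic, assuming the elements are Strings as in the question; TwoSets and nonDuplicates are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TwoSets {
    // Returns the elements that occur exactly once, in order of first appearance.
    static Set<String> nonDuplicates(List<String> items) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) {    // add returns false: item was already in the set
                duplicates.add(item);
            }
        }
        seen.removeAll(duplicates);   // drop everything that occurred more than once
        return seen;
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("Apple", "Orange", "Apple", "Milk");
        System.out.println(nonDuplicates(l1)); // prints [Orange, Milk]
    }
}
```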

Question 1:The list did not work because List allows Duplicate while HashSet does not- is that correct assumption?
That is correct.
Question 2: What does this line mean: if(!(set).add(e)) In the for loop we are checking if String e is in the list l1 and then what does this line validates if(!(set).add(e))
This code will print apple as output as it is the duplicate value.
set.add(e) attempts to add an element to the set, and it returns a boolean indicating whether it was added. Negating the result will cause new elements to be ignored and duplicates to be printed. Note that if an element is present 3 times it will be printed twice, and so on.
Question 3: How can i have it print non Duplicate values, just Orange and Milk but not Apple? I tried this approach but it still prints Apple. List<String> unique= new ArrayList<String>(new HashSet<String>(l1));
There are a number of ways to approach it. This one doesn't have the best performance but it's pretty straightforward:
for (int i = 0; i < l1.size(); i++) {
boolean hasDup = false;
for (int j = 0; j < l1.size(); j++) {
if (i != j && l1.get(i).equals(l1.get(j))) {
hasDup = true;
break;
}
}
if (!hasDup) {
System.out.println(l1.get(i));
}
}

With the power of Java 8...
public static void main(String[] args) {
List<String> l1 = new ArrayList<>();
l1.add("Apple");
l1.add("Orange");
l1.add("Apple");
l1.add("Milk");
// remove duplicates
List<String> li = l1.parallelStream().distinct().collect(Collectors.toList());
System.out.println(li);
// map with duplicates frequency
Map<String, Long> countsList = l1.stream().collect(Collectors.groupingBy(fe -> fe, Collectors.counting()));
System.out.println(countsList);
// filter the map where only once
List<String> l2 = countsList.entrySet().stream().filter(map -> map.getValue().longValue() == 1)
.map(map -> map.getKey()).collect(Collectors.toList());
System.out.println(l2);
}

Java LinkedList issues - how to remove items that meet certain qualifications

I have a question,
Here is what I need to do -
I have BankItems that are associated with numbers. I fill the list, but when an object enters that is $100 more than the lowest dollar value currently in the list, I want to delete the object that has the low value.
First - I create the list
List<BankItem> listOfBankItems = new LinkedList<BankItem>();
Later in the program I create a new BankItem object and add it to the list
listOfBankItems.add(createdItem);
and after adding each item I want to check to see if the new item is $100 more than any object already in the list so I run something like this
for (int i = 0; i < listOfBankItems.size(); i++) {
int oldValue =listOfBankItems.get(i).getAmount();
int newValue = createdItem.getAmount();
int calculatedDif = newValue - oldValue;
if (calculatedDif > 100) {
listOfBankItems.remove(i);
}
}
Unfortunately, this isn't working. I don't know what is up. Maybe I shouldn't use a LinkedList? Maybe my logic is way off-base. Please help.
Thanks!!!

The problem is that the index of all items after the removed one will change after you remove that element; therefore you'll basically skip the next element after you remove one.
Use an iterator:
for (Iterator<BankItem> itr = listOfBankItems.iterator(); itr.hasNext();) {
BankItem item = itr.next();
int oldValue = item.getAmount();
int newValue = createdItem.getAmount();
int calculatedDif = newValue - oldValue;
if (calculatedDif > 100) {
itr.remove();
}
}

Your most significant issue probably relates to concurrent modification. For example, if element #49 is the one to be removed, once you remove it, the next element will now be #49, but you will be checking for #50 (as i was still incremented) - so you're probably missing elements from your check.
There are a few ways to handle this. You could remove the i++ from your for loop (leaving only the trailing semi-colon), then do this:
if (calculatedDif > 100) {
listOfBankItems.remove(i);
} else {
i++;
}
Alternatively, you could use an Iterator and its remove() method, which would handle this for you automatically.
You can also improve the performance of this by not obtaining newValue and recalculating calculatedDif on every step of the loop. Declare these 2 lines above the for loop.
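On Java 8 and later, Collection.removeIf is another option that avoids both the index bookkeeping and the explicit Iterator. This is a different technique from the ones in the answers above, shown only as a sketch; BankItem here is a hypothetical minimal class (the question only shows getAmount()), and BankItemPruning and addAndPrune are illustrative names.

```java
import java.util.LinkedList;
import java.util.List;

class BankItemPruning {
    // Hypothetical minimal BankItem; the real class from the question is not shown.
    static final class BankItem {
        private final int amount;

        BankItem(int amount) {
            this.amount = amount;
        }

        int getAmount() {
            return amount;
        }
    }

    // Drops every item that is more than 100 below createdItem, then adds createdItem.
    static void addAndPrune(List<BankItem> items, BankItem createdItem) {
        int newValue = createdItem.getAmount(); // computed once, outside the check
        items.removeIf(item -> newValue - item.getAmount() > 100);
        items.add(createdItem);
    }

    public static void main(String[] args) {
        List<BankItem> listOfBankItems = new LinkedList<>();
        addAndPrune(listOfBankItems, new BankItem(50));
        addAndPrune(listOfBankItems, new BankItem(120));
        addAndPrune(listOfBankItems, new BankItem(200)); // removes the 50 item, keeps 120
        for (BankItem item : listOfBankItems) {
            System.out.println(item.getAmount());
        }
    }
}
```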
