How to remove all duplicated strings from a Java List?

For a given list, say [ "a", "a", "b", "c", "c" ], I need [ "b" ] (only the non-duplicated elements) as output. Note that this is different from using the Set interface for the job, which would keep one copy of every element.
I wrote the following code to do this in Java:
void unique(List<String> list) {
Collections.sort(list);
List<String> dup = new ArrayList<>();
int i = 0, j = 0;
for (String e : list) {
i = list.indexOf(e);
j = list.lastIndexOf(e);
if (i != j && !dup.contains(e)) {
dup.add(e);
}
}
list.removeAll(dup);
}
It works, but for a list of size 85320 it takes several minutes to finish!

You will get the best performance with a set:
String[] xs = { "a", "a", "b", "c", "c" };
Set<String> singles = new TreeSet<>();
Set<String> multiples = new TreeSet<>();
for (String x : xs) {
if(!multiples.contains(x)){
if(singles.contains(x)){
singles.remove(x);
multiples.add(x);
}else{
singles.add(x);
}
}
}
It's a single pass, and insert, remove and contains are each O(log n) on a TreeSet.
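If the sorted order of a TreeSet is not needed, the same single-pass idea can be sketched with hash-based sets, where add, remove and contains are expected constant time. This is only a sketch: the class name SinglesWithSets and the method singles are made up for illustration, and using a LinkedHashSet to keep first-occurrence order is an assumption, not something the answer above requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class SinglesWithSets {

    // Returns the elements that occur exactly once, in first-occurrence order.
    static List<String> singles(List<String> input) {
        Set<String> singles = new LinkedHashSet<>();
        Set<String> multiples = new HashSet<>();
        for (String x : input) {
            if (multiples.contains(x)) {
                continue;              // already known to be duplicated
            }
            if (!singles.add(x)) {     // add returns false when x was seen once before
                singles.remove(x);
                multiples.add(x);
            }
        }
        return new ArrayList<>(singles);
    }

    public static void main(String[] args) {
        System.out.println(singles(Arrays.asList("a", "a", "b", "c", "c"))); // prints [b]
    }
}
```

Unlike the original unique(List) method, this sketch returns a new list instead of modifying the argument.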

Using Java 8 streams:
return list.stream()
.collect(Collectors.groupingBy(e -> e, Collectors.counting()))
.entrySet()
.stream()
.filter(e -> e.getValue() == 1)
.map(Map.Entry::getKey)
.collect(Collectors.toList());

You can use streams to achieve this in simpler steps as shown below with inline comments:
//Find out unique elements first
List<String> unique = list.stream().distinct().collect(Collectors.toList());
//List to collect output list
List<String> output = new ArrayList<>();
//Iterate over each unique element
for(String element : unique) {
//if element found only ONCE add to output list
if(list.stream().filter(e -> e.equals(element)).count() == 1) {
output.add(element);
}
}

You can use a Map. Do the following:
1. Create a map of type Map<String, Integer>.
2. For each element, check whether the string is already in the map:
if yes, increment the value of that map entry by 1;
otherwise add <current element, 1>.
3. Your output is then the keys of those map entries whose value is 1.
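The steps above can be sketched as follows. The names CountWithMap and onlyOnce are hypothetical, and the LinkedHashMap is an assumption made so that the result keeps the order in which the strings first appear; a plain HashMap works too if order does not matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class CountWithMap {

    static List<String> onlyOnce(List<String> input) {
        // Steps 1 and 2: count occurrences. merge() stores 1 for a new key
        // and adds 1 to the existing count otherwise.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : input) {
            counts.merge(s, 1, Integer::sum);
        }
        // Step 3: keep the keys whose count is exactly 1.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(onlyOnce(Arrays.asList("a", "a", "b", "c", "c"))); // prints [b]
    }
}
```

Both loops are linear in the size of the input, so this avoids the repeated indexOf/lastIndexOf scans of the code in the question.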

Given that you can sort the list, about the most efficient way to do this is to use a ListIterator to iterate over runs of adjacent elements:
List<String> dup = new ArrayList<>();
Collections.sort(list);
ListIterator<String> it = list.listIterator();
while (it.hasNext()) {
String first = it.next();
// Count the number of elements equal to first.
int cnt = 1;
while (it.hasNext()) {
String next = it.next();
if (!first.equals(next)) {
it.previous();
break;
}
++cnt;
}
// If cnt is greater than 1, the element is duplicated.
// Otherwise it's a singleton, so add it to the output
// (the list named dup holds the singletons).
if (cnt == 1) {
dup.add(first);
}
}
return dup;
ListIterator is more efficient for lists which don't support random access, like LinkedList, than using index-based access.

Related

Remove duplicates from list based on duplicate index in another list

I have 2 Lists: Names & IDs.
There are cases where the same name will appear multiple times. For example:
Names = {'ben','david','jerry','tom','ben'}
IDs = {'123','23456','34567','123','123'}
I know I can use
Set<String> set = new LinkedHashSet<>( Names );
Names .clear();
Names .addAll( set );
to remove duplicates; however, that is not what I want.
What I would like to do is to check where Names has a duplicate value which in this case will be the last value and then remove from IDs the value at that position so the final result will be:
Names = {'ben','david','jerry','tom'}
IDs = {'123','23456','34567','123'}
How can I get the index of the duplicated value in order to remove it from the second list? or is there some easy and fast way to do it?
I'm sure I can solve it by using a loop but I try to avoid it.
SOLUTION:
I changed the code to use:
Map<String, String> map = new HashMap<>();
When using:
map.put(name,id);
It might not do the job, since there are cases where the same name has a different id and a map won't allow a duplicate key for the name, so I just changed it to map.put(id,name) and it did the job.
Thank you

You could collect the data from both input arrays/lists into a set of pairs and then recollect the pairs back to two new lists (or clear and reuse existing names/IDs lists):
List<String> names = Arrays.asList("ben","david","jerry","tom","ben");
List<String> IDs = Arrays.asList("123","23456","34567","123","123");
// assuming that both lists have the same size
// using list to store a pair
Set<List<String>> deduped = IntStream.range(0, names.size())
.mapToObj(i -> Arrays.asList(names.get(i), IDs.get(i)))
.collect(Collectors.toCollection(LinkedHashSet::new));
System.out.println(deduped);
System.out.println("-------");
List<String> dedupedNames = new ArrayList<>();
List<String> dedupedIDs = new ArrayList<>();
deduped.forEach(pair -> {dedupedNames.add(pair.get(0)); dedupedIDs.add(pair.get(1)); });
System.out.println(dedupedNames);
System.out.println(dedupedIDs);
Output:
[[ben, 123], [david, 23456], [jerry, 34567], [tom, 123]]
-------
[ben, david, jerry, tom]
[123, 23456, 34567, 123]

You can collect as a map using Collectors.toMap then get the keySet and values from map for names and ids list.
List<String> names = Arrays.asList("ben","david","jerry","tom","ben");
List<String> ids = Arrays.asList("123","23456","34567","123","123");
Map<String, String> map =
IntStream.range(0, ids.size())
.boxed()
.collect(Collectors.toMap(i -> names.get(i), i -> ids.get(i),
(a,b) -> a, LinkedHashMap::new));
List<String> newNames = new ArrayList<>(map.keySet());
List<String> newIds = new ArrayList<>(map.values());
You can also build the map with a loop:
Map<String, String> map = new LinkedHashMap<>();
for (int i = 0; i < names.size(); i++) {
if(!map.containsKey(names.get(i))) {
map.put(names.get(i), ids.get(i));
}
}

You could add your names one by one to a set as long as Set.add returns true and if it returns false store the index of that element in a list (indices to remove). Then sort the indices list in reverse order and use List.remove(int n) on both your names list and id list:
List<String> names = ...
List<String> ids = ...
Set<String> set = new HashSet<>();
List<Integer> toRemove = new ArrayList<>();
for(int i = 0; i< names.size(); i ++){
if(!set.add(names.get(i))){
toRemove.add(i);
}
}
Collections.sort(toRemove, Collections.reverseOrder());
for (int i : toRemove){
names.remove(i);
ids.remove(i);
}
System.out.println(toRemove);
System.out.println(names);
System.out.println(ids);

Split list into duplicate and non-duplicate lists Java 8

I have a List<String> that may or not contain duplicated values:
In the case of duplicated "ABC" value (only ABC for this matter)
List myList = {"ABC", "EFG", "IJK", "ABC", "ABC"},
I want to split the list in two lists to finally get
List duplicatedValues = {"ABC"};
and
List nonDuplicatedValues = {"EFG", "IJK"};
And also if the list doesn't have more than one "ABC" it will return the same list
What I did so far :
void generateList(List<String> duplicatedValues, List<String> nonDuplicatedValues){
List<String> myList=List.of("ABC","EFG","IJK","ABC","ABC");
Optional<String> duplicatedValue = myList.stream().filter(isDuplicated -> Collections.frequency(myList, "ABC") > 1).findFirst();
if (duplicatedValue.isPresent())
{
duplicatedValues.addAll(List.of(duplicatedValue.get()));
nonDuplicatedValues.addAll(myList.stream().filter(string->!string.equals("ABC")).collect(Collectors.toList()));
}
else
{
nonDuplicatedValues.addAll(myList);
}
}
Is there a more efficient way to do that using only a stream of myList ?

You can do something like this:
myList.stream().forEach((x) -> ((Collections.frequency(myList, x) > 1) ? duplicatedValues : nonDuplicatedValues).add(x));
(The duplicatedValues should be a Set to prevent duplications)
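Note that Collections.frequency rescans the whole list for every element, so the one-liner is quadratic in the list size. A minimal sketch that counts once and then splits, assuming the caller passes a Set for the duplicates as suggested above (the names SplitByFrequency and split are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class SplitByFrequency {

    static void split(List<String> myList,
                      Set<String> duplicatedValues,
                      List<String> nonDuplicatedValues) {
        // First pass: count how often each value occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String s : myList) {
            counts.merge(s, 1, Integer::sum);
        }
        // Second pass: route each value by its count; the Set collapses repeats.
        for (String s : myList) {
            if (counts.get(s) > 1) {
                duplicatedValues.add(s);
            } else {
                nonDuplicatedValues.add(s);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> dups = new LinkedHashSet<>();
        List<String> nonDups = new ArrayList<>();
        split(Arrays.asList("ABC", "EFG", "IJK", "ABC", "ABC"), dups, nonDups);
        System.out.println(dups);    // prints [ABC]
        System.out.println(nonDups); // prints [EFG, IJK]
    }
}
```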

Also it can be done by collecting to lists of duplicated and non-duplicated values:
Map<Boolean, List<String>> result = input.stream()
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(s -> s, Collectors.counting()),
m -> m.entrySet().stream()
.collect(Collectors.groupingBy(e -> e.getValue() > 1,
Collectors.mapping(e -> e.getKey(), Collectors.toList()))
)
));
List<String> duplicates = result.get(true);
List<String> nonDuplicates = result.get(false);

It is possible to use a stream to create from your list a Map storing strings and their frequencies in your list; after you can iterate over the map to put elements in lists duplicatedValues and nonDuplicatedValues like below:
List<String> duplicatedValues = new ArrayList<String>();
List<String> nonDuplicatedValues = new ArrayList<String>();
List<String> myList=List.of("ABC","EFG","IJK","ABC","ABC");
Map<String, Long> map = myList.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
map.forEach((k, v) -> { if (v > 1) duplicatedValues.add(k); else nonDuplicatedValues.add(k); });

Here is one way to do it. It basically does a frequency count and divides accordingly.
List<String> myList = new ArrayList<>(
List.of("ABC", "EFG", "IJK", "ABC", "ABC", "RJL"));
Map<String,Long> freq = new HashMap<>();
for (String str : myList) {
freq.compute(str, (k,v)->v == null ? 1 : v + 1);
}
Map<String,List<String>> dupsAndNonDups = new HashMap<>();
for (Entry<String,Long> e : freq.entrySet()) {
dupsAndNonDups.computeIfAbsent(e.getValue() > 1 ? "dups" : "nondups",
k-> new ArrayList<>()).add(e.getKey());
}
System.out.println("dups = " + dupsAndNonDups.get("dups"));
System.out.println("nondups = " + dupsAndNonDups.get("nondups"));
Prints
dups = [ABC]
nondups = [RJL, EFG, IJK]

How to find duplicate values based upon first 10 digits?

I have a scenario where I have a list as below:
List<String> a1 = new ArrayList<String>();
a1.add("1070045028000");
a1.add("1070045028001");
a1.add("1070045052000");
a1.add("1070045086000");
a1.add("1070045052001");
a1.add("1070045089000");
I tried the code below to find duplicate elements, but it compares the whole string instead of only part of it (the first 10 digits).
Set<String> unique = new HashSet<>();
for (String s : a1) {
if(!unique.add(s)){
System.out.println(s);
}
}
Is there any way to identify all duplicates based on the first 10 digits of a number, then find the lowest string among each group of duplicates and add it to another list?
Note: there will never be more than 2 strings sharing the same 10-digit prefix.

You may group by a (String s) -> s.substring(0, 10)
Map<String, List<String>> map = list.stream()
.collect(Collectors.groupingBy(s -> s.substring(0, 10)));
map.values() would give you Collection<List<String>> where each List<String> is a list of duplicates.
{
1070045028=[1070045028000, 1070045028001],
1070045089=[1070045089000],
1070045086=[1070045086000],
1070045052=[1070045052000, 1070045052001]
}
If it's a single-element list, no duplicates were found, and you can filter these entries out.
{
1070045028=[1070045028000, 1070045028001],
1070045052=[1070045052000, 1070045052001]
}
Then the problem boils down to reducing a list of values to a single value.
[1070045028000, 1070045028001] -> 1070045028000
We know that the first 10 symbols are the same, we may ignore them while comparing.
[1070045028000, 1070045028001] -> [000, 001]
They are still raw String values, we may convert them to numbers.
[000, 001] -> [0, 1]
A natural Comparator<Integer> will give 0 as the minimum.
0
0 -> 000 -> 1070045028000
Repeat it for all the lists in map.values() and you are done.
The code would be
List<String> result = map
.values()
.stream()
.filter(list -> list.size() > 1)
.map(l -> l.stream().min(Comparator.comparingInt(s -> Integer.valueOf(s.substring(10)))).get())
.collect(Collectors.toList());

A straight-forward loop solution would be
List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
"1070045052000", "1070045086000", "1070045052001", "1070045089000");
Set<String> unique = new HashSet<>();
Map<String,String> map = new HashMap<>();
for(String s: a1) {
String firstTen = s.substring(0, 10);
if(!unique.add(firstTen)) map.put(firstTen, s);
}
for(String s1: a1) {
String firstTen = s1.substring(0, 10);
map.computeIfPresent(firstTen, (k, s2) -> s1.compareTo(s2) < 0? s1: s2);
}
List<String> minDup = new ArrayList<>(map.values());
First, we add all duplicates to a Map, then we iterate over the list again and select the minimum for all values present in the map.
Alternatively, we may add all elements to a map, collecting them into lists, then select the minimum out of those, which have a size bigger than one:
List<String> minDup = new ArrayList<>();
Map<String,List<String>> map = new HashMap<>();
for(String s: a1) {
map.computeIfAbsent(s.substring(0, 10), x -> new ArrayList<>()).add(s);
}
for(List<String> list: map.values()) {
if(list.size() > 1) minDup.add(Collections.min(list));
}
This logic is directly expressible with the Stream API:
List<String> minDup = a1.stream()
.collect(Collectors.groupingBy(s -> s.substring(0, 10)))
.values().stream()
.filter(list -> list.size() > 1)
.map(Collections::min)
.collect(Collectors.toList());
Since you said that there will be only 2 duplicates per key, the overhead of collecting a List before selecting the minimum is negligible.
The solutions above assume that you only want to keep values having duplicates. Otherwise, you can use
List<String> minDup = a1.stream()
.collect(Collectors.collectingAndThen(
Collectors.toMap(s -> s.substring(0, 10), Function.identity(),
BinaryOperator.minBy(Comparator.<String>naturalOrder())),
m -> new ArrayList<>(m.values())));
which is equivalent to
Map<String,String> map = new HashMap<>();
for(String s: a1) {
map.merge(s.substring(0, 10), s, BinaryOperator.minBy(Comparator.naturalOrder()));
}
List<String> minDup = new ArrayList<>(map.values());
Common to those solutions is that you don't have to identify duplicates first: when you want to keep unique values too, the task reduces to selecting the minimum whenever a duplicate key is encountered.

While I hate doing your homework for you, this was fun. :/
public static void main(String[] args) {
List<String> al=new ArrayList<>();
al.add("1070045028000");
al.add("1070045028001");
al.add("1070045052000");
al.add("1070045086000");
al.add("1070045052001");
al.add("1070045089000");
List<String> ret=new ArrayList<>();
for(String a:al) {
boolean handled = false;
for(int i=0;i<ret.size();i++){
String ri = ret.get(i);
if(ri.substring(0, 10).equals(a.substring(0,10))) {
Long iri = Long.parseLong(ri);
Long ia = Long.parseLong(a);
if(ia < iri){
//a is smaller, so replace it in the list
ret.set(i, a);
}
//it was a duplicate, we are done with it
handled = true;
break;
}
}
if(!handled) {
//wasn't a duplicate, just add it
ret.add(a);
}
}
System.out.println(ret);
}
prints
[1070045028000, 1070045052000, 1070045086000, 1070045089000]

Here's another way to do it – construct a Set and store just the 10-digit prefix:
Set<String> set = new HashSet<>();
for (String number : a1) {
String prefix = number.substring(0, 10);
if (set.contains(prefix)) {
System.out.println("found duplicate prefix [" + prefix + "], skipping " + number);
} else {
set.add(prefix);
}
}

How to GROUP BY same strings in Java

I have got an Arraylist of strings and I need to return the Arraylist indexes of strings that are the same.
For example
Arraylist[0]: IPAddress
Arraylist[1]: DomainName
Arraylist[2]: IPAddress
Arraylist[3]: Filesize
The output should be:
Arraylist[0]
IPAddress|0,2 //0,2 denotes the arraylist index that is of the same
Arraylist[1]
DomainName|1
Arraylist[2]
Filesize|3
Any idea how can this be achieved?
What I have done is:
for(int i=0; i<arr.size(); i++){
if(arr.get(i).equals(arr.size()-1)){
//print index
}
}

With Java8 streams
List<String> strings = Arrays.asList("IPAddress", "DomainName", "IPAddress", "Filesize");
Map<String, List<Integer>> map = IntStream.range(0, strings.size()).boxed().collect(Collectors.groupingBy(strings::get));
System.out.println(map);
output
{DomainName=[1], Filesize=[3], IPAddress=[0, 2]}
To keep the results in the order in which the strings first appear:
Map<String, List<Integer>> map = IntStream.range(0, strings.size())
.boxed()
.collect(Collectors.groupingBy(strings::get, LinkedHashMap::new, Collectors.toList()));

The mechanical steps are fairly straightforward:
Get a collection which can support a key (which is the string in your list) and a list of values representing the indexes in which they occur (which would be another ArrayList).
If the element exists in the collection, simply add the index to its value.
Otherwise, create a new list, add the index to that, then add that to the collection.
Here is some sample code below.
final List<String> list = new ArrayList<String>() {{
add("IPAddress");
add("DomainName");
add("IPAddress");
add("Filesize");
}};
final Map<String, List<Integer>> correlations = new LinkedHashMap<>();
for (int i = 0; i < list.size(); i++) {
final String key = list.get(i);
if (correlations.containsKey(key)) {
correlations.get(key).add(i);
} else {
final List<Integer> indexList = new ArrayList<>();
indexList.add(i);
correlations.put(key, indexList);
}
}
Any optimizations to the above are left as an exercise for the reader.
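One possible simplification, shown only as a sketch: Map.computeIfAbsent folds the containsKey/else branches into a single call. The class name IndexGrouping and the format helper, which renders the "value|indexes" lines from the question, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class IndexGrouping {

    // Maps each distinct string to the list of indexes at which it occurs.
    static Map<String, List<Integer>> indexesByValue(List<String> list) {
        Map<String, List<Integer>> correlations = new LinkedHashMap<>();
        for (int i = 0; i < list.size(); i++) {
            // Creates the index list on first sight of the key, then appends i.
            correlations.computeIfAbsent(list.get(i), k -> new ArrayList<>()).add(i);
        }
        return correlations;
    }

    // Renders each entry as "value|i,j,...".
    static List<String> format(Map<String, List<Integer>> correlations) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : correlations.entrySet()) {
            String indexes = e.getValue().stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(","));
            lines.add(e.getKey() + "|" + indexes);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("IPAddress", "DomainName", "IPAddress", "Filesize");
        System.out.println(format(indexesByValue(input)));
        // prints [IPAddress|0,2, DomainName|1, Filesize|3]
    }
}
```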

How to build an object from similar key value prefix in a map

Firstly I am using Java to write this.
So I have a Map with keys and values like this.
Key = Value
"a.1" = "a1"
"a.2" = "a2"
"a.3" = "a3"
"b.1" = "b1"
"b.2" = "b2"
"b.3" = "b3"
"c.1" = "c1"
"c.2" = "c2"
"c.3" = "c3"
What I need to do is split these up so that I can eventually loop through them and create new objects using
someloop{
new someObject(a1,b1,c1); // new someObject(a2,b2,c2); // new someObject(a3,b3,c3);
}
I need to be able to make it dynamic so I can add another prefix (d,e) and also check if a number is missing or is skipped.

If you change the constructor of SomeObject to accept a list of Strings, this might work:
Map<String, String> map = new HashMap<>();
map.put("a.1", "a1");
map.put("a.2", "a2");
map.put("a.3", "a3");
map.put("b.1", "b1");
map.put("b.2", "b2");
map.put("b.3", "b3");
map.put("c.1", "c1");
map.put("c.2", "c2");
map.put("c.3", "c3");
Map<String, List<String>> grouped = map.entrySet().stream()
.sorted(Comparator.comparing(Map.Entry::getKey))
.collect(Collectors.groupingBy(
entry -> entry.getKey().split("\\.")[0],
HashMap::new,
Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
List<SomeObject> objects = grouped.values().stream().map(SomeObject::new).collect(Collectors.toList());
System.out.println(objects);
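Note that the snippet above groups by the prefix before the dot, so each list holds values such as [a1, a2, a3]. If the goal is one object per number, as in the question's someObject(a1,b1,c1), the grouping key would be the part after the dot instead. A minimal sketch under that assumption (the class GroupBySuffix and the method group are made up for illustration, and keys are assumed to look like "prefix.number"):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only.
final class GroupBySuffix {

    // Groups values by the number after the dot; inside a group the values
    // are ordered by their prefix (a, b, c, ...).
    static Map<Integer, List<String>> group(Map<String, String> map) {
        Map<Integer, Map<String, String>> bySuffix = new TreeMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            String[] parts = e.getKey().split("\\.", 2); // assumes "prefix.number"
            int suffix = Integer.parseInt(parts[1]);
            bySuffix.computeIfAbsent(suffix, k -> new TreeMap<>())
                    .put(parts[0], e.getValue());
        }
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, String>> e : bySuffix.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue().values()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        for (String prefix : new String[] {"a", "b", "c"}) {
            for (int n = 1; n <= 3; n++) {
                map.put(prefix + "." + n, prefix + n);
            }
        }
        System.out.println(group(map));
        // prints {1=[a1, b1, c1], 2=[a2, b2, c2], 3=[a3, b3, c3]}
    }
}
```

Each list could then be handed to a SomeObject constructor that accepts a list. Because the outer map is sorted by number, a skipped number shows up as a gap between consecutive keys, and a missing prefix as a shorter list.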

I have omitted input validation:
public static void buildObjects(Map<String, String> keyValuePairs) {
List<List<String>> sortedValues = new ArrayList<>();
// assuming keys are ending in a digit 1 through 9, add empty lists to sortedValues to hold values
sortedValues.add(null); // index 0
for (int index = 0; index <= 9; index++) {
sortedValues.add(new ArrayList<>());
}
for (Map.Entry<String, String> pair : keyValuePairs.entrySet()) {
String key = pair.getKey();
int indexOfDot = key.indexOf('.');
int suffix = Integer.parseInt(key.substring(indexOfDot + 1));
sortedValues.get(suffix).add(pair.getValue());
}
for (List<String> list : sortedValues) {
if (list != null && ! list.isEmpty()) {
new SomeObject(list.toArray(new String[list.size()]));
}
}
}
You will probably also want to add code that does something with the created objects.
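If you want to be sure about the order of values passed to the constructor, you may iterate over a new TreeMap<>(keyValuePairs) instead, or create a TreeMap with your own comparator (new TreeMap<>(yourComparator)) and then call putAll(keyValuePairs) on it; putAll returns void, so the two calls cannot be chained. This controls the order in which the keys are processed.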
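As a sketch of the comparator variant, the following orders keys such as "b.2" by the number after the dot and then by the prefix. The names OrderedKeys, BY_SUFFIX_THEN_PREFIX and sorted are hypothetical, and the comparator assumes every key contains a dot followed by an integer.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only.
final class OrderedKeys {

    // Compares by the numeric suffix first, then by the prefix before the dot.
    static final Comparator<String> BY_SUFFIX_THEN_PREFIX =
            Comparator.comparingInt((String k) -> Integer.parseInt(k.substring(k.indexOf('.') + 1)))
                      .thenComparing((String k) -> k.substring(0, k.indexOf('.')));

    static Map<String, String> sorted(Map<String, String> keyValuePairs) {
        // Create the map first, then fill it: putAll returns void.
        Map<String, String> ordered = new TreeMap<>(BY_SUFFIX_THEN_PREFIX);
        ordered.putAll(keyValuePairs);
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new HashMap<>();
        pairs.put("b.1", "b1");
        pairs.put("a.2", "a2");
        pairs.put("a.1", "a1");
        pairs.put("b.2", "b2");
        System.out.println(sorted(pairs)); // prints {a.1=a1, b.1=b1, a.2=a2, b.2=b2}
    }
}
```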
