How to find duplicate values based upon first 10 digits? - java

I have a scenario where I have a list as below:
List<String> a1 = new ArrayList<String>();
a1.add("1070045028000");
a1.add("1070045028001");
a1.add("1070045052000");
a1.add("1070045086000");
a1.add("1070045052001");
a1.add("1070045089000");
I tried the following to find duplicate elements, but it checks the whole string instead of just the partial string (the first 10 digits).
Set<String> unique = new HashSet<>();
for (String s : a1) {
    if (!unique.add(s)) {
        System.out.println(s);
    }
}
Is there any way to identify all duplicates based upon the first 10 digits, then pick the lowest string from each group of duplicates and add it to another list?
Note: there will always be only 2 entries sharing any given 10-digit prefix.

You may group by (String s) -> s.substring(0, 10):
Map<String, List<String>> map = list.stream()
.collect(Collectors.groupingBy(s -> s.substring(0, 10)));
map.values() would give you a Collection<List<String>> where each List<String> is a group of values sharing the same prefix.
{
1070045028=[1070045028000, 1070045028001],
1070045089=[1070045089000],
1070045086=[1070045086000],
1070045052=[1070045052000, 1070045052001]
}
If it's a single-element list, no duplicates were found, and you can filter these entries out.
{
1070045028=[1070045028000, 1070045028001],
1070045052=[1070045052000, 1070045052001]
}
Then the problem boils down to reducing a list of values to a single value.
[1070045028000, 1070045028001] -> 1070045028000
We know that the first 10 characters are the same, so we may ignore them while comparing.
[1070045028000, 1070045028001] -> [000, 001]
They are still raw String values, so we may convert them to numbers.
[000, 001] -> [0, 1]
A natural Comparator<Integer> will give 0 as the minimum.
0
0 -> 000 -> 1070045028000
Repeat it for all the lists in map.values() and you are done.
The code would be
List<String> result = map
.values()
.stream()
.filter(list -> list.size() > 1)
.map(l -> l.stream().min(Comparator.comparingInt(s -> Integer.valueOf(s.substring(10)))).get())
.collect(Collectors.toList());

A straightforward loop solution would be
List<String> a1 = Arrays.asList("1070045028000", "1070045028001",
"1070045052000", "1070045086000", "1070045052001", "1070045089000");
Set<String> unique = new HashSet<>();
Map<String,String> map = new HashMap<>();
for(String s: a1) {
String firstTen = s.substring(0, 10);
if(!unique.add(firstTen)) map.put(firstTen, s);
}
for(String s1: a1) {
String firstTen = s1.substring(0, 10);
map.computeIfPresent(firstTen, (k, s2) -> s1.compareTo(s2) < 0? s1: s2);
}
List<String> minDup = new ArrayList<>(map.values());
First, we add all duplicates to a Map, then we iterate over the list again and select the minimum for all values present in the map.
Alternatively, we may add all elements to a map, collecting them into lists, then select the minimum out of those, which have a size bigger than one:
List<String> minDup = new ArrayList<>();
Map<String,List<String>> map = new HashMap<>();
for(String s: a1) {
map.computeIfAbsent(s.substring(0, 10), x -> new ArrayList<>()).add(s);
}
for(List<String> list: map.values()) {
if(list.size() > 1) minDup.add(Collections.min(list));
}
This logic is directly expressible with the Stream API:
List<String> minDup = a1.stream()
.collect(Collectors.groupingBy(s -> s.substring(0, 10)))
.values().stream()
.filter(list -> list.size() > 1)
.map(Collections::min)
.collect(Collectors.toList());
Since you said that there will be only 2 duplicates per key, the overhead of collecting a List before selecting the minimum is negligible.
The solutions above assume that you only want to keep values having duplicates. Otherwise, you can use
List<String> minDup = a1.stream()
.collect(Collectors.collectingAndThen(
Collectors.toMap(s -> s.substring(0, 10), Function.identity(),
BinaryOperator.minBy(Comparator.<String>naturalOrder())),
m -> new ArrayList<>(m.values())));
which is equivalent to
Map<String,String> map = new HashMap<>();
for(String s: a1) {
map.merge(s.substring(0, 10), s, BinaryOperator.minBy(Comparator.naturalOrder()));
}
List<String> minDup = new ArrayList<>(map.values());
Common to those solutions is that you don't have to identify duplicates first: even when you want to keep the unique values too, the task reduces to selecting the minimum for each prefix.

While I hate doing your homework for you, this was fun. :/
public static void main(String[] args) {
    List<String> al = new ArrayList<>();
    al.add("1070045028000");
    al.add("1070045028001");
    al.add("1070045052000");
    al.add("1070045086000");
    al.add("1070045052001");
    al.add("1070045089000");
    List<String> ret = new ArrayList<>();
    for (String a : al) {
        boolean handled = false;
        for (int i = 0; i < ret.size(); i++) {
            String ri = ret.get(i);
            if (ri.substring(0, 10).equals(a.substring(0, 10))) {
                long iri = Long.parseLong(ri);
                long ia = Long.parseLong(a);
                if (ia < iri) {
                    // a is smaller, so replace it in the list
                    ret.set(i, a);
                }
                // it was a duplicate, we are done with it
                handled = true;
                break;
            }
        }
        if (!handled) {
            // wasn't a duplicate, just add it
            ret.add(a);
        }
    }
    System.out.println(ret);
}
prints
[1070045028000, 1070045052000, 1070045086000, 1070045089000]

Here's another way to do it – construct a Set and store just the 10-digit prefix:
Set<String> set = new HashSet<>();
for (String number : a1) {
String prefix = number.substring(0, 10);
if (set.contains(prefix)) {
System.out.println("found duplicate prefix [" + prefix + "], skipping " + number);
} else {
set.add(prefix);
}
}
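As a small aside (a common idiom, not a change in behavior): Set.add already reports whether the element was new, so the contains/add pair can be collapsed into a single call. A sketch, reusing a1 from the question (seen is a hypothetical name):
Set<String> seen = new HashSet<>();
for (String number : a1) {
    String prefix = number.substring(0, 10);
    // add() returns false when the prefix was already present
    if (!seen.add(prefix)) {
        System.out.println("found duplicate prefix [" + prefix + "], skipping " + number);
    }
}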

Related

Split list into duplicate and non-duplicate lists Java 8

I have a List<String> that may or may not contain duplicated values:
In the case of a duplicated "ABC" value (only "ABC" matters here),
List myList = {"ABC", "EFG", "IJK", "ABC", "ABC"},
I want to split the list into two lists so that I finally get
List duplicatedValues = {"ABC"};
and
List nonDuplicatedValues = {"EFG", "IJK"};
Also, if the list doesn't contain more than one "ABC", it should return the same list.
What I did so far:
void generateList(List<String> duplicatedValues, List<String> nonDuplicatedValues){
List<String> myList=List.of("ABC","EFG","IJK","ABC","ABC");
Optional<String> duplicatedValue = myList.stream().filter(isDuplicated -> Collections.frequency(myList, "ABC") > 1).findFirst();
if (duplicatedValue.isPresent())
{
duplicatedValues.addAll(List.of(duplicatedValue.get()));
nonDuplicatedValues.addAll(myList.stream().filter(string -> !string.equals("ABC")).collect(Collectors.toList()));
}
else
{
nonDuplicatedValues.addAll(myList);
}
}
Is there a more efficient way to do that using only a stream of myList?
You can do something like this:
myList.stream().forEach((x) -> ((Collections.frequency(myList, x) > 1) ? duplicatedValues : nonDuplicatedValues).add(x));
(The duplicatedValues should be a Set to prevent duplications)
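For illustration only, here is that one-liner with a Set for the duplicates, using the sample data from the question (the variable names are just examples):
List<String> myList = List.of("ABC", "EFG", "IJK", "ABC", "ABC");
Collection<String> duplicatedValues = new LinkedHashSet<>();   // a Set, so "ABC" appears only once
Collection<String> nonDuplicatedValues = new ArrayList<>();
myList.stream().forEach(x ->
    (Collections.frequency(myList, x) > 1 ? duplicatedValues : nonDuplicatedValues).add(x));
// duplicatedValues    -> [ABC]
// nonDuplicatedValues -> [EFG, IJK]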
Also it can be done by collecting to lists of duplicated and non-duplicated values:
Map<Boolean, List<String>> result = input.stream()
.collect(Collectors.collectingAndThen(
Collectors.groupingBy(s -> s, Collectors.counting()),
m -> m.entrySet().stream()
.collect(Collectors.groupingBy(e -> e.getValue() > 1,
Collectors.mapping(e -> e.getKey(), Collectors.toList()))
)
));
List<String> duplicates = result.get(true);
List<String> nonDuplicates = result.get(false);
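One caveat: with groupingBy(e -> e.getValue() > 1, ...), result.get(true) is null when the input contains no duplicates at all. Swapping the outer collector for partitioningBy guarantees that both the true and false keys are present; a variant with otherwise identical logic:
Map<Boolean, List<String>> result = input.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.groupingBy(s -> s, Collectors.counting()),
        m -> m.entrySet().stream()
            .collect(Collectors.partitioningBy(e -> e.getValue() > 1,
                Collectors.mapping(Map.Entry::getKey, Collectors.toList())))));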
It is possible to use a stream to create, from your list, a Map storing the strings and their frequencies; afterwards you can iterate over the map to put the elements into the lists duplicatedValues and nonDuplicatedValues, as below:
List<String> duplicatedValues = new ArrayList<String>();
List<String> nonDuplicatedValues = new ArrayList<String>();
List<String> myList=List.of("ABC","EFG","IJK","ABC","ABC");
Map<String, Long> map = myList.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
map.forEach((k, v) -> { if (v > 1) duplicatedValues.add(k); else nonDuplicatedValues.add(k); });
Here is one way to do it. It basically does a frequency count and divides accordingly.
List<String> myList = new ArrayList<>(
List.of("ABC", "EFG", "IJK", "ABC", "ABC", "RJL"));
Map<String,Long> freq = new HashMap<>();
for (String str : myList) {
freq.compute(str, (k,v)->v == null ? 1 : v + 1);
}
Map<String,List<String>> dupsAndNonDups = new HashMap<>();
for (Entry<String,Long> e : freq.entrySet()) {
dupsAndNonDups.computeIfAbsent(e.getValue() > 1 ? "dups" : "nondups",
k-> new ArrayList<>()).add(e.getKey());
}
System.out.println("dups = " + dupsAndNonDups.get("dups"));
Prints
dups = [ABC]
nondups = [RJL, EFG, IJK]

Show all the longest words from finite Stream

I have to find all of the longest words from a given file using the Streams API. I did it in a few steps, but I'm looking for a "one-liner". Right now I process the whole file twice: first to find the maximum word length, and then to compare every word against that maximum, which I assume is not the best for performance. Could someone help me? Just look at the code:
public class Test {
public static void main(String[] args) throws IOException {
List<String> words = Files.readAllLines(Paths.get("alice.txt"));
OptionalInt longestWordLength = words.stream().mapToInt(String::length).max();
Map<Integer, List<String>> groupedByLength = words.stream().collect(Collectors.groupingBy(String::length));
List<String> result = groupedByLength.get(longestWordLength.getAsInt());
}
}
I would like to make it a single expression:
List<String> words = Files.readAllLines(Paths.get("alice.txt"));
List<String> result = // code
The file contains just one word per line; anyway, that's not important, the question is about the right stream code.
Instead of just keeping the largest length, you could collect the words into a map from their length to the words, and then just take the longest one:
List<String> longestWords =
Files.lines(Paths.get("alice.txt"))
.collect(Collectors.groupingBy(String::length))
.entrySet()
.stream()
.sorted(Map.Entry.<Integer, List<String>> comparingByKey().reversed())
.map(Map.Entry::getValue)
.findFirst()
.orElse(null);
EDIT:
As Malte Hartwig noted, using max on the streamed map is much more elegant (and probably faster):
List<String> longestWords =
Files.lines(Paths.get("alice.txt"))
.collect(Collectors.groupingBy(String::length))
.entrySet()
.stream()
.max(Map.Entry.comparingByKey())
.map(Map.Entry::getValue)
.orElse(null);
EDIT2:
There's a built-in inefficiency in both of the above solutions: they build a map and essentially store all the strings in the file, grouped by length, instead of just the longest ones. If performance is more important than elegance in your use case, you could write your own Collector that only preserves the longest strings in a list:
private static int stringInListLength(List<String> list) {
return list.stream().map(String::length).findFirst().orElse(0);
}
List<String> longestWords =
Files.lines(Paths.get("alice.txt"))
.collect(Collector.of(
LinkedList::new,
(List<String> list, String string) -> {
int stringLen = string.length();
int listStringLen = stringInListLength(list);
if (stringLen > listStringLen) {
list.clear();
}
if (stringLen >= listStringLen) {
list.add(string);
}
},
(list1, list2) -> {
int list1StringLen = stringInListLength(list1);
int list2StringLen = stringInListLength(list2);
if (list1StringLen > list2StringLen) {
return list1;
}
if (list2StringLen > list1StringLen) {
return list2;
}
list1.addAll(list2);
return list1;
}
));
reduce will help you:
Optional<String> longest = words.stream()
.reduce((s1, s2) -> {
if (s1.length() > s2.length())
return s1;
else
return s2;
});
In case the Stream is empty, it will return an empty Optional.
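As a side note (not part of the original answer), the same single-result reduction can be written with max and a length comparator; it still returns only one of the longest words:
Optional<String> longest = words.stream()
    .max(Comparator.comparingInt(String::length));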
In case you want the list of all the words that have the maximum length this piece will help you:
Optional<List<String>> longest = words.stream()
.collect(Collectors.groupingBy(
String::length,
Collectors.toList()
))
.entrySet()
.stream()
.reduce(
(entry1, entry2) -> {
if (entry1.getKey() > entry2.getKey())
return entry1;
else
return entry2;
}
)
.map(Map.Entry::getValue);
Alternatively, iterate over the map keys to find the longest word length and then look up the corresponding list, as sketched below.
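A minimal sketch of that idea (assuming words was read as in the question, one word per line):
Map<Integer, List<String>> byLength = words.stream()
    .collect(Collectors.groupingBy(String::length));
int max = byLength.keySet().stream()
    .mapToInt(Integer::intValue)
    .max()
    .orElse(0);
List<String> result = byLength.getOrDefault(max, Collections.emptyList());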

How to remove all duplicated strings from a Java List?

For a given list, say [ "a", "a", "b", "c", "c" ], I need [ "b" ] (only the non-duplicated elements) as output. Note that this is different from using the Set interface for the job...
I wrote the following code to do this in Java:
void unique(List<String> list) {
Collections.sort(list);
List<String> dup = new ArrayList<>();
int i = 0, j = 0;
for (String e : list) {
i = list.indexOf(e);
j = list.lastIndexOf(e);
if (i != j && !dup.contains(e)) {
dup.add(e);
}
}
list.removeAll(dup);
}
It works... but for a list of size 85320 it takes several minutes to finish!
Your best performance is with a set:
String[] xs = { "a", "a", "b", "c", "c" };
Set<String> singles = new TreeSet<>();
Set<String> multiples = new TreeSet<>();
for (String x : xs) {
if(!multiples.contains(x)){
if(singles.contains(x)){
singles.remove(x);
multiples.add(x);
}else{
singles.add(x);
}
}
}
It's a single pass, and insert, remove, and contains are O(log n).
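If you want the question's [ "b" ] as a list, the result is simply the contents of singles, e.g.:
List<String> result = new ArrayList<>(singles); // [b] for the sample input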
Using Java 8 streams:
return list.stream()
.collect(Collectors.groupingBy(e -> e, Collectors.counting()))
.entrySet()
.stream()
.filter(e -> e.getValue() == 1)
.map(Map.Entry::getKey)
.collect(Collectors.toList());
You can use streams to achieve this in simpler steps as shown below with inline comments:
//Find out unique elements first
List<String> unique = list.stream().distinct().collect(Collectors.toList());
//List to collect output list
List<String> output = new ArrayList<>();
//Iterate over each unique element
for(String element : unique) {
//if element found only ONCE add to output list
if(list.stream().filter(e -> e.equals(element)).count() == 1) {
output.add(element);
}
}
You can use a Map. Do the following (a sketch follows this list):
1. Create a map of type Map<String, Integer>.
2. For every element, check whether the string is already in the map: if yes, increment the value of that map entry by 1; if not, add <current element, 1>.
3. Your output is the set of map entries whose value is 1.
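A minimal sketch of that approach (hypothetical variable names; list is the input List<String>):
Map<String, Integer> counts = new HashMap<>();
for (String s : list) {
    counts.merge(s, 1, Integer::sum); // insert 1 on first sight, otherwise increment
}
List<String> nonDuplicated = new ArrayList<>();
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    if (e.getValue() == 1) {
        nonDuplicated.add(e.getKey());
    }
}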
Given that you can sort the list, about the most efficient way to do this is to use a ListIterator to iterate over runs of adjacent elements:
List<String> dup = new ArrayList<>();
Collections.sort(list);
ListIterator<String> it = list.listIterator();
while (it.hasNext()) {
    String first = it.next();
    // Count the number of adjacent elements equal to first.
    int cnt = 1;
    while (it.hasNext()) {
        String next = it.next();
        if (!first.equals(next)) {
            it.previous();
            break;
        }
        ++cnt;
    }
    // If the run contains more than 1 element, it's duplicated.
    // Otherwise it's a singleton, so add it to the output.
    if (cnt == 1) {
        dup.add(first);
    }
}
return dup;
For lists which don't support random access, like LinkedList, a ListIterator is more efficient than index-based access.

Counting same Strings from Array in Java

How can I count equal Strings in an array and write them out to the console?
The order of the items should correspond to the order of the first appearance of each item. If there are two or more items of a kind, add an "s" to the item name.
String[] array = {"Apple","Banana","Apple","Peanut","Banana","Orange","Apple","Peanut"};
Output:
3 Apples
2 Bananas
2 Peanuts
1 Orange
I tried this:
String[] input = new String[1000];
Scanner sIn = new Scanner(System.in);
int counter =0;
String inputString = "start";
while(inputString.equals("stop")==false){
inputString = sIn.nextLine();
input[counter]=inputString;
counter++;
}
List<String> asList = Arrays.asList(input);
Map<String, Integer> map = new HashMap<String, Integer>();
for (String s : input) {
map.put(s, Collections.frequency(asList, s));
}
System.out.println(map);
But I don't know how to get the elements out of the Map and order them the way I would like.
You can use a Map to store your result; here is a simple example:
public static void main(String args[]){
String[] array = {"Apple","Banana","Apple","Peanut","Banana","Orange","Apple","Peanut"};
Map<String, Integer> result = new HashMap<>();
for(String s : array){
if(result.containsKey(s)){
//if the map contain this key then just increment your count
result.put(s, result.get(s)+1);
}else{
//else just create a new node with 1
result.put(s, 1);
}
}
System.out.println(result);
}
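If you also want the output in first-appearance order with the plural "s" from the question, one possible variation (a sketch, not the only way) is to count into a LinkedHashMap and format while printing:
Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-appearance order
for (String s : array) {
    counts.merge(s, 1, Integer::sum);
}
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    System.out.println(e.getValue() + " " + e.getKey() + (e.getValue() > 1 ? "s" : ""));
}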
Use Java streams groupingBy and collect the results into a Map<String, Long> as shown below:
String[] array = {"Apple","Banana","Apple","Peanut","Banana","Orange","Apple", "Peanut"};
Map<String, Long> map = Stream.of(array).collect(Collectors.
groupingBy(Function.identity(), //use groupingBy array element
Collectors.counting())); //count number of occurances
System.out.println(map);//output the results of the Map
Java 8 would allow a pretty elegant way of doing this with groupingBy and counting. Using a LinkedHashMap instead of the default map should handle the ordering:
Arrays.stream(array)
.collect(Collectors.groupingBy(Function.identity(),
LinkedHashMap::new,
Collectors.counting()))
.entrySet()
.forEach(e -> System.out.println(e.getValue() +
"\t" +
e.getKey() +
(e.getValue() > 1 ? "s" : "")));
Use Java 8:
Map<String, Long> myMap = Stream.of(array).collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Recursively or iteratively retrieve key-value combinations from a HashMap

I want to retrieve k,v-pairs from a HashMap.
The entries are like this:
a = 3,4
b = 5,6
and so on. I need combinations of these values.
a=3, b=5
a=3, b=6
a=4, b=5
a=4, b=6
I don't know how many keys there are or how many values each entry has. With entrySet I can get the values, but not the combinations. It looks like a job for recursion, but how?
Here's my code:
HashMap<String, String[]> map = new HashMap<String, String[]>();
BufferedReader file = new BufferedReader(new FileReader("test.txt"));
String str;
while ((str = file.readLine()) != null) {
// ... logic
map.put(key, value);
}
System.out.println("number of keys: " + map.size());
for (Map.Entry<String, String[]> entry : map.entrySet()) {
for (String value : entry.getValue()) {
System.out.println(entry.getKey() + ": " + value);
}
}
file.close();
You can try the following code:
public void mapPermute(Map<String, String[]> map, String currentPermutation) {
    String key = map.keySet().iterator().next(); // get the topmost key
    // base case
    if (map.size() == 1) {
        for (String value : map.get(key)) {
            System.out.println(currentPermutation + key + "=" + value);
        }
    } else {
        // recursive case
        Map<String, String[]> subMap = new HashMap<String, String[]>(map);
        for (String value : subMap.remove(key)) {
            mapPermute(subMap, currentPermutation + key + "=" + value + ", ");
        }
    }
}
No guarantees on memory efficiency or speed. If you want to preserve the order of the keys in the map, you will have to pass in a TreeMap and change the code to use a TreeMap under the recursive case.
As the base case suggests, I'm assuming you have at least one entry in your map.
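For completeness, a call might look like this (using the sample data from the question; the second argument is the prefix built up so far, so it starts empty):
Map<String, String[]> map = new HashMap<>();
map.put("a", new String[] {"3", "4"});
map.put("b", new String[] {"5", "6"});
mapPermute(map, "");
// prints the four combinations, e.g. "a=3, b=5" (order depends on the HashMap)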
You can obtain a Cartesian product of map key-value combinations using a map and reduce approach.
Map<String, String[]> map = Map.of(
"a", new String[]{"3", "4"},
"b", new String[]{"5", "6"});
List<Map<String, String>> comb = map.entrySet().stream()
// Stream<List<Map<String,String>>>
.map(e -> Arrays.stream(e.getValue())
.map(v -> Map.of(e.getKey(), v))
.collect(Collectors.toList()))
// summation of pairs of list into a single list
.reduce((list1, list2) -> list1.stream()
// combinations of inner maps
.flatMap(map1 -> list2.stream()
// concatenate into a single map
.map(map2 -> {
Map<String, String> m = new HashMap<>();
m.putAll(map1);
m.putAll(map2);
return m;
}))
// list of combinations
.collect(Collectors.toList()))
// otherwise, an empty list
.orElse(Collections.emptyList());
// output, order may vary
comb.forEach(System.out::println);
Output, order may vary:
{a=3, b=5}
{a=3, b=6}
{a=4, b=5}
{a=4, b=6}
See also: Cartesian product of map values
It looks to me like you really want a Multimap. In particular, Guava's ArrayListMultimap allows duplicate entries:
ArrayListMultimap<String, String> map = ArrayListMultimap.create();
for each line in file:
parse key k
for each value in line:
parse value v
map.put(k, v);
for (Map.Entry<String, String> entry : map.entries()) {
String key = entry.getKey();
String value = entry.getValue();
}
If you want a cartesian product of maps, you could compute that directly using recursion, or you could iterate over the maps: create a list of iterators and iterate odometer-style; when iterator N reaches its end, advance iterator N+1 and reset iterators 1..N.
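A sketch of that odometer-style iteration, using index counters instead of iterators (hypothetical names; it assumes the Map<String, String[]> from the question and that no value array is empty):
List<String> keys = new ArrayList<>(map.keySet());
int[] idx = new int[keys.size()]; // one "wheel" per key
while (true) {
    // print the current combination
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < keys.size(); i++) {
        String k = keys.get(i);
        sb.append(k).append("=").append(map.get(k)[idx[i]]).append("  ");
    }
    System.out.println(sb.toString().trim());
    // advance the odometer: roll over wheels from the left
    int i = 0;
    while (i < idx.length && ++idx[i] == map.get(keys.get(i)).length) {
        idx[i] = 0; // this wheel wrapped around; carry into the next one
        i++;
    }
    if (i == idx.length) {
        break; // every wheel wrapped: all combinations done
    }
}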
Just poked around and found this SO question.
So I'd recommend you use guava's Sets.cartesianProduct for the cartesian product. Here's my poking around code, which you could adapt to your input logic:
String key1 = "a";
Set<Integer> values1 = Sets.newLinkedHashSet(Arrays.asList(1, 2, 3, 4));
String key2 = "b";
Set<Integer> values2 = Sets.newLinkedHashSet(Arrays.asList(5, 6, 7));
String key3 = "c";
Set<Integer> values3 = Sets.newLinkedHashSet(Arrays.asList(8, 9));
List<String> keys = Arrays.asList(key1, key2, key3);
Set<List<Integer>> product = Sets.cartesianProduct(values1, values2, values3);
for (List<Integer> values : product) {
for (int i = 0; i < keys.size(); ++i) {
String key = keys.get(i);
int value = values.get(i);
System.out.print(key + "=" + value + "; ");
}
System.out.println();
}
