Finding pairs of duplicates in a Java ArrayList


I'm looking to find the number of duplicate pairs in a Java ArrayList.
I can work it out on paper but I don't know if there is some form of mathematical formula for working this out easily as I'm trying to avoid nested for loops in my code.
An example using the data set [2,2,3,2,2]:
0:1, 0:3, 0:4, 1:3, 1:4, 3:4. So the answer is six duplicate pairs?

You just need to count how many times each number appears (I would go with a map here) and calculate the 2-combinations ( http://en.wikipedia.org/wiki/Combination ) of that count for each number with a count > 1.
So basically you need a method to calculate n!/(k!(n-k)!) with k being 2 and n being the count.
Taking your example [2,2,3,2,2], the number 2 appears 4 times, so the math would go:
4!/(2!(4-2)!) = 24/4 = 6 --> 6 pairs
If you don't want to implement the factorial function yourself, you can use ArithmeticUtils from Apache Commons Math, which already implements it.

If you want to avoid nested loops (at the expense of having two separate loops), you could:
For each number in the list, count how many times it occurs (for example, in a Map with key = number and value = number of occurrences in the List).
For each number in the map, calculate the number of possible pairs from its count n (0 or 1 occurrences = no duplicate pairs; 2 or more = n!/(2*(n-2)!) = (n*(n-1))/2 duplicate pairs).
Sum all of these pair counts.
Doing a sort like ElKamina suggests would allow for some optimization on this method.
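The two-loop approach described above could be sketched as follows. The class and method names (DuplicatePairs, countPairs) are made up for illustration, and the list is assumed to hold Integers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicatePairs {
    // Counts index pairs (i, j) with i < j whose elements are equal.
    static long countPairs(List<Integer> values) {
        // First loop: count the occurrences of each value.
        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        // Second loop: a value seen n times contributes n*(n-1)/2 pairs.
        long pairs = 0;
        for (int n : counts.values()) {
            pairs += (long) n * (n - 1) / 2;
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(Arrays.asList(2, 2, 3, 2, 2))); // prints 6
    }
}
```

Using n*(n-1)/2 directly avoids computing factorials at all, which also sidesteps overflow for large counts.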

Sort the numbers first. Then, if there are k copies of a given number, that number contributes k*(k-1)/2 pairs. Sum this over all the distinct numbers.
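A minimal sketch of the sort-based idea, assuming the data is available as an int array. The names SortedDuplicatePairs and countPairs are hypothetical. After sorting, equal values sit next to each other, so a single scan can measure the length k of each run.

```java
import java.util.Arrays;

class SortedDuplicatePairs {
    static long countPairs(int[] values) {
        int[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        long pairs = 0;
        int start = 0;
        while (start < sorted.length) {
            // Advance end to the first index past the run of equal values.
            int end = start;
            while (end < sorted.length && sorted[end] == sorted[start]) {
                end++;
            }
            long k = end - start;     // number of copies of this value
            pairs += k * (k - 1) / 2; // pairs contributed by this value
            start = end;
        }
        return pairs;
    }
}
```

This needs no extra map, at the cost of the O(n log n) sort.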

Using Guava, if your elements were Strings:
Multiset<String> multiset = HashMultiset.create(list);
int pairs = 0;
for (Multiset.Entry<String> entry : multiset.entrySet()) {
    int count = entry.getCount();
    // IntMath.binomial(n, k) rejects k > n, so skip elements that occur only once
    if (count >= 2) {
        pairs += IntMath.binomial(count, 2);
    }
}
return pairs;
That uses Guava's Multiset and math utilities.

Related

Create a List of random Integers using Stream and sum all elements except the smallest?

I want to generate 4 random numbers, ranging from 1 through 6 inclusive. Then I want to get the sum of these elements excluding the smallest value.
I am currently creating one stream to populate a list:
List<Integer> values = r.ints(4,1,7).boxed().collect(Collectors.toList());
Then I remove the smallest value and use another stream to get the sum of the values:
values.stream().mapToInt(Integer::intValue).sum();
Can someone suggest a way to perform all these operations in a single stream?

Sort the stream, then skip the first (i.e. smallest) element:
int sumExceptSmallest = IntStream.of(4,1,7).sorted().skip(1).sum(); // 11
or in your specific case:
int sumExceptSmallest = r.ints(4,1,7).sorted().skip(1).sum();
Note that while this may be the coolest and most convenient for the coder, it is not the most efficient possible solution, because the sort has a time complexity of O(n log n). The most efficient approach would be a single pass that both finds the smallest element and computes the sum, then subtracts one from the other, yielding the solution in O(n) time.

IntSummaryStatistics can calculate this from your stream in a single pass:
IntStream stream = /* random stream of integers */;
IntSummaryStatistics stats = stream.summaryStatistics();
long sum = stats.getSum(); // getSum() returns a long
sum -= stats.getMin(); // sum minus the lowest element
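For reference, a self-contained version of the single-pass idea might look like the sketch below. The class and method names are hypothetical, and returning 0 for an empty stream is an assumption made here, not something the question specifies.

```java
import java.util.IntSummaryStatistics;
import java.util.Random;
import java.util.stream.IntStream;

class SumExceptSmallest {
    static long sumExceptSmallest(IntStream stream) {
        // One pass over the stream collects count, sum, min and max.
        IntSummaryStatistics stats = stream.summaryStatistics();
        if (stats.getCount() == 0) {
            return 0; // assumption: an empty stream sums to 0
        }
        return stats.getSum() - stats.getMin();
    }

    public static void main(String[] args) {
        Random r = new Random();
        // Four rolls in the range 1..6 (the upper bound 7 is exclusive).
        System.out.println(sumExceptSmallest(r.ints(4, 1, 7)));
    }
}
```

If the smallest value occurs more than once, only one copy is subtracted, which matches the behaviour of sorted().skip(1).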

Eliminate duplicate value from a arraylist in java

I have a number, and first I find its prime factors. Say the number is 12; then the prime factors are [2,2,3].
Next I have to find the other factors of this number: 12/2=6 gives the factor pair [2,6], and 12/3=4 gives the factor pair [3,4].
As a second example, take the number 30. Its prime factors are [2,3,5], and the other factor pairs are [2,15], [3,10], [5,6].
1 and the number itself are excluded.
Now I take an ArrayList that contains the prime factors of a given number. Then I loop over it, dividing the number by each prime factor to get the other factor.
Say ArrayList abc={2,3,5}
if(abc.size()>=3){
    for(int i=0;i<abc.size();i++){
        abc1.add(abc.get(i));
        abc1.add(number / abc.get(i));
    }
}
abc1 is another ArrayList used for collecting the results. This solution works well when abc contains 3 or more different numbers, as in the example 30. But it doesn't work well when a number repeats, as in the example 12, where 2 appears twice in the prime list. I get the output [2,6],[2,6],[3,4].
To find the repeated numbers in a list I wrote this code block:
for(int k=0;k<abc.size();k++){
    for(int j=k+1;j<abc.size();j++){
        if(abc.get(k).equals(abc.get(j))){
            System.out.println(abc.get(j));
        }
    }
}
But how can I use this with the previous code to eliminate one of the 2s?

If you want to avoid duplicates, you could use a LinkedHashSet (to remove duplicates while maintaining insertion order) and then pass it as the argument to the constructor of an ArrayList.
Set<Integer> s = new LinkedHashSet<>(abc); // abc is the list that contains duplicates
ArrayList<Integer> a = new ArrayList<>(s);
I hope I was able to answer your question, if I did not please let me know.
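Applied to the factor example from the question, the LinkedHashSet idea could look like this sketch. FactorPairs and factorPairs are hypothetical names, and the number is assumed to be composite, so that no pair contains 1 or the number itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class FactorPairs {
    // Returns one {p, number / p} pair per distinct prime factor p.
    static List<int[]> factorPairs(int number, List<Integer> primeFactors) {
        // LinkedHashSet drops repeated primes but keeps their first-seen order.
        List<Integer> unique = new ArrayList<>(new LinkedHashSet<>(primeFactors));
        List<int[]> pairs = new ArrayList<>();
        for (int p : unique) {
            pairs.add(new int[] {p, number / p});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (int[] pair : factorPairs(12, Arrays.asList(2, 2, 3))) {
            System.out.println(Arrays.toString(pair)); // prints [2, 6] then [3, 4]
        }
    }
}
```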

Data structure for permutations in Java

I need to store a permutation of n integers and be able to compute both the permutation of a value and the inverse operation efficiently.
That is, I need to store a reordering of the values [0...n-1] in such a way that I can ask for position(i) and value(j) (with 0 <= i, j < n).
For example, suppose we have the following permutation of values:
[7,2,3,6,0,4,8,9,1,5]
I need the following operations:
position(7) = 9
value(9) = 7
I know libraries in C++ for that, such as: https://github.com/fclaude/libcds2
Is there any structure or library in Java that allows this and is efficient in space and time?

If there are no duplicates, the List interface will suit your needs.
It provides the following methods:
List#get(index) returns the element at position index
List#indexOf(element) returns the index of the first occurrence of element
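Note that on an ArrayList, indexOf is a linear scan. If constant-time lookups in both directions are needed, one possible alternative (not part of the answer above) is to store the inverse permutation in a second array. The Permutation class below is a hypothetical sketch; its method names mirror the List methods.

```java
import java.util.Arrays;

class Permutation {
    private final int[] forward; // forward[index] = element
    private final int[] inverse; // inverse[element] = index

    Permutation(int[] elements) {
        int n = elements.length;
        forward = elements.clone();
        inverse = new int[n];
        Arrays.fill(inverse, -1);
        for (int i = 0; i < n; i++) {
            int e = forward[i];
            // Reject out-of-range values and repeats.
            if (e < 0 || e >= n || inverse[e] != -1) {
                throw new IllegalArgumentException("not a permutation of 0..n-1");
            }
            inverse[e] = i;
        }
    }

    int get(int index) { return forward[index]; }       // O(1)
    int indexOf(int element) { return inverse[element]; } // O(1)
}
```

With the permutation from the question, get(7) returns 9 and indexOf(9) returns 7. The price is a second int array of length n.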

Combination Algorithm from multiple sets

I am trying to write an algorithm that tells me how many pairs I could generate with items coming from multiple sets of values. For example, I have the following sets:
{1,2,3} {4,5} {6}
From these sets I can generate 11 pairs:
{1,4}, {1,5}, {1,6}, {2,4}, {2,5}, {2,6}, {3,4}, {3,5}, {3,6}, {4,6}, {5,6}
I wrote the following algorithm:
int result=0;
for(int k=0;k<numberOfSets;k++){ //map is a list where I store all my sets
int size1 = map.get(k);
for(int l=k+1;l<numberOfSets;l++){
int size2 = map.get(l);
result += size1*size2;
}
}
But as you can see, the algorithm is not very scalable. If the number of sets increases, the algorithm starts performing very poorly.
Am I missing something? Is there an algorithm that can help me with this? I have been looking at combination and permutation algorithms, but I am not sure that's the right path for this.
Thank you very much in advance.

First of all, if the order within the pairs matters, then starting the inner loop with int l=k+1 is erroneous. For example, you are missing {4,1}: if you consider it equal to {1,4}, then the result is correct; otherwise it isn't.
Second, to complicate matters further, you don't say whether the pairs need to be unique. For example, {1,2}, {2,3}, {4} will generate {2,4} twice. If you need to count it only once, the result of your code is incorrect (you will need to keep a Set<Pair<Integer,Integer>> to remove the duplicates, and you will need to scan those sets and actually generate the pairs).
The good news: even with the O(N^2) nested loops, and even if you have thousands of sets, the millions of integer multiplications and additions are fast enough on today's computers. For example, Eigen deals quite well with O(N^3) floating-point multiplications (see matrix multiplication operations).

Assuming you only care about the number of pairs, and are counting duplicates, then there is a more efficient algorithm:
We keep track of the current number of pairs and the number of elements encountered so far.
Go over the list of sets from the end to the start.
For each new set, the number of new pairs we can make is the size of the set * the number of elements encountered so far. Add this to the current number of pairs.
Add the size of the new set to the number of elements encountered so far.
The code:
int numberOfPairs=0;
int elementsEncountered=0;
for(int k = numberOfSets - 1 ; k >= 0 ; k--) {
int sizeOfCurrentSet = map.get(k);
int numberOfNewPairs = sizeOfCurrentSet * elementsEncountered;
numberOfPairs += numberOfNewPairs;
elementsEncountered += sizeOfCurrentSet;
}
The key point to realize is that when we count the number of new pairs that each set contributes, it doesn't matter from which set we select the second element of the pair. That is, we don't need to keep track of any set which we have already analyzed.
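A self-contained sketch of this linear pass, placed next to the question's nested-loop version for comparison. The class and method names are hypothetical, and the input is assumed to be a list of set sizes, as in the question's code.

```java
import java.util.Arrays;
import java.util.List;

class CrossSetPairs {
    // Linear pass: each set pairs with every element seen so far.
    static long countPairs(List<Integer> setSizes) {
        long pairs = 0;
        long elementsEncountered = 0;
        for (int k = setSizes.size() - 1; k >= 0; k--) {
            int size = setSizes.get(k);
            pairs += size * elementsEncountered;
            elementsEncountered += size;
        }
        return pairs;
    }

    // The quadratic version from the question, kept only as a cross-check.
    static long countPairsQuadratic(List<Integer> setSizes) {
        long pairs = 0;
        for (int k = 0; k < setSizes.size(); k++) {
            for (int l = k + 1; l < setSizes.size(); l++) {
                pairs += (long) setSizes.get(k) * setSizes.get(l);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sizes of {1,2,3}, {4,5}, {6}
        System.out.println(countPairs(Arrays.asList(3, 2, 1))); // prints 11
    }
}
```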

how to find the most frequent character in a big string using java?

I'm working on an assignment in which I have to find the four most frequent characters in a string.
This is what I have written so far:
import java.util.Scanner;
public class FindTheMostOccur
{
    public static void main (String[] args)
    {
        String input;
        String example = "how to find the most frequent character in a big string using java?";
        String[] array = new String[example.length()];
        for (int i = 97; i < 123; i++)
        {
            int mostFrequent =0;
            for( int j = 0; j < example.length(); j++)
            {
                if(example.charAt(j) == i)
                {
                    ++mostFrequent;
                }
            }
            System.out.println( (char)i + " is showing " + mostFrequent+ " times ");
        }
    }
}
Here is the output for this example.
a is showing 5 times
b is showing 1 times
c is showing 2 times
d is showing 1 times
e is showing 4 times
f is showing 2 times
g is showing 3 times
h is showing 3 times
i is showing 5 times
j is showing 1 times
k is showing 0 times
l is showing 0 times
m is showing 1 times
n is showing 5 times
o is showing 3 times
p is showing 0 times
q is showing 1 times
r is showing 4 times
s is showing 3 times
t is showing 6 times
u is showing 2 times
v is showing 1 times
w is showing 1 times
x is showing 0 times
y is showing 0 times
In this example: t, a, i, n
I DON'T NEED SOMEONE TO COMPLETE THE PROGRAM FOR ME; however, I need some ideas on how to find the four most frequent characters in this example.

The simplest algorithm I can think of off hand is that instead of doing multiple passes do a single pass.
Create a HashMap mapping from character to count.
Each time you find a character if it is not in the map, add it with value 1. If it is in the map increment the count.
At the end of the loop your HashMap now contains the count for every character in the String.
Take the EntrySet from the HashMap and dump it into a new ArrayList.
Sort the ArrayList with a custom comparator that uses entry.getValue() to compare by.
The first (or last depending on the sort direction) values in the ArrayList will be the ones you want.
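The steps above could be sketched roughly as follows. TopCharacters and mostFrequent are hypothetical names. Counting only letters is an assumption made here: otherwise the space character would win in the question's example string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopCharacters {
    static List<Character> mostFrequent(String text, int limit) {
        // Single pass: map each character to its count.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) { // assumption: ignore spaces and punctuation
                counts.merge(c, 1, Integer::sum);
            }
        }
        // Dump the entries into a list and sort by count, highest first.
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        // The first `limit` entries are the most frequent characters.
        List<Character> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Characters with equal counts come out in no particular order, because HashMap does not define an iteration order.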

How about this?
int count[] = new int[1000]; // all initialized to zero
Then, for each character c in the string, do count[c]++, for example:
count['c']++;
count['A']++;
At the end, find out which index holds the maximum value, then print that index cast to a char.

Ok, here're some ideas:
To find the 4 most frequent characters, you must first know the frequency for all the characters. So, you would need to store the frequency somewhere. Currently, you are just printing the frequency of each character. You can't compare that way. So, you need a data structure.
Here we are talking about a mapping from a character to its count, so you probably need a hash table. Look at the hash table implementations in Java: you will find HashMap, LinkedHashMap, and TreeMap.
Since you want the 4 most frequent characters, you need to order the characters by their frequency. Find out what kind of map stores the elements in sorted order. Note: You need to sort the map by values, not keys. And sort it in descending order, which is obvious.
Using Array:
There is another approach you can follow. You can create an array of size 26, assuming you only want to count frequency of alphabetic characters.
int[] frequency = new int[26];
Each index of that array correspond to the frequency of a single character. Iterate over the string, and increment the index corresponding to the current character by 1.
You can do character to index mapping like this:
int index = ch - 'a';
So, for character 'a', the index will be 0, for character 'b', index will be 1, so on. You might also want to take care of case sensitivity.
The problem with the array approach is that once you sort the array to get the 4 most frequent characters, you've lost the character-to-index mapping. So you would need some way to sort the indices along with the frequencies at those indices. Yes, you're right: you need another array here.
But will having 2 arrays make your problem easy? No, because it's difficult to sort 2 arrays in sync. So what to do? You can create a class storing a character and its frequency, and maintain an array of 26 references to that class. Then find a way to port the array approach to this array of objects.
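One way the array-of-objects idea could be sketched is shown below. The names LetterFrequency, LetterCount and topLetters are hypothetical, and only the letters a to z are counted, after lowercasing.

```java
import java.util.Arrays;
import java.util.Locale;

class LetterFrequency {
    // Keeps a letter together with its count, so sorting cannot separate them.
    static final class LetterCount {
        final char letter;
        int count;
        LetterCount(char letter) { this.letter = letter; }
    }

    static char[] topLetters(String text, int limit) {
        LetterCount[] counts = new LetterCount[26];
        for (int i = 0; i < 26; i++) {
            counts[i] = new LetterCount((char) ('a' + i));
        }
        for (char ch : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a'].count++; // character-to-index mapping
            }
        }
        // Sort by count, highest first; the sort is stable, so ties stay alphabetical.
        Arrays.sort(counts, (x, y) -> Integer.compare(y.count, x.count));
        char[] top = new char[Math.min(limit, counts.length)];
        for (int i = 0; i < top.length; i++) {
            top[i] = counts[i].letter;
        }
        return top;
    }
}
```

For the example string in the question, topLetters(example, 4) yields t, a, i, n in that order.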

What data structures are you allowed to use?
I'm thinking that you can use a priority queue and increment the priority of the node containing the character that is being repeated. When you are done, the head of the queue would contain the node with the most repeated character.

I got another idea: you can do it using only an array.
There are 26 letters in the alphabet. The ASCII codes of the lowercase letters range from 97 to 122 (you can make every letter lowercase using String.toLowerCase()).
So for each character, get its ASCII code, subtract 97 to get an array index, and increment the count at that index.
Simple, and it only needs an array.
