Multithreaded search of single collection for duplicates


use I can't divide into segads. As in my example above, if 5 threads are set, then the first segment would take the first 2 objects and the second would take the 3rd and 4th, so neither finds any dups, but there are dups if we merge them: the 2nd and 3rd.
There could be more complex strate take from first threads .. ah never mind, too hard to explain.
And of course, problelection itself in my plans.
Tha
EDIT:
InChunk, and then continue analyzing that chunk till the end. ;/

I think the process of dividing up the items to be de-duped will have to look at the end of each section and move forward to encompass any dups that continue past it. For example, if you had:
1 1 2 . 2 4 4 . 5 5 6
and you were dividing it into blocks of 3, then the dividing process would take 1 1 2, see that there was another 2, and generate 1 1 2 2 as the first block. It would then move forward 3 again and generate 4 4 5, see that there were more dups ahead, and generate 4 4 5 5. The 3rd thread would just have 6. The result would be:
1 1 2 2 . 4 4 5 5 . 6
The block sizes will be inconsistent, but as the number of items in the entire list gets large, these small differences become insignificant. The last thread may have very little to do, or nothing at all, but again, as the number of elements gets large, this should not affect the performance of the algorithm.
I think this method would be better than somehow having one thread handle the overlapping blocks. With that method, if you had a lot of dups, one thread could end up handling a lot more than 2 contiguous blocks if you were unlucky in the positioning of the dups. For example:
1 1 2 . 2 4 5 . 5 5 6
One thread would have to handle that entire list because of the 2s and the 5s.
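The boundary-extending split described above could be sketched roughly as follows. This is only a sketch and assumes the list is already sorted, so that equal items are adjacent (as in the examples); the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BoundaryChunker {
    // Splits a SORTED list into index ranges [start, end) of about blockSize
    // elements. Each range is extended forward so that a run of equal values
    // is never cut in two. Assumes blockSize >= 1.
    static List<int[]> split(List<Integer> sorted, int blockSize) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            int end = Math.min(start + blockSize, sorted.size());
            // Move the boundary forward while the next item repeats the last one taken.
            while (end < sorted.size() && sorted.get(end).equals(sorted.get(end - 1))) {
                end++;
            }
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }
}
```

Each resulting range can then be handed to its own thread, since no duplicate run spans two ranges.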

I would use chunk-based division, a thread pool with a task queue (e.g. an ExecutorService), and private hash tables to collect duplicates.
Each thread in the pool takes chunks on demand from the queue and, for each item, adds 1 to the count stored under that item's key in its private hash table. When all chunks are done, the private tables are merged into a global hash table.
Finally, just scan the global hash table and see which keys have a count greater than 1.
For example with a chunk size of 3 and the items:
1 2 2 2 3 4 5 5 6 6
Assume there are 2 threads in the pool. Thread 1 will take 1 2 2 and thread 2 will take 2 3 4. The private hash tables (key, count) will look like:
1 1
2 2
3 0
4 0
5 0
6 0
and
1 0
2 1
3 1
4 1
5 0
6 0
Next, thread 1 will process 5 5 6 and thread 2 will process 6:
1 1
2 2
3 0
4 0
5 2
6 1
and
1 0
2 1
3 1
4 1
5 0
6 1
At the end, the duplicates are 2, 5 and 6:
1 1
2 3
3 1
4 1
5 2
6 2
This may take up some amount of space due to the private tables of each thread, but will allow the threads to operate in parallel until the merge phase at the end.
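Something along these lines, as a rough sketch rather than a tuned implementation. To keep it short it uses one private map per chunk instead of one per thread (a simplification of what is described above), and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelDuplicateFinder {
    static Set<Integer> findDuplicates(List<Integer> items, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<Integer, Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < items.size(); start += chunkSize) {
                List<Integer> chunk =
                        items.subList(start, Math.min(start + chunkSize, items.size()));
                // Each task counts into its own private map, so no locking is needed here.
                Callable<Map<Integer, Integer>> task = () -> {
                    Map<Integer, Integer> local = new HashMap<>();
                    for (Integer item : chunk) {
                        local.merge(item, 1, Integer::sum);
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            // Merge phase: fold every private table into the global one.
            Map<Integer, Integer> global = new HashMap<>();
            for (Future<Map<Integer, Integer>> f : futures) {
                f.get().forEach((key, count) -> global.merge(key, count, Integer::sum));
            }
            // Keys counted more than once are the duplicates.
            Set<Integer> duplicates = new TreeSet<>();
            global.forEach((key, count) -> {
                if (count > 1) {
                    duplicates.add(key);
                }
            });
            return duplicates;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the counts are merged across chunks, a duplicate that straddles a chunk boundary (like the 2s and 6s in the example) is still found.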

Related

smallest possible sum of the values of subarrays

I'm trying to understand this question and solve it in Java.
But first, I'm not able to understand it properly.
Here is the question:
You are given an array a of length n and an integer c.
The value of some array b of length k is the sum of its elements except for the ⌊k/c⌋ smallest. For example, the value of the array [3, 1, 6, 5, 2] with c = 2 is 3 + 6 + 5 = 14.
Among all possible partitions of a into contiguous subarrays output the smallest possible sum of the values of these subarrays.
Input
The first line contains integers n and c (1 ≤ n, c ≤ 100 000).
The second line contains n integers a_i (1 ≤ a_i ≤ 10^9), the elements of a.
Output
Output a single integer — the smallest possible sum of values of these subarrays of some partition of a.
Examples
input
3 5
1 2 3
output
6
input
12 10
1 1 10 10 10 10 10 10 9 10 10 10
output
92
input
7 2
2 3 6 4 5 7 1
output
17
input
8 4
1 3 4 5 5 3 4 1
output
23
In the third example one of the optimal partitions is [2, 3], [6, 4, 5, 7], [1] with the values 3, 13 and 1 respectively.
My Understanding:
1) The partition is done over contiguous runs of numbers. Correct?
2) What is the significance of the integer c in the input?
3) How is it done in the third example? I mean, after getting the subarrays, how does 13 come out of the second subarray?
Can anyone help me understand the question? I can write the code myself.
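To make the role of c concrete, here is a small sketch. It assumes the intended rule is that a subarray of length k drops its ⌊k/c⌋ smallest elements, which is the reading that matches every example in the statement: [6, 4, 5, 7] with c = 2 drops 4 and 5, leaving 6 + 7 = 13. The partition search is a deliberately naive brute force, fine for checking the samples but far too slow for n = 100 000, and all names are made up.

```java
import java.util.Arrays;

class SubarrayValue {
    // Value of b: sum of its elements except the floor(k / c) smallest ones.
    static long value(int[] b, int c) {
        int[] sorted = b.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = sorted.length / c; i < sorted.length; i++) {
            sum += sorted[i];
        }
        return sum;
    }

    // best[end] = smallest total value over all partitions of a[0..end)
    // into contiguous subarrays. Brute force, for small inputs only.
    static long minPartitionSum(int[] a, int c) {
        long[] best = new long[a.length + 1];
        for (int end = 1; end <= a.length; end++) {
            best[end] = Long.MAX_VALUE;
            for (int start = 0; start < end; start++) {
                long candidate = best[start] + value(Arrays.copyOfRange(a, start, end), c);
                best[end] = Math.min(best[end], candidate);
            }
        }
        return best[a.length];
    }
}
```

So the answers to the three points would be: yes, the pieces must be contiguous; c decides how many of the smallest elements each piece gets to skip; and 13 is what remains of [6, 4, 5, 7] after skipping its two smallest elements.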

How to print all possible sequences with intermixed sequence of push and pop operation

Suppose that an intermixed sequence of push and pop operations is performed on a LIFO stack. How can I print all possible output sequences? I can only guess that it involves recursion. For example, if the order 1 2 3 is given, the output is
1 2 3
1 3 2
2 1 3
2 3 1
3 2 1

Use Google Guava's method https://google.github.io/guava/releases/19.0/api/docs/com/google/common/collect/Collections2.html#orderedPermutations(java.lang.Iterable) to get all possible permutations, and then for each permutation reverse the order using https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#reverse(java.util.List).
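For comparison, a plain recursive sketch that simulates the stack directly: at each step it either pops the top of the stack to the output or pushes the next input item. Note that the full set of permutations of 1 2 3 also contains 3 1 2, which is not in the expected output from the question, so simulating the stack may be closer to what is being asked. The class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackSequences {
    // Returns every output order obtainable by pushing input[0..n) in order
    // and popping at arbitrary points in between.
    static List<List<Integer>> generate(int[] input) {
        List<List<Integer>> results = new ArrayList<>();
        backtrack(input, 0, new ArrayDeque<>(), new ArrayList<>(), results);
        return results;
    }

    private static void backtrack(int[] input, int next, Deque<Integer> stack,
                                  List<Integer> output, List<List<Integer>> results) {
        if (output.size() == input.length) {
            results.add(new ArrayList<>(output));
            return;
        }
        // Choice 1: pop the current top to the output, recurse, then undo.
        if (!stack.isEmpty()) {
            int top = stack.pop();
            output.add(top);
            backtrack(input, next, stack, output, results);
            output.remove(output.size() - 1);
            stack.push(top);
        }
        // Choice 2: push the next input item, recurse, then undo.
        if (next < input.length) {
            stack.push(input[next]);
            backtrack(input, next + 1, stack, output, results);
            stack.pop();
        }
    }
}
```

For the input 1 2 3 this produces the five sequences listed in the question.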

Execute condition based on which subset of 4 flags

I want to find an efficient algorithm that selects what to do based on which subset of flags is set. A different condition is to be executed for each subset.
For example: I have 4 flags, A, B, C and D, and each subset has a separate condition. What is the most efficient algorithm for this? It can be done easily, but I want to find the most efficient approach. Is there an existing algorithm that solves this kind of problem?
A B C D
0 0 0 0 Subset 1 Execute Condition 1
0 0 0 1 Subset 2 Execute Condition 2
0 0 1 0 Subset 3 Execute Condition 3
0 0 1 1 Subset 4 Execute Condition 4
0 1 0 0 Subset 5 Execute Condition 5
0 1 0 1 Subset 6 Execute Condition 6
0 1 1 0 Subset 7 Execute Condition 7
0 1 1 1 Subset 8 Execute Condition 8
1 0 0 0 Subset 9 Execute Condition 9
1 0 0 1 Subset 10 Execute Condition 10
1 0 1 0 Subset 11 Execute Condition 11
1 0 1 1 Subset 12 Execute Condition 12
1 1 0 0 Subset 13 Execute Condition 13
1 1 0 1 Subset 14 Execute Condition 14
1 1 1 0 Subset 15 Execute Condition 15
1 1 1 1 Subset 16 Execute Condition 16

Bitmasking can be used to generate all subsets. There are four values. Therefore, you have 2^4 subsets. All you have to do is iterate this mask 2^4 times and mask it with each of the four values. In each iteration, the result of masking is a subset of the given values. Here's an idea:
def all_subsets(a):
    # a is assumed to hold the 4 flag values; they can be just 1s and 0s as in your case.
    subsets_by_mask = {}
    for mask in range(1 << 4):        # 16 masks, 0b0000 through 0b1111
        subset = []
        for i in range(4):            # test each of the 4 bits
            if mask & (1 << i):
                subset.append(a[i])   # bit i is set, so a[i] belongs to this subset
        subsets_by_mask[mask] = subset
    return subsets_by_mask            # do your operation by iterating over these subsets
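If the goal is only to run one action per flag combination, a sketch in Java could pack the four flags into a 4-bit index and use it to look up a handler in a table. A is treated as the most significant bit so that the index follows the row order of the table in the question. The handlers here are placeholders that just return a label, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class FlagDispatch {
    // One placeholder handler per combination; index 0 maps to "Condition 1".
    private static final List<Supplier<String>> HANDLERS = new ArrayList<>();

    static {
        for (int i = 0; i < 16; i++) {
            final int conditionNumber = i + 1;
            HANDLERS.add(() -> "Condition " + conditionNumber);
        }
    }

    // Packs the flags into a number from 0 to 15, with A as the highest bit.
    static int toIndex(boolean a, boolean b, boolean c, boolean d) {
        return (a ? 8 : 0) | (b ? 4 : 0) | (c ? 2 : 0) | (d ? 1 : 0);
    }

    // A single table lookup replaces a chain of if/else tests on the flags.
    static String dispatch(boolean a, boolean b, boolean c, boolean d) {
        return HANDLERS.get(toIndex(a, b, c, d)).get();
    }
}
```

The lookup cost is constant regardless of which flags are set; a `switch` on the same index would work equally well.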

Why does an index in Quick-Union Weighted remain size 1 when merged with a bigger tree?

I've been looking into algorithms using a class on coursera. In one of the first lectures, Quick Union Weighted is being discussed. I get what it does and I've tested it out using their code and written a small test for it.
Everything is clear but one point: when you create a union of two objects, it attaches the smaller tree to the bigger one. At the same time, the size of the larger tree is incremented by the size of the smaller tree in a separate array, which is used to determine which tree is bigger. Since the array is initialized with the value 1 for every index (every node on its own is basically a tree of 1 object), why isn't the value at the merged index set to 0 instead of remaining 1?
In order to illustrate this:
// Quick Union Weighted
ID: 0 1 2 3 4 5 6 7 8 9
SZ: 1 1 1 1 1 1 1 1 1 1
quw.union(2, 4);
ID: 0 1 2 3 2 5 6 7 8 9
SZ: 1 1 2 1 1 1 1 1 1 1
quw.union(5, 4);
ID: 0 1 2 3 2 2 6 7 8 9
SZ: 1 1 3 1 1 1 1 1 1 1
quw.union(2, 7);
ID: 0 1 2 3 2 2 6 2 8 9
SZ: 1 1 4 1 1 1 1 1 1 1
// Whereas I would've expected to end up with this
// to point out that the index is empty.
SZ: 1 1 4 1 0 0 1 0 1 1
Why are the sizes of merged indices 1 instead of 0?
You can find the code to test it out here. Note that the implementation is the same as the example provided by the lecturers, which is why I'm assuming my code is correct.

I think this is because the node itself also has size 1 when it has no children. It can, however, have children. I'm actually not familiar with Quick-Union Weighted, but if it's a bit like the other union-find algorithms I've seen, you can for example do
quw.union(0, 1);
ID: 0 0 2 3 2 2 6 2 8 9
SZ: 2 1 4 1 1 1 1 1 1 1
quw.union(0, 2);
ID: 2 0 2 3 2 2 6 2 8 9
SZ: 2 1 6 1 1 1 1 1 1 1
So now 0 and 1 have merged, and the entire tree starting from 0 is then merged with 2, still leaving the subtree starting at 0 with size 2.
Like I said, I'm not sure if that's possible in Quick-Union Weighted, but the reason for the '1' is still that the node has size 1 on its own.
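For reference, a minimal sketch of weighted quick-union in the usual textbook form (an assumption on my part, not necessarily the lecturers' exact code). In this form `sz` is only ever read at roots, so the entry of a node that has been attached under another root is simply never looked at again and keeps whatever value it had.

```java
class WeightedQuickUnion {
    final int[] id; // id[i] is the parent of i; a root is its own parent
    final int[] sz; // sz[i] is the tree size, meaningful only while i is a root

    WeightedQuickUnion(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    int root(int i) {
        while (i != id[i]) {
            i = id[i];
        }
        return i;
    }

    void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) {
            return;
        }
        // Attach the smaller tree under the larger one and update only the new root's size.
        if (sz[i] < sz[j]) {
            id[i] = j;
            sz[j] += sz[i];
        } else {
            id[j] = i;
            sz[i] += sz[j];
        }
    }
}
```

Resetting the attached node's entry to 0 would cost an extra write without changing any result, because `union` compares sizes only after calling `root`.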

Path finding in java / prolog using A*Star

I am supposed to go from, for example, point 1B to point 5D. How should I get there? Can anyone give me a hint to get started? I'm not asking for code, just tips or hints. Thanks.
   A  B  C  D  E
1  5  1  4  4  1
2  3  4  3  3  4
3  4  3  1  1  3
4  4  3  4  2  5
5  3  4  1  1  3

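Start by defining your data structures (the grid, a node with its cost so far, and an open list ordered by estimated total cost), then apply the algorithm to them.
You may refer to the pseudo-code on Wikipedia: Wiki A Star Search Algorithm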
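If it helps, here is a rough Java sketch of those pieces. The question does not say what the numbers mean, so this assumes each number is the cost of stepping onto that cell, that moves are up, down, left and right only, and that every cell costs at least 1 (which makes the Manhattan distance a safe heuristic). All names are made up.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class GridAStar {
    // Returns the cheapest total cost from (sr, sc) to (gr, gc), or -1 if unreachable.
    // The start cell's own cost is not counted.
    static int shortestCost(int[][] grid, int sr, int sc, int gr, int gc) {
        int rows = grid.length;
        int cols = grid[0].length;
        int[][] best = new int[rows][cols];
        for (int[] row : best) {
            Arrays.fill(row, Integer.MAX_VALUE);
        }
        best[sr][sc] = 0;
        // Open list entries are {f, g, row, col}, ordered by f = g + heuristic.
        PriorityQueue<int[]> open = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        open.add(new int[] {heuristic(sr, sc, gr, gc), 0, sr, sc});
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!open.isEmpty()) {
            int[] cur = open.poll();
            int g = cur[1], r = cur[2], c = cur[3];
            if (r == gr && c == gc) {
                return g;
            }
            if (g > best[r][c]) {
                continue; // stale entry, a cheaper route to this cell was found already
            }
            for (int[] m : moves) {
                int nr = r + m[0], nc = c + m[1];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) {
                    continue;
                }
                int ng = g + grid[nr][nc];
                if (ng < best[nr][nc]) {
                    best[nr][nc] = ng;
                    open.add(new int[] {ng + heuristic(nr, nc, gr, gc), ng, nr, nc});
                }
            }
        }
        return -1;
    }

    // Manhattan distance; never overestimates when every step costs at least 1.
    static int heuristic(int r, int c, int gr, int gc) {
        return Math.abs(r - gr) + Math.abs(c - gc);
    }
}
```

With the grid from the question, 1B is row 0, column 1 and 5D is row 4, column 3; under the assumptions above the cheapest route costs 12. To print the route itself you would additionally record each cell's predecessor.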
