I have an int[] array of length N containing the values 0, 1, 2, .... (N-1), i.e. it represents a permutation of integer indexes.
What's the most efficient way to determine if the permutation has odd or even parity?
(I'm particularly keen to avoid allocating objects for temporary working space if possible....)
I think you can do this in O(n) time and O(n) space by simply computing the cycle decomposition.
You can compute the cycle decomposition in O(n) by simply starting with the first element and following the path until you return to the start. This gives you the first cycle. Mark each node as visited as you follow the path.
Then repeat for the next unvisited node until all nodes are marked as visited.
The parity of a cycle of length k is (k-1)%2, so you can simply add up the parities of all the cycles you have discovered to find the parity of the overall permutation.
Saving space
One way of marking the nodes as visited would be to add N to each value in the array when it is visited. You would then be able to do a final tidying O(n) pass to turn all the numbers back to the original values.
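For concreteness, here is a rough Java sketch of this cycle-counting approach with the add-N marking trick (names are illustrative, and it assumes the array genuinely holds each of 0..N-1 exactly once):

static boolean isEvenPermutation(int[] perm) {
    int n = perm.length;
    int swaps = 0;                      // accumulates (cycle length - 1) over all cycles
    for (int i = 0; i < n; i++) {
        if (perm[i] >= n) continue;     // already visited (marked by adding n)
        int j = i;
        while (perm[j] < n) {           // walk the cycle, marking as we go
            int next = perm[j];
            perm[j] += n;               // mark as visited
            j = next;
            swaps++;
        }
        swaps--;                        // a cycle of length k contributes k-1 transpositions
    }
    for (int i = 0; i < n; i++) perm[i] -= n;   // final O(n) tidying pass
    return swaps % 2 == 0;
}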
I selected the answer by Peter de Rivaz as the correct answer as this was the algorithmic approach I ended up using.
However I used a couple of extra optimisations so I thought I would share them:
* Examine the size of the data first:
  * If it is greater than 64, use a java.util.BitSet to store the visited elements.
  * If it is less than or equal to 64, use a long with bitwise operations to store the visited elements. This makes it O(1) space for many applications that only use small permutations.
* Actually return the swap count rather than the parity. This gives you the parity if you need it, but is potentially useful for other purposes, and is no more expensive to compute.
Code below:
public int swapCount() {
    if (length()<=64) {
        return swapCountSmall();
    } else {
        return swapCountLong();
    }
}

private int swapCountLong() {
    int n=length();
    int swaps=0;
    BitSet seen=new BitSet(n);
    for (int i=0; i<n; i++) {
        if (seen.get(i)) continue;
        seen.set(i);
        for(int j=data[i]; !seen.get(j); j=data[j]) {
            seen.set(j);
            swaps++;
        }
    }
    return swaps;
}

private int swapCountSmall() {
    int n=length();
    int swaps=0;
    long seen=0;
    for (int i=0; i<n; i++) {
        long mask=(1L<<i);
        if ((seen&mask)!=0) continue;
        seen|=mask;
        for(int j=data[i]; (seen&(1L<<j))==0; j=data[j]) {
            seen|=(1L<<j);
            swaps++;
        }
    }
    return swaps;
}
You want the parity of the number of inversions. You can do this in O(n * log n) time using merge sort, but either you lose the initial array, or you need extra memory on the order of O(n).
A simple algorithm that uses O(n) extra space and is O(n * log n) (assuming B supports O(log n) search and removal, e.g. an order-statistic tree; with a plain array the removals make it O(n^2)):

inv = 0
mergesort A into a copy B
for i from 1 to length(A):
    binary search for position j of A[i] in B
    remove B[j] from B
    inv = inv + (j - 1)
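For reference, here is a Java sketch of the merge-sort approach; it counts inversions on a scratch copy so the original array is preserved, at the cost of O(n) extra memory (names are illustrative):

import java.util.Arrays;

class InversionParity {
    // O(n log n) time, O(n) extra space; the input array is not modified.
    static long countInversions(int[] a) {
        int[] work = Arrays.copyOf(a, a.length);
        int[] buf = new int[a.length];
        return sortAndCount(work, buf, 0, a.length);
    }

    // sorts work[lo, hi) and returns the number of inversions inside that range
    private static long sortAndCount(int[] work, int[] buf, int lo, int hi) {
        if (hi - lo <= 1) return 0;
        int mid = (lo + hi) >>> 1;
        long inv = sortAndCount(work, buf, lo, mid) + sortAndCount(work, buf, mid, hi);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            if (work[i] <= work[j]) buf[k++] = work[i++];
            else { buf[k++] = work[j++]; inv += mid - i; }  // work[j] precedes mid - i larger elements
        }
        while (i < mid) buf[k++] = work[i++];
        while (j < hi) buf[k++] = work[j++];
        System.arraycopy(buf, lo, work, lo, hi - lo);
        return inv;
    }

    static boolean isEvenPermutation(int[] a) {
        return (countInversions(a) & 1L) == 0;  // permutation parity = parity of inversion count
    }
}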
That said, I don't think it's possible to do it in sublinear memory. See also:
https://cs.stackexchange.com/questions/3200/counting-inversion-pairs
https://mathoverflow.net/questions/72669/finding-the-parity-of-a-permutation-in-little-space
Consider this approach...
From the permutation, get the inverse permutation by swapping the rows and sorting according to the top-row order. This is O(n log n). (You can also build the inverse directly in O(n) with inv[p[i]] = i.)
Then, simulate performing the inverse permutation and count the swaps, for O(n). This should give the parity of the permutation, according to this
An even permutation can be obtained as the composition of an even
number and only an even number of exchanges (called transpositions) of
two elements, while an odd permutation can be obtained by (only) an odd
number of transpositions.
from Wikipedia.
Here's some code I had lying around which applies an inverse permutation; I just modified it a bit to count swaps. You can remove all mention of a; p contains the inverse permutation.
size_t
permute_inverse (std::vector<int> &a, std::vector<size_t> &p) {
    size_t cnt = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        while (i != p[i]) {
            ++cnt;
            std::swap (a[i], a[p[i]]);
            std::swap (p[i], p[p[i]]);
        }
    }
    return cnt;
}
I am trying to understand the time complexity while using backtracking. The problem is
Given a set of unique integers, return all possible subsets.
Eg. Input [1,2,3] would return [[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]
I am solving it using backtracking as this:
private List<List<Integer>> result = new ArrayList<>();

public List<List<Integer>> getSubsets(int[] nums) {
    for (int length = 1; length <= nums.length; length++) { //O(n)
        backtrack(nums, 0, new ArrayList<>(), length);
    }
    result.add(new ArrayList<>());
    return result;
}

private void backtrack(int[] nums, int index, List<Integer> listSoFar, int length) {
    if (length == 0) {
        result.add(listSoFar);
        return;
    }
    for (int i = index; i < nums.length; i++) { // O(n)
        List<Integer> temp = new ArrayList<>();
        temp.addAll(listSoFar); // O(2^n)
        temp.add(nums[i]);
        backtrack(nums, i + 1, temp, length - 1);
    }
}
The code works fine, but I am having trouble understanding the time/space complexity.
What I am thinking is that the recursive method is called n times, and in each call it generates sublists that may contain a maximum of 2^n elements. So both time and space will be O(n * 2^n).
Is that right? If not, can anyone elaborate?
Note that I saw some answers here, like this one, but I was unable to understand them. When recursion comes into the picture, I find it a bit hard to wrap my head around.
You're exactly right about space complexity. The total space of the final output is O(n*2^n), and this dominates the total space used by the program. The analysis of the time complexity is slightly off though. Optimally, time complexity would, in this case, be the same as the space complexity, but there are a couple inefficiencies here (one of which is that you're not actually backtracking) such that the time complexity is actually O(n^2*2^n) at best.
It can definitely be useful to analyze a recursive algorithm's time complexity in terms of how many times the recursive method is called times how much work each call does. But be careful about saying backtrack is only called n times: it is called n times at the top level, but that ignores all of the subsequent recursive calls. Also, each top-level call backtrack(nums, 0, new ArrayList<>(), length); is responsible for generating all subsets of size length, of which there are n Choose length. That is, no single top-level call will ever produce 2^n subsets; rather, the sum of n Choose length for lengths from 0 to n is 2^n: C(n,0) + C(n,1) + ... + C(n,n) = 2^n.
Knowing that across all recursive calls, you generate 2^n subsets, you might then want to ask how much work is done in generating each subset in order to determine the overall complexity. Optimally, this would be O(n), because each subset varies in length from 0 to n, with the average length being n/2, so the overall algorithm might be O(n/2*2^n) = O(n*2^n), but you can't just assume the subsets are generated optimally and that no significant extra work is done.
In your case, you're building subsets through the listSoFar variable until it reaches the appropriate length, at which point it is appended to the result. However, listSoFar gets copied to a temp list in O(n) time for each of its O(n) elements, so the complexity of generating each subset is O(n^2), which brings the overall complexity to O(n^2*2^n).

Also, some listSoFar subsets are created which never figure into the final output (you never check that there are enough numbers remaining in nums to fill listSoFar out to the desired length before recursing), so you end up doing unnecessary work building subsets and making recursive calls which will never reach the base case to get appended to result, which might also worsen the asymptotic complexity. You can address the first of these inefficiencies with back-tracking, and the second with a simple break statement. I wrote these changes into a JavaScript program, leaving most of the logic the same but re-naming/re-organizing a little bit:
function getSubsets(nums) {
    let subsets = [];
    for (let length = 0; length <= nums.length; length++) {
        // refactored "backtrack" function:
        genSubsetsByLength(length); // O(length*(n Choose length))
    }
    return subsets;

    function genSubsetsByLength(length, i=0, partialSubset=[]) {
        if (length === 0) {
            subsets.push(partialSubset.slice()); // O(n): copy partial and push to result
            return;
        }
        while (i < nums.length) {
            if (nums.length - i < length) break; // don't build partial results that can't finish
            partialSubset.push(nums[i]); // O(1)
            genSubsetsByLength(length - 1, ++i, partialSubset);
            partialSubset.pop(); // O(1): this is the back-tracking part
        }
    }
}

for (let subset of getSubsets([1, 2, 3])) console.log(`[`, ...subset, ']');
The key difference is using back-tracking to avoid making copies of the partial subset every time you add a new element to it, such that each is built in O(length) = O(n) time rather than O(n^2) time, because there is now only O(1) work done per element added. Popping off the last element added to the partial result after each recursive call allows you to re-use the same array across recursive calls, thus avoiding the O(n) overhead of making temp copies for each call. This, along with the fact that only subsets which appear in the final output are built, allows you to analyze the total time complexity in terms of the total number of elements across all subsets in the output: O(n*2^n).
Your code does not work efficiently.
Like the first solution in the link, you only need to decide, for each number, whether it is included or not (like generating combinations).
This means you don't have to loop over lengths in getSubsets or iterate inside the backtrack function.
The backtrack function can walk the nums array using its index parameter:
private List<List<Integer>> result = new ArrayList<>();

public List<List<Integer>> getSubsets(int[] nums) {
    backtrack(nums, 0, new ArrayList<>());
    return result;
}

// This function's time complexity is O(2^N), because it searches both cases
// (included or not) for every number.
private void backtrack(int[] nums, int index, List<Integer> listSoFar) {
    if (index == nums.length) {
        result.add(new ArrayList<>(listSoFar));
        return;
    }
    // exclude nums[index] from the subset
    backtrack(nums, index + 1, listSoFar);
    // include nums[index] in the subset
    listSoFar.add(nums[index]);
    backtrack(nums, index + 1, listSoFar);
    listSoFar.remove(listSoFar.size() - 1);
}
MAIN QUESTION: When keeping track of comparisons, what actually counts as a comparison? Should I only count comparisons between array items since that's what the algorithm is meant for or is it more widely accepted to count every single comparison?
Currently, I am trying to wrap my head around the fact that I'm told that the theoretical number of comparisons for the worst case bubble sort algorithm is as follows:
Amount of comparisons:
(N-1) + (N-2) + (N-3) + ... + 2 + 1 = (N*(N-1))/2 = (N^2-N)/2 < N^2
So according to the formula (N^2-N)/2, with an input size (N) of 10, I would get a total of 45 comparisons. However, it is mentioned that this calculation only applies to the comparison operation in the inner loop of this pseudo code:
for i:=1 to N-1 do
{
    for j:=0 to N-i-1 do
    {
        if A[j] > A[j+1] // This is the comparison that's counted.
        {
            temp := A[j]
            A[j] := A[j+1]
            A[j+1] := temp
        }
    }
}
Now in Java, my code looks like this:
public int[] bubble(int[] array)
{
    int comparisons = 0;
    int exchanges = 0;
    int temp;
    int numberOfItems = array.length;
    boolean cont = true;

    comparisons++; // When pass == numberOfItems, a comparison will be made by the for loop that wouldn't otherwise be counted.
    for (int pass=1; pass != numberOfItems; pass++)
    {
        comparisons = comparisons + 2; // Counts both the outer for loop comparison and the if statement comparison.
        if (cont) // If any exchanges have taken place, cont will be true.
        {
            cont = false;
            comparisons++; // Counts the inner for loop comparison
            for (int index = 0; index != (numberOfItems - pass); index++)
            {
                comparisons++; // Counts the if statement comparison.
                if (array[index] > array[index+1])
                {
                    temp = array[index];
                    array[index] = array[index+1];
                    array[index+1] = temp;
                    cont = true;
                    exchanges++;
                } // end inner if
            } // end inner for
        }
        else
        {
            break; // end outer if
        }
    }
    System.out.println("Comparisons = " + comparisons + "\tExchanges = " + exchanges);
    return array;
}
After performing the worst case scenario on my code (using an array with 10 elements that are in the reverse order), I have gotten a total of 73 comparisons. This seems like a crazy high overshoot of the theoretical result which was 45 comparisons. This feels right to me though since I've accounted for all for loops and if statements.
Any help is greatly appreciated!
EDIT: I have noticed an error in my total comparison count for my inner loop. I wound up counting the inner loop twice before, but now it is fixed. Instead of getting 118 comparisons, I now get 73. However, the question still stands.
When measuring the number of comparisons in a sort, you only count comparisons between the array items. You count them whether or not they're actually in the array when you compare them.
The idea is that, instead of simple integers, the array might contain things that take a long time to compare. An array of strings, for example, can be bubble-sorted using N(N-1)/2 string comparisons, even though a single string comparison might require many other operations, including many comparisons of individual characters.
Measuring the performance of a sorting algorithm in terms of the number of comparisons makes the measurement independent of the type of things being sorted.
In evaluating sorting algorithms, it is common to count all comparisons between array elements as having equivalent cost, while ignoring comparisons between things like array indices. The basic concept is that in order for sorting operations to remain distinctly different from radix partitioning, the size of the items being sorted would need to increase as the number of them increased.

Suppose, for example, one had an array of 1,000,000,000 char values and wanted to sort them. While one could use Quicksort, bubble sort, or something else, a faster approach would simply be to use an int[65536] and count how many of each value there are. Even if one needed to sort items which had char keys, the best way to do that would be to determine where to place the last item with a key of 0 (the number of items with a key of zero, minus one), where to place the last item with a key of 1 (number of items with keys of 0 or 1, minus one), etc. All such operations would take time proportional to the number of items plus the number of possible key values, without any lg(N) factor.
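As a rough illustration of that counting idea (not code from the question, just a sketch), sorting char values without any element-to-element comparisons might look like:

// Counting sort for char values: one counter per possible key, no comparisons
// between the items themselves.
static void countingSort(char[] data) {
    int[] counts = new int[65536];            // one slot per possible char value
    for (char c : data) counts[c]++;
    int out = 0;
    for (int v = 0; v < counts.length; v++) {
        for (int k = 0; k < counts[v]; k++) {
            data[out++] = (char) v;
        }
    }
}

The running time is proportional to the number of items plus the number of possible key values, exactly as described above.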
Note that if one ignores "bookkeeping" costs, algorithms like Quicksort aren't quite optimal. A sorting algorithm which is designed to maximize the amount of information gained from each comparison may do slightly fewer comparisons. Unless comparisons are very expensive, however, such a sorting algorithm would likely waste more time being "smart" than it would have spent being "stupid".
One issue I haven't seen discussed much, though I would think it could offer significant benefit in many real-world cases, would be optimizing sequences of comparisons between items that are known to be in a narrow range. If while performing a Quicksort on a series of thousand-character path names, one is processing a partition whose entries are all known to between two names that share the first 950 characters, there would be no need to examine the first 950 characters of any names in that partition. Such optimizations would not likely be meaningful in big-O terms unless key length was a parameter, but in the real world I would expect it could sometimes have an order-of-magnitude impact.
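A hedged sketch of what such a prefix-skipping comparison might look like (passing the known common-prefix length explicitly is just one possible way to exploit the idea, not an existing library API):

// Compare two strings that are already known to agree on their first knownPrefix
// characters, so the comparison can start past that shared prefix.
static int compareSkippingPrefix(String a, String b, int knownPrefix) {
    int limit = Math.min(a.length(), b.length());
    for (int i = knownPrefix; i < limit; i++) {
        int diff = a.charAt(i) - b.charAt(i);
        if (diff != 0) return diff;
    }
    return a.length() - b.length();   // shorter string sorts first when one is a prefix of the other
}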
The comparisons variable should only be incremented once the if statement has been reached during execution of the code; the if statement is only reached when the conditions in the outer and inner for loops have been met, so the code should look like this.
Also don't forget to change the conditions in the for loops from using != to <= (use < for the inner loop so you don't index past the end of the array). The new Java code:
public int[] bubble(int[] array)
{
    int comparisons = 0;
    int exchanges = 0;
    int temp;
    int numberOfItems = array.length;
    boolean cont = true;
    for (int pass=1; pass <= numberOfItems; pass++)
    {
        if (cont) // If any exchanges have taken place, cont will be true.
        {
            cont = false;
            for (int index = 0; index < (numberOfItems - pass); index++)
            {
                if (array[index] > array[index+1])
                {
                    comparisons++;
                    temp = array[index];
                    array[index] = array[index+1];
                    array[index+1] = temp;
                    cont = true;
                    exchanges++;
                } // end inner if
            } // end inner for
        }
    }
    comparisons++; // here you increment by one because you must also count the comparison that failed
    System.out.println("Comparisons = " + comparisons + "\tExchanges = " + exchanges);
    return array;
}
This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are in the range [0-999].
Implement a class which supports three methods in O(1):
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
    private int[] store;
    private long size;
    private long total;
    private int highestIndex;
    private int lowestIndex;

    public FindAverage() {
        store = new int[1000];
        size = 0;
        total = 0;
        highestIndex = Integer.MIN_VALUE;
        lowestIndex = Integer.MAX_VALUE;
    }

    public void insert(int item) throws OutOfRangeException {
        if(item < 0 || item > 999){
            throw new OutOfRangeException();
        }
        store[item] ++;
        size ++;
        total += item;
        highestIndex = Integer.max(highestIndex, item);
        lowestIndex = Integer.min(lowestIndex, item);
    }

    public float getMean(){
        return (float)total/size;
    }

    public float getMedian(){
    }
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.
You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.
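For illustration, a rough sketch of what getMedian() could look like on top of the asker's store counters and size field (the rank arithmetic and the exception are my own additions, not part of the original class):

public float getMedian() {
    if (size == 0) throw new IllegalStateException("no values inserted yet");
    long lowerRank = (size + 1) / 2;      // 1-based rank of the lower middle value
    long upperRank = (size + 2) / 2;      // equals lowerRank when size is odd
    int lower = -1, upper = -1;
    long seen = 0;
    for (int value = 0; value < store.length; value++) {   // at most 1000 iterations
        seen += store[value];
        if (lower < 0 && seen >= lowerRank) lower = value;
        if (seen >= upperRank) { upper = value; break; }
    }
    return (lower + upper) / 2.0f;        // averages the two middle values for even size
}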
The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input, you increase the counter for that value.
To implement the median in constant time, you will need two numbers - the median index (i.e. the value of the median) and the number of values you've read that are on the left (or right) of the median. I will just stop here, hoping you will be able to figure out how to continue on your own.
EDIT (as pointed out in the comments): you already have the array with the sorted elements (store) and you know the number of elements to the left of the median (size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.
For the general case, where the range of elements is unlimited, such a data structure does not exist for any comparison-based algorithm, as it would allow O(n) sorting.
Proof: Assume such DS exist, let it be D.
Let A be the input array to sort. (Assume A.size() is even for simplicity; that can be relaxed pretty easily by adding a garbage element and discarding it later.)
sort(A):
    ds = new D()
    for each x in A:
        ds.add(x)
    m1 = min(A) - 1
    m2 = max(A) + 1
    for (i=0; i < A.size(); i++):
        ds.add(m1)
    # at this point, ds.median() is the smallest element in A
    for (i = 0; i < A.size(); i++):
        yield ds.median()
        # each two insertions advance the median by 1
        ds.add(m2)
        ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: add() and median() are constant-time operations, so each iteration is O(1), and the number of iterations is linear - hence the complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting m1 n times, the median is the smallest element in A. Each pair of insertions after that advances the median by one item, and since the advance happens in sorted order, the total output is sorted.
Since the above algorithm would sort in O(n), which is not possible in the comparison model, such a DS does not exist.
QED.
This algorithm reverses an array of N integers. I believe this algorithm is O(N) because for each loop iteration, the four lines of code are executed once thus completing the job in 4N time.
public static void reverseTheNumbers(int[] list) {
    for (int i = 0; i < list.length / 2; i++) {
        int j = list.length - 1 - i;
        int temp = list[i];
        list[i] = list[j];
        list[j] = temp;
    }
}
There isn't such a thing as 4N time. The algorithm is linear because as you increase the size of the input the runtime of the algorithm increases proportionally. In other words if you doubled the size of list you would expect the algorithm to take twice as long.
It doesn't matter how many operations you do inside your loop - as long as they are each constant time (relative to the input) the runtime of the loop is determined simply by the number of iterations.
Put another way, these four statements are - all together - an O(1) operation.
int j = list.length - 1 - i;
int temp = list[i];
list[i] = list[j];
list[j] = temp;
There's nothing significant about the fact that this sequence of steps is expressed in four statements in Java syntax - experimenting with javap suggests these four lines compile into ~20 bytecode commands, and who knows how many processor instructions that bytecode gets converted into. The good news is Big-O notation works the same regardless of the particular syntax - a sequence of operations is O(1) or constant time if its execution time is the same regardless of the input.
Therefore you're doing an O(1) operation N times; aka O(N).
Yes, you are correct. The number of operations is linearly dependent on the size of the array (N), making it an O(N) algorithm.
Yes, the complexity of the algorithm is O(n).
However, the exact "time" (because there are no constant factors in asymptotic complexity, check the comment below) is not 4 times the size of the array; we could say it is 1/2*(c1+c2+c3+c4) times the size of the array, where 1/2 corresponds to the loop iterating over only half of the array and each c corresponds to the time needed for each operation inside the loop.
It would be 4 times the size of the array, if the algorithm was iterating the whole array 4 times.
Good evening, I have an array in java with n integer numbers. I want to check if there is a subset of size k of the entries that satisfies the condition:
The sum of those k entries is a multiple of m.
How may I do this as efficiently as possible? There are n!/(k!(n-k)!) subsets that I need to check.
You can use dynamic programming. The state is (prefix length, sum modulo m, number of elements in the subset). Transitions are obvious: we either add one more number (increasing the number of elements in the subset and computing the new sum modulo m), or we just increase the prefix length (not adding the current number). If you just need a yes/no answer, you can store only the last layer of values and apply bit optimizations to compute transitions faster. The time complexity is O(n * m * k), or about n * m * k / 64 operations with bit optimizations. The space complexity is O(m * k). It looks feasible for a few thousands of elements. By bit optimizations I mean using things like bitset in C++ that can perform an operation on a group of bits at the same time using bitwise operations.
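A minimal Java sketch of that DP, keeping only yes/no reachability and skipping the bit optimizations (names are illustrative):

// reachable[c][r] == true means: some subset of the elements scanned so far has
// exactly c elements and sum congruent to r modulo m.
static boolean hasSubsetDivisibleByM(int[] a, int k, int m) {
    boolean[][] reachable = new boolean[k + 1][m];
    reachable[0][0] = true;                        // the empty subset
    for (int x : a) {
        int add = ((x % m) + m) % m;               // also handles negative inputs
        for (int c = k; c >= 1; c--) {             // iterate counts downwards so x is used at most once
            for (int r = 0; r < m; r++) {
                if (reachable[c - 1][r]) {
                    reachable[c][(r + add) % m] = true;
                }
            }
        }
    }
    return reachable[k][0];                        // k elements whose sum is 0 mod m
}

This is O(n * m * k) time and O(m * k) space, matching the analysis above.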
I don't like this solution, but it may work for your needs
public boolean containsSubset(int[] a, int currentIndex, int currentSum, int depth, int divisor, int maxDepth) {
    // you could make a, maxDepth, and divisor static as well
    // If maxDepth is equal to depth, then our subset has k elements; in addition, the sum of
    // elements must be divisible by our divisor, m.
    // If this condition is satisfied, then there exists a subset of size k whose sum is divisible by m.
    if (depth == maxDepth && currentSum % divisor == 0)
        return true;
    // If the depth is greater than or equal to maxDepth, our subset already has k elements, so
    // adding more elements cannot satisfy the necessary conditions;
    // additionally we know that if it contained k elements and was divisible by m, it would've satisfied the above condition.
    if (depth >= maxDepth)
        return false;
    // boolean to be returned, initialized to false because we have not found any sets yet
    boolean ret = false;
    // iterate through all remaining elements of our array
    for (int i = currentIndex + 1; i < a.length; i++) {
        // this may be an optimization of this line:
        // for (int i = currentIndex + 1; i < a.length - maxDepth + depth; i++) {
        // by recursing, we add a[i] to our set; we then use an or operation on all our subsets that could
        // be constructed from the numbers we have so far, so that if any of them satisfy our condition (return true)
        // then the value of the variable ret will be true
        ret |= containsSubset(a, i, currentSum + a[i], depth + 1, divisor, maxDepth);
    } // end for
    // return whether any set of numbers that could be constructed from the numbers so far satisfies the condition
    return ret;
}
Then invoke this method as such
//this invokes our method with "no numbers added to our subset so far" so it will try adding
// all combinations of other elements to determine if the condition is satisfied.
boolean answer = containsSubset(myArray,-1,0,0,m,k);
EDIT:
You could probably optimize this by taking everything modulo (%) m and deleting repeats. For examples with large values of n and/or k, but small values of m, this could be a pretty big optimization.
EDIT 2:
The above optimization I listed isn't helpful. You may need the repeats to get the correct information. My bad.
Happy Coding! Let me know if you have any questions!
If numbers have lower and upper bounds, it might be better to:
* Iterate all multiples of m where lower_bound * k < multiple < upper_bound * k.
* Check if there is a subset with sum equal to that multiple in the array (see the Subset Sum problem) using dynamic programming.
Complexity is O(k^2 * (lower_bound + upper_bound)^2). This approach can be optimized further, I believe with careful thinking; a rough sketch is below.
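A rough sketch of that idea, assuming non-negative values so exact sums can be tabulated directly (names are illustrative):

// dp[c][s] == true means: some subset of the elements scanned so far has exactly
// c elements and sum exactly s. Afterwards every multiple of m in range is tested.
static boolean hasKSubsetWithMultipleSum(int[] a, int k, int m) {
    int maxSum = 0;
    for (int x : a) maxSum += x;                      // upper bound on any subset sum
    boolean[][] dp = new boolean[k + 1][maxSum + 1];
    dp[0][0] = true;
    for (int x : a) {
        for (int c = k; c >= 1; c--) {                // downwards so each element is used at most once
            for (int s = maxSum; s >= x; s--) {
                if (dp[c - 1][s - x]) dp[c][s] = true;
            }
        }
    }
    for (int s = 0; s <= maxSum; s += m) {            // every multiple of m in range (including 0)
        if (dp[k][s]) return true;
    }
    return false;
}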
Otherwise you can enumerate all subsets of size k - there are n!/(k!(n-k)!) of them, so this is exponential. Using backtracking (pseudocode-ish):
function find_subsets(array, k, index, current_subset):
    if current_subset.size = k:
        add current_subset to your solutions list
        return
    if index = array.size:
        return
    number := array[index]
    add number to current_subset
    find_subsets(array, k, index + 1, current_subset)
    remove number from current_subset
    find_subsets(array, k, index + 1, current_subset)