I'm working on my homework, and there's a question that asks us to sort a struct array.
The struct citizen consists of an int id and a boolean gender, where id is randomly generated between 1 and 100,
and gender is determined by whether id is odd or even: odd = true (male) and even = false (female),
for example a = {33, true}
The question requires me to sort the citizen[] array by gender. It seems very easy, but it has the following requirements:
run in linear time, O(N)
no new array
only constant extra space can be used
I am thinking about using counting sort, but it seems a little hard to do without a new array. Are there any suggestions?
Since this is a homework question I'm not going to provide code. The following should be sufficient to get you started.
"Sorting" by gender here really means partitioning into two groups. A general purpose comparison sort cannot be better than O(n*log(n)), but partitioning can be done in O(n) with constant space.
Consider iterating from both ends simultaneously (while loop, two index pointers initialized to first and last elements) looking for elements that are in the "wrong" section. When you find one such element at each end, swap them. Note that the pointers move independently of each other, only when skipping over elements that are already in the right section, and of course immediately after a swap, which is a subcase of "elements already in the right section".
Quit when the index pointers meet somewhere in the middle.
This is not a general purpose sort. You cannot do this for the case where the number of keys is unknown.
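For readers who want to check their own attempt afterwards, the two-pointer idea described above can be sketched roughly as follows. The Citizen class, its field names, and the sample ids are assumptions made up for illustration; only the partition loop corresponds to the description.

```java
final class GenderPartition {
    // Hypothetical stand-in for the asker's struct: odd id => male (true).
    static final class Citizen {
        final int id;
        final boolean male;
        Citizen(int id) { this.id = id; this.male = (id % 2 != 0); }
    }

    // Moves all males to the front and all females to the back, in place,
    // using only two index variables and one temporary reference.
    static void partitionByGender(Citizen[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            while (lo < hi && a[lo].male) lo++;     // already in the male section
            while (lo < hi && !a[hi].male) hi--;    // already in the female section
            if (lo < hi) {                          // one misplaced element at each end
                Citizen tmp = a[lo]; a[lo] = a[hi]; a[hi] = tmp;
                lo++; hi--;
            }
        }
    }

    public static void main(String[] args) {
        int[] ids = {33, 8, 71, 40, 2, 95};
        Citizen[] a = new Citizen[ids.length];
        for (int i = 0; i < ids.length; i++) a[i] = new Citizen(ids[i]);
        partitionByGender(a);
        for (Citizen c : a) System.out.println(c.id + " " + c.male);
    }
}
```

Each index only ever moves toward the other, so the loop body runs at most n times in total. The relative order within each group is not preserved.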
Since you only have two values to sort, you could use a kind of swap-based counting sort (I couldn't find any relevant paper on that one).
There is room for optimisation in that sort, but that will be your job.
Here is pseudo-code for that special sort, adapted to your problem:
integer maleIndex = 0 // Current position of males in the array
for i=0 until array.size do
    if array.at(i) is a male then
        // after a while, all female will end up at the end
        // while all male will end up at the beginning
        swap(array.at(maleIndex), array.at(i))
        maleIndex = maleIndex + 1
    end
end
One approach is similar to the partition stage in QuickSort, or to the median/rank finding algorithm QuickSelect.
I'm going to describe the outline of the algorithms, but not provide any code. Hopefully, it will be good enough that it is easy to make the translation.
You basically want to reorganize the array so that one gender is at the start, and the other is at the end of the array. You'll have the array partitioned in three:
From 0 to i-1 you have the first gender (male or female, up to you)
From i to j-1 you have both male/female. This is the unknown area.
From j to n-1 you have the second gender.
At the start of the algorithm i is set to 0, so the first area is empty, and j is set to n, so the third area (from j to n-1) is empty. Basically the whole array is in the unknown state.
Then, you iterate over the array in a particular way. At each step, you look at citizen[i].gender. If it is the first gender, you leave it alone and increment i. If it is the second gender, you swap citizen[i] with citizen[j-1] and decrement j; i stays put, because the element that was swapped in is still unknown. You stop when i is equal to j.
Why is this correct? Well, at each step we can see that the constraint of having the three areas is maintained, assuming it held to begin with (which it does), and either the first or the third one grows by one element. At the end, the unknown area has no elements, so we're only left with the first gender at the start, and the second at the end.
Why is it linear? Well, at each step we make a constant-time decision for one element in the array about where it should belong. There are n such elements, so the time is linear in n. Alternatively, the iteration test can be expressed as while (j - i > 0), and the expression j - i starts at n and drops by 1 for each iteration.
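As a rough illustration of the three-area invariant, here is a minimal sketch. It makes two simplifying assumptions that are not in the text above: the array holds only the gender flag (true for the first gender), and j starts at n, one past the last index, so that the area from j to n-1 is empty at the start.

```java
import java.util.Arrays;

final class ThreeAreaPartition {
    // Invariant: [0, i) holds the first gender, [i, j) is unknown,
    // and [j, n) holds the second gender.
    static void partition(boolean[] firstGender) {
        int i = 0;
        int j = firstGender.length;     // one past the end: last area starts empty
        while (i < j) {
            if (firstGender[i]) {
                i++;                    // first area grows by one
            } else {
                j--;                    // last area grows by one
                boolean tmp = firstGender[i];
                firstGender[i] = firstGender[j];
                firstGender[j] = tmp;   // element now at i is still unknown
            }
        }
    }

    public static void main(String[] args) {
        boolean[] g = {false, true, false, true, true};
        partition(g);
        System.out.println(Arrays.toString(g)); // prints [true, true, true, false, false]
    }
}
```

Every iteration shrinks the unknown area by exactly one element, which is where the linear bound comes from.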
I am looking at the LeetCode problem 2134. Minimum Swaps to Group All 1's Together II:
A swap is defined as taking two distinct positions in an array and swapping the values in them.
A circular array is defined as an array where we consider the first element and the last element to be adjacent.
Given a binary circular array nums, return the minimum number of swaps required to group all 1's present in the array together at any location.
I am trying to study how other people came up with solutions of their own. I came across this particular one, but I don't understand the logic:
import java.util.Arrays;

class Solution {
    public int minSwaps(int[] nums) {
        // number of ones
        int cntones = Arrays.stream(nums).sum();
        // worst case answer
        int rslt = nums.length;
        // position lft and figure better value for min/rslt
        int holes = 0;
        for (int i = 0; i < cntones; i++) {
            if (nums[i] == 0)
                holes++;
        }
        // better value for rslt from lft to rgt
        // up to index of cntones.
        rslt = Math.min(rslt, holes);
        // they have a test case with one element
        // and that trips up if you dont do modulo
        int rgt = cntones % nums.length;
        for (int lft = 0; lft < nums.length; lft++) {
            rslt = Math.min(rslt, holes);
            if (nums[lft] != nums[rgt]) {
                if (nums[rgt] == 1)
                    holes--;
                else
                    holes++;
            }
            rgt = (rgt + 1) % nums.length;
        }
        return rslt;
    }
}
1. Why is the worst case the length of the input array?
I'm thinking, wouldn't the worst case be something like [0,1,0,1,0,1...], where 0's and 1's alternate? Can you give me an example?
2. I suppose the number of holes can be a solution in some cases, from counting the 0's in a window of fixed length (the total number of 1's), but because I do not understand the worst-case rslt from question #1, the line below stumps me as well.
    // better value for rslt from lft to rgt
    // up to index of cntones.
    rslt = Math.min(rslt, holes);
3. About the modulo below: I don't think cntones can ever be bigger than nums.length, which in turn would result in 0 all the time? I'm thinking that for the case with one element, you'd have to check whether that one element is 0 or 1. How does the line below cover that edge case?
    // they have a test case with one element
    // and that trips up if you dont do modulo
    int rgt=cntones % nums.length;
4. Due to #1~#3 the last for loop makes no sense to me...
Why is the worst case the length of the input array?
First note that a swap is only useful when it swaps a 0 with a 1. Secondly, it makes no sense to swap the same digit a second time, as the result of such a double swap could have been achieved with a single swap. So we can say that an upper limit for the number of swaps is the number of 0-digits or the number of 1-digits (whichever is smaller). In fact, this is an overestimation, because at least one 1-digit should be able to stay unmoved. But let's ignore that for now. To reach that worst case, there should be as many 1-digits as 0-digits, so then we have half of the length as the worst case. Of course, by initialising with a value that is greater than that (like the length) we do no harm.
The example of alternating digits would be resolved by keeping half of those 1-digits unmoved, and moving the remaining 1-digits in the holes between them. So that means we have a number of swaps that is equal to about one fourth of the length of the array.
below line stumps me as well.
rslt = Math.min(rslt, holes);
As you said, there is a window moving over the circular array, which represents the final situation where all 1-digits should end up. So it sets the target to work towards. Obviously, the 1-digits that are already within that window don't need to be swapped. Each 0-digit inside that window has to be swapped with a 1-digit that is currently outside that window. Doing that will reach the target, and so the number of swaps for reaching that particular target window is equal to the number of holes (0-digits) inside that window.
As that exercise is done for each possible window, we are interested to find the best position of the window, i.e. the one where the number of holes (swaps) is minimised. That is what this line of code is doing. rslt is the minimum "so far" and holes is the fresh value we have for the current window. If that is less, then rslt should be updated to it. That's what happens in this statement.
About the modulo below, I don't think cntones can ever be bigger than nums.length, in turn which will result in 0 all the time? I'm thinking for the case with one element, you'd have to check whether that one element is 0 or 1. How does below line cover that edge case?
int rgt=cntones % nums.length;
That modulo only serves the case where cntones is equal to nums.length. You are right that it will never exceed it. But the case where it is equal is possible (when the input has only 1-digits). And as rgt is going to be used as an index, it should not be equal to nums.length, because that index is out of bounds and Java would throw an ArrayIndexOutOfBoundsException.
Due to #1~#3 the last for loop makes no sense to me...
It should be clear from the above details. That loop moves the window one step at a time, and keeps the variable holes updated incrementally. Of course, we could have decided to count the number of holes from scratch in each window, but that would be a waste of time. As we go from one window to the next, we only lose one digit on the left and gain one on the right, so we can just update holes with that information and know how many holes there are in the current window, the one that starts at lft and runs (circularly) up to, but not including, rgt. In case the digit that we lose at the left is the same as the one we gain at the right, we obviously didn't change the number of holes. Where they are different, we either gain or lose one hole in comparison with the previous window.
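To make the incremental update concrete, here is a minimal restatement of the same sliding-window idea with the leaving and entering digits named explicitly. The class and variable names are made up for this sketch; it is not the code from the question, only an equivalent formulation.

```java
final class MinSwapsWindow {
    static int minSwaps(int[] nums) {
        int n = nums.length;
        int ones = 0;
        for (int v : nums) ones += v;

        // Holes (zeros) in the first window, nums[0 .. ones-1].
        int holes = 0;
        for (int i = 0; i < ones; i++) {
            if (nums[i] == 0) holes++;
        }

        int best = holes;
        // Slide the window once around the circle, one step at a time.
        for (int left = 0; left < n; left++) {
            int leaving = nums[left];
            int entering = nums[(left + ones) % n];
            holes += leaving - entering;   // lose a 1, gain a 0 => one more hole (and vice versa)
            best = Math.min(best, holes);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(minSwaps(new int[] {0, 1, 0, 1, 1, 0, 0}));    // prints 1
        System.out.println(minSwaps(new int[] {0, 1, 0, 1, 0, 1, 0, 1})); // prints 2, i.e. length / 4
    }
}
```

The second call is the alternating case from the question: every window of four digits contains two holes, so the answer is a quarter of the length rather than the full length.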
Let us consider that we have objects in a list:
listOfObjects = [a,b,ob,ob,c,ob,c,ob,c,ob,ob,c,ob]
We have to group them as
[ob,ob,c,ob,c,ob] from index 2 to 7
[ob,ob,c,ob] from index 9 to 12
i.e. a group starts where we have two ob's together, as at index 2 and index 9, and ends before a 'c' that is followed by two ob's (as at index 8, where the 'c' is followed by two ob's) or when the list ends.
So what would be the best algorithm to get the above (in Java)?
I assume that by "best algorithm" you mean the one that is optimal in terms of time complexity.
You can do this task in one simple traversal, keeping track of the next 3 elements (taking care, of course, not to run past the end of the list) and ending the group according to the rule you described. If fewer than 3 elements follow the current element, you simply end your group at the end of the list (as you specified in your strategy).
So the time complexity of this algorithm will be O(n). It will not be possible to get better than this.
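A single-pass version might look like the sketch below. It rests on assumptions the question leaves open: elements are represented as strings, any element other than a closing 'c' simply stays inside an open group, and groups are reported as inclusive index pairs. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ObGrouper {
    // Returns {start, end} index pairs (inclusive), one per group.
    static List<int[]> findGroups(List<String> items) {
        List<int[]> groups = new ArrayList<>();
        int start = -1;                         // -1 means "not inside a group"
        for (int i = 0; i < items.size(); i++) {
            if (start < 0) {
                // A group opens on two consecutive "ob" entries.
                if (isOb(items, i) && isOb(items, i + 1)) start = i;
            } else if (items.get(i).equals("c") && isOb(items, i + 1) && isOb(items, i + 2)) {
                // A "c" followed by two "ob" entries closes the group just before it.
                groups.add(new int[] {start, i - 1});
                start = -1;
            }
        }
        if (start >= 0) groups.add(new int[] {start, items.size() - 1}); // list ended inside a group
        return groups;
    }

    // Bounds-checked lookahead, so the caller never reads past the end.
    private static boolean isOb(List<String> items, int i) {
        return i < items.size() && items.get(i).equals("ob");
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(
            "a", "b", "ob", "ob", "c", "ob", "c", "ob", "c", "ob", "ob", "c", "ob");
        for (int[] g : findGroups(list)) System.out.println(g[0] + " to " + g[1]);
    }
}
```

Each element is visited once and the lookahead is at most two positions, so the traversal stays O(n).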
I think a Stack is a suitable data structure.
It'll be all right once you push each 'ob' onto the stack.
You also need a 'count' variable.
So I was thinking of a new sorting algorithm that might be efficient but I am not too sure about that.
1) Imagine we have an array a of only positive numbers.
2) We go through the array and find the biggest number n.
3) We create a new array b of the size n+1.
4) We go through every entry in the unsorted array and increase the value in the second array at the index of the number of the unsorted array we are looking at by one. (In pseudo-code this means: b[a[i]]++; where a[i] is the number we are currently looking at)
5) Once we have done this with every element in a, the array b stores at every index the exact amount of numbers of this index. (For example: b[0] = 3 means that we had 3 zeros in the initial array a)
6) We go through the whole array b and skip all the empty fields and create a new List or Array out of it.
So I can imagine that this algorithm can be very fast and efficient for smaller numbers only, since at the end we have to go through the whole array b to build the sorted one, which is going to be really time consuming.
If we for example have an array a = {1000, 1}, it would still check 1001 elements in array b to see whether or not they are 0, even though we only have 2 elements in the initial array.
With smaller numbers, however, we should almost get an O(n) result? I am not too sure about that and that's why I am asking you. Maybe I am even missing something really important. Thanks for your help in advance :)
Congratulations on independently re-discovering the counting sort.
This is indeed a very good sorting strategy for situations when the range of values is limited and the number of items in your array is significantly greater than that range.
In situations when the range is greater than the number of items in the array, a traditional sorting algorithm would give you better performance.
Algorithms of this kind are called pseudo-polynomial.
we should almost get a O(n) result
You get an O(N+M) result, where N is the number of elements and M is the maximum number in the first array. Plus, you spend O(M) memory, so it makes sense only if M is small. See counting sort.
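For reference, the steps from the question translate into roughly the following sketch. It assumes non-negative values small enough that an array of size max + 1 can be allocated; the class and method names are made up here.

```java
import java.util.Arrays;

final class CountingSortSketch {
    // Time O(N + M), extra space O(M), where M is the largest value in a.
    static int[] countingSort(int[] a) {
        if (a.length == 0) return new int[0];

        int max = a[0];
        for (int v : a) max = Math.max(max, v);      // step 2: find the biggest number

        int[] counts = new int[max + 1];             // step 3: array b of size max + 1
        for (int v : a) counts[v]++;                 // step 4: b[a[i]]++

        int[] sorted = new int[a.length];
        int k = 0;
        for (int value = 0; value <= max; value++) { // step 6: walk all of b, O(M)
            for (int c = 0; c < counts[value]; c++) sorted[k++] = value;
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countingSort(new int[] {3, 0, 2, 0, 1000, 1})));
        // prints [0, 0, 1, 2, 3, 1000]
    }
}
```

The final loop touches every slot of counts, which is the O(M) term: for a = {1000, 1} it scans 1001 slots to emit two values.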
So I am currently learning Java and I was asking myself why the insertion sort method doesn't need to use the swap operation. As far as I understood, elements get swapped, so wouldn't it be useful to use the swap operation in this sorting algorithm?
As I said, I am new to this, but I try to understand the background of these algorithms and why they are the way they are.
Would be happy for some insights :)
B.
Wikipedia's article for Insertion sort states
Each iteration, insertion sort removes one element from the input
data, finds the location it belongs within the sorted list, and
inserts it there. It repeats until no input elements remain. [...] If
smaller, it finds the correct position within the sorted list, shifts
all the larger values up to make a space, and inserts into that
correct position.
You can consider this shift as an extreme swap. What actually happens is that the value is stored in a placeholder and compared against the other values. If those values are larger than the placeholder, they are simply shifted, i.e. each one overwrites the next position in the list/array. The placeholder's value is then put in the position from which the last element was shifted.
Insertion Sort does not perform swapping. It performs insertions by shifting elements in a sequential list to make room for the element that is being inserted.
That is why it is an O(N^2) algorithm: for each element out of N, there can be O(N) shifts.
So, you could do insertion sort by swapping.
But is that the best way to do it? Think about what a swap is:
temp = a
a=b
b=temp
There are 3 assignments for a single swap.
E.g. [2,3,1]
To sort the above list by swapping, you would 1. swap 3 and 1, then 2. swap 1 and 2,
for a total of 6 assignments.
Now,
instead of swapping, if you save 1 in a temporary, shift 3 and then 2 one place to the right (1 assignment each), and then put 1 in array[0], you end up with 3 writes to the array (4 assignments if you count the temporary) instead of the 6 you would do with swapping.
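The shifting variant can be sketched as below; this is the usual textbook form of insertion sort for an int array, with names chosen for this example.

```java
import java.util.Arrays;

final class InsertionSortShift {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];                  // hold the element being inserted
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];             // shift the larger element one slot right
                j--;
            }
            a[j + 1] = key;                  // drop the held element into the gap
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 1};
        sort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 3]
    }
}
```

Each shift is a single assignment, whereas a swap at the same position would cost three.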
Hi, this is a Java exercise on hashing. First we have an array of N strings (1<=N<=100000); the program must find the minimum length of a consecutive subseries that contains all the distinct strings present in the original array.
For example, the original array is {apple, orange, orange, pear, pear, apple, pear}
the consecutive subarray can be {orange, pear, pear, apple}
so the answer is 19 (the total number of characters: 6 + 4 + 4 + 5)
I've written code that visits every element in the array and creates a new hash table to find the length of the subarray that contains all the distinct strings. It becomes very, very slow once N is larger than 1000. So I hope there is a faster algorithm. Thank you!
1. Pass through the array once, using a hash to keep track of whether you've seen a word before or not. Count the distinct words in the array by adding to your count only when you're seeing a word for the first time.
2. Pass through the array a second time, using a hash to keep track of the number of times you've seen each word. Also keep track of the sum of the lengths of all the words you've seen. Keep going until you have seen all words at least once.
3. Now move the start of the range forward as long as you can do so without reducing a word's count to zero. Remember to adjust your hash and letter count accordingly. This gives you the first range which includes every word at least once, and can't be reduced without excluding a word.
4. Repeatedly do the following: Move the left end of your range forward by one, and then move the right end forward until you find another instance of the word that you just booted from the left end. Then move the left end forward again as in step 3. Each time you do this, you have another minimal range that includes each word at least once.
5. While doing steps 3 and 4, keep track of the minimum length so far, and the start and end of the associated range. You're done when you need to move the right end of your range past the end of the array. At this point you have the right minimum length, and the range that achieves it.
This runs in linear time.
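A condensed sliding-window sketch of this approach is shown below. It merges the second pass and the shrinking into one loop instead of following the steps literally, and it assumes that "length" means the total number of characters in the range, as in the question's example. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

final class ShortestCoveringRange {
    // Smallest total character count of a contiguous range that
    // contains every distinct word of the array at least once.
    static int minTotalLength(String[] words) {
        if (words.length == 0) return 0;
        int distinct = new HashSet<>(Arrays.asList(words)).size(); // first pass
        Map<String, Integer> counts = new HashMap<>();
        int covered = 0, letters = 0, left = 0, best = Integer.MAX_VALUE;
        for (int right = 0; right < words.length; right++) {
            letters += words[right].length();
            if (counts.merge(words[right], 1, Integer::sum) == 1) covered++;
            // Shrink from the left while the leftmost word is still duplicated.
            while (covered == distinct && counts.get(words[left]) > 1) {
                counts.merge(words[left], -1, Integer::sum);
                letters -= words[left].length();
                left++;
            }
            if (covered == distinct) best = Math.min(best, letters);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "orange", "orange", "pear", "pear", "apple", "pear"};
        System.out.println(minTotalLength(words)); // prints 19
    }
}
```

Both ends of the range only move forward, so each word is added and removed at most once, which keeps the whole scan linear in N apart from hashing costs.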