Understanding Radix Sort Algorithm [duplicate] - java

I am trying to understand how this radix sort algorithm works. I am new to algorithms and bits, so this isn't coming easily to me. So far I have added these comments to my code to try to make it easier to understand. I am not sure I have grasped the concept correctly, so if anyone can see any problems with my comments or anything I have misunderstood, please help me out :)
Also, would anyone be able to explain this line of code to me: mask = 1 << bit;
My commented code:
public static ArrayList<Integer> RadixSort(ArrayList<Integer> a)
//This method implements the radix sort algorithm, taking an integer array as an input
{
ArrayList<Integer> array = CopyArray(a);
//Created a new integer array called 'array' and set it to equal the array inputed to the method
//This was done by copying the array entered to the method through the CopyArray method, then setting the results of the method to the new empty array
Integer[] zerobucket = new Integer[a.size()];
Integer[] onebucket = new Integer[a.size()];
//Created two more integer arrays to act as buckets for the binary values
//'zerobucket' will hold array elements where the ith bit is equal to 0
//'onebucket' will hold array elements where the ith bit is equal to 1
int i, bit;
//Created two integer variables i & bit, these will be used within the for loops below
//Both i & bit will be incremented to run the radix sort for every bit of the binary value, for every element in the array
Integer element, mask;
//Created an integer object called element, this will be used to retrieve the ith element of the unsorted array
//Created an integer object called mask, this will be used to compare the bit values of each element
for(bit=0; bit<8; ++bit)
//Created a for loop to run for every bit of the binary value e.g.01000000
//Change from 8 to 32 for whole integers - will run 4 times slower
{
int zcount = 0;
int ocount = 0;
//Created two integer variables to allow the 'zerobucket' and 'onebucket' arrays to be increment within the for loop below
for(i=0; i<array.size(); ++i)
//Created a nested for loop to run for every element of the unsorted array
//This allows every bit for every binary value in the array
{
element = array.get(i);
//Set the variable 'element' to equal the ith element in the array
mask = 1 << bit;
if ((element & mask) == 0)
//If the selected bit of the binary value is equal to 0, run this code
{
zerobucket[zcount++] = array.get(i);
//Set the next element of the 'zerobucket' array to equal the ith element of the unsorted array
}
else
//Else if the selected bit of the binary value is not equal to 0, run this code
{
onebucket[ocount++] = array.get(i);
//Set the next element of the 'onebucket' array to equal the ith element of the unsorted array
}
}
for(i=0; i<ocount; ++i)
//Created a for loop to run for every element within the 'onebucket' array
{
array.set(i,onebucket[i]);
//Appended the ith element of the 'onebucket' array to the ith position in the unsorted array
}
for(i=0; i<zcount; ++i)
//Created a for loop to run for every element within the 'zerobucket' array
{
array.set(i+ocount,zerobucket[i]);
//Appended the ith element of the 'zerobucket' array to the ith position in the unsorted array
}
}
return(array);
//Returned the sorted array to the method
}
I did not write this code; I was given it to try to understand.

I'll answer your questions in reverse order...
mask = 1 << bit;
Due to precedence rules, you could write this as
mask = (1 << bit);
Which is a bit more obvious. Take the integer 1 (0x01), shift it left by the bit position, and assign it to mask. So if bit is 2, mask is 00000100 (showing only the low eight bits). If bit is 4, mask is 00010000. And so on.
The reason for the mask is the line that follows:
if ((element & mask) == 0)
which is meant to identify whether the bit at position bit is a 1 or a 0. An item ANDed with the bitmask will be zero if the bit in the same position as the 1 in the bitmask is 0, and non-zero if that bit is 1.
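As a small illustration of the mask test, here is a sketch that prints the mask for each of the eight bit positions and whether that bit is set in a sample value. The class name MaskDemo, the helper bitIsSet and the sample value 20 are made up for this example and are not part of the question's code.

```java
// Illustrative sketch only: MaskDemo and bitIsSet are hypothetical names.
final class MaskDemo {
    // Returns true when the bit at position `bit` (0 = least significant) of `element` is 1.
    static boolean bitIsSet(int element, int bit) {
        int mask = 1 << bit;          // a single 1 shifted left by `bit` places
        return (element & mask) != 0; // AND keeps only that one bit
    }

    public static void main(String[] args) {
        int element = 0b0001_0100; // 20 in decimal: bits 2 and 4 are set
        for (int bit = 0; bit < 8; bit++) {
            int mask = 1 << bit;
            // Pad the binary form of the mask to eight digits for readability.
            String binary = String.format("%8s", Integer.toBinaryString(mask)).replace(' ', '0');
            System.out.println("bit " + bit + ": mask " + binary
                    + " -> " + (bitIsSet(element, bit) ? 1 : 0));
        }
    }
}
```

For the sample value, only the lines for bit 2 and bit 4 should end in 1.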
Now the more complicated question. The algorithm in question is a least significant bit radix sort, meaning a radix sort in which the passes over the sorted values go from the least to most significant (or in the case of software integers, right to left) bits.
The following pseudocode describes your code above:
array = copy (a)
for every bit position from 0 to 7 // 8 bits are examined; the radix itself is 2
    for each item in array
        if bit at bit position in item is 0
            put item in zeroes bucket
        else
            put item in ones bucket
    array = ordered concatenation of ones bucket and zeroes bucket
return array
So why does this work? You can think of this as an iterative weighted ranking of the items in the array. All other bits being equal, an item with a 1 bit will be a larger item than an item with a 0 bit (rank the items). Each pass becomes more important to final position (weight the rankings). The successive application of these binary sorts will result in items that more often have a 1 being more often placed in the 1's bucket. Each time that an item stays in the 1's bucket when other items don't, that item improves its relative position in the 1's bucket when compared to other items that in future passes may also have a 1. Consistent presence in the 1's bucket is then associated with consistent improvement in position, the logical extreme of which will be the largest value in the array.
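To watch the ranking build up pass by pass, here is a minimal sketch that traces a small list through three bit positions. It assumes non-negative values that fit in the examined bits; the class and method names (BinaryRadixDemo, pass, sortDescending) are invented for this example, and it uses lists for the buckets rather than the fixed arrays in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: the names here are hypothetical, not from the question's code.
final class BinaryRadixDemo {
    // One pass: stable split on a single bit, ones bucket first, as in the question's code.
    static List<Integer> pass(List<Integer> items, int bit) {
        int mask = 1 << bit;
        List<Integer> ones = new ArrayList<>();
        List<Integer> zeroes = new ArrayList<>();
        for (int item : items) {
            if ((item & mask) == 0) {
                zeroes.add(item);
            } else {
                ones.add(item);
            }
        }
        ones.addAll(zeroes); // ordered concatenation: ones, then zeroes
        return ones;
    }

    // Runs one pass per bit, from least to most significant.
    static List<Integer> sortDescending(List<Integer> input, int bits) {
        List<Integer> array = new ArrayList<>(input);
        for (int bit = 0; bit < bits; bit++) {
            array = pass(array, bit);
        }
        return array;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>(Arrays.asList(5, 2, 7, 1));
        for (int bit = 0; bit < 3; bit++) {
            array = pass(array, bit);
            System.out.println("after bit " + bit + ": " + array);
        }
        // after bit 0: [5, 7, 1, 2]
        // after bit 1: [7, 2, 5, 1]
        // after bit 2: [7, 5, 2, 1]
    }
}
```

Each pass keeps the relative order produced by the earlier passes inside each bucket, which is why the later, more significant bits end up deciding the final position.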
Hope that helps.

Related

Array Duplicate Efficiency Riddle

In AP Computer Science A, our class recently learned about arrays. Our teacher posed a riddle to us.
Say you have 20 numbers, 10 through 100 inclusive, right? (these numbers are gathered from another file using Scanners)
As each number is read, we must print the number if and only if it is not a duplicate of a number already read. Now, here's the catch. We must use the smallest array possible to solve the problem.
That's the real problem I'm having. All of my solutions require a pretty big array that has 20 slots in it.
I am required to use an array. What would be the smallest array that we could use to solve the problem efficiently?
If anyone could explain the method with pseudocode (or in words) that would be awesome.
In the worst case we have to use an array of length 19.
Why 19? Each unique number has to be remembered in order to sort out duplicates from the following numbers. Since you know that there are 20 numbers incoming, but not more, you don't have to store the last number. Either the 20th number already appeared (then don't do anything), or the 20th number is unique (then print it and exit – no need to save it).
By the way: I wouldn't call an array of length 20 big :)
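A minimal sketch of the N-1 idea, assuming the numbers arrive in an int array that stands in for the file input (the class name SmallestArrayDedup and the method printUnique are hypothetical). It returns the values that would be printed instead of printing them, so the behaviour is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical; `input` stands in for the 20 numbers from the file.
final class SmallestArrayDedup {
    static List<Integer> printUnique(int[] input) {
        // One slot fewer than the number of inputs: the last number never needs storing.
        int[] seen = new int[Math.max(0, input.length - 1)];
        int count = 0;
        List<Integer> printed = new ArrayList<>();
        for (int n = 0; n < input.length; n++) {
            int value = input[n];
            boolean duplicate = false;
            for (int i = 0; i < count; i++) {
                if (seen[i] == value) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                printed.add(value);
                if (n < input.length - 1) {
                    seen[count++] = value; // remember it for the numbers still to come
                }
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(printUnique(new int[] {10, 20, 10, 30, 20, 40}));
    }
}
```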
If your numbers are integers: You have a range from 10 to 100. So you need 91 Bits to store which values have already been read. A Java Long has 64 Bits. So you will need an array of two Longs. Let every Bit (except for the superfluous ones) stand for a number from 10 to 100. Initialize both longs with 0. When a number is read, check if the corresponding bit mapped to the read value is set to 1. If yes, the read number is a duplicate, if no set the bit to 1.
This is the idea behind the BitSet class.
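The two-long bitmap described above could look roughly like this. It is a sketch under the stated assumption that every input is an integer from 10 to 100; the class name SeenBits and the method markIfNew are hypothetical.

```java
// Illustrative sketch only: SeenBits and markIfNew are hypothetical names.
final class SeenBits {
    // 2 * 64 = 128 bits, of which 91 are used for the values 10..100.
    private final long[] seen = new long[2];

    // Marks `value` as read; returns true if it had not been read before.
    boolean markIfNew(int value) {
        if (value < 10 || value > 100) {
            throw new IllegalArgumentException("value out of range 10..100: " + value);
        }
        int offset = value - 10;         // 0..90
        int word = offset / 64;          // which long holds the bit
        long bit = 1L << (offset % 64);  // note the long literal 1L
        boolean isNew = (seen[word] & bit) == 0;
        seen[word] |= bit;
        return isNew;
    }

    public static void main(String[] args) {
        SeenBits bits = new SeenBits();
        int[] input = {10, 42, 10, 100, 42, 77};
        for (int value : input) {
            if (bits.markIfNew(value)) {
                System.out.println(value); // prints 10, 42, 100, 77
            }
        }
    }
}
```

The shift uses 1L rather than 1 because shifting an int by 32 or more would wrap around within 32 bits.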
I agree with Socowi. If the number of inputs is known and equal to N, an array of N-1 elements is always enough to detect duplicates. Once the last element of the input is received, and it is known to be the last, there is no need to store it in the array.
Another idea: if your numbers are small and really lie in the range [10:100], you can pack at least 2 small integers into 1 long and extract them again using shifts and a binary AND. In this case an array of N/2 elements is enough. But this makes searching the array more complicated and does not save much memory; only the number of items in the array decreases.
You technically don't need an array, since the input size is fixed, you can just declare 20 variables. But let's say it wasn't fixed.
As other answer says, worst case is indeed 19 slots in the array. But, assuming we are talking about integers here, there is a better case scenario where some numbers form a contiguous interval. In that case, you only have to remember the highest and lowest number, since anything in between is also a duplicate. You can use an array of intervals.
With the range of 10 to 100, the numbers can be spaced apart and you still need an array of 19 intervals, in the worst case. But let's say, that the best case occurs, and all numbers form a contiguous interval, then you only need 1 array slot.
The problem you'd still have to solve is to create an abstraction over an array that expands itself by 1 when an element is added, so that it uses the minimal size necessary. (This is similar to ArrayList, except that ArrayList grows its capacity in larger steps when it is full.)
Since an array cannot change size at run time, you need a companion variable to count the numbers that are not duplicates, and you fill the array only partially with those numbers.
Here is some simple code that uses a companion variable currentSize and fills the array partially.
Alternatively, you can use an ArrayList, which changes size at run time.
import java.util.Scanner;

final int LENGTH = 20;
double[] numbers = new double[LENGTH];
int currentSize = 0;
Scanner in = new Scanner(System.in);
// Stop once the array is full; otherwise hasNextDouble() stays true and the loop never ends.
while (currentSize < numbers.length && in.hasNextDouble()) {
    numbers[currentSize] = in.nextDouble();
    currentSize++;
}
Edit
Now currentSize holds the count of numbers actually stored, so not all 20 slots are filled if there were duplicates. Of course, you still need some code to determine whether a number is a duplicate or not.
My last answer misunderstood what you needed, but I came up with this, which does it with an int array of 5 elements using bit shifting. Since we know the max number is 100, we can store (quite messily) four numbers in each index.
Random rand = new Random();
int[] numbers = new int[5];
int curNum;
for (int i = 0; i < 20; i++) {
curNum = 10 + rand.nextInt(91); // random value in the range 10..100 from the question
System.out.println(curNum);
boolean print = true;
for (int x = 0; x < i; x++) {
byte numberToCheck = ((byte) (numbers[(x - (x % 4)) / 4] >>> ((x%4) * 8)));
if (numberToCheck == curNum) {
print = false;
}
}
if (print) {
System.out.println("No Match: " + curNum);
}
int index = ((i - (i % 4)) / 4);
numbers[index] = numbers[index] | (curNum << (((i % 4)) * 8));
}
I use rand to get my ints but you could easily change this to a scanner.

Longest sequence of numbers

I was recently asked this question in an interview. I could give an O(n log n) solution, but couldn't find the logic for an O(n) one. Can someone help me with an O(n) solution?
In an array, find the length of the longest sequence of consecutive numbers.
Example :
Input : 2 4 6 7 3 1
Output: 4 (because 1,2,3,4 is a sequence even though they are not in consecutive positions)
The solution should also be realistic in terms of space consumed, i.e. it should be realistic even with an array of 1 billion numbers.
For numbers that are not in consecutive positions, you need a means of sorting them in O(n). In this case you can use a BitSet.
import java.util.BitSet;
import java.util.stream.IntStream;

int[] ints = {2, 4, 6, 7, 3, 1};
BitSet bs = new BitSet();
IntStream.of(ints).forEach(bs::set); // BitSet indices must be non-negative
// Walk the runs of set bits and remember the longest one.
int max = 0;
int set = bs.nextSetBit(0);
while (set >= 0) {
    int clear = bs.nextClearBit(set); // first gap after this run
    max = Math.max(max, clear - set);
    set = bs.nextSetBit(clear); // -1 when no run is left
}
System.out.println(max); // prints 4
Traverse the array once and build a hash map whose key is a number from the input array and whose value is a boolean indicating whether the element has been processed or not (initially all are false). Traverse once more and do the following: when you check number a, put the value true for that element in the hash map and immediately check the hash map for the existence of the elements a-1 and a+1. If found, mark their values in the hash map as true and proceed to check their neighbors, incrementing the length of the current contiguous subsequence. Stop when there are no neighbors, and update the longest length. Move forward in the array and continue checking unprocessed elements. It is not obvious at first glance that this solution is O(n), but there are only two array traversals and the hash map ensures that every element of the input is processed only once.
Main lesson: if you have to reduce time complexity, it is often necessary to use additional space.
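The hash map approach from this answer might be sketched as follows. The class and method names (LongestRun, longestConsecutive) are invented for illustration, and the O(n) bound assumes hash map operations take constant expected time.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: LongestRun and longestConsecutive are hypothetical names.
final class LongestRun {
    static int longestConsecutive(int[] values) {
        // Key: a number from the input. Value: whether it has been processed yet.
        Map<Integer, Boolean> processed = new HashMap<>();
        for (int v : values) {
            processed.put(v, false);
        }

        int longest = 0;
        for (int v : values) {
            if (processed.get(v)) {
                continue; // already counted as part of an earlier run
            }
            processed.put(v, true);
            // Extend the run downwards through v-1, v-2, ...
            int low = v;
            while (low != Integer.MIN_VALUE && processed.containsKey(low - 1)) {
                low--;
                processed.put(low, true);
            }
            // Extend the run upwards through v+1, v+2, ...
            int high = v;
            while (high != Integer.MAX_VALUE && processed.containsKey(high + 1)) {
                high++;
                processed.put(high, true);
            }
            longest = Math.max(longest, high - low + 1);
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestConsecutive(new int[] {2, 4, 6, 7, 3, 1})); // prints 4
    }
}
```

Boxed Integer keys cost considerably more memory than the raw ints, so for an input of a billion numbers a primitive-keyed hash set would likely be needed in practice.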

Reverse Engineer Sorting Algorithm

I have been given 3 algorithms to reverse engineer and explain how they work. So far I have worked out that I have been given a quick sort algorithm and a bubble sort algorithm; however, I'm not sure what this algorithm is. I understand how the quick sort and bubble sort work, but I just can't get my head around this one. I'm unsure what the variables are and was hoping someone out there would be able to tell me what's going on here:
public static ArrayList<Integer> SortB(ArrayList<Integer> a)
{
ArrayList<Integer> array = CopyArray(a);
Integer[] zero = new Integer[a.size()];
Integer[] one = new Integer[a.size()];
int i,b;
Integer x,p;
//Change from 8 to 32 for whole integers - will run 4 times slower
for(b=0;b<8;++b)
{
int zc = 0;
int oc = 0;
for(i=0;i<array.size();++i)
{
x = array.get(i);
p = 1 << b;
if ((x & p) == 0)
{
zero[zc++] = array.get(i);
}
else
{
one[oc++] = array.get(i);
}
}
for(i=0;i<oc;++i) array.set(i,one[i]);
for(i=0;i<zc;++i) array.set(i+oc,zero[i]);
}
return(array);
}
This is a radix sort, limited to the least significant eight bits. It does not complete the sort for values that need more than eight bits unless you change the loop to run 32 times instead of 8.
Each iteration processes a single bit b. It prepares a mask called p by shifting 1 left by b positions. This produces a power of two - 1, 2, 4, 8, ..., or 1, 10, 100, 1000, 10000, ... in binary.
For each bit, the elements of the working array are separated into two buckets called one and zero, according to whether bit b is set to 1 or to 0. Once the separation is over, the elements are copied back into the array, ones first, and the algorithm proceeds to the next iteration.
This implementation uses two times more storage than the size of the original array, and goes through the array a total of 16 times (64 times in the full version - once for reading and once for writing of data for each bit). The asymptotic complexity of the algorithm is linear.
Looks like a bit-by-bit radix sort to me, but it seems to be sorting backwards.
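The "backwards" (descending) order comes from copying the one bucket back before the zero bucket. A sketch of the ascending variant, assuming non-negative values that fit in the examined bits, only swaps that order; the names AscendingBitSort and sortAscending are hypothetical, and lists replace the fixed-size bucket arrays for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: AscendingBitSort and sortAscending are hypothetical names.
final class AscendingBitSort {
    static List<Integer> sortAscending(List<Integer> input, int bits) {
        List<Integer> array = new ArrayList<>(input);
        for (int b = 0; b < bits; b++) {
            int p = 1 << b;
            List<Integer> zero = new ArrayList<>();
            List<Integer> one = new ArrayList<>();
            for (int x : array) {
                if ((x & p) == 0) {
                    zero.add(x);
                } else {
                    one.add(x);
                }
            }
            // Zeroes first, then ones: the reverse of the copy-back order in SortB.
            zero.addAll(one);
            array = zero;
        }
        return array;
    }

    public static void main(String[] args) {
        System.out.println(sortAscending(Arrays.asList(9, 3, 200, 0, 3), 8)); // prints [0, 3, 3, 9, 200]
    }
}
```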

Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:
Find unique numbers from sorted array in less than O(n) time.
Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10
I gave the solution but that was of O(n).
Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.
Divide and conquer:
look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the only element in the sequence is the first (no matter how long the sequence is).
If they are different, divide the sequence and repeat for each subsequence.
For m distinct values this takes roughly O(m log(n)) steps, which is far below O(n) when m is small, and O(n) only in the worst case (when each element is different).
Java code:
public static List<Integer> findUniqueNumbers(int[] data) {
List<Integer> result = new LinkedList<Integer>();
findUniqueNumbers(data, 0, data.length - 1, result, false);
return result;
}
private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {
int a = data[i1];
int b = data[i2];
// homogenous sequence a...a
if (a == b) {
if (!skipFirst) {
result.add(a);
}
}
else {
//divide & conquer
int i3 = (i1 + i2) / 2;
findUniqueNumbers(data, i1, i3, result, skipFirst);
findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
}
}
I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).
If your sorted array of size n has m distinct elements, you can do O(mlogn).
Note that this is going to be efficient when m << n (e.g. m=2 and n=100)
Algorithm:
Initialization: Current element y = first element x[0]
Step 1: Do a binary search for the last occurrence of y in x (can be done in O(log(n)) time). Let its index be i
Step 2: If i is the last index, stop. Otherwise y = x[i+1] and go to step 1
Edit: In cases where m = O(n) this algorithm is going to work badly. To alleviate it you can run it in parallel with a regular O(n) algorithm. The meta algorithm consists of my algorithm and the O(n) algorithm running in parallel. The meta algorithm stops when either of these two algorithms completes.
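The binary-search steps in this answer can be sketched roughly as below. The names DistinctBySearch, lastIndexOf and distinct are hypothetical, and the sketch covers only the O(m log n) part, not the parallel meta algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical; the input must be sorted ascending.
final class DistinctBySearch {
    // Index of the last occurrence of `value` in sorted[from..], assuming sorted[from] == value.
    static int lastIndexOf(int[] sorted, int from, int value) {
        int lo = from;
        int hi = sorted.length - 1;
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // round up so the range always shrinks
            if (sorted[mid] == value) {
                lo = mid;      // last occurrence is at mid or to its right
            } else {
                hi = mid - 1;  // sorted[mid] is larger, so look to the left
            }
        }
        return lo;
    }

    static List<Integer> distinct(int[] sorted) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            result.add(sorted[i]);                      // y = x[i]
            i = lastIndexOf(sorted, i, sorted[i]) + 1;  // jump past the run of y
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distinct(new int[] {1, 1, 1, 5, 5, 5, 9, 10, 10})); // prints [1, 5, 9, 10]
    }
}
```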
Since the data consists of integers, there are a finite number of unique values that can occur between any two values. So, start by looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there will be some repeating values. Put a[0] and a[length-1] into some constant-access-time container like a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. You know that the array is sorted. So, if the two values are different, you can look at the middle element now. If the middle element is already in the set of values, you know that you can skip the whole left part of the array and only analyze the right part recursively. Otherwise, analyze both left and right part recursively.
Depending on the data in the array you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same since you will know it after only checking the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N) because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time because they have to be consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you will have to output each number, and there are N of them.
Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes this algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
import java.util.*;
/**
* remove duplicate in a sorted array in average O(log(n)), worst O(n)
* @author XXX
*/
public class UniqueValue {
public static void main(String[] args) {
int[] test = {-1, -1, -1, -1, 0, 0, 0, 0,2,3,4,5,5,6,7,8};
UniqueValue u = new UniqueValue();
System.out.println(u.getUniqueValues(test, 0, test.length - 1));
}
// i must be start index, j must be end index
public List<Integer> getUniqueValues(int[] array, int i, int j) {
if (array == null || array.length == 0) {
return new ArrayList<Integer>();
}
List<Integer> result = new ArrayList<>();
if (array[i] == array[j]) {
result.add(array[i]);
} else {
int mid = (i + j) / 2;
result.addAll(getUniqueValues(array, i, mid));
// avoid duplicate divide
while (mid < j && array[mid] == array[++mid]);
if (array[(i + j) / 2] != array[mid]) {
result.addAll(getUniqueValues(array, mid, j));
}
}
return result;
}
}

Get confused with nested loops

I know the rationale behind nested loops, but this one just confuses me about what it is trying to do:
public static LinkedList LinkedSort(LinkedList list)
{
for(int k = 1; k < list.size(); k++)
for(int i = 0; i < list.size() - k; i++)
{
if(((Birth)list.get(i)).compareTo(((Birth)list.get(i + 1)))>0)
{
Birth birth = (Birth)list.get(i);
list.set( i, (Birth)list.get( i + 1));
list.set(i + 1, birth);
}
}
return list;
}
Why are i and i + 1 swapped if the element at i is bigger than the element at i + 1? I think that in this code i + 1 equals k, but then, as I see it, it is impossible for i to be greater than k, am I right? And what will the result of running it look like? I'm quite confused about what this code is trying to tell me. I hope you can help clear up my doubts, thank you.
This method implements a bubble sort. It reorders the elements in the list in ascending order. The exact data to be ordered by is not revealed in this code; the actual comparison is done in Birth#compareTo.
Let's have a look at the inner loop first. It does the actual sorting. The inner loop iterates over the list, and compares the element at position 0 to the element at position 1, then the element at position 1 to the element at position 2, etc. Each time, if the element at the lower position is larger than the one at the higher position, they are swapped.
After the first full run of the inner loop the largest value in the list sits at the end of the list, since it was always larger than the value it was compared to, and was always swapped. (Try it with some numbers on paper to see what happens.)
The inner loop now has to run again. It can ignore the last element, since we already know it contains the largest value. After the second run the second largest value is sitting in the second-to-last position.
This has to be repeated until the whole list is sorted.
This is what the outer loop is doing. It runs the inner loop for the exact number of times to make sure the list is sorted. It also gives the inner loop the last position it has to compare to ignore the part already sorted. This is just an optimization, the inner loop could just ignore k like this:
for(int i = 0; i < list.size() - 1; i++)
This would give the same result, but would take longer since the inner loop would needlessly compare the already sorted values at the end of the list every time.
Example: you have a list of numbers which you want to sort ascendingly:
4 2 3 1
The first iteration performs these swap operations: swap(4, 2), swap(4, 3), swap(4, 1). The intermediate result after the 1st iteration is 2 3 1 4. In other words, we were able to determine which number is the greatest one and we don't need to iterate over the last item of the intermediate result.
In the second iteration, we determine the 2nd greatest number with one operation: swap(3, 1). The intermediate result is then 2 1 3 4.
At the end of the 3rd iteration, after swap(2, 1), we have a sorted list: 1 2 3 4.
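To reproduce the trace for 4 2 3 1, here is a small sketch that runs the same two loops on an int array and prints the array after each pass of the outer loop. The class name BubbleTrace is made up, and plain ints stand in for the Birth objects of the question.

```java
import java.util.Arrays;

// Illustrative sketch only: BubbleTrace is a hypothetical name; ints stand in for Birth objects.
final class BubbleTrace {
    static int[] bubbleSort(int[] input) {
        int[] a = input.clone(); // leave the caller's array untouched
        for (int k = 1; k < a.length; k++) {
            // The last k-1 positions already hold their final values, so stop before them.
            for (int i = 0; i < a.length - k; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
            System.out.println("after pass " + k + ": " + Arrays.toString(a));
        }
        return a;
    }

    public static void main(String[] args) {
        bubbleSort(new int[] {4, 2, 3, 1});
        // after pass 1: [2, 3, 1, 4]
        // after pass 2: [2, 1, 3, 4]
        // after pass 3: [1, 2, 3, 4]
    }
}
```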
