I was going through data structures in java under the topic Skip list and I came across the following:
In a skip list of n nodes, for each k and i such that 1 ≤ k ≤ ⌊lg n⌋ and 1 ≤ i ≤ ⌊n/2^(k−1)⌋ − 1, the node in position 2^(k−1)·i points to the node in position 2^(k−1)·(i+1).
This means that every second node points to the node two positions ahead, every
fourth node points to the node four positions ahead, and so on, as shown in Figure
3.17a. This is accomplished by having different numbers of reference fields in nodes
on the list: Half of the nodes have just one reference field, one-fourth of the nodes
have two reference fields, one-eighth of the nodes have three reference fields, and so
on. The number of reference fields indicates the level of each node, and the number of
levels is maxLevel = ⎣lg n⎦ + 1.
And the figure is :
A skip list with (a) evenly and (b) unevenly spaced nodes of different levels;
(c) the skip list with reference nodes clearly shown.
I don't understand the mathematical part, what exactly a skip list is, or what evenly spaced nodes are.
OK, let me try to explain this.
A skip list is a data structure that speeds up searches in an ordered list of elements.
A good analogy is the subway network of a big city. Imagine there are 90 stations to cover and there are different lines (Green, Yellow, and Blue).
The Green line only connects stations 0, 30, 60, and 90.
The Yellow line connects stations 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90.
The Blue line connects all the stations from 0 through 90.
If you board a train at station 0 and want to get off at station 75, what is the best strategy?
Common sense suggests boarding a Green line train at station 0 and getting off at station 60.
Then board a Yellow line train at station 60 and get off at station 70.
Then board a Blue line train at station 70 and get off at station 75.
Any other route would be more time-consuming.
Now replace the stations with nodes and the lines with three individual lists (together, this set of lists is called a skip list).
And imagine you want to search for an element stored at the node containing the value 75.
I hope this explains what Skip Lists are and how they are efficient.
With the traditional approach to searching, you would visit each node one by one and reach 75 in 75 hops.
With binary search (on a sorted array) you could do it in O(log N) steps.
With the skip list you can do the same in 2 + 1 + 5 = 8 hops in our particular case. You can do the math; it is simple enough :)
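To make the hop count concrete, here is a small sketch of the subway search (class and method names are mine), with each line stored as a sorted array of the station numbers it serves:

```java
import java.util.Arrays;
import java.util.List;

public class SubwaySearch {
    // Lines ordered from the sparsest (Green) to the densest (Blue).
    static int countHops(List<int[]> lines, int target) {
        int position = 0;   // we always start at station 0
        int hops = 0;
        for (int[] line : lines) {
            // Ride this line as far as possible without passing the target,
            // then drop down to the next (denser) line.
            for (int station : line) {
                if (station > position && station <= target) {
                    position = station;
                    hops++;
                }
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        int[] green = {0, 30, 60, 90};
        int[] yellow = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        int[] blue = new int[91];
        for (int i = 0; i <= 90; i++) blue[i] = i;
        // Green: 0 -> 30 -> 60 (2 hops), Yellow: 60 -> 70 (1 hop),
        // Blue: 70 -> 75 (5 hops), so 8 hops in total.
        System.out.println(countHops(Arrays.asList(green, yellow, blue), 75));
    }
}
```

A real skip list stores the levels as extra forward pointers inside each node rather than as separate arrays, but the search walk is the same: go as far right as possible on the current level, then drop down one level.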
EDIT: Evenly spaced nodes & Unevenly spaced nodes
As you can see in my analogy, there is an equal number of stations between consecutive stops on each line.
These are evenly spaced nodes, which is the ideal situation.
To understand it better we need to understand the creation of Skip Lists.
In the early stages of its construction there is only one list (the blue line), and each new node is first added to that list at the appropriate location. When the number of nodes in the blue line grows, there comes a need to create another list (the yellow line) and promote one of the nodes to list 2. (PS: the first and the last elements of list 1 are always promoted to a newly added list in the skip-list set.) Hence, the moment a new list is added it will have three nodes.
Promotion strategy: how do we decide which node to promote from the bottom-most list (the blue line) to the upper lists (the yellow and green lines)?
The best way to decide is randomly :) So let's say that upon addition of a new node, we flip a coin to see whether it should be promoted to the second list. If yes, we add it to the second list and then flip the coin again to check whether it should be added to the third list as well.
So you see, if you use this random mechanism, situations may arise where the nodes are unevenly spaced. :)
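A minimal sketch of that coin-flipping promotion, assuming a fixed cap of 4 levels (real implementations derive the cap from n, e.g. ⌊lg n⌋ + 1):

```java
import java.util.Random;

public class LevelPicker {
    static final Random RNG = new Random();
    static final int MAX_LEVEL = 4; // assumed cap for this sketch

    // Flip a fair coin: keep promoting the new node to the next
    // list up while the coin comes up heads.
    static int randomLevel() {
        int level = 1;
        while (level < MAX_LEVEL && RNG.nextBoolean()) {
            level++;
        }
        return level;
    }
}
```

On average half the nodes end up with level 1, a quarter with level 2, and so on, which matches the reference-field distribution described in the quoted text.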
Hope this helps.
I want to use a singly-linked list to store an image consisting of RGB pixels. Each node in the list should contain an RGB value and the number of its consecutive occurrences. For example, I have an image of 4 pixels whose (R,G,B) values are (8,2,5), (8,2,6), (8,7,6), and (8,7,9) respectively. In this situation, if using a singly-linked list to store them more compactly, the nodes should store the following information:
Red needs one node: value 8, repetition 4 (because there are four consecutive 8s).
Green needs two nodes: 1st node (value 2, repetition 2), 2nd node (value 7, repetition 2).
Blue needs three nodes: 1st node (value 5, repetition 1), 2nd node (value 6, repetition 2), 3rd node (value 9, repetition 1).
I wonder which of the following is better:
1. Use one singly-linked list with 3 heads that point to the R, G, and B nodes respectively. That means R, G, and B are stored in 3 separate sequences of nodes.
2. Use one singly-linked list with only 1 head, where the content of each node is an array of 3 elements (R, G, B), so the triples are stored in one single sequence of nodes.
I think the first solution is easier to implement, but it seems to require more space (it creates more nodes). The second may have a potential problem, since the number of nodes needed for R, G, and B is not the same.
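For concreteness, the two node layouts being compared could be sketched like this (class names are made up for illustration):

```java
// Option 1: one run-length node per channel value; three separate chains
// (one for R, one for G, one for B), each with its own head.
class RunNode {
    int value;      // the channel value, e.g. 8 for red
    int count;      // how many consecutive pixels share that value
    RunNode next;
    RunNode(int value, int count) { this.value = value; this.count = count; }
}

// Option 2: a single chain whose nodes hold a whole (R,G,B) triple plus
// a run length; a run only continues while the entire triple repeats.
class PixelRunNode {
    int[] rgb;      // {R, G, B}
    int count;
    PixelRunNode next;
    PixelRunNode(int[] rgb, int count) { this.rgb = rgb; this.count = count; }
}
```

With the 4-pixel example above, option 1 needs 1 + 2 + 3 = 6 small nodes, while option 2 needs 4 larger nodes (no full triple repeats), which is the space trade-off the question is asking about.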
If your goal is to compress an image, you should really consider standard compression techniques. I have a feeling that compressing and decompressing an image in this manner will not be very efficient.
[Interview Question] I got this question in a recent online interview. I had no clue how to solve it. Can anyone please help me solve it so that I can learn? I am working in Java.
Tom is very good at problem-solving, so to test Tom's skills, Jerry asks him a graph problem. Jerry gives Tom an array A of N integers.
A graph is a simple graph iff it has no self-loops or multi-edges.
Now Jerry asks Tom whether he can design a simple graph of N vertices or not. The condition is that Tom has to use each and every element of A exactly once for the degrees of vertices of the graph.
Now, Tom wants your help to design his graph. Print "YES" if the graph can be designed, otherwise print "NO" (without quotes).
Input
A single integer T, in the first line, denoting the number of test cases.
For each test case, there are 2 lines.
The first line is a single integer N, denoting the number of elements of array A.
The second line has N-space separated integers, representing elements of A.
Output
For each test case, print "YES" or "NO" (without quotes) whether the graph can be designed or not, in a new line.
Constraints
1<= T <= 100
1<= N <= 100
0<= Element of A <= 5000
Sample Test Cases
Input
1
2
1 1
Output
YES
Explanation
For this test case, a simple graph with 2 vertices can be designed, where each vertex has degree 1.
Input
2
3
1 2 1
3
1 1 1
Output
YES
NO
Explanation
For the first test case, we can design a simple graph of 3 vertices which has the degree sequence [1, 2, 1]. The first vertex has degree 1, the second has degree 2, and the third has degree 1.
For the second test case, we cannot make a simple graph of 3 vertices, which has degree sequence as [1, 1, 1].
One necessary condition is that the sum of the elements in A is even. That is because each edge contributes 2 to the total degree (the handshaking lemma).
Next, try to construct the graph, or at least 'allocate' pairs of nodes:
Sort the elements of A in descending order.
Let the largest (first) element be a.
Check whether the elements at positions 2 to a+1 are all larger than 0.
If any of them is 0, then it is not possible to construct the graph.
Decrease those a elements by 1 and set the first element to 0.
Repeat the process until all elements are 0.
Note that the sorting in subsequent steps can be done in O(n) with a merge step, since the list consists of three sorted parts:
the first element (now 0), which can go to the end,
the sorted run of a elements,
the rest, which is also sorted.
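The procedure above is essentially the Havel–Hakimi algorithm. A straightforward Java sketch (re-sorting with Arrays.sort each round rather than the O(n) merge step described above) could look like this:

```java
import java.util.Arrays;

public class DegreeSequence {
    // Havel–Hakimi: repeatedly take the largest remaining degree a and
    // subtract 1 from each of the next a largest degrees.
    static boolean isGraphical(int[] degrees) {
        int n = degrees.length;
        int[] d = Arrays.copyOf(degrees, n);
        long sum = 0;
        for (int x : d) sum += x;
        if (sum % 2 != 0) return false;       // handshaking lemma
        while (true) {
            Arrays.sort(d);                   // ascending; largest at the end
            int a = d[n - 1];
            if (a == 0) return true;          // every degree satisfied
            if (a > n - 1) return false;      // not enough other vertices
            d[n - 1] = 0;
            for (int i = n - 2; i >= n - 1 - a; i--) {
                if (d[i] == 0) return false;  // would force a self-loop/multi-edge
                d[i]--;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isGraphical(new int[]{1, 1}) ? "YES" : "NO");    // YES
        System.out.println(isGraphical(new int[]{1, 2, 1}) ? "YES" : "NO"); // YES
        System.out.println(isGraphical(new int[]{1, 1, 1}) ? "YES" : "NO"); // NO
    }
}
```

Within the stated constraints (N ≤ 100) this is fast enough even without the O(n) merge optimization.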
I was solving a problem which states the following:
There are n buckets in a row and a gardener waters them. Each day he waters the buckets between positions i and j (inclusive). He does this for t days, with different i and j each day.
Output the volume of water in the buckets, assuming every bucket starts at zero volume and each watering increases a bucket's volume by 1.
Input: the first line contains t and n separated by a space.
Each of the next t lines contains i and j separated by a space.
Output: a single line with the volumes of the n buckets separated by spaces.
Example:
Input:
2 2
1 1
1 2
Output:
2 1
Constraints:
0 <= t <= 10^4; 1 <= n <= 10^5
I tried this problem, but I used an O(n·t) algorithm: at each step I increment every bucket from i to j. This exceeds the time limit. Is there a more efficient algorithm for this problem? A small hint would suffice.
P.S.: I have used C++ and Java as tags because the program can be written in either language.
Instead of remembering the amount of water in each bucket, remember the difference between each bucket and the previous one.
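This hint is the difference-array trick: each watering touches only two cells of the difference array, and a single running sum at the end recovers the volumes. A sketch in Java, assuming 1-based bucket positions as in the problem statement:

```java
import java.util.Scanner;

public class Buckets {
    // diff[k] holds bucket[k] - bucket[k-1]; one watering of [i, j]
    // raises the volume entering i and lowers it again after j.
    static int[] volumes(int n, int[][] waterings) {
        int[] diff = new int[n + 2];          // buckets are 1-based
        for (int[] w : waterings) {
            diff[w[0]] += 1;
            diff[w[1] + 1] -= 1;
        }
        int[] result = new int[n];
        int volume = 0;
        for (int k = 1; k <= n; k++) {        // running sum: O(n + t) overall
            volume += diff[k];
            result[k - 1] = volume;
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int t = in.nextInt(), n = in.nextInt();
        int[][] waterings = new int[t][2];
        for (int q = 0; q < t; q++) {
            waterings[q][0] = in.nextInt();
            waterings[q][1] = in.nextInt();
        }
        StringBuilder out = new StringBuilder();
        for (int v : volumes(n, waterings)) out.append(v).append(' ');
        System.out.println(out.toString().trim());
    }
}
```

On the sample input (waterings [1,1] and [1,2] over 2 buckets) this produces "2 1", matching the expected output.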
Have two lists of the intervals, one sorted by upper bound and one sorted by lower bound.
Then iterate over n, starting with a volume v of 0.
On each iteration over n:
check if the next interval (by lower bound) starts at n;
if so, increase v by one and check the next interval;
do the same for the upper bounds, but decrease the volume once an interval has ended;
print v;
repeat with the next n.
I think the key observation here is that you need to figure out a way to represent your (possibly) 10^5 buckets without actually allocating space for each and every one of them and tracking them separately. You need to come up with a sparse representation to track your buckets and the water inside.
The fact that your input comes in ranges gives you a good hint: you should probably make use of ranges in your sparse representation. You can do this by just tracking the buckets on the ends of each range.
I suggest you do this with a linked list. Each list node will contain 2 pieces of information:
a bucket number
the amount of water in that bucket
You assume that all buckets between the current bucket and the next bucket have the same volume of water.
Here's an example:
Input:
5 30
1 5
4 20
7 13
25 30
19 27
Here's what would happen on each step of the algorithm, with step 1 being the initial state, and each successive step being what you do after parsing a line.
1:0→NULL (all buckets are 0)
1:1→6:0→NULL (1-5 have 1, rest are 0)
1:1→4:2→6:1→21:0→NULL (1-3 have 1, 4-5 have 2, 6-20 have 1, rest have 0)
1:1→4:2→6:1→7:2→14:1→21:0→NULL
1:1→4:2→6:1→7:2→14:1→21:0→25:1→NULL
1:1→4:2→6:1→7:2→14:1→19:2→21:1→25:2→28:1→NULL
You should be able to infer from the above example that the complexity of this method is actually O(t^2) instead of O(n·t), so this should be much faster. As I said in my comment above, the bottleneck this way should actually be the parsing and output rather than the actual computation.
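The same sparse-run idea can be sketched with a TreeMap keyed by the first bucket of each run, standing in for the hand-rolled linked list (class and method names are mine):

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseBuckets {
    // Key = first bucket of a run; value = volume of every bucket in that run.
    private final TreeMap<Integer, Integer> runs = new TreeMap<>();

    SparseBuckets() {
        runs.put(1, 0);                       // initially all buckets hold 0
    }

    // Add 1 to every bucket in [i, j], splitting runs at the boundaries.
    void water(int i, int j) {
        // Ensure runs start exactly at i and at j + 1, copying the value
        // of the run each boundary currently falls inside.
        runs.putIfAbsent(i, runs.floorEntry(i).getValue());
        runs.putIfAbsent(j + 1, runs.floorEntry(j + 1).getValue());
        // Bump every run that starts inside [i, j].
        for (Map.Entry<Integer, Integer> e : runs.subMap(i, true, j, true).entrySet()) {
            e.setValue(e.getValue() + 1);
        }
    }

    int volumeAt(int bucket) {
        return runs.floorEntry(bucket).getValue();
    }
}
```

Replaying the five waterings from the example yields exactly the run boundaries shown in step 6 above (1:1, 4:2, 6:1, 7:2, 14:1, 19:2, 21:1, 25:2, 28:1).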
Here's an approach with space and time complexity O(n + t). I am using Java since I am used to it.
1) Create an array of n counters (a HashMap would also do).
2) Each time a watering of [i, j] is made, add 1 to the counter at position i and subtract 1 from the counter at position j+1, so every update is O(1).
3) After the file parsing is complete, take a running sum over the counters to produce the result.
This is a more descriptive version of my previous question. The problem I am trying to solve relates to block-matching or image-within-image recognition.
I see an image, extract the [x,y] of every black pixel and create a set for that image, such as
{[8,0], [9,0], [11,0]}
The set is then translated so that the first pixel in the set is at [0,0], but the relative positions of the pixels are preserved. For example, I see {[8,0], [9,0]} and change the set to {[0,0], [1,0]}. The point of the extraction is that if I now see {[4,0], [5,0]}, I can recognize that basic relationship as two adjacent pixels, my {[0,0], [1,0]}, since it is the same image, only in a different location.
I have a list of these pixel sets, called "seen images". Each 'seen image' has a unique identifier that allows it to be used as a nested component of other sets. For example:
{[0,0], [1,0]} has the identifier 'Z'
So if I see:
{[0,0], [1, 0], [5,6]}
I can identify and store it as:
{[Z], [5, 6]}
The problem with this is that I have to generate every combination of [x,y]'s within the pixel set to check for a pattern match, and to build the best representation. Using the previous example, I have to check:
{[0,0], [1,0]},
{[0,0], [5,6]},
{[1,0], [5,6]} which is {[0,0], [4,6]}
{[0,0], [1,0], [5,6]}
And then if a match occurs, that subset gets replaced with its ID, merged with the remainder of the original set, and the new combination needs to be checked against the 'seen images':
{[Z], [5, 6]}
The point of all that is to match as many of the [x,y]'s as possible, using the fewest pre-existing pieces as components, to represent a newly seen image concisely. The greedy solution of taking the component that matches the largest subset is not the right one. Complexity arises in generating all of the combinations that I need to check, and then the combinations that spawn from finding a match, meaning that if some match-and-swap produces {[Z], [1,0], [2,0]}, then I need to check (and, if matched, repeat the process):
{[Z], [1,0]}
{[Z], [2,0]}
{[1,0], [2,0]} which is {[0,0], [1,0]}
{[Z], [1,0], [2,0]}
Currently I generate the pixel combinations this way (here I use numbers to represent pixels, so 1 == some [x,y]). Ex. (1, 2, 3, 4): make 3 lists:
1.) 2.) 3.)
12 23 34
13 24
14
Then for each number, for each list starting at that number index + 1, concatenate the number and each item and store on the appropriate list, ex. (1+23) = 123, (1+24) = 124
1.) 2.) 3.)
12 23 34
13 24
14
---- ---- ----
123 234
124
134
So those are all the combinations I need to check against my 'seen images'. This is a bad way to do the whole process. I have considered different variations/optimizations, including: once the second half of a list has been generated (below the ----), check every item on the list for matches, then destroy the list to save space and continue generating combinations. Another option would be to generate a single combination, check it for a match, and somehow index the combinations so that you know which one to generate next.
Does anyone recognize this process in general? Can you help me optimize what I am doing for a set of ~1 million items? I also have not yet come up with a non-recursive or efficient way to handle the fact that each match generates additional combinations to check.
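For what it's worth, the list-by-list generation above can be replaced by iterating over bitmasks, which also gives the "index of the next combination" for free; a sketch (only practical for small sets, since the count is exponential):

```java
import java.util.ArrayList;
import java.util.List;

public class Combos {
    // Enumerate every subset of size >= 2 of the given pixel indices.
    // Bit b of 'mask' says whether items.get(b) is included.
    static List<List<Integer>> subsets(List<Integer> items) {
        List<List<Integer>> result = new ArrayList<>();
        int n = items.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) < 2) continue;   // skip empty/singletons
            List<Integer> subset = new ArrayList<>();
            for (int bit = 0; bit < n; bit++) {
                if ((mask & (1 << bit)) != 0) subset.add(items.get(bit));
            }
            result.add(subset);
        }
        return result;
    }
}
```

For (1, 2, 3, 4) this yields the same 11 combinations as the lists above (12, 13, 14, 23, 24, 34, 123, 124, 134, 234, 1234), and instead of collecting them all you can process each mask and discard it, which addresses the space concern.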
I am having trouble understanding how to read an adjacency list into a graph. I understand how adjacency tables work and how they map to each other, but what I don't understand is what data type to store them in. My assignment is to take an input file that gives the number of vertices of G=(V,E) and the edges between the numbered vertices in the graph.
So for example:
3
010
101
110
so:
0 maps to 1
1 maps to 0
2 maps to 0
2 maps to 1
From there I have to implement a breadth-first search and a depth-first search on them. Would a hash table be my best bet?
The difference between BFS and DFS is the data structure in which you store the nodes to visit: one uses a queue, the other a stack (that is your answer). If you use a Java List, you can take elements from the beginning or from the end, but you can also use a "real" stack and queue.
So in your case, create a List and store the origin of your search in it.
Then loop while you still have elements in your list.
Pick an element from the list (the first or the last) and check whether it is your target; if it is not, store all of its neighbors in the list and keep going.
You should add something to avoid adding the same element twice: keep a list of visited nodes.
But I have doubts about whether you actually wanted to know where to store the adjacency list. An array of lists would do: every vertex vertex[i] has a List of all the vertices it is connected to.
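Putting that together, here is a sketch (names are mine) that parses the 0/1 matrix from the example into an array of lists and runs both searches from one skeleton, using a deque as a queue for BFS and as a stack for DFS:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class GraphSearch {
    // adj.get(i) lists the neighbors of vertex i (the "array of lists").
    static List<List<Integer>> fromMatrix(String[] rows) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            adj.add(new ArrayList<>());
            for (int j = 0; j < rows[i].length(); j++) {
                if (rows[i].charAt(j) == '1') adj.get(i).add(j);
            }
        }
        return adj;
    }

    // Same skeleton for both searches; only the removal end differs.
    static List<Integer> search(List<List<Integer>> adj, int start, boolean bfs) {
        boolean[] visited = new boolean[adj.size()];  // avoid revisiting nodes
        Deque<Integer> frontier = new ArrayDeque<>();
        List<Integer> order = new ArrayList<>();
        frontier.add(start);
        visited[start] = true;
        while (!frontier.isEmpty()) {
            int v = bfs ? frontier.pollFirst() : frontier.pollLast();
            order.add(v);
            for (int w : adj.get(v)) {
                if (!visited[w]) {
                    visited[w] = true;
                    frontier.add(w);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = fromMatrix(new String[]{"010", "101", "110"});
        System.out.println(search(adj, 0, true));   // BFS order from vertex 0
        System.out.println(search(adj, 0, false));  // DFS order from vertex 0
    }
}
```

Marking nodes visited on insertion (rather than on removal) keeps the frontier small; note that this DFS variant can visit nodes in a slightly different order than a recursive DFS, but it is still a valid depth-first traversal.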