I have been learning suffix array construction, and I understand that we first sort all suffixes according to their first character, then according to their first 2 characters, then their first 4 characters, and so on, while the number of characters considered is smaller than 2n.
But my question is: why don't we choose the first 3 characters, then 9, and so on? Why is the prefix length doubled rather than tripled, given that the suffixes are all parts of the same string and not different random strings?
I haven't analyzed the suffix array construction algorithm thoroughly, but I would still like to share my thoughts.
In my humble opinion, your question is similar to the following ones:
Why do computers use binary encoding of information instead of ternary?
Why does binary search bisect the range instead of trisecting it?
Why are there two sexes rather than three?
The reason is that the number 2 is special - it is the smallest plural number. The difference between 1 and 2 is qualitative, whereas the difference between 2 and 3 (as well as any other positive integer) is quantitative and therefore not as drastic.
As a result, binary formulation of many algorithms and data structures turns out to be the simplest one, though some of them may be generalized, with various degrees of added complexity, for an arbitrary base.
The answer is given in the post you linked. And as @Leon answered, the algorithm works because it uses a dichotomous approach to solve the sorting problem. If you read the answer carefully, the main idea is to divide the word into small 2-character fragments, so that 4 characters can easily be sorted based on the arrangement of the 2 pairs of characters, 6 characters as 4+2, 2+4, or 2+2+2, and so on. Thus having a 3-letter fragment in the table makes no sense, since a word of 3 characters can be seen as 2 characters plus the alphabet position of the last character.
I think you are considering only the speed of 2^x versus 3^x, where you would obviously prefer the latter.
But you have to consider the effort you need for each step.
Since the 3^x approach needs about 1.58 times fewer steps than the 2^x approach, you would need to be able to compute a single step of the 3^x version in less than 1.58 times the time of a single step of the 2^x version to perform better.
Generally the problems will get much more complex when you have to handle three elements in each step instead of two.
Also, if you could expand it to 3^x, you could also do it for a bigger n^x, and with a big enough n your algorithm would suddenly be not exponential but effectively linear.
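To make the doubling concrete, here is a minimal Java sketch of the prefix-doubling construction the question describes (names and structure are my own; it uses plain comparison sorting, so it runs in O(n log^2 n) rather than the fastest known variants):

    import java.util.Arrays;
    import java.util.Comparator;

    // Prefix doubling: sort suffixes by their first 1, 2, 4, ... characters.
    // A suffix's order by its first 2k characters is derived from two k-ranks.
    public class SuffixArraySketch {
        static Integer[] build(String s) {
            int n = s.length();
            int[] rank = new int[n];
            for (int i = 0; i < n; i++) rank[i] = s.charAt(i); // round 0: 1st char
            Integer[] sa = new Integer[n];
            for (int i = 0; i < n; i++) sa[i] = i;
            for (int k = 1; k < n; k *= 2) {
                final int half = k;
                final int[] r = rank;
                // Suffix i, truncated to 2k chars, behaves like the pair (r[i], r[i+k]).
                Comparator<Integer> byPair = (a, b) -> {
                    if (r[a] != r[b]) return Integer.compare(r[a], r[b]);
                    int ra = a + half < n ? r[a + half] : -1; // shorter suffix sorts first
                    int rb = b + half < n ? r[b + half] : -1;
                    return Integer.compare(ra, rb);
                };
                Arrays.sort(sa, byPair);
                int[] next = new int[n];
                for (int i = 1; i < n; i++)
                    next[sa[i]] = next[sa[i - 1]] + (byPair.compare(sa[i - 1], sa[i]) < 0 ? 1 : 0);
                rank = next;
            }
            return sa;
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(build("banana"))); // [5, 3, 1, 0, 4, 2]
        }
    }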
My teacher and I were discussing whether or not a recursive permutation function could be written without the use of substrings and/or arrays in Java.
Is there a way to do this?
The answer is yes, this can be done. I'm assuming that "without the use of substrings and/or arrays" refers to the info being passed to the recursion. You have to have some sort of container for the elements that are to be permuted.
In that case it can be done by pulling some hideous tricks with numerically encoding the indices of the elements as digits of a numeric argument. For instance, suppose there are 3 elements and I use 1 as a sentinel value in the left-most digit (so that 0 can sometimes appear as the leading index): 1 means I haven't started, 10 means the first element has been selected, 102 means the first and third, and 1021 means I'm ready to print the permutation, since I now have a 4-digit argument and there are 3 elements in the set. I can then deconstruct which elements to print using % 10 and / 10 arithmetic to pick them off.
I implemented this in Ruby rather than Java, and I'm not going to share the actual code because it's too horrible to contemplate. However, it works recursively with only the input array of elements and an integer as arguments, no partial solution substrings or arrays.
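Since that code isn't shown, here is a hypothetical Java reconstruction of the trick as described (the class and variable names are mine; it handles at most 10 elements because each index occupies one decimal digit, and a StringBuilder is used only to print):

    // Hypothetical reconstruction: the only recursion state besides the input
    // array is one long. A leading sentinel digit 1 is followed by the indices
    // of the elements chosen so far, one decimal digit per index.
    public class NoArrayPermutations {
        static void permute(char[] elems, long state) {
            long done = 1;
            for (int i = 0; i < elems.length; i++) done *= 10; // 10^n
            if (state >= done) { // n digits follow the sentinel: print and stop
                StringBuilder sb = new StringBuilder();
                for (long s = state; s > 1; s /= 10)
                    sb.append(elems[(int) (s % 10)]); // peel indices, last chosen first
                System.out.println(sb.reverse());
                return;
            }
            for (int idx = 0; idx < elems.length; idx++) {
                boolean used = false; // is idx already a digit of the state?
                for (long s = state; s > 1; s /= 10)
                    if (s % 10 == idx) { used = true; break; }
                if (!used) permute(elems, state * 10 + idx);
            }
        }

        public static void main(String[] args) {
            permute(new char[] {'a', 'b', 'c'}, 1L); // start with just the sentinel
        }
    }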
I'm preparing for an interview. The question I got is: two numbers are represented by a linked list, where each node contains a single digit. The digits are stored in reverse order, such that the 1's digit is at the head of the list. Write a function that adds the two numbers and returns the sum as a linked list.
The suggested answer adds the digits individually and keeps a "carry" number. For example, if the first digits of the two numbers are 5 and 7, the algorithm records 2 as the first digit of the resulting sum and keeps 1 as a carry to add to the tens digit of the result.
However, my solution is to traverse the two linked lists and translate them into two integers. Then I add the numbers and translate the sum into a new linked list. Wouldn't my solution be more straightforward and efficient?
Thanks!
While your solution may be more straightforward, I don't think it's actually more efficient.
I'm assuming the "correct" algorithm is something like this:
1. Pop the first element of both lists.
2. Add them together (with the carry if there is one) and make a new node using the ones digit.
3. Pass the carry (dividing the sum by 10 to get the actual carry) and repeat steps 1 and 2, with each successive node pointed to by the previous one.
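In Java, a minimal sketch of those three steps might look like this (ListNode is an assumed bare-bones node class, not part of any library):

    // Node of a singly linked list holding one decimal digit.
    class ListNode {
        int val;
        ListNode next;
        ListNode(int val) { this.val = val; }
    }

    class Adder {
        static ListNode add(ListNode a, ListNode b) {
            ListNode dummy = new ListNode(0), tail = dummy;
            int carry = 0;
            while (a != null || b != null || carry != 0) {
                int sum = carry;
                if (a != null) { sum += a.val; a = a.next; }
                if (b != null) { sum += b.val; b = b.next; }
                tail.next = new ListNode(sum % 10); // ones digit becomes the node
                tail = tail.next;
                carry = sum / 10;                   // carry into the next digit
            }
            return dummy.next;
        }

        public static void main(String[] args) {
            ListNode a = new ListNode(5); a.next = new ListNode(1); // 15
            ListNode b = new ListNode(7);                           // 7
            for (ListNode n = add(a, b); n != null; n = n.next)
                System.out.print(n.val + " ");                      // prints: 2 2 (i.e. 22)
        }
    }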
The main things I see when comparing your algorithm with that one are:
In terms of memory, you need to create two BigIntegers to store intermediate values (I'm assuming you're using BigInteger or some equivalent object to avoid the constraints of an int or long), on top of the final linked list itself. The original algorithm doesn't use more than a couple of ints for intermediate calculations, so in the long run it actually uses less memory.
You're also suggesting doing all of your arithmetic with BigIntegers rather than with ints. Although it's possible that BigInteger is optimized to the point where it isn't much slower than primitive operations, I highly doubt that calling BigInteger#add is faster than the + operation on ints.
Also, some more food for thought: suppose you didn't have something handy like BigInteger to store arbitrarily large integers. Then your algorithm would need some way to store arbitrarily large integers, plus a way to add them, and you end up with a problem where you either do something like the original algorithm anyway, or you use a completely different representation in a subroutine (yikes).
(Assuming by "integer" you mean int.)
Your solution does not scale beyond numbers that can fit in an int, whereas the original solution is only limited by the amount of available memory.
As far as efficiency is concerned, there is nothing about your solution that would make it more efficient than the original.
Your solution is certainly more straightforward to describe, and an argument might be made in certain situations that the readability of the code your solution would produce is preferable when working with large teams.
However, most of the time their suggested answer is a lot more memory-efficient, and probably more CPU-efficient too.
You're suggesting going through the first linked list and storing it as a number (+1 store), going through the second and storing it as a number (+1 store), adding the two numbers and saving the result (+1 store), and converting that number into a linked list and saving it (+1 store).
Their solution involves going through the first and second linked lists while writing the result to a third, new list (+1 store).
That is 1 store versus your 4 stores. This might not seem like much, but if we were to add n pairs of numbers at the same time (on a distributed system or something), you're looking at 4n stores rather than just n stores, which could be a big deal.
I'm looking for the "how do you find it" because I have no idea how to approach finding the algorithmic complexity of my program.
I wrote a Sudoku solver in Java, without efficiency in mind (I wanted to try to make it work recursively, which I succeeded in doing!).
Some background:
My strategy employs backtracking to determine, for a given Sudoku puzzle, whether the puzzle has exactly one solution or not. So I basically read in a given puzzle and solve it. Once I have found one solution, I'm not necessarily done; I need to continue exploring for further solutions. At the end, one of three possible outcomes happens: the puzzle is not solvable at all, the puzzle has a unique solution, or the puzzle has multiple solutions.
My program reads the puzzle coordinates from a file that has one line for each given digit, consisting of the row, column, and digit. By my own convention, a 7 in the upper left square is written as 007.
Implementation:
I load the values from the file and store them in a 2-D array.
I go down the array until I find a blank (unfilled value), set it to 1, and check for any conflicts (whether the value I entered is valid or not).
If it is valid, I move on to the next value.
If not, I increment the value by 1 until I find a digit that works, or, if none of them (1 through 9) work, I go back one step to the last value I adjusted and increment that one (using recursion).
I am done solving when all 81 elements have been filled, without conflicts.
If any solutions are found, I print them to the terminal.
Otherwise, if I try to "go back one step" on the FIRST element that I initially modified, it means that there were no solutions.
How can I find my program's algorithmic complexity? I thought it might be linear [O(n)], but I am accessing the array multiple times, so I'm not sure :(
Any help is appreciated
O(n ^ m) where n is the number of possibilities for each square (i.e., 9 in classic Sudoku) and m is the number of spaces that are blank.
This can be seen by working backwards from a single blank. If there is only one blank, then you have n possibilities that you must work through in the worst case. If there are two blanks, then you must work through n possibilities for the first blank, and for each of those, n possibilities for the second blank, giving n^2 in total. If there are three blanks, then each of the n possibilities for the first blank yields a puzzle with two blanks, which has n^2 possibilities, giving n^3 in total.
This algorithm performs a depth-first search through the possible solutions. Each level of the graph represents the choices for a single square. The depth of the graph is the number of squares that need to be filled. With a branching factor of n and a depth of m, finding a solution in the graph has a worst-case performance of O(n ^ m).
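For reference, here is a minimal Java sketch of such a depth-first backtracking solver (my own illustration, assuming a 9x9 int grid with 0 for blanks; it stops at the first solution, whereas the uniqueness check in the question would keep searching):

    public class SudokuBacktracker {
        // Depth-first search over cells 0..80; branching factor is at most 9 per blank.
        static boolean solve(int[][] g, int pos) {
            if (pos == 81) return true;                 // all 81 cells filled: solved
            int r = pos / 9, c = pos % 9;
            if (g[r][c] != 0) return solve(g, pos + 1); // given digit: skip it
            for (int d = 1; d <= 9; d++) {
                if (valid(g, r, c, d)) {
                    g[r][c] = d;
                    if (solve(g, pos + 1)) return true;
                }
            }
            g[r][c] = 0;                                // nothing fits: backtrack
            return false;
        }

        // Conflict check: d must not already appear in the row, column, or 3x3 box.
        static boolean valid(int[][] g, int r, int c, int d) {
            for (int i = 0; i < 9; i++)
                if (g[r][i] == d || g[i][c] == d) return false;
            int br = r / 3 * 3, bc = c / 3 * 3;
            for (int i = br; i < br + 3; i++)
                for (int j = bc; j < bc + 3; j++)
                    if (g[i][j] == d) return false;
            return true;
        }

        public static void main(String[] args) {
            int[][] g = new int[9][9];       // empty grid: every cell blank
            System.out.println(solve(g, 0)); // true; g now holds a filled grid
        }
    }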
In many Sudokus, there will be a few numbers that can be placed directly with a bit of thought. By placing a number in the first empty cell, you give up on a lot of opportunities to reduce the possibilities. If the first ten empty cells have lots of possibilities, you get exponential growth. I'd ask the questions:
Where in the first line can the number 1 go?
Where in the first line can the number 2 go?
...
Where in the last line can the number 9 go?
The same for the nine columns?
The same for the nine boxes?
Which number can go into the first cell?
...
Which number can go into the 81st cell?
That's 324 questions. If any question has exactly one answer, you pick that answer. If any question has no answer at all, you backtrack. If every question has two or more answers, you pick a question with the minimal number of answers.
You may get exponential growth, but only for problems that are really hard.
I'm working on a problem where I'm asked to get the last 100 digits of 2^1001. The solution must be in Java and without using BigInteger, only with int or long. At the moment I'm thinking of creating a class that handles 100 digits. So my question is whether there is another way to handle the overflow using int or long to get a number with 100 digits.
Thanks to all.
EDITED: I was off by a couple powers of 10 in my modulo operator (oops)
The last 100 digits of 2^1001 form the number 2^1001 (mod 10^100).
Note that 2^1001 (mod 10^100) = 2 * (2^1000 (mod 10^100)) (mod 10^100).
Look into properties of modulo: http://www.math.okstate.edu/~wrightd/crypt/lecnotes/node17.html
This is 99% math problem, 1% programming problem. :)
With this approach, though, you wouldn't be able to use just an integer, as 10^100 won't fit in an int or long.
However, you could use this to find, say, the last 10 digits of 2^1001, then use a separate routine to find the next 10 digits, etc, where each routine uses this feature...
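For example, the last 10 digits fit comfortably in a long, so repeated doubling modulo 10^10 works; a sketch (extending this to the next 10 digits is the harder part):

    public class LastTenDigits {
        public static void main(String[] args) {
            long mod = 10_000_000_000L; // 10^10 still fits in a long
            long r = 1;                 // 2^0
            for (int i = 0; i < 1001; i++) {
                r = (r * 2) % mod;      // r < 10^10, so 2*r cannot overflow a long
            }
            System.out.println(r);      // the last 10 digits of 2^1001
        }
    }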
The fastest way to do it (coding-wise) is to create an array of 100 integers and then have a function that starts from the ones place and doubles every digit. Take each digit modulo 10 and carry a 1 into the next digit when needed. For the 100th digit, simply drop the carry. Do this 1001 times and you're done.
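That approach might look like this in Java (a sketch; digits are stored least significant first):

    public class LastHundredDigits {
        public static void main(String[] args) {
            int[] digits = new int[100];        // digits[0] is the ones place
            digits[0] = 1;                      // start from 2^0 = 1
            for (int p = 0; p < 1001; p++) {    // double 1001 times
                int carry = 0;
                for (int i = 0; i < digits.length; i++) {
                    int d = digits[i] * 2 + carry;
                    digits[i] = d % 10;         // keep this digit
                    carry = d / 10;             // carry into the next digit
                }
                // a carry out of the top digit is dropped: we track 2^p mod 10^100
            }
            StringBuilder sb = new StringBuilder();
            for (int i = digits.length - 1; i >= 0; i--) sb.append(digits[i]);
            System.out.println(sb);             // the last 100 digits of 2^1001
        }
    }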
I'm developing an Android word game app that needs a large (~250,000-word) dictionary available. I need:
reasonably fast lookups, e.g. constant time preferable; I need to do maybe 200 lookups a second on occasion to solve a word puzzle, and maybe 20 lookups within 0.2 seconds more often to check words the user has just spelled.
EDIT: Lookups are typically asking "is this word in the dictionary?". I'd like to support up to two wildcards in the word as well, but this is easy enough by just generating all possible letters the wildcards could have been and checking the generated words (i.e. 26 * 26 lookups for a word with two wildcards).
as it's a mobile app, using as little memory as possible and requiring only a small initial download for the dictionary data is top priority.
My first naive attempt used Java's HashMap class, which caused an out-of-memory exception. I've looked into using the SQLite databases available on Android, but this seems like overkill.
What's a good way to do what I need?
You can achieve your goals with more lowly approaches as well. Since it's a word game, I suspect you are handling a 27-letter alphabet. So suppose an alphabet of not more than 32 letters, i.e. 5 bits per letter. You can then cram 12 letters (12 x 5 = 60 bits) into a single Java long using a trivial 5-bits-per-letter encoding.
This means that if you have no words longer than 12 letters, you can just represent your dictionary as a set of Java longs. With 250,000 words, a trivial representation of this set as a single sorted array of longs should take 250,000 words x 8 bytes/word = 2,000,000 bytes, about 2 MB of memory. Lookup is then by binary search, which should be very fast given the small size of the data set (fewer than 20 comparisons, since 2^20 is over one million).
If you do have words longer than 12 letters, store the >12-letter words in another array, where one word is represented by 2 concatenated Java longs in the obvious manner.
NOTE: the reason this works, and is likely more space-efficient than a trie while being very simple to implement, is that the dictionary is constant. Search trees are good if you need to modify the data set, but if the data set is constant, you can often get away with a simple binary search.
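A sketch of the packing and lookup (my own illustration, assuming lowercase a-z mapped to codes 1-26 so that 0 can pad short words):

    import java.util.Arrays;

    // Pack a word of up to 12 lowercase letters into one long, 5 bits per letter,
    // then answer membership queries with binary search over a sorted long[].
    public class PackedDictionary {
        static long encode(String word) {       // assumes word.length() <= 12
            long code = 0;
            for (int i = 0; i < word.length(); i++)
                code = (code << 5) | (word.charAt(i) - 'a' + 1); // 'a' -> 1 ... 'z' -> 26
            return code;
        }

        public static void main(String[] args) {
            long[] dict = { encode("cat"), encode("dog"), encode("zebra") };
            Arrays.sort(dict);                  // sort once; the dictionary is constant
            System.out.println(Arrays.binarySearch(dict, encode("dog")) >= 0);  // true
            System.out.println(Arrays.binarySearch(dict, encode("cow")) >= 0);  // false
        }
    }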
I am assuming that you want to check whether a given word belongs to the dictionary.
Have a look at the Bloom filter.
A Bloom filter can answer "does X belong to a predefined set" queries with very small storage requirements. If the answer to a query is yes, there is a small (and adjustable) probability that it is wrong; if the answer is no, it is guaranteed to be correct.
According to the Wikipedia article, you would need less than 4 MB of space for your dictionary of 250,000 words with a 1% error probability.
The Bloom filter will correctly answer "is in the dictionary" whenever the word actually is in the dictionary. If the dictionary does not contain the word, the Bloom filter may falsely answer "is in the dictionary" with some small probability.
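A toy Java sketch of the idea (the two-hash probing scheme here is my own simplification, not a production-quality filter; real implementations derive the bit count and hash count from the desired error rate):

    import java.util.BitSet;

    // Toy Bloom filter: each word sets k bit positions derived from two base hashes.
    public class BloomSketch {
        private final BitSet bits;
        private final int size, k;

        BloomSketch(int size, int k) {
            this.bits = new BitSet(size);
            this.size = size;
            this.k = k;
        }

        private int index(String s, int i) {
            int h1 = s.hashCode();
            int h2 = (h1 >>> 16) | 1;                 // a second, odd hash
            return Math.floorMod(h1 + i * h2, size);  // i-th probe position
        }

        void add(String s) {
            for (int i = 0; i < k; i++) bits.set(index(s, i));
        }

        boolean mightContain(String s) {
            for (int i = 0; i < k; i++)
                if (!bits.get(index(s, i))) return false; // definitely not present
            return true;                                  // present, or a rare false positive
        }

        public static void main(String[] args) {
            BloomSketch f = new BloomSketch(3_000_000, 7); // sizes chosen arbitrarily for the demo
            f.add("hello");
            System.out.println(f.mightContain("hello")); // true
            System.out.println(f.mightContain("xyzzy")); // false (almost certainly)
        }
    }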
A very efficient way to store a dictionary is a Directed Acyclic Word Graph (DAWG).
Here are some links:
Directed Acyclic Word Graph or DAWG description with sourcecode
Construction of the CDAWG for a Trie
Implementation of directed acyclic word graph
You'll be wanting some sort of trie. Perhaps a ternary search trie (TST) would be good, I think. They give very fast lookup and low memory usage. This paper gives some more info about TSTs. It also talks about sorting, so not all of it will apply. This article might be a little more applicable. As the article says, TSTs "combine the time efficiency of digital tries with the space efficiency of binary search trees."
As this table shows, the look-up times are very comparable to using a hash table.
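A minimal ternary search trie in Java might look like this (a sketch supporting only membership queries on non-empty keys):

    // Sketch of a ternary search trie storing a set of words (membership only).
    public class TSTSketch {
        private Node root;

        private static class Node {
            char ch;
            Node left, mid, right;
            boolean endOfWord;
        }

        void add(String key) { root = add(root, key, 0); }

        private Node add(Node node, String key, int i) {
            char c = key.charAt(i);
            if (node == null) { node = new Node(); node.ch = c; }
            if (c < node.ch)               node.left  = add(node.left, key, i);
            else if (c > node.ch)          node.right = add(node.right, key, i);
            else if (i < key.length() - 1) node.mid   = add(node.mid, key, i + 1);
            else                           node.endOfWord = true;
            return node;
        }

        boolean contains(String key) {
            Node node = root;
            int i = 0;
            while (node != null) {
                char c = key.charAt(i);
                if (c < node.ch)               node = node.left;
                else if (c > node.ch)          node = node.right;
                else if (i < key.length() - 1) { node = node.mid; i++; }
                else                           return node.endOfWord;
            }
            return false;
        }

        public static void main(String[] args) {
            TSTSketch t = new TSTSketch();
            t.add("cat");
            t.add("car");
            System.out.println(t.contains("car")); // true
            System.out.println(t.contains("ca"));  // false: a prefix, not a stored word
        }
    }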
You could also use the Android NDK and do the structure in C or C++.
The devices that I worked on basically worked from a binary compressed file, with a topology that resembled the structure of a binary tree. At the leaves, you would have the Huffman-compressed text. Finding a node would involve skipping to various locations of the file, and then loading only the portion of the data really needed.
Very cool idea, as suggested by Antti Huima: store dictionary words packed into longs, and then search using binary search.