I have a number of entries in an array (FT = [-10.5, 6.5, 7.5, -7.5]) to which I am applying binary splitting, appending the pieces to a result array of arrays (LT = [[-10.5], [6.5, 7.5, -7.5], [6.5, 7.5], [-7.5]]). The tree describing the splitting for my example is below:
      [-10.5, 6.5, 7.5, -7.5]
         /           \
   [-10.5]     [6.5, 7.5, -7.5]
                  /         \
           [6.5, 7.5]     [-7.5]
Now, from the array LT, I want to retrieve only the "leaf" arrays (T = [[-10.5], [6.5, 7.5], [-7.5]]), given the size of the initial array FT.
How can I achieve this (get T) in Java?
I am presenting a way of thinking about your problem. I am not fleshing it out into a full algorithm; I am leaving some parts for you to fill in.
First, if LT is empty, no splitting has occurred. In this case the original FT was the leaf array, and we have no way of telling what it was. The problem cannot be solved.
If LT contains n arrays, then there must exist some m (0 < m < n) so that the first m arrays form the left subtree and the rest form the right subtree. We don’t know m, so we simply try all possible values of m in turn. For each possible m we check whether a solution for this value of m is possible by trying to reconstruct each subtree.
So define an auxiliary method to check if a part of LT can form a subtree and return the leaves if it can.
Your auxiliary method will work like this: If there is only one array, it's a leaf, so return it. If there are two arrays, they cannot form a subtree. If there are three, they form a subtree exactly if the first is the concatenation of the other two. If there are more than three, then again we need to consider all the possibilities of how they are distributed into subtrees. The difference from before is that we know which full array the subtrees come from, namely the frontmost array, so all candidate solutions should be checked against it. For starters, if the second array is not a prefix of the first, we cannot have a subtree.
Your algorithm will no doubt get recursive at some point.
Pruning opportunity: A full binary tree (every node splits into exactly two children or none) always has an odd number of nodes, so each subtree contributes an odd number of arrays, and LT (every node except the root) contains an even number. So for a solution to exist, n needs to be even and m needs to be odd.
I would consider coding my algorithm using lists rather than arrays because I think it’s more convenient to pass lists of lists rather than arrays of arrays or lists of arrays around.
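If it helps to see the shape, here is one possible fleshing-out in Java (my own sketch, not the only way to fill in the parts above; LeafFinder and leaves are names I made up). It treats a slice of LT, with the root list stuck on the front, as the pre-order listing of a subtree, and returns null when the slice cannot form one:

import java.util.*;

public class LeafFinder {

    // Returns the leaves of the subtree whose pre-order listing is 'part'
    // (part.get(0) is the subtree's root), or null if no such subtree exists.
    static List<List<Double>> leaves(List<List<Double>> part) {
        if (part.size() == 1) {
            return new ArrayList<>(Collections.singletonList(part.get(0))); // a lone list is a leaf
        }
        List<Double> root = part.get(0);
        List<List<Double>> rest = part.subList(1, part.size());
        // Try every split of the remaining lists into a left and a right subtree.
        for (int m = 1; m < rest.size(); m++) {
            // The two subtree roots must concatenate back to the current root.
            List<Double> concat = new ArrayList<>(rest.get(0));
            concat.addAll(rest.get(m));
            if (!concat.equals(root)) continue;
            List<List<Double>> left = leaves(rest.subList(0, m));
            List<List<Double>> right = leaves(rest.subList(m, rest.size()));
            if (left == null || right == null) continue;
            left.addAll(right);
            return left;
        }
        return null; // two lists, or no consistent split: not a subtree
    }

    public static void main(String[] args) {
        List<List<Double>> lt = new ArrayList<>(Arrays.asList(
                Arrays.asList(-10.5),
                Arrays.asList(6.5, 7.5, -7.5),
                Arrays.asList(6.5, 7.5),
                Arrays.asList(-7.5)));
        lt.add(0, Arrays.asList(-10.5, 6.5, 7.5, -7.5)); // prepend FT as the root
        System.out.println(leaves(lt)); // [[-10.5], [6.5, 7.5], [-7.5]]
    }
}

Prepending FT turns the top-level question into the same subtree question, and the concatenation check prunes bad splits before recursing.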
Happy further refinement and coding.
I want to find all the palindromes that can be formed from substrings of a given string.
Example: input = abbcbbd.
The possible palindromes are a, b, b, c, b, b, d, bb, bcb, bbcbb, bb.
Here is the logic I have implemented:
public int palindromeCount(String input) {
    int size = input.length(); // all single characters in the string are treated as palindromes
    int count = size;
    for (int i = 0; i < size; i++) {
        for (int j = i + 2; j <= size; j++) {
            String value = input.substring(i, j);
            String reverse = new StringBuilder(value).reverse().toString();
            if (value.equals(reverse)) {
                count++;
            }
        }
    }
    return count;
}
The time complexity here is high (the double loop plus the substring reversal makes it O(n^3) overall); how can I improve the performance of this logic?
Here are some things you can think about when optimizing this algorithm:
What are palindromes? A palindrome is a symmetrical string, which means it must have a center pivot! The pivot may be one of the following:
A letter, as in "aba", or
The position between two letters, as in the position between the letters "aa"
That means there are a total of 2n − 1 possible pivots.
Then, you can search outwards from each pivot. Here is an example:
Sample string: "abcba"
First, let's take a pivot at "c":
abcba → c is a palindrome, then widen your search by 1 on each side.
abcba → bcb is a palindrome, then widen your search by 1 on each side.
abcba → abcba is a palindrome, so we know there are at least 3 palindromes in the string.
Continue this with all pivots.
With this approach, the runtime complexity is O(n^2).
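Here is a hedged Java sketch of that approach (same signature as the code in the question; my own illustration). Like the original, it counts every palindromic occurrence, but in O(n^2):

public int palindromeCount(String input) {
    int n = input.length();
    int count = 0;
    // 2n - 1 pivots: even values of c sit on a letter, odd values on a gap between letters
    for (int c = 0; c < 2 * n - 1; c++) {
        int left = c / 2;
        int right = left + c % 2;
        // Widen the search while the two ends still match
        while (left >= 0 && right < n && input.charAt(left) == input.charAt(right)) {
            count++;
            left--;
            right++;
        }
    }
    return count;
}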
If you're comfortable busting out some heavyweight data structures, it's possible to do this in time O(n), though I'll admit that this isn't something that will be particularly easy to code up. :-)
We're going to need two tools in order to solve this problem.
Tool One: Generalized Suffix Trees. A generalized suffix tree is a data structure that, intuitively, is a trie containing all suffixes of two strings S and T, but represented in a more space-efficient way.
Tool Two: Lowest Common Ancestor Queries. A lowest common ancestor query structure (or LCA query) is a data structure built around a particular tree. It's designed to efficiently answer queries of the form "given two nodes in the tree, what is their lowest common ancestor?"
Importantly, a generalized suffix tree for two strings of length m can be built in time O(m), and an LCA query structure can be built in time O(m) such that all queries take time O(1). These are not obvious runtimes; the algorithms and data structures needed here were publishable results when they were first discovered.
Assuming we have these two structures, we can build a third data structure, which is what we'll use to get our final algorithm:
Tool Three: Longest Common Extension Queries. A longest common extension query data structure (or LCE query) is a data structure built around two strings S and T. It supports queries of the following form: given an index i into string S and an index j into string T, what is the length of the longest string that appears starting at index i in S and index j in T?
As an example, take these two strings:
S: OFFENSE
   0123456
T: NONSENSE
   01234567
If we did an LCE query starting at index 3 in string S and index 4 in string T, the answer would be the string ENSE. On the other hand, if we did an LCE query starting at index 4 in string S and index 0 in string T, we'd get back the string N.
(More strictly, the LCE query structure doesn't actually return the actual string you'd find at both places, but rather its length.)
It's possible to build an LCE data structure for a pair of strings S and T of length m in time O(m) so that each query takes time O(1). The technique for doing so involves building a generalized suffix tree for the two strings, then constructing an LCA data structure on top. The key insight is that the LCE starting at position i in string S and j in string T is given by the lowest common ancestor of suffix i of string S and suffix j of string T in the suffix tree.
The LCE structure is extremely useful for this problem. To see why, let's take your sample string abbcbbd. Now, consider both that string and its reverse, as shown here:
S: abbcbbd
   0123456
T: dbbcbba
   0123456
Every palindrome in a string takes one of two forms. First, it can be an odd-length palindrome. A palindrome like that has some central character c, plus some "radius" stretching out forwards and backwards from that center. For example, the string bbcbb is an odd-length palindrome with center c and radius bb.
We can count up how many odd-length palindromes there are in a string by using LCE queries. Specifically, build an LCE query structure over both the string and its reverse. Then, for each position within the original string, ask for the LCE of that position in the original string and its corresponding position in the mirrored string. This will give you the longest odd-length palindrome centered at that point. (More specifically, it'll give you the length of the radius plus one, since the character itself will always match at those two points). Once we know the longest odd-length palindrome centered at that position in the string, we can count the number of odd-length palindromes centered at that position in the string: that will be equal to all the ways we can take that longer palindrome and shorten it by cutting off the front and back character.
With this in mind, we can count all the odd-length palindromes in the string as follows:
for i from 0 to length(S) - 1:
    total += LCE(i, length(S) - 1 - i)
The other class of palindromes is even-length palindromes, which don't have a center character and instead consist of two equal radii. We can also find these using LCE queries, except that instead of pairing position i with its corresponding position in the reversed string, we'll pair position i in the original string with the place that corresponds to index i - 1 in the reversed string, like this:
for i from 1 to length(S) - 1:
    total += LCE(i, length(S) - i)
Overall, this solution:
Constructs an LCE query structure from the original string and the reversed string. Reversing the string takes time O(m) and building the LCE query structure takes time O(m). Total time for this step: O(m).
Makes 2m - 1 total queries to the LCE structure. Each query takes time O(1). Total time for this step: O(m).
I'm fairly certain it's possible to achieve this runtime without using such heavyweight tools, but this at least shows that a linear-time solution exists.
Hope this helps!
This is schoolwork. I am not looking for code help, but since my teacher isn't helping, I came here.
I am asked to merge and sort two sorted arrays following two cases:
When sizes of the two arrays are equal
When sizes of the two arrays are different
Now I have done case 2, which also handles case 1 :/ I just don't get how I could write code for case 1, or how it would differ from case 2. Either the array lengths don't matter to the problem, or I am not understanding correctly.
Then I am asked to compute the Big-O complexity.
I am not looking for code here. If anyone understands what my teacher is really asking, please give me hints to solve it.
It is very good to learn instead of copying.
As you suggest, there is no difference between case 1 and case 2, but the worst case depends on your solution. So I will describe my solution (no code) and give you its worst case.
In both cases, the arrays must end with infinity, so append an infinity sentinel to each. Then iterate over the elements of both arrays; at each step, pick the smaller of the two current elements and put it in your result array (the merge of the two arrays).
With this solution, you can calculate the worst case easily. We iterate over both arrays once, plus the two added infinities; if their lengths are n and m, our worst and best case is O(m + n) (you do m + n + 2 - 1 comparisons; the -1 is because you never compare the two infinities against each other).
But why add infinity at the end of each array, when that means copying each array into one with an extra slot? It is one way, and the copying is O(m + n) as well. There is another solution: compare until you reach the end of one array, then append the not-yet-consumed rest of the other array to the end of your result array. With the infinity sentinels, this happens automatically.
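A minimal Java sketch of the sentinel idea (my own illustration, using Integer.MAX_VALUE to play the role of infinity, so it assumes no real element has that value):

static int[] mergeWithSentinels(int[] a, int[] b) {
    int[] a2 = java.util.Arrays.copyOf(a, a.length + 1);
    int[] b2 = java.util.Arrays.copyOf(b, b.length + 1);
    a2[a.length] = Integer.MAX_VALUE; // sentinel: no real element compares larger
    b2[b.length] = Integer.MAX_VALUE;
    int[] result = new int[a.length + b.length];
    int i = 0, j = 0;
    for (int k = 0; k < result.length; k++) {
        // The sentinels guarantee we never read past a real array's end
        result[k] = (a2[i] <= b2[j]) ? a2[i++] : b2[j++];
    }
    return result;
}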
I hope this helped. If anything is wrong, please comment.
Merging two sorted arrays is a linear complexity operation. This means in terms of Big-O notation it is O(m+n) where m and n are lengths of two sorted arrays.
So when you say the array length doesn't connect with the problem, your understanding is correct. Irrespective of the lengths of the two sorted arrays, merging them involves repeatedly comparing the current element from each array, copying the smaller (or larger, depending on whether you want the merged array in ascending or descending order) into the new array, and incrementing the counter of the array from which you copied.
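Since the question asks for hints rather than code, the following sketch of that loop (my own) is for other readers:

static int[] merge(int[] a, int[] b) {
    int[] result = new int[a.length + b.length];
    int i = 0, j = 0, k = 0;
    // Pick the smaller current element until one array is exhausted
    while (i < a.length && j < b.length) {
        result[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    }
    // Copy whatever remains of the non-exhausted array
    while (i < a.length) result[k++] = a[i++];
    while (j < b.length) result[k++] = b[j++];
    return result;
}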
Another way to approach this question is to look at each array as having a head and a tail, and solving the problem recursively. This way, we can use a base case, two arrays of size 1, to sort through the entirety of the two arrays m and n. Since both arrays are already sorted, simply compare the two heads of each array and add the element that comes first to your newly-created merged array, and move to the next element in that array. Your function will call itself again after adding the element. This will keep happening until one of the two arrays is empty. Now, you can simply add what is left of the nonempty array to the end of your merged array, and you are done.
I'm not sure if your professor will allow you to use recursive calls, but this method could make the coding much easier. Runtime would still be O(m+n), as you are basically iterating through both arrays once.
Hope this helps.
My teacher and I were discussing whether or not a recursive permutation function could be written without the use of substrings and/or arrays in Java.
Is there a way to do this?
The answer is yes, this can be done. I'm assuming that "without the use of substrings and/or arrays" refers to the info being passed to the recursion. You have to have some sort of container for the elements that are to be permuted.
In that case it can be done by pulling some hideous tricks with numerically encoding the indices of the elements as digits of a numeric argument. For instance, if there are 3 elements and I use 1 as a sentinel value in the left-most digit (so you can have 0 as the leading index sometimes), 1 means I haven't started, 10 means the first element has been selected, 102 means the first and third, and 1021 means I'm ready to print the permutation since I now have a 4 digit argument and there are 3 elements in the set. I can then deconstruct which elements to print using % 10 and / 10 arithmetic to pick them off.
I implemented this in Ruby rather than Java, and I'm not going to share the actual code because it's too horrible to contemplate. However, it works recursively with only the input array of elements and an integer as arguments, no partial solution substrings or arrays.
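To give a flavor of the trick anyway, here is my own rough Java reconstruction (not the author's code). It handles at most 10 elements, since each index must fit in a single decimal digit, and the leading 1 is the sentinel described above:

static void permute(char[] elems, long state) {
    long full = 1;
    for (int i = 0; i < elems.length; i++) full *= 10;
    if (state >= full) { // n digits follow the sentinel: the permutation is complete
        printPerm(elems, state);
        return;
    }
    for (int d = 0; d < elems.length; d++) {
        if (!contains(state, d)) permute(elems, state * 10 + d);
    }
}

// Has index d already been encoded as a digit of state?
static boolean contains(long state, int d) {
    while (state > 9) { // stop once only the sentinel 1 remains
        if (state % 10 == d) return true;
        state /= 10;
    }
    return false;
}

// Decode the digits left to right and print the corresponding elements.
static void printPerm(char[] elems, long state) {
    long p = 1;
    while (p * 10 <= state) p *= 10; // p now sits on the sentinel digit
    for (p /= 10; p >= 1; p /= 10) {
        System.out.print(elems[(int) (state / p % 10)]);
    }
    System.out.println();
}

Calling permute(new char[]{'a', 'b', 'c'}, 1L) prints abc, acb, bac, bca, cab, cba, each on its own line.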
I have looked at similar questions that detail the sorting of Maps and of arrays of primitive data types, but no question directly details the difference between a one-time sort of a Java Map vs. a primitive array ([]).
Primary note: I know that TreeMap is the sorted (by key) version of Map in Java, but I don't know much about the 'behind-the-scenes' of how TreeMap sorts the keys (either while data is being added, or after the data is FINISHED being added).
Primary note 2: Dijkstra's algorithm in this case is not an EXACT implementation. We are just finding the shortest path over weighted edges in a graph G of M nodes, which means the adjacency matrix (format seen below) is of size M x M. This is not a SMART implementation, pretty much as baseline as you can get... sorry for the confusion!
We are given an adjacency matrix, where elements are related to each other ('connected') in the following example:
0,1,5 // 0 is connected to 1, and the weight of the edge is 5
0,2,7 // 0 is connected to 2, and the weight of the edge is 7
0,3,8 // 0 is connected to 3, and the weight of the edge is 8
1,2,10 // 1 is connected to 2, and the weight of the edge is 10
1,3,7 // 1 is connected to 3, and the weight of the edge is 7
2,3,3 // 2 is connected to 3, and the weight of the edge is 3
But never mind the input, just assume that we have a matrix of values to manipulate.
We are looking at storing all the possible paths in a "shortest path" algorithm (I'm sure 75% or more of people on SO know Dijkstra's algorithm). This IS for homework, but an implementation question, not a "solve this for me" question.
ASSUME that the size of the matrix is very large (M x M), maybe 50x50 or more. This would result in (50-1)!/2 ≈ 3.04 × 10^62 entries in the result list, assuming our algorithm were smart enough to pick out duplicates and not measure the length of a duplicate path (which it is not, because we are noobs at Graph Theory and Java, so please don't suggest any algorithm to avoid duplicates...).
My friend says that a temp sort (using a temporary variable) over index n of each int[] in a List, where int[n] is the last index holding the value (sum) of the path (ALGORITHM_1), may be faster than a TreeMap (ALGORITHM_2) where the key of the Map is the value of the shortest path.
We were debating which implementation would be faster at finding ALL lengths of the shortest path. We can store each path's sum as the last index of the path (an int[] whose last element is the value (sum) of all the elements in the array) (ALGORITHM_1), OR we can store that sum as the KEY of the Map (ALGORITHM_2).
Because this is a shortest-path algorithm (albeit not a great one...), we NEED to sort the results of each path by length, which is the sum of the edge weights along the path, in order to find the shortest path.
So the real question is: what would be faster for sorting the results ONLY ONE TIME? A sort on the Map (if the Java standard library even provides one) or creating a temporary variable to hold the value of the most recent 'length' of each int[]? For example:
myMap.sort(); // Unless TreeMap in Java does 'behind-the-scenes' sorting on keys...
myMap.firstEntry(); // This would return the first (smallest-key) entry of the map, i.e., the shortest path
OR
int temp = myList.get(0)[m]; // Store a temp variable that is the 'shortest path' so far
for (int i = 1; i < myList.size(); i++) { // myList is a List<int[]>
    if (temp > myList.get(i)[m]) { // Check if the current path is shorter than the previous best
        temp = myList.get(i)[m]; // Replace temp if the current path is shorter
    }
}
Note that I haven't actually tested the implementations yet, nor have I checked my own Java syntax, so I don't know if these statements are declared correctly. This is just a theoretical question. Which would run faster? This is my 3rd year of Java and I don't know the underlying data structures used in HashMap, nor the Big O notation of either implementation.
Perhaps someone who knows the Java standard library could describe what kind of data structures or implementations are used in HashMap vs. a primitive array, and what the differences in run time might be for a ONE-TIME-ONLY sort of each structure.
I hope that this inquiry makes sense, and I thank anyone who takes the time to answer my question; I always appreciate the time and effort generous people such as yourselves put into helping to educate the newbies!
Regards,
Chris
It may not be necessary to sort your data in order to find the shortest path. Instead, you could iterate through the data and keep track of the shortest path that you've encountered.
Assuming the data is stored in an array of Data objects, with data.pathLength giving the path length,
Data[] data; // array of data
Data shortest = data[0]; // initialize the shortest variable
for (int i = 1; i < data.length; i++) {
    if (data[i].pathLength < shortest.pathLength)
        shortest = data[i];
}
That said, TreeMap is a Red-Black tree, which is a form of balanced binary tree. Unlike a standard binary tree, a balanced binary tree rotates its branches to ensure that it stays approximately balanced, which guarantees log(n) lookups and insertions. A red-black tree ensures that the longest branch is no more than twice the length of the shortest branch; an AVL tree is a balanced binary tree with even tighter restrictions.

Long story short, a TreeMap will sort its data in n*log(n) time (log(n) for each insertion, times n data points). Your one-time array sort will also sort its data in n*log(n) time, assuming you're using Mergesort, Quicksort, Heapsort, etc. (as opposed to Bubblesort or another n^2 sorting algorithm). You cannot do better than n*log(n) with a comparison sort. Incidentally, you can use a transformation sort like Radix Sort, which has a big-O of O(n), but transformation sorts are usually memory hogs and exhibit poor cache behavior, so you're usually better off with one of the n*log(n) comparison sorts.
Since TreeMap and your custom sort are both n*log(n), this means that there's not much theoretical advantage to either one, so just use the one that's easier to implement. TreeMap's complex data structure does not come free, however, so your custom sorting algorithm will probably exhibit slightly better performance, e.g. maybe a factor of 2; this probably isn't worth the complexity of implementing a custom sort as opposed to using a TreeMap, especially for a one-shot sort, but that's your call. If you want to play around with boosting your program's performance, then implement a sorting algorithm that's amenable to parallelization (like Mergesort) and see how much of an improvement that'll get you when you split the sorting task up among multiple threads.
If you want the shortest path, sorting isn't necessary. Just track the shortest path as you finish each path, and update it when you encounter a shorter one. That winds up being O(n), where n is the number of paths. You couldn't realistically store ~10^62 paths anyway, so some truncation of the result set will be required.
"how much about the 'behind-the-scenes' of how TreeMap sorts the keys (either while data is being added, or after the data is FINISHED being added)?"
TreeMap uses the red-black tree algorithm (a variant of a BST), in which the containsKey, get, put, and remove operations take O(log n) time, which is very good. The keys are kept in sorted order as each element is added, as the TreeMap documentation explains. Sorting n elements this way takes O(n log n) overall.
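To illustrate with made-up data (note there is no sort() method on a Map; a TreeMap simply keeps its keys ordered on every put):

import java.util.TreeMap;

TreeMap<Integer, int[]> byLength = new TreeMap<>(); // key = path length, value = the path
byLength.put(12, new int[]{0, 2, 3});
byLength.put(7, new int[]{0, 1, 3});
byLength.put(15, new int[]{0, 3});
int[] shortestPath = byLength.firstEntry().getValue(); // the length-7 path

One caveat with this scheme: two paths of equal length collide on the same key, so the map silently keeps only one of them.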
I am not sure why you are comparing a Map type, which uses key-value pairs, against an array. You mentioned using the lengths of the shortest paths as keys in the TreeMap, but what are you putting in as values? If you just want to store the lengths of paths, I would suggest putting them in an array and sorting with Arrays.sort(), which also sorts in O(n log n) using a different algorithm, Dual-Pivot Quicksort.
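For example (with made-up path sums):

import java.util.Arrays;

int[] pathLengths = {23, 7, 42, 15}; // hypothetical path sums
Arrays.sort(pathLengths); // Dual-Pivot Quicksort for primitives, O(n log n)
int shortest = pathLengths[0]; // smallest length after the one-time sort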
Hope this helps!
Do you know any way to get the k-th m-element combination in O(1)? The expected solution should work for any size of input data and any value of m.
Let me explain this problem by example (python code):
>>> import itertools
>>> data = ['a', 'b', 'c', 'd']
>>> k = 2
>>> m = 3
>>> result = [''.join(el) for el in itertools.combinations(data, m)]
>>> print result
['abc', 'abd', 'acd', 'bcd']
>>> print result[k-1]
abd
For the given data, the k-th (2nd in this example) m-element combination is abd. Is it possible to get that value (abd) without creating the whole list of combinations?
I'm asking because I have data of ~1,000,000 characters, and it is impossible to create the full list of m-character combinations just to get the k-th element.
The solution can be pseudo code or a link to a page describing this problem (unfortunately, I didn't find one).
Thanks!
http://en.wikipedia.org/wiki/Permutation#Numbering_permutations
Basically, express the index in the factorial number system, and use its digits as a selection from the original sequence (without replacement).
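A sketch of that numbering in Java (my own illustration; kthPermutation is a name I made up, k is 0-based, and it assumes 0 <= k < n!):

import java.util.*;

static <T> List<T> kthPermutation(List<T> elems, long k) {
    List<T> pool = new ArrayList<>(elems);
    List<T> result = new ArrayList<>();
    long fact = 1;
    for (int i = 2; i < pool.size(); i++) fact *= i; // (n-1)!
    for (int n = pool.size(); n > 0; n--) {
        int digit = (int) (k / fact); // next factoradic digit
        k %= fact;
        result.add(pool.remove(digit)); // select without replacement
        if (n > 2) fact /= (n - 1); // (n-1)! -> (n-2)!
    }
    return result;
}

For example, kthPermutation(Arrays.asList('a', 'b', 'c'), 3) returns [b, c, a].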
Not necessarily O(1), but the following should be very fast:
Take the original combinations algorithm:
def combinations(elems, m):
    # The k-th element depends on what order you use for
    # the combinations. Assuming it looks something like this...
    if m == 0:
        return [[]]
    else:
        combs = []
        for e in elems:
            for c in combinations(remove(e, elems), m - 1):
                combs.append([e] + c)
        return combs
For n initial elements and a combination length of m, we have n!/((n-m)! m!) total combinations. We can use this fact to skip directly to our desired combination:
def kth_comb(elems, m, k):
    # High level pseudo code
    # Untested and probably full of errors
    if m == 0:
        return []
    else:
        combs_per_set = ncombs(len(elems) - 1, m - 1)
        i = k // combs_per_set       # index of the first element
        k = k % combs_per_set        # rank within that element's block
        x = elems[i]
        return [x] + kth_comb(remove(x, elems), m - 1, k)
First calculate r = n!/(m! (n-m)!), with n the number of elements.
Then floor(r/k) is the index of the first element in the result;
remove it (shift everything following to the left).
Do m--, n-- and k = r % k,
and repeat until m is 0 (hint: when k is 0, just copy the remaining characters to the result).
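For completeness, here is my own hedged Java sketch of the same block-counting idea, done lexicographically so that it reproduces the itertools.combinations order from the question (kthCombination and binomial are names I made up; it assumes 0 <= k < C(n, m)):

static String kthCombination(String data, int m, long k) {
    StringBuilder result = new StringBuilder();
    int n = data.length();
    for (int start = 0; m > 0; start++) {
        // Number of combinations that take data[start] as their next element
        long block = binomial(n - start - 1, m - 1);
        if (k < block) {
            result.append(data.charAt(start)); // take this element
            m--;
        } else {
            k -= block; // skip this element's whole block
        }
    }
    return result.toString();
}

static long binomial(int n, int r) {
    if (r < 0 || r > n) return 0;
    long result = 1;
    for (int i = 1; i <= r; i++) {
        result = result * (n - i + 1) / i; // stays integral at every step
    }
    return result;
}

With the question's example, kthCombination("abcd", 3, 1) returns "abd", i.e., result[k-1] for k = 2.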
I have written a class to handle common functions for working with the binomial coefficient, which is the type of problem that your problem appears to fall under. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it too is faster than other published techniques.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.
To read about this class and download the code, see Tablizing the Binomial Coefficient.
It should not be hard to convert this class to Java, Python, or C++.