Dynamic Programming approach - Interleaving Parentheses - java

Below is my code for the problem described on https://community.topcoder.com/stat?c=problem_statement&pm=14635. It counts the valid interleavings (as defined in the problem statement) in a static variable, countPossible.
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

public class InterleavingParentheses {
    public static int countPossible = 0;
    public static Set<String> dpyes = new HashSet<>(); // used for dp: merged strings known to be valid
    public static Set<String> dpno = new HashSet<>();  // used for dp: merged strings known to be invalid

    public static void numInterleaves(char[] s1, char[] s2, int size1, int size2) {
        char[] result = new char[size1 + size2];
        numInterleavesHelper(result, s1, s2, size1, size2, 0, 0, 0);
    }

    public static void numInterleavesHelper(char[] res, char[] s1, char[] s2, int size1, int size2,
                                            int pos, int start1, int start2) {
        if (pos == size1 + size2) {
            String merged = new String(res);
            if (dpyes.contains(merged)) {
                countPossible += 1;
            } else if (!dpno.contains(merged)) {
                if (isValid(res)) {
                    dpyes.add(merged);
                    countPossible += 1;
                } else {
                    dpno.add(merged);
                }
            }
            return;
        }
        if (start1 < size1) {
            res[pos] = s1[start1];
            numInterleavesHelper(res, s1, s2, size1, size2, pos + 1, start1 + 1, start2);
        }
        if (start2 < size2) {
            res[pos] = s2[start2];
            numInterleavesHelper(res, s1, s2, size1, size2, pos + 1, start1, start2 + 1);
        }
    }

    private static boolean isValid(char[] string) {
        // basically checking to see if parens are balanced
        LinkedList<Character> myStack = new LinkedList<>();
        for (int i = 0; i < string.length; i++) {
            if (string[i] == '(') {
                myStack.push(string[i]);
            } else {
                if (myStack.isEmpty()) {
                    return false;
                }
                if (string[i] == ')') {
                    myStack.pop();
                }
            }
        }
        return myStack.isEmpty();
    }
}
I use the Scanner class to read the input strings s1 = "()()()()()()()()()()()()()()()()()()()()" and s2 = "()()()()()()()()()()()()()()()()()" into this function. Using the HashSets greatly reduces the running time, because each duplicate merged string is only validated once, but large input strings still take a long time. The input strings can be up to 2500 characters long, and my code does not finish for strings that long. How can I modify this to make it faster?

Your dp sets are only consulted at the end of each path, so at best they save one O(n) validity check, but you have already done O(n) work to build that string, so the overall complexity stays about the same. For dp to be effective, it needs to reduce O(2^n) operations to something like O(n^2).
Since one of the test cases has an answer of 487,340,184, your program would need at least that many calls to numInterleavesHelper to produce it, because each call can increment countPossible by at most 1. The fact that the problem asks for the answer "modulo 10^9 + 7" also indicates that very large answers are expected.
This rules out things like creating every possible resulting string, most string manipulation, and counting one string at a time. Even if you optimized it, the number of iterations alone makes it infeasible.
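To put a number on that: each leaf of the recursive search is one interleaving, and strings of lengths n1 and n2 have C(n1 + n2, n1) interleavings. A small sketch using java.math.BigInteger (the class and method names are illustrative) shows how quickly that grows:

```java
import java.math.BigInteger;

public class InterleavingCount {
    // Number of ways to interleave strings of lengths n1 and n2: C(n1 + n2, n1).
    public static BigInteger interleavings(int n1, int n2) {
        BigInteger result = BigInteger.ONE;
        // Multiplicative formula: after step i, result == C(n2 + i, i), so each division is exact
        for (int i = 1; i <= n1; i++) {
            result = result.multiply(BigInteger.valueOf(n2 + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interleavings(20, 20)); // prints 137846528820
        System.out.println(interleavings(2500, 2500).toString().length() + " decimal digits");
    }
}
```

Even two 20-character strings already give over 10^11 paths, far beyond the roughly 10^7 iterations discussed next.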
Instead, think of algorithms that need about 10,000,000 iterations. Each string has a length of at most 2500, and 2500 * 2500 = 6,250,000 fits within that budget (the limits were likely chosen with this in mind), which suggests a 2D dp solution.
If you create an array:
int[][] ways = new int[2501][2501];
then, for two strings of the maximum length, you want the answer to be:
ways[2500][2500]
(in general ways[n1][n2], where n1 and n2 are the lengths of s1 and s2). Here ways[x][y] is the number of ways of building a valid prefix from the first x characters of the first string and the first y characters of the second string. Each time you add a character, you have 2 choices: take it from the first string or from the second. The new number of ways is the sum of the two previous states, so:
ways[x][y] = (ways[x-1][y] + ways[x][y-1]) mod (10^9 + 7)
where a term with a negative index counts as 0. You also need to check that each merged string is valid. It is valid if, each time you add a character, the number of opening parens minus the number of closing parens is 0 or greater, and this number is 0 at the end. That difference depends only on x and y (it is the balance of the first x characters of s1 plus the balance of the first y characters of s2), so if you precalculate the balance of every prefix of s1 and s2, the check becomes constant-time: set ways[x][y] = 0 whenever the balance is negative, and return 0 if the final balance is not 0.
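A minimal sketch of that dp in Java, assuming both strings contain only '(' and ')' and that interleavings are counted as distinct sequences of choices (which is what the recurrence counts, and also what countPossible counts in the question's code). The class and method names are illustrative, not taken from the TopCoder problem.

```java
public class InterleavingParenthesesDp {
    static final int MOD = 1_000_000_007;

    // bal[i] = (number of '(') - (number of ')') among the first i characters of s
    static int[] prefixBalance(String s) {
        int[] bal = new int[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            bal[i + 1] = bal[i] + (s.charAt(i) == '(' ? 1 : -1);
        }
        return bal;
    }

    // Number of interleavings whose merged string is balanced, modulo 10^9 + 7.
    public static int countInterleavings(String s1, String s2) {
        int n1 = s1.length(), n2 = s2.length();
        int[] bal1 = prefixBalance(s1);
        int[] bal2 = prefixBalance(s2);
        if (bal1[n1] + bal2[n2] != 0) return 0; // the final balance must be 0
        int[][] ways = new int[n1 + 1][n2 + 1];
        ways[0][0] = 1;
        for (int x = 0; x <= n1; x++) {
            for (int y = 0; y <= n2; y++) {
                if (x == 0 && y == 0) continue;
                // Every merged prefix using x chars of s1 and y chars of s2 has the same
                // balance, so a negative balance makes the state invalid (it stays 0).
                if (bal1[x] + bal2[y] < 0) continue;
                long w = 0;
                if (x > 0) w += ways[x - 1][y]; // last char taken from s1
                if (y > 0) w += ways[x][y - 1]; // last char taken from s2
                ways[x][y] = (int) (w % MOD);
            }
        }
        return ways[n1][n2];
    }

    public static void main(String[] args) {
        System.out.println(countInterleavings("()", "()")); // 6
    }
}
```

This does about 2501 * 2501 constant-time steps in the worst case. Since row x only depends on row x - 1, the 2D array could be replaced by a single row of n2 + 1 entries if memory matters.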

Related

ArrayList vs HashMap time complexity

The scenario is the following:
You have 2 strings (s1, s2) and want to check whether one is a permutation of the other, so you generate all permutations of, let's say, s1, store them, and then iterate over them and compare each against s2 until it's either found or not.
Now, in this scenario, I am deliberating whether an ArrayList or a HashMap is better to use when considering strictly time complexity, as I believe both have O(N) space complexity.
According to the javadocs, ArrayList has a search complexity of O(N) whereas HashMap is O(1). If this is the case, is there any reason to favor ArrayList over HashMap here, since HashMap would be faster?
The only potential downside I could think of is that your (k,v) pairs might be a bit weird if you did something like key = value, i.e. {k = "ABCD", v = "ABCD"}, etc.
As shown here:
import java.io.*;
import java.util.*;

class GFG {
    static int NO_OF_CHARS = 256;

    /* function to check whether two strings
       are permutations of each other
       (assumes every char is below NO_OF_CHARS, e.g. ASCII text) */
    static boolean arePermutation(char str1[], char str2[]) {
        // Create 2 count arrays and initialize
        // all values as 0
        int count1[] = new int[NO_OF_CHARS];
        Arrays.fill(count1, 0);
        int count2[] = new int[NO_OF_CHARS];
        Arrays.fill(count2, 0);
        int i;
        // For each character in input strings,
        // increment count in the corresponding
        // count array
        for (i = 0; i < str1.length && i < str2.length; i++) {
            count1[str1[i]]++;
            count2[str2[i]]++;
        }
        // If both strings are of different length.
        // Removing this condition will make the program
        // fail for strings like "aaca" and "aca"
        if (str1.length != str2.length)
            return false;
        // Compare count arrays
        for (i = 0; i < NO_OF_CHARS; i++)
            if (count1[i] != count2[i])
                return false;
        return true;
    }

    /* Driver program to test arePermutation */
    public static void main(String args[]) {
        char str1[] = ("geeksforgeeks").toCharArray();
        char str2[] = ("forgeeksgeeks").toCharArray();
        if (arePermutation(str1, str2))
            System.out.println("Yes");
        else
            System.out.println("No");
    }
}
// This code is contributed by Nikita Tiwari.
If you're glued to your implementation, use a HashSet: it still has O(1) average lookup time, and you store just the elements themselves, with no separate values.
You can use a HashSet since you only need to store one thing per entry.
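If you do keep the generate-then-look-up approach, a minimal sketch of the HashSet version might look like this (the class and method names are illustrative). The lookup is O(1) on average, but building the set is still O(n!) in time and memory, so this only helps for very short strings.

```java
import java.util.HashSet;
import java.util.Set;

public class PermutationSetLookup {
    // Collects every distinct permutation of s (still O(n!) work to build)
    public static Set<String> permutations(String s) {
        Set<String> out = new HashSet<>();
        permute("", s, out);
        return out;
    }

    private static void permute(String prefix, String rest, Set<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            // Fix rest.charAt(i) as the next character and permute the remainder
            permute(prefix + rest.charAt(i), rest.substring(0, i) + rest.substring(i + 1), out);
        }
    }

    // The final check is a single O(1) average-time HashSet lookup instead of a list scan
    public static boolean isPermutation(String s1, String s2) {
        return s1.length() == s2.length() && permutations(s1).contains(s2);
    }
}
```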

How to handle the time complexity for permutation of strings during anagrams search?

I have a program that computes that whether two strings are anagrams or not.
It works fine for inputs of strings below length of 10.
When I input two strings of equal length greater than 10, the program keeps running and never produces an answer.
My idea is that if two strings are anagrams, one string must be a permutation of the other.
This program generates all permutations of one string and then checks whether any of them matches the other string. I want the comparison to ignore case.
It returns false when no matching string is found or the strings differ in length, and true otherwise.
import java.util.ArrayList;

public class Anagrams {
    static ArrayList<String> str = new ArrayList<>();

    static boolean isAnagram(String a, String b) {
        // there is no need for checking these two
        // strings because their length doesn't match
        if (a.length() != b.length())
            return false;
        Anagrams.str.clear(); // drop permutations left over from a previous call
        Anagrams.permute(a, 0, a.length() - 1);
        for (String string : Anagrams.str)
            if (string.equalsIgnoreCase(b))
                // returns true if there is a matching string
                // for b in the permuted string list of a
                return true;
        // returns false if there is no matching string
        // for b in the permuted string list of a
        return false;
    }

    private static void permute(String str, int l, int r) {
        if (l == r)
            // adds the permuted strings to the ArrayList
            Anagrams.str.add(str);
        else {
            for (int i = l; i <= r; i++) {
                str = Anagrams.swap(str, l, i);
                Anagrams.permute(str, l + 1, r);
                str = Anagrams.swap(str, l, i);
            }
        }
    }

    public static String swap(String a, int i, int j) {
        char temp;
        char[] charArray = a.toCharArray();
        temp = charArray[i];
        charArray[i] = charArray[j];
        charArray[j] = temp;
        return String.valueOf(charArray);
    }
}
1. I want to know why this program can't process larger strings.
2. I want to know how to fix this problem.
Can you figure it out?
To solve this problem and check whether two strings are anagrams you don't actually need to generate every single permutation of the source string and then match it against the second one. What you can do instead, is count the frequency of each character in the first string, and then verify whether the same frequency applies for the second string.
The solution above requires one pass for each string, hence Θ(n) time complexity. In addition, you need auxiliary storage for counting characters which is Θ(1) space complexity. These are asymptotically tight bounds.
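A minimal sketch of that counting approach, assuming the comparison should ignore case like the asker's equalsIgnoreCase (approximated here by lowercasing each char) and that counting UTF-16 chars is good enough. It uses a single count array, incremented for one string and decremented for the other, which is a common variant of keeping two arrays; the class name is illustrative.

```java
public class AnagramCount {
    public static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) return false;
        // One counter per possible char value: fixed size, so Θ(1) extra space
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < a.length(); i++) {
            counts[Character.toLowerCase(a.charAt(i))]++; // seen in a
            counts[Character.toLowerCase(b.charAt(i))]--; // seen in b
        }
        // Anagrams leave every counter at exactly 0
        for (int c : counts) {
            if (c != 0) return false;
        }
        return true;
    }
}
```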
You're doing it in a very expensive way: the running time grows factorially (even faster than exponentially) because you generate every permutation, and factorials grow very fast, so once the input is longer than about 10 characters it takes a very long time to get any output.
11 factorial = 39916800
12 factorial = 479001600
13 factorial = 6227020800
and so on...
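To check these numbers, and to see how fast they grow, a short sketch using java.math.BigInteger (a long overflows after 20!); the class name is illustrative:

```java
import java.math.BigInteger;

public class Factorials {
    // n! computed exactly with BigInteger
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) result = result.multiply(BigInteger.valueOf(i));
        return result;
    }

    public static void main(String[] args) {
        // prints 11! = 39916800, 12! = 479001600, 13! = 6227020800
        for (int n = 11; n <= 13; n++) System.out.println(n + "! = " + factorial(n));
        System.out.println("50! has " + factorial(50).toString().length() + " digits"); // 65
    }
}
```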
So the program isn't stuck; for somewhat bigger inputs it would eventually produce output, given enough time and memory.
For strings of 20 to 30 characters, I think it would take years to produce any output. Long before that you would run out of heap memory, because every permutation is stored in the ArrayList (the recursion itself is only n levels deep, so the stack is not the problem).
For scale: 50! is about 3.04 × 10^64, more than the estimated number of grains of sand on Earth, and no computer can enumerate that many permutations.
That is also why passwords are required to include special characters: a larger character set makes the number of possible passwords so big that trying every one would take years, and encryption relies on the same limit of brute-force search.
So you don't need to, and shouldn't, solve it this way (computers are not good at it); it is overkill.
Why not take each character from one string and match it with an unused character of the other string? That is quadratic in the worst case.
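A minimal sketch of that quadratic matching, assuming each matched character of b must be marked as used (otherwise "aab" and "abb" would wrongly match) and ignoring case as the asker wanted; the class name is illustrative.

```java
public class AnagramQuadratic {
    // For each char of a, consume one unused equal char of b: O(n^2) time, O(n) extra space
    public static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) return false;
        boolean[] used = new boolean[b.length()];
        for (int i = 0; i < a.length(); i++) {
            char ca = Character.toLowerCase(a.charAt(i));
            boolean found = false;
            for (int j = 0; j < b.length(); j++) {
                if (!used[j] && Character.toLowerCase(b.charAt(j)) == ca) {
                    used[j] = true; // each char of b can match only once
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }
}
```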
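And if you sort the characters of both strings, then you can just check
sorted1.equals(sorted2)
true means anagram
false means not anagram
The comparison takes linear time; the sorting itself takes O(n log n).
You can first get arrays of characters from these strings, then sort them, and then compare the two sorted arrays. This works for regular characters, and it also gives the right answer for the surrogate-pair example below. Note, however, that it sorts UTF-16 code units rather than code points, so two strings made of different surrogate pairs that happen to share the same halves would wrongly be reported as anagrams.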
import java.util.Arrays;

public class AnagramSort {
    public static void main(String[] args) {
        System.out.println(isAnagram("ABCD", "DCBA")); // true
        System.out.println(isAnagram("𝗔𝗕𝗖𝗗", "𝗗𝗖𝗕𝗔")); // true
    }

    static boolean isAnagram(String a, String b) {
        // invalid incoming data
        if (a == null || b == null
                || a.length() != b.length())
            return false;
        char[] aArr = a.toCharArray();
        char[] bArr = b.toCharArray();
        Arrays.sort(aArr);
        Arrays.sort(bArr);
        return Arrays.equals(aArr, bArr);
    }
}
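If inputs may contain characters outside the Basic Multilingual Plane, a variant that sorts code points instead of UTF-16 chars treats each surrogate pair as one character, which avoids false positives when different pairs share the same halves. A minimal sketch using String.codePoints() (Java 8+); the class name is illustrative:

```java
import java.util.Arrays;

public class AnagramCodePoints {
    // Compares sorted code points, so a surrogate pair counts as a single character
    public static boolean isAnagram(String a, String b) {
        if (a == null || b == null || a.length() != b.length()) return false;
        int[] aPoints = a.codePoints().sorted().toArray();
        int[] bPoints = b.codePoints().sorted().toArray();
        return Arrays.equals(aPoints, bPoints);
    }

    public static void main(String[] args) {
        // Same UTF-16 halves, different code points (U+1D5D4, U+1D9D5 vs U+1D5D5, U+1D9D4)
        String x = "\uD835\uDDD4\uD836\uDDD5";
        String y = "\uD835\uDDD5\uD836\uDDD4";
        System.out.println(isAnagram(x, y)); // false (sorting chars would say true)
    }
}
```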
See also: Check if one array is a subset of the other array - special case

Generate all Palindromic numbers in a given number system?

I need to generate all palindromic numbers in a given range for a given number base (which can be as large as 10,000). I need an efficient way to do it.
I stumbled upon this answer, which is related to base 10 directly. I'm trying to adapt it to work for "all" bases:
public static Set<String> allPalindromic(long limit, int base, char[] list) {
    Set<String> result = new HashSet<String>();
    for (long i = 0; i <= base - 1 && i <= limit; i++) {
        result.add(convert(i, base, list));
    }
    boolean cont = true;
    for (long i = 1; cont; i++) {
        StringBuffer rev = new StringBuffer("" + convert(i, base, list)).reverse();
        cont = false;
        for (char d : list) {
            String n = "" + convert(i, base, list) + d + rev;
            if (convertBack(n, base, list) <= limit) {
                cont = true;
                result.add(n);
            }
        }
    }
    return result;
}
convert() method converts a number to a string representation of that number in a given base using a list of chars for digits.
convertBack() converts back the string representation of a number to base 10.
When testing my method for base 10, it leaves out two-digit palindromes and then the next ones it leaves out are 1001,1111,1221... and so on.
I'm not sure why.
Here are the conversion methods if needed.
Turns out, this gets slower with my other code because of the constant conversions, since I need all the numbers in order and in decimal. I'll just stick to iterating over every integer, converting it to every base and then checking whether it's a palindrome.
I don't have enough reputation to comment, but if you are only missing even-length palindromes, then most probably there is something wrong with your list. Most probably you forgot to add an empty entry to the list: to generate 1001, it should be num(10) + empty("") + rev(01).
There aren't enough suitable characters to act as digits in every possible base (hex can use 0xDEADBEEF-style letters, and I suppose convert has some limit like 36), so forget about exotic digits and use plain lists or arrays such as [8888, 123, 5583] for digits in base 10000.
Then convert the limit into the needed base and store it.
Now generate symmetric arrays of odd and even length like
[175, 2, 175] or [13, 221, 221, 13]. If length is the same as limit length, compare array values and reject too high numbers.
You can also use limit array as starting and generate only palindromes with lesser values.
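A minimal sketch of the mirroring idea in Java, assuming digits are kept as plain ints (so bases up to 10,000 need no digit characters) and that the limit is small enough for every mirrored value to fit in a long. Each "half" h is mirrored twice: once without repeating its last digit (odd length, e.g. 12 → 121 in base 10) and once repeating it (even length, e.g. 12 → 1221), which covers the even-length case the question's method was missing. Unlike the answer, it checks the limit by converting back to a long rather than comparing digit arrays; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PalindromeGen {
    // Digits of n (n >= 1) in the given base, most significant first
    static List<Integer> digits(long n, int base) {
        List<Integer> d = new ArrayList<>();
        while (n > 0) {
            d.add((int) (n % base));
            n /= base;
        }
        Collections.reverse(d);
        return d;
    }

    static long value(List<Integer> d, int base) {
        long v = 0;
        for (int x : d) v = v * base + x;
        return v;
    }

    // Append h's digits in reverse; odd=true skips h's last digit so it becomes the middle
    static long mirror(long h, int base, boolean odd) {
        List<Integer> d = digits(h, base);
        List<Integer> full = new ArrayList<>(d);
        for (int i = d.size() - (odd ? 2 : 1); i >= 0; i--) full.add(d.get(i));
        return value(full, base);
    }

    // All palindromes in [0, limit] in the given base, in increasing order
    public static List<Long> palindromes(long limit, int base) {
        List<Long> out = new ArrayList<>();
        if (limit >= 0) out.add(0L);
        // Odd-length mirrors increase with h and are smaller than the even ones,
        // so stop at the first odd mirror above the limit
        for (long h = 1; mirror(h, base, true) <= limit; h++) {
            out.add(mirror(h, base, true));
            long even = mirror(h, base, false);
            if (even <= limit) out.add(even);
        }
        Collections.sort(out);
        return out;
    }
}
```

For example, palindromes(10, 2) gives 0, 1, 3, 5, 7 and 9 (binary 0, 1, 11, 101, 111 and 1001).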

Alternating string of char and int [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
When given an input string, I am supposed to break it up into two groups:
char
int
With these two groups I want to create a new alternating string.
For example
abc1234defgh567jk89
will transform into
a1b2c3d5e6f7j8k9
Notice that the digit 4 and the letters g and h have been discarded.
I figured that a queue could be used in this case.
queue1> abc
queue2> 123
Indexes 0 to 2 are chars.
Index 3 is an int, so for queue 2 we only take in 3 values.
My question: is there a more efficient data structure to perform this operation?
And during implementation, how do I check whether a particular value is an int or a char?
Please advise.
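Treating the string as an array of char codes makes this easier, because you can do a simple comparison on each entry: in ASCII, letters have codes above 64 and digits are 48 to 57, so if array[x] > 64 it is a letter, otherwise it is a digit (Character.isLetter and Character.isDigit do the same check more robustly). You can use two pointers to do the interleaving, one for letters and one for digits. Move the letter pointer to a letter and the digit pointer to a digit, advance them together as long as both are still on the right kind of character, then fast-forward both past the rest of their current runs. For example:
String s = "abc1234defgh567jk89";
StringBuilder output = new StringBuilder();
int n = s.length();
int letter = 0, number = 0; // indexes into s, one per kind of character
while (letter < n && number < n) {
    // Initialize: move each pointer to the start of its next run
    while (letter < n && !Character.isLetter(s.charAt(letter))) letter++;
    while (number < n && !Character.isDigit(s.charAt(number))) number++;
    // Interleave while both pointers are still inside their runs
    while (letter < n && number < n
            && Character.isLetter(s.charAt(letter)) && Character.isDigit(s.charAt(number))) {
        output.append(s.charAt(letter++)).append(s.charAt(number++));
    }
    // Fast-forward past the unmatched rest of each run (e.g. the 4, g and h)
    while (letter < n && Character.isLetter(s.charAt(letter))) letter++;
    while (number < n && Character.isDigit(s.charAt(number))) number++;
}
// output is now "a1b2c3d5e6f7j8k9"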
You are right, a queue is a good data structure for this problem. If, however, you want fancier methods at hand, a Linked List would be another very similar alternative.
To check if a particular value is a letter or a number, you can use the Character class. For example,
String sample = "hello1";
Character.isLetter( sample.charAt(0) ); // returns true
Character.isLetter( sample.charAt(5) ); // returns false
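The queue idea from the question can be sketched with java.util.ArrayDeque: fill one queue with a run of letters and the other with the run of digits that follows it, pair them off in FIFO order, and discard the leftovers. This is a minimal sketch assuming each group starts with letters, as in the example; the class name is illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AlternateWithQueues {
    public static String alternate(String s) {
        Deque<Character> letters = new ArrayDeque<>();
        Deque<Character> digits = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // queue1: the next run of letters; queue2: the run of digits right after it
            while (i < s.length() && Character.isLetter(s.charAt(i))) letters.add(s.charAt(i++));
            while (i < s.length() && Character.isDigit(s.charAt(i))) digits.add(s.charAt(i++));
            // Pair them off in FIFO order until one queue runs out
            while (!letters.isEmpty() && !digits.isEmpty()) {
                out.append(letters.poll()).append(digits.poll());
            }
            // Whatever is left over (like the 4, g and h) is discarded
            letters.clear();
            digits.clear();
            // Skip any character that is neither a letter nor a digit
            if (i < s.length() && !Character.isLetterOrDigit(s.charAt(i))) i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(alternate("abc1234defgh567jk89")); // a1b2c3d5e6f7j8k9
    }
}
```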
how do I compare to see if a particular value is an int or a char?
You can do something like this:
String string = "abc1234defgh567jk89";
for (int i = 0; i < string.length(); i++) {
    int c = (int) string.charAt(i);
    boolean isChar = 97 <= c && c <= 122 || 65 <= c && c <= 90; // 'a'-'z' or 'A'-'Z'
    boolean isNum = 48 <= c && c <= 57;                          // '0'-'9'
    if (!isChar && !isNum) {
        throw new IllegalArgumentException("I don't know what you are");
    }
}
About the data structures, personally I would use two singly linked lists, one for letters and one for digits, with every character stored in its own node. Why? If you store the characters (letters and digits alike) in groups of three, you will later need extra code to split those groups apart and pair letters with digits. Putting them in a linked list makes sense because:
you can add as many nodes as you want (or as many as memory allows, but let's assume it is unlimited)
data is stored in order (which seems to be a requirement for producing the output; this also rules out trees and stacks (LIFO))
since you only need to move forward when generating the output, a doubly linked list would be over-engineering.
To generate the output:
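Having two data structures lets you add another check like this one (only useful if inputs must contain as many letters as digits; the sample input has 10 letters and 9 digits, so it would be rejected):
if (listChars.size() != listNums.size()) {
    throw new IllegalArgumentException("Wrong input!!!");
}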
Additionally,
Walking through each list takes O(n) time and the lists use O(n) memory; pairing the two lists up is also O(n).
You can do that like this:
Iterator<Character> iterChar = listChars.iterator();
Iterator<Integer> iterNum = listNums.iterator();
StringBuilder result = new StringBuilder();
while (iterChar.hasNext() && iterNum.hasNext()) {
    // append separately: Character + Integer would be numeric addition, not concatenation
    result.append(iterChar.next()).append(iterNum.next());
}
Finally, you can use either queues or linked lists here; both give you the same result in this scenario.
To check if the next char is a letter or number you can use this:
public static boolean isNumber(char c) { return c >= '0' && c <= '9'; }
public static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; } // lowercase only, enough for the example
These functions find the index of the next number or letter, starting at position i:
public static int nextNumber(String s, int i) {
    while (i < s.length() && !isNumber(s.charAt(i))) i++;
    return i;
}
public static int nextLetter(String s, int i) {
    while (i < s.length() && !isLetter(s.charAt(i))) i++;
    return i;
}
You don't really need a data structure; all you need is 3 pointers:
public static String alternate(String s) {
    // pointers
    int start = 0, mid = 0, end = 0;
    StringBuilder sb = new StringBuilder(s.length());
    while (end < s.length()) {
        // E.g. for 'abc1234' {start, mid, end} = {0, 3, 7}
        start = Math.min(nextLetter(s, end), nextNumber(s, end));
        mid = Math.max(nextLetter(s, end), nextNumber(s, end));
        end = Math.max(nextLetter(s, mid), nextNumber(s, mid));
        for (int i = 0; i < Math.min(mid - start, end - mid); i++)
            sb.append(s.charAt(start + i)).append(s.charAt(mid + i));
    }
    return sb.toString();
}
Running the example below outputs the desired result: a1b2c3d5e6f7j8k9
public static void main(String... args) {
    System.out.println(alternate("abc1234defgh567jk89"));
}

Sudden slow-down and java.lang.OutOfMemoryError during Java string search

I am writing a program for pattern discovery in RNA sequences that mostly works. In order to find 'patterns' in the sequences, I am generating some possible patterns and scanning through the input file of all sequences for them (there's more to the algorithm, but this is the bit that is breaking). Possible patterns generated are of a specified length given by the user.
This works well for all pattern lengths up to 8 characters. At 9, however, the program runs for a very long time and then throws a java.lang.OutOfMemoryError. After some debugging, I found that the weak point is the pattern generation method:
/* Get elementary pattern (ep) substrings, to later combine into full patterns */
public static void init_ep_subs(int length) {
    ep_subs = new ArrayList<Substring>(); // clear static ep_subs data field
    /* ep subs are of the form C1...C2...C3 where C1, C2, C3 are characters in the
       alphabet and the whole length of the string is equal to the input parameter
       'length'. The number of dots varies for different lengths.
       The middle character C2 can occur instead of any dot, or not at all. */
    for (int i = 1; i < length - 1; i++) { // for each potential position of C2
        // for each alphabet character to be C1
        for (int first = 0; first < alphabet.length; first++) {
            // for each alphabet character to be C3
            for (int last = 0; last < alphabet.length; last++) {
                // make blank pattern, i.e. no C2
                // (note: this pattern does not depend on i, so it is searched once per i)
                Substring s_blank = new Substring(-1, alphabet[first],
                                                  '0', alphabet[last]);
                // get its frequency in the input string
                s_blank.occurrences = search_sequences(s_blank.toString());
                // if blank ep is found frequently enough in the input string, store it
                if (s_blank.frequency() >= nP) ep_subs.add(s_blank);
                // when C2 is present, for each character it could be
                for (int mid = 0; mid < alphabet.length; mid++) {
                    // make pattern C1,C2,C3
                    Substring s = new Substring(i, alphabet[first],
                                                alphabet[mid],
                                                alphabet[last]);
                    // search input string for pattern s
                    s.occurrences = search_sequences(s.toString());
                    // if s is frequent enough, store it
                    if (s.frequency() >= nP) ep_subs.add(s);
                }
            }
        }
    }
}
Here's what happens: When I time the calls to search_sequences, they start out at around 40-100ms each and carry on that way for the first patterns. Then after a couple hundred patterns (around 'C.....G.C') those calls suddenly start to take about ten times as long, 1000-2000ms. After that, the times steadily increase until at about 12000ms ('C......TA') it gives this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
at java.nio.CharBuffer.toString(CharBuffer.java:1157)
at java.util.regex.Matcher.toMatchResult(Matcher.java:232)
at java.util.Scanner.match(Scanner.java:1270)
at java.util.Scanner.hasNextLine(Scanner.java:1478)
at PatternFinder4.search_sequences(PatternFinder4.java:217)
at PatternFinder4.init_ep_subs(PatternFinder4.java:256)
at PatternFinder4.main(PatternFinder4.java:62)
This is the search_sequences method:
/* Searches the input string 'sequences' for occurrences of the parameter string 'sub' */
public static ArrayList<int[]> search_sequences(String sub) {
    /* arraylist returned holding int arrays with coordinates of the places where 'sub'
       was found, i.e. {l,i} l = line number, i = index within line */
    ArrayList<int[]> occurrences = new ArrayList<int[]>();
    s = new Scanner(sequences);
    int line_index = 0;
    String line = "";
    while (s.hasNextLine()) {
        line = s.nextLine();
        pattern = Pattern.compile(sub);
        matcher = pattern.matcher(line);
        pattern = null; // all the =nulls were intended to help memory management, had no effect
        int index = 0;
        // for each occurrence of 'sub' in the line being scanned
        while (matcher.find(index)) {
            int start = matcher.start(); // get the index of the next occurrence
            int[] occurrence = {line_index, start}; // make up the coordinate array
            occurrences.add(occurrence); // store that occurrence
            index = start + 1; // start looking from after the last occurrence found
        }
        matcher = null;
        line = null;
        line_index++;
    }
    s = null;
    return occurrences;
}
I've tried the program on a couple of different computers of differing speeds, and while the actual times to complete search_sequences are shorter on faster computers, the relative times are the same: at around the same number of iterations, search_sequences starts taking ten times as long to complete.
I've tried googling about memory efficiency and speed of different input streams such as BufferedReader etc, but the general consensus seems to be that they are all roughly equivalent to Scanner. Do any of you have any advice about what this bug is or how I could try to figure it out myself?
If anyone wants to see any more of the code, just ask.
EDIT:
1 - The input file 'sequences' is 1000 protein sequences (each on one line) of varying lengths around a couple hundred characters. I should also mention this program will /only ever need to work/ up to patterns of length nine.
2 - Here are the Substring class methods used in the above code
static class Substring {
    int residue; // position of the middle character C2
    char front, mid, end; // alphabet characters for C1, C2 and C3
    ArrayList<int[]> occurrences; // list of positions the substring occurs in 'sequences'
    String string; // string representation of the substring

    public Substring(int inresidue, char infront, char inmid, char inend) {
        occurrences = new ArrayList<int[]>();
        residue = inresidue;
        front = infront;
        mid = inmid;
        end = inend;
        setString(); // makes the string representation using characters and their positions
    }

    /* gets the frequency of the substring given the places it occurs in 'sequences'.
       This only counts the substring /once per line it occurs in/. */
    public int frequency() {
        return PatternFinder.frequency(occurrences);
    }

    public String toString() {
        return string;
    }

    /* makes the string representation using the substring's characters and their positions */
    private void setString() {
        if (residue > -1) {
            String left_mid = "";
            for (int j = 0; j < residue - 1; j++) left_mid += ".";
            String right_mid = "";
            for (int j = residue + 1; j < length - 1; j++) right_mid += ".";
            string = front + left_mid + mid + right_mid + end;
        } else {
            String mid = "";
            for (int i = 0; i < length - 2; i++) mid += ".";
            string = front + mid + end;
        }
    }
}
... and the PatternFinder.frequency method (called in Substring.frequency()) :
public static int frequency(ArrayList<int[]> occurrences) {
    HashSet<String> lines_present = new HashSet<String>();
    for (int[] occurrence : occurrences) {
        lines_present.add(new String(occurrence[0] + ""));
    }
    return lines_present.size();
}
What is alphabet? What kind of regexs are you giving it? Have you checked the number of occurrences you're storing? It's possible that simply storing the occurrences is enough to make it run out of memory, since you're doing an exponential number of searches.
It sounds like your algorithm has a hidden exponential resource usage. You need to rethink what you are trying to do.
Also, setting a local variable to null won't help since the JVM already does data flow and liveness analysis.
Edit: Here's a page that explains how even short regexes can take an exponential amount of time to run.
I can't spot an obvious memory leak, but your program does have a number of inefficiencies. Here are some recommendations:
Indent your code properly. It will make reading it, both for you and for others, much easier. In its current form it's very hard to read.
If you're referring to a member variable, prefix it with this., otherwise readers of code snippets won't know for sure what you're referring to.
Avoid static members and methods unless they're absolutely necessary. When referring to them, use the Classname.membername form, for the same reasons.
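frequency() counts distinct line numbers, so it is not the same as occurrences.size(), but creating a new String for every occurrence just to deduplicate line numbers is wasteful; a HashSet<Integer>, or simply counting the lines that contain at least one match, does the same job.
In search_sequences(), the regex string sub is constant for the whole call. You only need to compile it once, but you are recompiling it for every line.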
Split the input string (sequences) into lines once and store them in an array or ArrayList. Don't re-split inside search_sequences(), pass the split collection in.
There are probably more things to fix, but this is the list that jumps out.
Fix all these and if you still have problems, you may need to use a profiler to find out what's happening.
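A sketch of what the search could look like after compiling once per pattern and splitting once, assuming the caller compiles each pattern with Pattern.compile(s.toString()) and only collects the full occurrence list for patterns that pass the nP frequency threshold, so no int[] arrays are allocated for patterns that are thrown away anyway. The class and method names are illustrative, not from the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequenceSearch {
    // Split the input into lines once, instead of re-scanning it on every search
    public static List<String> splitLines(String sequences) {
        return Arrays.asList(sequences.split("\\R"));
    }

    // Number of distinct lines with at least one match (what frequency() computes),
    // without allocating an int[] per occurrence or a String per line number
    public static int lineFrequency(List<String> lines, Pattern pattern) {
        Matcher matcher = pattern.matcher("");
        int count = 0;
        for (String line : lines) {
            if (matcher.reset(line).find()) count++;
        }
        return count;
    }

    // All {line, index} occurrences, including overlapping ones, like the original loop
    public static List<int[]> occurrences(List<String> lines, Pattern pattern) {
        List<int[]> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(""); // one Matcher reused for every line
        for (int lineIndex = 0; lineIndex < lines.size(); lineIndex++) {
            matcher.reset(lines.get(lineIndex));
            int index = 0;
            while (matcher.find(index)) {
                result.add(new int[] {lineIndex, matcher.start()});
                index = matcher.start() + 1; // allow overlapping matches
            }
        }
        return result;
    }
}
```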
