Find every possible subset given a string [duplicate] - java

This question already has answers here:
Memory efficient power set algorithm
(5 answers)
Closed 8 years ago.
I'm trying to find every possible anagram of a string in Java - By this I mean that if I have a 4 character long word I want all the possible 3 character long words derived from it, all the 2 character long and all the 1 character long. The most straightforward way I tought of is to use two nested for loops and iterare over the string. This is my code as of now:
private ArrayList<String> subsets(String word){
ArrayList<String> s = new ArrayList<String>();
int length = word.length();
for (int c=0; c<length; c++){
for (int i=0; i<length-c; i++){
String sub = word.substring(c, c+i+1);
System.out.println(sub);
//if (!s.contains(sub) && sub!=null)
s.add(sub);
}
}
//java.util.Collections.sort(s, new MyComparator());
//System.out.println(s.toString());
return s;
}
My problem is that it works for 3 letter words, fun yelds this result (Don't mind the ordering, the word is processed so that I have a string with the letters in alphabetical order):
f
fn
fnu
n
nu
u
But when I try 4 letter words, it leaves something out, as in catq gives me:
a
ac
acq
acqt
c
cq
cqt
q
qt
t
i.e., I don't see the 3 character long word act - which is the one I'm looking for when testing this method. I can't understand what the problem is, and it's most likely a logical error I'm making when creating the substrings. If anyone can help me out, please don't give me the code for it but rather the reasoning behind your solution. This is a piece of coursework and I need to come up with the code on my own.
EDIT: to clear something out, for me acq, qca, caq, aqc, cqa, qac, etc. are the same thing - To make it even clearer, what happens is that the string gets sorted in alphabetical order, so all those permutations should come up as one unique result, acq. So, I don't need all the permutations of a string, but rather, given a 4 character long string, all the 3 character long ones that I can derive from it - that means taking out one character at a time and returning that string as a result, doing that for every character in the original string.
I hope I have made my problem a bit clearer

It's working fine, you just misspelled "caqt" as "acqt" in your tests/input.
(The issue is probably that you're sorting your input. If you want substrings, you have to leave the input unsorted.)
After your edits: see Generating all permutations of a given string Then just sort the individual letters, and put them in a set.

Ok, as you've already devised your own solution, I'll give you my take on it. Firstly, consider how big your result list is going to be. You're essentially taking each letter in turn, and either including it or not. 2 possibilities for each letter, gives you 2^n total results, where n is the number of letters. This of course includes the case where you don't use any letter, and end up with an empty string.
Next, if you enumerate every possibility with a 0 for 'include this letter' and a 1 for don't include it, taking your 'fnu' example you end up with:
000 - ''
001 - 'u'
010 - 'n'
011 - 'nu'
100 - 'f'
101 - 'fu' (no offense intended)
110 - 'fn'
111 - 'fnu'.
Clearly, these are just binary numbers, and you can derive a function that given any number from 0-7 and the three letter input, will calculate the corresponding subset.
It's fairly easy to do in java.. don't have a java compiler to hand, but this should be approximately correct:
public string getSubSet(string input, int index) {
// Should check that index >=0 and < 2^input.length here.
// Should also check that input.length <= 31.
string returnValue = "";
for (int i = 0; i < input.length; i++) {
if (i & (1 << i) != 0) // 1 << i is the equivalent of 2^i
returnValue += input[i];
}
return returnValue;
}
Then, if you need to you can just do a loop that calls this function, like this:
for (i = 1; i < (1 << input.length); i++)
getSubSet(input, i); // this doesn't do anything, but you can add it to a list, or output it as desired.
Note I started from 1 instead of 0- this is because the result at index 0 will be the empty string. Incidentally, this actually does the least significant bit first, so your output list would be 'f', 'n', 'fn', 'u', 'fu', 'nu', 'fnu', but the order didn't seem important.

This is the method I came up with, seems like it's working
private void subsets(String word, ArrayList<String> subset){
if(word.length() == 1){
subset.add(word);
return;
}
else {
String firstChar = word.substring(0,1);
word = word.substring(1);
subsets(word, subset);
int size = subset.size();
for (int i = 0; i < size; i++){
String temp = firstChar + subset.get(i);
subset.add(temp);
}
subset.add(firstChar);
return;
}
}
What I do is check if the word is bigger than one character, otherwise I'll add the character alone to the ArrayList and start the recursive process. If it is bigger, I save the first character and make a recursive call with the rest of the String. What happens is that the whole string gets sliced in characters saved in the recursive stack, until I hit the point where my word has become of length 1, only one character remaining.
When that happens, as I said at the start, the character gets added to the List, now the recursion starts and it looks at the size of the array, in the first iteration is 1, and then with a for loop adds the character saved in the stack for the previous call concatenated with every element in the ArrayList. Then it adds the character on its own and unwinds the recursion again.
I.E., with the word funthis happens:
f saved
List empty
recursive call(un)
-
u saved
List empty
recursive call(n)
-
n.length == 1
List = [n]
return
-
list.size=1
temp = u + list[0]
List = [n, un]
add the character saved in the stack on its own
List = [n, un, u]
return
-
list.size=3
temp = f + list[0]
List = [n, un, u, fn]
temp = f + list[1]
List = [n, un, u, fn, fun]
temp = f + list[2]
List = [n, un, u, fn, fun, fu]
add the character saved in the stack on its own
List = [n, un, u, fn, fun, fu, f]
return
I have been as clear as possible, I hope this clarifies what was my initial problem and how to solve it.

This is working code:
public static void main(String[] args) {
String input = "abcde";
Set<String> returnList = permutations(input);
System.out.println(returnList);
}
private static Set<String> permutations(String input) {
if (input.length() == 1) {
Set<String> a = new TreeSet<>();
a.add(input);
return a;
}
Set<String> returnSet = new TreeSet<>();
for (int i = 0; i < input.length(); i++) {
String prefix = input.substring(i, i + 1);
Set<String> permutations = permutations(input.substring(i + 1));
returnSet.add(prefix);
returnSet.addAll(permutations);
Iterator<String> it = permutations.iterator();
while (it.hasNext()) {
returnSet.add(prefix + it.next());
}
}
return returnSet;
}

Related

Inserting letters into a string at every possible spot [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
I am looking to insert a single letter one by one into every possible index of a string.
For example the string ry
Would go "ary" "bry" "cry" ... "zry" ... "ray" "rby" .... "rzy" ... "rya" "ryb"
I am not sure how to begin, any help?
Try this
System.out.println("originaltext".replaceAll(".{1}","$0ry"));
The above is using the String replaceAll(String regex, String replacement) method - "Replaces each substring of this string that matches the given regular expression with the given replacement."
".{1}" - The regular expression used to find exactly one occurrence({1}) of any character(.)
"$0ry" - The replacement string with "$0" for the matched value followed by the required characters(ry).
This is repeated for all matches!
Example Code
String originalString = /* your original string */;
char[] characters = /* array of characters you want to insert */;
Vector<String> newStrings = new Vector<>();
String newString;
for (int idx = 0; idx < originalString.length() + 1; ++idx) {
for (char ch : characters) {
newString = originalString.substring(0, idx)
+ ch
+ originalString.substring(idx, originalString.length());
newStrings.add(newString);
}
}
Explanation
Processing all cases:
In order to insert every single letter into an index in a string, you need a loop to iterate through every letter.
In order to insert a letter into every index in a string, you need a loop to iterate through every index in the string.
To do both at once, you should nest one loop inside the other. That way, every combination of an index and a character will be processed. In the problem you presented, it does not matter which loop goes inside the other--it will work either way.
(you actually have to iterate through every index in the string +1... I explain why below)
Forming the new string:
First, it is important to note the following:
What you want to do is not "insert a character into an index" but rather "insert a character between two indices". The distinction is important because you do not want to replace the previous character at that index, but rather move all characters starting at that index to the right by one index in order to make room for a new character "at that index."
This is why you must iterate through every index of the original string plus one. Because once you "insert" the character, the length of the new string is actually equal to originalString.length() + 1, i.e. there are n + 1 possible locations where you can "insert" the character.
Considering this, the way you actually form a new string (in the way you want to) is by getting everything to the left of your target index, getting everything to the right of your target index, and then concatenating them with the new character in between, e.g. leftSubstring + newCharacter + rightSubstring.
Now, it might seem that this would not work for the very first and very last index, because the leftSubstring and/or rightSubstring would be an empty string. However, string concatenation still works even with an empty string.
Notes about Example Code
characters can also be any collection that implements iterable. It does not have to be a primitive array.
characters does not have to contain primitive char elements. It may contain any type that can be concatenated with a String.
Note that the substring(int,int) method of String returns the substring including the character at beginIndex but not including the character at endIndex. One implication of this is that endIndex may be equal to string.length() without any problems.
Well you will need a loop for from i = 0 to your string's length
Then another loop for every character you want to insert - so if you want to keep creating new strings with every possible letter from A to Z make a loop from char A = 'a' to 'z' and keep increasing them ++A(this should work in java).
This should give you some ideas.
Supposing, for instance, you want them all in a list, you could do something like that (iterating all places to insert and iterating over all letters):
List<String> insertLetters(String s) {
List<String> list = new ArrayList<String>();
for (int i = 0; i <= s.length(); i++) {
String prefix = s.substring(0, i);
String postfix = s.substring(i, s.length());
for (char letter = 'a'; letter <= 'z'; letter++) {
String newString = prefix + letter + postfix;
list.add(newString);
}
}
return list;
}
String x = "ry";
char[] alphabets = "abcdefghijklmnopqrstuvwxyz".toCharArray();
String[] alphabets2 = new String[alphabets.length];
for (int i=0;i<alphabets.length;i++){
char z = alphabets[i];
alphabets2[i] = z + x;
}
for (String s: alphabets2
) {
System.out.println(s);
}
First of all you would need a Array of alphabet (Easier for you to continue).
I will not give you exact answer, but something to start, so you would learn.
int length = 0;
Arrays here
while(length <= 2) {
think what would be here: hint you put to index (length 0) the all alphabet and move to index 1 and then 2
}

Alternating string of char and int [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
when given a input string i am suppose to break it up into two groups
char
int.
with these two groups i want to create a new alternating string.
for example
abc1234defgh567jk89
will transform into
a1b2c3d5e6f7j8k9
notice that the digit 4,g,h has been discarded.
i figured that a queue can be implemented in this case.
queue1> abc
queue2> 123
index 0 to 2 is a char
index three is a int, so for queue 2 we only take in 3 values.
my question is there a more efficient data structure to perform this operation?
and during implementation, how to i compare to see if a particular value is a int or a char?
please advise.
Treating the string as an array of char integers would make this easier to compare, as you can do a simple comparision on the entry. If array[x]>64 it is a character else it is a number. You can use two pointers to do the interleaving. One for character and the other for integer. Find a character and then advance the integer pointer until it finds a match, then advance them as long as they are both true, then fast forward both of them. For example:
char array[]=(char *)string;
int letter=array[0];
int number=array[0];
// Initialize
while(number >= 64)
number++;
while (letter<64)
letter++;
//Now that the pointers are initialized, interleave them.
while(letter>=64 && number<64)
{
output[i++]=letter;
output[i++]=number;
number++;
letter++;
}
// Now you need to advance to the next batch, so you need to see the comparison false and then true again.
....
You are right, a queue is a good data structure for this problem. If, however, you want fancier methods at hand, a Linked List would be another very similar alternative.
To check if a particular value is a letter or a number, you can use the Character class. For example,
String sample = "hello1";
Character.isLetter( sample.charAt(0) ); // returns true
Character.isLetter( sample.charAt(5) ); // returns false
how to i compare to see if a particular value is a int or a char?
You can do something like this:
String string = "abc1234defgh567jk89";
for(int i=0; i<string.length;i++){
int c = (int)string.charAt(i);
boolean isChar = 97<=c&&c<=122 || 65<=c&&c<=90;
boolean isNum = 48<=c&&c<=57;
if(!isChar && !isNum){
throw new IllegalArgumentException("I don't know what you are")
}
}
About the datastructutures, personally I will use two single linked list, one for chars and one for numbers and every character will be stored in a different node. Why?, well if you store the characters (in general, I mean chars and ints) in groups of threes later you will have to add more code to split those groups and put chars and ints together, putting them in a linked list makes sense because
you can put as much nodes as you want (or memory lets you but let's assume is infinite)
data will be stored in order (which looks like some kind of requirement you have in order to display the output, also this discards trees and stacks(FILO))
since you only need to go forward when generating the output a double linked list will be over engineering.
To generate the output:
Having two datastructures let's you add another check like:
if(listChars.size() != listNums.size()){
throw new IllegalArgumentException("Wrong input!!!")
}
Additionally,
Reviewing the list will take you O(n) time, memory used will be O(n), reviewing both list will take you O(n/m) where m is the size of the initial group of chars.
You can do that like this:
Iterator<Character> iterChar = listChar.iterator();
Iterator<Integer> iterNum = listChar.iterator();
String result = "";
while(iterChar.hasNext() && iterNum.hasNext() ){
result+=iterChar.next()+iterNum.next();
}
Finally, you can use queues or linked list here both give you the same in this scenario
To check if the next char is a letter or number you can use this:
public static boolean isNumber(char c) { return c >= '0' && c <= '9'; }
public static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; }
These functions find the index of the next number or letter, starting at pos i:
public static int nextNumber(String s, int i) {
while(i < s.length() && !isNumber(s.charAt(i))) i++;
return i;
}
public static int nextLetter(String s, int i) {
while(i < s.length() && !isLetter(s.charAt(i))) i++;
return i;
}
You don't really need a data structure, all you need is 3 pointers:
public static String alternate(String s){
// pointers
int start = 0, mid = 0, end = 0;
StringBuilder sb = new StringBuilder(s.length());
while(end < s.length()){
// E.g. for 'abc1234' {start, mid, end} = {0, 3, 7}
start = Math.min(nextLetter(s, end), nextNumber(s, end));
mid = Math.max(nextLetter(s, end), nextNumber(s, end));
end = Math.max(nextLetter(s, mid), nextNumber(s, mid));
for(int i = 0; i < Math.min(mid - start, end - mid); i++)
sb.append(s.charAt(start + i)).append(s.charAt(mid + i));
}
return sb.toString();
}
Running the example below outputs the desired result: a1b2c3d5e6f7j8k9
public static void main(String... args){
System.out.println(alternate("abc1234defgh567jk89"));
}

Finding the index of a permutation within a string

I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced worked providing that the word that the answer begins with has not occurred previously. In the 2nd example here, my code will fail because dog has already appeared before where the answer begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.
If you concatenate the strings together and use the new string to search with.
String a = "dog"
String b = "cat"
String c = a+b; //output of c would be "dogcat"
Like this you would overcome the problem of dog appearing somewhere.
But this wouldn't work if catdog is a valid value too.
Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.
Perhaps not the best optimized version, but how about following theory to give you some ideas:
Count length of all words in row.
Take random word from list and find the starting index of its first
occurence.
Take a substring with length counted above before and after that
index (e.g. if index is 15 and 3 words of 4 letters long, take
substring from 15-8 to 15+11).
Make a copy of the word list with earlier random word removed.
Check the appending/prepending [word_length] letters to see if they
match a new word on the list.
If word matches copy of list, remove it from copy of list and move further
If all words found, break loop.
If not all words found, find starting index of next occurence of
earlier random word and go back to 3.
Why it would help:
Which word you pick to begin with wouldn't matter, since every word
needs to be in the succcessful match anyway.
You don't have to manually loop through a lot of the characters,
unless there are lots of near complete false matches.
As a supposed match keeps growing, you have less words on the list copy left to compare to.
Can also keep track or furthest index you've gone to, so you can
sometimes limit the backwards length of picked substring (as it
cannot overlap to where you've already been, if the occurence are
closeby to each other).

Sudden slow-down and java.lang.OutOfMemoryError during Java string search

I am writing a program for pattern discovery in RNA sequences that mostly works. In order to find 'patterns' in the sequences, I am generating some possible patterns and scanning through the input file of all sequences for them (there's more to the algorithm, but this is the bit that is breaking). Possible patterns generated are of a specified length given by the user.
This works well for all sequence lengths up to 8 characters long. Then at 9, the program runs for an very long time, then gives a java.lang.OutOfMemoryError. After some debugging, I found that the weak point is the pattern generation method:
/* Get elementary pattern (ep) substrings, to later combine into full patterns */
public static void init_ep_subs(int length) {
ep_subs = new ArrayList<Substring>(); // clear static ep_subs data field
/* ep subs are of the form C1...C2...C3 where C1, C2, C3 are characters in the
alphabet and the whole length of the string is equal to the input parameter
'length'. The number of dots varies for different lengths.
The middle character C2 can occur instead of any dot, or not at all.*/
for (int i = 1; i < length-1; i++) { // for each potential position of C2
// for each alphabet character to be C1
for (int first = 0; first < alphabet.length; first++) {
// for each alphabet character to be C3
for (int last = 0; last < alphabet.length; last++) {
// make blank pattern, i.e. no C2
Substring s_blank = new Substring(-1, alphabet[first],
'0', alphabet[last]);
// get its frequency in the input string
s_blank.occurrences = search_sequences(s_blank.toString());
// if blank ep is found frequently enough in the input string, store it
if (s_blank.frequency()>=nP) ep_subs.add(s_blank);
// when C2 is present, for each character it could be
for (int mid = 0; mid < alphabet.length; mid++) {
// make pattern C1,C2,C3
Substring s = new Substring(i, alphabet[first],
alphabet[mid],
alphabet[last]);
// search input string for pattern s
s.occurrences = search_sequences(s.toString());
// if s is frequent enough, store it
if (s.frequency()>=nP) ep_subs.add(s);
}
}
}
}
}
Here's what happens: When I time the calls to search_sequences, they start out at around 40-100ms each and carry on that way for the first patterns. Then after a couple hundred patterns (around 'C.....G.C') those calls suddenly start to take about ten times as long, 1000-2000ms. After that, the times steadily increase until at about 12000ms ('C......TA') it gives this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
at java.nio.CharBuffer.toString(CharBuffer.java:1157)
at java.util.regex.Matcher.toMatchResult(Matcher.java:232)
at java.util.Scanner.match(Scanner.java:1270)
at java.util.Scanner.hasNextLine(Scanner.java:1478)
at PatternFinder4.search_sequences(PatternFinder4.java:217)
at PatternFinder4.init_ep_subs(PatternFinder4.java:256)
at PatternFinder4.main(PatternFinder4.java:62)
This is the search_sequences method:
/* Searches the input string 'sequences' for occurrences of the parameter string 'sub' */
public static ArrayList<int[]> search_sequences(String sub) {
/* arraylist returned holding int arrays with coordinates of the places where 'sub'
was found, i.e. {l,i} l = lines number, i = index within line */
ArrayList<int[]> occurrences = new ArrayList<int[]>();
s = new Scanner(sequences);
int line_index = 0;
String line = "";
while (s.hasNextLine()) {
line = s.nextLine();
pattern = Pattern.compile(sub);
matcher = pattern.matcher(line);
pattern = null; // all the =nulls were intended to help memory management, had no effect
int index = 0;
// for each occurrence of 'sub' in the line being scanned
while (matcher.find(index)) {
int start = matcher.start(); // get the index of the next occurrence
int[] occurrence = {line_index, start}; // make up the coordinate array
occurrences.add(occurrence); // store that occurrence
index = start+1; // start looking from after the last occurence found
}
matcher=null;
line=null;
line_index++;
}
s=null;
return occurrences;
}
I've tried the program on a couple of different computers of differing speeds, and while the actual times time complete search_sequence are smaller on faster computers, the relative times are the same; at around the same number of iterations, search_sequence starts taking ten times as long to complete.
I've tried googling about memory efficiency and speed of different input streams such as BufferedReader etc, but the general consensus seems to be that they are all roughly equivalent to Scanner. Do any of you have any advice about what this bug is or how I could try to figure it out myself?
If anyone wants to see any more of the code, just ask.
EDIT:
1 - The input file 'sequences' is 1000 protein sequences (each on one line) of varying lengths around a couple hundred characters. I should also mention this program will /only ever need to work/ up to patterns of length nine.
2 - Here are the Substring class methods used in the above code
static class Substring {
int residue; // position of the middle character C2
char front, mid, end; // alphabet characters for C1, C2 and C3
ArrayList<int[]> occurrences; // list of positions the substring occurs in 'sequences'
String string; // string representation of the substring
public Substring(int inresidue, char infront, char inmid, char inend) {
occurrences = new ArrayList<int[]>();
residue = inresidue;
front = infront;
mid = inmid;
end = inend;
setString(); // makes the string representation using characters and their positions
}
/* gets the frequency of the substring given the places it occurs in 'sequences'.
This only counts the substring /once per line ist occurs in/. */
public int frequency() {
return PatternFinder.frequency(occurrences);
}
public String toString() {
return string;
}
/* makes the string representation using the substring's characters and their positions */
private void setString() {
if (residue>-1) {
String left_mid = "";
for (int j = 0; j < residue-1; j++) left_mid += ".";
String right_mid = "";
for (int j = residue+1; j < length-1; j++) right_mid += ".";
string = front + left_mid + mid + right_mid + end;
} else {
String mid = "";
for (int i = 0; i < length-2; i++) mid += ".";
string = front + mid + end;
}
}
}
... and the PatternFinder.frequency method (called in Substring.frequency()) :
public static int frequency(ArrayList<int[]> occurrences) {
HashSet<String> lines_present = new HashSet<String>();
for (int[] occurrence : occurrences) {
lines_present.add(new String(occurrence[0]+""));
}
return lines_present.size();
}
What is alphabet? What kind of regexs are you giving it? Have you checked the number of occurrences you're storing? It's possible that simply storing the occurrences is enough to make it run out of memory, since you're doing an exponential number of searches.
It sounds like your algorithm has a hidden exponential resource usage. You need to rethink what you are trying to do.
Also, setting a local variable to null won't help since the JVM already does data flow and liveness analysis.
Edit: Here's a page that explains how even short regexes can take an exponential amount of time to run.
I can't spot an obvious memory leak, but your program does have a number of inefficiencies. Here are some recommendations:
Indent your code properly. It will make reading it, both for you and for others, much easier. In its current form it's very hard to read.
If you're referring to a member variable, prefix it with this., otherwise readers of code snippets won't know for sure what you're referring to.
Avoid static members and methods unless they're absolutely necessary. When referring to them, use the Classname.membername form, for the same reasons.
How is the code of frequency() different from just return occurrences.size()?
In search_sequences(), the regex string sub is a constant. You need to compile it only once, but you're recompiling it for every line.
Split the input string (sequences) into lines once and store them in an array or ArrayList. Don't re-split inside search_sequences(), pass the split collection in.
There are probably more things to fix, but this is the list that jumps out.
Fix all these and if you still have problems, you may need to use a profiler to find out what's happening.

Homework: Java IO Streaming and String manipulation

In Java,
I need to read lines of text from a file and then reverse each line, writing the reversed version into another file. I know how to read from one file and write to another. What I don't know how to do is manipulate the text so that "This is line 1" would be written into the second file as "1 enil si sihT"
since these are homeworks you are probably interested in your own implementation of reverse method.
The naive version visits the string backwards (from the last index to the index 0) while copying it in a StringBuilder:
public String reverse(String s) {
StringBuilder sb = new StringBuilder();
for (int i = s.length() - 1; i >= 0; i--) {
sb.append(s.charAt(i));
}
return sb.toString();
}
for example the String "hello":
H e l l o
0 1 2 3 4 // indexes for charAt()
the method start by the index 4 ('o') then the index 3 ('l') ... until 0 ('H').
StringBuilder buffer = new StringBuilder(theString);
return buffer.reverse().toString();
If this is homework, it would be better for you to understand how are data stored into the string it self.
A string may be represented as an array of characters
String line = // read line ....;
char [] data = line.toCharArray();
To reverse an array you have to swap the positions of the elements. The first in the last, the last in the first and so on.
int l = data.length;
char temp;
temp = data[0]; // put the first element in "temp" to avoid losing it.
data[0] = data[l - 1]; // put the last value in the first;
data[l - 1] = temp; // and the first in the last.
Continue with the rest of the elements ( hint use a loop ) in the array and then create a new String with the result:
String modifiedString = new String( data ); // where data is the reversed array.
If is not ( and you really just need to have the work done ) use:
StringBuilder.reverse()
Good luck.
String reversed = new StringBuilder(textLine).reverse().toString();
The provided answers all suggest using an already existing method, which is sound advice and usually more effective than writing your own.
Depending on the assignment, however, your teacher might expect you to write a method of your own. If that is the case, try using a for loop to walk through the string character by character, only instead of counting from zero and up, start counting from the last character index and down to zero, consecutively building the reversed string.
While we're feeding horrible, finished answers to the poor student, we might as well whet his appetite for the bizarre. If strings were guaranteed to be reasonably short and CPU time was no object, this is what I'd code:
public static String reverse(String str) {
if (str.length() == 0) return "";
else return reverse(str.substring(1)) + str.charAt(0);
}
(OK, I admit it: my current favorite language is Clojure, a Lisp!)
BONUS HOMEWORK: Figure out if, how and why this works!
java.lang.StringBuffer has a reverse method.

Categories