Java program malfunction - java

First half of my question: When I try to run my program it loads and loads forever; it never shows the results. Could someone check my code and spot the error? This program is meant to find a start DNA codon ATG and keep looking until it finds a stop codon (TAA, TAG, or TGA), and then print out the gene from start to stop. I'm using BlueJ.
Second half of my question: I'm supposed to write a program that performs the following steps:
To find the first gene, find the start codon ATG.
Next look immediately past ATG for the first occurrence of each of the three stop codons TAG, TGA, and TAA.
If the length of the substring between ATG and any of these three stop codons is a multiple of three, then a candidate for a gene is the start codon through the end of the stop codon.
If there is more than one valid candidate, the smallest such string is the gene. The gene includes the start and stop codon.
If no start codon was found, then you are done.
If a start codon was found, but no gene was found, then start searching for another gene via the next occurrence of a start codon starting immediately after the start codon that didn't yield a gene.
If a gene was found, then start searching for the next gene immediately after this found gene.
Note that according to this algorithm, for the string "ATGCTGACCTGATAG", ATGCTGACCTGATAG could be a gene, but ATGCTGACCTGA would not be, even though it is shorter, because another instance of 'TGA' is found first that is not a multiple of three away from the start codon.
In my assignment I'm asked to produce these methods as well:
Specifically, to implement the algorithm, you should do the following.
Write the method findStopIndex that has two parameters dna and index, where dna is a String of DNA and index is a position in the string. This method finds the first occurrence of each stop codon to the right of index. From those stop codons that are a multiple of three from index, it returns the smallest index position. It should return -1 if no stop codon was found and there is no such position. This method was discussed in one of the videos.
Write the void method printAll that has one parameter dna, a String of DNA. This method should print all the genes it finds in DNA. This method should repeatedly look for a gene, and if it finds one, print it and then look for another gene. This method should call findStopIndex. This method was also discussed in one of the videos.
Write the void method testFinder that will use the two small DNA example strings shown below. For each string, it should print the string, and then print the genes found in the string. Here is sample output that includes the two DNA strings:
Sample output is:
ATGAAATGAAAA
Gene found is:
ATGAAATGA
DNA string is:
ccatgccctaataaatgtctgtaatgtaga
Genes found are:
atgccctaa
atgtctgtaatgtag
DNA string is:
CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA
Genes found are:
ATGTAA
ATGAATGACTGATAG
ATGCTATGA
ATGTGA
I've thought it through and found this bit of code to be close to working order. I just need my output to match the results asked for in the instructions. Hopefully this isn't too messy; I'm at a loss as to how to look for a stop codon after the start codon and how to grab the gene sequence. I'm also hoping to understand how to get the closest gene sequence by finding which of the three stop codons (tag, tga, taa) is closest to atg. I know this is a lot, but hopefully it all makes sense.
import edu.duke.*;
import java.io.*;
public class FindMultiGenes {
public String findGenes(String dnaOri) {
String gene = new String();
String dna = dnaOri.toLowerCase();
int start = -1;
while(true){
start = dna.indexOf("atg", start);
if (start == -1) {
break;
}
int stop = findStopCodon(dna, start);
if(stop > start){
String currGene = dnaOri.substring(start, stop+3);
System.out.println("From: " + start + " to " + stop + "Gene: "
+currGene);}
}
return gene;
}
private int findStopCodon(String dna, int start){
for(int i = start + 3; i<dna.length()-3; i += 3){
String currFrameString = dna.substring(i, i+3);
if(currFrameString.equals("TAG")){
return i;
} else if( currFrameString.equals("TGA")){
return i;
} else if( currFrameString.equals("TAA")){
return i;
}
}
return -1;
}
public void testing(){
FindMultiGenes FMG = new FindMultiGenes();
String dna =
"CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
FMG.findGenes(dna);
System.out.println("DNA string is: " + dna);
}
}

Change your line start = dna.indexOf("atg", start); to
start = dna.indexOf("atg", start + 1);
What is currently happening is you find the "atg" at index k and in the next run search the string for the next "atg" from k onwards. That finds the next match at the exact same location since the start location is inclusive. Therefore you are going to find the same index k over and over again and will never halt.
By increasing the index by 1 you jump over the currently found index k and start searching for next match from k+1 onwards.
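Two other details in the posted code would still block the expected output after that fix: findGenes lowercases the DNA but findStopCodon compares against uppercase "TAG", "TGA", and "TAA", so it can never match, and the loop bound i<dna.length()-3 skips a stop codon that sits at the very end of the string. Below is a minimal sketch of the algorithm described in the assignment, not the course's reference solution. The class name GeneFinderSketch and the helper findAll are hypothetical names introduced here so the result can be checked; findStopIndex and printAll follow the assignment's description.

```java
import java.util.ArrayList;
import java.util.List;

class GeneFinderSketch {
    // Returns the smallest index of a stop codon that lies a multiple of
    // three away from the start codon at `index`, or -1 if there is none.
    // Only the FIRST occurrence of each stop codon is considered.
    static int findStopIndex(String dna, int index) {
        int best = -1;
        for (String stop : new String[] {"tag", "tga", "taa"}) {
            int pos = dna.indexOf(stop, index + 3);
            boolean inFrame = pos != -1 && (pos - index) % 3 == 0;
            if (inFrame && (best == -1 || pos < best)) {
                best = pos;
            }
        }
        return best;
    }

    // Hypothetical helper: collects the genes so they can be printed or tested.
    static List<String> findAll(String dnaOri) {
        String dna = dnaOri.toLowerCase(); // search case-insensitively
        List<String> genes = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = dna.indexOf("atg", from);
            if (start == -1) {
                break; // no more start codons
            }
            int stop = findStopIndex(dna, start);
            if (stop == -1) {
                from = start + 3; // no gene here: resume after this start codon
            } else {
                genes.add(dnaOri.substring(start, stop + 3));
                from = stop + 3; // resume right after the gene
            }
        }
        return genes;
    }

    static void printAll(String dna) {
        System.out.println("DNA string is:");
        System.out.println(dna);
        System.out.println("Genes found are:");
        for (String gene : findAll(dna)) {
            System.out.println(gene);
        }
    }

    public static void main(String[] args) {
        printAll("ATGAAATGAAAA");
        printAll("ccatgccctaataaatgtctgtaatgtaga");
    }
}
```

Because the search position always moves past either the start codon or the whole gene, the loop cannot revisit the same index, which is what caused the original program to hang.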

This program is meant to find a start DNA codon ATG and keep looking until finding a stop codon TAA or TAG or TGA, and then print out the gene from start to stop.
Since the first search always starts from 0 you can just set the start index there, then search the stop codon from the result. Here I do it with 1 of the stop codons:
public static void main(String[] args) {
String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
String sequence = dna.toLowerCase();
int index = 0;
int newIndex = 0;
while (true) {
index = sequence.indexOf("atg", index);
if (index == -1)
return;
newIndex = sequence.indexOf("tag", index + 3);
if (newIndex == -1) // Check needed only if a stop codon is not guaranteed for each start codon.
return;
System.out.println("From " + (index + 3) + " to " + newIndex + " Gene: " + sequence.substring(index + 3, newIndex));
index = newIndex + 3;
}
}
Output:
From 4 to 7 Gene: taa
From 13 to 22 Gene: aatgactga
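Also, you can use a regex to do a lot of the work for you (this needs import java.util.regex.Matcher; and import java.util.regex.Pattern; at the top of the file):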
public static void main(String[] args) {
String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
Pattern p = Pattern.compile("ATG([ATGC]+?)TAG");
Matcher m = p.matcher(dna);
while (m.find())
System.out.println("From " + m.start(1) + " to " + m.end(1) + " Gene: " + m.group(1));
}
Output:
From 4 to 7 Gene: TAA
From 13 to 22 Gene: AATGACTGA

Related

Simple Java question using while loop, substring, and indexOf

I'm working on an exercise for learning Java where I am supposed to write a method to print to the screen all items that come after the word "category:". This is my attempt at it:
public static void main(String[] args) {
String str = "We have a large inventory of things in our warehouse falling in "
+ "the category:apperal and the slightly "
+ "more in demand category:makeup along with the category:furniture and _.";
printCategories(str);
}
public static void printCategories(String passedString) {
int startOfSubstring = passedString.indexOf(":") + 1;
int endOfSubstring = passedString.indexOf(" ", startOfSubstring);
String categories = passedString.substring(startOfSubstring,endOfSubstring);
while(startOfSubstring > 0) {
System.out.println(categories);
startOfSubstring = passedString.indexOf((":") + 1, passedString.indexOf(categories));
System.out.println(startOfSubstring);
System.out.println(categories);
}
}
So the program should print:
apperal
makeup
furniture
My idea is that the program prints the substring that starts just after ":" and ends at the next " ". Then it does the same thing again, except that instead of starting from the very beginning of the variable str, it starts from the beginning of the last category found.
Once there are no more ":" to be found, the indexOf (part of startOfSubstring) will return -1 and the loop will terminate. However, after printing the first category it keeps returning -1 and terminating before finding the next category.
The two lines:
System.out.println(startOfSubstring);
System.out.println(categories);
Confirm that it is returning -1 after printing the first category, and the last line confirms that the categories variable is still defined as "apperal". If I comment out the line:
startOfSubstring = passedString.indexOf((":") + 1, passedString.indexOf(categories));
It returns the startOfSubstring as 77. So it is something to do with that line and attempting to change the start of search position in the indexOf method that is causing it to return -1 prematurely, but I cannot figure out why this is happening. I've spent the last few hours trying to figure it out...
Please help :(
There are a couple of issues with the program:
You're searching passedString for (":") + 1 which is the string ":1", probably not what you want.
You should evaluate endOfSubstring and categories inside the loop.
This is probably close to what you want:
public static void printCategories(String passedString) {
int startOfSubstring = passedString.indexOf(":") + 1;
while(startOfSubstring > 0) {
int endOfSubstring = passedString.indexOf(" ", startOfSubstring);
// If "category:whatever" can appear at the end of the string
// without a space, adjust endOfSubstring here.
String categories = passedString.substring(startOfSubstring, endOfSubstring);
// Do something with categories here, maybe print it?
// Find next ":" starting with end of category string.
startOfSubstring = passedString.indexOf(":", endOfSubstring) + 1;
}
}
I have corrected (in a comment) where you set the new value of startOfSubstring
while(startOfSubstring > 0) { // better if you do startOfSubstring != -1 IMO
System.out.println(categories);
// this should be startOfSubstring = passedString.indexOf(":", startOfSubstring +1);
startOfSubstring = passedString.indexOf((":") + 1, passedString.indexOf(categories));
System.out.println(startOfSubstring);
System.out.println(categories);
}
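For reference, here is a self-contained sketch that combines both points from the answers above: the end index and the category are recomputed inside the loop, and the next search for ":" starts after the category just read. The class name CategoryPrinter and the method findCategories are hypothetical, and treating a category at the very end of the string (with no trailing space) as running to the end is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class CategoryPrinter {
    // Collects every run of characters that follows a ':' up to the next space.
    static List<String> findCategories(String text) {
        List<String> categories = new ArrayList<>();
        int start = text.indexOf(':') + 1; // 0 means no ':' was found
        while (start > 0) {
            int end = text.indexOf(' ', start);
            if (end == -1) {
                end = text.length(); // assumption: last category runs to the end
            }
            categories.add(text.substring(start, end));
            // Search for the next ':' after the category we just read.
            start = text.indexOf(':', end) + 1;
        }
        return categories;
    }

    public static void main(String[] args) {
        String str = "We have a large inventory of things in our warehouse falling in "
                + "the category:apperal and the slightly "
                + "more in demand category:makeup along with the category:furniture and _.";
        for (String category : findCategories(str)) {
            System.out.println(category);
        }
    }
}
```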

How many times the word is used on the html page

I have a method that should return an integer which is the number of uses of the searchWord in the text of an HTML document:
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
if (bodyText.toLowerCase().contains(searchWord.toLowerCase())){
count++;
}
return count;
}
But my method always returns count=1, even if the word is used several times. I understand that the error should be obvious, but I’m stuck and I don’t see it.
You are currently only checking once that the text contains the search word, so the count will always be either 0 or 1. To find the total count, keep looping using String#indexOf(str, fromIndex) while the String can be found using the second argument that indicates the index to start searching from.
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
for(int idx = -1; (idx = bodyText.indexOf(searchWord, idx + 1)) != -1; count++);
return count;
}
According to the Java docs String#contains:
Returns true if and only if this string contains the specified sequence of char values.
You're asking if the word you're looking for is contained in the document, which it is.
You could:
Split the text on words (splitting it by spaces) and then count how many times it appears
Iterate the String using String#indexOf starting on index 0 and then from last index you found until the end of the String.
Iterate the String using contains but starting from a certain index (doing this logic yourself).
I'd go for the 2nd approach as it seems like the easiest one.
These are only conditional statements; you aren't looping through the HTML text. Therefore, if it finds an instance of searchWord in bodyText, it increments count once and then exits the method with a value of 1. I suggest looping through every word in the HTML, adding it to an array, and counting it that way using something like this:
char[] bodyTextA = bodyText.toCharArray();
Or keep it in a string array and split it by a space, or new line, or whatever criteria you have. Example of space:
// Puts "Hello", "I'm", "your", and "String" into their own slots in the array.
String str = "Hello I'm your String";
String[] split = str.split("\\s+");
Your issue here is that the if statement checks whether the text contains the word and then increments your count variable. So even if the text contains the word multiple times, your logic basically says: if it contains it at all, increase count by one. You will have to rewrite your code to check for multiple occurrences of the word. There are many ways you can go about this: you could loop through the entire body text, you could split the body text into an array of words and check that, or you could remove the search word from the text each time you find it and keep checking until the text no longer contains the search word.
You can use indexOf(String, int), passing the index of the last match found:
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
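int index = -1; // start at -1 so the first search (index + 1) begins at 0 and a match at the very start is counted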
while ((index = bodyText.indexOf(searchWord, index + 1)) != -1) {
count++;
}
return count;
}
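The indexOf-based answers above drop the toLowerCase calls from the question, so they count case-sensitively. Here is a minimal standalone sketch that keeps the case-insensitive behaviour. It works on a plain String instead of the HTML document, and the class name WordCounter and method countOccurrences are hypothetical. It counts non-overlapping substring matches, so "the" is also counted inside "other"; whether that is wanted depends on the requirements.

```java
import java.util.Locale;

class WordCounter {
    // Counts non-overlapping, case-insensitive occurrences of word in text.
    static int countOccurrences(String text, String word) {
        if (word.isEmpty()) {
            return 0; // indexOf("") always matches, which would loop forever
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = word.toLowerCase(Locale.ROOT);
        int count = 0;
        int idx = haystack.indexOf(needle);
        while (idx != -1) {
            count++;
            // Continue searching just past the match that was found.
            idx = haystack.indexOf(needle, idx + needle.length());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("The cat and the hat, THE end", "the"));
    }
}
```

In the original method, the same loop could be applied to bodyText after the null check; note that the posted null check only prints a message and does not return, so a null htmlDocument would still cause a NullPointerException on the next line.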

Java Searching through a String for a valid character sequence

I just took a codility test and was wondering why my solution only scored 37/100. The problem was that you were given a String and had to search through it for valid passwords.
Here are the rules:
1) A valid password starts with a capital letter and cannot contain any numbers. The input is restricted to any combination of a-z, A-Z and 0-9.
2) The method they wanted you to create is supposed to return the size of the largest valid password. So, for example, if you input "Aa8aaaArtd900d", the solution method is supposed to output the number 4. If no valid String is found, the method should return -1.
I cannot seem to figure out where I went wrong in my solution. Any help would be greatly appreciated! Also any suggestions on how to better test code for something like this would be greatly appreciated.
class Solution2 {
public int solution(String S) {
int first = 0;
int last = S.length()-1;
int longest = -1;
for(int i = 0; i < S.length(); i++){
if(Character.isUpperCase(S.charAt(i))){
first = i;
last = first;
while(last < S.length()){
if(Character.isDigit(S.charAt(last))){
i = last;
break;
}
last++;
}
longest = Math.max(last - first, longest);
}
}
return longest;
}
}
added updated solution, any thoughts to optimize this further?
Your solution is too complicated. Since you are not asked to find the longest password, only the length of the longest password, there is no reason to create or store strings with that longest password. Therefore, you do not need to use substring or an array of Strings, only int variables.
The algorithm for finding the solution is straightforward:
1) Make an int pos = 0 variable representing the current position in s
2) Make a loop that searches for the next candidate password
3) Starting at position pos, find the next uppercase letter
4) If you hit the end of line, exit
5) Starting at the position of the uppercase letter, find the next digit
6) If you hit the end of line, stop
7) Find the difference between the position of the digit (or the end of line) and the position of the uppercase letter.
8) If the difference is above max that you have previously found, replace max with the difference
9) Advance pos to the position of the last letter (or the end of line)
10) If pos is under s.length, continue the loop at step 2
11) Return max.
Demo.
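Separately from the demo referenced above, the steps can be sketched as follows. This is an illustration under one reading of the rules: a candidate password runs from an uppercase letter up to the next digit or the end of the string. The class name LongestPassword is hypothetical.

```java
class LongestPassword {
    // Returns the length of the longest digit-free run that starts with an
    // uppercase letter, or -1 if the string contains no uppercase letter.
    static int solution(String s) {
        int max = -1;
        int pos = 0;
        while (pos < s.length()) {
            // Find the next uppercase letter at or after pos.
            while (pos < s.length() && !Character.isUpperCase(s.charAt(pos))) {
                pos++;
            }
            if (pos == s.length()) {
                break; // no further candidate
            }
            int start = pos;
            // Advance to the next digit or to the end of the string.
            while (pos < s.length() && !Character.isDigit(s.charAt(pos))) {
                pos++;
            }
            max = Math.max(max, pos - start);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(solution("Aa8aaaArtd900d"));
    }
}
```

Only int variables are needed, and every character is visited once, so the running time is linear in the length of the input.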

Finding the index of a permutation within a string

I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced worked providing that the word that the answer begins with has not occurred previously. In the 2nd example here, my code will fail because dog has already appeared before where the answer begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.
You could concatenate the strings together and search with the new string:
String a = "dog";
String b = "cat";
String c = a + b; // c is "dogcat"
This way you would overcome the problem of dog appearing somewhere else.
But this wouldn't work if catdog is a valid value too.
Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.
Perhaps not the best optimized version, but how about the following approach to give you some ideas:
1) Count the combined length of all the words.
2) Take a random word from the list and find the starting index of its first occurrence.
3) Take a substring of the length counted above before and after that index (e.g. if the index is 15 and there are 3 words of 4 letters each, take the substring from 15-8 to 15+11).
4) Make a copy of the word list with the earlier random word removed.
5) Check the [word_length] letters appended/prepended to the match to see if they match a new word on the list.
6) If a word matches the copy of the list, remove it from the copy and move further.
7) If all words are found, break the loop.
8) If not all words are found, find the starting index of the next occurrence of the earlier random word and go back to 3.
Why it would help:
Which word you pick to begin with doesn't matter, since every word needs to be in the successful match anyway.
You don't have to manually loop through a lot of the characters, unless there are lots of nearly complete false matches.
As a supposed match keeps growing, you have fewer words left on the list copy to compare against.
You can also keep track of the furthest index you've reached, so you can sometimes limit the backwards length of the picked substring (it cannot overlap where you've already been, if the occurrences are close to each other).
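A different technique from the ones suggested above, offered only as a sketch: because all words have the same length, the string can be cut into word-sized chunks for each possible alignment (offset 0 up to word length minus 1), and a sliding window with word counts can be moved over those chunks. The class name PermutationIndex and the method find are hypothetical, and the input format is assumed to be as described in the question.

```java
import java.util.HashMap;
import java.util.Map;

class PermutationIndex {
    // Returns the smallest index at which some ordering of all the words
    // appears back to back in s, or -1 if there is no such index.
    static int find(String[] words, String s) {
        int wl = words[0].length();
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) {
            need.merge(w, 1, Integer::sum);
        }
        int best = -1;
        for (int offset = 0; offset < wl; offset++) {
            Map<String, Integer> seen = new HashMap<>();
            int left = offset; // start of the current window
            int count = 0;     // number of words inside the window
            for (int right = offset; right + wl <= s.length(); right += wl) {
                String w = s.substring(right, right + wl);
                if (!need.containsKey(w)) {
                    // Unknown chunk: no window can span it, so start over after it.
                    seen.clear();
                    count = 0;
                    left = right + wl;
                    continue;
                }
                seen.merge(w, 1, Integer::sum);
                count++;
                // Too many copies of w: shrink the window from the left.
                while (seen.get(w) > need.get(w)) {
                    String drop = s.substring(left, left + wl);
                    seen.merge(drop, -1, Integer::sum);
                    left += wl;
                    count--;
                }
                if (count == words.length) {
                    if (best == -1 || left < best) {
                        best = left;
                    }
                    break; // first match for this alignment
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(find("dog cat rat".split(" "), "abcratdogcattgh"));
        System.out.println(find("dog cat rat cat".split(" "),
                "abccatratdogzzzzdogcatratcat"));
    }
}
```

Each alignment walks the string once, and each chunk enters and leaves the window at most once, so the total work grows roughly with the string length times the word length rather than with the number of occurrences of any single word.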

In following program, what is the purpose of the while loop?

There are no problems with the compilation, but whether or not I have the while loop in place or not, the result is the same. I can't understand why the while loop is included. BTW, this is just an example program from the Java SE tutorial:
public class ContinueWithLabelDemo {
public static void main(String[] args) {
String searchMe = "Look for a substring in me";
String substring = "sub";
boolean foundIt = false;
int max = searchMe.length() - substring.length();
test:
for (int i = 0; i <= max; i++) {
int n = substring.length();
int j = i;
int k = 0;
while (n-- != 0) { // WTF???
if (searchMe.charAt(j++) != substring.charAt(k++)) {
continue test;
}
}
foundIt = true;
break test;
}
System.out.println(foundIt ? "Found it" : "Didn't find it");
}
}
You can replace your
while (n-- != 0) { // WTF???
with
System.out.println("outside loop");
while (n-- != 0) { // WTF???
System.out.println("inside loop: comparing "
+ searchMe.charAt(j) + ":" + substring.charAt(k));
to see how this example works. Below is a short explanation.
This code is searching for substring in searchMe string. Take a look at this example:
Look for a substring in me
^
sub
If you compare the characters at position 0 in searchMe and substring, you will notice that they are not the same (L != s), so we can skip matching the rest of the letters and go to the next position (that is the purpose of continue test;).
Look for a substring in me
 ^
 sub
So now we compare the next letter of searchMe with the first letter of substring. This time we get o != s, so there is no way that substring starts at this place. Let's carry on.
After a few comparisons we finally find a promising place
Look for a substring in me
           ^
           sub
where the first letter of substring is the same as the current letter in searchMe (s == s), so we won't jump out of the while loop yet and will check the next letter. And we have another success
Look for a substring in me
            ^
           sub
because u == u, so we continue our loop until we have iterated over the entire substring, which happens in the next step.
Look for a substring in me
             ^
           sub
And this time we compared b with b. Since they are equal and there are no more letters in substring to check, we can set foundIt to true and break out of the test for loop.
And that is the end.
If you remove the while loop but keep the comparison inside it, the program reports success as soon as it finds a character that matches the first letter of substring. In your case that happens after checking "Look for a ", where s in searchMe matches the first letter of substring, which is also s.
The while loop is used here to iterate over the entire substring, and only when corresponding characters fail to match do we move the search one place forward. If we dropped this inner loop and just iterated over the entire data once, we could miss some positive results, as in the case where we look for aab in the String aaab. Take a look:
aaab
aab
^^
The characters marked ^ will match, but after them we have to match a with b, which fails. Without the inner while loop we would probably start the next match from the last checked position that failed, which would be
aaab
  aab
  ^
This time we also fail to find a match for substring, so we have skipped the a*aab* part.
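To see the difference in running code, here is a small sketch that compares a full match (with the inner loop) against a check of the first character only. The class name InnerLoopDemo and both method names are hypothetical, and the pattern is assumed to be non-empty. For "sub" both versions report a match, which is why the result looked the same; for "sux" only the full version correctly reports no match.

```java
class InnerLoopDemo {
    // Full search: every character of pattern must match, as in the tutorial.
    static boolean containsFull(String text, String pattern) {
        int max = text.length() - pattern.length();
        outer:
        for (int i = 0; i <= max; i++) {
            for (int k = 0; k < pattern.length(); k++) {
                if (text.charAt(i + k) != pattern.charAt(k)) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return true; // the inner loop finished without a mismatch
        }
        return false;
    }

    // Without the inner loop: only the first character is compared.
    static boolean containsFirstCharOnly(String text, String pattern) {
        int max = text.length() - pattern.length();
        for (int i = 0; i <= max; i++) {
            if (text.charAt(i) == pattern.charAt(0)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String searchMe = "Look for a substring in me";
        System.out.println(containsFull(searchMe, "sub"));
        System.out.println(containsFull(searchMe, "sux"));
        System.out.println(containsFirstCharOnly(searchMe, "sux"));
    }
}
```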
