search for words in an array of letters


I have an ArrayList called dictionary that contains the following letters:
g,t,c,a,n,d,l,e,t,j,a,q.
I want the output to be, for example,
2candle
3and etc.
The number is the offset from the start of the array being searched.
I want the output to be a list of match locations, each consisting of an offset from the beginning of the text and the string found.
Please help!

If we think about this type of problem, the key is to enumerate every position where a word could start and every length it could have. Here is the skeleton of a method I wrote that goes through this process:
public static String findWords(final char[] characters) {
String toRet = "";
// First iterate through every character in the array characters.
for (int i = 0; i < characters.length; i++) {
/*
* Then at each step in this loop, check all possible word
* combinations to see if it's a word. For example, check and see if
* characters[i], characters[i+1] forms a word. Then check and see
* if the word made by adding together characters[i],
* characters[i+1], and characters[i+2] is a word. Then check and
* see if the word formed by adding together the characters
* characters[i], characters[i+1], characters[i+2], characters[i+3]
* is a word.
*/
// Doing the above requires a nested loop inside of the original
// loop.
for (int j = 0; j < characters.length - i; j++) {
// When you do find a word that is formed, then go ahead and add
// to your string toRet the details about the word formed.
}
}
return toRet;
}
This doesn't fully answer your question, but I hope it helps.
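To make the skeleton concrete, here is a minimal sketch of the same nested-loop idea. It assumes the valid words are available as a Set<String>; the class name LetterArraySearch, the parameter words, and the sample word list are made up for illustration and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LetterArraySearch {
    // Returns entries such as "2candle": the start offset followed by the word found there.
    // 'words' is an assumed lookup set of valid words (not given in the question).
    static List<String> findWords(char[] characters, Set<String> words) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < characters.length; i++) {
            // Grow the candidate one character at a time, starting at offset i.
            StringBuilder candidate = new StringBuilder();
            for (int j = i; j < characters.length; j++) {
                candidate.append(characters[j]);
                if (words.contains(candidate.toString())) {
                    matches.add(i + candidate.toString());
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        char[] letters = {'g', 't', 'c', 'a', 'n', 'd', 'l', 'e', 't', 'j', 'a', 'q'};
        Set<String> words = new HashSet<>(Arrays.asList("candle", "and", "can"));
        System.out.println(findWords(letters, words)); // prints [2can, 2candle, 3and]
    }
}
```

This checks every substring, so it does O(n^2) lookups for n letters. That is fine for short arrays; a trie would allow stopping early once no word can start with the current candidate.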

Related

How to read and return the second index of an array if first index matches a string?

I'm trying to write a translate method using the following parameters. However, every time I run the method it skips the first if statement and goes right to the second for loop.
/**
Translates a word according to the data in wordList then matches the case.
The parameter wordList contains the mappings for the translation. The data is
organized in an ArrayList containing String arrays of length 2. The first
cell (index 0) contains the word in the original language, called the key,
and the second cell (index 1) contains the translation.
It is assumed that the items in the wordList are sorted in ascending order
according to the keys in the first cell.
@param word
The word to translate.
@param wordList
An ArrayList containing the translation mappings.
@return The mapping in the wordList with the same case as the original. If no
match is found in wordList, it returns a string of Config.LINE_CHAR of the same length as word.
*/
public static String translate(String word, ArrayList<String[]> wordList) {
String newWord = "";
int i = 0;
for (i = 0; i < wordList.size(); i++) {
word = matchCase(wordList.get(i)[0], word); //make cases match
if (word.equals(wordList.get(i)[0])) { //check each index at 0
newWord = wordList.get(i)[1]; //update newWord to skip second for loop
return wordList.get(i)[1];
}
}
if (newWord == "") {
for (i = 0; i < word.length(); i++) {
newWord += Config.LINE_CHAR;
}
}
return newWord;
}
For the files I'm running, each word should have a translated word, so no Config.LINE_CHAR should be printed. But this is the only thing that prints. How do I fix this?

You are initializing newWord to the value "". The only time newWord can possibly change is in the first loop, where it is promptly followed by a return statement, exiting your method. The only way your if statement can be reached is if you didn't return during the first loop, so if it reaches that if statement, then newWord must be unchanged since its initial assignment of "".
Some unrelated advice: You should use the equals method when comparing strings, for example if ("".equals(newWord)). Otherwise, you're comparing the references (memory addresses) of the two String objects rather than their values.
You may need to share your matchCase method to ensure all bugs are addressed, though.
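As a rough sketch of how the method could be restructured: the version below never reassigns word inside the loop and needs no newWord check at all. Whether reassigning word via matchCase is the actual cause cannot be confirmed without seeing that method, so treat this as one plausible fix. The class name TranslateSketch and the constant LINE_CHAR (standing in for Config.LINE_CHAR, whose value is not shown) are assumptions, and the case matching of the returned translation is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

class TranslateSketch {
    // Stand-in for Config.LINE_CHAR; the real value is not shown in the question.
    static final char LINE_CHAR = '-';

    static String translate(String word, List<String[]> wordList) {
        for (String[] entry : wordList) {
            // Compare without modifying 'word', so one failed comparison
            // cannot change the input used for the next one.
            if (entry[0].equalsIgnoreCase(word)) {
                return entry[1]; // matching the case of 'word' is omitted here
            }
        }
        // No key matched: build a filler string of the same length as the word.
        StringBuilder filler = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            filler.append(LINE_CHAR);
        }
        return filler.toString();
    }

    public static void main(String[] args) {
        List<String[]> wordList = new ArrayList<>();
        wordList.add(new String[] {"hello", "hola"});
        System.out.println(translate("Hello", wordList)); // prints hola
        System.out.println(translate("cat", wordList));   // prints ---
    }
}
```

Because the list is described as sorted by key, a binary search would also work, but the linear scan keeps the sketch close to the original code.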

Searching for words in Multi-Dimension Array,Java

I want to search for words in a word puzzle in Java.
The search, as stated, is horizontal, vertical, and diagonal.
I created an array, but I don't know how to build a String from it and search for words in that String. I need to know how to get a String that holds all the values of the table, and how to type a word and search for it there.
I know the search can be done with the indexOf function, but I don't know how to use it.
import java.util.Scanner;
public class main {
public static void main(String[] args) {
// TODO Auto-generated method stub
int IntegerPosition;
int IntegerPosition2;
String position="";
String word="";
Scanner s = new Scanner(System.in);
String content="";
String[][] sopa = {
{"X","F","E","K","J","U","I","R","S","H"},
{"Z","H","S","W","E","R","T","G","O","T"},
{"B","R","A","B","F","B","P","M","V","U"},
{"D","W","E","R","O","O","J","L","L","W"},
{"U","T","O","N","I","R","O","B","C","R"},
{"O","P","R","O","V","I","I","K","V","B"},
{"N","I","Q","U","E","N","T","N","S","A"},
{"O","V","U","L","R","O","S","S","O","T"},
{"A","S","A","X","J","T","R","R","I","T"},
{"R","K","M","E","P","U","B","O","T","A"}
};
for (int i = 0; i < sopa[0].length; i++){
for(int j = 0; j < sopa[i].length; j++){
content += sopa[i][j];
}
System.out.println(content);
content = "";
}
System.out.println("Type the word you are looking for");
word = s.next();
for (int i = 0; i < sopa[0].length-1; i++){//t1.length
for(int j = 0; j < sopa[i].length-1; j++){
}
}
System.out.println(content);
content = "";
}
}

First, you should declare what "finding a word" means. I guess you want to find the sequence of letters in each row and column. What about diagonal? Backwards? Wrapping around?
Two solutions come to mind:
1. Use a String index:
Build a String of all characters. This needs to be done for each direction (horizontal, vertical, diagonal), but only in forward order if you reverse the search term for the backwards search. For an efficient implementation, StringBuilder is your friend.
Use String.indexOf to find occurrences of the term in your index. Finally, you have to calculate the row and column from the String position and, if wrapping is not allowed, check whether the word crosses any row/column boundary.
I'd use this if I had to look for many terms.
2. Use the array:
Do this for each direction as well (horizontal, vertical, diagonal).
Look for occurrences of the search term's first letter in your array (by simple iteration). Note that you can stop when the term would no longer fit in the row/column, so for a 6-letter word, you can skip the last 5 rows/columns.
If you find an anchor (i.e. a matching letter), check the subsequent letters of the word. Stop on a mismatch; otherwise you have found an occurrence.
For a more sophisticated matching implementation, the Boyer-Moore algorithm may be of interest.
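The second approach (searching the array directly) might look roughly like the sketch below. It assumes a char[][] grid rather than the String[][] from the question, searches all eight directions without wrapping, and uses made-up names (GridSearch, find, matchesFrom). Instead of skipping rows and columns where the word cannot fit, it simply bounds-checks each step, which is simpler but does slightly more work.

```java
import java.util.ArrayList;
import java.util.List;

class GridSearch {
    // Row/column steps for the eight search directions.
    static final int[][] DIRECTIONS = {
        {0, 1}, {1, 0}, {1, 1}, {1, -1},
        {0, -1}, {-1, 0}, {-1, -1}, {-1, 1}
    };

    // Returns one {row, col, rowStep, colStep} entry per occurrence of 'word'.
    static List<int[]> find(char[][] grid, String word) {
        List<int[]> hits = new ArrayList<>();
        if (word.isEmpty()) {
            return hits;
        }
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] != word.charAt(0)) {
                    continue; // not an anchor
                }
                for (int[] d : DIRECTIONS) {
                    if (matchesFrom(grid, word, r, c, d[0], d[1])) {
                        hits.add(new int[] {r, c, d[0], d[1]});
                    }
                }
            }
        }
        return hits;
    }

    // Checks the letters of 'word' starting at (r, c) and stepping by (dr, dc).
    static boolean matchesFrom(char[][] grid, String word, int r, int c, int dr, int dc) {
        for (int k = 0; k < word.length(); k++) {
            int rr = r + k * dr;
            int cc = c + k * dc;
            if (rr < 0 || rr >= grid.length || cc < 0 || cc >= grid[rr].length) {
                return false; // ran off the grid, no wrapping
            }
            if (grid[rr][cc] != word.charAt(k)) {
                return false; // mismatch
            }
        }
        return true;
    }
}
```

Note that a one-letter word is reported once per direction, so callers that care about that case would need to deduplicate.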

Recursive backtracking to create permutations of given string

I am currently working on a programming assignment where the user inputs a word,
e.g. "that",
and the program should return all valid words that can be made from the given string,
e.g. [that, hat, at].
The issue I am having is that the resulting words should be created using a recursive method that checks whether the prefix is valid.
For example, if the given word is "kevin", once the program tries the combination "kv" it should know that no words start with "kv" and move on to the next combination in order to save time.
Currently my code just creates ALL permutations, which takes a relatively large amount of time when the input is longer than 8 letters.
protected static String wordCreator(String prefix, String letters) {
int length = letters.length();
//if each character has been used, return the current permutation of the letters
if (length == 0) {
return prefix;
}
//else recursively call on itself to permute possible combinations by incrementing the letters
else {
for (int i = 0; i < length; i++) {
words.add(wordCreator(prefix + letters.charAt(i), letters.substring(0, i) + letters.substring(i+1, length)));
}
}
return prefix;
}
If anyone could help me figure this out, I'd much appreciate it. I am also using an AVL tree to store the dictionary words for validation, in case that is needed.
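One possible way to get the prefix check described above is sketched here. It assumes the dictionary is held in a java.util.TreeSet (a sorted set standing in for the AVL tree mentioned in the question); the names PrefixPruning, isPrefix, and collect are invented for this example. The idea is that the smallest dictionary entry greater than or equal to a prefix must itself start with that prefix if any word does.

```java
import java.util.Set;
import java.util.TreeSet;

class PrefixPruning {
    // True if at least one dictionary word starts with 'prefix'.
    static boolean isPrefix(TreeSet<String> dictionary, String prefix) {
        String candidate = dictionary.ceiling(prefix); // smallest word >= prefix
        return candidate != null && candidate.startsWith(prefix);
    }

    // Extends 'prefix' with each unused letter, abandoning branches
    // that cannot lead to any dictionary word.
    static void collect(String prefix, String letters,
                        TreeSet<String> dictionary, Set<String> found) {
        if (!prefix.isEmpty()) {
            if (!isPrefix(dictionary, prefix)) {
                return; // prune: e.g. nothing starts with "kv"
            }
            if (dictionary.contains(prefix)) {
                found.add(prefix);
            }
        }
        for (int i = 0; i < letters.length(); i++) {
            String remaining = letters.substring(0, i) + letters.substring(i + 1);
            collect(prefix + letters.charAt(i), remaining, dictionary, found);
        }
    }
}
```

Calling collect("", "that", dictionary, found) with a dictionary containing "that", "hat", "at", and "the" would leave "at", "hat", and "that" in found. Collecting into a Set also hides the duplicates produced by repeated letters such as the two t's.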

Sudden slow-down and java.lang.OutOfMemoryError during Java string search

I am writing a program for pattern discovery in RNA sequences that mostly works. In order to find 'patterns' in the sequences, I am generating some possible patterns and scanning through the input file of all sequences for them (there's more to the algorithm, but this is the bit that is breaking). Possible patterns generated are of a specified length given by the user.
This works well for all pattern lengths up to 8 characters. Then at 9, the program runs for a very long time and then gives a java.lang.OutOfMemoryError. After some debugging, I found that the weak point is the pattern generation method:
/* Get elementary pattern (ep) substrings, to later combine into full patterns */
public static void init_ep_subs(int length) {
ep_subs = new ArrayList<Substring>(); // clear static ep_subs data field
/* ep subs are of the form C1...C2...C3 where C1, C2, C3 are characters in the
alphabet and the whole length of the string is equal to the input parameter
'length'. The number of dots varies for different lengths.
The middle character C2 can occur instead of any dot, or not at all.*/
for (int i = 1; i < length-1; i++) { // for each potential position of C2
// for each alphabet character to be C1
for (int first = 0; first < alphabet.length; first++) {
// for each alphabet character to be C3
for (int last = 0; last < alphabet.length; last++) {
// make blank pattern, i.e. no C2
Substring s_blank = new Substring(-1, alphabet[first],
'0', alphabet[last]);
// get its frequency in the input string
s_blank.occurrences = search_sequences(s_blank.toString());
// if blank ep is found frequently enough in the input string, store it
if (s_blank.frequency()>=nP) ep_subs.add(s_blank);
// when C2 is present, for each character it could be
for (int mid = 0; mid < alphabet.length; mid++) {
// make pattern C1,C2,C3
Substring s = new Substring(i, alphabet[first],
alphabet[mid],
alphabet[last]);
// search input string for pattern s
s.occurrences = search_sequences(s.toString());
// if s is frequent enough, store it
if (s.frequency()>=nP) ep_subs.add(s);
}
}
}
}
}
Here's what happens: When I time the calls to search_sequences, they start out at around 40-100ms each and carry on that way for the first patterns. Then after a couple hundred patterns (around 'C.....G.C') those calls suddenly start to take about ten times as long, 1000-2000ms. After that, the times steadily increase until at about 12000ms ('C......TA') it gives this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
at java.nio.CharBuffer.toString(CharBuffer.java:1157)
at java.util.regex.Matcher.toMatchResult(Matcher.java:232)
at java.util.Scanner.match(Scanner.java:1270)
at java.util.Scanner.hasNextLine(Scanner.java:1478)
at PatternFinder4.search_sequences(PatternFinder4.java:217)
at PatternFinder4.init_ep_subs(PatternFinder4.java:256)
at PatternFinder4.main(PatternFinder4.java:62)
This is the search_sequences method:
/* Searches the input string 'sequences' for occurrences of the parameter string 'sub' */
public static ArrayList<int[]> search_sequences(String sub) {
/* arraylist returned holding int arrays with coordinates of the places where 'sub'
was found, i.e. {l,i} l = lines number, i = index within line */
ArrayList<int[]> occurrences = new ArrayList<int[]>();
s = new Scanner(sequences);
int line_index = 0;
String line = "";
while (s.hasNextLine()) {
line = s.nextLine();
pattern = Pattern.compile(sub);
matcher = pattern.matcher(line);
pattern = null; // all the =nulls were intended to help memory management, had no effect
int index = 0;
// for each occurrence of 'sub' in the line being scanned
while (matcher.find(index)) {
int start = matcher.start(); // get the index of the next occurrence
int[] occurrence = {line_index, start}; // make up the coordinate array
occurrences.add(occurrence); // store that occurrence
index = start+1; // start looking from after the last occurrence found
}
matcher=null;
line=null;
line_index++;
}
s=null;
return occurrences;
}
I've tried the program on a couple of different computers of differing speeds, and while the actual times to complete search_sequences are smaller on faster computers, the relative times are the same; at around the same number of iterations, search_sequences starts taking ten times as long to complete.
I've tried googling about memory efficiency and speed of different input streams such as BufferedReader etc, but the general consensus seems to be that they are all roughly equivalent to Scanner. Do any of you have any advice about what this bug is or how I could try to figure it out myself?
If anyone wants to see any more of the code, just ask.
EDIT:
1 - The input file 'sequences' is 1000 protein sequences (each on one line) of varying lengths around a couple hundred characters. I should also mention this program will /only ever need to work/ up to patterns of length nine.
2 - Here are the Substring class methods used in the above code
static class Substring {
int residue; // position of the middle character C2
char front, mid, end; // alphabet characters for C1, C2 and C3
ArrayList<int[]> occurrences; // list of positions the substring occurs in 'sequences'
String string; // string representation of the substring
public Substring(int inresidue, char infront, char inmid, char inend) {
occurrences = new ArrayList<int[]>();
residue = inresidue;
front = infront;
mid = inmid;
end = inend;
setString(); // makes the string representation using characters and their positions
}
/* gets the frequency of the substring given the places it occurs in 'sequences'.
This only counts the substring /once per line it occurs in/. */
public int frequency() {
return PatternFinder.frequency(occurrences);
}
public String toString() {
return string;
}
/* makes the string representation using the substring's characters and their positions */
private void setString() {
if (residue>-1) {
String left_mid = "";
for (int j = 0; j < residue-1; j++) left_mid += ".";
String right_mid = "";
for (int j = residue+1; j < length-1; j++) right_mid += ".";
string = front + left_mid + mid + right_mid + end;
} else {
String mid = "";
for (int i = 0; i < length-2; i++) mid += ".";
string = front + mid + end;
}
}
}
... and the PatternFinder.frequency method (called in Substring.frequency()) :
public static int frequency(ArrayList<int[]> occurrences) {
HashSet<String> lines_present = new HashSet<String>();
for (int[] occurrence : occurrences) {
lines_present.add(new String(occurrence[0]+""));
}
return lines_present.size();
}

What is alphabet? What kind of regexes are you giving it? Have you checked the number of occurrences you're storing? It's possible that simply storing the occurrences is enough to make it run out of memory, since you're doing an exponential number of searches.
It sounds like your algorithm has a hidden exponential resource usage. You need to rethink what you are trying to do.
Also, setting a local variable to null won't help since the JVM already does data flow and liveness analysis.
Edit: Here's a page that explains how even short regexes can take an exponential amount of time to run.

I can't spot an obvious memory leak, but your program does have a number of inefficiencies. Here are some recommendations:
Indent your code properly. It will make reading it, both for you and for others, much easier. In its current form it's very hard to read.
If you're referring to a member variable, prefix it with this., otherwise readers of code snippets won't know for sure what you're referring to.
Avoid static members and methods unless they're absolutely necessary. When referring to them, use the Classname.membername form, for the same reasons.
How is the code of frequency() different from just return occurrences.size()?
In search_sequences(), the regex string sub is a constant. You need to compile it only once, but you're recompiling it for every line.
Split the input string (sequences) into lines once and store them in an array or ArrayList. Don't re-split inside search_sequences(), pass the split collection in.
There are probably more things to fix, but this is the list that jumps out.
Fix all these and if you still have problems, you may need to use a profiler to find out what's happening.
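The two recommendations about search_sequences() (compile the pattern once, split the input once) could be combined roughly as follows. This is a sketch only: the class name SequenceSearch and the lines parameter are assumptions, and it presumes the pattern never matches the empty string, which holds for patterns such as "C..G".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequenceSearch {
    // 'lines' is the input already split into one sequence per element,
    // so no Scanner is created per call.
    static List<int[]> searchSequences(List<String> lines, String sub) {
        Pattern pattern = Pattern.compile(sub); // compiled once per call, not once per line
        List<int[]> occurrences = new ArrayList<>();
        for (int lineIndex = 0; lineIndex < lines.size(); lineIndex++) {
            Matcher matcher = pattern.matcher(lines.get(lineIndex));
            int from = 0;
            // find(from) resets the matcher and searches from the given index,
            // so overlapping occurrences are reported too.
            while (matcher.find(from)) {
                occurrences.add(new int[] {lineIndex, matcher.start()});
                from = matcher.start() + 1;
            }
        }
        return occurrences;
    }
}
```

The caller would split the input a single time, for example with sequences.split("\\R") on Java 8 or later, and pass the resulting list to every search.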

indexOf (String str) - equals string to other string

I need to write a method that checks for "String str" within another string and returns the index at which str starts.
That sounds like homework, and it partly is, but I'm using it to study for a test...
I've tried:
public int IndexOf (String str) {
for (i= 0;i<_st.length();i++)
{
if (_st.charAt(i) == str.charAt(i)) {
i++;
if (_st.charAt(i) == str.charAt(i)) {
return i;
}
}
}
return -1;
}
But I don't get the right return value. Why? Am I on the right track, or not even close?

I am afraid you are not close.
Here's what you have to do:
Loop over the characters of the string (the one on which you are supposed to do an indexOf; I will call this the master). You are doing this right.
For every character check whether your other string's character and this character are the same.
If they are (a potential start of the same sequence) check whether the next characters in the master match with your String to check (You might want to loop through the elements of the string and check one by one).
If they don't match, continue with the characters in the master string
Something like:
Loop master string
for every character (using index i, lets say)
check whether this is same as first character of the other string
if it is
//potential match
loop through the characters in the child string (lets say using index j)
match them with the consecutive characters in the master string
(something like master[j+i] == sub[j])
If everything matches, 'i' is what you want
otherwise, continue with the master, hoping you find a match
Some other points:
In Java, method names start with a lower-case letter by convention (meaning the compiler won't complain, but your fellow programmers may). So IndexOf should actually be indexOf.
Having instance variables (class-level variables) start with a _ (as in _st) is not really good practice. If your professor insists, you may not have many options, but keep this in mind.
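The pseudocode above translates fairly directly into Java. The following is a minimal sketch with both strings passed as parameters (the question instead uses a field named _st); the class name NaiveIndexOf is made up for the example.

```java
class NaiveIndexOf {
    // Returns the first index in 'master' at which 'sub' starts, or -1 if it never does.
    static int indexOf(String master, String sub) {
        // Only start positions where 'sub' still fits are worth checking.
        for (int i = 0; i + sub.length() <= master.length(); i++) {
            int j = 0;
            // Compare sub[j] with master[i + j] until a mismatch or the end of 'sub'.
            while (j < sub.length() && master.charAt(i + j) == sub.charAt(j)) {
                j++;
            }
            if (j == sub.length()) {
                return i; // every character matched
            }
            // Otherwise continue with the next start position in 'master'.
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcdefg", "cd"));  // prints 2
        System.out.println(indexOf("abcdefg", "xyz")); // prints -1
    }
}
```

The key difference from the code in the question is the separate index j for the searched-for string, so the two strings are no longer compared at the same position.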

Not really very close, I'm afraid. What that code basically does is check whether the two strings have two consecutive characters in the same positions at any point and, if so, return the index of the second of those characters. E.g., if _st is "abcdefg" and str is "12cd45", you'll return 3 because they have "cd" in the same place, and that's the index of the "d". At least, that's as near as I can tell what it's actually doing. That's because you're indexing into both strings with the same indexing variable.
To re-write indexOf, looking for str within _st, you have to scan _st for the first character in str and then check whether the remaining characters match; if not, bump forward one place from where you started checking and continue your scan. (There are optimisations you can do, but that's the essence of it.) So for instance, if you find the first character of str at index 4 in _st and str is six characters long, having found the first character you need to see if the remaining five (str's indexes 1-5 inclusive) match _st's indexes 5-9 inclusive (easiest just to check all six of str's characters against a substring of _st starting at 4 and going for six characters). If everything matches, return the index at which you found the first character (so, 4 in that example). You can stop scanning at _st.length() - str.length() since if you haven't found it starting at or before that point, you're not going to find it at all.
Side point: Don't call the length function on every loop. The JIT may be able to optimize out the call, but if you know that _st won't change during the course of this function (and if you don't know that, you should require it), grab length() to a local and then refer to that. And of course, since you know you can stop earlier than length(), you'll use a local to remember where you can stop.
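A compact variant of this description is sketched below. It uses the standard String.regionMatches method to compare a whole window at once and stores the last possible start position in a local, as suggested. The class name RegionIndexOf and the parameter-based signature are assumptions made for the example.

```java
class RegionIndexOf {
    // Returns the first index in 'st' at which 'str' starts, or -1 if absent.
    static int indexOf(String st, String str) {
        int targetLength = str.length();
        // Last start position at which 'str' can still fit; computed once.
        int lastStart = st.length() - targetLength;
        for (int i = 0; i <= lastStart; i++) {
            // Compare targetLength characters of 'st' starting at i with all of 'str'.
            if (st.regionMatches(i, str, 0, targetLength)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The first character of "efghij" is found at index 4, and the six
        // characters starting there all match.
        System.out.println(indexOf("abcdefghij", "efghij")); // prints 4
    }
}
```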

You are using i to index both strings equally, but what you want is for the index into the string being searched for to always start at 0 until its first character is found in the other string. Then check whether the next characters are equal, and so on.
Hope this helps

Your code loops through the string being searched and if the characters at position i match, it checks the next position. If the strings match at the next position, you assume that the string str is contained in _st.
What you probably want to do is:
keep track of whether the whole of str is contained in _st. You could probably check whether the string that you are searching for has length equal to the number of matching characters so far.
if you do the above then you could get the starting index by subtracting the number of matches so far from the current value of i.
One question:
Why are you not using the built-in String.indexOf() method? Is this assignment meant for you to implement this functionality on your own?

Maybe the Oracle Java API source code helps:
/**
* Returns the index within this string of the first occurrence of the
* specified substring. The integer returned is the smallest value
* <i>k</i> such that:
* <blockquote><pre>
* this.startsWith(str, <i>k</i>)
* </pre></blockquote>
* is <code>true</code>.
*
* @param str any string.
* @return if the string argument occurs as a substring within this
* object, then the index of the first character of the first
* such substring is returned; if it does not occur as a
* substring, <code>-1</code> is returned.
*/
public int indexOf(String str) {
return indexOf(str, 0);
}
/**
* Returns the index within this string of the first occurrence of the
* specified substring, starting at the specified index. The integer
* returned is the smallest value <tt>k</tt> for which:
* <blockquote><pre>
* k >= Math.min(fromIndex, this.length()) && this.startsWith(str, k)
* </pre></blockquote>
* If no such value of <i>k</i> exists, then -1 is returned.
*
* @param str the substring for which to search.
* @param fromIndex the index from which to start the search.
* @return the index within this string of the first occurrence of the
* specified substring, starting at the specified index.
*/
public int indexOf(String str, int fromIndex) {
return indexOf(value, offset, count,
str.value, str.offset, str.count, fromIndex);
}
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j] ==
target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}
