How to know whether a string can be segmented into two strings

I was asked the following question in an interview and could not figure out how to approach it. Please guide me.
Question: How do you determine whether a string can be segmented into two words? For example, breadbanana can be segmented into bread and banana, while breadbanan cannot. You are given a dictionary that contains all the valid words.

Build a trie of the words you have in the dictionary, which will make searching faster.
Search the trie by following the letters of your input string. When you reach a word that is in the trie, recursively start again from the position after that word in the input string. If you get to the end of the input string, you've found one possible segmentation. If you get stuck, backtrack and recursively try another word.
EDIT: sorry, I missed the fact that there must be exactly two words.
In this case, limit the recursion depth to 2.
The pseudocode for 2 words would be:
T = trie of words in the dictionary
for every word in T, which can be found going down the tree by choosing the next letter of the input string each time we move to the child:
p <- length(word)
if T contains input_string[p:length(input_string)]:
return true
return false
Assuming you can go down to a child node in the trie in O(1) (ascii indexes of children), you can find all prefixes of the input string in O(n+p), where p is the number of prefixes, and n the length of the input. Upper bound on this is O(n+m), where m is the number of words in dictionary. Checking for containing will take O(w) where w is the length of word, for which the upper bound would be m, so the time complexity of the algorithm is O(nm), since O(n) is distributed in the first phase between all found words.
But because we can't find more than n words in the first phase, the complexity is also limited to O(n^2).
So the search complexity would be O(n*min(n, m))
Before that you need to build the trie which will take O(s), where s is the sum of lengths of words in the dictionary. The upper bound on this is O(n*m), since the maximum length of every word is n.
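The pseudocode above could be sketched in Java roughly as follows. This is only a minimal illustration: the class name TrieTwoWords, the Node type and the method names are made up for this example, and children are stored in a HashMap rather than an ASCII-indexed array.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieTwoWords {
    // Hypothetical minimal trie node: children keyed by character.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    // Inserts one dictionary word into the trie.
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // True if s[from, s.length()) is a dictionary word.
    boolean containsSuffix(String s, int from) {
        Node node = root;
        for (int i = from; i < s.length(); i++) {
            node = node.children.get(s.charAt(i));
            if (node == null) return false;
        }
        return node.isWord;
    }

    // Walks down the trie along the input; at every word end, tests the remainder.
    boolean canSplitInTwo(String s) {
        Node node = root;
        for (int p = 0; p < s.length() - 1; p++) {
            node = node.children.get(s.charAt(p));
            if (node == null) return false; // no dictionary word starts like this
            if (node.isWord && containsSuffix(s, p + 1)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TrieTwoWords trie = new TrieTwoWords();
        trie.add("bread");
        trie.add("banana");
        System.out.println(trie.canSplitInTwo("breadbanana")); // true
        System.out.println(trie.canSplitInTwo("breadbanan"));  // false
    }
}
```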

Go through your dictionary and compare every term with the start of the original string, e.g. "breadbanana". If a term matches the beginning of the string, cut it off and compare the dictionary entries with the rest of the original string...
Let me try to explain that in Java:
String dictTerm = "bread";
String original = "breadbanana";
// first part matches
if (dictTerm.equals(original.substring(0, dictTerm.length()))) {
// first part matches, get the rest
String lastPart = original.substring(dictTerm.length());
String nextDictTerm = "banana";
if (nextDictTerm.equals(lastPart)) {
System.out.println("String " + original +
" contains the dictionary terms " +
dictTerm + " and " + lastPart);
}
}

The simplest solution:
Split the string between every pair of consecutive characters and see whether or not both substrings (to the left of the split point and to the right of it) are in the dictionary.
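A minimal Java sketch of this split-point idea, assuming the dictionary is available as a Set<String> (the class and method names are made up for this example):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TwoWordSplit {
    // Returns true if s can be cut into exactly two dictionary words.
    static boolean canSplitInTwo(String s, Set<String> dictionary) {
        // i is the split point: left = s[0, i), right = s[i, length)
        for (int i = 1; i < s.length(); i++) {
            if (dictionary.contains(s.substring(0, i))
                    && dictionary.contains(s.substring(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("bread", "banana"));
        System.out.println(canSplitInTwo("breadbanana", dictionary)); // true
        System.out.println(canSplitInTwo("breadbanan", dictionary));  // false
    }
}
```

With a hash set, each of the n-1 split points costs two lookups on substrings of total length n, so the whole check is roughly O(n^2) character operations.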

One approach could be:
Put all elements of the dictionary in a set or list.
Now you can use the contains and substring functions to remove words that match the dictionary. If the string is empty at the end, it can be segmented; otherwise it cannot. You can also keep a count of the words removed.

public boolean canBeSegmented(String s) {
    for (String word : dictionary.getWords()) {
        int idx = s.indexOf(word);
        if (idx >= 0) {
            // remove the first occurrence of the word from the string
            s = s.substring(0, idx) + s.substring(idx + word.length());
        }
    }
    // segmentable if nothing is left over
    return s.isEmpty();
}
This code checks whether your given String can be fully segmented. It checks whether a word from the dictionary is inside your string and then removes it. If you want the segments themselves, you have to keep the removed segments in the order in which they appear in the string.
Just two words makes it easier:
public boolean canBeSegmented(String s) {
    boolean wordDetected = false;
    for (String word : dictionary.getWords()) {
        int idx = s.indexOf(word);
        if (idx >= 0) {
            // cut the matched word out of the string
            s = s.substring(0, idx) + s.substring(idx + word.length());
            if (!wordDetected)
                wordDetected = true;
            else
                return s.isEmpty();
        }
    }
    return false;
}
This code looks for one word, and if there is a second word in the String and nothing but these two words, it returns true; otherwise it returns false.

This is just an idea; you can implement it better if you want.
package farzi;
import java.util.ArrayList;
public class StringPossibility {
public static void main(String[] args) {
String str = "breadbanana";
ArrayList<String> dict = new ArrayList<String>();
dict.add("bread");
dict.add("banana");
for(int i=0;i<str.length();i++)
{
String word1 = str.substring(0,i);
String word2 = str.substring(i,str.length());
System.out.println(word1+"===>>>"+word2);
if(dict.contains(word1))
{
System.out.println("word 1 found : "+word1+" at index "+i);
}
if(dict.contains(word2))
{
System.out.println("word 2 found : "+ word2+" at index "+i);
}
}
}
}

Related

How to find better algorithm for my prefix matcher algorithm

I was solving online problem and the task was something like this:
There are two arrays: numbers and prefixes.
Array numbers contains numbers: “+432112345”, “+9990”, “+4450505”
Array prefixes contains prefixes: “+4321”, “+43211”, “+7700”, “+4452”, “+4”
Find longest prefix for each number. If no prefix found for number, match with empty string.
For example:
“+432112345” matches with the longest prefix “+43211” (not +4321, cause 43211 is longer).
“+9990” doesn't match with anything, so empty string "".
“+4450505” matches with “+4” (“+4452” doesn’t match because of the 2).
I came up with the most straightforward solution, where I loop through each number with each prefix. For each new number I check the prefixes, and if some prefix is longer than the last one found, I replace it.
Map<String, String> numsAndPrefixes = new HashMap<>();
for (String number : A) {
for (String prefix : B) {
if (number.contains(prefix)) {
// if map already contains this number, check for prefixes.
// if longer exists, switch longer one
if (numsAndPrefixes.containsKey(number)) {
int prefixLength = prefix.length();
int currentLen = numsAndPrefixes.get(number).length();
if (prefixLength > currentLen) {
numsAndPrefixes.put(number, prefix);
}
} else {
numsAndPrefixes.put(number, prefix);
}
} else if (!number.contains(prefix) && !numsAndPrefixes.containsKey(number)){
numsAndPrefixes.put(number, "");
}
}
}
So it has two nested for loops. I see that I am doing the same job over and over, e.g. checking the prefixes. It works, but it is slow. The problem is that I can't come up with anything better.
Could someone explain how they would approach finding a better algorithm?
And more generally, how do you proceed if you have a somewhat working solution and are trying to find a better one? What knowledge am I still missing?

I would implement this using a TreeSet and the floor(E e) method.
String[] numbers = { "+432112345", "+9990", "+4450505" };
String[] prefixes = { "+4321", "+43211", "+7700", "+4452", "+4" };
TreeSet<String> prefixSet = new TreeSet<>(Arrays.asList(prefixes));
for (String number : numbers) {
String prefix = prefixSet.floor(number);
while (prefix != null && ! number.startsWith(prefix))
prefix = prefixSet.floor(prefix.substring(0, prefix.length() - 1));
if (prefix == null)
prefix = "";
System.out.println(number + " -> " + prefix);
}
Output
+432112345 -> +43211
+9990 ->
+4450505 -> +4

The data structure you need is trie.
Add all prefixes in trie
For each string S in numbers:
Start from the root of trie
For each character in S:
If there is a link from current node, associated with current character, go by this link to the next node
If there is no link, stop: the answer for S is the prefix stored in the deepest node visited that marks the end of a prefix (or the empty string if no such node was visited)
This algorithm works in O(length(prefixes) + length(numbers))
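A minimal Java sketch of this trie lookup, using the numbers and prefixes from the question. The class name PrefixTrie, the Node type and the isPrefixEnd flag are assumptions made for this example; the flag is what lets the walk remember the last complete prefix it passed.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixTrie {
    // Hypothetical trie node; isPrefixEnd marks nodes where a stored prefix ends.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isPrefixEnd;
    }

    private final Node root = new Node();

    // Adds one prefix to the trie.
    void add(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isPrefixEnd = true;
    }

    // Returns the longest stored prefix of number, or "" if none matches.
    String longestPrefix(String number) {
        Node node = root;
        int best = 0; // length of the longest complete prefix seen so far
        for (int i = 0; i < number.length(); i++) {
            node = node.children.get(number.charAt(i));
            if (node == null) break;             // no link for this character
            if (node.isPrefixEnd) best = i + 1;  // remember the last full prefix
        }
        return number.substring(0, best);
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String p : new String[] { "+4321", "+43211", "+7700", "+4452", "+4" }) {
            trie.add(p);
        }
        for (String n : new String[] { "+432112345", "+9990", "+4450505" }) {
            System.out.println(n + " -> " + trie.longestPrefix(n));
        }
    }
}
```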

You are using .contains(). You should use .startsWith(). It's a lot faster.
Then in your else if you are checking what you already checked in the if.
This is only one approach on how to improve the algorithm:
Sort the prefixes:
+43211 +4321 +4452 +4 +7700
What is good about this? Well, it will always find the longest prefix first. You can exit the loop and don't have to look for longer prefixes.
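Arrays.sort(prefixes, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        if (o1.equals(o2)) return 0;
        // a longer prefix sorts before any shorter prefix it starts with
        if (o1.startsWith(o2)) return -1;
        if (o2.startsWith(o1)) return 1;
        return o1.compareTo(o2);
    }
});
Map<String, String> numsAndPrefixes = new HashMap<>();
for (String number: numbers) {
    numsAndPrefixes.put(number, "");
    for (String prefix: prefixes) {
        // the first match is the longest one because of the sort order
        if (number.startsWith(prefix)) {
            numsAndPrefixes.put(number, prefix);
            break;
        }
    }
}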
But if your number starts with +1 and there is no prefix it will continue checking all the prefixes with +2 +3 +4 ... which are obviously not matching. (Issue 1)
Also if your number starts with +9 the prefix will be found very late. (Issue 2)
How to fix this? Well you can save the indices where +1 starts, +2 starts, ...:
In our prefix list:
0 1 2 3 4 5 (index)
+1233 +123 +2233 +2 +3 +4
+2 starts at index [2] and +3 starts at index [4]. So when you want to know the prefix for a number starting with +2 you only have to check elements [2] and [3]. This will both fix issue 1 and 2.
It would also be possible to store the indices for more digits (for example where +13 starts).

Finding longest concatenated word

I have a dictionary with many words, and I want to find the longest concatenated word (that is, the longest word that is composed entirely of shorter words in the file). I pass the words to the method in descending order of length. How can I check that all the characters of a word are covered by words from the dictionary?
public boolean tryMatch(String s, List dictionary) {
String nextWord = new String();
int contaned = 0;
//Loop iterating over every word in the dictionary
for(int i = 1; i < dictionary.size();i++) {
nextWord = (String) dictionary.get(i);
if (nextWord == s) {
nextWord = (String) dictionary.get(i + 1);
}
if (s.contains(nextWord)) {
contaned++;
}
}
if(contaned >1) {
return true;
}
return false;
}

If you have a sorted list of words, finding compound words is easy, but it will only perform well if the words are in a Set.
Let's look at the compound word football, and of course assume that both ball and foot are in the word list.
By definition, any compound word using foot as the first sub-word must start with foot.
So, when iterating the list, remember the current active "stem" words, e.g. when seeing foot, remember it.
Now, when seeing football, you check if the word starts with the stem word. If not, clear the stem word, and make new word the stem word.
If it does, the new word (football) is a candidate for being a compound word. The part after the stem is ball, so we need to check if that is a word, and if so, we found a compound word.
Checking is easy for simple case, i.e. wordSet.contains(remain).
However, compound words can be made up of more than 2 words, e.g. whatsoever. So after finding that it is a candidate from the stem word what, the remain is soever.
You can simply try all lengths of that (soever, soeve, soev, soe, so, s), and if one of the shorter ones are words, you repeat the process.
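The "try every length and repeat" step can be sketched in Java as below. This is a simplified illustration, not the stem-tracking loop over a sorted list described above: it assumes all words are in a Set<String>, it tries every split point of a word directly, and the names CompoundWords, isCompound and longestCompound are made up for this example. Without memoization it can be slow on long words with many partial matches.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CompoundWords {
    // True if word can be written as two or more shorter words from wordSet.
    static boolean isCompound(String word, Set<String> wordSet) {
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            String rest = word.substring(i);
            // head must be a word; rest is either a word or itself a compound
            if (wordSet.contains(head)
                    && (wordSet.contains(rest) || isCompound(rest, wordSet))) {
                return true;
            }
        }
        return false;
    }

    // Returns the longest compound word in wordSet, or "" if there is none.
    static String longestCompound(Set<String> wordSet) {
        String best = "";
        for (String w : wordSet) {
            if (w.length() > best.length() && isCompound(w, wordSet)) {
                best = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList(
                "foot", "ball", "football", "what", "so", "ever", "whatsoever"));
        System.out.println(longestCompound(words)); // whatsoever
    }
}
```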

Time Complexity of Code for finding longest word inside dictionary

Problem is as follows: You start with a 2 letter word, and you can append letters to the front and back of the word. You have to return the longest word that exists inside a dictionary that you can form by appending letters to the front and back of the 2 letter word, and every new word that you formed must also be inside the dictionary as well
For example:
Start: 'at'
Dict: [hat, chat, chats, rat, rate, orange]
Output: 'chats', because: at -> hat -> chat -> chats
I have the code as follows:
public static String longest(ArrayList<String> input) {
return helper("at", input);
}
public static String helper(String in, ArrayList<String> dict) {
ArrayList<String> maxes = new ArrayList<String>();
for (char a = 'a'; a <= 'z'; a++) {
String front = Character.toString(a) + in;
String back = in + Character.toString(a);
if (dict.contains(front)) {
maxes.add(helper(front, dict));
}
if (dict.contains(back)) {
maxes.add(helper(back, dict));
}
}
if (maxes.size() == 0) {
return in;
}
String word = "";
for (String w : maxes) {
if (w.length() > word.length()) {
word = w;
}
}
return word;
}
I was wondering what the time complexity for this algorithm would be? I can't for the life of me figure it out.

The answer strongly depends on your dictionary (n words with max reachable length L<=n+1) and on your data structure for storing it. Each call to helper (without its recursive calls) is O(n L) with dict being an ArrayList, whereas with a hash table it's O(L) (absent unlikely collisions). (There can be very long unreachable words in the dictionary, but it still costs only O(L) to compare against them because your trial words can't be longer.)
As for the number of calls to helper: this is just a depth-first search on the tree of words related by prepending/appending a letter. As such, it's O(v), where v is the number of vertices visited. The values of v for various input words depends on your dictionary as well: v<=n, of course, and is often much less. As an example: using the 71813 lines in my /usr/share/dict/words that are all ASCII letters (and ignoring case), the most words ever considered is 593 (for "Ar" as in argon).
The worst-case dictionary will have all its words forming a chain "ab", "abc", "abcd", etc.. You visit every word for a total cost of O(v n L)=O(n^3) (O(v L)=O(n^2) with the hash table). Realistic dictionaries will be much faster not only because L is smaller but also because v is; the exact speedup is unfortunately difficult to analyze. It's probably reasonable to assume L is Θ(log(n)); there's no meaningful asymptotic expression for v as a function of n because realistic dictionaries don't have arbitrarily large n.

Finding the index of a permutation within a string

I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced worked providing that the word that the answer begins with has not occurred previously. In the 2nd example here, my code will fail because dog has already appeared before where the answer begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.

You could concatenate the strings together and search for the resulting string.
String a = "dog";
String b = "cat";
String c = a + b; // c is "dogcat"
This way you would overcome the problem of dog appearing somewhere else.
But this wouldn't work if catdog is a valid value too.

Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.

Perhaps not the best optimized version, but how about the following approach to give you some ideas:
1. Count the total length of all words in the list.
2. Take a random word from the list and find the starting index of its first occurrence.
3. Take a substring spanning the length counted above before and after that index (e.g. if the index is 15 and there are 3 words of 4 letters each, take the substring from 15-8 to 15+11).
4. Make a copy of the word list with the earlier random word removed.
5. Check the [word_length] letters appended/prepended to the match to see if they match a word on the list copy.
6. If a word matches the copy of the list, remove it from the copy and move further.
7. If all words are found, break the loop.
8. If not all words are found, find the starting index of the next occurrence of the earlier random word and go back to 3.
Why it would help:
Which word you pick to begin with doesn't matter, since every word needs to be in the successful match anyway.
You don't have to manually loop through a lot of the characters, unless there are lots of near-complete false matches.
As a supposed match keeps growing, you have fewer words left on the list copy to compare against.
You can also keep track of the furthest index you've gone to, so you can sometimes limit the backwards length of the picked substring (as it cannot overlap with where you've already been, if the occurrences are close to each other).

How to find all permutations of a given word in a given text?

This is an interview question (phone screen): write a function (in Java) to find all permutations of a given word that appear in a given text. For example, for word abc and text abcxyaxbcayxycab the function should return abc, bca, cab.
I would answer this question as follows:
Obviously I can loop over all permutations of the given word and use a standard substring function. However it might be difficult (for me right now) to write code to generate all word permutations.
It is easier to loop over all text substrings of the word size, sort each substring and compare it with the "sorted" given word. I can code such a function immediately.
I can probably modify some substring search algorithm but I do not remember these algorithms now.
How would you answer this question?

This is probably not the most efficient solution algorithmically, but it is clean from a class design point of view. This solution takes the approach of comparing "sorted" given words.
We can say that a word is a permutation of another if it contains the same letters in the same number. This means that you can convert the word from a String to a Map<Character,Integer>. Such conversion will have complexity O(n) where n is the length of the String, assuming that insertions in your Map implementation cost O(1).
The Map will contain as keys all the characters found in the word and as values the frequencies of the characters.
Example. abbc is converted to [a->1, b->2, c->1]
bacb is converted to [a->1, b->2, c->1]
So if you need to know whether two words are permutations of each other, you can convert them both into maps and then invoke Map.equals.
Then you have to iterate over the text string and apply the transformation to all the substrings of the same length of the words that you are looking for.
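A minimal Java sketch of this (non-rolling) version, using the word and text from the question. The class and method names are made up for this example; every window is converted to a fresh map, so the cost is about O(M * N) for a text of length M and a word of length N.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PermutationFinder {
    // Converts a string to a map of character -> number of occurrences.
    static Map<Character, Integer> counts(String s) {
        Map<Character, Integer> m = new HashMap<>();
        for (char c : s.toCharArray()) {
            m.merge(c, 1, Integer::sum);
        }
        return m;
    }

    // Returns every window of text that is a permutation of word, in order.
    static List<String> findPermutations(String word, String text) {
        Map<Character, Integer> target = counts(word);
        List<String> found = new ArrayList<>();
        int n = word.length();
        for (int i = 0; i + n <= text.length(); i++) {
            String window = text.substring(i, i + n);
            if (counts(window).equals(target)) {
                found.add(window);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPermutations("abc", "abcxyaxbcayxycab")); // [abc, bca, cab]
    }
}
```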
Improvement proposed by Inerdial
This approach can be improved by updating the Map in a "rolling" fashion.
I.e. if you're matching at index i=3 in the example haystack in the OP (the substring xya), the map will be [a->1, x->1, y->1]. When advancing in the haystack, decrement the character count for haystack[i], and increment the count for haystack[i+needle.length()].
(Dropping zeroes to make sure Map.equals() works, or just implementing a custom comparison.)
Improvement proposed by Max
What if we also introduce matchedCharactersCnt variable? At the beginning of the haystack it will be 0. Every time you change your map towards the desired value - you increment the variable. Every time you change it away from the desired value - you decrement the variable. Each iteration you check if the variable is equal to the length of needle. If it is - you've found a match. It would be faster than comparing the full map every time.
Pseudocode provided by Max:
needle = "abbc"
haystack = "abbcbbabbcaabbca"
needleSize = needle.length()
//Map of needle character counts (a missing key reads as 0)
targetMap = [a->1, b->2, c->1]
matchedLength = 0
curMap = [a->0, b->0, c->0]
//Initial map initialization
for (int i=0;i<needle.length();i++) {
    //Count the character only while we still need more of it
    if (curMap[haystack[i]] < targetMap[haystack[i]]) {
        matchedLength++
    }
    curMap[haystack[i]]++
}
if (matchedLength == needleSize) {
    System.out.println("Match found at: 0");
}
//Search itself
for (int i=0;i<haystack.length()-needle.length();i++) {
    int targetValue1 = targetMap[haystack[i]]; //Reading from hashmap, O(1)
    int curValue1 = curMap[haystack[i]]; //Another read
    //If we are removing beneficial character
    if (targetValue1 > 0 && curValue1 > 0 && curValue1 <= targetValue1) {
        matchedLength--;
    }
    curMap[haystack[i]] = curValue1 - 1; //Write to hashmap, O(1)
    int targetValue2 = targetMap[haystack[i+needle.length()]] //Read
    int curValue2 = curMap[haystack[i+needle.length()]] //Read
    //We are adding a beneficial character (one we still need more of)
    if (targetValue2 > 0 && curValue2 < targetValue2) {
        matchedLength++;
    }
    curMap[haystack[i+needle.length()]] = curValue2 + 1; //Write
    if (matchedLength == needleSize) {
        System.out.println("Match found at: "+(i+1));
    }
}
//Basically with 4 reads and 2 writes which are
//independent of the size of the needle,
//we get to the maximal possible performance: O(n)

To find a permutation of a string you can use number theory.
But you will have to know the 'theory' behind this algorithm in advance before you can answer the question using this algorithm.
There is a method where you can calculate a hash of a string using prime numbers.
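Every permutation of the same string will give the same hash value. Most other strings will give a different value, but a weighted sum like this does not strictly guarantee it: collisions between strings that are not permutations of each other are possible in general. Multiplying one prime per character instead gives a value that is unique for each multiset of characters, as long as it does not overflow.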
The hash-value is calculated by c1 * p1 + c2 * p2 + ... + cn * pn
where ci is a unique value for the current char in the string and where pi is a unique prime number value for the ci char.
Here is the implementation.
public class Main {
static int[] primes = new int[] { 2, 3, 5, 7, 11, 13, 17,
19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103 };
public static void main(String[] args) {
final char[] text = "abcxaaabbbccyaxbcayaaaxycab"
.toCharArray();
char[] abc = new char[]{'a','b','c'};
int match = val(abc);
for (int i = 0; i < text.length - 2; i++) {
char[] _123 = new char[]{text[i],text[i+1],text[i+2]};
if(val(_123)==match){
System.out.println(new String(_123) );
}
}
}
static int p(char c) {
return primes[(int)c - (int)'a'];
}
static int val(char[] cs) {
return
p(cs[0])*(int)cs[0] + p(cs[1])*(int)cs[1] + p(cs[2])*(int)cs[2];
}
}
The output of this is:
abc
bca
cab

You should be able to do this in a single pass. Start by building a map that contains all the characters in the word you're searching for. So initially the map contains [a, b, c].
Now, go through the text one character at a time. The loop looks something like this, in pseudo-code.
found_string = "";
for each character in text
    if character is in map
        remove character from map
        append character to found_string
        if map is empty
            output found_string
            found_string = ""
            add all characters back to map
        end if
    else
        // not a permutation of the string you're searching for
        refresh map with characters from found_string
        found_string = ""
    end if
end for
If you want unique occurrences, change the output step so that it adds the found strings to a map. That'll eliminate duplicates.
There's the issue of words that contain duplicated letters. If that's a problem, make the key the letter and the value a count. 'Removing' a character means decrementing its count in the map. If the count goes to 0, then the character is in effect removed from the map.
The algorithm as written won't find overlapping occurrences. That is, given the text abcba, it will only find abc. If you want to handle overlapping occurrences, you can modify the algorithm so that when it finds a match, it decrements the index by one minus the length of the found string.
That was a fun puzzle. Thanks.

This is what I would do - set up a flag array with one
element equal to 0 or 1 to indicate whether that character
in STR had been matched
Set the first result string RESULT to empty.
for each character C in TEXT:
Set an array X equal to the length of STR to all zeroes.
for each character S in STR:
If C is the JTH character in STR, and
X[J] == 0, then set X[J] <= 1 and add
C to RESULT.
If the length of RESULT is equal to STR,
add RESULT to a list of permutations
and set the elements of X[] to zeroes again.
If C is not any character J in STR having X[J]==0,
then set the elements of X[] to zeroes again.

The second approach seems very elegant to me and should be perfectly acceptable. I think it scales at O(M * N log N), where N is word length and M is text length.
I can come up with a somewhat more complex O(M) algorithm:
1. Count the occurrence of each character in the word
2. Do the same for the first N (i.e. length(word)) characters of the text
3. Subtract the two frequency vectors, yielding subFreq
4. Count the number of non-zeroes in subFreq, yielding numDiff
5. If numDiff equals zero, there is a match
6. Update subFreq and numDiff in constant time by updating for the first and after-last character in the text
7. Go to 5 until reaching the end of the text
EDIT: See that several similar answers have been posted. Most of this algorithm is equivalent to the rolling frequency counting suggested by others. My humble addition is also updating the number of differences in a rolling fashion, yielding an O(M+N) algorithm rather than an O(M*N) one.
EDIT2: Just saw that Max has basically suggested this in the comments, so brownie points to him.
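A Java sketch of the rolling subFreq/numDiff steps above. The names RollingPermutationSearch, findStarts and adjust are made up for this example, and the frequency vector is simply an int array indexed by char value, which is one possible choice rather than anything the answer prescribes.

```java
import java.util.ArrayList;
import java.util.List;

public class RollingPermutationSearch {
    // Returns the start indices of all windows of text that are permutations of word.
    static List<Integer> findStarts(String word, String text) {
        List<Integer> starts = new ArrayList<>();
        int n = word.length();
        if (n == 0 || n > text.length()) return starts;

        // subFreq[c] = (count of c in current window) - (count of c in word)
        int[] subFreq = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < n; i++) {
            subFreq[word.charAt(i)]--;
            subFreq[text.charAt(i)]++;
        }
        int numDiff = 0; // number of non-zero entries in subFreq
        for (int f : subFreq) {
            if (f != 0) numDiff++;
        }
        if (numDiff == 0) starts.add(0);

        for (int i = n; i < text.length(); i++) {
            numDiff += adjust(subFreq, text.charAt(i - n), -1); // character leaving the window
            numDiff += adjust(subFreq, text.charAt(i), +1);     // character entering the window
            if (numDiff == 0) starts.add(i - n + 1);
        }
        return starts;
    }

    // Applies delta (+1 or -1) to one entry and returns the change in the non-zero count.
    private static int adjust(int[] subFreq, char c, int delta) {
        int before = subFreq[c];
        subFreq[c] += delta;
        if (before == 0) return 1;      // was balanced, now differs
        if (subFreq[c] == 0) return -1; // became balanced
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(findStarts("abc", "abcxyaxbcayxycab")); // [0, 7, 13]
    }
}
```

After the one-time setup, each step of the window touches two array entries, so the scan itself is linear in the length of the text.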

This code should do the work:
import java.util.ArrayList;
import java.util.List;
public class Permutations {
public static void main(String[] args) {
final String word = "abc";
final String text = "abcxaaabbbccyaxbcayxycab";
List<Character> charsActuallyFound = new ArrayList<Character>();
StringBuilder match = new StringBuilder(3);
for (Character c : text.toCharArray()) {
if (word.contains(c.toString()) && !charsActuallyFound.contains(c)) {
charsActuallyFound.add(c);
match.append(c);
if (match.length()==word.length())
{
System.out.println(match);
match = new StringBuilder(3);
charsActuallyFound.clear();
}
} else {
match = new StringBuilder(3);
charsActuallyFound.clear();
}
}
}
}
The charsActuallyFound List is used to keep track of characters already found in the loop. It is needed to avoid matching "aaa", "bbb" and "ccc" (added by me to the text you specified).
After further reflection, I think my code only works if the given word has no duplicate characters.
The code above correctly prints
abc
bca
cab
but if you search for the word "aaa", nothing is printed, because each char cannot be matched more than once. Inspired by Jim Mischel's answer, I edited my code, ending up with this:
import java.util.ArrayList;
import java.util.List;
public class Permutations {
public static void main(String[] args) {
final String text = "abcxaaabbbccyaxbcayaaaxycab";
printMatches("aaa", text);
printMatches("abc", text);
}
private static void printMatches(String word, String text) {
System.out.println("matches for "+word +" in "+text+":");
StringBuilder match = new StringBuilder(3);
StringBuilder notYetFounds=new StringBuilder(word);
for (Character c : text.toCharArray()) {
int idx = notYetFounds.indexOf(c.toString());
if (idx!=-1) {
notYetFounds.replace(idx,idx+1,"");
match.append(c);
if (match.length()==word.length())
{
System.out.println(match);
match = new StringBuilder(3);
notYetFounds=new StringBuilder(word);
}
} else {
match = new StringBuilder(3);
notYetFounds=new StringBuilder(word);
}
}
System.out.println();
}
}
This gives me the following output:
matches for aaa in abcxaaabbbccyaxbcayaaaxycab:
aaa
aaa
matches for abc in abcxaaabbbccyaxbcayaaaxycab:
abc
bca
cab
I did a benchmark: the code above found 30815 matches of "abc" in a random string of 36M in just 4.5 seconds. As Jim already said, thanks for this puzzle...
