I have an ArrayList that contains a bunch of words, each in its own cell, that come from a file. Some of those words are complete words like "physicist", "water" and "gravity". However, other words are just letters that got split off during the processing. For example, "it's" became "it" and "s". As such, I want to remove all of the single-letter words except "I" and "A", because these are actual words.
This is the code I have for now:
for(int i=0;i<dictionnary.size();i++) {
if(dictionnary.get(i).compareToIgnoreCase("I")!=0||dictionnary.get(i).compareToIgnoreCase("A")!=0||dictionnary.get(i).length()==1){
dictionnary.remove(i);
}
}
Where dictionnary is my ArrayList. However, when I print out the content of my arrayList the "s" from it's remains. I also know that there was originally a word "E" that got removed throughout the process above. I'm confused as to why the "S" remains and how to fix it.
From my understanding this code goes through the ArrayList and checks if the length of the case is 1 (which is the case for all single letter words) as well as checking if that case is a case of "I" or "A" regardless of if it is capitalized or not. It then removes the cases that don't correspond to the "I" or "A".
Consider using the Collection Iterator for safe removal of elements during iteration.
for (Iterator<String> iter = dictionary.iterator(); iter.hasNext(); ) {
    String word = iter.next();
    // Remove single-letter words, but keep "I" and "A" in either case
    if (word.length() == 1
            && !"I".equalsIgnoreCase(word)
            && !"A".equalsIgnoreCase(word)) {
        iter.remove();
    }
}
My suggestion is the following: you can use removeIf, which takes a predicate and removes every element that matches it.
public static void main(String[] args) {
    List<String> dictionary = new ArrayList<>();
    dictionary.add("I");
    dictionary.add("A");
    dictionary.add("p");
    dictionary.add("its");
    dictionary.add("water");
    dictionary.add("s");
    int sizeRemove = 1;
    // Remove words of length sizeRemove, except "I" and "A" in either case
    dictionary.removeIf(
        word ->
            !"I".equalsIgnoreCase(word)
                && !"A".equalsIgnoreCase(word)
                && word.length() == sizeRemove
    );
    System.out.println(dictionary);
}
The output is the following:
[I, A, its, water]
Reference:
https://www.programiz.com/java-programming/library/arraylist/removeif
Use iterators instead. Let's say you have a list of (1,2,3,4,5) and you want to remove the numbers 2 and 3. You start looping through and get to the second element, 2. Here your i is 1. You remove that element and move on to i=2. What you have now is (1,3,4,5), and index 2 holds 4, so the element 3 has been skipped without ever being checked.
That is why you should use iterators instead. Refer to @vsfDawg's answer.
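The skipped element described above can be reproduced with a small sketch. The class and method names here are made up for illustration; only standard java.util calls are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RemoveWhileIterating {

    // Index-based loop: after remove(i) the next element slides into slot i
    // and is skipped when i is incremented.
    static List<Integer> removeByIndex(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        for (int i = 0; i < list.size(); i++) {
            int value = list.get(i);
            if (value == 2 || value == 3) {
                list.remove(i); // remove(int index)
            }
        }
        return list;
    }

    // Iterator-based loop: Iterator.remove() keeps the cursor consistent.
    static List<Integer> removeWithIterator(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            int value = it.next();
            if (value == 2 || value == 3) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(removeByIndex(numbers));      // [1, 3, 4, 5]
        System.out.println(removeWithIterator(numbers)); // [1, 4, 5]
    }
}
```

Note that in the original question the condition is also true for every word (no word can equal both "I" and "A"), so the index loop removes every element it visits and skips every other one.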
What will be the best way to remove any item from the arraylist, which contains all the characters of the same type?
Please refer to the example string ArrayList data below:
Element 1: FFFFFFFF
Element 2: 123
Element 3: ABCD1234
Element 4: FFFFFFFFFFFFFFFFF
Element 5: ABCDEF
From the above data, I want to remove 1st and 4th records because they contain all the characters as "F".
What I have tried so far is explained in pseudo-code below:
1. Iterated the list till the end in a loop
2. Get the data of current element
3. Check if the element string contains all "F" characters and nothing else.
4. If yes, note the index position of current element else move to next element
5. Use second loop to remove the elements from the stored index position
6. Here I got stuck because removing an element from arraylist changes its size and index position of remaining elements
Note: It would be more helpful if the method were dynamic, so that any character can be supplied (for example, removing the elements that consist only of "A").
You can call List.removeIf() with a regex to test for repeating characters:
listOfData.removeIf(s -> s.matches("(.)\\1*"));
To break down the regex:
. matches any character
(.) captures that first character
\1 backreferences that capture
* finds 0 or more of the same
In other words, if the string consists of only a character followed by itself n times, remove it.
If you want to test for a specific repeating character, say c, it's even easier:
listOfData.removeIf(s -> s.matches(c + "+"));
This means "match one or more instances of c". Note that this doesn't handle special characters like '('.
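For characters that are special in a regex, one option is to quote the character first. The sketch below assumes a hypothetical helper named removeAllSameChar; Pattern.quote and List.removeIf are standard library calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RepeatedCharFilter {

    // Removes every element that consists only of one or more copies of c.
    // Pattern.quote makes characters such as '(' or '.' match literally.
    static void removeAllSameChar(List<String> list, char c) {
        Pattern onlyC = Pattern.compile(Pattern.quote(String.valueOf(c)) + "+");
        list.removeIf(s -> onlyC.matcher(s).matches());
    }

    public static void main(String[] args) {
        List<String> listOfData = new ArrayList<>(Arrays.asList(
                "FFFFFFFF", "123", "ABCD1234", "FFFFFFFFFFFFFFFFF", "ABCDEF"));
        removeAllSameChar(listOfData, 'F');
        System.out.println(listOfData); // [123, ABCD1234, ABCDEF]
    }
}
```

Compiling the pattern once, outside the lambda, also avoids recompiling the regex for every element.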
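String s = "FFFF"; // sample value; populate the string to test here
int matching = 0; // number of characters equal to the first one
for (int j = 0; j < s.length(); j++) {
    if (s.charAt(0) == s.charAt(j)) {
        matching++;
    }
}
if (!s.isEmpty() && s.length() == matching) {
    // all characters are the same, so remove the element
}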
If you are using Java 8, you can use List.removeIf() as @shmosel suggested. But if you need to compile your code under older Java versions, try something like the code below.
public static void removeCharacterSetElementFromList(char character, ArrayList<String> list) {
    // Iterate over a copy so that removing from the original list is safe
    ArrayList<String> listCopy = new ArrayList<String>(list);
    for (String listItem : listCopy) {
        boolean removable = listItem.length() > 0 && character == listItem.charAt(0);
        for (int i = 0; i < listItem.length(); i++) {
            char current = listItem.charAt(i);
            if (character != current) {
                removable = false;
                break;
            }
        }
        if (removable) list.remove(listItem);
    }
}
Then you can simply call removeCharacterSetElementFromList('F',listOfData); to remove the matching records.
I am working on a problem, which is to write a program to find the longest word made of other words in a list of words.
EXAMPLE
Input: test, tester, testertest, testing, testingtester
Output: testingtester
I searched and found the following solution. My question is about step 2: why should we break each word in all possible ways? Why not use each word directly as a whole? Any insights would be appreciated.
The solution below does the following:
1. Sort the array by size, putting the longest word at the front.
2. For each word, split it in all possible ways. That is, for "test", split it into {"t", "est"}, {"te", "st"} and {"tes", "t"}.
3. Then, for each pairing, check if the first half and the second both exist elsewhere in the array.
4. "Short circuit" by returning the first string we find that fits condition #3.
Answering your question indirectly, I believe the following is an efficient way to solve this problem using tries.
Build a trie from all of the words in your list.
Sort the words so that the longest word comes first.
Now, for each word W, start at the top of the trie and begin following the word down the tree one letter at a time using letters from the word you are testing.
Each time a word ends, recursively re-enter the trie from the top making a note that you have "branched". If you run out of letters at the end of the word and have branched, you've found a compound word and, because the words were sorted, this is the longest compound word.
If the letters stop matching at any point, or you run out and are not at the end of the word, just back track to wherever it was that you branched and keep plugging along.
I'm afraid I don't know Java that well, so I'm unable to provide you sample code in that language. I have, however, written out a solution in Python (using a trie implementation from this answer). Hopefully it is clear to you:
#!/usr/bin/env python3

# End-of-word symbol
_end = '_end_'

# Make a trie out of nested dicts (HashMap or unordered_map in other languages)
def MakeTrie(words):
    root = dict()
    for word in words:
        current_dict = root
        for letter in word:
            current_dict = current_dict.setdefault(letter, {})
        current_dict[_end] = _end
    return root

def LongestCompoundWord(original_trie, trie, word, level=0):
    # Ran out of letters without landing on the end of a word
    if not word:
        return False
    first_letter = word[0]
    if first_letter not in trie:
        return False
    if len(word) == 1 and _end in trie[first_letter]:
        return level > 0
    # A word ends here: try re-entering the trie from the top ("branching")
    if _end in trie[first_letter] and LongestCompoundWord(original_trie, original_trie, word[1:], level + 1):
        return True
    # Otherwise keep following the current path
    return LongestCompoundWord(original_trie, trie[first_letter], word[1:], level)

# Words that were in your question
words = ['test', 'testing', 'tester', 'teste', 'testingtester', 'testingtestm', 'testtest', 'testingtest']

trie = MakeTrie(words)

# Sort words in order of decreasing length
words = sorted(words, key=lambda x: len(x), reverse=True)

for word in words:
    if LongestCompoundWord(trie, trie, word):
        print("Longest compound word was '{0:}'".format(word))
        break
With the above in mind, the answer to your original question becomes clearer: we do not know ahead of time which combination of prefix words will take us successfully through the tree. Therefore, we need to be prepared to check all possible combinations of prefix words.
Since the algorithm you found does not have an efficient way of knowing what subsets of a word are prefixes, it splits the word at all possible points in word to ensure that all prefixes are generated.
Richard's answer will work well in many cases, but it can take exponential time: this will happen if there are many segments of the string W, each of which can be decomposed in multiple different ways. For example, suppose W is abcabcabcd, and the other words are ab, c, a and bc. Then the first 3 letters of W can be decomposed either as ab|c or as a|bc... and so can the next 3 letters, and the next 3, for 2^3 = 8 possible decompositions of the first 9 letters overall:
a|bc|a|bc|a|bc
a|bc|a|bc|ab|c
a|bc|ab|c|a|bc
a|bc|ab|c|ab|c
ab|c|a|bc|a|bc
ab|c|a|bc|ab|c
ab|c|ab|c|a|bc
ab|c|ab|c|ab|c
All of these partial decompositions necessarily fail in the end, since there is no word in the input that contains W's final letter d -- but his algorithm will explore them all before discovering this. In general, a word consisting of n copies of abc followed by a single d will take O(n*2^n) time.
We can improve this to O(n^2) worst-case time (at the cost of O(n) space) by recording extra information about the decomposability of suffixes of W as we go along -- that is, suffixes of W that we have already discovered we can or cannot match to word sequences. This type of algorithm is called dynamic programming.
The condition we need for some word W to be decomposable is exactly that W begins with some word X from the set of other words, and the suffix of W beginning at position |X|+1 is decomposable. (I'm using 1-based indices here, and I'll denote a substring of a string S beginning at position i and ending at position j by S[i..j].)
Whenever we discover that the suffix of the current word W beginning at some position i is or is not decomposable, we can record this fact and make use of it later to save time. For example, after testing the first 4 decompositions in the 8 listed earlier, we know that the suffix of W beginning at position 4 (i.e., abcabcd) is not decomposable. Then when we try the 5th decomposition, i.e., the first one starting with ab, we first ask the question: Is the rest of W, i.e. the suffix of W beginning at position 3, decomposable? We don't know yet, so we try adding c to get ab|c, and then we ask: Is the rest of W, i.e. the suffix of W beginning at position 4, decomposable? And we find that it has already been found not to be -- so we can immediately conclude that no decomposition of W beginning with ab|c is possible either, instead of having to grind through all 4 possibilities.
Assuming for the moment that the current word W is fixed, what we want to build is a function f(i) that determines whether the suffix of W beginning at position i is decomposable. Pseudo-code for this could look like:
- Build a trie the same way as Richard's solution does.
- Initialise the array KnownDecomposable[] to |W| DUNNO values.

f(i):
    - If i == |W|+1 then return TRUE. (The empty suffix means we're finished.)
    - If KnownDecomposable[i] is TRUE or FALSE, then immediately return it.
    - MAIN BODY BEGINS HERE
    - Walk through Richard's trie from the root, following characters in the
      suffix W[i..|W|]. Whenever we find a trie node at some depth j that
      marks the end of a word in the set:
        - Call f(i+j) to determine whether the rest of W can be decomposed.
        - If it can (i.e. if f(i+j) == TRUE):
            - Set KnownDecomposable[i] = TRUE.
            - Return TRUE.
    - If we make it to this point, then we have considered all other
      words that form a prefix of W[i..|W|], and found that none of
      them yield a suffix that can be decomposed.
    - Set KnownDecomposable[i] = FALSE.
    - Return FALSE.

Calling f(1) then tells us whether W is decomposable.
By the time a call to f(i) returns, KnownDecomposable[i] has been set to a non-DUNNO value (TRUE or FALSE). The main body of the function is only run if KnownDecomposable[i] is DUNNO. Together these facts imply that the main body of the function will only run as many times as there are distinct values i that the function can be called with. There are at most |W|+1 such values, which is O(n), and outside of recursive calls, a call to f(i) takes at most O(n) time to walk through Richard's trie, so overall the time complexity is bounded by O(n^2).
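The memoised function f could be sketched in Java roughly as follows. This is only an illustration under some assumptions: it uses 0-based indices, a HashSet lookup stands in for the trie walk (so each prefix test costs a substring hash rather than one trie step, which adds a factor of n in the worst case), and the names decomposable, longest and memo are made up here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class LongestCompoundWord {

    // True if w.substring(start) can be split into words from dict.
    // memo[start] plays the role of KnownDecomposable: null means DUNNO.
    static boolean decomposable(String w, int start, Set<String> dict, Boolean[] memo) {
        if (start == w.length()) return true;      // empty suffix: finished
        if (memo[start] != null) return memo[start];
        boolean result = false;
        for (int end = start + 1; end <= w.length() && !result; end++) {
            // The candidate itself must not be used as its only piece
            boolean wholeWord = (start == 0 && end == w.length());
            if (!wholeWord && dict.contains(w.substring(start, end))) {
                result = decomposable(w, end, dict, memo);
            }
        }
        memo[start] = result;
        return result;
    }

    // Tries the words from longest to shortest and returns the first compound one.
    static Optional<String> longest(List<String> words) {
        Set<String> dict = new HashSet<>(words);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt(String::length).reversed());
        for (String w : sorted) {
            if (!w.isEmpty() && decomposable(w, 0, dict, new Boolean[w.length()])) {
                return Optional.of(w);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("test", "tester", "testertest", "testing", "testingtester");
        System.out.println(longest(words).orElse("none")); // testingtester
    }
}
```

Because each start position is resolved at most once per word, the abcabcabcd example no longer explores all eight partial decompositions.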
I guess you are just confused about which words are split.
After sorting, you consider the words one after the other, by decreasing length. Let us call a "candidate" a word you are trying to decompose.
If the candidate is made of other words, it certainly starts with a word, so you will compare all prefixes of the candidate to all possible words.
During the comparison step, you compare a candidate prefix to the whole words, not to split words.
By the way, the given solution will not work for words made of three or more parts. The fix is as follows:
try every prefix of the candidate and compare it to all words
in case of a match, repeat the search with the suffix.
Example:
testingtester gives the prefixes
t, te, tes, test, testi, testin, testing, testingt, testingte, testingtes and testingteste
Among these, test and testing are words. Then you need to try the corresponding suffixes ingtester and tester.
ingtester gives
i, in, ing, ingt, ingte, ingtes, ingtest and ingteste, none of which are words.
tester is a word and you are done.
IsComposite(InitialCandidate, Candidate):
    For all Prefixes of Candidate:
        if Prefix is in Words:
            Suffix = Candidate - Prefix
            if Suffix == "":
                return Candidate != InitialCandidate
            else if IsComposite(InitialCandidate, Suffix):
                return True
    return False

For all Candidate words by decreasing size:
    if IsComposite(Candidate, Candidate):
        print Candidate
        break
I would probably use recursion here. Start with the longest word and find words it starts with. For any such word remove it from the original word and continue with the remaining part in the same manner.
Pseudo code:
function iscomposed(originalword, wordpart)
    for word in allwords
        if word <> originalword
            if wordpart = word
                return yes
            elseif wordpart starts with word
                if iscomposed(originalword, wordpart - word)
                    return yes
                endif
            endif
        endif
    next
    return no
end

main
    sort allwords by length descending
    for word in allwords
        if iscomposed(word, word) return word
    next
end
Example:
words:
abcdef
abcde
abc
cde
ab
Passes:
1. abcdef starts with abcde. rest = f. 2. No word that f starts with is found.
1. abcdef starts with abc. rest = def. 2. No word that def starts with is found.
1. abcdef starts with ab. rest = cdef. 2. cdef starts with cde. rest = f. 3. No word that f starts with is found.
1. abcde starts with abc. rest = cde. 2. cde itself is found, so abcde is a composed word.
To find the longest word using recursion:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class FindLongestWord {
    public static void main(String[] args) {
        List<String> input = new ArrayList<>(
                Arrays.asList("cat", "banana", "rat", "dog", "nana", "walk", "walker", "dogcatwalker"));
        List<String> sortedList = input.stream().sorted(Comparator.comparing(String::length).reversed())
                .collect(Collectors.toList());
        boolean isWordFound = false;
        for (String word : sortedList) {
            // Remove the candidate so it cannot be matched against itself
            input.remove(word);
            if (findPrefix(input, word)) {
                System.out.println("Longest word is : " + word);
                isWordFound = true;
                break;
            }
        }
        if (!isWordFound)
            System.out.println("Longest word not found");
    }

    public static boolean findPrefix(List<String> input, String word) {
        if (word.isEmpty())
            return true;
        for (int i = 0; i < input.size(); i++) {
            String prefix = input.get(i);
            // Strip only the leading occurrence; replace() would remove every occurrence
            if (word.startsWith(prefix) && findPrefix(input, word.substring(prefix.length()))) {
                return true;
            }
        }
        return false;
    }
}
for (int i=0; i<Intlength; i++){
int intPosition;
intPosition=strAlphabet.indexOf(strMessage.charAt(i));
System.out.println(intPosition);
System.out.println("BREAK");
for (int k=0; k<Intlength2; k++){
int intPosition2;
intPosition2=strAlphabet.indexOf(strKeyword.charAt(k));
System.out.println(intPosition2);
System.out.println("BREAK-------------");
}
}
I will ask the user to type in two words: one is a message and one is a keyword.
The first loop above checks that i is less than the message length, adds 1 to i, and prints out the position number of the current letter. For example, if the message was "red", I would first want it to output the position number of "r", which is 17. Then it must move to the second loop and do exactly the same for the keyword. For example, if the keyword was "cat", I would want it to print the position of its first letter, in this case "c", which has the position value 2. In this way I want the output to be as follows:
first letter position of message
first letter position of keyword
second letter position of message
second letter position of keyword
etc.
Therefore, sticking to the message "red" and the keyword "cat", I would want the output to be:
17
2
4
0
3
19
I have added BREAK markers to see what my code was doing, and this was the result:
Please give me a message:
red
Thank you! Now please give me a keyword:
cat
17
BREAK
2
BREAK-------------
0
BREAK-------------
19
BREAK-------------
4
BREAK
2
BREAK-------------
0
BREAK-------------
19
BREAK-------------
3
BREAK
2
BREAK-------------
0
BREAK-------------
19
BREAK-------------
As you can see, it outputs the first letter position of the message, then all three positions of the keyword, then moves on to the second letter of the message and again outputs all three position values of the keyword.
How do I fix this to get the output I want? I am sure that I am not writing the for loop correctly.
What you've got are nested for loops. To do what you describe, the two lookups need to happen one after the other in the same iteration, not in a loop nested inside another. When you put the second for loop inside the first, you're telling it to print every letter of the keyword for each letter of the message.
What you want is a loop like this. (It's unclear what you want it to do if one string is longer than the other; I'm assuming you want it to stop with the shorter string, but you can change that.)
// Stop at the end of the shorter string
int intLength = Math.min(strMessage.length(), strKeyword.length());
for (int i = 0; i < intLength; i++) {
    // Print the position of the i'th letter of the message
    int intPosition;
    intPosition = strAlphabet.indexOf(strMessage.charAt(i));
    System.out.println(intPosition);
    // Print the position of the i'th letter of the keyword
    int intPosition2;
    intPosition2 = strAlphabet.indexOf(strKeyword.charAt(i));
    System.out.println(intPosition2);
}
When you nest loops, the inner one is executed from start to end for each iteration of the outer loop.
You probably need something like this (note that there's only one loop):
for (int i = 0; i < Math.max(strMessage.length(), strKeyword.length()); i++) {
    if (i < strMessage.length()) {
        System.out.println(strAlphabet.indexOf(strMessage.charAt(i)));
    } else {
        // To be defined
    }
    if (i < strKeyword.length()) {
        System.out.println(strAlphabet.indexOf(strKeyword.charAt(i)));
    } else {
        // To be defined
    }
}
(NB: not tested, not compiled)
If you break your requirement down, you will see that all you need is the straightforward operation of taking the character at a given index from both strings.
So you only need a single for loop. Inside it you can take the character from the first string and then from the second string.
Note: You need to take care of the length of both strings, and only fetch a character when the index is within that string's length.
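The single-loop idea described above could be sketched as follows. The class and method names are hypothetical, and the choice to keep printing the longer string's remaining letters on their own is an assumption, since the question does not say what should happen when the lengths differ.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleavedPositions {

    static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns the alphabet positions of message and keyword letters, interleaved.
    // Assumption: once the shorter string is exhausted, the longer one continues alone.
    static List<Integer> positions(String message, String keyword) {
        String m = message.toUpperCase();
        String k = keyword.toUpperCase();
        List<Integer> result = new ArrayList<>();
        int longest = Math.max(m.length(), k.length());
        for (int i = 0; i < longest; i++) {
            if (i < m.length()) {
                result.add(ALPHABET.indexOf(m.charAt(i)));
            }
            if (i < k.length()) {
                result.add(ALPHABET.indexOf(k.charAt(i)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int position : positions("red", "cat")) {
            System.out.println(position); // 17, 2, 4, 0, 3, 19 on separate lines
        }
    }
}
```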
Another approach, without an if/else block:
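for (int i = 0; i < Math.max(strMessage.length(), strKeyword.length()); i++) {
    int intPosition;
    try {
        intPosition = strAlphabet.indexOf(strMessage.charAt(i));
        System.out.println(intPosition);
    } catch (StringIndexOutOfBoundsException e) {
        // i is past the end of the message: nothing to print
    }
    try {
        intPosition = strAlphabet.indexOf(strKeyword.charAt(i));
        System.out.println(intPosition);
    } catch (StringIndexOutOfBoundsException e) {
        // i is past the end of the keyword: nothing to print
    }
}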
String strAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String strMessage = "red".toUpperCase();
String strKeyword = "cat".toUpperCase();
int Intlength = strMessage.length();
int Intlength2 = strKeyword.length();
// Use && so that the loop stops at the end of the shorter string;
// with || charAt would throw once i passes the shorter length
for (int i = 0; (i < Intlength) && (i < Intlength2); i++) {
    int intPosition = strAlphabet.indexOf(strMessage.charAt(i));
    System.out.println(intPosition);
    int intPosition2 = strAlphabet.indexOf(strKeyword.charAt(i));
    System.out.println(intPosition2);
}
1) Make the comparison case-insensitive. Don't assume that the user will pay attention to case while typing the words.
2) Decide on the expected behavior when the message is shorter than the keyword, or vice versa.
I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced worked providing that the word that the answer begins with has not occurred previously. In the 2nd example here, my code will fail because dog has already appeared before where the answer begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.
You could concatenate the words together and use the new string to search with.
String a = "dog";
String b = "cat";
String c = a + b; // the value of c would be "dogcat"
This way you would overcome the problem of dog appearing somewhere else.
But this wouldn't work if catdog is a valid value too.
Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
    f = -1
    do:
        f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
        write index of f to indices
    until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.
Perhaps not the best optimized version, but how about following theory to give you some ideas:
1. Count the total length of all the words in the row.
2. Take a random word from the list and find the starting index of its first occurrence.
3. Take a substring of the length counted above before and after that index (e.g. if the index is 15 and there are 3 words of 4 letters each, take the substring from 15-8 to 15+11).
4. Make a copy of the word list with the earlier random word removed.
5. Check the appended/prepended [word_length] letters to see if they match a new word on the list.
6. If the word matches the copy of the list, remove it from the copy and move further.
7. If all words are found, break the loop.
8. If not all words are found, find the starting index of the next occurrence of the earlier random word and go back to 3.

Why it would help:
- Which word you pick to begin with doesn't matter, since every word needs to be in the successful match anyway.
- You don't have to manually loop through a lot of the characters, unless there are lots of nearly complete false matches.
- As a supposed match keeps growing, you have fewer words left on the list copy to compare against.
- You can also keep track of the furthest index you've gone to, so you can sometimes limit the backwards length of the picked substring (it cannot overlap with where you've already been, if the occurrences are close to each other).
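For comparison, here is a sketch of a different, commonly used technique for this kind of problem: a sliding window over word-sized chunks with word counts. It is not the method described in the answers above. The class and method names are made up, and it assumes, as the question states, that all words have the same length.

```java
import java.util.HashMap;
import java.util.Map;

public class PermutationIndex {

    // Returns the smallest index at which some permutation of words occurs in s, or -1.
    static int find(String[] words, String s) {
        int wl = words[0].length();
        int k = words.length;
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) need.merge(w, 1, Integer::sum);

        int best = -1;
        // Each alignment of the chunk grid is scanned once, left to right.
        for (int offset = 0; offset < wl; offset++) {
            Map<String, Integer> seen = new HashMap<>();
            int left = offset; // start of the current window
            int count = 0;     // number of words inside the window
            for (int right = offset; right + wl <= s.length(); right += wl) {
                String chunk = s.substring(right, right + wl);
                if (!need.containsKey(chunk)) {
                    // Not a listed word: restart the window after this chunk
                    seen.clear();
                    count = 0;
                    left = right + wl;
                    continue;
                }
                seen.merge(chunk, 1, Integer::sum);
                count++;
                // Too many copies of chunk: shrink the window from the left
                while (seen.get(chunk) > need.get(chunk)) {
                    String drop = s.substring(left, left + wl);
                    seen.merge(drop, -1, Integer::sum);
                    left += wl;
                    count--;
                }
                if (count == k) {
                    if (best == -1 || left < best) best = left;
                    break; // first hit is the smallest for this alignment
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(find("dog cat rat".split(" "), "abcratdogcattgh"));                  // 3
        System.out.println(find("dog cat rat cat".split(" "), "abccatratdogzzzzdogcatratcat")); // 16
    }
}
```

Each of the word-length alignments touches every chunk a bounded number of times, so the work grows with the string length times the word length rather than with the number of occurrences of any single word.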