Finding Distinct Words between 2 sentences in java - java

What is an efficient way to find out all the unique words between 2 sentences in java and store them? What data structure should be used to store the words?

Store the words from the first sentence in a HashSet, then iterate over the words in the second sentence and check whether each one is already in the HashSet.

Put all words from one sentence in a set, then pass through words of the second sentence. If the word exists in a set, take it out of the set, otherwise put it into the set.
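A minimal sketch of this toggle approach, which yields the words that occur in exactly one of the two sentences. The class and method names are made up for illustration, and lower-casing plus splitting on non-letter characters are assumptions the answer does not specify. The second sentence is de-duplicated first, because otherwise a word repeated in it would be removed and then added back.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctWords {
    // Assumption: a "word" is a run of letters, digits or apostrophes, compared case-insensitively.
    static Set<String> words(String sentence) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : sentence.toLowerCase().split("[^\\p{L}\\p{Nd}']+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Returns the words that appear in exactly one of the two sentences.
    static Set<String> inExactlyOne(String first, String second) {
        Set<String> result = words(first);
        // Iterate over the de-duplicated words of the second sentence:
        // remove a word if it is already present, otherwise add it.
        for (String w : words(second)) {
            if (!result.remove(w)) {
                result.add(w);
            }
        }
        return result;
    }
}
```

For example, `DistinctWords.inExactlyOne("The cat sat.", "the dog sat, dog")` contains "cat" and "dog".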

A simple way of achieving this is:
// I: use a regular expression to remove punctuation marks
// II: use split to convert the sentences into collections of "words"
// III: create a variable that is an implementation of java.util.Set (to store unique words)
// IV: iterate over the collections
//     and add the words from each sentence to the set (that way each word is stored only once)
Hope this helps
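The steps above could be sketched roughly as follows. The class name, the choice of TreeSet (for sorted output) and the lower-casing are assumptions, not part of the answer.

```java
import java.util.Set;
import java.util.TreeSet;

class UniqueWords {
    // Collects every word that occurs in any of the given sentences, once each.
    static Set<String> uniqueWords(String... sentences) {
        // III: a java.util.Set implementation stores each word only once
        Set<String> unique = new TreeSet<>();
        for (String sentence : sentences) {
            // I: remove punctuation marks (lower-casing is an added assumption)
            String cleaned = sentence.replaceAll("\\p{Punct}", "").toLowerCase();
            // II + IV: split on whitespace and add each word to the set
            for (String word : cleaned.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    unique.add(word);
                }
            }
        }
        return unique;
    }
}
```

Calling `UniqueWords.uniqueWords("Hello, world!", "hello there")` gives a set holding "hello", "there" and "world".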

Related

Fill in the Blank String

I am studying for an interview and having trouble with this question.
Basically, you have a word that has spaces in it like c_t.
You have a word bank and have to find all the possible words that can be made with the given string. So in this case, if cat was in the word bank we would return true.
Any help on solving this question (like an optimal algorithm) would be appreciated.
I think we can start with checking lengths of strings in the word bank and then maybe use a hashmap somehow.
Step 1.) Eliminate all words in the word bank that don't have the same length as the specified one.
Step 2.) Eliminate all words in the bank that don't have the same starting sequence and ending sequence.
Step 3.) If the specified string is fragmented, like c_ter_il_ar, check for each word left in the bank whether it contains the isolated sequences (such as ter and il) at exactly the same indexes, and eliminate the words that don't.
Step 4.) At this point all the words left in the bank are viable solutions, so return true if the bank is non-empty.
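One way to sketch these steps is a single position-by-position comparison, which covers the length check, the start and end sequences and the inner fragments in one pass. This assumes each underscore stands for exactly one missing letter; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class BlankFiller {
    // True if word has the same length as pattern and agrees with it
    // at every position that is not a blank ('_').
    static boolean matches(String pattern, String word) {
        if (pattern.length() != word.length()) {
            return false; // step 1: lengths must be equal
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '_' && p != word.charAt(i)) {
                return false; // steps 2 and 3: fixed letters must sit at the same index
            }
        }
        return true;
    }

    // Step 4: every word that survives is a viable solution.
    static List<String> candidates(String pattern, Iterable<String> wordBank) {
        List<String> result = new ArrayList<>();
        for (String word : wordBank) {
            if (matches(pattern, word)) {
                result.add(word);
            }
        }
        return result;
    }
}
```

The answer to the original yes/no question is then `!BlankFiller.candidates("c_t", wordBank).isEmpty()`.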
It may depend on what your interviewer is looking for... creativity, knowledge of algorithms, mastery of data structures? One off-the-cuff solution would be to substitute underscores for any spaces and use a LIKE clause in a SQL query.
SELECT word FROM dictionary WHERE word LIKE 'c_t'; should return "cat", "cot" and "cut".
If you're being evaluated on your ability to divide and conquer, then you should be able to reason whether it's more work to extract a list of candidate words and evaluate each against your criteria, or to generate a list of candidate words from your criteria and evaluate each against your dictionary.

How can I let a set retain all strings that contain my specified substring in java?

I use a hashset for a dictionary. Now I would like to filter out words that do not start with my substring. So it should be something like this:
String word = "ab";
List<String> list = Arrays.asList(word);
boolean result = lexiconSet.retainAll(list);
And instead of this resulting in the lexicon only containing the word 'ab', I would like to keep all words beginning with 'ab'. How can I do this?
I know I can convert the set to an ArrayList of strings and loop over all elements to see if each string starts with 'ab', but since I think this is time-consuming and inefficient, I would like to hear better solutions. Thank you in advance!
With Java 8, life is easy:
lexiconSet.removeIf(s -> !s.startsWith("ab"));
This will remove all elements that don't begin with "ab".
Note that removeIf is defined on Collection, so you can call it directly on the set (or on a map's values() view) without converting to an ArrayList.
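A small self-contained sketch of the same idea applied directly to a HashSet. The class and method names are hypothetical; `Collection.removeIf` is the standard Java 8 method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PrefixFilter {
    // Removes, in place, every word that does not start with the given prefix.
    static void retainPrefix(Set<String> lexicon, String prefix) {
        lexicon.removeIf(s -> !s.startsWith(prefix));
    }

    public static void main(String[] args) {
        Set<String> lexicon = new HashSet<>(Arrays.asList("ab", "abc", "ba", "cab"));
        retainPrefix(lexicon, "ab");
        System.out.println(lexicon); // only "ab" and "abc" remain (HashSet order is unspecified)
    }
}
```

This still visits every element once, which a HashSet cannot avoid for prefix queries.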

Word Search: two string arrays in alphabetical order using merge sort

For my class project, we have to go through a Shakespeare sonnet and check whether each word is in the dictionary or not. I now have two String arrays, both in alphabetical order: one consists of the words from the sonnet and the other consists of the words from the dictionary. I am asked to use merge sort to check whether each word in the sonnet exists in the dictionary. Can anyone give me an idea of how I can implement this? Thanks in advance!
The idea is to:
Sort both of the arrays (with merge sort)
Remove any duplicates
Iterate through both of the sorted arrays simultaneously (this can be done using the merging procedure in merge sort) and check if the next word in the sonnet list equals the next word in the dictionary. If it does not, remove it and mark it as "not in dictionary"; if it does, mark it as "in the dictionary" and proceed to the next element in both lists.
However, this approach assumes that all of the words in the dictionary are contained in the sonnet. If this is not the case, you would have to remove those words up front.
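A sketch of the merge-style walk, under the assumption that both arrays are already sorted by `String.compareTo`. Instead of removing unused dictionary words up front, this variant simply advances the dictionary pointer past them, which is a small departure from the description above. The names are illustrative.

```java
class SortedLookup {
    // For each sonnet word, reports whether it occurs in the dictionary.
    // Both arrays must be sorted in the same (String.compareTo) order.
    static boolean[] inDictionary(String[] sonnet, String[] dictionary) {
        boolean[] found = new boolean[sonnet.length];
        int i = 0; // position in the sonnet array
        int j = 0; // position in the dictionary array
        while (i < sonnet.length && j < dictionary.length) {
            int cmp = sonnet[i].compareTo(dictionary[j]);
            if (cmp == 0) {
                found[i] = true; // match; keep j so a repeated sonnet word matches too
                i++;
            } else if (cmp < 0) {
                i++;             // sonnet word sorts before all remaining dictionary words
            } else {
                j++;             // dictionary word is not needed, skip it
            }
        }
        return found;            // entries never reached stay false
    }
}
```

The walk touches each element of either array at most once, so it is linear in the combined length.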
Really though; this doesn't sound like a sort problem.
The best approach would be to use a HashMap and put all the dictionary words in that. Then you could iterate through the sonnet, and check for existence in the map.
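As a rough sketch of the hash-based alternative: since only membership matters, a HashSet is used here in place of the HashMap mentioned above (an assumption on my part; a HashMap keyed by word would work the same way). Names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashLookup {
    // Returns the sonnet words that are not in the dictionary, in sonnet order.
    static List<String> missingWords(String[] sonnet, Collection<String> dictionary) {
        Set<String> dict = new HashSet<>(dictionary); // expected constant-time lookups
        List<String> missing = new ArrayList<>();
        for (String word : sonnet) {
            if (!dict.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

No sorting is needed with this version.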

guess words using dictionary

I am guessing the key of a less-simple simple substitution ciphertext. The criterion I use to evaluate the correctness of a key is the number of English words in the putative decryption.
Are there any tools in Java that can count the number of English words in a string? For example,
"thefoitedstateswasat"-> 4 words
"thefortedxyzstateswasathat"->5 words.
I loaded a word list and am using a HashSet as a dictionary. As I don't know where the inter-word spaces belong in the text, I can't validate words using a simple dictionary lookup.
Thanks.
I gave an answer to a similar question here:
If a word is made up of two valid words
It has some Java-esque pseudocode in it that might be adaptable into something that solves this problem.
Sorry, I'm new and don't have the rep to comment yet.
But wouldn't the code be very slow as the number of checks and permutations is very big?
I guess you just have to brute force your way through by using (n-1) words nested for loop. And then search the dictionary for each substring.
Surely there's a better way to test the accuracy of your key?
But that's not the point, here's what I'd do:
Using "quackdogsomethinggodknowswhat"
I'd write a recursive method that starts at the beginning of the string and calls itself once for every dictionary word the string starts with (in this case "qua" and "quack"), passing the rest of the string without that word ("dogsomethinggodknowswhat" for "quack"). It returns whichever is greater: 1 + the greatest value returned by those calls, OR 0 + the result of the call for the string starting at index 1 ("uackdogsomethinggodknowswhat").
This would probably work best if you kept your wordlist in a tree of some sort.
If you need some pseudocode, ask!
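A possible sketch of that recursion, written bottom-up with a table so that each suffix is evaluated only once (plain recursion would recompute the same suffixes many times). It uses a HashSet rather than a tree, and the class and method names are assumptions for illustration.

```java
import java.util.Set;

class WordCounter {
    // Maximum number of non-overlapping dictionary words that can be read
    // left to right in text, skipping characters that belong to no word.
    static int maxWords(String text, Set<String> dictionary) {
        int n = text.length();
        // best[i] = answer for the suffix text.substring(i); best[n] = 0
        int[] best = new int[n + 1];
        for (int i = n - 1; i >= 0; i--) {
            // Option "0 + rest": skip the character at index i
            best[i] = best[i + 1];
            // Option "1 + rest": take any dictionary word starting at index i
            for (int end = i + 1; end <= n; end++) {
                if (dictionary.contains(text.substring(i, end))) {
                    best[i] = Math.max(best[i], 1 + best[end]);
                }
            }
        }
        return best[0];
    }
}
```

With a dictionary holding only "the", "states", "was" and "at", `maxWords("thefoitedstateswasat", dictionary)` returns 4. A real word list also contains short words such as "a" or "as", so counts against it will usually be higher.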

How to get English word alone from 100 words using Java program

I have 100 words. All 100 words look like this:
EnglishWord,EngMeaning,NumberofW… meaning,31
From each of the 100 entries I want to retrieve only the EnglishWord (e.g. Friendship) using a Java program.
I am assuming you have a "body" (main string), containing a list of substrings and you want to retrieve any specific one substring from within.
This looks a lot like homework/exercise, so I'll avoid giving you a ready-to-roll answer, since you need to achieve a solution yourself for it to be of any value, but the general steps you will need are the following:
1:
Be able to separate each substring (entry) from the others (the base string) in an organized fashion.
This can be done (for the string case), as @kylc said, with String's split function, which uses a regex (Pattern) to define one or more delimiters that are then used to divide the string into an array of multiple substrings.
String[] arrayOfEntries /*something to hold the result*/ = yourStringVar.split("," /*your split regex pattern*/);
NOTE: For more information on these, here are the links: String's split function, Pattern.
2:
Be able to acquire any specific entry within an array of entries.
This is best done with a function you can reuse elsewhere. You need to define a "target" (what is going to be acquired) and a "source" (the group of entries to acquire the "target" from).
All you have to do is loop over the "source" and compare each entry there to the "target"; when a match is found, just return it.
That's it! The rest is up to you!
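For readers who want to check their own attempt, here is one possible sketch of the two steps. The sample line in the usage note is invented, since the real data format is only partly shown in the question, and all names are hypothetical.

```java
class EntryParser {
    // Step 1: split on commas and return the first field.
    // Assumption: the English word is the first comma-separated field.
    // The limit of 2 keeps the first field even when it is empty.
    static String englishWord(String line) {
        return line.split(",", 2)[0].trim();
    }

    // Step 2: loop over the source and return the entry that matches the target,
    // or null when there is no match. Case-insensitive matching is an assumption.
    static String find(String[] source, String target) {
        for (String entry : source) {
            if (entry.trim().equalsIgnoreCase(target)) {
                return entry.trim();
            }
        }
        return null;
    }
}
```

Given the made-up line `"Friendship,a close bond,31"`, `EntryParser.englishWord(line)` returns "Friendship".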
