Java: Finding how many words appear in BOTH data sources? - java

I'm trying to figure out if there is an easy way to count the number of words that appear in both small paragraph (#1) and small paragraph (#2).
Generally, I'm determining how much overlap there is between these paragraphs on a word-by-word basis. So if (#1) contains the word "happy" and (#2) contains the word "happy", that would count as +1.
I know that I could call String.contains() on (#2) for each word in (#1), but I was wondering if there is something more efficient that I could use.

You can create two sets s1 and s2, containing all words from first and second paragraph respectively, and intersect them: s1.retainAll(s2). Sounds easy enough.
update
Works for me
Set<String> s1 = new HashSet<String>(Arrays.asList("abc xyz 123".split("\\s")));
Set<String> s2 = new HashSet<String>(Arrays.asList("xyz 000 111".split("\\s")));
s1.retainAll(s2);
System.out.println(s1.size());
Don't forget to remove the empty word ("") from both sets: split("\\s") produces it when the text has leading or consecutive whitespace.
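Here is a minimal sketch of the same set-intersection idea wrapped in a helper that also drops the empty word. The class and method names (WordOverlap, words, countShared) are made up for illustration, and lower-casing plus splitting on \W+ (runs of non-word characters, ASCII only by default) are assumptions about what should count as "the same word"; adjust them to your data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WordOverlap {
    // Hypothetical helper: lower-case the text and split on runs of non-word characters.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        result.remove(""); // split yields "" for empty input or leading punctuation
        return result;
    }

    // Number of distinct words that occur in both texts.
    static int countShared(String first, String second) {
        Set<String> shared = words(first);
        shared.retainAll(words(second)); // keep only the words also present in 'second'
        return shared.size();
    }
}
```

Because both sides are sets, a word that occurs several times in either paragraph is still counted once.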

Related

how to add special character to All Lines in EditText or in String?

I hope you are fine.
I want a way to add a special character to the beginning of every line (after each line break). For example, I have this text, line by line:
Paragraphs are the building blocks of papers.
Many students define paragraphs in terms of length.
a paragraph is a group of at least five sentences.
a paragraph is half a page long, etc. In reality.
though, the unity and coherence of ideas among.
sentences is what constitutes a paragraph.
And I want it to look like this (with "-" at the beginning of every line):
-Paragraphs are the building blocks of papers.
-Many students define paragraphs in terms of length.
-a paragraph is a group of at least five sentences.
-a paragraph is half a page long, etc. In reality.
-though, the unity and coherence of ideas among.
-sentences is what constitutes a paragraph.
And this is my small piece of code:
String string = edittext1.getText().toString();
//here I want to add code that puts a special character such as -, + or * at the start of every line
textview1.setText(string);
Thank you in advance.
Here is a simple way to do that
String string = edittext1.getText().toString();
StringBuilder result = new StringBuilder();
String[] array = string.split("\\n");
for(String line : array){
result.append("-").append(line).append("\n");
}
textview1.setText(result);
Honestly, you should just be editing the strings directly. In this case, though, you could build a third string: prepend "-" to each individual line and concatenate the results, separating the lines with \n, so that your desired text comes out one line at a time.
You can achieve this by replacing the CharSequence with a new one built the way you want.
Kotlin Code Example
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/replace.html
Java Code Example
https://www.javatpoint.com/java-string-replace
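As a rough sketch of the replace-based approach in plain Java: String.replaceAll with the multiline flag (?m) makes ^ match at the start of every line, so the prefix can be inserted in one call. The class and method names (LinePrefixer, prefixLines) are hypothetical, and the EditText/TextView wiring from the question is left out.

```java
import java.util.regex.Matcher;

class LinePrefixer {
    // Inserts 'prefix' at the start of every line of 'text'.
    static String prefixLines(String text, String prefix) {
        // (?m) turns on multiline mode so ^ matches after each line break;
        // quoteReplacement keeps characters such as $ or \ in the prefix literal.
        return text.replaceAll("(?m)^", Matcher.quoteReplacement(prefix));
    }
}
```

With the question's variables this would be used as textview1.setText(LinePrefixer.prefixLines(string, "-")).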

What's the fastest way to only rotate certain elements in an array?

I'm writing a form of word scrambler for strings which takes all letters except for the first and last, and rotates their positions. That is, I am supposed to only look at the second through second-last letters. How can I scramble only from the second letter to the second-last letter?
e.g. scramble "string" to "srintg"
I can call Collections.rotate() on an array of characters created by splitting the string, but that will scramble the entire word.
List<String> newWordList = Arrays.asList(word.split(""));
Collections.rotate(newWordList, -1);
String newWord = String.join("", newWordList);
I want to get the output "srintg", but instead I will get "rintgs".
Provided that your word is long enough for it to be sensible (at least four letters), you can make the approach you present work by rotating a sublist of your list:
Collections.rotate(newWordList.subList(1, newWordList.size() - 1), -1);
List.subList() creates a view of a portion of a List for the exact purpose of avoiding the need for overloading List methods with versions that operate on indexed sub-ranges of the elements. That's "fast" in the sense of fast to write, and it's fairly clear.
If you are looking for "fast" in a performance sense, however, then splitting and joining strings seems ill-advised. Fastest is probably not something we can offer, as performance needs to be tested, but if I were looking for best performance then I would test at least these general approaches:
Work with an array form of your word:
    Use String.toCharArray() to obtain your word's letters in array form.
    Use an indexed for loop to rotate the characters in the array.
    Construct a new String from the modified array (using the appropriate constructor).
Use a StringBuilder to assemble the word:
    Create a StringBuilder with initial capacity equal to the word length.
    Iterate over the word's letters using a CharacterIterator, appending them to the builder in the order required. This can be done in a single pass.
    Obtain the result string from the builder.
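The two approaches above might look roughly like this. It is only a sketch: the names (InnerRotate, rotateWithArray, rotateWithBuilder) are made up, words shorter than four letters are returned unchanged as an assumption, and the StringBuilder version uses bulk append calls rather than a CharacterIterator. Neither version has been benchmarked here.

```java
class InnerRotate {
    // Array approach: rotate the inner letters (indices 1 .. length-2) left by one.
    static String rotateWithArray(String word) {
        if (word.length() < 4) {
            return word; // nothing sensible to rotate
        }
        char[] letters = word.toCharArray();
        char second = letters[1];
        for (int i = 1; i < letters.length - 2; i++) {
            letters[i] = letters[i + 1]; // shift each inner letter one place left
        }
        letters[letters.length - 2] = second; // old second letter becomes second-last
        return new String(letters);
    }

    // StringBuilder approach: append the pieces in their final order in one pass.
    static String rotateWithBuilder(String word) {
        int n = word.length();
        if (n < 4) {
            return word;
        }
        return new StringBuilder(n)
                .append(word.charAt(0))      // first letter stays put
                .append(word, 2, n - 1)      // third letter through second-last letter
                .append(word.charAt(1))      // old second letter
                .append(word.charAt(n - 1))  // last letter stays put
                .toString();
    }
}
```

Both methods turn "string" into "srintg", matching the example in the question.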

Google Challenge Dilemma, Insights into possible errors?

I am currently passing 4 of the 5 hidden test cases for this challenge and would like some input.
Quick problem description:
You are given two input strings, String chunk and String word
The string "word" has been inserted into "chunk" some number of times
The task is to find the shortest string possible when all instances of
"word" have been removed from "chunk".
Keep in mind during removal, more instances of the "word" might be
created in "chunk". "word" can also be inserted anywhere, including
between "word" instances
If there is more than one shortest possible string after removal,
return the one that is lexicographically earliest.
This is easier understood with examples:
Inputs:
(string) chunk = "lololololo"
(string) word = "lol"
Output:
(string) "looo" (since "looo" is earlier than "oolo")
Inputs:
(string) chunk = "goodgooogoogfogoood"
(string) word = "goo"
Output:
(string) "dogfood"
Right now I am iterating forwards and then backwards, removing all instances of word on each pass, and then comparing the results of the two iterations.
Is there a case I am overlooking? Is it possible there is a case where you have to remove from the middle first or something along those lines?
Any insight is appreciated.
I am not sure, but I would avoid matching the first and last characters of chunk and replace all other occurrences.
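One way to check whether a forwards/backwards strategy misses cases is to compare it against an exhaustive search on small inputs. The sketch below is not the intended solution to the challenge and not the asker's method; it simply tries every possible removal order, remembers strings it has already seen, and keeps the shortest (then lexicographically earliest) string that no longer contains word. The names WordRemoval and shortest are made up for illustration, and the search can become slow for long inputs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class WordRemoval {
    // Explores every removal order and returns the best fully reduced string.
    static String shortest(String chunk, String word) {
        if (word.isEmpty()) {
            return chunk; // nothing to remove
        }
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(chunk);
        seen.add(chunk);
        String best = null;
        while (!pending.isEmpty()) {
            String current = pending.pop();
            int index = current.indexOf(word);
            if (index < 0) {
                // No occurrence left: candidate answer, compare by length, then alphabetically.
                if (best == null
                        || current.length() < best.length()
                        || (current.length() == best.length() && current.compareTo(best) < 0)) {
                    best = current;
                }
                continue;
            }
            // Try removing each occurrence, including overlapping ones.
            while (index >= 0) {
                String next = current.substring(0, index) + current.substring(index + word.length());
                if (seen.add(next)) {
                    pending.push(next);
                }
                index = current.indexOf(word, index + 1);
            }
        }
        return best;
    }
}
```

Running a greedy solution and this search side by side on short random strings is a cheap way to find an input where the two disagree.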

Filter words belonging to a broad category

I have a list of words (assume they are stored in String[] if you must). I want to filter out words that belong to a broad general category such as Music or Sports.
Is there a ready-made solution for this (even if it's only for a limited set of general categories)?
Or how would you go about doing this?
It is to be done in Java 1.6 and it is an NLP (Natural Language Processing) problem. The input list of words has random words, and I want to extract from this large list, only the words that belong to a given general category (which will be a subset).
Another way of thinking: Given a single word, I want to determine if this word belongs to a category. Something like this:
String word1 = "football"; //the strings will always be single word units
String word2 = "telephone";
boolean b1 = belongsToCategory(Categories.SPORTS, word1); //true
boolean b2 = belongsToCategory(Categories.SPORTS, word2); //false
If you need more info, please ask.
Well, my idea would be to hold a set of words for each category and look the word up in each set.
Of course, this set would get huge and impossible to maintain if you held all the inflected forms for a single word. I'd consider using lemmatization to limit the size of this set.
You might be interested in checking the following links:
Lemmatization on Wikipedia
and
Lemmatization java
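A minimal sketch of the set-per-category lookup, written without Java 7+ syntax since the question mentions Java 1.6. The word lists are placeholders invented for illustration, CategoryFilter is a hypothetical class name, and no lemmatization is done here: the word is only lower-cased, so a real version would first reduce each word to its lemma with whatever lemmatizer you choose.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CategoryFilter {
    enum Categories { SPORTS, MUSIC }

    // Placeholder vocabularies; a real list would be loaded from a curated resource.
    private static final Map<Categories, Set<String>> WORDS =
            new EnumMap<Categories, Set<String>>(Categories.class);

    static {
        WORDS.put(Categories.SPORTS,
                new HashSet<String>(Arrays.asList("football", "tennis", "goal")));
        WORDS.put(Categories.MUSIC,
                new HashSet<String>(Arrays.asList("guitar", "melody", "chorus")));
    }

    // True if the (lower-cased) word is in the vocabulary of the given category.
    static boolean belongsToCategory(Categories category, String word) {
        Set<String> vocabulary = WORDS.get(category);
        return vocabulary != null && vocabulary.contains(word.toLowerCase());
    }
}
```

Filtering the input list is then a loop that keeps only the words for which belongsToCategory returns true.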

Finding Distinct Words between 2 sentences in java

What is an efficient way to find out all the unique words between 2 sentences in java and store them? What data structure should be used to store the words?
Store the words from the first sentence in a HashSet, and then iterate over the words in the second sentence to see if each one is already there in the HashSet.
Put all words from one sentence in a set, then pass through words of the second sentence. If the word exists in a set, take it out of the set, otherwise put it into the set.
A simple way of achieving this is:
//I use a regular expression to remove punctuation marks
//II use split to convert the sentences into collections of "words"
//III create a variable that is an implementation of java.util.Set (to store unique words)
//IV iterate over the collections
// add words from each sentence to the set variable (that way each word will only be stored once)
Hope this helps
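Since "unique words between 2 sentences" can be read two ways, here is a sketch covering both: every distinct word across the two sentences (the steps listed above), and the words that occur in exactly one of them. The names (DistinctWords, words, allDistinct, inExactlyOne) are made up for illustration, and lower-casing plus stripping everything except ASCII letters, digits and whitespace is an assumption about how the text should be normalised.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctWords {
    // Steps I and II: strip punctuation, then split on whitespace.
    static Set<String> words(String sentence) {
        String cleaned = sentence.toLowerCase().replaceAll("[^a-z0-9\\s]", "");
        Set<String> result = new LinkedHashSet<>(Arrays.asList(cleaned.trim().split("\\s+")));
        result.remove(""); // an empty sentence would otherwise contribute ""
        return result;
    }

    // Steps III and IV: every distinct word from either sentence, in first-seen order.
    static Set<String> allDistinct(String first, String second) {
        Set<String> result = words(first);
        result.addAll(words(second));
        return result;
    }

    // Alternative reading: words that occur in exactly one of the two sentences.
    static Set<String> inExactlyOne(String first, String second) {
        Set<String> common = words(first);
        common.retainAll(words(second));
        Set<String> result = allDistinct(first, second);
        result.removeAll(common);
        return result;
    }
}
```

A LinkedHashSet is used only so the words come back in the order they were first seen; a plain HashSet works equally well if order does not matter.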
