Compare Certain Strings (part of sentence) from Paragraph - java

I have a Map,
HashMap<String,String> dataCheck= new HashMap<String,String>();
dataCheck.put("Flag1","Additional Income");
dataCheck.put("Flag2","Be your own boss");
dataCheck.put("Flag3","Compete for your business");
and a paragraph.
String paragraph = "When you have an additional Income, you can be your own boss. So advertise with us and compete for your business. We help you get additional income";
What I want to achieve is this: for every entry of the HashMap, compare its value with the paragraph and count the number of occurrences. My output must be as follows:
Flag1 - 2 , Flag2 - 1 , Flag3 - 1
So, basically, I just want to get an idea of how to compare a certain string with another set of strings.
Update: The Match would be case insensitive.

You can use a loop with String.indexOf() to count occurrences.
In the following code, you'll see we are looping through our HashMap and comparing each entry to our paragraph.
HashMap<String, String> dataCheck = new HashMap<String, String>();
dataCheck.put("Flag1", "Additional Income");
dataCheck.put("Flag2", "Be your own boss");
dataCheck.put("Flag3", "Compete for your business");
String paragraph = "When you have an additional Income, you can be your own boss. So advertise with us and compete for your business. We help you get additional income";
// Now, iterate through each entry in the Map
for (Map.Entry<String, String> entry : dataCheck.entrySet()) {
// Keep track of the number of occurrences
int count = 0;
// After finding a match, we need to increase our index in the loop so it moves on to the next match
int startingIndex = 0;
// This will convert the strings to upper case (so our matches are case insensitive)
// It will continue looping until indexOf returns -1 (which means no further match was found)
while ((startingIndex = paragraph.toUpperCase().indexOf(entry.getValue().toUpperCase(), startingIndex)) != -1) {
// Add to our count
count++;
// Move our index position forward for the next loop
startingIndex++;
}
// Finally, print out the total count per Flag
System.out.println(entry.getKey() + ": " + count);
}
Here is the result:
Flag1: 2
Flag2: 1
Flag3: 1
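As an alternative to the indexOf() loop, here is a minimal sketch using java.util.regex with the CASE_INSENSITIVE flag. The class name PhraseCounter and the method countOccurrences are made up for this illustration. Note one difference: Matcher.find() counts non-overlapping matches, while the loop above advances by one character and would also count overlapping ones. For the phrases in the question this makes no difference.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer
class PhraseCounter {

    // Counts non-overlapping, case-insensitive occurrences of phrase in text
    static int countOccurrences(String text, String phrase) {
        // Pattern.quote makes the phrase a literal, so regex metacharacters are not interpreted
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> dataCheck = new LinkedHashMap<>();
        dataCheck.put("Flag1", "Additional Income");
        dataCheck.put("Flag2", "Be your own boss");
        dataCheck.put("Flag3", "Compete for your business");
        String paragraph = "When you have an additional Income, you can be your own boss. "
                + "So advertise with us and compete for your business. We help you get additional income";
        for (Map.Entry<String, String> entry : dataCheck.entrySet()) {
            System.out.println(entry.getKey() + ": " + countOccurrences(paragraph, entry.getValue()));
        }
    }
}
```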

Related

counting number of occurrences of words in a text java

So I'm building a TreeMap from scratch and I'm trying to count the number of occurrences of every word in a text using Java. The text is read from a text file, but I can easily read it from there. I really don't know how to count every word, can someone help?
Imagine the text is something like:
Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.
Output:
Over 1
time 1
computer 1
algorithms 5
...
If possible, I want to ignore case and count upper-case and lower-case occurrences together.
EDIT: I don't want to use any sort of Map (e.g. HashMap) or anything similar to do this.
Break down the problem as follows (this is one potential solution - not THE solution):
Split the text into words (create a list or array of words).
Remove punctuation marks.
Create your map to collect results.
Iterate over your list of words and add "1" to the value of each encountered key
Display results (Iterate over the map's EntrySet)
Split the text into words
My preference is to split words using a space as the delimiter. The reason is that, if you split on non-word characters, you may miss some hyphenated words. I know that the use of hyphenation is declining, but there are still plenty of words that fall under this rule, for example middle-aged. If a word such as this is encountered, it MIGHT have to be treated as one word and not two.
Remove punctuation marks
Because of the decision above, you first need to remove punctuation characters that might be attached to your words. Keep in mind that if you use a regular expression to split the words, you might be able to accomplish this step at the same time as the step above. In fact, that would be preferred, so that you don't have to iterate over the text twice. While you are at it, call toLowerCase() on the input string to eliminate the ambiguity between capitalized and lowercase words.
Create your map to collect results
This is where you are going to collect your counts, using the TreeMap implementation of the Java Map. One thing to be aware of with this particular implementation is that the map is sorted according to the natural ordering of its keys. In this case, since the keys are the words from the input text, the keys will be arranged in alphabetical order, not by the magnitude of the count. If sorting the entries by count is important, there is a technique where you "reverse" the map, turning the values into keys and the keys into values. However, since two or more words could have the same count, you will need to create a new map of <Integer, Set<String>>, so that you can group together words with the same count.
Iterate over your list of words
At this point, you should have a list of words and a map structure to collect the counts. Using a lambda expression, you should be able to perform a count() of your words very easily. But if you are not familiar or comfortable with lambda expressions, you can use a regular looping structure to iterate over your list, do a containsKey() check to see if the word was encountered before, get() the value if the map already contains the word, and then add "1" to the previous value. Lastly, put() the new count in the map.
Display results
Again, you can use a lambda expression to print out the EntrySet key-value pairs, or simply iterate over the entry set to display the results.
Based on all of the above points, a potential solution could look like this (not using lambdas, for the OP's sake):
public static void main(String[] args) {
String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
text = text.replaceAll("\\p{P}", ""); // replace all punctuations
text = text.toLowerCase(); // turn all words into lowercase
String[] wordArr = text.split(" "); // create list of words
Map<String, Integer> wordCount = new TreeMap<>();
// Collect the word count
for (String word : wordArr) {
if(!wordCount.containsKey(word)){
wordCount.put(word, 1);
} else {
int count = wordCount.get(word);
wordCount.put(word, count + 1);
}
}
Iterator<Entry<String, Integer>> iter = wordCount.entrySet().iterator();
System.out.println("Output: ");
while(iter.hasNext()) {
Entry<String, Integer> entry = iter.next();
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
This produces the following output
Output:
advantage: 1
algorithms: 5
and: 1
combine: 1
computer: 1
each: 1
engineers: 1
even: 1
for: 2
in: 1
invent: 1
more: 1
new: 1
of: 2
other: 2
others: 1
over: 1
producing: 1
results: 2
take: 1
the: 1
things: 1
time: 1
to: 1
turn: 1
utilize: 1
with: 1
work: 1
Why did I break down the problem like this for such a mundane task? Simple. I believe each of those discrete steps should be extracted into functions to improve code reusability. Yes, it is cool to use a lambda expression to do everything at once and make your code look much simpler. But what if you need to perform some intermediate step over and over? Most of the time, code gets duplicated to accomplish this. In reality, a better solution is often to break these tasks into methods. Some of these tasks, like transforming the input text, can be done in a single method, since those activities are related in nature. (There is such a thing as a method doing "too little.")
public String[] createWordList(String text) {
return text.replaceAll("\\p{P}", "").toLowerCase().split(" ");
}
public Map<String, Integer> createWordCountMap(String[] wordArr) {
Map<String, Integer> wordCountMap = new TreeMap<>();
for (String word : wordArr) {
if(!wordCountMap.containsKey(word)){
wordCountMap.put(word, 1);
} else {
int count = wordCountMap.get(word);
wordCountMap.put(word, count + 1);
}
}
return wordCountMap;
}
public void displayCount(Map<String, Integer> wordCountMap) {
Iterator<Entry<String, Integer>> iter = wordCountMap.entrySet().iterator();
while(iter.hasNext()) {
Entry<String, Integer> entry = iter.next();
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
Now, after doing that, your main method looks more readable and your code is more reusable.
public static void main(String[] args) {
WordCount wc = new WordCount();
String text = "...";
String[] wordArr = wc.createWordList(text);
Map<String, Integer> wordCountMap = wc.createWordCountMap(wordArr);
wc.displayCount(wordCountMap);
}
UPDATE:
One small detail I forgot to mention: if a HashMap is used instead of a TreeMap, the output is no longer alphabetical. It may even look as if it were sorted by count in descending order, as in the sample below, but that is coincidental. HashMap hashes the key (the word), not the value, and its iteration order is unspecified. To reliably order by count you still need to "reverse" the map or sort the entries explicitly. After switching to HashMap, the output might start like this:
Output:
algorithms: 5
other: 2
for: 2
turn: 1
computer: 1
producing: 1
...
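If ordering by count is what you want, a minimal sketch is to copy the entries into a list and sort that list explicitly, which works for any Map implementation. The class name SortByCount and the method sortedByCount are invented for this example, and breaking ties alphabetically is my own assumption, not something the question asks for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only
class SortByCount {

    // Returns the entries ordered by count (descending); ties are ordered alphabetically by word
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> wordCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("algorithms", 5);
        wordCount.put("other", 2);
        wordCount.put("for", 2);
        wordCount.put("time", 1);
        for (Map.Entry<String, Integer> entry : sortedByCount(wordCount)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```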
My suggestion is to use a regular expression with split, and a stream with grouping (see example 3).
EX1: This solution does not use a collection (List/Map), only an array. In my opinion it is not optimal.
@Test
public void testApp2() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
final String lowerText = text.toLowerCase();
final String[] split = lowerText.split("\\W+");
System.out.println("Output: ");
for (String s : split) {
if (s == null) {
continue;
}
int count = 0;
for (int i = 0; i < split.length; i++) {
final boolean sameWord = s.equals(split[i]);
if (sameWord) {
count = count + 1;
split[i] = null;
}
}
System.out.println(s + " " + count);
}
}
EX2: I think this is what you mean, but I'm not sure whether I rely on the list too much.
@Test
public void testApp() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
final String[] split = text.split("\\W+");
final List<String> list = new ArrayList<>();
System.out.println("Output: ");
for (String s : split) {
if(!list.contains(s.toUpperCase())){ // compare in upper case, since that is what the list stores
list.add(s.toUpperCase());
final long count = Arrays.stream(split).filter(s::equalsIgnoreCase).count();
System.out.println(s+" "+count);
}
}
}
EX3: Below is a test for your example, but it uses a Map.
@Test
public void test() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
Map<String, Long> result = Arrays.stream(text.split("\\W+")).collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
assertEquals(Long.valueOf(5), result.get("algorithms"));
System.out.println("Output: ");
result.entrySet().stream().forEach(x -> System.out.println(x.getKey() + " " + x.getValue()));
}

How to find better algorithm for my prefix matcher algorithm

I was solving online problem and the task was something like this:
There are two arrays: numbers and prefixes.
Array numbers contains numbers: “+432112345”, “+9990”, “+4450505”
Array prefixes contains prefixes: “+4321”, “+43211”, “+7700”, “+4452”, “+4”
Find longest prefix for each number. If no prefix found for number, match with empty string.
For example:
“+432112345” matches with the longest prefix “+43211” (not +4321, cause 43211 is longer).
“+9990” doesn't match with anything, so empty string "".
“+4450505” matches with “+4” (“+4452” doesn’t match because of the 2).
I came up with the most straightforward solution, where I loop through each number and check it against each prefix. For each new number I check all prefixes, and if some prefix is longer than the last one found, I replace it.
Map<String, String> numsAndPrefixes = new HashMap<>();
for (String number : A) {
for (String prefix : B) {
if (number.contains(prefix)) {
// if map already contains this number, check for prefixes.
// if longer exists, switch longer one
if (numsAndPrefixes.containsKey(number)) {
int prefixLength = prefix.length();
int currentLen = numsAndPrefixes.get(number).length();
if (prefixLength > currentLen) {
numsAndPrefixes.put(number, prefix);
}
} else {
numsAndPrefixes.put(number, prefix);
}
} else if (!number.contains(prefix) && !numsAndPrefixes.containsKey(number)){
numsAndPrefixes.put(number, "");
}
}
}
So it has two nested for loops. I can see that I am doing the same job over and over, e.g. checking the prefixes. It works, but it is slow. The problem is that I can't come up with anything better.
Could someone explain how they would approach finding a better algorithm?
And more generally, how do you proceed if you have a somewhat working solution and are trying to find a better one? What knowledge am I still missing?
I would implement this using a TreeSet and the floor(E e) method.
String[] numbers = { "+432112345", "+9990", "+4450505" };
String[] prefixes = { "+4321", "+43211", "+7700", "+4452", "+4" };
TreeSet<String> prefixSet = new TreeSet<>(Arrays.asList(prefixes));
for (String number : numbers) {
String prefix = prefixSet.floor(number);
while (prefix != null && ! number.startsWith(prefix))
prefix = prefixSet.floor(prefix.substring(0, prefix.length() - 1));
if (prefix == null)
prefix = "";
System.out.println(number + " -> " + prefix);
}
Output
+432112345 -> +43211
+9990 ->
+4450505 -> +4
The data structure you need is a trie.
Add all prefixes to the trie, marking the node where each prefix ends
For each string S in numbers:
Start from the root of the trie
For each character in S:
If there is a link from the current node associated with the current character, follow this link to the next node, and remember that node if a prefix ends there
If there is no link, stop: the last remembered node is the longest prefix and the answer for S (the empty string if no node was remembered)
This algorithm works in O(total length of prefixes + total length of numbers)
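The trie steps above can be sketched in Java roughly like this. The class name PrefixTrie, the Node type and the endOfPrefix flag are my own naming for the illustration. The sketch remembers the last position at which a stored prefix ended, so a node that exists only as part of a longer prefix is not reported as a match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie sketch; names are illustrative only
class PrefixTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfPrefix; // true if one of the stored prefixes ends at this node
    }

    private final Node root = new Node();

    // Inserts one prefix, creating nodes along the path as needed
    void add(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endOfPrefix = true;
    }

    // Walks the trie along the number and returns the longest stored prefix, or "" if none matches
    String longestPrefix(String number) {
        Node node = root;
        int longest = 0;
        for (int i = 0; i < number.length(); i++) {
            node = node.children.get(number.charAt(i));
            if (node == null) {
                break; // no link for this character
            }
            if (node.endOfPrefix) {
                longest = i + 1; // remember the longest prefix seen so far
            }
        }
        return number.substring(0, longest);
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String prefix : new String[] { "+4321", "+43211", "+7700", "+4452", "+4" }) {
            trie.add(prefix);
        }
        for (String number : new String[] { "+432112345", "+9990", "+4450505" }) {
            System.out.println(number + " -> " + trie.longestPrefix(number));
        }
    }
}
```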
You are using .contains(). You should use .startsWith(). It is faster, and it is also more correct, because contains() matches the prefix anywhere inside the number.
Also, in your else if you are checking again what you already checked in the if.
This is only one approach on how to improve the algorithm:
Sort the prefixes:
+43211 +4321 +4452 +4 +7700
What is good about this? Well, it will always find the longest prefix first. You can exit the loop and don't have to look for longer prefixes.
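Arrays.sort(prefixes, new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        if (o1.equals(o2)) return 0;
        if (o1.startsWith(o2)) return -1; // o1 extends o2, so the longer one sorts first
        if (o2.startsWith(o1)) return 1;
        return o1.compareTo(o2);
    }
});
Map<String, String> numsAndPrefixes = new HashMap<>();
for (String number: numbers) {
    // Default to the empty string when no prefix matches
    numsAndPrefixes.put(number, "");
    for (String prefix: prefixes) {
        if (number.startsWith(prefix)) {
            // Longer prefixes come first, so the first match is the longest
            numsAndPrefixes.put(number, prefix);
            break;
        }
    }
}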
But if your number starts with +1 and there is no prefix it will continue checking all the prefixes with +2 +3 +4 ... which are obviously not matching. (Issue 1)
Also if your number starts with +9 the prefix will be found very late. (Issue 2)
How to fix this? Well you can save the indices where +1 starts, +2 starts, ...:
In our prefix list:
0 1 2 3 4 5 (index)
+1233 +123 +2233 +2 +3 +4
+2 starts at index [2] and +3 starts at index [4]. So when you want to know the prefix for a number starting with +2 you only have to check elements [2] and [3]. This will both fix issue 1 and 2.
It would also be possible to store the indices for more digits (for example where +13 starts).
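Here is a rough sketch of that idea. Instead of storing start indices into the sorted array, it groups the prefixes into one list per leading digit, which has the same effect of only looking at candidates that can match. The class name BucketedPrefixes is invented for this example, and it assumes every prefix and number has the form "+" followed by at least one digit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buckets keyed by the character right after the '+'
class BucketedPrefixes {

    private final Map<Character, List<String>> buckets = new HashMap<>();

    BucketedPrefixes(String[] prefixes) {
        for (String prefix : prefixes) {
            if (prefix.length() < 2) {
                continue; // assumption: a usable prefix is '+' plus at least one digit
            }
            buckets.computeIfAbsent(prefix.charAt(1), k -> new ArrayList<>()).add(prefix);
        }
        // Within each bucket, put longer prefixes first so the first match is the longest
        for (List<String> bucket : buckets.values()) {
            bucket.sort((a, b) -> Integer.compare(b.length(), a.length()));
        }
    }

    // Returns the longest matching prefix, or "" if none matches
    String longestPrefix(String number) {
        if (number.length() < 2) {
            return "";
        }
        List<String> candidates = buckets.getOrDefault(number.charAt(1), Collections.emptyList());
        for (String prefix : candidates) {
            if (number.startsWith(prefix)) {
                return prefix;
            }
        }
        return "";
    }
}
```

A number starting with +9 now only looks at the +9 bucket (empty in the example data), so both issues above are avoided.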

Hash Tables: Ransom Note hackerrank

Harold is a kidnapper who wrote a ransom note, but now he is worried it will be traced back to him through his handwriting. He found a magazine and wants to know if he can cut out whole words from it and use them to create an untraceable replica of his ransom note. The words in his note are case-sensitive and he must use only whole words available in the magazine. He cannot use substrings or concatenation to create the words he needs.
Given the words in the magazine and the words in the ransom note, print Yes if he can replicate his ransom note exactly using whole words from the magazine; otherwise, print No.
For example, the note is "Attack at dawn". The magazine contains only "attack at dawn". The magazine has all the right words, but there's a case mismatch. The answer is No.
Sample Input 0
6 4
give me one grand today night
give one grand today
Sample Output 0
Yes
Sample Input 1
6 5
two times three is not four
two times two is four
Sample Output 1
No
My code fails 5 of the 22 test cases :(
I can't figure out why those 5 fail.
static void checkMagazine(String[] magazine, String[] note) {
int flag = 1;
Map<String, Integer> wordMap = new HashMap<>();
for(String word: magazine) {
if(!wordMap.containsKey(word)) {
wordMap.put(word, 1);
} else
wordMap.put(word,wordMap.get(word)+1);
}
for(String word: note){
if(!wordMap.containsKey(word)){
flag = 0;
break;
}
else wordMap.remove(word, wordMap.get(word));
}
if(flag == 0)
System.out.println("No");
else
System.out.println("Yes");
}
It's probably because instead of decrementing the count of the words in the magazine when you retrieve one, you're removing all counts of that word completely. Try this:
for(String word: note){
if(!(wordMap.containsKey(word) && wordMap.get(word) > 0)){
flag = 0;
break;
}
else wordMap.put(word, wordMap.get(word)-1);
}
wordMap is a frequency table and gives word counts.
However, for every word in the note you must decrease the word count instead of removing the entry entirely. Only when the word count reaches 0 may the entry be removed.
Another issue is case sensitivity. Depending on the requirements, you may need to convert all words to lowercase.
else {
wordMap.computeIfPresent(word, (k, v) -> v <= 1? null : v - 1);
}
This checks that the old value v is above 1 and then decreases it, or else returns a null value signaling to delete the entry.
The frequency counting can be written as:
Map<String, Integer> wordMap = new HashMap<>();
for(String word: magazine) {
wordMap.merge(word, 1, Integer::sum);
}
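Putting the two fragments together, a minimal sketch of the whole check could look like this. The class name RansomNote and the method canBuild are made up for the example, and it returns a boolean instead of printing Yes/No. Words are compared case-sensitively, as the problem statement requires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class for illustration
class RansomNote {

    // Returns true if every word of the note can be taken from the magazine, respecting counts
    static boolean canBuild(String[] magazine, String[] note) {
        // Build the frequency table of magazine words
        Map<String, Integer> wordMap = new HashMap<>();
        for (String word : magazine) {
            wordMap.merge(word, 1, Integer::sum);
        }
        for (String word : note) {
            if (!wordMap.containsKey(word)) {
                return false; // word missing or already used up
            }
            // Use one copy; returning null removes the entry when the last copy is consumed
            wordMap.computeIfPresent(word, (k, v) -> v <= 1 ? null : v - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] magazine = "give me one grand today night".split(" ");
        String[] note = "give one grand today".split(" ");
        System.out.println(canBuild(magazine, note) ? "Yes" : "No");
    }
}
```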
I think this implementation is simpler:
static boolean checkMagazine(String[] magazine, String[] note) {
List<String> magazineCopy = new ArrayList<>(Arrays.asList(magazine));
for (String word : note)
{
if (magazineCopy.contains(word)) {
magazineCopy.remove(word);
continue;
}
return false;
}
return true;
}
I suppose your error is here:
else wordMap.remove(word, wordMap.get(word));
you are removing the word from the map, instead of decreasing the number of such words and only if the number reaches 0, you should remove the word from the map.
Python Solution
def checkMagazine(magazine, ransom):
    magazine.sort()
    ransom.sort()
    flag = True  # an empty note can always be built
    for word in ransom:
        if word not in magazine:
            flag = False
            break
        else:
            # use up one copy of the word
            magazine.remove(word)
    if flag:
        print("Yes")
    else:
        print("No")

Count occurrences of each unique character

How to find the number of occurrence of every unique character in a String? You can use at most one loop. please post your solution, thanks.
Since this sounds like a homework problem, let's try to go over how to solve this problem by hand. Once we do that, let's see how we can try to implement that in code.
What needs to be done?
Let's take the following string:
it is nice and sunny today.
In order to get a count of how many times each character appears in the above string, we should:
Iterate over each character of the string
Keep a tally of how many times each character in the string appears
How would we actually try it?
Doing this by hand might look like this:
First, we find a new character i, so we could note that in a table and say that i appeared 1 time so far:
'i' -> 1
Second, we find another new character t, so we could add that in the above table:
'i' -> 1
't' -> 1
Third, a space, and repeat again...
'i' -> 1
't' -> 1
' ' -> 1
Fourth, we encounter an i which happens to exist in the table already. So, we'll want to retrieve the existing count, and replace it with the existing count + 1:
'i' -> 2
't' -> 1
' ' -> 1
And so on.
How to translate into code?
Translating the above to code, we may write something like this:
For every character in the string
Check to see if the character has already been encountered
If no, then remember the new character and say we encountered it once
If yes, then take the number of times it has been encountered, and increment it by one
For the implementation, as others have mentioned, using a loop and a Map could achieve what is needed.
The loop (such as a for or while loop) could be used to iterate over the characters in the string.
The Map (such as a HashMap) could be used to keep track of how many times a character has appeared. In this case, the key would be the character and the value would be the count for how many times the character appears.
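The steps above can be sketched in Java roughly as follows, with a single loop over the string. The class name CharTally and the method countChars are invented for the illustration. It uses a LinkedHashMap, which is my own choice so that characters are reported in order of first appearance, and Map.merge, which covers both the "new character" and the "seen before" case in one call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration
class CharTally {

    // Counts every character of the text (including spaces and punctuation) in one pass
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> tally = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); i++) {
            // Stores 1 for a new character, otherwise adds 1 to the existing count
            tally.merge(text.charAt(i), 1, Integer::sum);
        }
        return tally;
    }

    public static void main(String[] args) {
        Map<Character, Integer> tally = countChars("it is nice and sunny today.");
        for (Map.Entry<Character, Integer> entry : tally.entrySet()) {
            System.out.println("'" + entry.getKey() + "' -> " + entry.getValue());
        }
    }
}
```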
Good luck!
It's a homework, so cannot post the code, but here is one approach:
Iterate through the string, char by char.
Put the char in a hashmap key and initialize its value to 1 (count). Now, if the char is encountered again, update the value (count+1). Else add the new char to key and again set its value (count=1)
Here you go! I have done a rough program on Count occurrences of each unique character
public class CountUniqueChars{
public static void main(String args[]){
HashMap<Character, Integer> map;
ArrayList<HashMap<Character, Integer>> list = new ArrayList<HashMap<Character,Integer>>();
int i;
int x = 0;
Boolean fire = false;
String str = "Hello world";
str = str.replaceAll("\\s", "").toLowerCase();
System.out.println(str.length());
for(i=0; i<str.length() ; i++){
if(list.size() <= 0){
map = new HashMap<Character, Integer>();
map.put(str.charAt(i), 1);
list.add(map);
}else{
map = new HashMap<Character, Integer>();
map.put(str.charAt(i), 1);
fire = false;
for (HashMap<Character, Integer> t : list){
if(t.containsKey(str.charAt(i)) == map.containsKey(str.charAt(i))){
x = list.indexOf(t);
fire = true;
map.put(str.charAt(i), t.get(str.charAt(i))+1);
}
}
if(fire){
list.remove(x);
}
list.add(map);
}
}
System.out.println(list);
}
}

Get all characters from a string with their number

How in Java can I get list of all characters appearing in string, with number of their appearances ? Let's say we have a string "I am really busy right now" so I should get :
i-2, a-2, r-2, m-1 and so on.
Just have a mapping of every character and their counts. You can get the character array of a String using String#toCharArray() and loop through it using the enhanced for loop. On every iteration, get the count from the mapping, set it if absent and then increment it with 1 and put back in map. Pretty straightforward.
Here's a basic kickoff example:
String string = "I am really busy right now";
Map<Character, Integer> characterCounts = new HashMap<Character, Integer>();
for (char character : string.toCharArray()) {
Integer characterCount = characterCounts.get(character);
if (characterCount == null) {
characterCount = 0;
}
characterCounts.put(character, characterCount + 1);
}
To learn more about maps, check the Sun tutorial on the subject.
You commented that it's "for a project", but it's however a typical homework question because it's pretty basic and covered in the first chapters of a decent Java book/tutorial. If you're new to Java, I suggest to get yourself through the Sun Trails Covering the Basics.
Is it homework? Without knowing it I'll assume a best-effort answer.
The logic behind your problem is to
go through the string one character at a time
count that character: since the possible characters (excluding most of Unicode) number just 256, you can have an array of 256 ints and count them there. This way you won't need to search for the correct counter, you just increment the right index.
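A minimal sketch of the array approach, assuming you only care about characters with code points below 256 (anything else is simply skipped here). The class name AsciiCounter is made up for the example.

```java
// Hypothetical helper class for illustration
class AsciiCounter {

    // Returns an array where counts[c] is the number of times character c occurs in the text
    static int[] count(String text) {
        int[] counts = new int[256];
        for (char c : text.toCharArray()) {
            if (c < 256) { // assumption: characters outside this range are ignored
                counts[c]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("I am really busy right now".toLowerCase());
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0 && c != ' ') {
                System.out.println((char) c + "-" + counts[c]);
            }
        }
    }
}
```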
I'm not sure of your exact needs but it seems you want to count occurrences regardless of the case, maybe also ignore characters such as whitespace, etc. So you might want something like this:
String initial = "I am really busy right now";
String cleaned = initial.replaceAll("\\s", "") //remove all whitespace characters
.toLowerCase(); // lower all characters
Map<Character, Integer> map = new HashMap<Character, Integer>();
for (char character : cleaned.toCharArray()) {
Integer count = map.get(character);
count = (count!=null) ? count + 1 : 1;
map.put(character, count);
}
for (Map.Entry<Character, Integer> entry : map.entrySet()) {
System.out.println(entry.getKey() + " : " + entry.getValue());
}
Tweak the regex to meet your exact requirements (to skip punctuation, etc).
String input = "AAZERTTYAATY";
char[] chars = input.toCharArray();
Map<Character, Integer> map = new HashMap<>();
for (char aChar : chars) {
Integer charCount = map.putIfAbsent(aChar, 1);
if (charCount != null) {
charCount++;
map.put(aChar, charCount);
}
}
