More efficient way of getting frequency of words - java

I want to count the words in an ArrayList by their first letter. For example, [cat, cog, mouse] means there are 2 words beginning with c and 1 word beginning with m. The code I have works fine, but there are 26 letters in the alphabet, which would require a lot more ifs. Is there any other way of doing this?
public static void countAlphabeticalWords(ArrayList<String> arrayList) throws IOException
{
int counta =0, countb=0, countc=0, countd=0,counte=0;
String word = "";
for(int i = 0; i<arrayList.size();i++)
{
word = arrayList.get(i);
if (word.charAt(0) == 'a' || word.charAt(0) == 'A'){ counta++;}
if (word.charAt(0) == 'b' || word.charAt(0) == 'B'){ countb++;}
}
System.out.println("The number of words begining with A are: " + counta);
System.out.println("The number of words begining with B are: " + countb);
}

Use a Map
public static void countAlphabeticalWords(List<String> list) {
    // needs java.util.List, java.util.Map and java.util.HashMap
    Map<Character,Integer> counts = new HashMap<Character,Integer>();
    for (String word : list) {
        // key on the upper-cased first letter so "cat" and "Cog" share a bucket
        Character c = Character.toUpperCase(word.charAt(0));
        if (counts.containsKey(c)) {
            counts.put(c, counts.get(c) + 1);
        }
        else {
            counts.put(c, 1);
        }
    }
    for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
        System.out.println("The number of words beginning with " + entry.getKey() + " are: " + entry.getValue());
    }
}
Or use a Map and AtomicInteger (as per Jarrod Roberson)
public static void countAlphabeticalWords(List<String> list) {
    // needs java.util.List, java.util.Map, java.util.HashMap
    // and java.util.concurrent.atomic.AtomicInteger
    Map<Character,AtomicInteger> counts = new HashMap<Character,AtomicInteger>();
    for (String word : list) {
        Character c = Character.toUpperCase(word.charAt(0));
        if (counts.containsKey(c)) {
            // mutate the counter in place instead of re-boxing an Integer
            counts.get(c).incrementAndGet();
        }
        else {
            counts.put(c, new AtomicInteger(1));
        }
    }
    for (Map.Entry<Character, AtomicInteger> entry : counts.entrySet()) {
        System.out.println("The number of words beginning with " + entry.getKey() + " are: " + entry.getValue());
    }
}
Best Practices
Prefer for(element : list) over list.get(i) when you only need the elements. And don't use ArrayList in a signature; use the interface List instead so you can change the implementation.
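On Java 8 or later, the containsKey/put branching can be collapsed with Map.merge. This is only a minimal sketch: it assumes empty strings should be skipped and that sorted output is wanted (hence the TreeMap), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FirstLetterCounts {
    // Hypothetical helper: counts words by upper-cased first letter.
    static Map<Character, Integer> countByFirstLetter(List<String> words) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // assumption: empty strings are ignored
            }
            // merge inserts 1 for a new key, otherwise adds 1 to the old value
            counts.merge(Character.toUpperCase(word.charAt(0)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByFirstLetter(Arrays.asList("cat", "cog", "mouse")));
        // prints {C=2, M=1}
    }
}
```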

How about this? Considering that the words start only with [a-zA-Z]:
public static int[] getCount(List<String> arrayList) {
int[] data = new int[26];
final int a = (int) 'a';
for(String s : arrayList) {
data[((int) Character.toLowerCase(s.charAt(0))) - a]++;
}
return data;
}
edit:
Just out of curiosity, I made a very simple test comparing my method and Steph's method with map.
List with 236 items, 10000000 iterations (without printing the result): my code took ~10000ms and Steph's took ~65000ms.
Test: http://pastebin.com/HNBgKFRk
Data: http://pastebin.com/UhCtapZZ
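If the input is not guaranteed to start with [a-zA-Z], the array version throws on empty strings and on characters outside that range. A possible guarded variant is sketched below; skipping such words is an assumption on my part, and the class name is illustrative only.

```java
import java.util.List;

class LetterBuckets {
    // Returns 26 counters, index 0 for 'a' through index 25 for 'z'.
    static int[] countByFirstLetter(List<String> words) {
        int[] counts = new int[26];
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // charAt(0) would throw on an empty string
            }
            char c = Character.toLowerCase(word.charAt(0));
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++; // anything else (digits, accents) is skipped
            }
        }
        return counts;
    }
}
```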

Now, every character can be cast to an integer, representing an ASCII decimal. For example, (int)'a' is 97. 'z''s ASCII decimal is 122. http://www.asciitable.com/
You can create a lookup table for the characters:
int[] characters = new int[128];
Then in your algorithm's loop use the ASCII decimal as index and increment the value:
word = arrayList.get(i);
characters[Character.toLowerCase(word.charAt(0))]++; // lower-case first so 'A' and 'a' share a slot
In the end, you can print the occurrences of the characters:
for (int i = 97; i<=122; i++){
System.out.println(String.format("The number of words beginning with %s are: %d", (char)i, characters[i]));
}

Related

Finding the longest word ArrayList /Java

I want to write a method which finds the longest String (word). The output should be the longest word; if two words have the same length, the output should be: "More than one longest word".
I used an ArrayList and almost had a solution, but something goes wrong when two words have the same length.
The output is :
More than one longest word
More than one longest word
14 incrementation is the longest word
Please check out piece of my code and help me to find the answer :)
public class LongestWord {
public static void main(String[] args) {
ArrayList<String> wordsList = new ArrayList<String>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
int largestString = wordsList.get(0).length();
int index = 0;
for (int i = 0; i < wordsList.size(); i++) {
if (wordsList.get(i).length() > largestString) {
largestString = wordsList.get(i).length();
index = i;
}else if(wordsList.get(i).length() == largestString){
largestString = wordsList.get(i).length();
index = i;
System.out.println("More than one longest word");
}
}
System.out.println(largestString +" " + wordsList.get(index) +" is the longest word ");
}
}

You can't tell which word is the longest until you have iterated over the whole list.
So iterate over the list:
if the word is longer than the current maximum length: clear the list and save the word
if the word has the same length as the current maximum: save the word
if the word is shorter: do nothing
List<String> wordsList = Arrays.asList(
"december", "california", "cat",
"implementation", "incrementation");
int maxLength = Integer.MIN_VALUE;
List<String> largestStrings = new ArrayList<>();
for (String s : wordsList) {
if (s.length() > maxLength) {
maxLength = s.length();
largestStrings.clear();
largestStrings.add(s);
} else if (s.length() == maxLength) {
largestStrings.add(s);
}
}
if (largestStrings.size() > 1) {
System.out.println("More than one longest word");
System.out.println(largestStrings);
} else {
System.out.println(largestStrings.get(0) + " is the longest word");
}
Gives
More than one longest word
[implementation, incrementation]

azro is right. You can also solve the problem with two passes over the list. I'm not sure it is the best way, but the code below works:
for (int i = 0; i < wordsList.size(); i++) {
if (wordsList.get(i).length() > largestString) {
largestString = wordsList.get(i).length();
index = i;
}
}
for (int i = 0; i < wordsList.size(); i++) {
if (i != index && wordsList.get(index).length() == wordsList.get(i).length()) { // skip the longest word itself, otherwise this always matches
System.out.println("More than one longest word");
break;
}
}

You can do this in a single pass over the list, storing the longest word(s) as you go.
import java.util.*;
public class Test {
public static void main(String[] args) {
final Collection<String> words = Arrays.asList(
"december", "california", "cat",
"implementation", "incrementation");
final Collection<String> longestWords = findLongestWords(words);
if (longestWords.size() == 1) {
System.out.printf("The longest word is: %s\n", longestWords.iterator().next());
} else if (longestWords.size() > 1) {
System.out.printf("More than one longest word. The longest words are: %s\n", longestWords);
}
}
private static final Collection<String> findLongestWords(final Collection<String> words) {
// using a Set, so that duplicate words are stored only once.
final Set<String> longestWords = new HashSet<>();
// remember the current length of the longest word
int lengthOfLongestWord = Integer.MIN_VALUE;
// iterate over all the words
for (final String word : words) {
// this word is longer than the previously thought longest word: clear the set and update the longest length.
if (word.length() > lengthOfLongestWord) {
lengthOfLongestWord = word.length();
longestWords.clear();
}
// this word is currently thought to be (one of) the longest: add it to the Set.
if (word.length() == lengthOfLongestWord) {
longestWords.add(word);
}
}
// return an unmodifiable Set containing the longest word(s)
return Collections.unmodifiableSet(longestWords);
}
}

My two cents to make it done in the single loop. Can be improved further.
ArrayList<String> wordsList = new ArrayList<String>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
String result;
int length = Integer.MIN_VALUE;
Map<String,String> map = new HashMap<>();
for(String word: wordsList){
if(word.length() >= length) {
length = word.length();
if (map.containsKey(String.valueOf(word.length())) || map.containsKey( "X" + word.length())) {
map.remove(String.valueOf(word.length()));
map.put("X" + word.length(), word);
} else {
map.put(String.valueOf(word.length()), word);
}
}
}
result = map.get(String.valueOf(length)) == null ? "More than one longest word" :
map.get(String.valueOf(length)) + " is the longest word";
System.out.println(result);

Here is one approach. I am using a set to hold the results as there is no reason to include duplicate words if they exist.
iterate over the words
if the current word length is > maxLength, clear the set and add the word, and update maxLength
if equal to the maxLength, just add the word.
List<String> wordsList = List.of("december", "implementation",
"california", "cat", "incrementation");
int maxLength = Integer.MIN_VALUE;
Set<String> results = new HashSet<>();
for (String word : wordsList) {
int len = word.length();
if (len >= maxLength) {
if (len > maxLength) {
results.clear();
maxLength = len;
}
results.add(word);
}
}
System.out.printf("The longest word%s -> %s%n", results.size() > 1 ? "s" : "", results);
prints
The longest words -> [implementation, incrementation]

I changed your code to suggest a different approach to the problem. Honestly, I hope you'll find it fascinating and helpful.
There are two versions of it: one that doesn't care about finding more than one longest word (it prints just the first one, but you can change that as you prefer), and another one that does.
First solution:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class LongestWord {
public static void main(String[] args) {
List<String> wordsList = new ArrayList<>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
wordsList.stream()
.max(LongestWord::compare)
.ifPresent(a -> System.out.println(a.toUpperCase() + " is the longest word with length of: " + a.length()));
}
private static int compare(String a1, String b1) {
return a1.length() - b1.length();
}
}
Second solution:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class LongestWord {
public static void main(String[] args) {
List<String> wordsList = new ArrayList<>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
int max_length = wordsList.stream()
.max(LongestWord::compare)
.map(String::length).orElse(0);
List<String> finalWordsList = wordsList.stream()
.filter(word -> word.length() == max_length)
.collect(Collectors.toList());
if (finalWordsList.size() > 1) {
System.out.println("More than one longest word");
} else {
System.out.println(finalWordsList.get(0) + " is the longest word");
}
}
private static int compare(String a1, String b1) {
return a1.length() - b1.length();
}
}
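The second stream solution walks the list twice (once for the maximum, once to filter). As a hedged alternative, the words can be grouped by length in one stream pass and the last bucket of a TreeMap taken. This is a sketch rather than a recommendation; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LongestWordsByGrouping {
    // Groups words by length; the TreeMap keeps lengths sorted,
    // so the last entry holds every word of maximum length.
    static List<String> longestWords(List<String> words) {
        TreeMap<Integer, List<String>> byLength = words.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
        return byLength.isEmpty()
                ? Collections.emptyList()
                : byLength.lastEntry().getValue();
    }

    public static void main(String[] args) {
        List<String> longest = longestWords(Arrays.asList(
                "december", "california", "cat", "implementation", "incrementation"));
        System.out.println(longest.size() > 1
                ? "More than one longest word"
                : longest.get(0) + " is the longest word");
    }
}
```

This builds a bucket for every length, so it trades some memory for brevity compared with the loop-based answers.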

Count number of matching strings after replacing a character and all its occurrence

I have an array of strings arr and another input string s.
Now my task is to pick a character from s and replace all the occurrences of that letter in s with another character. Then rearrange the characters if needed but this is optional. Now count how many of them are matching with array elements.
I have written code in Java for this, but the approach I followed is not correct.
Example:
s = aabbcdbb
arr = {"aabbcdbbb", "aabbcdb", "aabbxdbb", "aabbbdbb", "aacccdcc", "ddbbcdbb", "eebbcdbb"}
Output :
5
explanation:
length of s = 8
sorting s = aabbbbcd
arr[0] = has 9 characters i.e more than s length so ignored
arr[1] = has 7 characters i.e less than s length so ignored
arr[2] = sorting : aabbbbdx. replace x with c and rearranging it makes this as aabbbbcd
arr[3] = sorting : aabbbbbd. replace 1 occurrence of b with c and rearranging it makes this as aabbbbcd
arr[4] = sorting : aacccccd. replace 4 occurrences of c with b and rearranging it makes this as aabbbbcd
arr[5] = sorting : bbbbcddd. replace 2 occurrences of d with a and rearranging it makes this as aabbbbcd
arr[6] = sorting : bbbbcdee. replace e with a and rearranging it makes this as aabbbbcd
so arr[2], arr[3], arr[4], arr[5], arr[6] matches the given requirement so output is 5.
I tried this program but this fails for some inputs:
static int process(String s, String[] arr) {
int matches = 0;
Map<Character, Integer> m = new HashMap<>();
// sort s
char[] c = s.toCharArray();
Arrays.sort(c);
s = new String(c);
c = s.toCharArray();
// get each char of s and count its occurrences
for(char k : c) {
m.put(k, m.getOrDefault(k, 0)+1);
}
for(String s1 : arr) {
// get array element
char[] c1 = s1.toCharArray();
// check if array element length matches with input string length
if(c1.length == c.length) {
// count each occurrence of char into array of alphabets
int[] chars = new int[26];
for(char k1: c1) {
chars[k1-'a']++;
}
// decrement count by checking with map
for(char k : m.keySet()) {
chars[k-'a'] -= m.get(k);
}
boolean f1 = false;
boolean valid = true;
int mismatch = 0;
int notzeros = 0;
// get each element from array of chars
for(int i=0; i<26; i++) {
int ch = chars[i];
// value not zero
if(ch != 0) {
// count number of non zeros
notzeros++;
// f1 is true, means its second occurrence of non zero element
if(f1) {
if(ch > 0) {
// check if values do not match
if(mismatch*-1 != ch) {
valid = false;
break;
}
} else {
// check if values do not match
if(mismatch != ch*-1) {
valid = false;
break;
}
}
}
// get the mismatch count and set the value of flag to true
f1 = true;
mismatch = ch;
}
// if non zero elements more than 2 then we can ignore this array element
if(notzeros > 2) {
valid = false;
break;
}
}
// check for possible solution.
if(valid && f1) {
matches++;
}
}
}
return matches;
}
This program works for the given test case.
Now if I send the below input it fails.
example: s = abba
arr = {"aadd", "abbb"};
expected output: 1
explanation:
sorted s = aabb
arr[0] = aadd, replace d with b then we get aabb
arr[1] = abbb, we cannot replace all occurrences of a single character to get s as output, so ignored.
So the output is 1.
But my program is printing 2 which is not correct.
My approach to solve this task is not correct, what is the correct way to do this?

First of all, it seems the explanation you provided is based on a slightly misunderstood formulation of the problem. The problem consists of checking whether all occurrences of a character can be replaced with a different character in the string s, not the string in the array.
So for example, with s = "aabbcdbb", and array string "aabbbdbb", you can replace the c character in s with a b to obtain the array string. It's not the other way around. That explains the inconsistency of the expected outputs for the two input samples (as raised in the comments).
Your implementation is generally correct but fails on a special case. The way you're solving it is by basically generating a "diff" array containing the difference in occurrence for each character. You then expect that in the diff, you have only two different occurrences that negate each other. To illustrate with the previous example, you map the characters of s:
a -> 2
b -> 4
c -> 1
d -> 1
similarly with the current array element:
a -> 2
b -> 5
d -> 1
the difference will be:
b -> 1
c -> -1
This fails when you have s = "aabb" and a string "abbb", where the diff is:
a -> -1
b -> 1
The problem here is that both characters a and b occur in the string "abbb". This should fail the match check. The reason is: if we want to go from "abbb" to "aabb", we would need to replace a b with an a. But "abbb" already has an a character, which would not have been there if the opposite side replaced a with b.
The code can be modified to handle this case (the part that uses diffInS1):
for(String s1 : arr) {
// get array element
char[] c1 = s1.toCharArray();
// check if array element length matches with input string length
if(c1.length == c.length) {
// count each occurrence of char into array of alphabets
int[] chars = new int[26];
int[] diff = new int[26];
for(char k1: c1) {
chars[k1-'a']++;
diff[k1-'a']++;
}
// decrement count by checking with map
for(char k : m.keySet()) {
diff[k-'a'] = chars[k-'a'] - m.get(k);
}
boolean valid = true;
int mismatch = 0;
int notzeros = 0;
int diffInS1 = 0;
// get each element from array of chars
for(int i=0; i<26; i++) {
int ch = diff[i];
// value not zero
if(ch != 0) {
// count number of non zeros
notzeros++;
// second occurrence of non zero element
if(notzeros > 1) {
// check if values do not match
if(mismatch*-1 != ch) {
valid = false;
break;
}
}
if(chars[i] > 0) {
diffInS1++;
}
// get the mismatch count
mismatch = ch;
}
// if non zero elements more than 2 then we can ignore this array element
if(notzeros > 2 || diffInS1 == 2) {
valid = false;
break;
}
}
// check for possible solution.
if(valid && notzeros > 0) {
matches++;
}
}
}
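For comparison, the rule described in this answer can also be checked by brute force. The sketch below assumes lowercase a-z input only and reads the task as "replace every occurrence of one letter in s with a different letter, then rearrange"; it simply tries every (from, to) letter pair. The class and method names are illustrative and do not come from the question.

```java
import java.util.Arrays;

class ReplaceMatch {
    // Letter histogram; assumes the string contains only 'a'..'z'.
    static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char ch : s.toCharArray()) {
            counts[ch - 'a']++;
        }
        return counts;
    }

    // True if moving ALL occurrences of one letter of s onto a different
    // letter yields the same histogram as t (i.e. an anagram of t).
    static boolean matches(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        int[] sCounts = letterCounts(s);
        int[] tCounts = letterCounts(t);
        for (int from = 0; from < 26; from++) {
            if (sCounts[from] == 0) {
                continue; // the replaced letter must occur in s
            }
            for (int to = 0; to < 26; to++) {
                if (to == from) {
                    continue;
                }
                int[] moved = sCounts.clone();
                moved[to] += moved[from];
                moved[from] = 0;
                if (Arrays.equals(moved, tCounts)) {
                    return true;
                }
            }
        }
        return false;
    }

    static int countMatches(String s, String[] arr) {
        int matches = 0;
        for (String t : arr) {
            if (matches(s, t)) {
                matches++;
            }
        }
        return matches;
    }
}
```

With the inputs from the question, countMatches("aabbcdbb", ...) gives 5 and countMatches("abba", {"aadd", "abbb"}) gives 1. At most 26 x 25 pairs are tried per array element, so the cost stays small for this alphabet.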

I will offer a similar approach. Let's analyze when two strings are "matching strings after replacing a character and all its occurrences". Assume we have a map of character counts for each string, and we remove from both maps whatever they have in common. The two strings match when each of the remaining maps has exactly one entry and the two counts are equal.
Let's do an example. aabbbbcd will create map:
a -> 2
b -> 4
c -> 1
d -> 1
aabbxdbb will create:
a -> 2
b -> 4
x -> 1
d -> 1
The difference will be:
First map will remain:
c -> 1
Second map:
x -> 1
Therefore those two match. Let's see how to write this.
First, this is the method to get this map:
private static Map<Character, Integer> getMap(String s) {
Map<Character, Integer> result = new HashMap<>();
for (char c : s.toCharArray()) {
if (result.containsKey(c)) {
result.put(c, result.get(c) + 1);
} else {
result.put(c, 1);
}
}
return result;
}
Now we can define a method that will create a predicate:
private static Predicate<String> getPredicate(String s) {
Map<Character, Integer> sMap = getMap(s);
Predicate<String> p = s1 -> {
Map<Character, Integer> s1Map = getMap(s1);
Map<Character, Integer> sMapCopy = getMap(s);
for (Map.Entry<Character, Integer> kvp : sMap.entrySet()) {
if (s1Map.containsKey(kvp.getKey())) {
if (s1Map.get(kvp.getKey()) < kvp.getValue()) {
sMapCopy.put(kvp.getKey(), kvp.getValue() - s1Map.get(kvp.getKey()));
s1Map.remove(kvp.getKey());
} else if (kvp.getValue() < s1Map.get(kvp.getKey())) {
s1Map.put(kvp.getKey(), s1Map.get(kvp.getKey()) - kvp.getValue());
sMapCopy.remove(kvp.getKey());
} else {
sMapCopy.remove(kvp.getKey());
s1Map.remove(kvp.getKey());
}
}
}
boolean result = sMapCopy.size() == 1 && s1Map.size() == 1;
if (result) {
for (Map.Entry<Character, Integer> kvp : sMapCopy.entrySet()) {
for (Map.Entry<Character, Integer> kvp1 : s1Map.entrySet()) {
System.out.println(s + " and " + s1 + " can be replaced. Replace " + kvp.getValue() + " instances of " + kvp.getKey() + " with " + kvp1.getValue() + " instances of " + kvp1.getKey());
}
}
} else {
System.out.println(s + " and " + s1 + " cannot be replaced.");
}
return result;
};
return p;
}
Then we run the following:
String[] strings = {"aabbcdbbb", "aabbcdb", "aabbxdbb", "aabbbdbb", "aacccdcc", "ddbbcdbb", "eebbcdbb"};
long result = Arrays.stream(strings).filter(getPredicate("aabbcdbb")).count();
System.out.println("Replacables count: " + result);
and we get the output:
aabbcdbb and aabbcdbbb cannot be replaced.
aabbcdbb and aabbcdb cannot be replaced.
aabbcdbb and aabbxdbb can be replaced. Replace 1 instances of c with 1 instances of x
aabbcdbb and aabbbdbb can be replaced. Replace 1 instances of c with 1 instances of b
aabbcdbb and aacccdcc can be replaced. Replace 4 instances of b with 4 instances of c
aabbcdbb and ddbbcdbb can be replaced. Replace 2 instances of a with 2 instances of d
aabbcdbb and eebbcdbb can be replaced. Replace 2 instances of a with 2 instances of e
Replacables count: 5

Scanning string for keywords of various lengths

I want to scan my document, split into an array of words, for certain keywords such as 'Fuel', 'Vehicle', 'Vehicle Leasing', 'Asset Type Maintenance' etc. The problem is that the keywords are of different lengths: one is a single word, another is four words long. At the moment I'm scanning word by word, but that approach can't match multi-word keywords such as 'Vehicle Leasing'.
What can I do to improve my code and to work with multiple word keywords?
This is how it looks now
public void findKeywords(POITextExtractor te, ArrayList<HashMap<String,Integer>> listOfHashMaps, ArrayList<Integer> KeywordsFound, ArrayList<Integer> existingTags) {
String document = te.getText().toString();
String[] words = document.split("\\s+");
int wordsNo = 0;
int keywordsMatched = 0;
try {
for(String word : words) {
wordsNo++;
for(HashMap<String, Integer> hashmap : listOfHashMaps) {
if(hashmap.containsKey(word) && !KeywordsFound.contains(hashmap.get(word)) && !existingTags.contains(hashmap.get(word))) {
KeywordsFound.add(hashmap.get(word));
keywordsMatched++;
System.out.println(word);
}
}
}
System.out.println("New keywords found: " + KeywordsFound);
System.out.println("Number of words in document = " + wordsNo);
System.out.println("Number of keywords matched: " + keywordsMatched);
} catch (IllegalArgumentException e) {
e.printStackTrace();
}
}
I have included my method. If there's anything else required to understand my code, leave a comment please.
#UPDATE
public void findKeywords(POITextExtractor te, ArrayList<HashMap<String,Integer>> listOfHashMaps, ArrayList<Integer> KeywordsFound, ArrayList<Integer> existingTags) {
String document = te.getText().toString();
String[] words = document.split("\\s+");
int wordsNo = 0;
int keywordsMatched = 0;
for(HashMap<String, Integer> hashmap : listOfHashMaps) {
Iterator it = hashmap.entrySet().iterator();
while (it.hasNext()) {
Map.Entry pair = (Map.Entry)it.next();
//System.out.println(pair.getKey() + " = " + pair.getValue());
it.remove(); // avoids a ConcurrentModificationException
if(document.contains((CharSequence) pair.getKey()) && !KeywordsFound.contains(pair.getValue()) && !existingTags.contains(pair.getValue())) {
System.out.println(pair.getKey());
KeywordsFound.add((Integer) pair.getValue());
keywordsMatched++;
}
}
}
System.out.println("New keywords found: " + KeywordsFound);
System.out.println("Number of keywords matched: " + keywordsMatched);
}

Another way of doing it would be to split the string by the search strings.
eg.
List<String> searchString = new ArrayList<>();
searchString.add("Fuel");
searchString.add("Asset Type Maintenance");
searchString.add("Vehicle Leasing");
String document = ""; // Assuming that your complete string is initialized here.
for (String str : searchString) {
// split takes a regex, so quote the keyword (java.util.regex.Pattern);
// the limit -1 keeps trailing empty strings so a match at the very end is counted
String[] tempDoc = document.split(Pattern.quote(str), -1);
System.out.println(str + " is repeated " + (tempDoc.length - 1) + " times");
}
Note this might thrash the JVM in garbage collection.
You can compare the performance on your own.
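If the temporary arrays are a concern, the same count can be obtained with String.indexOf, which allocates nothing per match. A small sketch, assuming non-overlapping matches are what is wanted; the class and method names are hypothetical.

```java
class SubstringCounter {
    // Counts non-overlapping occurrences of keyword in text.
    static int count(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0; // assumption: an empty keyword never matches
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(keyword, from)) != -1) {
            count++;
            from += keyword.length(); // continue after the match just found
        }
        return count;
    }
}
```

Like the split version, this matches raw substrings, so "Fuel" is also found inside a longer word such as "Fuelled".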

I assume this is a kind of homework. Therefore:
Have a look at string search algorithms that search for a substring (pattern) in a larger string.
Then assume that you use one of these algorithms, but instead of searching for a sequence of chars (the pattern) in a larger sequence of chars, you search for a sequence of strings (the pattern) in a larger sequence of strings. (So you just have a different, much larger, alphabet.)
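To make the idea concrete, here is a naive version of that search in which the "characters" are whole words. It is only a sketch under some assumptions: keywords and document are both split on whitespace, matching is case-sensitive, and punctuation attached to a word (for example "Fuel,") is not stripped. The class and method names are invented for illustration.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

class PhraseScanner {
    // Returns the keywords whose word sequence occurs in the document.
    static Set<String> findKeywords(String document, Collection<String> keywords) {
        String[] words = document.trim().split("\\s+");
        Set<String> found = new LinkedHashSet<>();
        for (String keyword : keywords) {
            String[] pattern = keyword.trim().split("\\s+");
            if (containsSequence(words, pattern)) {
                found.add(keyword);
            }
        }
        return found;
    }

    // Naive substring search, but over words instead of chars.
    static boolean containsSequence(String[] words, String[] pattern) {
        for (int start = 0; start + pattern.length <= words.length; start++) {
            int k = 0;
            while (k < pattern.length && words[start + k].equals(pattern[k])) {
                k++;
            }
            if (k == pattern.length) {
                return true;
            }
        }
        return false;
    }
}
```

A faster string search algorithm could replace containsSequence without changing the rest.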

Output one occurence of a character in a string c#

This program should output only one occurrence of a character in a string then specify the number of occurrence in that string. It should be sorted in ascending order depending on the number of occurrences of that particular character. It's working except on the (char)i part. Does it have something to do with ASCII codes or something?
Desired Output:
b: 1
d: 1
a: 2
s: 2
Code's output:
ü: 1
ý: 1
þ: 2
ÿ: 2
public class HuffmanCode {
static String string;
static Scanner input = new Scanner(System.in);
public static void main(String args[]){
System.out.print("Enter a string: ");
string = input.nextLine();
int count[] = countOccurence(string);
Arrays.sort(count);
for (int i = 0; i < count.length; i++) {
if (count[i] > 0)
System.out.println((char)i + ": " + count[i]);
}
}
public static int[] countOccurence(String str){
int counts[] = new int[256];
for(int i=0;i<str.length();i++){
char charAt = str.charAt(i);
counts[(int)charAt]++;
}
return counts;
}
}

In Java 8, you could use the Stream API and do something like this:
String input = "ababcabcd" ;
input.chars() // split the string to a stream of int representing the chars
.boxed() // convert to stream of Integer
.collect(Collectors.groupingBy(c->c,Collectors.counting())) // aggregate by counting the letters
.entrySet() // collection of entries (key, value), i.e. char, count
.stream() // corresponding stream
.sorted(Map.Entry.comparingByValue()) // sort by value, i.e. by number of occurence of letters
.forEach(e->System.out.println((char)(int)e.getKey() + ": " + e.getValue())); // Output the result
The result would be:
d: 1
c: 2
a: 3
b: 3
I hope it helps.
EDIT:
Suppose your input is
String input = "ababc\u0327abçd" ;
In that case the input is ababçabçd, and we need normalization to make sure that letters which are the same but have different representations are counted together. To achieve that, we preprocess the input string using Normalizer (java.text.Normalizer), which was introduced in JDK 6:
input = Normalizer.normalize(input, Form.NFC);
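A short illustration of what the normalization step changes, using only java.text.Normalizer from the JDK. The class and helper names are made up for the example.

```java
import java.text.Normalizer;

class NormalizeDemo {
    // Thin wrapper around the JDK call used in the answer.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "c\u0327"; // 'c' followed by a combining cedilla
        String composed = "\u00e7";    // the single precomposed character
        System.out.println(decomposed.equals(composed));      // false
        System.out.println(nfc(decomposed).equals(composed)); // true
        // 10 chars before normalization, 9 after
        System.out.println(nfc("ababc\u0327ab\u00e7d").length());
    }
}
```

Without this step, a per-char count would treat the combining cedilla as a character of its own.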

Create a list and sort it instead of sorting count.
List<int[]> list = new ArrayList<>();
for (int i = 0; i < count.length; i++) {
if (count[i] > 0)
list.add(new int[] {i , count[i]});
}
Collections.sort(list, Comparator.comparing(a -> a[1]));
for (int[] a : list) {
System.out.println((char)a[0] + ": " + a[1]);
}

You could use TreeMap with a combination of custom Comparator
Here's an example
String test = "ABBCCCDDDDEEEEEFFFFFF";
Map<Character, Integer> map = new HashMap<>();
for (Character c : test.toCharArray()) {
if (!map.containsKey(c)) map.put(c, 0);
map.put(c, map.get(c) + 1);
}
Map<Character, Integer> tMap = new TreeMap<>(new MyComparator(map));
tMap.putAll(map);
for (Map.Entry<Character, Integer> entry : tMap.entrySet()) {
System.out.println(entry.getKey() + ": " + entry.getValue());
}
And here's the implementation of MyComparator
class MyComparator implements Comparator<Object> {
Map<Character, Integer> map;
public MyComparator(Map<Character, Integer> map) {
this.map = map;
}
public int compare(Object o1, Object o2) {
if (map.get(o1).equals(map.get(o2)))
return 1;
else
return (map.get(o1)).compareTo(map.get(o2));
}
}

remove repeated words from String Array

Good Morning
I wrote a function that calculates the frequency of a term:
public static int tfCalculator(String[] totalterms, String termToCheck) {
int count = 0; //to count the overall occurrence of the term termToCheck
for (String s : totalterms) {
if (s.equalsIgnoreCase(termToCheck)) {
count++;
}
}
return count;
}
After that I use it in the code below to calculate the frequency of every word in a String[] words:
for(String word:words){
int freq = tfCalculator(words, word);
System.out.println(word + "|" + freq);
mm+=word + "|" + freq+"\n";
}
The problem I have is that the words repeat. Here is an example of the result:
cytoskeletal|2
network|1
enable|1
equal|1
spindle|1
cytoskeletal|2
...
...
Can someone help me remove the repeated words and get a result like this:
cytoskeletal|2
network|1
enable|1
equal|1
spindle|1
...
...
Thank you very much!

Java 8 solution
words = Arrays.stream(words).distinct().toArray(String[]::new);
The distinct method removes duplicates; words is replaced with a new array without duplicates.

I think you want to print the frequency of each string in the array totalterms. A Map is an easier solution: a single traversal of the array stores the frequency of every string. Check the following implementation.
public static void printFrequency(String[] totalterms)
{
// typed map instead of a raw Map, so no casts are needed (uses java.util.Map.Entry below)
Map<String, Integer> frequencyMap = new HashMap<String, Integer>();
for (String string : totalterms) {
if(frequencyMap.containsKey(string))
{
Integer count = frequencyMap.get(string);
frequencyMap.put(string, count+1);
}
else
{
frequencyMap.put(string, 1);
}
}
Set <Entry<String, Integer>> elements= frequencyMap.entrySet();
for (Entry<String, Integer> entry : elements) {
System.out.println(entry.getKey()+"|"+entry.getValue());
}
}
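A shorter variant of the same idea, sketched with Map.merge (Java 8+). It makes two assumptions that are not in the answer above: terms are compared case-insensitively, as tfCalculator does with equalsIgnoreCase, and the output should keep first-seen order (hence LinkedHashMap). The class name is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class TermFrequency {
    // One pass over the array; each distinct term appears once in the result.
    static Map<String, Integer> frequencies(String[] terms) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String term : terms) {
            // lower-casing makes "Network" and "network" the same key
            freq.merge(term.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "cytoskeletal"};
        for (Map.Entry<String, Integer> e : frequencies(words).entrySet()) {
            System.out.println(e.getKey() + "|" + e.getValue());
        }
    }
}
```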

You can just use a HashSet and that should take care of the duplicates issue:
words = new HashSet<String>(Arrays.asList(words)).toArray(new String[0]);
This will take your array, convert it to a List, feed that to the constructor of HashSet<String>, and then convert it back to an array for you.

Sort the array, then you can just count equal adjacent elements:
Arrays.sort(totalterms);
int i = 0;
while (i < totalterms.length) {
int start = i;
while (i < totalterms.length && totalterms[i].equals(totalterms[start])) {
++i;
}
System.out.println(totalterms[start] + "|" + (i - start));
}

In two lines:
String s = "cytoskeletal|2 - network|1 - enable|1 - equal|1 - spindle|1 - cytoskeletal|2";
System.out.println(new LinkedHashSet<>(Arrays.asList(s.split(" - "))).toString().replaceAll("(^\\[|\\]$)", "").replace(", ", " - "));

Your code is fine, you just need to keep track of which words were encountered already. For that you can keep a running set:
Set<String> prevWords = new HashSet<>();
for(String word:words){
// proceed if word is new to the set, otherwise skip
if (prevWords.add(word)) {
int freq = tfCalculator(words, word);
System.out.println(word + "|" + freq);
mm+=word + "|" + freq+"\n";
}
}
