So I was developing an algorithm to count the number of repetitions of each character in a given word. I am using a HashMap: I add each unique character as a key, and the value is its number of repetitions. I would like to know the run time of my solution and whether there is a more efficient way to solve the problem.
Here is the code:
public static void getCount(String name){
HashMap<String, Integer> names = new HashMap<String, Integer>();
for(int i =0; i<name.length(); i++){
if(names.containsKey(name.substring(i, i+1))){
names.put(name.substring(i, i+1), names.get(name.substring(i, i+1)) +1);
}
else{
names.put(name.substring(i, i+1), 1);
}
}
Set<String> a = names.keySet();
Iterator i = a.iterator();
while(i.hasNext()){
String t = (String) i.next();
System.out.println(t + " occurred " + names.get(t) + " times");
}
}
The algorithm has a time complexity of O(n), but I'd change some parts of your implementation, namely:
Using a single get() instead of containsKey() + get();
Using charAt() instead of substring(), which creates a new String object;
Using a Map<Character, Integer> instead of a Map<String, Integer>, since you only care about a single character, not an entire String.
In other words:
public static void getCount(String name) {
Map<Character, Integer> names = new HashMap<Character, Integer>();
for(int i = 0; i < name.length(); i++) {
char c = name.charAt(i);
Integer count = names.get(c);
if (count == null) {
count = 0;
}
names.put(c, count + 1);
}
Set<Character> a = names.keySet();
for (Character t : a) {
System.out.println(t + " occurred " + names.get(t) + " times");
}
}
Your solution is O(n) from an algorithmic perspective, which is already optimal (at a minimum you have to inspect each character in the entire string at least once which is O(n)).
However, there are a couple of ways you could speed it up by reducing the constant overhead, e.g.
Use a HashMap<Character,Integer>. Characters will be much more efficient than Strings of length 1.
Use charAt(i) instead of substring(i,i+1). This avoids creating a new String, which will help you a lot. It is probably the biggest single improvement you can make.
If the string is going to be long (e.g. thousands of characters or more), consider using an int[] array to count the individual characters rather than a HashMap, with the character's numeric value (its ASCII code, for plain ASCII text) used as an index into the array. This isn't a good idea if your Strings are short, though.
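A minimal sketch of the int[] idea from the last point, assuming one counter per possible char value (65,536 ints, about 256 KB) so that no range check is needed. `ArrayCharCount` and `countChars` are hypothetical names, not part of the question's code.

```java
class ArrayCharCount {
    // One counter per possible char value (0..65535), indexed by the char itself.
    static int[] countChars(String name) {
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < name.length(); i++) {
            counts[name.charAt(i)]++; // the char widens to an int index
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countChars("mississippi");
        // Walk the array once and report only the characters that occurred.
        for (int c = 0; c < counts.length; c++) {
            if (counts[c] > 0) {
                System.out.println((char) c + " occurred " + counts[c] + " times");
            }
        }
    }
}
```

Allocating and scanning all 65,536 slots is a fixed cost regardless of input length, which is why this only pays off for long strings.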
Store the initial time to a variable, like so:
long start = System.currentTimeMillis();
then at the end, when you finish, print out the current time minus the start time:
System.out.println((System.currentTimeMillis() - start) + "ms taken");
to see how long it took. As far as I can tell, your approach is already the most efficient one, though there may be other good methods. Also, use char rather than String for each individual character (char/Character represents a single character; String is for a sequence of chars): call name.charAt(i) instead of name.substring(i, i+1) and change your map to HashMap<Character, Integer>.
String s="good";
//collect different unique characters
ArrayList<String> temp=new ArrayList<>();
for (int i = 0; i < s.length(); i++) {
char c=s.charAt(i);
if(!temp.contains(""+c))
{
temp.add(""+s.charAt(i));
}
}
System.out.println(temp);
//get count of each occurrence in the string
for (int i = 0; i < temp.size(); i++) {
int count=0;
for (int j = 0; j < s.length(); j++) {
if(temp.get(i).equals(s.charAt(j)+"")){
count++;
}
}
System.out.println("Occurrence of "+ temp.get(i) + " is "+ count+ " times" );
}
The characters of a given string must be sorted according to the order defined by another pattern string. The required complexity is O(n + m), where n is the length of the string and m is the length of the pattern.
Example:
Pattern: 1234567890AaBbCcDdEeFfGgHh
String: dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g
Result: 112222233456677790AaaaaaBbCccDddEeeFfGggHh
The pattern contains every character of the string, and each character appears in the pattern only once.
My code:
// Instances of possible values for input:
String pattern = "1234567890AaBbCcDdEeFfGgHh";
String string = "dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g";
// Builder to collect characters for sorted result:
StringBuilder result = new StringBuilder();
// Hash table based on characters from pattern to count occurrence of each character in string:
Map<Character, Integer> characterCount = new LinkedHashMap<>();
for (int i = 0; i < pattern.length(); i++) {
// Put each character from pattern and initialize its counter with initial value of 0:
characterCount.put(pattern.charAt(i), 0);
}
// Traverse string and increment counter at each occurrence of character
for (int i = 0; i < string.length(); i++) {
char ch = string.charAt(i);
Integer count = characterCount.get(ch);
characterCount.put(ch, ++count);
}
// Traverse completed dictionary and collect sequentially all characters collected from string
for (Map.Entry<Character, Integer> entry : characterCount.entrySet()) {
Integer count = entry.getValue();
if (count > 0) {
Character ch = entry.getKey();
// Append each character as many times as it appeared in string
for (int i = 0; i < count; i++) {
result.append(ch);
}
}
}
// Get final result from builder
return result.toString();
Is this code optimal? Is there any way to improve this algorithm? Do I understand correctly that it satisfies the given complexity O(n + m)?
I'm not sure whether yours or mine is faster timing-wise.
But here's an alternative:
import java.math.BigDecimal;
class Playground {
public static void main(String[] args) {
String pattern = "1234567890AaBbCcDdEeFfGgHh";
String s = "dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g";
long startTime = System.nanoTime();
StringBuilder sb = new StringBuilder();
for (char c : pattern.toCharArray()) {
sb.append(s.replaceAll("[^" + c + "]", ""));
}
System.out.println(sb.toString());
BigDecimal elapsedTime =
new BigDecimal( String.valueOf(System.nanoTime() - startTime)
)
.divide(
new BigDecimal( String.valueOf(1_000_000_000)
)
);
System.out.println(elapsedTime + " seconds");
}
}
Explanation:
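For each character in pattern, use String's regex-based replaceAll method to replace every character except the current one with an empty string, and append what is left. Repeating this for each pattern character keeps every character of the original string as many times as it occurred, ordered by the character sequence of pattern. (Building the regex this way works here only because the pattern contains no characters that are special inside a regex character class, such as [, ] or \.)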
Outputs:
112222233456677790AaaaaaBbCccDddEeeFfGggHh
0.021151652 seconds
(The timing is somewhat subjective. It came from the Sololearn Java Playground. It obviously depends on the current load on their servers)
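For comparison, a sketch that keeps the O(n + m) bound from the question: count with an int[] indexed directly by char, then walk the pattern once. The replaceAll approach rescans the whole string for every pattern character, so it does O(n · m) work. `PatternSort` and `sortByPattern` are hypothetical names, and the sketch assumes, as the question guarantees, that every character of the string appears in the pattern (any other character would be silently dropped).

```java
class PatternSort {
    // Counting-sort sketch: O(n + m) for a string of length n and a pattern of length m,
    // plus a fixed-size array of 65,536 counters (one per char value).
    static String sortByPattern(String pattern, String s) {
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++; // one pass over the string: O(n)
        }
        StringBuilder result = new StringBuilder(s.length());
        for (int i = 0; i < pattern.length(); i++) { // one pass over the pattern: O(m)
            char c = pattern.charAt(i);
            for (int k = 0; k < counts[c]; k++) {
                result.append(c); // appends across the whole loop total n
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // prints 112222233456677790AaaaaaBbCccDddEeeFfGggHh
        System.out.println(sortByPattern("1234567890AaBbCcDdEeFfGgHh",
                "dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g"));
    }
}
```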
I'm a Java beginner and was asked this in an interview.
I have to count the occurrences of each word in a given sentence.
For example, given "chair is equal to chair but not equal to table.", the output should be:
chair: 2,
is: 1,
equal: 2,
to: 2,
but: 1,
not: 1,
table: 1
I have written part of the code and tried using for loops, but I failed:
public static void main(String[] args)
{
int counter = 0;
String a = " To associate myself with an organization that provides a challenging job and an opportunity to provide innovative and diligent work.";
String[] b = a.split(" "); // split into an array of words
for(int i=0;i<b.length;i++)
{
counter=0;
for(int j=b.length-1;j>=0;j--)
{
if(b[i].equals(b[j]))
counter++;
}
}
}
Use a HashMap to count the frequency of each word:
import java.util.HashMap;
import java.util.Map.Entry;
public class Funly {
public static void main(String[] args) {
int counter = 0;
String a = " To associate myself with an organization that provides a challenging job and an opportunity to provide innovative and diligent work.";
String[] b = a.split(" "); // split into an array of words
HashMap<String, Integer> freqMap = new HashMap<String, Integer>();
for (int i = 0; i < b.length; i++) {
String key = b[i];
int freq = freqMap.getOrDefault(key, 0);
freqMap.put(key, ++freq);
}
for (Entry<String, Integer> result : freqMap.entrySet()) {
System.out.println(result.getKey() + " " + result.getValue());
}
}
}
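Quite easy since Java 8:
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;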
public static Map<String, Long> countOccurrences(String sentence) {
return Arrays.stream(sentence.split(" "))
.collect(Collectors.groupingBy(
Function.identity(), Collectors.counting()
)
);
}
I would also remove non-letter symbols and convert to lowercase before counting:
String tmp = sentence.replaceAll("[^A-Za-z\\s]", "");
So your final main method for the interview would be:
public static void main(String[] args) {
String sentence = "To associate myself with an organization that provides a challenging job and an opportunity to provide innovative and diligent work.";
String tmp = sentence.replaceAll("[^A-Za-z\\s]", "").toLowerCase();
System.out.println(
countOccurrences(tmp)
);
}
Output is:
{diligent=1, a=1, work=1, myself=1, opportunity=1, challenging=1, an=2, associate=1, innovative=1, that=1, with=1, provide=1, and=2, provides=1, organization=1, to=2, job=1}
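If you want the words reported in order of first appearance, as in the interview example, a sketch using the three-argument Collectors.groupingBy overload with LinkedHashMap::new could look like this. It also assumes splitting on \\s+ and dropping empty tokens (the sentence in the question starts with a space, which would otherwise produce an empty "word"); `OrderedWordCount` is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedWordCount {
    // Same groupingBy/counting idea, but the mapFactory argument (LinkedHashMap::new)
    // keeps the words in the order in which they first appear.
    static Map<String, Long> countOccurrences(String sentence) {
        return Arrays.stream(sentence.split("\\s+"))
                .filter(w -> !w.isEmpty()) // a leading space makes split return "" first
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String sentence = "chair is equal to chair but not equal to table.";
        String tmp = sentence.replaceAll("[^A-Za-z\\s]", "").toLowerCase();
        // prints {chair=2, is=1, equal=2, to=2, but=1, not=1, table=1}
        System.out.println(countOccurrences(tmp));
    }
}
```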
A simple (but not very efficient) way would be to add all the elements to a set, which doesn't allow duplicates (see "How to efficiently remove duplicates from an array without using Set"). Then iterate through the set and count the number of occurrences of each element in your array, printing out the answer after each set element you check.
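A minimal sketch of this set-based approach, assuming a LinkedHashSet (so words print in first-appearance order) and exact, case-sensitive matching; `SetThenCount` and `countOf` are hypothetical names. Each distinct word triggers a full pass over the array, hence O(n · k) work for k distinct words.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class SetThenCount {
    // Collect the distinct words first, then count each one with a full pass over the array.
    static void printCounts(String[] words) {
        Set<String> distinct = new LinkedHashSet<>(Arrays.asList(words));
        for (String w : distinct) {
            System.out.println(w + " :" + countOf(words, w));
        }
    }

    // Number of elements of words equal to target.
    static int countOf(String[] words, String target) {
        int count = 0;
        for (String w : words) {
            if (w.equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        printCounts("chair is equal to chair but not equal to table".split(" "));
    }
}
```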
There are several solutions to this and I'm not going to provide you with any of them. However, I'm going to give you a rough outline of one possible solution:
You could use a Map, for example a HashMap, where you use the words as keys and the number of their occurrence as values. Then, all you need to do is to split the input string on spaces and iterate over the resulting array. For each word, you check if it already exists in the map. If so you increase the value by one, otherwise you add the word to the map and set the value to 1. After that, you can iterate over the map to create the desired output.
You can use a Map, which stores data in key-value pairs.
Use a HashMap (an implementation of Map) to store each word as a key and its number of occurrences as the value, as shown in the code below with inline comments:
String[] b = a.split(" "); // split the sentence into words
Map<String, Integer> map = new HashMap<>(); // create a Map object
Integer counter = null; // holds the current count of a word
for (int i = 0; i < b.length; i++) { // loop over the whole array
counter = map.get(b[i]); // current count, or null if the word is new
if (counter == null) { // word not seen yet
map.put(b[i], 1); // add it with a count of 1
} else {
counter++; // already exists: increment the counter and put it back
map.put(b[i], counter);
}
}
Using simple for loops:
public static void main(String[] args) {
String input = "Table is this Table";
String[] arr1 = input.split(" ");
int count = 0;
for (int i = 0; i < arr1.length; i++) {
count = 0;
for (int j = 0; j < arr1.length; j++) {
String temp = arr1[j];
String temp1 = arr1[i];
if (j < i && temp.contentEquals(temp1)) {
break;
}
if (temp.contentEquals(temp1)) {
count = count + 1;
}
if (j == arr1.length - 1) {
System.out.println(">>" + arr1[i] + "<< is present >>" + count + "<< number of times");
}
}
}
}
So, I need to write a program using loops that takes a string and counts which letters appear in that string and how many times (the string "better butter" would print "b appears 2 times, e appears 3 times, ' '(space) appears 1 time", and so on). While I understand the idea and concept behind this assignment, actually pulling it off has been rough.
My nested for loop is where the problems are coming from, I assume. What I've written only loops once (I think), shows just the first character, and says there's only one of that character.
Edit: Preferably without using Map or arrays. I'm fine with using them if it's the only way, but they've not been covered in my class so I'm trying to avoid them. Every other similar question to this (that I've found) uses Map or array.
import java.util.Scanner;
class myString{
String s;
myString() {
s = "";
}
void setMyString(String s) {
this.s = s;
}
String getMyString() {
return s;
}
String countChar(String s){
s = s.toUpperCase();
int cnt = 0;
char c = s.charAt(cnt);
for (int i = 0; i <= s.length(); i++)
for (int j = 0; j <= s.length(); j++) //problem child here
c = s.charAt(cnt);
cnt++;
if (cnt == 1)
System.out.println(c+" appears "+cnt+" time in "+s);
else
System.out.println(c+" appears "+cnt+" times in "+s);
return "for"; //this is here to prevent complaint from the below end bracket.
}
}
public class RepeatedCharacters {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String s;
System.out.println("Enter a sentence: ");
s = in.nextLine();
myString myS = new myString();
// System.out.println(myS.getMyString());
// System.out.println(myS.countChar());
myS.countChar(s);
}
}
First you will need to scan the entire string and store the counts of each character. Later you can just print the counts.
Algorithm 1:
Use a HashMap to store the character as key and its count as value. (If you are new to Java, you might want to read up on HashMaps.)
Every time you read a character in your for loop, check if it is present in the HashMap. If yes, increment its count by 1; otherwise add the new character to the map with count 1.
Printing:
Just iterate over your HashMap and print out each character and its count.
Issue with your code: You are trying to print the count as soon as you read a character, but the character might appear again later in the string. So you need to keep track of the characters you have already read.
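Algorithm 1 as a minimal Java sketch, assuming a Map<Character, Integer> and the "count everything first, print afterwards" structure described above; `CharCountAlgorithm1` is a hypothetical name. A HashMap does not keep the characters in any particular order when printed.

```java
import java.util.HashMap;
import java.util.Map;

class CharCountAlgorithm1 {
    // Pass 1: read every character and update its count in the map.
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (counts.containsKey(c)) {
                counts.put(c, counts.get(c) + 1); // seen before: increment
            } else {
                counts.put(c, 1); // first time: start at 1
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Pass 2: only after the whole string has been read, print the totals.
        for (Map.Entry<Character, Integer> e : countChars("better butter").entrySet()) {
            int n = e.getValue();
            System.out.println("'" + e.getKey() + "' appears " + n + (n == 1 ? " time" : " times"));
        }
    }
}
```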
Algorithm 2:
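void countChar(String s) {
String processed = ""; // stands in for the has_processed array: characters already reported
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (processed.indexOf(c) >= 0)
continue; // this character has already been counted
int cnt = 1; // the occurrence at position i itself
for (int j = i + 1; j < s.length(); j++) {
if (c == s.charAt(j))
cnt++;
}
processed += c; // mark c as processed
System.out.println(c + " appears " + cnt + (cnt == 1 ? " time" : " times"));
}
}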
Use a frequency array to get an answer in linear time.
import java.util.*;
import java.lang.*;
import java.io.*;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
String s = "better butter";
int freq[] = new int[26];
int i;
for (i = 0; i < s.length(); i++) {
if (s.charAt(i) >= 'a' && s.charAt(i) <= 'z')
freq[s.charAt(i)-'a']++;
}
for (i = 0; i < freq.length; i++) {
if (freq[i] == 0) continue;
System.out.println((char)(i+'a') + " appears " + freq[i] + " times" );
}
}
}
Note that this can be expanded to include uppercase letters, but for demonstrative purposes, only lowercase letters are handled in the above code.
EDIT: While the OP did ask whether this is possible without an array, I would recommend against it. That solution would have quadratic time complexity and would report the same character's count repeatedly (unless an array is used to keep track of seen characters, which defeats the purpose). Thus, the above solution is the best way to do it in a reasonable (linear) amount of time with limited space consumption.
I would do the following. Create a HashMap which keeps track of which unique characters are in the string and the count for each character.
You only need to iterate over the string once and put each character into the HashMap: if the character is already in the map, increment its count; otherwise add it with a count of 1. Print out the map with toString() to get the result. The whole thing can be done in about 4 lines of code.
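One way to get that short version is Map.merge (Java 8+), which folds the "increment if present, otherwise insert 1" check into a single call. This is a sketch with a hypothetical class name, not the answerer's own code.

```java
import java.util.HashMap;
import java.util.Map;

class CharCountMerge {
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        // Printed via HashMap.toString(); the iteration order is unspecified.
        System.out.println(countChars("better butter"));
    }
}
```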
The only thing being done in your nested for loop with the following
c = s.charAt(cnt)
is setting the c char to the value of the first letter (i.e. index 0 of the string) over and over and over until you've looped through the string n^2 times. In other words, you're not incrementing your cnt counter within the for loops at all.
Suggestion: try to use meaningful names for your variables; it will help you a lot in your career. Also class names should always start with a capital letter.
Although it is not the quickest solution in terms of performance, the simplest solution is probably:
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);
Source: Most efficient way to increment a Map value in Java
Please next time use the search function before posting a new question.
Here is my version of countChar(String s)
boolean countChar(String s) {
if(s==null) return false;
s = s.toUpperCase();
// view[x] means that the character at position x has already been counted
boolean[] view = new boolean[s.length()];
/*
The main idea is:
for each character c = s.charAt(x) in the string s, the boolean view[x] says whether c has already been examined.
If c has not been examined yet, I search the rest of the string for characters equal to c.
Each time I find a character equal to c, I mark it as viewed and increment count.
*/
for (int i = 0; i < s.length(); i++) {
if (!view[i]) {
char tmp = s.charAt(i);
int count = 0;
for (int j = i; j < s.length(); j++) {
if (!view[j] && s.charAt(j) == tmp) {
count++;
view[j] = true;
}
}
System.out.println("There were " + count + " " + tmp);
}
}
return true;
}
It should work. Excuse my English, I'm Italian.
My goal is to write a function that counts occurrences of some symbols (chars) in a line.
An int ID is given to every character I need to count.
The set of chars is limited and I know it from the beginning.
All the lines consist only of chars from the given set.
The function processes gazillions of lines.
My profiler always shows that the function which collects the stats is the slowest (97%), even though the program does a lot of other things.
First I used a HashMap and code like this:
occurances = new HashMap<>();
for (int symbol : line) {
Integer amount = 1;
if (occurances.containsKey(symbol)) {
amount += occurances.get(symbol);
}
occurances.put(symbol, amount);
}
The profiler showed that HashMap.put takes 97% of the processor time.
Then I tried replacing it with an ArrayList that is created once, and optimized it a little (the lines are always longer than 1 char), but it's still very slow:
int symbol = line[0];
occurances.set(symbol, 1);
for (int i = 1; i < length; i++) {
symbol = line[i];
occurances.set(symbol, 1 + occurances.get(symbol));
}
If someone has better ideas on how to solve this task with better performance, your help would be very much appreciated.
As suggested here, you can try something like:
List<Integer> line = //get line as a list;
Map<Integer, Long> intCount = line.parallelStream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
You could try something like this:
public class CharCounter {
final int max;
final int[] counts;
public CharCounter(char max) {
this.max = (int) max;
counts = new int[this.max + 1];
}
public void addCounts(char[] line) {
for (int symbol : line) {
counts[symbol]++;
}
}
public Map<Integer, Integer> getCounts() {
Map<Integer, Integer> countsMap = new HashMap<>();
for (int symbol = 0; symbol < counts.length; symbol++) {
int count = counts[symbol];
if (count > 0) {
countsMap.put(symbol, count);
}
}
return countsMap;
}
}
This uses an array to keep the counts and uses the char itself as an index to the array.
This eliminates the need to check if a map contains the given key etc. It also removes the need for autoboxing the chars.
And a performance comparison shows a roughly 20x speedup:
public static final char MIN = 'a';
public static final char MAX = 'f';
private static void count1(Map<Integer, Integer> occurrences, char[] line) {
for (int symbol : line) {
Integer amount = 1;
if (occurrences.containsKey(symbol)) {
amount += occurrences.get(symbol);
}
occurrences.put(symbol, amount);
}
}
private static void count2(CharCounter counter, char[] line) {
counter.addCounts(line);
}
public static void main(String[] args) {
char[] line = new char[1000];
for (int i = 0; i < line.length; i++) {
line[i] = (char) ThreadLocalRandom.current().nextInt(MIN, MAX + 1);
}
Map<Integer, Integer> occurrences;
CharCounter counter;
// warmup
occurrences = new HashMap<>();
counter = new CharCounter(MAX);
System.out.println("Start warmup ...");
for (int i = 0; i < 500_000; i++) {
count1(occurrences, line);
count2(counter, line);
}
System.out.println(occurrences);
System.out.println(counter.getCounts());
System.out.println("Warmup done.");
// original method
occurrences = new HashMap<>();
System.out.println("Start timing of original method ...");
long start = System.nanoTime();
for (int i = 0; i < 500_000; i++) {
count1(occurrences, line);
}
System.out.println(occurrences);
long duration1 = System.nanoTime() - start;
System.out.println("End timing of original method.");
System.out.println("time: " + duration1);
// alternative method
counter = new CharCounter(MAX);
System.out.println("Start timing of alternative method ...");
start = System.nanoTime();
for (int i = 0; i < 500_000; i++) {
count2(counter, line);
}
System.out.println(counter.getCounts());
long duration2 = System.nanoTime() - start;
System.out.println("End timing of alternative method.");
System.out.println("time: " + duration2);
System.out.println("Speedup: " + (double) duration1 / duration2);
}
Output:
Start warmup ...
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
Warmup done.
Start timing of original method ...
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
End timing of original method.
time: 7110894999
Start timing of alternative method ...
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
End timing of alternative method.
time: 388308432
Speedup: 18.31249185698857
Also if you add the -verbose:gc JVM flag you can see that the original method needs to do quite a bit of garbage collecting while the alternative method doesn't need any.
It's very possible that not parameterizing the HashMap is causing lots of performance problems.
What I would do is create a class called IntegerCounter. Look at the AtomicInteger code (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/concurrent/atomic/AtomicInteger.java) and copy everything from there except the code that makes it atomic. Keeping one IntegerCounter instance per symbol and incrementing it in place should save you a lot of garbage collection.
Using new Integer(x) for the key lookup should allow for escape-analysis to automatically garbage collect it.
HashMap<Integer, IntegerCounter> occurances = new HashMap<>();
// since the set of characters are already known, add all of them here with an initial count of 0
for (int i = 0; i < length; i++) {
occurances.get(new Integer(line[i])).incrementAndGet();
}
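A minimal sketch of the IntegerCounter this answer describes, assuming all you need is a mutable int with incrementAndGet() and get() (the method names mirror AtomicInteger, but there is no thread safety). It pre-fills the map for a known symbol set, as the comment in the snippet suggests, so the loop needs no null check; `IntegerCounterDemo`, `count` and the sample symbol set are hypothetical. The lookup key is plain autoboxing rather than the deprecated new Integer(...).

```java
import java.util.HashMap;
import java.util.Map;

// Plain (non-atomic) mutable counter: one instance per symbol, updated in place,
// so incrementing never boxes a new Integer value.
class IntegerCounter {
    private int value;

    int incrementAndGet() {
        return ++value;
    }

    int get() {
        return value;
    }
}

class IntegerCounterDemo {
    static Map<Integer, IntegerCounter> count(int[] symbolSet, int[] line) {
        Map<Integer, IntegerCounter> occurrences = new HashMap<>();
        for (int s : symbolSet) {
            occurrences.put(s, new IntegerCounter()); // every known symbol starts at 0
        }
        for (int symbol : line) {
            occurrences.get(symbol).incrementAndGet(); // lookup key is autoboxed via Integer.valueOf
        }
        return occurrences;
    }

    public static void main(String[] args) {
        Map<Integer, IntegerCounter> result = count(new int[] {1, 2, 3}, new int[] {1, 3, 3, 2, 3});
        result.forEach((k, v) -> System.out.println(k + " -> " + v.get()));
    }
}
```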
In your code, most loop iterations look up the entry in the Map 3 times:
1. occurances.containsKey(symbol)
2. occurances.get(symbol);
3. occurances.put(symbol, amount);
This is more than needed and you can simply use the fact that get returns null to improve this to 2 lookups:
Integer currentCount = occurances.get(symbol);
Integer amount = currentCount == null ? 1 : currentCount + 1;
occurances.put(symbol, amount);
Furthermore, because you use Integer, new Integer objects have to be created often (as soon as the values exceed 127, or whatever upper bound is used for the cached values), which decreases performance.
Also, since you know the character set before analyzing the data, you could insert 0s (or equivalent) as values for all characters, which removes the need to check whether a mapping is already in the map.
The following code uses a helper class with an int count field to store the data instead, which allows incrementing the value without boxing/unboxing conversions.
class Container {
public int count = 0;
}
int[] symbolSet = ...
Map<Integer, Container> occurances = new HashMap<>();
for (int s : symbolSet) {
occurances.put(s, new Container());
}
for (int symbol : line) {
occurances.get(symbol).count++;
}
Using a different data structure can also help. Things that come to mind are perfect hashing or storing the data in a data structure other than a Map. However, instead of an ArrayList, I'd recommend an int[] array, since it does not require any method calls and also removes the need for boxing/unboxing conversions to/from Integer. The data can still be converted to a more suitable data structure after calculating the frequencies.
You can convert the char directly to an int and use it as an index:
for (int i = 0; i < line.length; i++) {
occurences[(int) line[i]]++;
}
Good Morning
I wrote a function that calculates the frequency of a term:
public static int tfCalculator(String[] totalterms, String termToCheck) {
int count = 0; //to count the overall occurrence of the term termToCheck
for (String s : totalterms) {
if (s.equalsIgnoreCase(termToCheck)) {
count++;
}
}
return count;
}
Then I use it in the code below to compute the frequency of every word in a String[] words:
for(String word:words){
int freq = tfCalculator(words, word);
System.out.println(word + "|" + freq);
mm+=word + "|" + freq+"\n";
}
Well, the problem I have is that the words repeat. Here is an example of the result:
cytoskeletal|2
network|1
enable|1
equal|1
spindle|1
cytoskeletal|2
...
...
So can someone help me remove the repeated words and get a result like this:
cytoskeletal|2
network|1
enable|1
equal|1
spindle|1
...
...
Thank you very much!
Java 8 solution
words = Arrays.stream(words).distinct().toArray(String[]::new);
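The distinct method removes duplicates: words is replaced with a new array without duplicates. Keep a reference to the original array for tfCalculator, though; counting against the deduplicated array would no longer see the repeats.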
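A sketch of using distinct() without breaking the counts: deduplicate into a separate array and keep counting against the original. The tfCalculator copy mirrors the question's method; `DistinctFrequencies` and the sample words are hypothetical. Note that distinct() compares case-sensitively while tfCalculator uses equalsIgnoreCase, so "Cell" and "cell" would still both be printed.

```java
import java.util.Arrays;

class DistinctFrequencies {
    // Same logic as the question's tfCalculator.
    static int tfCalculator(String[] totalterms, String termToCheck) {
        int count = 0;
        for (String s : totalterms) {
            if (s.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"cytoskeletal", "network", "enable", "equal", "spindle", "cytoskeletal"};
        // Deduplicate into a separate array so the counts still come from the full list.
        String[] unique = Arrays.stream(words).distinct().toArray(String[]::new);
        for (String word : unique) {
            System.out.println(word + "|" + tfCalculator(words, word));
        }
    }
}
```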
I think here you want to print the frequency of each string in the array totalterms. Using a Map is an easier solution, since a single traversal of the array stores the frequency of every string. Check the following implementation:
public static void printFrequency(String[] totalterms)
{
Map<String, Integer> frequencyMap = new HashMap<String, Integer>();
for (String string : totalterms) {
if(frequencyMap.containsKey(string))
{
Integer count = frequencyMap.get(string);
frequencyMap.put(string, count+1);
}
else
{
frequencyMap.put(string, 1);
}
}
Set <Entry<String, Integer>> elements= frequencyMap.entrySet();
for (Entry<String, Integer> entry : elements) {
System.out.println(entry.getKey()+"|"+entry.getValue());
}
}
You can just use a HashSet and that should take care of the duplicates issue:
words = new HashSet<String>(Arrays.asList(words)).toArray(new String[0]);
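This will take your array, convert it to a List, feed that to the constructor of HashSet<String>, and then convert it back to an array for you. Note that a HashSet does not keep the original word order, and tfCalculator still needs the original array to count repeats.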
Sort the array, then you can just count equal adjacent elements:
Arrays.sort(totalterms);
int i = 0;
while (i < totalterms.length) {
int start = i;
while (i < totalterms.length && totalterms[i].equals(totalterms[start])) {
++i;
}
System.out.println(totalterms[start] + "|" + (i - start));
}
In two lines:
String s = "cytoskeletal|2 - network|1 - enable|1 - equal|1 - spindle|1 - cytoskeletal|2";
System.out.println(new LinkedHashSet<>(Arrays.asList(s.split(" - "))).toString().replaceAll("(^\\[|\\]$)", "").replace(", ", " - "));
Your code is fine; you just need to keep track of which words you have already encountered. For that you can keep a running set:
Set<String> prevWords = new HashSet<>();
for(String word:words){
// proceed if word is new to the set, otherwise skip
if (prevWords.add(word)) {
int freq = tfCalculator(words, word);
System.out.println(word + "|" + freq);
mm+=word + "|" + freq+"\n";
}
}