What is the fastest way to gather symbol occurrences in Java


My goal is to write a function that counts occurrences of certain symbols (chars) in a line.
Every character I need to count is assigned an int ID.
The set of chars is limited and known from the beginning.
All the lines consist only of chars from the given set.
The function processes gazillions of lines.
My profiler always shows that the function which collects the stats is the slowest (97%), even though the program does a lot of other things.
First I used a HashMap and code like this:
occurances = new HashMap<>();
for (int symbol : line) {
Integer amount = 1;
if (occurances.containsKey(symbol)) {
amount += occurances.get(symbol);
}
occurances.put(symbol, amount);
}
The profiler showed that HashMap.put takes 97% of processor time.
Then I tried replacing it with an ArrayList that is created once,
and optimized it a little bit (the lines are always longer than 1 char), but it's still very slow:
int symbol = line[0];
occurances.set(symbol, 1);
for (int i = 1; i < length; i++) {
symbol = line[i];
occurances.set(symbol, 1 + occurances.get(symbol));
}
If someone has better ideas for solving this task with better performance, your help would be very much appreciated.

As suggested here, you can try something like:
List<Integer> line = //get line as a list;
Map<Integer, Long> intCount = line.parallelStream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

You could try something like this:
public class CharCounter {
final int max;
final int[] counts;
public CharCounter(char max) {
this.max = (int) max;
counts = new int[this.max + 1];
}
public void addCounts(char[] line) {
for (int symbol : line) {
counts[symbol]++;
}
}
public Map<Integer, Integer> getCounts() {
Map<Integer, Integer> countsMap = new HashMap<>();
for (int symbol = 0; symbol < counts.length; symbol++) {
int count = counts[symbol];
if (count > 0) {
countsMap.put(symbol, count);
}
}
return countsMap;
}
}
This uses an array to keep the counts and uses the char itself as an index to the array.
This eliminates the need to check if a map contains the given key etc. It also removes the need for autoboxing the chars.
A performance comparison shows a speedup of roughly 18x:
public static final char MIN = 'a';
public static final char MAX = 'f';
private static void count1(Map<Integer, Integer> occurrences, char[] line) {
for (int symbol : line) {
Integer amount = 1;
if (occurrences.containsKey(symbol)) {
amount += occurrences.get(symbol);
}
occurrences.put(symbol, amount);
}
}
private static void count2(CharCounter counter, char[] line) {
counter.addCounts(line);
}
public static void main(String[] args) {
char[] line = new char[1000];
for (int i = 0; i < line.length; i++) {
line[i] = (char) ThreadLocalRandom.current().nextInt(MIN, MAX + 1);
}
Map<Integer, Integer> occurrences;
CharCounter counter;
// warmup
occurrences = new HashMap<>();
counter = new CharCounter(MAX);
System.out.println("Start warmup ...");
for (int i = 0; i < 500_000; i++) {
count1(occurrences, line);
count2(counter, line);
}
System.out.println(occurrences);
System.out.println(counter.getCounts());
System.out.println("Warmup done.");
// original method
occurrences = new HashMap<>();
System.out.println("Start timing of original method ...");
long start = System.nanoTime();
for (int i = 0; i < 500_000; i++) {
count1(occurrences, line);
}
System.out.println(occurrences);
long duration1 = System.nanoTime() - start;
System.out.println("End timing of original method.");
System.out.println("time: " + duration1);
// alternative method
counter = new CharCounter(MAX);
System.out.println("Start timing of alternative method ...");
start = System.nanoTime();
for (int i = 0; i < 500_000; i++) {
count2(counter, line);
}
System.out.println(counter.getCounts());
long duration2 = System.nanoTime() - start;
System.out.println("End timing of alternative method.");
System.out.println("time: " + duration2);
System.out.println("Speedup: " + (double) duration1 / duration2);
}
Output:
Start warmup ...
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
Warmup done.
Start timing of original method ...
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
End timing of original method.
time: 7110894999
Start timing of alternative method ...
{97=77000000, 98=82000000, 99=86500000, 100=86000000, 101=80000000, 102=88500000}
End timing of alternative method.
time: 388308432
Speedup: 18.31249185698857
Also if you add the -verbose:gc JVM flag you can see that the original method needs to do quite a bit of garbage collecting while the alternative method doesn't need any.

It's very possible that not parameterizing the HashMap is causing lots of performance problems.
What I would do is create a class called IntegerCounter. Look at the AtomicInteger code (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/concurrent/atomic/AtomicInteger.java) and copy everything from there except the code that makes it atomic. Using IntegerCounter and incrementing a single instance per symbol should save you a lot of garbage collection.
Using new Integer(x) for the key lookup may allow escape analysis to eliminate the allocation, so it never has to be garbage collected.
HashMap<Integer, IntegerCounter> occurances;
// since the set of characters are already known, add all of them here with an initial count of 0
for (int i = 0; i < length; i++) {
occurances.get(new Integer(line[i])).incrementAndGet();
}
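A minimal sketch of the IntegerCounter idea described above, assuming a plain holder that is not thread-safe is enough. The class name comes from the answer and the method names mirror AtomicInteger; the IntegerCounterDemo class and the symbol IDs in main are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable counter: the AtomicInteger API shape without the atomicity.
final class IntegerCounter {
    private int value;

    int incrementAndGet() {
        return ++value;
    }

    int get() {
        return value;
    }
}

final class IntegerCounterDemo {
    // Counts symbols from a known set; every symbol in line must be in symbolSet.
    static Map<Integer, IntegerCounter> count(int[] symbolSet, int[] line) {
        Map<Integer, IntegerCounter> occurrences = new HashMap<>();
        // Pre-populate with zero counters so the loop never has to check for a key.
        for (int s : symbolSet) {
            occurrences.put(s, new IntegerCounter());
        }
        for (int symbol : line) {
            // The counter is mutated in place; no new Integer value is boxed.
            occurrences.get(symbol).incrementAndGet();
        }
        return occurrences;
    }

    public static void main(String[] args) {
        Map<Integer, IntegerCounter> result =
                count(new int[] {1, 2, 3}, new int[] {1, 2, 2, 3, 2});
        System.out.println(result.get(2).get());
    }
}
```

The key is still boxed on every lookup, so this removes the allocation of new values but not the hashing cost.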

In your code, most loop iterations look up the entry in the Map 3 times:
1. occurances.containsKey(symbol)
2. occurances.get(symbol);
3. occurances.put(symbol, amount);
This is more than needed; you can use the fact that get returns null for a missing key to reduce this to 2 lookups:
Integer currentCount = occurances.get(symbol);
Integer amount = currentCount == null ? 1 : currentCount + 1;
occurances.put(symbol, amount);
Furthermore, by using Integer, new Integer objects have to be created often (as soon as the values exceed 127, or whatever upper bound is used for the cached values), which decreases performance.
Also, since you know the character set before analyzing the data, you could insert 0s (or an equivalent) as values for all characters, which removes the need to check whether a mapping is already in the map.
The following code uses a helper class containing an int count field to store the data instead, which allows incrementing the value without boxing/unboxing conversions.
class Container {
public int count = 0;
}
int[] symbolSet = ...
Map<Integer, Container> occurances = new HashMap<>();
for (int s : symbolSet) {
occurances.put(s, new Container());
}
for (int symbol : line) {
occurances.get(symbol).count++;
}
Using a different data structure can also help. Things that come to mind are perfect hashing or storing the data in a data structure other than a Map. However, instead of an ArrayList, I'd recommend an int[] array, since this does not require any method calls and also removes the need for boxing/unboxing conversions to/from Integer. The data can still be converted to a more suitable data structure after calculating the frequencies.
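One way to read the int[] suggestion when the symbol IDs are sparse: the sketch below maps each known ID to a dense slot through a lookup table, a simple stand-in for perfect hashing. The class and method names are hypothetical, and it assumes the IDs are non-negative and small enough that a table of size max ID + 1 is affordable.

```java
import java.util.Arrays;

final class DenseSymbolCounter {
    private final int[] slotOf;   // symbol ID -> dense slot, or -1 if unknown
    private final int[] symbols;  // dense slot -> symbol ID
    private final int[] counts;   // dense slot -> number of occurrences

    DenseSymbolCounter(int[] symbolSet) {
        int maxId = Arrays.stream(symbolSet).max().orElse(-1);
        slotOf = new int[maxId + 1];
        Arrays.fill(slotOf, -1);
        symbols = symbolSet.clone();
        counts = new int[symbols.length];
        for (int slot = 0; slot < symbols.length; slot++) {
            slotOf[symbols[slot]] = slot;
        }
    }

    // Two array reads and one increment per symbol; no boxing, no hashing.
    // Assumes every symbol in line belongs to the known set.
    void add(int[] line) {
        for (int symbol : line) {
            counts[slotOf[symbol]]++;
        }
    }

    int countOf(int symbol) {
        return counts[slotOf[symbol]];
    }
}
```

If the IDs are already small and contiguous, indexing the counts array by the ID itself is simpler and skips the extra table.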

You can convert the char directly to an int and use it as an index:
for (int i = 0; i < line.length; i++) {
    occurences[(int) line[i]]++;
}

Related

How to iterate through large string with substrings incrementing by 1 position?

I have a String with an enormous number in it (thousands of chars):
String pi = "3.14159265358979323846264338327950288419716939937..."
I want to cycle through this string, grabbing 6 chars at a time, and checking if they match a given String:
String substring = "3.1415"
However, on each subsequent substring, I want to shift 1 position to the right of the chars in the original String:
substring = ".14159"
substring = "141592"
substring = "415926"
substring = "159265"
etc. etc.
What is the best way to do this? I have considered StringBuilder's methods, but converting to a String each iteration might be costly. String's method
substring(int beginIndex, int endIndex)
seems to approach what I'm trying to do, but I don't know if those indices can be incremented algorithmically.

I don't know if those indices can be incremented algorithmically.
These are parameters. They are values provided by you for each invocation of the method.
You are free to specify anything you want based on variables, constants, expressions, user input, or anything else. In this case, you can keep one or two variables, increment them, and pass them as parameters.
Here's an example using two variables that are both incremented by 1 each iteration:
class Main {
public static void main(String[] args) {
String pi = "3.14159265358979323846264338327950288419716939937...";
for(int start=0, end=6; end <= pi.length(); start++, end++) {
String substring = pi.substring(start, end);
System.out.println(substring);
}
}
}

Here's an algorithm that's efficient at matching values. It might be more efficient than using the substring methods, since it short-circuits as soon as values don't match the provided sequence.
public static int containsSubstring(String wholeString, String findValue) {
//Break values into arrays
char[] wholeArray = wholeString.toCharArray();
char[] findArray = findValue.toCharArray();
//Use named outer loop for easy continuation to next character place
outerLoop:
for(int i = 0; i < wholeArray.length; i++) {
//Remaining values aren't large enough to contain find values so stop looking
if(i + findArray.length > wholeArray.length) {
break;
}
//Loop through next couple digits to check for matching sequence
for(int j = 0; j < findArray.length; j++) {
//Breaks loop as soon as a values don't match
if(wholeArray[i + j] != findArray[j]) {
continue outerLoop;
}
}
return i; //Or 'true' if you just care whether it's in there, and set the method return type to boolean
}
return -1; //Or 'false'
}
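The same short-circuiting comparison is available without copying either string into a char array. A sketch, assuming String.regionMatches is acceptable here; the class and method names are hypothetical.

```java
final class WindowMatcher {
    // Returns the first index at which target occurs in text, or -1 if it never does.
    static int firstMatch(String text, String target) {
        int last = text.length() - target.length();
        for (int start = 0; start <= last; start++) {
            // Compares the window in place; no substring is allocated.
            if (text.regionMatches(start, target, 0, target.length())) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String pi = "3.14159265358979";
        System.out.println(firstMatch(pi, "159265"));
    }
}
```

For a plain first-match search, text.indexOf(target) does the same job; the explicit loop is useful when every window has to be inspected.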

Or, Java 8 style:
String pi = "3.14159265358979323846264338327950288419716939937...";
IntStream.range(0, pi.length() - 5)
.mapToObj(i -> new StringBuffer(pi.substring(i, i + 6)))
.forEach(System.out::println)
;
You can also make it parallel:
String pi = "3.14159265358979323846264338327950288419716939937...";
IntStream.range(0, pi.length() - 5)
.mapToObj(i -> new StringBuffer(pi.substring(i, i + 6)))
.parallel()
.forEach(System.out::println)
;
Speaking of performance, the classic for loop is still a little bit faster than this; you should run some tests:
public class Main {
static long firstTestTime;
static long withStreamTime;
static String pi = "3.141592653589793238462643383279502884197169399375105820974944592307816";
public static void main(String[] args) {
firstTest(pi);
withStreams(pi);
System.out.println("First Test: " + firstTestTime);
System.out.println("With Streams: " + withStreamTime);
}
static void withStreams(String pi) {
System.out.println("Starting stream test");
long startTime = System.currentTimeMillis();
IntStream.range(0, pi.length() - 5)
.mapToObj(i -> new StringBuffer(pi.substring(i, i + 6)))
//.parallel()
.forEach(System.out::println)
;
withStreamTime = System.currentTimeMillis() - startTime;
}
// By @that other guy
static void firstTest(String pi) {
System.out.println("Starting first test");
long startTime = System.currentTimeMillis();
for(int start=0, end=6; end <= pi.length(); start++, end++) {
String substring = pi.substring(start, end);
System.out.println(substring);
}
firstTestTime = System.currentTimeMillis() - startTime;
}
}
Try increasing the length of the pi string!

I need to take a string and output the word that occurs most within the string

I believe that I have the correct code in order to keep going to the next word in the string, however, I am really struggling with how I am supposed to add the most used word into maxW. I also am confused about the maxCnt, will I need to create a whole separate loop just to return the maxCnt?
My professor mentioned using an if statement to compare maxW and maxCnt, but I honestly do not know where to start with implementing that.
String getMode() {
String tmp = "";
for(int i=0; i<s.length(); i++){
if(s.charAt(i)>64 && s.charAt(i)<=122 || s.charAt(i)==32){
tmp = tmp+s.charAt(i);
s = tmp;
}
}
String maxW = "";
int maxCnt = 0;
for(int i=0; i<s.length(); i++) {
int p =s.indexOf(" ",i);
int cnt = 1;
String w = s.substring(i,p);
i = p;
for (int j=p+1; j<s.length(); j++) {
int p1 = s.indexOf(" ",j);
String w1 = s.substring(j,p1);
if(w.equalsIgnoreCase(w1))
cnt++;
j = p1;
maxW = w+s.substring(j,p1);
}
}
return maxW;
}
Everything that I have tried results in a String out of range error code at:
(String.java:1967)
(Hw9.java:36)
(Hw9.java:64)
This is an example of what the result should be: If s = "You are braver than you believe, and stronger than you seem, and smarter than you think.", this method will return "You".
Thanks in advance for any help!

If you can't use maps, then perhaps you could use two parallel lists. One for words and one for count. Search each word in the String list. If you find it, increment its corresponding count list entry by 1. If you don't find it, add it to the list and set the appropriate count entry to 1.
Once you get done building your lists, then find the index of the max count and use that to index into the word list to get the word that occurred most often.
Keep in mind that for some data sets (sentences) there could be a multi-way tie.
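The parallel-list approach described above could be sketched as follows. The class and method names are hypothetical, words are compared case-insensitively as in the question, and a tie is resolved in favor of the word seen first, which is only one possible policy.

```java
import java.util.ArrayList;
import java.util.List;

final class ParallelListMode {
    // Returns the most frequent word, ignoring case; ties go to the word seen first.
    static String mostFrequentWord(String sentence) {
        List<String> words = new ArrayList<>();   // distinct words, in first-seen order
        List<Integer> counts = new ArrayList<>(); // counts.get(i) belongs to words.get(i)

        // Split on anything that is not a letter, so punctuation is dropped.
        for (String token : sentence.split("[^A-Za-z]+")) {
            if (token.isEmpty()) {
                continue;
            }
            int index = indexOfIgnoreCase(words, token);
            if (index < 0) {
                words.add(token);
                counts.add(1);
            } else {
                counts.set(index, counts.get(index) + 1);
            }
        }

        // Find the index of the largest count, then read the word at that index.
        int best = -1;
        for (int i = 0; i < counts.size(); i++) {
            if (best < 0 || counts.get(i) > counts.get(best)) {
                best = i;
            }
        }
        return best < 0 ? "" : words.get(best);
    }

    private static int indexOfIgnoreCase(List<String> words, String token) {
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).equalsIgnoreCase(token)) {
                return i;
            }
        }
        return -1;
    }
}
```

The linear search makes this quadratic in the number of distinct words, which is fine for a sentence but not for large inputs.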

I ran your code with your example sentence.
Using the string you supplied, s has a single character, Y. The reason for that is the loop only executes once.
As soon as you set s = tmp inside the loop, the length of s is now 1, so the loop immediately exits after one iteration.
I'd recommend doing this piece by piece. Break the problem down into chunks, and tackle those one-by-one. Use a debugger or, if you're not comfortable with that yet, make liberal use of System.out.println().

Here are some of the ways of counting words mentioned in the comments, in code form, since it's sometimes helpful to see it that way:
//The map easy way:
Map<String, Integer> counts = new HashMap<String, Integer>();
if (!counts.containsKey("word")) {
counts.put("word", 0);
}
counts.put("word", counts.get("word") + 1);
//The double array way (dirty):
int wordsAddedCount = 0;
boolean wordFound = false;
String[] wordsList = new String[500];//assumes max of 500 different words
Integer[] counts2 = new Integer[500];
for (int i = 0; i < wordsAddedCount; i++) {
if ("word".equals(wordsList[i])) {
wordFound = true;
counts2[i]++;
break;
}
}
if (!wordFound) {
wordsList[wordsAddedCount] = "word";
counts2[wordsAddedCount++] = 1;
}

How can I find the most frequent word in a huge amount of words (eg. 900000)

I am facing a task that requires generating 900000 random words and then printing out the most frequent one. Here is my algorithm:
1. Move all the words into a collection rather than printing them out.
2. for (900000...) { store the frequency of Collection[i] in another collection B }
** 900,000 × 900,000 comparisons is too much for a computer (too inefficient)
3. Find the biggest number in collection B and its index.
4. Then Collection[index] is the output.
But my computer cannot handle the second step. I searched this website and found some answers about finding the frequency of a word in a bunch of words, and I read the answer code, but I haven't found a way to apply it to such a huge number of words.
Here is my code:
/** Funny Words Generator
* Tony
*/
import java.util.*;
public class WordsGenerator {
//data field (can be accessed in whole class):
private static int xC; // define a xCurrent so we can access it all over the class
private static int n;
private static String[] consonants = {"b","c","d","f","g","h","j","k","l","m","n","p","r","s","t","v","w","x","z"};
private static String[] vowels = {"a", "e", "i", "o", "u"};
private static String funnyWords = "";
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int times = 900000; // words number
xC = sc.nextInt(); // seeds (only input)
/* Funny word list */
ArrayList<String> wordsList = new ArrayList<String>();
ArrayList<Integer> frequencies = new ArrayList<Integer>();
int maxFreq;
for (int i = 0; i < times; i++) {
n = 6; // each words are 6 characters long
funnyWords = ""; // reset the funnyWords each new time
for (int d = 0; d < n; d ++) {
int letterNum = randomGenerator(); /* random generator will generate numbers based on current x */
int letterIndex = 0; /* letterNum % 19 or % 5 based on condition */
if ((d + 1) % 2 == 0) {
letterIndex = letterNum % 5;
funnyWords += vowels[letterIndex];
}
else if ((d + 1) % 2 != 0) {
letterIndex = letterNum % 19;
funnyWords += consonants[letterIndex];
}
}
wordsList.add(funnyWords);
}
/* put all frequencies of each words into an array called frequencies */
for (int i = 0; i < 900000; i++) {
frequencies.add(Collections.frequency(wordsList, wordsList.get(i)));
}
maxFreq = Collections.max(frequencies);
int index = frequencies.indexOf(maxFreq); // get the index of the most frequent word
System.out.print(wordsList.get(index));
sc.close();
}
/** randomGenerator
* param: N(generate times), seeds
* return: update the xC and return it */
private static int randomGenerator() {
int a = 445;
int c = 700001;
int m = 2097152;
xC = (a * xC + c) % m; // update
return xC; // return
}
}
So I have realized that maybe there is a way to skip the second step somehow. Can anyone give me a hint? Just a hint, not code, so I can try it myself, would be great! Thanks!
Modified:
I see that a lot of your answer code contains "words.stream()". I googled it and couldn't find it. Could you please tell me where I can learn about this? Which class is this stream method in? Thank you!

You can do it using Java Lambdas (requires JDK 8). Also notice that you can have words with equal frequency in your word list.
public class Main {
public static void main(String[] args) {
List<String> words = new ArrayList<>();
words.add("World");
words.add("Hello");
words.add("World");
words.add("Hello");
// Imagine we have 900000 words in the word list
Set<Map.Entry<String, Integer>> set = words.stream()
// Here we create a map of unique words and calculate their frequencies
.collect(Collectors.toMap(word -> word, word -> 1, Integer::sum)).entrySet();
// Find the max frequency
int max = Collections
.max(set, (a, b) -> Integer.compare(a.getValue(), b.getValue())).getValue();
// We can have words with the same frequency like in my words list. Let's get them all
List<String> list = set.stream()
.filter(entry -> entry.getValue() == max)
.map(Map.Entry::getKey).collect(Collectors.toList());
System.out.println(list); // [Hello, World]
}
}

This can basically be broken down into two steps:
1. Compute the word frequencies as a Map<String, Long>. There are several options for this; see this question for examples.
2. Compute the maximum entry of this map, where "maximum" refers to the entry with the highest value.
So if you're really up to it, you can write this very compactly:
private static <T> T maxCountElement(List<? extends T> list)
{
return Collections.max(list.stream().collect(Collectors.groupingBy(
Function.identity(), Collectors.counting())).entrySet(),
(e0, e1) -> Long.compare(e0.getValue(), e1.getValue())).getKey();
}
Edited in response to the comment:
The compact representation may not be the most readable. Breaking it down makes the code a bit elaborate, but may make clearer what is happening there:
private static <T> T maxCountElement(List<? extends T> list)
{
// A collector that receives the input elements, and converts them
// into a map. The key of the map is the input element. The value
// of the map is the number of occurrences of the element
Collector<T, ?, Map<T, Long>> collector =
Collectors.groupingBy(Function.identity(), Collectors.counting());
// Create the map and obtain its set of entries
Map<T, Long> map = list.stream().collect(collector);
Set<Entry<T, Long>> entrySet = map.entrySet();
// A comparator that compares two map entries based on their value
Comparator<Entry<T, Long>> comparator =
(e0, e1) -> Long.compare(e0.getValue(), e1.getValue());
// Compute the maximum element of the set of entries. That is,
// the entry with the largest value (which is the entry for the
// element with the maximum number of occurrences)
Entry<T, Long> entryWithMaxValue =
Collections.max(entrySet, comparator);
return entryWithMaxValue.getKey();
}

HashMap is one of the fastest data structures for this. Loop through the words, use each word as a key in the HashMap, and keep its counter as the value.
HashMap<String, Integer> hashMapVariable = new HashMap<>();
...
// inside the loop over the words
if (hashMapVariable.containsKey(word)) {
    hashMapVariable.put(word, hashMapVariable.get(word) + 1);
} else {
    hashMapVariable.put(word, 1);
}
...
For each word, increment the value associated with its key. You have to check first whether the key exists (in Java, hashMapVariable.containsKey(word)): if it exists, increment the value; otherwise add it to the HashMap with a count of 1. This way you are not storing the whole data set again; each distinct word appears once as a key, with the number of times it occurs as its value.
At the end of the loop, the most frequent word will have the highest counter value.

You can use a HashMap in which the key stores the word and the value is the number of times it occurs.
Pseudocode below:
String demo(String[] strs) {
    int maxFrequency = 0;
    String maxFrequencyStr = "";
    Map<String, Integer> map = new HashMap<String, Integer>();
    for (int i = 0; i < strs.length; i++) {
        if (map.containsKey(strs[i])) {
            // word seen before: bump its count and update the running maximum
            int times = map.get(strs[i]);
            map.put(strs[i], times + 1);
            if (maxFrequency < times + 1) {
                maxFrequency = times + 1;
                maxFrequencyStr = strs[i];
            }
        } else {
            // first occurrence of this word
            map.put(strs[i], 1);
            if (maxFrequency < 1) {
                maxFrequency = 1;
                maxFrequencyStr = strs[i];
            }
        }
    }
    return maxFrequencyStr;
}
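The same single-pass idea can be written more compactly with Map.merge (available since Java 8). This is a sketch, not the answer author's code; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class MostFrequentWord {
    // Single pass: update the count and the running maximum together.
    static String mostFrequent(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String word : words) {
            // merge stores 1 for a new key, otherwise adds 1 to the existing value,
            // and returns the updated count.
            int count = counts.merge(word, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = word;
            }
        }
        return best; // null for an empty input
    }
}
```

Tracking the maximum inside the loop works here because counts only ever grow, so the largest count seen so far is always the current maximum.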

Find number of repetitions of characters in a given Word

So I was developing an algorithm to count the number of repetitions of each character in a given word. I am using a HashMap and I add each unique character to the HashMap as the key and the value is the number of repetitions. I would like to know what the run time of my solution is and if there is a more efficient way to solve the problem.
Here is the code :
public static void getCount(String name){
HashMap<String, Integer> names = new HashMap<String, Integer>();
for(int i =0; i<name.length(); i++){
if(names.containsKey(name.substring(i, i+1))){
names.put(name.substring(i, i+1), names.get(name.substring(i, i+1)) +1);
}
else{
names.put(name.substring(i, i+1), 1);
}
}
Set<String> a = names.keySet();
Iterator i = a.iterator();
while(i.hasNext()){
String t = (String) i.next();
System.out.println(t + " Ocurred " + names.get(t) + " times");
}
}

The algorithm has a time complexity of O(n), but I'd change some parts of your implementation, namely:
Using a single get() instead of containsKey() + get();
Using charAt() instead of substring() which will create a new String object;
Using a Map<Character, Integer> instead of Map<String, Integer> since you only care about a single character, not the entire String:
In other words:
public static void getCount(String name) {
Map<Character, Integer> names = new HashMap<Character, Integer>();
for(int i = 0; i < name.length(); i++) {
char c = name.charAt(i);
Integer count = names.get(c);
if (count == null) {
count = 0;
}
names.put(c, count + 1);
}
Set<Character> a = names.keySet();
for (Character t : a) {
System.out.println(t + " Ocurred " + names.get(t) + " times");
}
}

Your solution is O(n) from an algorithmic perspective, which is already optimal (at a minimum you have to inspect each character in the entire string at least once which is O(n)).
However, there are a couple of ways you could speed it up by reducing the constant overhead, e.g.:
Use a HashMap<Character,Integer>. Characters will be much more efficient than Strings of length 1.
Use charAt(i) instead of substring(i,i+1). This avoids creating a new String, which will help you a lot. It is probably the biggest single improvement you can make.
If the string is going to be long (e.g. thousands of characters or more), consider using an int[] array to count the individual characters rather than a HashMap, with the character's ASCII value used as an index into the array. This isn't a good idea if your Strings are short though.
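The int[] variant from the last point could look like the sketch below. It assumes plain ASCII input (character values below 128) and rejects anything else; the class name is hypothetical.

```java
final class AsciiCharCount {
    // Counts characters in an array indexed by the character's value.
    // Assumes every character is below 128 (plain ASCII).
    static int[] count(String text) {
        int[] counts = new int[128];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= counts.length) {
                throw new IllegalArgumentException("Non-ASCII character: " + c);
            }
            counts[c]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("good");
        for (char c = 0; c < counts.length; c++) {
            if (counts[c] > 0) {
                System.out.println(c + " occurred " + counts[c] + " times");
            }
        }
    }
}
```

For short strings the fixed cost of allocating and scanning the 128 slots can outweigh the savings, which matches the caveat above.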

Store the initial time to a variable, like so:
long start = System.currentTimeMillis();
then at the end, when you finish, print out the current time minus the start time:
System.out.println((System.currentTimeMillis() - start) + "ms taken");
to see the time taken. As far as I can tell, yours is the most efficient approach, but there may be another good method. Also, use char rather than String for each individual character (char/Character is the right type for single characters, String for a series of chars): call name.charAt(i) rather than name.substring(i, i+1), and change your map to HashMap<Character, Integer>.

String s="good";
//collect different unique characters
ArrayList<String> temp=new ArrayList<>();
for (int i = 0; i < s.length(); i++) {
char c=s.charAt(i);
if(!temp.contains(""+c))
{
temp.add(""+s.charAt(i));
}
}
System.out.println(temp);
//get count of each occurrence in the string
for (int i = 0; i < temp.size(); i++) {
int count=0;
for (int j = 0; j < s.length(); j++) {
if(temp.get(i).equals(s.charAt(j)+"")){
count++;
}
}
System.out.println("Occurance of "+ temp.get(i) + " is "+ count+ " times" );
}

Efficient way to find Frequency of a character in a String in java : O(n)

In a recent interview I was asked to write the below program.
Find the character whose frequency is lowest in the given String.
So I tried iterating through the string using charAt and storing each character as a key in a HashMap, with the number of occurrences as its value.
Now I have to iterate over the Map again to find the lowest element.
Is there a more efficient way to do it? The approach above seems too intensive, I guess.
Update and Another Solution
After some thought and reading the answers, I think the best possible time for this is O(n).
In the first iteration we go through the String character by character and store each character's frequency in an array at the position given by the character (a char is an int), while keeping two temporary variables that hold the lowest count and the corresponding character. So when I move to the next character, I update its frequency with arr[char] = arr[char] + 1; and at the same time check whether the temporary variable holds a value greater than this one; if so, the temporary variable takes this value and the character becomes this one. That way, I suppose, we don't need a second iteration to find the smallest, and no sorting is required either.
What do you say? Any other solutions?

I'd use an array rather than a hash map. If we're limited to ASCII, that's just 256 entries; if we're using Unicode, 64k. Either way, not an impossible size. Besides that, I don't see how you could improve on your approach. I'm trying to think of some clever trick to make it more efficient, but I can't come up with any.
Seems to me the answer is almost always going to be a whole list of characters: all of those that are used zero times.
Update
This is probably close to the most efficient it could be in Java. For convenience, I'm assuming we're using plain ASCII.
public List<Character> rarest(String s)
{
int[] freq=new int[256];
for (int p=s.length()-1;p>=0;--p)
{
char c=s.charAt(p);
if (c>255)
throw new IllegalArgumentException("Wasn't expecting that");
++freq[c];
}
int min=Integer.MAX_VALUE;
for (int x=freq.length-1;x>=0;--x)
{
// I'm assuming we don't want chars with frequency of zero
if (freq[x]>0 && min>freq[x])
min=freq[x];
}
List<Character> rares=new ArrayList<Character>();
for (int x=freq.length-1;x>=0;--x)
{
if (freq[x]==min)
rares.add((char)x);
}
return rares;
}
Any effort to keep the list sorted by frequency as you go is going to be way more inefficient, because it will have to re-sort every time you examine one character.
Any attempt to sort the list of frequencies at all is going to be more inefficient, as sorting the whole list is clearly going to be slower than just picking the smallest value.
Sorting the string and then counting is going to be slower because the sort will be more expensive than the count.
Technically, it would be faster to create a simple array at the end rather than an ArrayList, but the ArrayList makes slightly more readable code.
There may be a way to do it faster, but I suspect this is close to the optimum solution. I'd certainly be interested to see if someone has a better idea.

I think your approach is in theory the most efficient (O(n)). However in practice it needs quite a lot of memory, and is probably very slow.
It is possibly more efficient (at least it uses less memory) to convert the string to a char array, sort the array, and then calculate the frequencies using a simple loop. However, in theory it is less efficient (O(n log n)) because of sorting (unless you use a more efficient sort algorithm).
Test case:
import java.util.Arrays;
public class Test {
public static void main(String... args) throws Exception {
// System.out.println(getLowFrequencyChar("x"));
// System.out.println(getLowFrequencyChar("bab"));
// System.out.println(getLowFrequencyChar("babaa"));
for (int i = 0; i < 5; i++) {
long start = System.currentTimeMillis();
for (int j = 0; j < 1000000; j++) {
getLowFrequencyChar("long start = System.currentTimeMillis();");
}
System.out.println(System.currentTimeMillis() - start);
}
}
private static char getLowFrequencyChar(String string) {
int len = string.length();
if (len == 0) {
return 0;
} else if (len == 1) {
return string.charAt(0);
}
char[] chars = string.toCharArray();
Arrays.sort(chars);
int low = Integer.MAX_VALUE, f = 1;
char last = chars[0], x = 0;
for (int i = 1; i < len; i++) {
char c = chars[i];
if (c != last) {
if (f < low) {
if (f == 1) {
return last;
}
low = f;
x = last;
}
last = c;
f = 1;
} else {
f++;
}
}
if (f < low) {
x = last;
}
return (char) x;
}
}

Finding the frequency of characters in a String is very easy.
See my code:
import java.io.*;
public class frequency_of_char
{
public static void main(String args[])throws IOException
{
BufferedReader in=new BufferedReader(new InputStreamReader(System.in));
int ci,i,j,k,l;l=0;
String str,str1;
char c,ch;
System.out.println("Enter your String");
str=in.readLine();
i=str.length();
for(c='A';c<='z';c++)
{
k=0;
for(j=0;j<i;j++)
{
ch=str.charAt(j);
if(ch==c)
k++;
}
if(k>0)
System.out.println("The character "+c+" has occured for "+k+" times");
}
}
}

I'd do it the following way, as it involves the fewest lines of code.
Character whose frequency you want to know: "_"
String: "this_is_a_test"
String testStr = "this_is_a_test";
String[] parts = testStr.split("_"); //note you need to use regular expressions here
int freq = parts.length -1;
You may find weird things happen if the string starts or ends with the character in question, but I'll leave it to you to test for that.
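To make the edge case concrete, here is a sketch that counts with a plain loop and compares it against the split-based count for a string that starts and ends with the character. The class name and the test string are made up for illustration.

```java
final class CharFrequency {
    // Counts occurrences of target with a plain loop; no regex, no array of parts.
    static int count(String text, char target) {
        int freq = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                freq++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        String s = "_this_is_a_test_";
        System.out.println(count(s, '_'));                // loop count
        System.out.println(s.split("_").length - 1);      // trailing empty part is dropped
        System.out.println(s.split("_", -1).length - 1);  // limit -1 keeps trailing empties
    }
}
```

With the default split, trailing empty strings are removed from the result, so the count comes out one short when the string ends with the character; passing a limit of -1 keeps them.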

Having to iterate through the HashMap is not necessarily bad. That will only be O(h), where h is the HashMap's size (the number of unique characters), which in this case will always be less than or equal to n. For the example "aaabbc", h = 3 for the three unique characters. But since h is bounded by the number of possible characters (255), it is constant. So your big-O will be O(n+h), which is actually O(n) since h is constant. I don't know of any algorithm with a better big-O. You could try a bunch of Java-specific optimizations, but that said, here is a simple algorithm I wrote that finds the char with the lowest frequency. It returns "c" for the input "aaabbc".
import java.util.HashMap;
import java.util.Map;
public class StackOverflowQuestion {
public static void main(String[] args) {
// TODO Auto-generated method stub
System.out.println("" + findLowestFrequency("aaabbc"));
}
public static char findLowestFrequency(String input) {
Map<Character, Integer> map = new HashMap<Character, Integer>();
for (char c : input.toCharArray())
if (map.containsKey(c))
map.put(c, map.get(c) + 1);
else
map.put(c, 1);
char rarest = map.keySet().iterator().next();
for (char c : map.keySet())
if (map.get(c) < map.get(rarest))
rarest = c;
return rarest;
}
}
