Sort Characters By Frequency Java (Optimal Solution)

I'm trying to solve this question using Java. The goal is to sort a string in decreasing order based on the frequency of its characters. For example, "Aabb" should become "bbaA" or "bbAa". I have implemented a working solution, but it's O(n^2). I was wondering if someone out there has a better, more efficient solution.
Here is the code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution
{
    public String frequencySort(String s)
    {
        // Count how often each character occurs.
        Map<Character,Integer> map = new HashMap<Character,Integer>();
        for(int i=0;i<s.length();i++)
        {
            if(map.containsKey(s.charAt(i)))
                map.put(s.charAt(i),map.get(s.charAt(i))+1);
            else
                map.put(s.charAt(i),1);
        }
        // Sort the (character, count) entries by descending count.
        List<Map.Entry<Character,Integer>> sortedlist = new ArrayList<>(map.entrySet());
        Collections.sort(sortedlist, new Comparator<Map.Entry<Character,Integer>>() {
            @Override
            public int compare(Map.Entry<Character, Integer> o1,
                    Map.Entry<Character, Integer> o2) {
                return o2.getValue() - o1.getValue();
            }
        });
        // Rebuild the string; each += copies the whole string built so far.
        String lastString="";
        for (Map.Entry<Character,Integer> e : sortedlist)
        {
            for(int j=0;j < e.getValue();j++)
                lastString+= e.getKey().toString();
        }
        return lastString;
    }
}

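Your algorithm is actually O(n), apart from the string concatenation (thanks @andreas, twice!):
Building the map of counts is O(n), where n is the length of the input
Sorting the list of entries is O(m log m), where m is the number of unique characters in the input
Rebuilding the sorted string is O(n) with a StringBuilder. With String += as written, every append copies the whole string built so far, which makes this step O(n^2) in the worst case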
Although sorting may appear to be the slowest step by order of magnitude, it is most likely not the dominant operation when the input is very large. "Most likely", because m is bounded by the size of the alphabet, which is normally expected to be much smaller than the size of a very large input. Hence the overall time complexity of O(n).
Some minor optimizations are possible. Apart from replacing the string concatenation, they won't change the order of complexity:
You can first get a character array from the input string. It uses more memory, but you will save the boundary checks of .charAt, and the array can be useful at a later step (see below).
If you know the size of the alphabet, then you can use an int[] instead of a hash map.
Instead of rebuilding the sorted string manually and with string concatenation, you could write into the character array and return new String(chars).
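The three optimizations above can be combined roughly as follows. This is a sketch, not a definitive implementation. It assumes the input contains only 7-bit ASCII characters: the alphabet size of 128 is an assumption, so widen the array for other inputs. The class name ArrayFrequencySort is only illustrative. Counting is O(n), sorting the distinct characters is O(m log m), and filling the character array is O(n).

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayFrequencySort {
    // Assumption: every character is 7-bit ASCII (code < 128); widen for larger alphabets.
    private static final int ALPHABET = 128;

    public static String frequencySort(String s) {
        char[] chars = s.toCharArray();   // one copy; later reads/writes avoid charAt
        int[] counts = new int[ALPHABET]; // replaces the HashMap<Character, Integer>
        for (char c : chars) {
            counts[c]++;
        }

        // Collect the distinct characters and sort them by descending count: O(m log m).
        List<Character> distinct = new ArrayList<>();
        for (char c = 0; c < ALPHABET; c++) {
            if (counts[c] > 0) {
                distinct.add(c);
            }
        }
        distinct.sort((a, b) -> Integer.compare(counts[b], counts[a]));

        // Overwrite the character array in sorted order instead of concatenating Strings: O(n).
        int pos = 0;
        for (char c : distinct) {
            for (int k = 0; k < counts[c]; k++) {
                chars[pos++] = c;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(frequencySort("Aabb"));
    }
}
```

List.sort is stable, and the distinct characters are collected in code-point order. Ties therefore come out in code-point order, so "Aabb" gives "bbAa".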

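Your code probably doesn't pass because of the string concatenation: each += copies the whole string. Use a StringBuilder instead and I bet it will pass.
StringBuilder builder = new StringBuilder(); // replaces String lastString = ""
for (Map.Entry<Character,Integer> e : sortedlist)
    for (int j = 0; j < e.getValue(); j++)
        builder.append(e.getKey()); // amortized O(1), no copy of the whole string
return builder.toString();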
There are a couple of other ideas for sorting elements by frequency.
Use a sorting algorithm to sort the elements: O(n log n).
Scan the sorted array and construct a 2D array of (element, count) pairs: O(n).
Sort the 2D array by count: O(n log n).
Input: 2 5 2 8 5 6 8 8
After sorting we get:
2 2 5 5 6 8 8 8
Now construct the 2D array as:
2, 2
5, 2
6, 1
8, 3
Sort by count:
8, 3
2, 2
5, 2
6, 1
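Here is a sketch of the sort-then-count approach in Java, applied to the example input above. The class and method names are hypothetical. Each int[2] pair stands in for one row of the "2D array". This is one possible implementation, not the one from the linked article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortThenCount {
    public static int[][] countsByFrequency(int[] input) {
        int[] sorted = input.clone();
        Arrays.sort(sorted); // step 1: O(n log n); equal elements become adjacent

        // Step 2: one scan builds (element, count) pairs: O(n).
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.length; ) {
            int j = i;
            while (j < sorted.length && sorted[j] == sorted[i]) {
                j++;
            }
            pairs.add(new int[] {sorted[i], j - i});
            i = j;
        }

        // Step 3: sort by descending count; the stable sort keeps ties in ascending element order.
        pairs.sort((a, b) -> Integer.compare(b[1], a[1]));
        return pairs.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        for (int[] p : countsByFrequency(new int[] {2, 5, 2, 8, 5, 6, 8, 8})) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

For the example input, this prints the same "sort by count" table as above: 8, 3 / 2, 2 / 5, 2 / 6, 1.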
Follow the link to take a look at other possible approaches.

If you have to display the characters according to their frequency, you can use a dictionary in Python.
I am solving this using Python:
# sort characters by frequency
def fix(s):
    d = {}
    res = ""
    for ch in s:  # iterate over the parameter s (the original read the global a)
        d[ch] = d.get(ch, 0) + 1
    # use a key lambda like this whenever you need to sort by the values
    for val in sorted(d.items(), reverse=True, key=lambda item: item[1]):
        res = res + val[0] * val[1]
    return res

a = "GiniGinaProtijayi"
print(fix(a))
Method 2: using collections.Counter().most_common()
# sort characters by frequency
import collections

a = "GiniGinaProtijayi"
def sortByFrequency(a):
    aa = [ch * count for ch, count in collections.Counter(a).most_common()]
    print(aa)
    print(''.join(aa))
sortByFrequency(a)

Related

counting number of occurrences of words in a text java

So I'm building a TreeMap from scratch, and I'm trying to count the number of occurrences of every word in a text using Java. The text is read from a text file, which I can already do. I really don't know how to count every word. Can someone help?
Imagine the text is something like:
Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.
Output:
Over 1
time 1
computer 1
algorithms 5
...
If possible, I want to ignore case, so that uppercase and lowercase forms of a word are counted together.
EDIT: I don't want to use any sort of Map (e.g. HashMap) or anything similar to do this.
Break down the problem as follows (this is one potential solution, not THE solution):
Split the text into words (create a list or array of words).
Remove punctuation marks.
Create your map to collect results.
Iterate over your list of words and add 1 to the value of each encountered key.
Display the results (iterate over the map's entrySet()).
Split the text into words
My preference is to split words using a space as the delimiter. The reason is that if you split on non-word characters, you may miss some hyphenated words. Although hyphenation is used less and less, there are still plenty of words that fall under this rule, for example "middle-aged". If you encounter a word like this, you MIGHT have to treat it as one word and not two.
Remove punctuation marks
Because of the decision above, you will first need to remove any punctuation characters that might be attached to your words. Keep in mind that if you use a regular expression to split the words, you might be able to accomplish this step at the same time as the previous one. In fact, that would be preferable, because then you don't have to iterate over the text twice: do both in a single pass. While you are at it, call toLowerCase() on the input string to eliminate the ambiguity between capitalized and lowercase words.
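Here is a sketch of doing the split, the punctuation removal, and the lowercasing in one regular-expression pass. It assumes a word is a run of letters, digits, apostrophes, or hyphens. That definition is my assumption, chosen so that "middle-aged" survives. The replaceAll("\\p{P}", "") call in the solution further down behaves differently: it also strips hyphens, because \p{P} includes dash punctuation, and it turns "other's" into "others". The version here keeps both characters, so "other's" is counted as its own word. The class name is hypothetical.

```java
import java.util.Arrays;

public class WordSplitter {
    // Assumption: a "word" is a run of letters, digits, apostrophes, or hyphens, so
    // "middle-aged" and "other's" stay whole; every other character acts as a delimiter.
    public static String[] splitWords(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{L}\\p{N}'-]+"))
                .filter(w -> !w.isEmpty()) // split() yields a leading "" if text starts with a delimiter
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String sample = "Over time, middle-aged engineers reuse each other's work.";
        System.out.println(Arrays.toString(splitWords(sample)));
    }
}
```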
Create your map to collect results
This is where you are going to collect your counts. Use the TreeMap implementation of Java's Map. Be aware that this particular implementation keeps the map sorted according to the natural ordering of its keys. In this case, since the keys are the words from the input text, the keys will be arranged in alphabetical order, not by the magnitude of the count. IF sorting the entries by count is important, there is a technique where you "reverse" the map, making the values the keys and the keys the values. However, since two or more words could have the same count, you will need to create a new map of type Map<Integer, Set<String>>, so that you can group together words with the same count.
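Here is a minimal sketch of the "reverse the map" technique. It assumes the word counts are already in a Map<String, Integer>, and the class name is hypothetical. The outer TreeMap is built with Collections.reverseOrder() so the highest count comes first, and each count maps to a TreeSet of the words that share it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ReverseCountMap {
    // "Reverses" word -> count into count -> words, highest count first.
    // Words sharing a count are grouped in a TreeSet, so they stay alphabetical.
    public static TreeMap<Integer, Set<String>> byCount(Map<String, Integer> wordCount) {
        TreeMap<Integer, Set<String>> reversed = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : wordCount.entrySet()) {
            reversed.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return reversed;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("algorithms", 5);
        counts.put("for", 2);
        counts.put("other", 2);
        counts.put("time", 1);
        System.out.println(byCount(counts)); // {5=[algorithms], 2=[for, other], 1=[time]}
    }
}
```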
Iterate over your list of words
At this point, you should have a list of words and a map structure to collect the counts. Using a lambda expression, you should be able to count your words very easily. But if you are not familiar or comfortable with lambda expressions, you can use a regular loop to iterate over your list. Do a containsKey() check to see whether the word was encountered before. If the map already contains the word, get() its value and add 1 to it. Lastly, put() the new count in the map.
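For comparison, here is a lambda/stream version of the counting step. It is a sketch that assumes the words have already been split and lowercased, and the class and method names are illustrative. Using Collectors.groupingBy with a TreeMap::new map factory keeps the keys alphabetical, as in the loop-based solution. Note that Collectors.counting() produces Long values rather than Integer.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamWordCount {
    // Groups identical words and counts them; TreeMap::new keeps the keys in alphabetical order.
    public static Map<String, Long> count(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] words = {"results", "of", "other", "algorithms", "other", "algorithms"};
        System.out.println(count(words)); // {algorithms=2, of=1, other=2, results=1}
    }
}
```

Inside a plain loop, wordCount.merge(word, 1, Integer::sum) collapses the containsKey()/get()/put() sequence into a single call.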
Display results
Again, you can use a Lambda Expression to print out the EntrySet key value pairs or simply iterate over the entry set to display the results.
Based on all of the above points, a potential solution could look like this (not using lambdas, for the OP's sake):
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

public class WordCount {
    public static void main(String[] args) {
        String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
        text = text.replaceAll("\\p{P}", ""); // remove all punctuation
        text = text.toLowerCase(); // turn all words into lowercase
        String[] wordArr = text.split(" "); // create list of words
        Map<String, Integer> wordCount = new TreeMap<>();
        // Collect the word count
        for (String word : wordArr) {
            if (!wordCount.containsKey(word)) {
                wordCount.put(word, 1);
            } else {
                int count = wordCount.get(word);
                wordCount.put(word, count + 1);
            }
        }
        Iterator<Entry<String, Integer>> iter = wordCount.entrySet().iterator();
        System.out.println("Output: ");
        while (iter.hasNext()) {
            Entry<String, Integer> entry = iter.next();
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
This produces the following output
Output:
advantage: 1
algorithms: 5
and: 1
combine: 1
computer: 1
each: 1
engineers: 1
even: 1
for: 2
in: 1
invent: 1
more: 1
new: 1
of: 2
other: 2
others: 1
over: 1
producing: 1
results: 2
take: 1
the: 1
things: 1
time: 1
to: 1
turn: 1
utilize: 1
with: 1
work: 1
Why did I break down the problem like this for such a mundane task? Simple: I believe each of those discrete steps should be extracted into a function to improve code reusability. Yes, it is cool to use a lambda expression to do everything at once and make your code look much simpler. But what if you need to do some intermediate step over and over? Most of the time, code gets duplicated to accomplish this. In reality, a better solution is often to break these tasks into methods. Some of these tasks, like transforming the input text, can be done in a single method, since those activities are related in nature. (There is such a thing as a method doing "too little.")
public String[] createWordList(String text) {
    // strip punctuation, lowercase, and split on spaces in one chain
    return text.replaceAll("\\p{P}", "").toLowerCase().split(" ");
}
public Map<String, Integer> createWordCountMap(String[] wordArr) {
    Map<String, Integer> wordCountMap = new TreeMap<>();
    for (String word : wordArr) {
        if (!wordCountMap.containsKey(word)) {
            wordCountMap.put(word, 1);
        } else {
            int count = wordCountMap.get(word);
            wordCountMap.put(word, count + 1);
        }
    }
    return wordCountMap;
}
public void displayCount(Map<String, Integer> wordCountMap) {
    Iterator<Entry<String, Integer>> iter = wordCountMap.entrySet().iterator();
    while (iter.hasNext()) {
        Entry<String, Integer> entry = iter.next();
        System.out.println(entry.getKey() + ": " + entry.getValue());
    }
}
Now, after doing that, your main method looks more readable and your code is more reusable.
public static void main(String[] args) {
WordCount wc = new WordCount();
String text = "...";
String[] wordArr = wc.createWordList(text);
Map<String, Integer> wordCountMap = wc.createWordCountMap(wordArr);
wc.displayCount(wordCountMap);
}
UPDATE:
One small correction: switching from a TreeMap to a HashMap does not make the output come out sorted by count. A HashMap hashes the keys (here, the words), not the values, and its iteration order is unspecified. Any ordering by count that you happen to see is coincidental and can change with the map's capacity or between Java versions. If you need the entries in descending order of count, you still have to sort them explicitly by value, or "reverse" the map as described above.
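To list the words by descending count, regardless of which Map implementation holds them, you can sort the entry set by value. This is a sketch. The class name and the alphabetical tie-break are my choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByCount {
    // Returns "word: count" lines, highest count first; ties fall back to alphabetical order.
    public static List<String> descending(Map<String, Integer> wordCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + ": " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("turn", 1);
        counts.put("other", 2);
        counts.put("algorithms", 5);
        counts.put("for", 2);
        descending(counts).forEach(System.out::println);
    }
}
```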
My suggestion is to use a regular expression with split() and a stream with grouping (example 3).
EX1: this solution uses no collection (List/Map), only an array. In my opinion it is not optimal, since the nested loop makes it O(n^2).
@Test
public void testApp2() {
    final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
    final String lowerText = text.toLowerCase();
    final String[] split = lowerText.split("\\W+");
    System.out.println("Output: ");
    for (String s : split) {
        if (s == null) {
            continue; // already counted together with an earlier occurrence
        }
        int count = 0;
        for (int i = 0; i < split.length; i++) {
            final boolean sameWord = s.equals(split[i]);
            if (sameWord) {
                count = count + 1;
                split[i] = null; // mark as counted
            }
        }
        System.out.println(s + " " + count);
    }
}
EX2: I think this is what you meant, though I'm not sure whether using a List is overkill.
@Test
public void testApp() {
    final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
    final String[] split = text.split("\\W+");
    final List<String> list = new ArrayList<>(); // words already printed, stored in uppercase
    System.out.println("Output: ");
    for (String s : split) {
        if (!list.contains(s.toUpperCase())) { // compare in the same case as the stored entries
            list.add(s.toUpperCase());
            final long count = Arrays.stream(split).filter(s::equalsIgnoreCase).count();
            System.out.println(s + " " + count);
        }
    }
}
EX3: below is a test for your example, but using a Map:
@Test
public void test() {
    final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
    Map<String, Long> result = Arrays.stream(text.split("\\W+"))
            .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
    assertEquals(Long.valueOf(5), result.get("algorithms")); // expected value first
    System.out.println("Output: ");
    result.entrySet().stream().forEach(x -> System.out.println(x.getKey() + " " + x.getValue()));
}

Generate all Palindromic numbers in a given number system?

I need to generate all palindromic numbers in a given range for a given number base (the base can be as large as 10,000). I need an efficient way to do it.
I stumbled upon this answer, which is related to base 10 directly. I'm trying to adapt it to work for "all" bases:
public static Set<String> allPalindromic(long limit, int base, char[] list) {
    Set<String> result = new HashSet<String>();
    for (long i = 0; i <= base-1 && i <= limit; i++) {
        result.add(convert(i, base, list));
    }
    boolean cont = true;
    for (long i = 1; cont; i++) {
        StringBuffer rev = new StringBuffer("" + convert(i, base, list)).reverse();
        cont = false;
        for (char d : list) {
            String n = "" + convert(i, base, list) + d + rev;
            if (convertBack(n, base, list) <= limit) {
                cont = true;
                result.add(n);
            }
        }
    }
    return result;
}
convert() method converts a number to a string representation of that number in a given base using a list of chars for digits.
convertBack() converts back the string representation of a number to base 10.
When testing my method for base 10, it leaves out the two-digit palindromes, and the next ones it leaves out are 1001, 1111, 1221, and so on.
I'm not sure why.
Here are the conversion methods if needed.
It turns out this gets slower with my other code because of the constant conversions, since I need all the numbers in order and in decimal. I'll just stick to iterating over every integer, converting it to every base, and then checking whether it's a palindrome.
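Here is a sketch of that fallback check. It keeps the digits as ints, so bases up to 10,000 need no character set. The class name is hypothetical, and n is assumed to be non-negative. Testing each integer this way costs O(log_b n) digit operations.

```java
public class BasePalindrome {
    // Checks whether n (n >= 0) reads the same forwards and backwards in the given base (base >= 2).
    // Digits are stored as ints, so bases such as 10,000 need no digit characters.
    public static boolean isPalindrome(long n, int base) {
        if (n < 0 || base < 2) {
            throw new IllegalArgumentException("need n >= 0 and base >= 2");
        }
        int[] digits = new int[64]; // a non-negative long has at most 63 digits (base 2)
        int len = 0;
        do {
            digits[len++] = (int) (n % base); // least significant digit first
            n /= base;
        } while (n > 0);
        for (int i = 0, j = len - 1; i < j; i++, j--) {
            if (digits[i] != digits[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome(9, 2));         // 1001 in binary: true
        System.out.println(isPalindrome(10, 2));        // 1010 in binary: false
        System.out.println(isPalindrome(10001, 10000)); // digits [1, 1]: true
    }
}
```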
I don't have enough reputation to comment, but if you are only missing even-length palindromes, then most probably there is something wrong with your list. Most probably you forgot to add an empty entry to the list: to generate 1001, you need num(10) + empty("") + rev(01).
There aren't enough suitable characters to represent digits in every possible base (0xDEADBEEF works for hex, and I suppose convert has some limit like 36). So forget about exotic digits, and use simple lists or arrays like [8888, 123, 5583] for the digits of a base-10000 number.
Then convert the limit into the needed base and store it.
Now generate symmetric arrays of odd and even length, like [175, 2, 175] or [13, 221, 221, 13]. If the length is the same as the limit's length, compare the array values and reject numbers that are too high.
You can also use the limit array as a starting point and generate only palindromes with smaller values.
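Here is a sketch of the mirroring idea. It uses long arithmetic instead of explicit digit arrays, which is a substitution on my part. Each "half" is mirrored twice. The odd-length mirror shares a middle digit. The even-length mirror has no middle digit, which is the case the question's code misses. The names are hypothetical. The sketch assumes base >= 2 and limit < Long.MAX_VALUE, and it treats any mirror that would overflow a long as larger than the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PalindromeGenerator {
    // Appends the base-b digits of `half` in reverse. For odd length the last digit of
    // `half` is the middle digit and is not repeated (half = 10, base 10: 101 odd, 1001 even).
    static long mirror(long half, int base, boolean oddLength) {
        long result = half;
        long rest = oddLength ? half / base : half;
        while (rest > 0) {
            if (result > (Long.MAX_VALUE - (base - 1)) / base) {
                return Long.MAX_VALUE; // would overflow: treat as larger than any allowed limit
            }
            result = result * base + rest % base;
            rest /= base;
        }
        return result;
    }

    // All palindromes in [0, limit] for the given base, in ascending order.
    public static List<Long> palindromesUpTo(long limit, int base) {
        TreeSet<Long> result = new TreeSet<>();
        if (limit >= 0) {
            result.add(0L);
        }
        for (boolean odd : new boolean[] {true, false}) {
            // For a fixed parity, mirror() grows with half, so stop at the first value past the limit.
            for (long half = 1; ; half++) {
                long p = mirror(half, base, odd);
                if (p > limit) {
                    break;
                }
                result.add(p);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(palindromesUpTo(10, 2)); // [0, 1, 3, 5, 7, 9]
    }
}
```

The TreeSet merges the two parities back into ascending order. The loops visit on the order of √(limit · base) halves instead of every integer up to the limit, and no string conversion is involved.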

What is the time complexity and space complexity of this algorithm to find Anagrams?

I am working on an interview question from Amazon Software
The question is "Design an algorithm to take a list of strings as well as a single input string, and return the indices of the list which are anagrams of the input string, disregarding special characters."
I was able to design the algorithm fine. In pseudocode, what I did was:
1. Create a character-count array for the single input string.
2. For each string in the list, construct a character-count array.
3. Compare the character counts of each string in the list to those of the single input string.
4. If they are the same, add the index to a list that holds all the indices of anagrams.
5. Return that list of indices.
Here is my implementation in Java (it works; I tested it):
public static List<Integer> indicesOfAnag(List<String> li, String comp){
    List<Integer> allAnas = new ArrayList<Integer>();
    int[] charCounts = generateCharCounts(comp);
    int listLength = li.size();
    for(int c=0;c<listLength; c++ ){
        int[] charCountComp = generateCharCounts(li.get(c));
        if(isEqualCounts(charCounts, charCountComp))
            allAnas.add(c);
    }
    return allAnas;
}
private static boolean isEqualCounts(int[] counts1, int[] counts2){
    for(int c=0;c<counts1.length;c++) {
        if(counts1[c]!=counts2[c])
            return false;
    }
    return true;
}
private static int[] generateCharCounts(String comp) {
    int[] charCounts = new int[26];
    int length = comp.length();
    for(int c=0;c<length;c++) {
        char ch = Character.toLowerCase(comp.charAt(c));
        if (ch < 'a' || ch > 'z')
            continue; // disregard special characters (they would index outside the array)
        charCounts[ch - 'a'] ++;
    }
    return charCounts;
}
What I am having trouble with is analyzing the space and time complexity of this algorithm, because both the size of the list and the length of each string matter. Would the time complexity just be O(N), where N is the size of the list (processing each string once)? Or do I have to take the length of each string into account too, giving O(N * n), where n is the length of a string? I said N * n because you are processing n characters N times. And would the space complexity be O(N) because I am creating N copies of the 26-length array?
And would space complexity be O(N) because I am creating N copies of the 26 length array?
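Yes, O(N) in the worst case. Strictly, though, that bound comes from the result list, which can hold up to N indices. Each 26-element count array becomes garbage after its iteration, so only two of them are alive at any time.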
Would the time complexity algorithm just be O(N) where N is the size of the list
No. The time depends on the lengths of the input strings: it'll be O(comp.length + sum_of_li_lengths), plus O(N) for the N fixed-size (26-element) comparisons.
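Written out, the bounds look like this. The notation is introduced here for illustration: m = comp.length(), N = li.size(), s_i is the i-th string in li, L = Σ|s_i|, and k is the number of anagrams found.

```latex
\begin{aligned}
T &= \underbrace{O(m)}_{\text{count } comp}
   + \sum_{i=1}^{N} \Bigl(\underbrace{O(|s_i|)}_{\text{count } s_i} + \underbrace{O(26)}_{\text{compare}}\Bigr)
   = O(m + L + 26N) = O(m + L + N), \\
S &= \underbrace{O(26)}_{\text{live count arrays}} + \underbrace{O(k)}_{\text{result},\ k \le N}
   = O(N) \text{ in the worst case.}
\end{aligned}
```

If no string in li is empty, then N ≤ L, and the time bound simplifies to O(m + L).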

Exporting a specific pattern of a string using the split method in the most efficient way

I want to export the pattern of a bit stream held in a String variable. Assume our bit stream is something like bitStream="111000001010000100001111". I am looking for Java code that saves this bit stream in an array (say, bitArray) so that each continuous run of "0"s or "1"s is saved in one array element. In this example, the output would be something like this:
bitArray[0]="111"
bitArray[1]="00000"
bitArray[2]="1"
bitArray[3]="0"
bitArray[4]="1"
bitArray[5]="0000"
bitArray[6]="1"
bitArray[7]="0000"
bitArray[8]="1111"
I want to use bitArray to calculate the number of bits stored in each continuous run. For example, in this case the final output would be "3,5,1,1,1,4,1,4,4". I figured the split method would probably solve this for me, but I don't know which splitting pattern would do it. bitStream.split("1+") splits on continuous runs of "1", and bitStream.split("0+") splits on continuous runs of "0", but how can I split on both?
Mathew suggested this solution and it works:
var wholeString = "111000001010000100001111";
wholeString = wholeString.replace(/10/g, '1,0'); // /g: a plain string pattern replaces only the first match in JavaScript
wholeString = wholeString.replace(/01/g, '0,1');
var stringSplit = wholeString.split(',');
My question is "Is this solution the most efficient one?"
Try replacing any occurrence of "01" and "10" with "0,1" and "1,0" respectively. Then once you've injected the commas, split the string using the comma as the delimiting character.
String wholeString = "111000001010000100001111";
wholeString = wholeString.replace("10", "1,0"); // String.replace replaces every occurrence
wholeString = wholeString.replace("01", "0,1");
String[] stringSplit = wholeString.split(",");
You can do this with a simple regular expression. It matches runs of 1s or 0s and returns each one in the order it occurs in the stream. How you store or manipulate the results is up to you. Here is some example code:
String testString = "111000001010000100001111";
Pattern pattern = Pattern.compile("1+|0+");
Matcher matcher = pattern.matcher(testString);
while (matcher.find())
{
System.out.print(matcher.group().length());
System.out.print(" ");
}
This will result in the following output:
3 5 1 1 1 4 1 4 4
One option for storing the results is to put them in an ArrayList<Integer>
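Here is a sketch of that option, which collects the run lengths into a List<Integer>. The class and method names are hypothetical. It uses matcher.end() - matcher.start() instead of matcher.group().length(), which avoids creating a substring for every run. It also compiles the Pattern once so that repeated calls reuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunLengths {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern RUN = Pattern.compile("1+|0+");

    // Length of each maximal run of 1s or 0s, in stream order.
    public static List<Integer> runLengths(String bits) {
        List<Integer> lengths = new ArrayList<>();
        Matcher matcher = RUN.matcher(bits);
        while (matcher.find()) {
            lengths.add(matcher.end() - matcher.start()); // run length without building a substring
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(runLengths("111000001010000100001111")); // [3, 5, 1, 1, 1, 4, 1, 4, 4]
    }
}
```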
Since the OP wanted the most efficient solution, I ran some tests to see how long each answer takes to iterate over a large stream 10,000 times. I got the following results. The times differed from test to test, but the order from fastest to slowest remained the same. I know that simple tick-based timing has its issues, such as not accounting for system load, but I just wanted a quick test.
My answer completed in 1145 ms
Alessio's answer completed in 1202 ms
Matthew Lee Keith's answer completed in 2002 ms
Evgeniy Dorofeev's answer completed in 2556 ms
Hope this helps
I won't give you a code, but I'll guide you to a possible solution:
Construct an ArrayList<Integer> and iterate over the array of bits. As long as the current bit equals the previous one, increment a counter. As soon as it changes, add the counter to the ArrayList and reset it. After this procedure, you'll have an ArrayList that contains numbers, e.g. [1,2,2,3,4], representing the lengths of the alternating runs of 1's and 0's.
Then construct an array of the same size as the ArrayList and fill it accordingly.
The time complexity is O(n) because you need to iterate over the array only once.
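Here is a sketch of the approach above in Java. It assumes that any change of character ends a run, so it also works for input other than 0s and 1s. The names are hypothetical. The runs are collected in an ArrayList and then copied into an int[] of exactly the right size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunCounter {
    // One pass: extend the current run while the character repeats; when it changes,
    // record the finished run and start a new one. O(n) time.
    public static int[] runLengths(String bits) {
        List<Integer> runs = new ArrayList<>();
        int count = 0;
        for (int i = 0; i < bits.length(); i++) {
            if (i > 0 && bits.charAt(i) != bits.charAt(i - 1)) {
                runs.add(count); // the previous run just ended
                count = 0;
            }
            count++;
        }
        if (count > 0) {
            runs.add(count); // the final run has no successor to close it
        }
        int[] result = new int[runs.size()]; // array of exactly the right size
        for (int i = 0; i < result.length; i++) {
            result[i] = runs.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runLengths("111000001010000100001111")));
    }
}
```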
This code works for any String and pattern, not only 1s and 0s. Iterate char by char. If the current char is equal to the previous one, append it to the last element of the List. Otherwise, create a new element in the list.
public List<String> getArray(String input){
    List<String> output = new ArrayList<String>();
    if(input==null || input.isEmpty()) return output; // String has length(), not a length field
    int count = 0; // index of the last element in output
    char [] inputA = input.toCharArray();
    output.add(inputA[0]+"");
    for(int i = 1; i <inputA.length;i++){
        if(inputA[i]==inputA[i-1]){
            // same char as before: extend the current run in place
            output.set(count, output.get(count)+inputA[i]);
        }
        else{
            output.add(inputA[i]+"");
            count++;
        }
    }
    return output;
}
try this
String[] a = s.replaceAll("(.)(?!\\1)", "$1,").split(",");
I tried to implement @Maroun Maroun's solution:
public static void main(String args[]){
    long start = System.currentTimeMillis();
    String bitStream ="0111000001010000100001111";
    int length = bitStream.length();
    char base = bitStream.charAt(0);
    ArrayList<Integer> counts = new ArrayList<Integer>();
    int count = -1;
    char currChar = ' ';
    for (int i=0;i<length;i++){
        currChar = bitStream.charAt(i);
        if (currChar == base){
            count++;
        }else {
            base = currChar;
            counts.add(count+1);
            count = 0;
        }
    }
    counts.add(count+1);
    System.out.println("Time taken :" + (System.currentTimeMillis()-start ) +"ms");
    System.out.println(counts.toString());
}
I believe this is the more efficient way. As he said, it is O(n), since you iterate only once. The goal is only to get the counts, not to store the runs in an array, so I would recommend this. Even a regular expression would have to iterate over the string internally anyway.
The output is:
Time taken :0ms
[1, 3, 5, 1, 1, 1, 4, 1, 4, 4]
Try this one:
String[] parts = input.split("(?<=1)(?=0)|(?<=0)(?=1)");
See in action here: http://rubular.com/r/qyyfHNAo0T

Improving Collections.Sort

I am sorting 1 million strings (each 50 chars long) in an ArrayList with:
final Comparator<String> comparator = new Comparator<String>() {
    public int compare(String s1, String s2) {
        if (s2 == null || s1 == null)
            return 0; // note: treating null as equal to everything breaks the Comparator contract
        return s1.compareTo(s2);
    }
};
Collections.sort(list, comparator);
The average time for this is 1300 ms.
How can I speed it up?
If you're using Java 6 or below you might get a speedup by switching to Java 7. In Java 7 they changed the sort algorithm to TimSort which performs better in some cases (in particular, it works well with partially sorted input). Java 6 and below used MergeSort.
But let's assume you're using Java 6. I tried three versions:
Collections.sort(): Repeated runs of the comparator you provided take about 3.0 seconds on my machine (including reading the input of 1,000,000 randomly generated lowercase ascii strings).
Radix Sort: Other answers suggested a Radix sort. I tried the following code (which assumes the strings are all the same length, and only lowercase ascii):
String [] A = list.toArray(new String[0]);
// LSD radix sort: one stable counting-sort pass per character position, last position first
for(int i = stringLength - 1; i >=0; i--) {
    int[] buckets = new int[26];
    int[] starts = new int[26];
    for (int k = 0 ; k < A.length;k++) {
        buckets[A[k].charAt(i) - 'a']++;
    }
    for(int k = 1; k < buckets.length;k++) {
        starts[k] = buckets[k -1] + starts[k-1];
    }
    String [] temp = new String[A.length];
    for(int k = 0; k < A.length; k++) {
        temp[starts[A[k].charAt(i) - 'a']] = A[k];
        starts[A[k].charAt(i) - 'a']++;
    }
    A = temp;
}
It takes about 29.0 seconds to complete on my machine. I don't think this is the best way to implement radix sort for this problem. For example, with a most-significant-digit sort you could terminate early on unique prefixes. There would also be some benefit in using an in-place sort instead (there's a good quote about this: “The troubles with radix sort are in implementation, not in conception”). I'd like to write a better radix-sort-based solution that does this. If I get time, I'll update my answer.
Bucket Sort: I also implemented a slightly modified version of Peter Lawrey's bucket sort solution. Here's the code:
// Partition by the first two characters (assumes chars < 256); the TreeMap keeps buckets in key order.
Map<Integer, List<String>> buckets = new TreeMap<Integer,List<String>>();
for(String s : l) {
    int key = s.charAt(0) * 256 + s.charAt(1);
    List<String> list = buckets.get(key);
    if(list == null) buckets.put(key, list = new ArrayList<String>());
    list.add(s);
}
l.clear();
for(List<String> list: buckets.values()) {
    Collections.sort(list);
    l.addAll(list);
}
It takes about 2.5 seconds to complete on my machine. I believe this win comes from the partitioning.
So, if switching to Java 7's TimSort doesn't help you, then I'd recommend partitioning the data (using something like bucket sort). If you need even better performance, then you can also multi-thread the processing of the partitions.
You didn't specify which sort algorithm you use. Some are quicker than others (quick/merge vs. bubble).
Also, if you are running on a multi-core/multi-processor machine, you can divide the sort between multiple threads (again, exactly how depends on the sort algorithm, but here's an example).
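If you can use Java 8 or later, one ready-made way to spread the sort across cores is Arrays.parallelSort. It sorts chunks of the array in parallel on the common ForkJoinPool and merges them. This is a substitute for hand-written threading, not the approach the answer links to. The class name and the random-data helper are hypothetical, and you would need to measure whether this beats Collections.sort for your data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class ParallelStringSort {
    // Sorts a copy of the list in natural String order with Arrays.parallelSort (Java 8+).
    // Unlike the question's comparator, null elements are not allowed (NullPointerException).
    public static List<String> parallelSorted(List<String> list) {
        String[] array = list.toArray(new String[0]);
        Arrays.parallelSort(array);
        return new ArrayList<>(Arrays.asList(array));
    }

    // Hypothetical test data: random lowercase strings of a fixed length.
    static List<String> randomStrings(int count, int length, long seed) {
        Random random = new Random(seed);
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            char[] chars = new char[length];
            for (int j = 0; j < length; j++) {
                chars[j] = (char) ('a' + random.nextInt(26));
            }
            result.add(new String(chars));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = randomStrings(200_000, 50, 42L); // scaled down from the question's 1 million
        long start = System.nanoTime();
        List<String> sorted = parallelSorted(data);
        System.out.println("parallelSort took " + (System.nanoTime() - start) / 1_000_000 + " ms");
        System.out.println("first: " + sorted.get(0));
    }
}
```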
You can use a radix sort for the first two characters. If your first two characters are distinctive, you can use something like:
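List<String> strings = ...; // your list (initializer elided in the original)
Map<Integer, List<String>> radixSort = new HashMap<Integer, List<String>>();
for(String s: strings) {
    int key = (s.charAt(0) << 16) + s.charAt(1); // bucket key from the first two characters
    List<String> list = radixSort.get(key);
    if(list == null) radixSort.put(key, list = new ArrayList<String>());
    list.add(s);
}
strings.clear();
// copying into a TreeMap orders the buckets by key, i.e. by the first two characters
for(List<String> list: new TreeMap<Integer, List<String>>(radixSort).values()) {
    Collections.sort(list);
    strings.addAll(list);
}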
