Best way of scanning for letter combinations - java

So let's say I have a 32 character string like this:
GCAAAGCTTGGCACACGTCAAGAGTTGACTTT
My goal is to count all occurrences of specific substrings, such as 'AA' 'ATT' 'CGG' and so on. For this purpose, the 3rd through 5th characters above contain 2 occurrences of 'AA'. There are a total of 8 of these substrings, 6 that are 3 characters in length and 2 that are 2 characters in length, and I would want counts for all eight.
What would be the most efficient way of doing this in Java? My thoughts run along a few lines:
1. Scan through character by character, checking and flagging for each substring. This seems intensive and inefficient.
2. Find some existing function that would do the work (I'm not sure how efficient such a function would be; String.contains returns a boolean, not a count).
3. Scan through the string multiple times, each sweep checking for a different substring.
The implementation of 3 is trivial, but 1 might give a few extra headaches and won't be very clean code.

I think this should answer your question.
The naive approach (checking for substring at each possible index)
runs in O(nk) where n is the length of the string and k is the length
of the substring. This could be implemented with a for-loop, and
something like haystack.substring(i).startsWith(needle).
More efficient algorithms exist though. You may want to have a look at
the Knuth-Morris-Pratt algorithm, or the Aho-Corasick algorithm. As
opposed to the naive approach, both of these algorithms behave well
also on input like "look for the substring of 100 'X's in a string of
10000 'X's".
Taken from stackoverflow.com/questions/4121875/count-of-substrings-within-string
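As a rough sketch of the naive approach described above (not taken from the quoted answer), the loop below checks every needle at every index. The class and method names are made up for illustration. It uses String.startsWith(prefix, offset) rather than substring(i).startsWith(needle), so no temporary strings are created, and overlapping matches are counted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NaiveSubstringCount {

    // Counts occurrences of each needle, overlapping matches included.
    static Map<String, Integer> countAll(String haystack, String... needles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String needle : needles) {
            counts.put(needle, 0);
        }
        for (int i = 0; i < haystack.length(); i++) {
            for (String needle : needles) {
                // startsWith(prefix, offset) compares in place, so no substring is allocated
                if (haystack.startsWith(needle, i)) {
                    counts.merge(needle, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String dna = "GCAAAGCTTGGCACACGTCAAGAGTTGACTTT";
        System.out.println(countAll(dna, "AA", "ATT", "CGG"));
    }
}
```

For a 32-character string and eight short needles this is probably fast enough; KMP or Aho-Corasick only start to pay off for long inputs or many patterns.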

One approach is to essentially code up an NFA (http://en.wikipedia.org/wiki/Nondeterministic_finite_automaton)
and just run your input on the NFA.
Here's my attempt at coding an NFA. You'd probably want to convert it to a DFA before running it so that you don't have to manage a bunch of branches. With the branches it's basically as slow as O(nk), whereas a DFA would run in O(n).
import java.util.*;
public class Test
{
public static void main (String[] args)
{
new Test();
}
private static final String input = "TAAATGGAGGTAATAGAGGAGGTGTAT";
private static final String[] substrings = new String[] { "AA", "AG", "GG", "GAG", "TA" };
private static final int[] occurrences = new int[substrings.length];
public Test()
{
ArrayList<Branch> branches = new ArrayList<Branch>();
// For each character, read it, create branches for each substring, and pass the current character
// to each active branch
for (int i = 0; i < input.length(); i++)
{
char c = input.charAt(i);
// Make a new branch, one for each substring that we are searching for
for (int j = 0; j < substrings.length; j++)
branches.add(new Branch(substrings[j], j, branches));
// Pass the current input character to each branch that is still alive
// Iterate in reverse order because the nextCharacter method may
// cause the branch to be removed from the ArrayList
for (int j = branches.size()-1; j >= 0; j--)
branches.get(j).nextCharacter(c);
}
for (int i = 0; i < occurrences.length; i++)
System.out.println(substrings[i]+": "+occurrences[i]);
}
private static class Branch
{
private String searchFor;
private int position, index;
private ArrayList<Branch> parent;
public Branch(String searchFor, int searchForIndex, ArrayList<Branch> parent)
{
this.parent = parent;
this.searchFor = searchFor;
this.position = 0;
this.index = searchForIndex;
}
public void nextCharacter(char c)
{
// If the current character matches the ith character of the string we are searching for,
// Then this branch will stay alive
if (c == searchFor.charAt(position))
position++;
// Otherwise the substring didn't match, so this branch dies
else
suicide();
// Reached the end of the substring, so the substring was found.
if (position == searchFor.length())
{
occurrences[index] += 1;
suicide();
}
}
private void suicide()
{
parent.remove(this);
}
}
}
output for this example is
AA: 3
AG: 4
GG: 4
GAG: 3
TA: 4

Do you want to find all possible substrings that are longer than 1 character?
In that case one approach is to use HashMaps.
This example outputs:
{AA=3, TT=4, AC=3, CTT=2, CAA=2, GCA=2, CAC=2, AG=3, TTG=2, AAG=2, GT=2, CT=2, TG=2, GA=2, GC=3, CA=4}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Test {
public static void main(String[] args) {
String str = "GCAAAGCTTGGCACACGTCAAGAGTTGACTTT";
HashMap<String, Integer> map = countMatches(str);
System.out.println(map);
}
private static HashMap<String, List<Integer>> findOneLetterMatches(String str) {
ArrayList<Integer> list = new ArrayList<>();
for(int i = 0; i < str.length(); i++) list.add(i);
return extendMatches(str, list, 1);
}
private static HashMap<String, List<Integer>> extendMatches(String str, List<Integer> indices, int targetLength) {
HashMap<String, List<Integer>> map = new HashMap<>();
for(int index: indices) {
if(index+targetLength <= str.length()) {
String s = str.substring(index, index + targetLength);
List<Integer> list = map.get(s);
if(list == null) {
list = new ArrayList<>();
map.put(s, list);
}
list.add(index);
}
}
return map;
}
private static void addIfListLongerThanOne(HashMap<String, List<Integer>> source,
HashMap<String, List<Integer>> target) {
for(Map.Entry<String, List<Integer>> e: source.entrySet()) {
String s = e.getKey();
List<Integer> l = e.getValue();
if(l.size() > 1) target.put(s, l);
}
}
private static HashMap<String, List<Integer>> extendAllMatches(String str, HashMap<String, List<Integer>> map, int targetLength) {
HashMap<String, List<Integer>> result = new HashMap<>();
for(List<Integer> list: map.values()) {
HashMap<String, List<Integer>> m = extendMatches(str, list, targetLength);
addIfListLongerThanOne(m, result);
}
return result;
}
private static HashMap<String, Integer> countMatches(String str) {
HashMap<String, Integer> result = new HashMap<>();
HashMap<String, List<Integer>> matches = findOneLetterMatches(str);
for(int targetLength = 2; !matches.isEmpty(); targetLength++) {
HashMap<String, List<Integer>> m = extendAllMatches(str, matches, targetLength);
for(Map.Entry<String, List<Integer>> e: m.entrySet()) {
String s = e.getKey();
List<Integer> l = e.getValue();
result.put(s, l.size());
}
matches = m;
}
return result;
}
}
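If only substrings of a known length are needed (the question mentions lengths 2 and 3), a much smaller sketch may be enough: slide a window of k characters over the string and count each window in a map. The class and method names here are illustrative and not part of the answer above.

```java
import java.util.HashMap;
import java.util.Map;

public class KmerCount {

    // Counts every substring of exactly k characters, overlapping windows included.
    static Map<String, Integer> countKmers(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String dna = "GCAAAGCTTGGCACACGTCAAGAGTTGACTTT";
        Map<String, Integer> pairs = countKmers(dna, 2);
        Map<String, Integer> triples = countKmers(dna, 3);
        System.out.println("AA: " + pairs.getOrDefault("AA", 0));
        System.out.println("CTT: " + triples.getOrDefault("CTT", 0));
    }
}
```

Substrings that never occur are simply absent from the map, which is why the lookups use getOrDefault.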

Related

How can I get the count of most duplicated value in a list after sorting it alphabetically?

What is the easiest way to get the most duplicated value in a list and sorted in descending order...
for example:
List<String> list = new ArrayList<>(List.of("Renault","BMW","Renault","Renault","Toyota","Rexon","BMW","Opel","Rexon","Rexon"));
"Renault" and "Rexon" are the most duplicated, and with ties broken in descending alphabetical order I would like to get "Rexon".
I think one of the most readable and elegant ways would be to use the Streams API:
strings.stream()
.collect(Collectors.groupingBy(x -> x, Collectors.counting()))
.entrySet().stream()
.max(Comparator.comparingLong((ToLongFunction<Map.Entry<String, Long>>) Map.Entry::getValue).thenComparing(Map.Entry::getKey))
.map(Map.Entry::getKey)
.ifPresent(System.out::println);
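The snippet above assumes a list named strings already exists. A self-contained variant might look like the sketch below; the class and method names are placeholders, and the comparator is built from Map.Entry.comparingByValue() and comparingByKey() so that no cast is needed.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostDuplicatedStream {

    // Highest count wins; ties go to the alphabetically last name.
    static Optional<String> mostDuplicated(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .max(Map.Entry.<String, Long>comparingByValue()
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<String> list = List.of("Renault", "BMW", "Renault", "Renault", "Toyota",
                "Rexon", "BMW", "Opel", "Rexon", "Rexon");
        mostDuplicated(list).ifPresent(System.out::println);
    }
}
```

Returning an Optional leaves the empty-list case to the caller instead of picking an arbitrary default.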
1. Create a map of names with their corresponding number of occurrences.
2. Get names and sort them in descending order.
3. Print the first name that has the highest number of occurrences.
import java.util.*;
class Scratch {
public static void main(String[] args) {
List<String> list = List.of("Renault","BMW","Renault","Renault","Toyota","Rexon","BMW","Opel","Rexon","Rexon");
Map<String, Integer> duplicates = new HashMap<>();
// 1. Create a map of names with their corresponding
// number of occurrences.
for (String s: list) {
duplicates.merge(s, 1, Integer::sum);
}
// 2. Get names and sort them in descending order.
List<String> newList = new ArrayList<String>(duplicates.keySet());
newList.sort(Collections.reverseOrder());
// 3. Print the first name that has the highest number of
// occurrences.
Integer max = Collections.max(duplicates.values());
newList.stream().filter(name -> duplicates.get(name).equals(max))
.findFirst()
.ifPresent(System.out::println);
}
}
After some time this is what I came up with (I only tested it with your example and it worked):
public class Duplicated {
public static String MostDuplicated(String[] a) {
int position = -1;
int maxDup = 0;
for(int i = 0; i < a.length; i++) { //for every position
int dup = 0; //reset the counter for each position
for(int j = 0; j < a.length; j++){ //compare it to all
if(a[i].equals(a[j])) { dup++; } // and count how many times it is duplicated
}
if (dup > maxDup) { maxDup = dup; position = i;}
//if the number of duplications
//is greater than the maximum you have got so far, save this position.
else if (dup == maxDup) {
if( a[i].compareTo(a[position]) > 0 ){ position = i; }
//if it's the same, keep the position of the alphabetically last
// (if you want the alphabetically first, just change the ">" to "<")
}
}
return a[position]; //return the position you saved
}
}
You are asking to sort the list and then find the most common item.
I would suggest that the easiest way to sort the list is using the sort method that is built into List.
I would then suggest finding the most common by looping with the for..each construct, keeping track of the current and longest streaks.
I like Yassin Hajaj's answer with streams but I find this way easier to write and easier to read. Your mileage may vary, as this is subjective. :)
import java.util.*;
public class SortingAndMostCommonDemo {
public static void main(String[] args) {
List<String> list = new ArrayList<>(List.of("Renault","BMW","Renault","Renault","Toyota","Rexon","BMW","Opel","Rexon","Rexon"));
list.sort(Comparator.reverseOrder());
System.out.println(list);
System.out.println("The most common is " + mostCommon(list) + ".");
}
private static String mostCommon(List<String> list) {
String mostCommon = null;
int longestStreak = 0;
String previous = null;
int currentStreak = 0;
for (String s : list) {
currentStreak = 1 + (s.equals(previous) ? currentStreak : 0);
if (currentStreak > longestStreak) {
mostCommon = s;
longestStreak = currentStreak;
}
previous = s;
}
return mostCommon;
}
}
The fast algorithm takes advantage of the fact that the list is sorted and finds the element with the most duplicates in a single O(n) pass, with n being the size of the list (the sort itself costs O(n log n)). Since the list is sorted, the duplicates will be together in consecutive positions:
private static String getMostDuplicates(List<String> list) {
if(!list.isEmpty()) {
list.sort(Comparator.reverseOrder());
String prev = list.get(0);
String found_max = prev;
int max_dup = 1;
int curr_max_dup = 0;
for (String s : list) {
if (!s.equals(prev)) {
if (curr_max_dup > max_dup) {
max_dup = curr_max_dup;
found_max = prev;
}
curr_max_dup = 0;
}
curr_max_dup++;
prev = s;
}
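// the last run is never followed by a different element, so compare it here
if (curr_max_dup > max_dup) {
found_max = prev;
}
return found_max;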
}
return "";
}
Explanation:
We iterate through the list and keep track of the maximum number of duplicates found so far and of the previous element. If the current element is the same as the previous one, we increment the number of duplicates in the current run. Otherwise, we check whether the run that just ended is bigger than the previous maximum. If it is, we update the maximum accordingly.
A complete running example:
import java.util.*;
public class Duplicates {
private static String getMostDuplicates(List<String> list) {
if(!list.isEmpty()) {
list.sort(Comparator.reverseOrder());
String prev = list.get(0);
String found_max = prev;
int max_dup = 1;
int curr_max_dup = 0;
for (String s : list) {
if (!s.equals(prev)) {
if (curr_max_dup > max_dup) {
max_dup = curr_max_dup;
found_max = prev;
}
curr_max_dup = 0;
}
curr_max_dup++;
prev = s;
}
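// the last run is never followed by a different element, so compare it here
if (curr_max_dup > max_dup) {
found_max = prev;
}
return found_max;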
}
return "";
}
public static void main(String[] args) {
List<String> list = new ArrayList<>(List.of("Renault","BMW","Renault","Renault","Toyota","Rexon","BMW","Opel","Rexon","Rexon"));
String duplicates = getMostDuplicates(list);
System.out.println("----- Test 1 -----");
System.out.println(duplicates);
list = new ArrayList<>(List.of("Renault","BMW"));
duplicates = getMostDuplicates(list);
System.out.println("----- Test 2 -----");
System.out.println(duplicates);
list = new ArrayList<>(List.of("Renault"));
duplicates = getMostDuplicates(list);
System.out.println("----- Test 3 -----");
System.out.println(duplicates);
}
}
Output:
----- Test 1 -----
Rexon
----- Test 2 -----
Renault
----- Test 3 -----
Renault
Actually, I found a solution which works:
public static void main(String[] args) {
List<String> list = new ArrayList<>(List.of("Renault", "BMW", "BMW", "Renault", "Renault", "Toyota",
"Rexon", "BMW", "Opel", "Rexon", "Rexon"));
Map<String, Integer> soldProducts = new HashMap<>();
for (String s : list) {
soldProducts.put(s, soldProducts.getOrDefault(s, 0) + 1);
}
LinkedHashMap<String, Integer> sortedMap = soldProducts.entrySet()
.stream()
.sorted(VALUE_COMPARATOR.thenComparing(KEY_COMPARATOR_REVERSED))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (e1, e2) -> e2, LinkedHashMap::new));
String result = "";
for (Map.Entry<String, Integer> s : sortedMap.entrySet()) {
result = s.getKey();
}
System.out.println(result);
}
static final Comparator<Map.Entry<String, Integer>> KEY_COMPARATOR_REVERSED =
Map.Entry.comparingByKey(Comparator.naturalOrder());
static final Comparator<Map.Entry<String, Integer>> VALUE_COMPARATOR =
Map.Entry.comparingByValue();

Creating a HashMap of chars in a string and integers in an ArrayList<integer>

I have to create a HashMap that records the letters in a string and their index values in an ArrayList, so that when the map is queried with a string key, the indexes for that key are returned, and so that printing the map shows each key with its indexes. For example, for the string "Hello World", the map would look something like:
d=[9], o=[4, 6], r=[7], W=[5], H=[0], l=[2, 3, 8], e=[1].
I'm really confused by the requirement of the inputs as String and ArrayList, rather than chars and integers. Could you explain to me the relationship of the map to those objects, and to their components which are ultimately what are recorded as keys and values? When trying to debug, it stops processing before the map call.
The error message is:
java.lang.AssertionError: Wrong number of entries in Concordance. Expected: 5. Got: 1
Expected :1
Actual :5
But I really think I'm not grasping HashMap very well, so I'd appreciate if anyone could guide me through the basics, or provide anything educational about using HashMap, especially ones that use ArrayList.
public HashMap<String, ArrayList<Integer>> concordanceForString(String s) {
HashMap<String, ArrayList<Integer>> sMap = new HashMap<>();//create map "sMap"
char[] sArray = new char[s.length()]; //create character array, "sArray", for string conversion
ArrayList<Integer> sCharIndex = new ArrayList<Integer>();
for (int i = 0; i < s.length(); i++) {
sArray[i] = s.charAt(i); // convert string into array
}
for (int j = 0; j < sArray.length; j++){
sCharIndex.add(j); // add char indexes to index ArrayList
}
sMap.put(s, sCharIndex); //add the String and ArrayList
return sMap; // I feel like this should be sMap.get(s) but when I do, it gives me the zigzag red underline.
}
Here is a way to do it:
String input = "hello world";
Map<String, List<Integer>> letters = new HashMap<String, List<Integer>>();
// remove all whitespace characters - since it appears you are doing that
String string = input.replaceAll("\\s", "");
// loop over the length of the string
for (int i = 0; i < string.length(); i++) {
// add the entry to the map
// if it does not exist, then a new List with value of i is added
// if the key does exist, then the new List of i is added to the
// existing List
letters.merge(string.substring(i, i + 1),
Arrays.asList(i),
(o, n) -> Stream.concat(o.stream(), n.stream()).collect(Collectors.toList()));
}
System.out.println(letters);
that gives this output:
{r=[7], d=[9], e=[1], w=[5], h=[0], l=[2, 3, 8], o=[4, 6]}
EDIT - this uses a Character as the key to the map:
String input = "hello world";
Map<Character, List<Integer>> letters = new HashMap<Character, List<Integer>>();
String string = input.replaceAll("\\s", "");
for (int i = 0; i < string.length(); i++) {
letters.merge(string.charAt(i), Arrays.asList(i), (o, n) ->
Stream.concat(o.stream(), n.stream()).collect(Collectors.toList()));
}
System.out.println(letters);
Essentially, this is what you want to do.
This presumes a HashMap<String, List<Integer>>
List<Integer> sCharIndex;
for (int i = 0; i < s.length(); i++) {
// get the character
char ch = s.charAt(i);
if (!Character.isLetter(ch)) {
// only check letters
continue;
}
String key = String.valueOf(ch); // to string (a char cannot be reassigned to a String)
// get the list for character
sCharIndex = sMap.get(key);
// if it is null, create one and add it
if (sCharIndex == null) {
// create list
sCharIndex = new ArrayList<>();
// put list in map
sMap.put(key, sCharIndex);
}
// at this point you have the list so
// add the index to it.
sCharIndex.add(i);
}
return sMap;
A HashMap is a data structure that stores values under keys. Think of an array, which takes an integer as an index and lets you store anything at that index.
A HashMap can take any object as a key (like an index, but it is called a key) and it can also store anything.
Note that the key of your HashMap is a String but you're using a char, which is not the same. So you need to decide which you want:
HashMap<String, List<Integer>> or HashMap<Character, List<Integer>>
There are also easier ways to do this but this is how most would accomplish this prior to Java 8.
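One of those easier ways, sketched here under the assumption that Java 8 is available, is Map.computeIfAbsent, which folds the "get the list, create it if missing" steps into one call. The class name Concordance is a placeholder, Character keys are used, and whitespace is stripped first so the indexes match the example in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Concordance {

    // Maps each character to the list of positions where it occurs,
    // after whitespace has been stripped (as in the question's example).
    static Map<Character, List<Integer>> concordanceForString(String s) {
        String compact = s.replaceAll("\\s", "");
        Map<Character, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < compact.length(); i++) {
            // computeIfAbsent creates the list the first time a character is seen
            index.computeIfAbsent(compact.charAt(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(concordanceForString("Hello World"));
    }
}
```

With String keys instead, the only change would be to use String.valueOf(compact.charAt(i)) as the key and adjust the map's type.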
Here is a much more compact way using streams. No loops required.
Map<String, List<Integer>> map2 = IntStream
.range(0,s.length())
// only look for letters.
.filter(i->Character.isLetter(s.charAt(i)))
.boxed()
// stream the Integers from 0 to length
// and group them by character in a list of indices.
.collect(Collectors.groupingBy(i->s.charAt(i)+""));
But I recommend you become familiar with the basics before delving into streams (or until your instructor recommends to do so).
For more information check out The Java Tutorials
Check out this code :
public static void main(String []args){
//Create map of respective keys and values
HashMap<Character, ArrayList<Integer>> map = new HashMap<>();
String str = "Hello world"; //test string
int length = str.length(); //length of string
for(int i = 0; i < length; i++){
ArrayList<Integer> indexes = new ArrayList<>(); //empty list of indexes
//character of test string at particular position
Character ch = str.charAt(i);
//if key is already present in the map, then add the previous index associated with the character to the indexes list
if(map.containsKey(ch)){
//adding previous indexes to the list
indexes.addAll(map.get(ch));
}
//add the current index of the character to the respective key in map
indexes.add(i);
//put the indexes in the map and map it to the current character
map.put(ch, indexes);
}
//print the indexes of 'l' character
System.out.print(map.get('l'));
}
The code is self explanatory.
public class Array {
public static void main(String[] args) {
printSortedMap(concordanceForString("Hello world")); // r[7] d[9] e[1] w[5] H[0] l[2, 3, 8] o[4, 6]
}
public static HashMap<String, ArrayList<Integer>> concordanceForString(String s) {
HashMap<String, ArrayList<Integer>> sMap = new HashMap<>();
String str = s.replace(" ", "");
for (int i = 0; i < str.length(); i++) {
ArrayList<Integer> sCharIndex = new ArrayList<Integer>();
for (int j = 0; j < str.length(); j++) {
if ( str.charAt(i) == str.charAt(j) ) {
sCharIndex.add(j);
}
}
sMap.put(str.substring(i,i+1), sCharIndex);
}
return sMap;
}
public static void printSortedMap(HashMap<String, ArrayList<Integer>> sMap) {
for (Map.Entry<String, ArrayList<Integer>> entry : sMap.entrySet()) {
System.out.println(entry.getKey() + entry.getValue());
}
}
}

Grouping elements from lists into sub lists without duplicates in Java

I am working on 'Grouping Anagrams'.
Problem statement: Given an array of strings, group anagrams together.
I could group the anagrams but I am not able to avoid the ones which are already grouped. I want to avoid duplicates. An element can only belong to one group. In my code, an element belongs to multiple groups.
Here is my code:
public class GroupAnagrams1 {
public static void main(String[] args) {
String[] input = {"eat", "tea", "tan", "ate", "nat", "bat"};
List<List<String>> result = groupAnagrams(input);
for(List<String> s: result) {
System.out.println(" group: ");
for(String x:s) {
System.out.println(x);
}
}
}
public static List<List<String>> groupAnagrams(String[] strs) {
List<List<String>> result = new ArrayList<List<String>>();
for(int i =0; i < strs.length; i++) {
Set<String> group = new HashSet<String>();
for(int j= i+1; j < strs.length; j++) {
if(areAnagrams(strs[i], strs[j])) {
group.add(strs[i]);
group.add(strs[j]);
}
}
if(group.size() > 0) {
List<String> aList = new ArrayList<String>(group);
result.add(aList);
}
}
return result;
}
Here is the method that checks whether two strings are anagrams.
private static boolean areAnagrams(String str1, String str2) {
char[] a = str1.toCharArray();
char[] b = str2.toCharArray();
int[] count1 = new int[256];
Arrays.fill(count1, 0);
int[] count2 = new int[256];
Arrays.fill(count2, 0);
for(int i = 0; i < a.length && i < b.length; i++) {
count1[a[i]]++;
count2[b[i]]++;
}
if(str1.length() != str2.length())
return false;
for(int k=0; k < 256; k++) {
if(count1[k] != count2[k])
return false;
}
return true;
}
}
expected output:
group:
tea
ate
eat
group:
bat
group:
tan
nat
actual output:
group:
tea
ate
eat
group:
tea
ate
group:
tan
nat
The order in which the groups are displayed does not matter. The way it is displayed does not matter.
Preference: Please feel free to submit solutions using HashMaps but I prefer to see solutions without using HashMaps and using Java8
I would also recommend using Java Streams for that. Since you don't want that, here is another solution:
public static List<List<String>> groupAnagrams(String[] strs) {
List<List<String>> result = new ArrayList<>();
for (String str : strs) {
boolean added = false;
for (List<String> r : result) {
if (areAnagrams(str, r.get(0))) {
r.add(str);
added = true;
break;
}
}
if (!added) {
List<String> aList = new ArrayList<>();
aList.add(str);
result.add(aList);
}
}
return result;
}
The problem in your solution is that each outer iteration starts a new group from the next word, so words that are already grouped get grouped again: you end up with the incomplete group ["tea", "ate"] and never produce ["bat"].
My solution takes a different approach: for each word it checks whether a group already exists whose first word is an anagram of it. If not, it creates a new group and moves on.
Because I would use Java Streams, as I said at the beginning, here is my initial solution using a stream:
List<List<String>> result = new ArrayList<>(Arrays.stream(words)
.collect(Collectors.groupingBy(w -> Stream.of(w.split("")).sorted().collect(Collectors.joining()))).values());
To generate the sorted string keys to group the anagrams you can look here for more solutions.
The result of both of my solutions is:
[[eat, tea, ate], [bat], [tan, nat]]
I would have taken a slightly different approach using streams:
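import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map.Entry;
import java.util.stream.Collectors;
public class Scratch {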
public static void main(String[] args) {
String[] input = { "eat", "tea", "tan", "ate", "nat", "bat" };
List<List<String>> result = groupAnagrams(input);
System.out.println(result);
}
private static List<List<String>> groupAnagrams(String[] input) {
return Arrays.asList(input)
// create a list that wraps the array
.stream()
// stream that list
.map(Scratch::sortedToOriginalEntryFor)
// map each string we encounter to an entry containing
// its sorted characters to the original string
.collect(Collectors.groupingBy(Entry::getKey, Collectors.mapping(Entry::getValue, Collectors.toList())))
// create a map whose key is the sorted characters and whose
// value is a list of original strings that share the sorted
// characters: Map<String, List<String>>
.values()
// get all the values (the lists of grouped strings)
.stream()
// stream them
.collect(Collectors.toList());
// convert to a List<List<String>> per your req
}
// create an Entry whose key is a string of the sorted characters of original
// and whose value is original
private static Entry<String, String> sortedToOriginalEntryFor(String original) {
char c[] = original.toCharArray();
Arrays.sort(c);
String sorted = new String(c);
return new SimpleEntry<>(sorted, original);
}
}
This yields:
[[eat, tea, ate], [bat], [tan, nat]]
If you want to eliminate repeated strings (e.g. if "bat" appears twice in your input) then you can call toSet() instead of toList() in your Collectors.groupingBy call, and change the return type as appropriate.
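A possible shape for that toSet() variant is sketched below. It is an assumption about how the change could look rather than a drop-in edit of the code above: the class name and the sortedKey helper are illustrative, and the groups come back as List<Set<String>>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class GroupAnagramsDistinct {

    // Key shared by all anagrams of a word: its characters in sorted order.
    static String sortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Groups anagrams; each group is a Set, so a repeated input word appears once.
    static List<Set<String>> groupAnagrams(String[] input) {
        return new ArrayList<>(Arrays.stream(input)
                .collect(Collectors.groupingBy(GroupAnagramsDistinct::sortedKey, Collectors.toSet()))
                .values());
    }

    public static void main(String[] args) {
        String[] input = { "eat", "tea", "tan", "ate", "nat", "bat", "bat" };
        System.out.println(groupAnagrams(input));
    }
}
```

The order of the groups and of the words inside each set is not guaranteed, since both come from hash-based collections.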

Anagrams finding in java

I am stuck on a problem. I have a String array that consists of {"eat", "tea", "tan", "ate", "nat", "bat"}. I need to separate the words that are made of the same letters into groups. eat, tea and ate have the same letters, so they form one group. Group 2 should be tan, nat and group 3 should be bat. So I have to build a list of lists to store those groups.
My approach:
To solve this problem I first find the ASCII value of each letter and then add those values up for each word. For eat, I find the ASCII values of e, a and t and add them. I take this approach because words made of the same letters must have the same ASCII sum. After that I group the equal sums and find out which words have those sums; those words belong to the same group.
My progress
I compute the ASCII sums and put them in a HashMap. But then I could not group the equal values. Since I failed to group the ASCII sums, I cannot find the words that belong to them. I have no clue how to proceed.
I also looked at these posts:
post1
post2
But their approach and my approach are not the same, and the questions are different from mine. I am asking here about a different approach, one that depends on ASCII values.
My code:
public List<List<String>> groupAnagrams(String[] strs) {
ArrayList<Character>indivistr=new ArrayList<>();
ArrayList<Integer>dup=new ArrayList<>();
HashMap<Integer,Integer>mappingvalues=new HashMap<>();
for(int i=0;i<strs.length;i++){
int len=strs[i].length();
int sum=0;
for(int j=0;j<len;j++){
indivistr.add(strs[i].charAt(j));
int ascii=(int)strs[i].charAt(j);
sum=sum+ascii;
}
mappingvalues.put(i,sum);
}
}
One more approach
I transfer the map keys in a Arraylist and map values in a ArrayList. Something like that,
ArrayList<Integer>key_con=new ArrayList<>(mappingvalues.keySet());
ArrayList<Integer>val_con=new ArrayList<>(mappingvalues.values());
Then using two loops and put the same values into another list.
for(int k=0;k<val_con.size();k++){
for(int k1=k+1;k1<val_con.size();k1++){
if(val_con.get(k).equals(val_con.get(k1))){
dup.add(val_con.get(k1));
}
}
}
Now if I print dup output will be [314, 314, 314, 323] which is partially correct. It should be 314,314,314,323,323,311
This should get you started.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class Main {
public static void main(String args[]) throws Exception {
String[] words ={"eat", "tea", "tan", "ate", "nat", "bat"};
for(List<String> list : groupAnagrams(words))
System.out.println(list);
}
public static List<ArrayList<String>> groupAnagrams(String[] words) {
List<ArrayList<String>> wordGroups = new ArrayList<ArrayList<String>>();
HashMap<Integer, ArrayList<String>> map = new HashMap<Integer, ArrayList<String>>();
for(String word : words) {
int sum = 0;
for(char c : word.toCharArray())
sum += c;
if(map.containsKey(sum))
map.get(sum).add(word);
else {
ArrayList<String> list = new ArrayList<String>();
list.add(word);
map.put(sum, list);
}
}
for(ArrayList<String> list : map.values())
wordGroups.add(list);
return wordGroups;
}
}
This program will work for small scale things such as this but consider the following input data:
{"ad", "bc"}
The character sums of these two strings are both 197 (97 + 100 and 98 + 99), even though they are not anagrams.
Since you're using ASCII sums to find anagrams, you can run into collisions like this, and mixing lowercase letters and capitals adds more of them. Normalizing case with String.toUpperCase() takes care of the latter, but a sum can never rule out collisions completely; a key such as the word's sorted characters can.
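To make the collision concrete, here is a small sketch (class and method names are illustrative) that compares the character-sum key with a sorted-characters key for two words that are not anagrams of each other.

```java
import java.util.Arrays;

public class SumCollision {

    // Sum of the character codes, as used for the grouping key above.
    static int charSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    // Alternative key: the word's characters in sorted order.
    static String sortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        // "ad" and "bc" are not anagrams, yet 97 + 100 == 98 + 99
        System.out.println(charSum("ad") + " " + charSum("bc"));
        // the sorted keys stay different
        System.out.println(sortedKey("ad") + " " + sortedKey("bc"));
        // real anagrams still share a key
        System.out.println(sortedKey("tea").equals(sortedKey("eat")));
    }
}
```

Swapping the Integer key of the map for the sorted string would keep the rest of the grouping code unchanged.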
For posterity:
public class anagrams {
public static void main(String args[]) {
int numberOfAnagrams = 0;
String[] stringArray = {"eat", "tea", "tan", "ate", "nat", "bat", "plate", "knot"};
List<String> stringList = Arrays.asList(stringArray);
for(int i = 0; i < stringList.size() - 1; i++) {
for(int j = i + 1; j < stringList.size(); j++) {
if(isAnagram(stringList.get(i), stringList.get(j))) {
System.out.println(stringList.get(i) + " " + stringList.get(j));
numberOfAnagrams += 2;
}
}
}
System.out.println(numberOfAnagrams);
}
private static boolean isAnagram(String s1, String s2) {
// In order for two String to be anagrams, they must have the same length.
if(s1.length() != s2.length()) {
return false;
}
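// Two strings are anagrams exactly when their sorted characters are equal.
// (Checking only that s2 contains each char of s1 would accept "aab" and "abb".)
char[] c1 = s1.toCharArray();
char[] c2 = s2.toCharArray();
Arrays.sort(c1);
Arrays.sort(c2);
return Arrays.equals(c1, c2);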
}
}
Based on the ASCII approach, I have written working code:
public static void main(String[] args) {
String[] values ={"eat", "tea", "tan", "ate", "nat", "bat"};
Map<Integer, List<String>> resultMap = new HashMap<Integer, List<String>>();
for (String value : values) {
char[] caharacters = value.toLowerCase().toCharArray();
int asciSum = 0;
for (char character : caharacters) {
asciSum = asciSum + (int)character;
}
System.out.println(asciSum);
if(resultMap.containsKey(asciSum)) {
resultMap.get(asciSum).add(value);
}else {
List<String> list = new ArrayList<String>();
list.add(value);
resultMap.put(asciSum, list);
}
}
System.out.println(resultMap);
}
This will give result
{323=[tan, nat], 311=[bat], 314=[eat, tea, ate]}
But different characters can have the same ASCII sum (for example 10 + 11 = 20 + 1).
The code below avoids that by using the sorted string as the key of the result map:
public static void main(String[] args) {
String[] values ={"eat", "tea", "tan", "ate", "nat", "bat"};
Map<String, List<String>> resultMap = new HashMap<String, List<String>>();
for (String value : values) {
char[] caharacters = value.toLowerCase().toCharArray();
Arrays.sort(caharacters);
String sortedValue = new String(caharacters);
System.out.println(sortedValue);
if(resultMap.containsKey(sortedValue)) {
resultMap.get(sortedValue).add(value);
}else {
List<String> list = new ArrayList<String>();
list.add(value);
resultMap.put(sortedValue, list);
}
}
System.out.println(resultMap);
}
This will return
{aet=[eat, tea, ate], abt=[bat], ant=[tan, nat]}
I have fixed the comments and edits provided.
Here's my idea: first I would create a class that stores the original string and its sorted version:
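class Anagram {
final String s;
final String sorted;
// constructor and getters used by the snippets below
Anagram(String s, String sorted) { this.s = s; this.sorted = sorted; }
String getS() { return s; }
String getSorted() { return sorted; }
}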
Then I map the input to my list of Anagram:
List<Anagram> collect = Arrays.stream(strs)
.map(a -> new Anagram(a, Arrays.stream(a.split(""))
.sorted()
.reduce(String::concat).get()))
.collect(Collectors.toList());
Then I just group the obtained list by sorted string:
Map<String, List<Anagram>> groupBy = collect
.stream()
.collect(Collectors.groupingBy(Anagram::getSorted));
Now you have the lists with grouped anagrams, just extract from them the original string:
List<List<String>> result = new ArrayList<>();
for(List<Anagram> list : groupBy.values()) {
List<String> myList = list.stream().map(Anagram::getS).collect(Collectors.toList());
result.add(myList);
}

How can I find the most frequent word in a huge amount of words (eg. 900000)

I am facing a task: generate 900000 random words and then print out the most frequent one. Here is my algorithm:
1. move all words into a collection rather than printing them out
2. for (900000...){move the frequency of Collection[i] to another collection B}
** 900,000 * 900,000 comparisons is too much for a computer (too inefficient)
3. find the biggest number in that collection and the index.
4. then B[index] is output.
But the thing is that my computer cannot handle the second step. So I searched this website and found some answers about finding the frequency of a word in a bunch of words, and I looked at the answer code, but I haven't found a way to apply it to a huge amount of words.
Now I show my code here:
/** Funny Words Generator
* Tony
*/
import java.util.*;
public class WordsGenerator {
//data field (can be accessed in whole class):
private static int xC; // define a xCurrent so we can access it all over the class
private static int n;
private static String[] consonants = {"b","c","d","f","g","h","j","k","l","m","n","p","r","s","t","v","w","x","z"};
private static String[] vowels = {"a", "e", "i", "o", "u"};
private static String funnyWords = "";
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int times = 900000; // words number
xC = sc.nextInt(); // seeds (only input)
/* Funny word list */
ArrayList<String> wordsList = new ArrayList<String>();
ArrayList<Integer> frequencies = new ArrayList<Integer>();
int maxFreq;
for (int i = 0; i < times; i++) {
n = 6; // each word is 6 characters long
funnyWords = ""; // reset the funnyWords each new time
for (int d = 0; d < n; d ++) {
int letterNum = randomGenerator(); /* random generator will generate numbers based on current x */
int letterIndex = 0; /* letterNum % 19 or % 5 based on condition */
if ((d + 1) % 2 == 0) {
letterIndex = letterNum % 5;
funnyWords += vowels[letterIndex];
}
else if ((d + 1) % 2 != 0) {
letterIndex = letterNum % 19;
funnyWords += consonants[letterIndex];
}
}
wordsList.add(funnyWords);
}
/* put the frequency of each word into a list called frequencies */
for (int i = 0; i < 900000; i++) {
frequencies.add(Collections.frequency(wordsList, wordsList.get(i)));
}
maxFreq = Collections.max(frequencies);
int index = frequencies.indexOf(maxFreq); // get the index of the most frequent word
System.out.print(wordsList.get(index));
sc.close();
}
/** randomGenerator
* takes no parameters; uses the current value of xC (initially the seed)
* return: updates xC and returns it */
private static int randomGenerator() {
int a = 445;
int c = 700001;
int m = 2097152;
xC = (a * xC + c) % m; // update
return xC; // return
}
}
So I have realized that maybe there is a way to skip the second step somehow. Can anyone give me a hint? Just a hint, not code, so I can try it myself, would be great! Thanks!
Edit:
I see that lots of your answers contain "words.stream()". I googled it and couldn't find it. Could you please tell me where I can learn about this? Which class is this stream method in? Thank you!
You can do it using Java lambdas and streams (requires JDK 8 or later). Also note that several words in your list can share the same highest frequency.
import java.util.*;
import java.util.stream.Collectors;
public class Main {
public static void main(String[] args) {
List<String> words = new ArrayList<>();
words.add("World");
words.add("Hello");
words.add("World");
words.add("Hello");
// Imagine we have 900000 words in the word list
Set<Map.Entry<String, Integer>> set = words.stream()
// Here we create a map of unique words and calculate their frequencies
.collect(Collectors.toMap(word -> word, word -> 1, Integer::sum)).entrySet();
// Find the max frequency
int max = Collections
.max(set, (a, b) -> Integer.compare(a.getValue(), b.getValue())).getValue();
// We can have words with the same frequency like in my words list. Let's get them all
List<String> list = set.stream()
.filter(entry -> entry.getValue() == max)
.map(Map.Entry::getKey).collect(Collectors.toList());
System.out.println(list); // [Hello, World]
}
}
This can basically be broken down into two steps:
Compute the word frequencies as a Map<String, Long>. There are several options for this, see this question for examples.
Compute the maximum entry of this map, where "maximum" refers to the entry with the highest value.
So if you're really up to it, you can write this very compactly:
private static <T> T maxCountElement(List<? extends T> list)
{
return Collections.max(list.stream().collect(Collectors.groupingBy(
Function.identity(), Collectors.counting())).entrySet(),
(e0, e1) -> Long.compare(e0.getValue(), e1.getValue())).getKey();
}
Edited in response to the comment:
The compact representation may not be the most readable. Breaking it down makes the code a bit elaborate, but may make clearer what is happening there:
private static <T> T maxCountElement(List<? extends T> list)
{
// A collector that receives the input elements, and converts them
// into a map. The key of the map is the input element. The value
// of the map is the number of occurrences of the element
Collector<T, ?, Map<T, Long>> collector =
Collectors.groupingBy(Function.identity(), Collectors.counting());
// Create the map and obtain its set of entries
Map<T, Long> map = list.stream().collect(collector);
Set<Entry<T, Long>> entrySet = map.entrySet();
// A comparator that compares two map entries based on their value
Comparator<Entry<T, Long>> comparator =
(e0, e1) -> Long.compare(e0.getValue(), e1.getValue());
// Compute the maximum element of the set of entries. That is,
// the entry with the largest value (which is the entry for the
// element with the maximum number of occurrences)
Entry<T, Long> entryWithMaxValue =
Collections.max(entrySet, comparator);
return entryWithMaxValue.getKey();
}
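To try the method, the snippets above still need imports (from java.util, java.util.function and java.util.stream). Below is a minimal runnable variant, assuming a hypothetical class name MaxCountDemo and a made-up word list. It takes a plain List<T> and uses Map.Entry.comparingByValue() in place of the hand-written comparator, which should behave the same for this purpose.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MaxCountDemo {
    // Returns an element with the highest number of occurrences.
    // Collections.max throws NoSuchElementException for an empty list.
    static <T> T maxCountElement(List<T> list) {
        // Map each distinct element to the number of times it occurs
        Map<T, Long> counts = list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Pick the entry with the largest count and return its key
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Made-up sample words
        List<String> words = Arrays.asList("babeba", "cecece", "babeba");
        System.out.println(maxCountElement(words)); // prints babeba
    }
}
```

If several elements share the highest count, only one of them is returned, and which one depends on the map's iteration order.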
A HashMap gives fast (expected constant-time) lookups. Just loop through the words, use each word as a key in the HashMap, and keep a counter for that word as the value.
HashMap<String, Integer> hashMapVariable = new HashMap<>();
...
// inside the loop over the words
if (hashMapVariable.containsKey(word)) {
hashMapVariable.put(word, hashMapVariable.get(word) + 1);
} else {
hashMapVariable.put(word, 1);
}
...
For each key (word), just increment the value associated with the key, although you first have to check whether the key exists (in Java, hashMapVariable.containsKey("key")). If it exists, increment the value; otherwise add the key to the HashMap. This way you are not storing the whole data set again: each distinct word is stored once as a key, with the number of times it occurs as its value.
At the end of the loop the most frequent word will have the highest counter/value.
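A rough, self-contained version of that idea might look like the following. It is only a sketch: the class name WordCountSketch, the method names countWords and mostFrequent, and the sample words are made up for illustration. It uses Map.merge (available since Java 8) to fold the containsKey check and the put into one call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCountSketch {
    // One pass over the words: each distinct word becomes a key,
    // and its value is the number of times it has been seen.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Inserts 1 for a new word, otherwise adds 1 to the stored count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Scans the counts once and returns a word with the highest count,
    // or null if the map is empty.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample words
        List<String> words = Arrays.asList("babeba", "cecece", "babeba", "dododo");
        System.out.println(mostFrequent(countWords(words))); // prints babeba
    }
}
```

Both loops are linear, so the work grows with the number of words rather than with its square.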
You can use a HashMap where the key is the word and the value is the number of times it occurs.
A sketch is below:
String demo(String[] strs){
int maxFrequency = 0;
String maxFrequencyStr = "";
Map<String,Integer> map = new HashMap<String,Integer>();
for(int i = 0; i < strs.length;i++){//for
if(map.containsKey(strs[i])){
int times = map.get(strs[i]);
map.put(strs[i], times+1);
if(maxFrequency<times+1){
maxFrequency = times + 1;
maxFrequencyStr = strs[i];
}
}
else{
map.put(strs[i], 1);
if(maxFrequency<1){
maxFrequency = 1;
maxFrequencyStr = strs[i];
}
}
}//for
return maxFrequencyStr;
}
