Creating an anagram dictionary

I have to create an anagram dictionary using a hashtable. I take a word from the user and have to output all the anagrams of that word from my anagram dictionary.
This is my current approach: I'm writing a hash function that calculates a hash for each word, so that words that are anagrams of each other get the same hash and are put in the same slot in the hashtable.
The part I'm having difficulty with is this: once I have created the map and applied my hash function to a user-supplied word to get the index into the hashtable, how can I return all the values stored at that index?
This is my code so far:
fis = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
System.out.println("Total file size to read (in bytes) : " + fis.available());
String content = new String();
while ((content = br.readLine()) != null) {
singleAddress.add(content);
}
for(int i = 0; i<singleAddress.size(); i++)
{
char[] chars = singleAddress.get(i).toCharArray();
Arrays.sort(chars);
int hash = 0;
for(int j = 0; j<chars.length; j++)
{
hash = 2*hash + (int)chars[j];
}
numbers.put(singleAddress.get(i), hash);
System.out.println(hash + " " + i);
}
I believe this will create the anagram dictionary in the hashtable, but I'm not sure how to return all the values at a given index.

I'd use a Map<String, List<String>> (or better, a Google Guava Multimap<String, String>) and then apply your logic:
1. make a lower-case version of your word
2. sort the characters of the lower-case version to form a key
3. use that key to put the word into the map
When the user provides input, you repeat steps 1 and 2 but use get(key) in step 3, and voilà, you have your list of anagrams.
Example:
Word = Anna -> key = aann
User input = nana -> key = aann
Then you do dictionary.get("aann") and should get the list containing the element "Anna".
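The three steps above can be sketched as follows. This is a minimal illustration, not the asker's program: the class name AnagramDictionary and the methods keyOf, add and anagramsOf are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class for illustration only.
class AnagramDictionary {
    private final Map<String, List<String>> byKey = new HashMap<>();

    // Steps 1 and 2: lower-case the word, then sort its characters to form the key.
    static String keyOf(String word) {
        char[] chars = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Step 3: store the word in the list that belongs to its key.
    void add(String word) {
        byKey.computeIfAbsent(keyOf(word), k -> new ArrayList<>()).add(word);
    }

    // Lookup: same key computation, but get() instead of put().
    List<String> anagramsOf(String input) {
        return byKey.getOrDefault(keyOf(input), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramDictionary dictionary = new AnagramDictionary();
        dictionary.add("Anna");
        dictionary.add("listen");
        dictionary.add("silent");
        System.out.println(dictionary.anagramsOf("nana"));   // [Anna]
        System.out.println(dictionary.anagramsOf("enlist")); // [listen, silent]
    }
}
```

Because the sorted string itself is the key, the map's own hashCode()/equals() handling takes care of bucket collisions, so no hand-written hash function is needed.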
Edit: Issues with your code
You don't show the declaration of singleAddress and numbers, but since you call singleAddress.get(i) I assume they are a List<String> and a Map<String, Integer>.
In numbers the key is the word and the value is the hash. You'd have to iterate over all entries in that map in order to retrieve all words with the same hash. Better swap it around.
The hash function might result in collisions, i.e. the same hash value for non-anagrams. As an example take "ad" and "bb": with 'a' = 97, 'b' = 98 and 'd' = 100, the hash for "ad" would be 2 * 97 + 100 = 294 and for "bb" it would be 2 * 98 + 98 = 294. That's why hash sets and maps in Java always use hashCode() and equals(): hashCode() is used to find the bucket, which is a list in the map, while equals() is then used to check whether the keys are actually the same.
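The collision risk can be checked directly. The sketch below re-implements the hash loop from the question in a hypothetical method questionHash and feeds it one illustrative pair of non-anagrams, "ad" and "bb", that end up with the same value.

```java
import java.util.Arrays;

// Hypothetical demo class; questionHash mirrors the loop from the question.
class HashCollisionDemo {
    static int questionHash(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        int hash = 0;
        for (char c : chars) {
            hash = 2 * hash + c;
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(questionHash("ad")); // 2 * 97 + 100 = 294
        System.out.println(questionHash("bb")); // 2 * 98 + 98  = 294
        // Real anagrams still agree, because the characters are sorted first.
        System.out.println(questionHash("tea") == questionHash("eat")); // true
    }
}
```

So an equal hash is necessary but not sufficient for two words to be anagrams; the sorted strings still have to be compared.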

I would use a
Map<String, Collection<String>>
whose key is the "sorted string" for a particular word and whose value is the collection of all words in your dictionary that can be made from the letters of that key.
For example:
Key: EILNST
Value: [ELINTS, ENLIST, INLETS, LISTEN, SILENT, TINSEL]
So, if you want to search for the word "Listen", sort its letters to get the key, look up the key to get all the anagrams, and then exclude the word itself from the retrieved list.
Refer to this solution:
Best algorithm to find anagram of word from dictonary
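A minimal sketch of that lookup, including the step of excluding the searched word from the result. The class AnagramLookup and its methods are hypothetical names, and upper-casing all words is an assumption made to match the example key EILNST.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper class for illustration only.
class AnagramLookup {
    // Upper-case the word and sort its letters, e.g. "Listen" -> "EILNST".
    static String sortedKey(String word) {
        char[] chars = word.toUpperCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    static Map<String, Collection<String>> build(List<String> words) {
        Map<String, Collection<String>> dictionary = new HashMap<>();
        for (String word : words) {
            dictionary.computeIfAbsent(sortedKey(word), k -> new TreeSet<>())
                      .add(word.toUpperCase(Locale.ROOT));
        }
        return dictionary;
    }

    // Copies the stored collection so that removing the query word does not alter the dictionary.
    static Collection<String> anagramsOf(Map<String, Collection<String>> dictionary, String word) {
        Collection<String> result =
                new TreeSet<>(dictionary.getOrDefault(sortedKey(word), Collections.emptySet()));
        result.remove(word.toUpperCase(Locale.ROOT));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Collection<String>> dictionary = build(
                Arrays.asList("ELINTS", "ENLIST", "INLETS", "LISTEN", "SILENT", "TINSEL"));
        System.out.println(anagramsOf(dictionary, "Listen"));
        // [ELINTS, ENLIST, INLETS, SILENT, TINSEL]
    }
}
```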

Related

counting number of occurrences of words in a text java

So I'm building a TreeMap from scratch and I'm trying to count the number of occurrences of every word in a text using Java. The text comes from a text file, and reading it is not a problem. What I don't know is how to count every word; can someone help?
Imagine the text is something like:
Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.
Output:
Over 1
time 1
computer 1
algorithms 5
...
If possible, I want to ignore case, so that upper- and lower-case occurrences are counted together.
EDIT: I don't want to use any sort of Map (e.g. HashMap) or something similar to do this.

Break down the problem as follows (this is one potential solution, not THE solution):
Split the text into words (create a list or array of words).
Remove punctuation marks.
Create your map to collect results.
Iterate over your list of words and add 1 to the value of each encountered key.
Display the results (iterate over the map's entrySet).
Split the text into words
My preference is to split words using a space as the delimiter. The reason is that, if you split on non-word characters, you may miss some hyphenated words. Although hyphenation is becoming less common, plenty of words still follow this rule; for example, middle-aged. If such a word is encountered, it might have to be treated as one word and not two.
Remove punctuation marks
Because of the decision above, you first need to remove punctuation characters that might be attached to your words. Keep in mind that if you use a regular expression to split the words, you might be able to accomplish this step at the same time as the step above. In fact, that would be preferred so that you don't have to iterate twice; do both in a single pass. While you are at it, call toLowerCase() on the input string to eliminate the ambiguity between capitalized and lowercase words.
Create your map to collect results
This is where you are going to collect your count, using the TreeMap implementation of the Java Map. One thing to be aware of with this particular implementation is that the map is sorted according to the natural ordering of its keys. In this case, since the keys are the words from the input text, the keys will be arranged in alphabetical order, not by the magnitude of the count. If sorting the entries by count is important, there is a technique where you "reverse" the map, making the values the keys and the keys the values. However, since two or more words could have the same count, you will need to create a new map of <Integer, Set<String>>, so that you can group together words with the same count.
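The "reversed" map can be sketched like this. The class CountGrouping and method groupByCount are hypothetical names, and ordering the counts from highest to lowest is an assumption about what "sorting by count" should mean here.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class for illustration only.
class CountGrouping {
    // Turns word -> count into count -> set of words, highest count first.
    static Map<Integer, Set<String>> groupByCount(Map<String, Integer> wordCount) {
        Map<Integer, Set<String>> byCount = new TreeMap<>(Collections.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            byCount.computeIfAbsent(entry.getValue(), k -> new TreeSet<>()).add(entry.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new TreeMap<>();
        wordCount.put("algorithms", 5);
        wordCount.put("for", 2);
        wordCount.put("other", 2);
        wordCount.put("over", 1);
        System.out.println(groupByCount(wordCount));
        // {5=[algorithms], 2=[for, other], 1=[over]}
    }
}
```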
Iterate over your list of words
At this point, you should have a list of words and a map structure to collect the count. Using a lambda expression, you should be able to perform a count() of your words very easily. But if you are not familiar or comfortable with lambda expressions, you can use a regular looping structure to iterate over your list, do a containsKey() check to see if the word was encountered before, get() the value if the map already contains the word, and then add 1 to the previous value. Lastly, put() the new count in the map.
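As a middle ground between a full stream pipeline and the containsKey()/get()/put() sequence, Java 8 also offers Map.merge, which performs the "insert 1 or add 1" step in a single call. A small sketch, with MergeCount as a hypothetical class name:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration only.
class MergeCount {
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> wordCount = new TreeMap<>();
        for (String word : words) {
            // Stores 1 for a new word, otherwise adds 1 to the existing count.
            wordCount.merge(word, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(count(new String[] {"b", "a", "b"})); // {a=1, b=2}
    }
}
```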
Display results
Again, you can use a lambda expression to print the entry set's key-value pairs, or simply iterate over the entry set to display the results.
Based on all of the above points, a potential solution could look like this (not using lambdas, for the OP's sake):
public static void main(String[] args) {
String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
text = text.replaceAll("\\p{P}", ""); // replace all punctuations
text = text.toLowerCase(); // turn all words into lowercase
String[] wordArr = text.split(" "); // create list of words
Map<String, Integer> wordCount = new TreeMap<>();
// Collect the word count
for (String word : wordArr) {
if(!wordCount.containsKey(word)){
wordCount.put(word, 1);
} else {
int count = wordCount.get(word);
wordCount.put(word, count + 1);
}
}
Iterator<Entry<String, Integer>> iter = wordCount.entrySet().iterator();
System.out.println("Output: ");
while(iter.hasNext()) {
Entry<String, Integer> entry = iter.next();
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
This produces the following output
Output:
advantage: 1
algorithms: 5
and: 1
combine: 1
computer: 1
each: 1
engineers: 1
even: 1
for: 2
in: 1
invent: 1
more: 1
new: 1
of: 2
other: 2
others: 1
over: 1
producing: 1
results: 2
take: 1
the: 1
things: 1
time: 1
to: 1
turn: 1
utilize: 1
with: 1
work: 1
Why did I break down the problem like this for such a mundane task? Simple. I believe each of those discrete steps should be extracted into functions to improve code reusability. Yes, it is cool to use a lambda expression to do everything at once and make your code look much simpler. But what if you need to perform some intermediate step over and over? Most of the time, code gets duplicated to accomplish this. In reality, a better solution is often to break these tasks into methods. Some of these tasks, like transforming the input text, can be done in a single method since those activities are related in nature. (There is such a thing as a method doing "too little.")
public String[] createWordList(String text) {
return text.replaceAll("\\p{P}", "").toLowerCase().split(" ");
}
public Map<String, Integer> createWordCountMap(String[] wordArr) {
Map<String, Integer> wordCountMap = new TreeMap<>();
for (String word : wordArr) {
if(!wordCountMap.containsKey(word)){
wordCountMap.put(word, 1);
} else {
int count = wordCountMap.get(word);
wordCountMap.put(word, count + 1);
}
}
return wordCountMap;
}
public void displayCount(Map<String, Integer> wordCountMap) {
Iterator<Entry<String, Integer>> iter = wordCountMap.entrySet().iterator();
while(iter.hasNext()) {
Entry<String, Integer> entry = iter.next();
System.out.println(entry.getKey() + ": " + entry.getValue());
}
}
Now, after doing that, your main method looks more readable and your code is more reusable.
public static void main(String[] args) {
WordCount wc = new WordCount();
String text = "...";
String[] wordArr = wc.createWordList(text);
Map<String, Integer> wordCountMap = wc.createWordCountMap(wordArr);
wc.displayCount(wordCountMap);
}
UPDATE:
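One small detail I forgot to mention: if a HashMap is used instead of a TreeMap, the output is no longer in alphabetical order. Note that a HashMap does not sort by count either. Its iteration order is unspecified; entries are placed into buckets by the hash of the key, not the value, so any ordering by count that appears is a coincidence of this input and must not be relied upon. If you need the output ordered by count, you still have to sort the entries or "reverse" the map as described above. After switching to HashMap, the output I got started as follows: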
Output:
algorithms: 5
other: 2
for: 2
turn: 1
computer: 1
producing: 1
...
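Since a HashMap's iteration order is not guaranteed, ordering by count is safer when done explicitly. A minimal sketch, assuming ties should be broken alphabetically (that tie-break and the class name SortByCount are choices made for this example):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only.
class SortByCount {
    static List<Map.Entry<String, Integer>> sortedByCountDesc(Map<String, Integer> wordCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCount.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordCount = new HashMap<>();
        wordCount.put("turn", 1);
        wordCount.put("other", 2);
        wordCount.put("algorithms", 5);
        wordCount.put("for", 2);
        for (Map.Entry<String, Integer> entry : sortedByCountDesc(wordCount)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```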

My suggestion is to use a regular expression with split, and a stream with grouping (see example 3).
EX1: this solution does not use a collection (List/Map), only an array; in my opinion it is not optimal.
@Test
public void testApp2() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
final String lowerText = text.toLowerCase();
final String[] split = lowerText.split("\\W+");
System.out.println("Output: ");
for (String s : split) {
if (s == null) {
continue;
}
int count = 0;
for (int i = 0; i < split.length; i++) {
final boolean sameWord = s.equals(split[i]);
if (sameWord) {
count = count + 1;
split[i] = null;
}
}
System.out.println(s + " " + count);
}
}
EX2: I think this is what you mean, but I'm not sure whether using a List already goes too far.
@Test
public void testApp() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
final String[] split = text.split("\\W+");
final List<String> list = new ArrayList<>();
System.out.println("Output: ");
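for (String s : split) {
final String word = s.toLowerCase(); // compare case-insensitively
if (!list.contains(word)) { // print each distinct word only once
list.add(word);
final long count = Arrays.stream(split).filter(word::equalsIgnoreCase).count();
System.out.println(word + " " + count);
}
}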
}
EX3: below is a test for your example, but it uses a Map.
@Test
public void test() {
final String text = "Over time, computer engineers take advantage of each other's work and invent algorithms for new things. Algorithms combine with other algorithms to utilize the results of other algorithms, in turn producing results for even more algorithms.";
Map<String, Long> result = Arrays.stream(text.split("\\W+")).collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
assertEquals(Long.valueOf(5), result.get("algorithms"));
System.out.println("Output: ");
result.entrySet().stream().forEach(x -> System.out.println(x.getKey() + " " + x.getValue()));
}

comparing Hashmaps by different String Keys

I have two HashMaps and want to compare them as fast as possible. The problem is that each key of mapA consists of two words joined by a space, while each key of mapB is only one word.
I don't want to count the occurrences, that is already done; I want to compare the two different kinds of keys.
mapA:
key: hello world, value: 10
key: earth hi, value: 20
mapB:
key: hello, value: 5
key: world, value: 15
key: earth, value: 25
key: hi, value: 35
The first key of mapA should find the keys "hello" and "world" in mapB.
What I'm trying to do is parse a long text to find co-occurrences and compute a value for how often they occur relative to all words.
My first try:
for(String entry : mapA.keySet())
{
String key = (String) entry;
Integer mapAvalue = (Integer) mapA.get(entry);
Integer tokenVal1=0, tokenVal2=0;
String token1=key.substring(0, key.indexOf(" "));
String token2=key.substring(key.indexOf(" "),key.length()).trim();
for( String mapBentry : mapb.keySet())
{
String tokenkey = mapBentry;
if(tokenkey.equals(token1)){
tokenVal1=(Integer)tokens.get(tokenentry);
}
if(tokenkey.equals(token2)){
tokenVal2=(Integer)tokens.get(tokenentry);
}
if(token1!=null && token2!=null && tokenVal1>1000 && tokenVal2>1000 ){
procedurecall(mapAvalue, token1, token2, tokenVal1, tokenVal2);
}
}
}

You shouldn't iterate over a HashMap (O(n)) if you are just trying to find a particular key; that's what the HashMap lookup (O(1)) is for. So eliminate your inner loop.
You can also eliminate a few unnecessary variables in your code (e.g. key, tokenkey). You don't need a third tokens map either; you can look up the token values in mapb.
for (String entry : mapA.keySet())
{
    Integer mapAvalue = (Integer) mapA.get(entry);
    String token1 = entry.substring(0, entry.indexOf(" "));
    String token2 = entry.substring(entry.indexOf(" "), entry.length()).trim();
    if (mapb.containsKey(token1) && mapb.containsKey(token2))
    {
        // look up the tokens:
        Integer tokenVal1 = (Integer) mapb.get(token1);
        Integer tokenVal2 = (Integer) mapb.get(token2);
        if (tokenVal1 > 1000 && tokenVal2 > 1000)
        {
            procedurecall(mapAvalue, token1, token2, tokenVal1, tokenVal2);
        }
    }
}
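For reference, here is a self-contained sketch of the same direct-lookup idea using the sample data from the question. The class PairLookup and method lookupPair are hypothetical names; the threshold check and procedurecall from the original code are left out because their details are not shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class PairLookup {
    // Returns {count of first word, count of second word},
    // or null if the key is not a two-word key or either word is missing from mapB.
    static int[] lookupPair(String pairKey, Map<String, Integer> mapB) {
        String[] tokens = pairKey.split(" ", 2);
        if (tokens.length < 2) {
            return null;
        }
        Integer first = mapB.get(tokens[0]);          // O(1) lookup, no inner loop
        Integer second = mapB.get(tokens[1].trim());
        if (first == null || second == null) {
            return null;
        }
        return new int[] {first, second};
    }

    public static void main(String[] args) {
        Map<String, Integer> mapB = new HashMap<>();
        mapB.put("hello", 5);
        mapB.put("world", 15);
        mapB.put("earth", 25);
        mapB.put("hi", 35);
        System.out.println(Arrays.toString(lookupPair("hello world", mapB))); // [5, 15]
        System.out.println(Arrays.toString(lookupPair("earth hi", mapB)));    // [25, 35]
    }
}
```

Using get() and checking for null needs one hash lookup per word, where containsKey() followed by get() needs two.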

Assigning a character a random integer from a string

So I am trying to make an anagram tool in Java in which you enter a word/string and it outputs an anagram of that word. There are probably easier or better ways to do this than what I am about to show, but I am still curious. Here's what I wanted to do:
Let's say the word is: apple
What I want to do is assign each character of that string a randomInt(100). So, as an example:
a - 35, p - 54, p - 98, l - 75, e - 13
After that, I want my program to sort the numbers from least to greatest and then print the "new" string made of the characters assigned to those numbers, least to greatest. In my case, the anagram would be:
eaplp
All said and done, where I am stuck is how to assign each character of the string a random number without actually changing that character into that number, and then print the new, rearranged string as described above. Pseudocode or real code would be great.
Thanks

Use a TreeMap<Integer, Character>. Basic idea as follows:
TreeMap<Integer, Character> myMap = new TreeMap<Integer, Character>();
for (int i = 0; i < myString.length(); i++) {
myMap.put((int)(Math.random() * 100), myString.charAt(i));
}
for (Map.Entry<Integer, Character> entry : myMap.entrySet()) {
System.out.print(entry.getValue());
}
System.out.println();
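A TreeMap automatically sorts the entries by key; thus, you don't have to perform a separate sort. Be aware, though, that a map holds only one value per key: if two characters happen to draw the same random number, the later put() overwrites the earlier one and a character is lost.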
An easier way to code anagrams, though, is to convert the string to a character list, then use Collections.shuffle(). Basic idea:
List<Character> myLst = new ArrayList<Character>();
for (char c : myString.toCharArray())
myLst.add(c);
Collections.shuffle(myLst);
for (Character c : myLst)
System.out.print(c);
System.out.println();
There may be some compile errors in the above; I wrote it without checking, but the process should work.
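If you want to stay close to the idea in the question (give every character a random number, then sort by that number), one way to avoid losing characters when two numbers coincide is to sort (number, index) pairs instead of using the number as a map key. A sketch under that assumption; RandomKeyAnagram and scramble are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical helper class for illustration only.
class RandomKeyAnagram {
    static String scramble(String word, Random random) {
        // Pair each character position with a random number below 100.
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            pairs.add(new int[] {random.nextInt(100), i});
        }
        // Sort by the random number; equal numbers simply stay next to each other.
        pairs.sort(Comparator.comparingInt(p -> p[0]));
        StringBuilder result = new StringBuilder();
        for (int[] pair : pairs) {
            result.append(word.charAt(pair[1]));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(scramble("apple", new Random()));
    }
}
```

The output always has the same letters as the input; only their order changes from run to run.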

If you are using Java 8 a straightforward solution is a shuffled list of indices:
String word = "apple";
List<Integer> indices = IntStream.range(0, word.length()).boxed().collect(Collectors.toList());
Collections.shuffle(indices);
indices.stream().map(word::charAt).forEach(System.out::print);
This can be done via an intermediate Map but it's a bit awkward and harder to follow:
Random random = new Random();
Map<Integer, Character> map = new TreeMap<>();
IntStream.range(0, word.length()).forEach(i -> map.put(random.nextInt(), word.charAt(i)));
map.entrySet().stream().map(Map.Entry::getValue).forEach(System.out::print);
Or you can put it all in a single (hard to read) stream operation:
word.chars().boxed().collect(Collectors.toMap(c -> random.nextInt(), Function.identity()))
.entrySet().stream().sorted(Map.Entry.comparingByKey())
.map(e -> Character.toChars(e.getValue()))
.forEach(System.out::print);

Java program - Counts all the words from a text file, and counts frequency of each word

I'm a beginner programmer and I'm trying to write a program that opens a text file containing a large text and counts how many words it contains.
Then it should report how many different words are in the text, and the frequency of each word.
My intention was to use one String array to store all the unique words and one int array to store the frequencies.
The program counts the words, but I'm unsure how to write the code that builds the list of words and the frequency with which each is repeated in the text.
I wrote this:
import easyIO.*;
import java.util.*;
class Oblig3A{
public static void main(String[] args){
int cont = 0;
In read = new In("alice.txt");
In read2 = new In("alice.txt");
while(read.endOfFile() == false)
{
String info = read.inWord();
System.out.println(info);
cont = cont + 1;
}
System.out.println(UniqueWords);
final int AN_WORDS = cont;
String[] words = new String[AN_WORDS];
int[] frequency = new int[AN_WORDS];
int i = 0;
while(read2.endOfFile() == false){
words[i] = read2.inWord();
i = i + 1;
}
}
}

OK, here is what you need to do:
1. Use a BufferedReader to read the lines of text from the file, one by one.
2. Create a HashMap<String,Integer> to store the word-to-frequency mapping.
3. For each line of text, use split() to get all the words in the line as a String[] array.
4. Iterate over each word. For each word, retrieve its value from the HashMap. If you get a null value, you have found the word for the first time, so put it in the HashMap with the value 1. If you get a non-null value, increment it and put it back in the HashMap.
5. Repeat this until you reach EOF.
Done!
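The steps above can be sketched as follows. The class WordFrequency and method count are hypothetical names, the method takes a Reader so it can be tried without a file, and splitting on "\\W+" (runs of non-word characters) is an assumption, since the steps only say to use split().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class WordFrequency {
    static Map<String, Integer> count(Reader source) {
        Map<String, Integer> frequency = new HashMap<>();                // step 2
        try (BufferedReader reader = new BufferedReader(source)) {       // step 1
            String line;
            while ((line = reader.readLine()) != null) {                 // step 5: null means EOF
                for (String word : line.split("\\W+")) {                 // step 3
                    if (word.isEmpty()) {
                        continue; // split can produce an empty leading token
                    }
                    Integer old = frequency.get(word);                   // step 4
                    frequency.put(word, old == null ? 1 : old + 1);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return frequency;
    }
}
```

To read the file from the question you would pass new FileReader("alice.txt"); the number of different words is then frequency.size().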

You can use a
Map<String, Integer> map = new HashMap<String, Integer>();
and then add the words to the map, checking whether each word is already there. If it is not, add it to the map with a counter initialized to 1; otherwise increment its counter.
if(!map.containsKey(word))
{
map.put(word, 1);
}
else
{
map.put(word, map.get(word) + 1);
}
In the end you will have a map with all the words that the file contains, each with an Integer that represents how many times the word appears in the text.

You basically need a hash here. In Java, you can use a HashMap<String, Integer>, which will store the words and their frequencies.
So when you read in a new word, look it up in the HashMap, say h; if it exists, increase its frequency, otherwise add it as a new word with frequency = 1.

If you can use a library you may want to consider using a Guava Multiset, it has the counting functionality already built in:
public void count() throws IOException {
Multiset<String> countSet = HashMultiset.create();
BufferedReader bufferedReader = new BufferedReader(new FileReader("alice.txt"));
String line;
while ((line = bufferedReader.readLine()) != null) {
List<String> words = Arrays.asList(line.split("\\W+"));
countSet.addAll(words);
}
bufferedReader.close();
for (Entry<String> entry : countSet.entrySet()) {
System.out.println("word: " + entry.getElement() + " count: " + entry.getCount());
}
}

how to find duplicate and unique string entries using Hashtable

Assume I'm taking a string as input from the command line and I want to find the duplicate and unique entries in the string by using a Hashtable.
eg:
i/p:
hi hello bye hi good hello name hi day hi
o/p:
Unique elements are: bye, good, name, day
Duplicate elements are:
hi 4 times
hello 2 times

You can break the input apart by calling split(" ") on the input String. This will return a String[] in which each element is one word. Iterate over this array, and use each String as the key into your Hashtable, with the value being an Integer. Each time you encounter a word, increment its value, starting from 0 if no value is there yet.
Hashtable<String, Integer> hashtable = new Hashtable<String, Integer>();
String[] splitInput = input.split(" ");
for(String inputToken : splitInput) {
Integer val = hashtable.get(inputToken);
if(val == null) {
val = new Integer(0);
}
++val;
hashtable.put(inputToken, val);
}
Also, you may want to look into HashMap rather than Hashtable. HashMap is not thread safe, but is faster. Hashtable is a bit slower, but is thread safe. If you are trying to do this in a single thread, I would recommend HashMap.
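Once the counts are in the map, splitting them into unique and duplicate entries is a matter of checking each value. A minimal sketch: it swaps in a LinkedHashMap (not the Hashtable asked for) purely so the words print in first-seen order, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only.
class UniqueAndDuplicates {
    static Map<String, Integer> count(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Words that occur exactly once.
    static List<String> unique(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("hi hello bye hi good hello name hi day hi");
        System.out.println("Unique elements are: " + String.join(", ", unique(counts)));
        System.out.println("Duplicate elements are:");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                System.out.println(entry.getKey() + " " + entry.getValue() + " times");
            }
        }
    }
}
```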

Use a hashtable with string as key and a numeric type as counter.
Go through all the words and if they are not in the map, insert them; otherwise increase the count (the data part of the hashtable).
hth
Mario

You can convert each string into an integer and then use the generated integer as the hash value. To convert a string to an integer, you can treat it as a base-256 number and convert it.
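A rough sketch of the base-256 idea. Because such a number quickly exceeds the range of int or long, this version uses BigInteger and takes the string's UTF-8 bytes as the base-256 digits; both choices, the class name Base256 and the bucketFor helper are assumptions made for this example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class Base256 {
    // Interprets the bytes of the string as the digits of a non-negative base-256 number.
    static BigInteger toNumber(String s) {
        return new BigInteger(1, s.getBytes(StandardCharsets.UTF_8));
    }

    // Reduces the number to a bucket index for a table with the given number of slots.
    static int bucketFor(String s, int slots) {
        return toNumber(s).mod(BigInteger.valueOf(slots)).intValue();
    }

    public static void main(String[] args) {
        System.out.println(toNumber("ab")); // 97 * 256 + 98 = 24930
        System.out.println(bucketFor("ab", 101));
    }
}
```

Equal strings always produce the same number, which is what counting duplicates needs; different strings can still share a bucket after the modulo step, so the words themselves must be compared as well.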
