Finding the 5 most common words in a text

I have a school task, part of which asks us to write a method that finds the 5 most common words in a .txt file.
The task asks us to put all the words in an ArrayList, which I have already done. The real problem is getting the program to print the top 5 words in the text file.
The only "clue" I have is the method signature:
public Words[] common5(){
}

Iterate through the ArrayList and, for each word in the list, put the word into a HashMap where the key is the word and the value is an Integer that you increase every time you find the word again. At the end, iterate through the entries of the HashMap and find the 5 largest counts. Print the words that belong to them.
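
A minimal sketch of that counting approach. The class name CommonWords and the method mostCommon are made up for illustration, and it returns plain strings instead of the assignment's Words[] type, whose definition isn't shown in the question. Breaking ties alphabetically is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommonWords {
    // Returns up to n words ordered by descending count.
    // Assumption: words with equal counts are ordered alphabetically.
    static List<String> mostCommon(List<String> words, int n) {
        // Count how often each word occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }

        // Sort the (word, count) entries by count, highest first.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first n words.
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(mostCommon(words, 5));
    }
}
```

Sorting all entries is the simplest option for a school task. For very large vocabularies a bounded priority queue would avoid sorting everything, but that is probably more than the assignment needs.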

Related

compare one file with three other files

I am writing a larger program in Java for poem analysis. I have one text file with a poem and three text files with word lists. I want to compare my poem against the three word lists, so that my program reports, for example: this poem has three words from the list MomentoMori, 0 words from the list vanitas and 0 words from the list carpe diem.
My problem is: I know how to read files in Java, but I don't know how to compare them.
I thought I should convert the text files into strings and then compare them, but I don't know how. I tried something, but it only compares the words of the first line of the poem with the first word of the word list.
Can anybody help me? It is only one percent of my program, but this step is very important.
Thank you all
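
One possible way to do the comparison, as a rough sketch only: put each word list into a HashSet and check which words of the poem are in it. The class PoemCompare and the method countMatches are hypothetical names. It assumes the poem and the lists have already been read into memory, that matching should ignore case, and that each matching word is counted once no matter how often it appears in the poem.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PoemCompare {
    // Counts how many distinct words of the poem appear in the given word list.
    static int countMatches(String poem, List<String> wordList) {
        // Normalise the word list so lookups ignore case and stray spaces.
        Set<String> listWords = new HashSet<>();
        for (String word : wordList) {
            listWords.add(word.trim().toLowerCase(Locale.ROOT));
        }

        // Split the whole poem (all lines) on anything that is not a letter.
        Set<String> poemWords = new HashSet<>();
        for (String token : poem.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                poemWords.add(token);
            }
        }

        // Keep only the poem words that also occur in the list.
        poemWords.retainAll(listWords);
        return poemWords.size();
    }

    public static void main(String[] args) {
        String poem = "Dust to dust,\nthe skull remembers";
        List<String> list = Arrays.asList("skull", "dust", "grave");
        System.out.println("Words from the list: " + countMatches(poem, list));
    }
}
```

Calling countMatches once per word list gives the three numbers for the report. Multi-word entries such as "carpe diem" would need a phrase search instead, which this sketch does not cover.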

How can I find whether a word in a text file is also present in another text file?

I have been asked to find whether any of the words in a text file are present in another text file. One of the files contains 10 random words on separate lines which form a 10x10 grid whereas the other file contains 2656 words on separate lines. I need to find if any of the words from the 2656 are a part of any of the 10. It is almost like a word search where I need to compare each of the 2656 words to every line and column of the 10 words in the other text file.
I know how to import the text files and use "try" and "catch" to do so. I am stuck from here. I know that I need to create two loops, one for the rows and one for the columns. I also need to convert each of the 10 lines of words into strings so I can use "String.contains(string)" to compare each of the 2656 words to the 10 lines of words to see if they match. The 10 words can be thought of as a 10x10 grid.
An example of the 10 words might be:
fndgsdgawe
fjshellofj
fslkdfmkls
sfmkbyefkf
fsmflsfmkl
sfmJavadfm
smfjknmfkj
gjforloopj
mgfslgmlgs
gsnmgkjnsg
An example of the 2656 words might be:
Hello
Bye
Java
ForLoop
NestedLoop
As the output, I need to do it in the format of:
Hello: row 1, position 3
The "matching word" will be the word that matches in both files, the row number will correspond to the row of the 10x10 grid of words that it's found in and the same with the position which can be thought of as the column. Both the row and position start from index 0. I need to use trim() to remove trailing and leading spaces and all occurrences that happen more than once need to be output on a separate line. I am very new at coding and understand the logic behind working it out but I am unable to write it. Can you please help a beginner out? Thanks!
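
A minimal sketch of one way to approach this, using String.indexOf instead of String.contains so the position is available. GridSearch, findMatches and collect are made-up names. It assumes both files are already read into lists of lines, that all grid rows have the same length, and that matching ignores case (the sample grid has "hello" while the word list has "Hello"). The question gives no output format for column matches, so the "column c, position r" wording here is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class GridSearch {
    // Finds every occurrence of each word in the rows and columns of the grid.
    static List<String> findMatches(List<String> grid, List<String> words) {
        List<String> rows = new ArrayList<>();
        for (String line : grid) {
            rows.add(line.trim().toLowerCase(Locale.ROOT));
        }

        // Build one string per column so columns can be searched like rows.
        List<String> columns = new ArrayList<>();
        int width = rows.isEmpty() ? 0 : rows.get(0).length();
        for (int c = 0; c < width; c++) {
            StringBuilder column = new StringBuilder();
            for (String row : rows) {
                column.append(row.charAt(c)); // assumes a rectangular grid
            }
            columns.add(column.toString());
        }

        List<String> results = new ArrayList<>();
        for (String raw : words) {
            String word = raw.trim();
            if (word.isEmpty()) {
                continue;
            }
            String needle = word.toLowerCase(Locale.ROOT);
            collect(results, word, needle, rows, "row");
            collect(results, word, needle, columns, "column");
        }
        return results;
    }

    // Adds one result line per occurrence of needle in each of the given lines.
    private static void collect(List<String> out, String word, String needle,
                                List<String> lines, String label) {
        for (int i = 0; i < lines.size(); i++) {
            int pos = lines.get(i).indexOf(needle);
            while (pos >= 0) {
                out.add(word + ": " + label + " " + i + ", position " + pos);
                pos = lines.get(i).indexOf(needle, pos + 1);
            }
        }
    }

    public static void main(String[] args) {
        List<String> grid = Arrays.asList(
                "fndgsdgawe", "fjshellofj", "fslkdfmkls", "sfmkbyefkf", "fsmflsfmkl",
                "sfmJavadfm", "smfjknmfkj", "gjforloopj", "mgfslgmlgs", "gsnmgkjnsg");
        List<String> words = Arrays.asList("Hello", "Bye", "Java", "ForLoop", "NestedLoop");
        for (String line : findMatches(grid, words)) {
            System.out.println(line);
        }
    }
}
```

With the sample grid from the question, "Hello" is reported at row 1, position 3, which agrees with the expected output format. Words that occur more than once produce one line per occurrence because the inner loop keeps calling indexOf from the next position.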

Printing match from tree

I am trying to create a "word completion" tree program in Java from a dictionary that is a text file, but I am not sure where to go from here. The word completion program will match any words that start with the string entered. I am new to Java and programming. I have designed the tree as a multi-way tree with each node storing a character as a letter and a boolean variable to indicate whether it is the end of a word (amongst other things).
I am at the point where I am trying to see whether my reading of the file into the tree works correctly. However, when I try to print my tree, it does not work correctly: the first letter of every word after the first word is not displayed correctly. Instead of reading from the file, for testing purposes I am simply adding only 4 words to the tree (Base, Basement, Ma, Matthew).
So my question is can anyone tell me why it is not printing correctly AND what I need to do next in order to finish the word completion?
Thank you so much in advance to everyone for taking the time to help me with my problem

It's this part:
while (t != null) {
    if (t.down != null && t.right != null) {
        //System.out.println(t.letter + " children");
        //System.out.print(t.letter);
        print(t.down);
    }
    t = t.right;
}
When you encounter another word that you should print, you start it from t.down. You can, for example, store all the letters up to that node on a shared stack, print them, and then proceed to print the other letters from the tree.
The issue here is that t.down is the next letter (from the point of view of the current node) in some other word.
Try adding more words with common starting substrings to see my point more easily.
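
To illustrate the idea of carrying the letters seen so far, here is a rough sketch and not the asker's actual code. The field names letter, down and right are taken from the snippet above; WordTree, Node, endOfWord, add, words and complete are assumed names. A StringBuilder plays the role of the stack: a letter is pushed before descending and popped before moving to the right sibling.

```java
import java.util.ArrayList;
import java.util.List;

class WordTree {
    static class Node {
        char letter;
        boolean endOfWord;
        Node down;   // first child: the next letter of longer words
        Node right;  // next sibling: another letter at the same position
        Node(char letter) { this.letter = letter; }
    }

    private Node root; // first node in the top-level sibling chain

    void add(String word) {
        Node parent = null;
        Node level = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node node = level;
            Node last = null;
            while (node != null && node.letter != c) { // search the sibling chain
                last = node;
                node = node.right;
            }
            if (node == null) {                        // letter not present yet
                node = new Node(c);
                if (last != null) last.right = node;
                else if (parent != null) parent.down = node;
                else root = node;
            }
            if (i == word.length() - 1) node.endOfWord = true;
            parent = node;
            level = node.down;
        }
    }

    // All words in the tree, each rebuilt from the letters on the current path.
    List<String> words() {
        return complete("");
    }

    // All words that start with the given prefix.
    List<String> complete(String prefix) {
        List<String> out = new ArrayList<>();
        Node level = root;
        Node match = null;
        for (int i = 0; i < prefix.length(); i++) {
            match = level;
            while (match != null && match.letter != prefix.charAt(i)) match = match.right;
            if (match == null) return out;             // prefix is not in the tree
            level = match.down;
        }
        if (match != null && match.endOfWord) out.add(prefix);
        collect(level, new StringBuilder(prefix), out);
        return out;
    }

    private static void collect(Node t, StringBuilder path, List<String> out) {
        while (t != null) {
            path.append(t.letter);                     // push this letter
            if (t.endOfWord) out.add(path.toString());
            collect(t.down, path, out);
            path.setLength(path.length() - 1);         // pop before going right
            t = t.right;
        }
    }

    public static void main(String[] args) {
        WordTree tree = new WordTree();
        for (String w : new String[] {"Base", "Basement", "Ma", "Matthew"}) tree.add(w);
        System.out.println(tree.words());
        System.out.println(tree.complete("Ma"));
    }
}
```

The same collect method serves both printing and completion: completion just walks down to the node for the last prefix letter first and starts collecting from its children with the prefix already on the path.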

guess words using dictionary

I am guessing the key of a less-simple simple substitution ciphertext. The criterion I use to evaluate the correctness of the key is the number of English words in the putative decryption.
Are there any tools in Java that can count the number of English words in a string? For example,
"thefoitedstateswasat"-> 4 words
"thefortedxyzstateswasathat"->5 words.
I loaded a word list and am using a HashSet as a dictionary. As I don't know where the inter-word spaces belong in the text, I can't validate words with a simple dictionary lookup.
Thanks.

I gave an answer to a similar question here:
If a word is made up of two valid words
It has some Java-esque pseudocode in it that might be adaptable into something that solves this problem.

Sorry, I'm new and don't have the rep to comment yet.
But wouldn't the code be very slow, as the number of checks and permutations is very big?
I guess you just have to brute-force your way through with (n-1) nested for loops, and then search the dictionary for each substring.

Surely there's a better way to test the accuracy of your key?
But that's not the point, here's what I'd do:
Using "quackdogsomethinggodknowswhat"
I'd have a recursive method that, starting at the beginning of the string, calls itself for every dictionary word the string starts with (in this case "qua" and "quack"), passing the string with that word removed ("dogsomethinggodknowswhat" for "quack"). Return whichever is greater: 1 + the greatest value returned by those calls, OR 0 + the result of the call for the string starting at index 1 ("uackdogsomethinggodknowswhat").
This would probably work best if you kept your wordlist in a tree of some sort.
If you need some pseudocode, ask!
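
A sketch of that recurrence, under a few assumptions. Instead of plain recursion it fills a table from the end of the string backwards, which computes the same values but evaluates each suffix only once (this is a swapped-in technique, not what the answer literally describes). WordCounter and maxWords are made-up names, and the tiny dictionaries in main are only for demonstration. It uses a HashSet as in the question rather than a tree.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WordCounter {
    // Returns the largest number of non-overlapping dictionary words
    // that can be found in text, reading left to right.
    static int maxWords(String text, Set<String> dictionary) {
        int maxLen = 0;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }

        // best[i] holds the answer for text.substring(i); best[text.length()] is 0.
        int[] best = new int[text.length() + 1];
        for (int i = text.length() - 1; i >= 0; i--) {
            // Option 1: skip the character at i ("0 + the call for index 1").
            best[i] = best[i + 1];
            // Option 2: take any dictionary word that starts at i ("1 + the rest").
            int limit = Math.min(text.length(), i + maxLen);
            for (int end = i + 1; end <= limit; end++) {
                if (dictionary.contains(text.substring(i, end))) {
                    best[i] = Math.max(best[i], 1 + best[end]);
                }
            }
        }
        return best[0];
    }

    public static void main(String[] args) {
        Set<String> small = new HashSet<>(Arrays.asList("the", "states", "was", "at"));
        System.out.println(maxWords("thefoitedstateswasat", small));

        Set<String> other = new HashSet<>(Arrays.asList(
                "qua", "quack", "dog", "some", "thing", "something", "god", "knows", "what"));
        System.out.println(maxWords("quackdogsomethinggodknowswhat", other));
    }
}
```

On the speed concern: each start position is checked against at most maxLen substrings, so the work grows roughly with text length times the longest word length, not with the number of possible segmentations. Note that the score depends heavily on the dictionary; a list full of one- and two-letter words will inflate the count for almost any key.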

Find string section that contains another String, with possible intervening words

For the last project of the semester, the goal is to run searches for a particular phrase on a lyric String inside a Song object, then rank the results based on the length of the substring match. The lyrics were read from a file and match the line breaks in that file.
For example, searching for "She loves you" would return these in the sample matches:
The Beatles: "... She loves you, yeah, yeah, yeah ..." Rank= 13 characters
Bonnie Raitt: "... She just loves you ..." Rank= 18 characters
Elvis Presley: "... You're asking if she loves me\r\nWell, you don't know..." Rank= 23 characters
As you can see from the last example, matches can span multiple lines.
I have all the songs in a TreeMap<String, TreeSet<Song>>, so I get all the songs that match the first word in the query. The difficulty I'm having is searching the String for matches, since a regex won't work in this situation.
When the Song object is constructed, I dump the lyrics into a Set to run searches for a single word, and to do that I use String.split("[^a-zA-Z]") to separate out the individual words and weed out the punctuation marks. So I want to run my search on that array. The process I'm using goes like this:
break up the query into a String array
for each Song in the set
    if (song.lyrics.contains(query))
        great, break loop to next song
    otherwise
        int queryCounter = 0;
        find first index point in String array that matches query[queryCounter]
        using that as the start point, iterate through the String array for matches
When the iteration is complete, a Rank object is created to hold the Song, search phrase, start point and end points of the array section that matches. In the Rank object is a method to count the number of characters and compensate for whitespace to calculate the rank. This is then inserted into a PriorityQueue, where the top ten matches will be pulled from the original matchSet.
The problem is that this doesn't prevent false positives, and match ranks can get skewed. For example, Aerosmith's Beyond Beautiful contains "... she loves me she loves you not ..." With my process, I will match "... she loves me she loves you not...", so instead of a rank of 13, I will get a rank of 27.
What changes are necessary for me to weed out the false positives and incorrect rankings?

I would like to add to what jjinguy said:
Basically, in the 'otherwise' block, after you find the first index that matches the start, you also have to look for possible other start points, and reset your start if you find another one.
I would keep a list of all possible matches in a song, and finally use the one that has the best rank. Simply resetting the start point might not catch the match with the best rank.
Maybe that isn't the best way, but the concern is still there.
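
A minimal sketch of the "try every start point and keep the best rank" idea. PhraseSearch and bestRank are hypothetical names, and the Song and Rank classes from the question are left out. It assumes the lyrics are already split into a word array with empty strings removed, that matching ignores case, and that the rank is the character length of the matched words joined by single spaces, which only approximates the ranks quoted in the question.

```java
class PhraseSearch {
    // Returns the smallest rank over all possible start points, or -1 if the
    // query words never occur in order in the lyrics.
    static int bestRank(String[] lyricWords, String[] queryWords) {
        if (queryWords.length == 0) {
            return -1;
        }
        int best = -1;
        for (int start = 0; start < lyricWords.length; start++) {
            // Every occurrence of the first query word is a candidate start.
            if (!lyricWords[start].equalsIgnoreCase(queryWords[0])) {
                continue;
            }
            // From this start, match the remaining query words in order,
            // always taking the earliest possible position.
            int next = 1;
            int end = start;
            for (int i = start + 1; i < lyricWords.length && next < queryWords.length; i++) {
                if (lyricWords[i].equalsIgnoreCase(queryWords[next])) {
                    next++;
                    end = i;
                }
            }
            if (next < queryWords.length) {
                continue; // not all query words were found after this start
            }
            // Rank: word lengths plus one space between neighbouring words.
            int rank = end - start;
            for (int i = start; i <= end; i++) {
                rank += lyricWords[i].length();
            }
            if (best < 0 || rank < best) {
                best = rank;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] lyrics = "she loves me she loves you not".split(" ");
        String[] query = "She loves you".split(" ");
        System.out.println(bestRank(lyrics, query));
    }
}
```

For the Aerosmith line, the candidate starting at the first "she" spans six words, while the one starting at the second "she" spans three and gives a rank of 13, so keeping the minimum over all candidates removes the skewed ranking. This is quadratic in the number of lyric words in the worst case, which should be acceptable for song-length texts.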
