I need to compute the occurrence frequency of each word in a given document. The occurrence frequency is the number of times a word appears in the document. My program takes a text file as input.
The basic functionality of computing word frequencies will be implemented using a linked list. Each node in the list contains a word, its frequency, and a pointer to the next node. The basic algorithm is as follows:
If the word is NOT found in the list
Add this word to the list
Set its frequency to 1
Else
Increment the word’s frequency by 1 //As this word already exists in the list
This is my Word class:
public class Word {
private String word ;
private int frequency;
public Word() {
}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
This is my DocumentWords class:
public class DocumentWords {
private LinkedList<Word> list = new LinkedList<>();
private Word word = new Word();
String test = " ";
Boolean flag = false ;
int count=0;
int c ;
File document = new File("E:\\Jooo\\Project 2 (All Files)\\Project 2 (All Files)\\aa.txt");
Scanner sc;
public DocumentWords() {
try {
this.sc = new Scanner(document);
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
while (sc.hasNext()) {
test = sc.next();
// System.out.println(test);
if (list.isEmpty()) {
word.setWord(test);
word.setFrequency(1);
list.add(word);
}
else {
System.out.println(list.size());
while(count!=list.size()){
if ((list.get(count).getWord()).equals(test)) {
flag = true;
c=count;
}
count++;
}
if(flag){
list.get(c).setFrequency((list.get(c).getFrequency())+1);
}
else {
word.setWord(test);
word.setFrequency(1);
list.add(word);
}
}
}
System.out.println("Word : "+ list.get(5).getWord()+"---Freq : "+list.get(5).getFrequency());
}
}
This code does not run correctly.
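Three things in the DocumentWords code above would explain the wrong results: the same `Word` instance is added to the list every time (so every node ends up showing the last word read), `count` and `flag` are never reset between tokens, and `list.get(5)` fails when fewer than six distinct words were read. A minimal sketch of the algorithm with those points addressed might look like this. The names `WordFrequencySketch`, `Entry`, and `countWords` are hypothetical, and the input comes from a string rather than the original file path.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;

class WordFrequencySketch {

    // Hypothetical stand-in for the Word class: a word plus its frequency.
    static final class Entry {
        final String word;
        int frequency;

        Entry(String word) {
            this.word = word;
            this.frequency = 1; // an entry is created on the word's first occurrence
        }
    }

    // Counts whitespace-separated tokens, keeping entries in first-seen order.
    static List<Entry> countWords(Scanner in) {
        List<Entry> list = new LinkedList<>();
        while (in.hasNext()) {
            String token = in.next();
            Entry found = null; // search state is reset for every token
            for (Entry e : list) {
                if (e.word.equals(token)) {
                    found = e;
                    break;
                }
            }
            if (found != null) {
                found.frequency++;          // word already exists in the list
            } else {
                list.add(new Entry(token)); // a fresh object per distinct word
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Entry> result = countWords(new Scanner("the cat and the hat and the bat"));
        for (Entry e : result) {
            System.out.println("Word : " + e.word + "---Freq : " + e.frequency);
        }
    }
}
```

Iterating over the list with a for-each loop also avoids repeated `list.get(i)` calls, each of which walks a `LinkedList` from one end.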
I have a text file and need to build a concordance out of it. I believe I need a method to update the line number and word count in my WordCount class, but I am having trouble working out how to do it. I know the method should be of type void, since it just updates and doesn't return any value, but I'm stuck on what to write. I've put a comment in the tester class where I think this method should go. Provided below are my Tester, WordCount, and CircularList classes. I appreciate any help on this, thanks.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Tester
{
public static final int WordsPerLine = 10;
public static void main() throws FileNotFoundException
{
//build then output hash table
HashTable ht = new HashTable();
System.out.println(ht.toString());
String word; //read from input file
WordCount wordToFind; //search for this in the bst
WordCount wordInTree; //found in the bst
//create generic BST, of WordCount here
BSTree<WordCount> t = new BSTree<WordCount>();
//want to read word at a time from input file
Scanner wordsIn = new Scanner(new File("Hamlet.txt"));
wordsIn.useDelimiter("[^A-Za-z']+");
int wordCount = 0;
int lineNum = 1;
System.out.printf("%3d: ", lineNum);
while (wordsIn.hasNext()) {
word = wordsIn.next();
++wordCount;
System.out.print(word + " ");
word = word.toLowerCase();
if(t.find(new WordCount(word)) != null){
wordToFind= new WordCount(word);
wordInTree= t.find(wordToFind);
//I need to have a method here that update word count and line number
}
if (wordCount % WordsPerLine == 0) {
++lineNum;
System.out.printf("\n%3d: ", lineNum);
}
}
//EOF
System.out.println();
//print bst in alpha order
System.out.println(t.toString());
}
}
public class WordCount implements Comparable<WordCount>
{
protected String word;
protected int count;
protected CircularList lineNums;
//required for class to compile
public int compareTo(WordCount other)
{
return word.compareTo(other.word);
}
{
word = "";
count = 0;
lineNums= new CircularList();
}
public WordCount(String w)
{
word = w;
count = 0;
lineNums= new CircularList();
}
public String toString()
{
return String.format("%-12s %3d %3d", word, count, lineNums);
}
}
public class CircularList
{
private Item list;
public CircularList()
{
list = null;
}
public Boolean isEmpty()
{
return list == null;
}
public void append(int x)
{
Item r = new Item(x);
if (isEmpty()) {
r.next = r;
}
else {
r.next = list.next;
list.next = r;
}
list = r;
}
public int nextLine(int x)
{
Item r= new Item(x);
if (!isEmpty()) {
r = list.next;
while (r != list) {
r = r.next;
}
//append last item
}
return r.info;
}
public String toString()
{
StringBuilder s = new StringBuilder("");
if (!isEmpty()) {
Item r = list.next;
while (r != list) {
s.append(r.info + ", ");
r = r.next;
}
//append last item
s.append(r.info);
}
return s.toString();
}
}
Instead, I suggest you focus now on putting WordCount objects into the bst. Then you'll have something that will print out.
So, to put WordCount objects into the bst, in pseudocode, I suggest doing:
create a new WordCount object to find. Set this to the word, count of 1, for lineNums create a new circular list object and use CL::append() to add the lineNum to it
try to find this in the tree
if word is found in the tree
//I need to have a method here that update word count and line number <- don't worry about this bit until you've got words into the tree
else // word is not found in bst, so insert it
use insertBST() to insert the new WordCount object into the tree
Once you get some data into the tree, then it will print out after the wordsIn.hasNext() while loop, at the t.toString().
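The find-then-insert-or-update flow in the pseudocode above can be sketched roughly as follows. Since the `BSTree` class and its `find()` and `insertBST()` methods are not shown, this sketch substitutes `java.util.TreeMap` as the ordered structure. The class `Occurrences` and its method `addOccurrence` are hypothetical stand-ins for `WordCount` and the update method being asked about, and a plain `ArrayList` stands in for `CircularList`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.SortedMap;
import java.util.TreeMap;

class ConcordanceSketch {

    // Hypothetical stand-in for WordCount: a count plus the lines where the word appeared.
    static final class Occurrences {
        int count = 0;
        final List<Integer> lineNums = new ArrayList<>();

        // The "update" step: bump the count and remember the line number.
        void addOccurrence(int lineNum) {
            count++;
            lineNums.add(lineNum);
        }
    }

    static SortedMap<String, Occurrences> build(Scanner wordsIn, int wordsPerLine) {
        SortedMap<String, Occurrences> tree = new TreeMap<>(); // iterates in alphabetical order
        int wordCount = 0;
        int lineNum = 1;
        while (wordsIn.hasNext()) {
            String word = wordsIn.next().toLowerCase();
            Occurrences entry = tree.get(word);
            if (entry == null) {          // word not found: insert a new entry
                entry = new Occurrences();
                tree.put(word, entry);
            }
            entry.addOccurrence(lineNum); // found or just inserted: update it
            wordCount++;
            if (wordCount % wordsPerLine == 0) {
                lineNum++;                // same line-break rule as the Tester class
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("To be or not to be that is the question");
        in.useDelimiter("[^A-Za-z']+");
        build(in, 4).forEach((w, o) ->
                System.out.println(w + " " + o.count + " " + o.lineNums));
    }
}
```

With your own classes, the same shape would apply: the update method on `WordCount` would increment `count` and call `lineNums.append(lineNum)`.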
I read words from a text file and then create a new Word object for each word and store the objects into an ArrayList. The text of the word is passed into the object as a parameter. I have overridden the equals(Object) and hashCode() method of the word class to check for the equality of objects based on text of a word instead of object memory location. I am trying to store all unique words in ArrayList as unique objects and increment the occurrence of the word object if the word repeats in the text file.
Scanner file = new Scanner(new File(textfile));
ArrayList<Word> words = new ArrayList<Word>();
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
w.increaseCount();
} else {
words.add(w);
}
}
The Word class is:
public class Word {
private String text;
private int count;
public Word(String wordText) {
text = wordText;
}
public void increaseCount() {
count += 1;
}
@Override
public boolean equals(Object wordToCompare) {
if (wordToCompare instanceof Word) {
Word castedWord = (Word) wordToCompare;
if (castedWord.text.equals(this.text)) {
return true;
}
}
return false;
}
@Override
public int hashCode() {
return text.hashCode();
}
}
Unique words get added to the ArrayList, but my count does not increment. How can I increment the count?
The problem is with this statement in your code:
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
w.increaseCount(); // Here's what goes wrong.
} else {
words.add(w);
}
}
You're invoking increaseCount() on the newly created object, which is discarded on the next iteration, so the increment is lost. The object you actually want to update is the one already stored in the ArrayList. So your code should be changed like this:
Scanner file = new Scanner(new File(textfile));
ArrayList<Word> words = new ArrayList<Word>();
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
words.get(words.indexOf(w)).increaseCount(); // Note the change here.
} else {
w.increaseCount(); // This is for the first occurrence as 'count' is 0 initially.
words.add(w);
}
}
The problem is that you create a new instance of Word on every iteration of the loop.
When the list already contains an equal Word, you increase the count of the new instance, not of the existing instance that was added to the list earlier.
Consider using a Map for this problem, where the key is the word and the value is the count.
package example.stackoverflow;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class WordCount {
public static void main(String[] args) {
List<String> sourceList = Arrays.asList("ABC", "XYZ", "HGK", "ABC", "PWT", "HGK", "ABC");
Map<String, Integer> wordCount = new HashMap<>();
for (String word : sourceList) {
if (wordCount.get(word) != null) {
wordCount.put(word, wordCount.get(word) +1);
} else {
wordCount.put(word, 1);
}
}
System.out.println(wordCount);//output: {ABC=3, XYZ=1, PWT=1, HGK=2}
}
}
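The get-then-put pattern above can also be written with `Map.merge`, available since Java 8. The following is only a sketch of that variant: the class name `MergeCountSketch` is hypothetical, the words are read from a `Scanner` over a string instead of a file, and `TreeMap` is used purely so that the printed order is predictable.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class MergeCountSketch {

    static Map<String, Integer> countWords(Scanner in) {
        Map<String, Integer> counts = new TreeMap<>(); // keys are kept in sorted order
        while (in.hasNext()) {
            // merge stores 1 for a new key, otherwise adds 1 to the existing value
            counts.merge(in.next(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new Scanner("ABC XYZ HGK ABC PWT HGK ABC")));
        // prints {ABC=3, HGK=2, PWT=1, XYZ=1}
    }
}
```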
Check this answer:
Scanner file = new Scanner(new File(textfile));
ArrayList<Word> words = new ArrayList<Word>();
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
int index = words.indexOf(w);
Word w1 = words.get(index); // the instance already stored in the list
w1.increaseCount();
} else {
w.increaseCount(); // first occurrence: count starts at 0
words.add(w);
}
}
I need to store only 8-digit numbers (not words) from a file into an array; anything else should just be printed to the console. Once the numbers are in the array, I have to sort them and display the sorted list on the right side and the unsorted list on the left side.
I am stuck because the file contains commas: my code only works when the file has no commas or spaces. I am supposed to use the compareTo method and StringTokenizer. I know how both work, but the program does not do what I want; maybe I am using them in the wrong function. I also need to move the GUI functions into a separate file, and I am not sure what should go in that file.
public class Project1 {
static final int LIST_SIZE = 10;
static int ssnSize;
static String line;
static String[] ssnList;
static TextFileInput inFile;
static String inFileName = "Dates.txt"; //save the file in Lab12 folder on BB in your project folder
static JFrame myFrame;
static Container myContentPane;
static TextArea left, right;
public static void main(String[] args) {
initialize();
readNumbersFromFile(inFileName);
printSSNList(ssnList,ssnSize);
printSSNtoJFrame(myFrame,ssnSize);
}
public static void initialize() {
inFile = new TextFileInput(inFileName);
ssnList= new String[LIST_SIZE];
ssnSize=0;
line ="";
left = new TextArea();
right = new TextArea();
myFrame = new JFrame();
myFrame.setSize(400, 400);
myFrame.setLocation(200, 200);
myFrame.setTitle("");
myFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
}
public static void readNumbersFromFile(String fileName)
{
String ssn;
ssn = inFile.readLine();
while (ssn != null) {
assert (isValidDate(ssn)): "SSN not valid";
if (!isValidDate(ssn))
throw new IllegalArgumentException("Invalid SSN");
else
storeDates(ssn,ssnList);
ssn = inFile.readLine();
} //while
} //readSSNsFromFile
public static void printSSNList(String[] list, int size)
{
assert (isValidList(list)): "The array is not valid";
if (!isValidList(list)){
throw new IllegalArgumentException("Invalid list)");
}
for (int i=0;i<size;i++)
if (!isValidDate(list[i]))
System.out.println("Invalid SSN: "+list[i]);
else
System.out.println(list[i]);
}
public static void storeDates(String s, String[] list)
{
assert (isValidDate(s)): "The SSN is not valid";
assert (isValidList(list)): "The array is not valid";
if (isValidDate(s) && isValidList(list))
list[ssnSize++]=s;
assert (isValidList(list)):"Resulting list not valid";
}
public static void printSSNtoJFrame(JFrame jf, int size)
{
assert (isValidList(ssnList)): "The array is not valid";
if (!isValidList(ssnList)){
throw new IllegalArgumentException("Invalid list)");
}
jf.setLayout(new GridLayout(1, 2));
myContentPane = jf.getContentPane();
TextArea myLeftArea = new TextArea();
TextArea myRightTextArea = new TextArea();
myContentPane.add(myLeftArea);
myContentPane.add(myRightTextArea);
for (int i=0;i<size;i++)
{
if (!isValidDate(ssnList[i]))
myLeftArea.append("Invalid SSN: "+ssnList[i]+"\n");
else
{
myLeftArea.append(ssnList[i]+"\n");
}
}
sortOnlyNumbers(ssnList);
for(int j=0; j< size; j++)
{
myRightTextArea.append(ssnList[j]+"\n");
}
jf.setVisible(true);
}
private static void sortOnlyNumbers(String[] array)
{
List<Integer> indexes = new ArrayList<Integer>();
List<Integer> numbers = new ArrayList<Integer>();
for (int i = 0; i < array.length; i++) {
try {
numbers.add(Integer.parseInt(array[i]));
indexes.add(i);
} catch (NumberFormatException e) {
// don't care
}
}
Collections.sort(numbers, Collections.reverseOrder());
for (int i = 0; i < numbers.size(); i++) {
array[indexes.get(i)] = String.valueOf(numbers.get(i));
}
}
public static boolean isValidDate(String s)
{
if (s.length() != 8) {
throw new IllegalArgumentException("An SSN length must be 9");
}
for (int i=0;i<8;i++)
if (! Character.isDigit(s.charAt(i))) {
throw new IllegalArgumentException("SSN must have only digits.");
}
return (true);
}
public static boolean isValidList(String[] list)
{
if (list == null){
return false;
}
if (ssnSize == list.length){
return false;
}
return (true);
}
}
The text file contains the following:
20161001
20080912,20131120,19980927, \n
20020202,hello
20120104
You can use a regular expression to perform this. An appropriate one for your requirement would be:
(\d{8})
This regex matches groups of 8 consecutive digits in your input data.
I tested this using the snippet below and was able to retrieve all 8-digit numbers from your input string.
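import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Snippet {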
public static void main(String[] args) {
String input = "20161001 20080912,20131120,19980927, \n 20020202,hello 20120104";
List<String> matches = get8DigitNumbersOnly(input);
System.out.println(matches);
}
public static List<String> get8DigitNumbersOnly(String inputData) {
Pattern pattern = Pattern.compile("(\\d{8})"); // This is the regex.
Matcher matcher = pattern.matcher(inputData);
List<String> matches = new ArrayList<String>();
while(matcher.find()) {
String match = matcher.group();
matches.add(match);
}
return matches;
}
}
Give it a shot. Hope this helps!
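Since the question mentions StringTokenizer and compareTo, here is a rough sketch of an alternative that uses them. It splits each line on commas and whitespace, keeps tokens made of exactly eight digits, prints everything else, and sorts a copy of the result. Checking whole tokens also avoids one limitation of calling `find()` with `\d{8}`, which would accept the first eight digits of a longer run. The names `EightDigitSketch`, `isEightDigits`, and `collect` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

class EightDigitSketch {

    // True only for tokens made of exactly eight digits.
    static boolean isEightDigits(String token) {
        return token.matches("\\d{8}");
    }

    // Splits on commas and whitespace, keeps valid tokens, prints the rest.
    static List<String> collect(String input) {
        List<String> valid = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(input, ", \t\r\n");
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (isEightDigits(token)) {
                valid.add(token);
            } else {
                System.out.println("Not an 8-digit number: " + token);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        String input = "20161001\n20080912,20131120,19980927, \n20020202,hello\n20120104";
        List<String> unsorted = collect(input);
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted); // natural order uses String.compareTo
        System.out.println("Unsorted: " + unsorted);
        System.out.println("Sorted:   " + sorted);
    }
}
```

Because every kept token has the same width and contains only digits, sorting the strings with `compareTo` gives the same order as sorting the numbers. The two lists could then be appended to the left and right text areas.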
I've gotten to a point where I can read the file and output its text, but I'm not sure how to proceed with searching for a specific word and displaying its count.
There are many ways. If you're reading the file line by line, you can use the indexOf method of the String class to search each line for the text. You'd need to call it repeatedly to move through the line looking for additional occurrences.
See documentation on indexOf at:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String,%20int)
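As a rough sketch of the repeated-indexOf approach for a single line (the class and method names here are hypothetical):

```java
class IndexOfCountSketch {

    // Counts non-overlapping occurrences of word in line using indexOf(String, int).
    static int countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0; // an empty search string would match at every position
        }
        int count = 0;
        int from = line.indexOf(word);
        while (from != -1) {
            count++;
            from = line.indexOf(word, from + word.length()); // resume after this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("the cat sat on the mat by the door", "the")); // prints 3
    }
}
```

Note that this matches substrings, so searching for "the" would also count the one inside "other". If only whole words should count, the characters around each match would need to be checked as well.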
As I understand your question, if you are reading the text line by line, you can use recursion to count how many occurrences of the word appear in a line.
The following method counts how many times a word appears in a single line:
private static int numberOfLineOccurences;
public static int countNumberOfTimesInALine(String line, String word) {
if (line.indexOf(word) == -1) {
return numberOfLineOccurences;
} else {
numberOfLineOccurences++;
if (line.indexOf(word) + word.length() > line.length() -1 ) {
return numberOfLineOccurences;
}
return countNumberOfTimesInALine(
line.substring(line.indexOf(word) + word.length()), word );
}
}
In order to keep track of the first occurrence of the word in the file, along with the number of occurrences, I created a WordInfo class like this:
class WordInfo {
private int firstOccurenceLineNumber;
private int firstOccurenceColumnNumber;
private String word;
private int numberOfOccurences;
public String getWord() {
return word;
}
public int getNumberOfOccurences() {
return numberOfOccurences;
}
public WordInfo(String word) {
this.word = word;
}
public void upOccurrence() {
numberOfOccurences++;
}
public void upOccurrence(int numberOfTimes) {
numberOfOccurences+= numberOfTimes;
}
public int getFirstOccurenceLineNumber() {
return firstOccurenceLineNumber;
}
public void setFirstOccurenceLineNumber(int firstOccurenceLineNumber) {
this.firstOccurenceLineNumber = firstOccurenceLineNumber;
}
public int getFirstOccurenceColumnNumber() {
return firstOccurenceColumnNumber;
}
public void setFirstOccurenceColumnNumber(int firstOccurenceColumnNumber) {
this.firstOccurenceColumnNumber = firstOccurenceColumnNumber;
}
}
Now I can create my searchWord method. It takes the word to look for, the file path, and a WordInfo object to fill in as input parameters:
public static boolean searchWord(String word, String filePath, WordInfo wInfo) throws IOException {
boolean result = false;
boolean firstOccurenceFound = false;
int lineNumber = 0;
BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)));
String line = null;
while ( (line = reader.readLine()) != null) {
lineNumber++;
numberOfLineOccurences= 0;
if (line.indexOf(word) != -1) {
if (!result) {
result = true;
}
if (!firstOccurenceFound) {
firstOccurenceFound = true;
wInfo.setFirstOccurenceLineNumber(lineNumber);
wInfo.setFirstOccurenceColumnNumber(line.indexOf(word) + 1);
}
wInfo.upOccurrence(countNumberOfTimesInALine(line, word));
}
}
reader.close();
return result;
}
Here is an illustration and the result below
I have the following content in a file called DemoFile.txt
And I test the code using the following main method (I am looking for the word "concept", for example):
public static void main(String[] args) throws IOException {
WordInfo wInfo = new WordInfo("concept");
if ( searchWord("concept", FILE_PATH, wInfo)) {
System.out.println("Searching for " + wInfo.getWord());
System.out.println("First line where found : " + wInfo.getFirstOccurenceLineNumber());
System.out.println("First column found: " + wInfo.getFirstOccurenceColumnNumber());
System.out.println("Number of occurrences " + wInfo.getNumberOfOccurences());
}
}
And I obtain the following results:
The repeated items in the text file should not be added to the list, but this program is outputting every word from the list. I don't know why the hasElement method is not working. I need to create a program called MTFencoder.java that accepts the name of a text file as a command-line argument, such that if there exists a text file called story.txt then the program could be invoked with the following command:
java MTFencoder test.txt
It should produce one line of output for each word of the input file, such that when a word is first encountered then the output is:
0 word
and if the word has been encountered before then the output is a single integer specifying the index of that word in a list of known words ordered according to the most recently used (MRU order).
import java.util.*;
import java.io.*;
class extmycase
{
public static void main(String [] args)
{
Scanner scan=null;
Scanner scan1=null;
wordlist word=null;
String s;
int count=0;
try
{
scan=new Scanner(new File(args[0]));
scan1=new Scanner(new File(args[0]));
while(scan1.hasNext())
{
scan1.next();
count++;
}
System.out.println("No.of words : " + count);
word = new wordlist(count);
while(scan.hasNext())
{
s=scan.next();
if(word.hasElement(s)==true)
{
System.out.println("has element");
}
else
{
word.add(s);
}
}
word.showlist();
}
catch(Exception e)
{
System.err.println("unable to read from file");
}
finally
{
// Close the stream
if(scan != null)
{
scan.close( );
}
if(scan1 !=null)
{
scan1.close();
}
}
}
}
The wordlist class is:
import java.lang.*;
import java.util.*;
public class wordlist
{
public String data [];
private int count;
private int MAX;
public wordlist(int n)
{
MAX = n;
data = new String[MAX];
count = 0;
}
// Adds x to the set if it is not already there
public void add(String x)
{
if (count<MAX)
{
data[count++] = x;
}
}
// Removes x from set by replacing with last item and reducing size
public void replace(String x)
{
for(int i=0;i<count;i++)
{
if(data[i]==x)
{
data[count]=data[i];
for(int j=i;j<count;j++)
data[j]=data[j+1];
}
}
}
// Checks if value x is a member of the set
public boolean hasElement(String x)
{
for(int i=0;i<=count;i++)
{
if(data[i].equals(x))
{
return true;
}
}
return false;
}
public int findIndex(String x)
{
for(int i=0;i<=count;i++)
{
if(data[i].equals(x))
{
return i;
}
}
return 0;
}
public void showlist()
{
int l=0;
for(int i=0;i<count;i++)
{
System.out.println(data[i]);
l++;
}
System.out.println(l);
}
}
Your wordlist will never contain any elements. It is constructed, which sets everything to 0, and then you check whether it contains words, which of course it never will. Also, both scanners point to the same file, so every word read from one will also exist in the other, and all words will be found, making this semi-redundant in the first place.
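For the move-to-front encoding described in the question, a `LinkedList` kept in most-recently-used order may be simpler than a fixed-size array, and it removes the need for the first counting pass over the file. The sketch below is one possible reading of the requirements, not a definitive implementation. It assumes that known words are reported with 1-based positions (so that 0 can only mean "new word"), which the question does not state, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;

class MtfEncoderSketch {

    // Encodes tokens with move-to-front. Assumption: known words are reported
    // with 1-based positions so that 0 can only mean "new word".
    static List<String> encode(Scanner in) {
        LinkedList<String> mru = new LinkedList<>(); // most recently used word first
        List<String> output = new ArrayList<>();
        while (in.hasNext()) {
            String word = in.next();
            int index = mru.indexOf(word);   // -1 when the word is new
            if (index == -1) {
                output.add("0 " + word);
            } else {
                output.add(String.valueOf(index + 1));
                mru.remove(index);           // remove(int): drop it from its old position
            }
            mru.addFirst(word);              // the word is now the most recently used
        }
        return output;
    }

    public static void main(String[] args) {
        encode(new Scanner("a b a a c b")).forEach(System.out::println);
    }
}
```

Separately, note that `hasElement` and `findIndex` in the question loop with `i<=count`, which reads `data[count]`, a slot that has not been filled yet; `i<count` would stay within the stored words.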