Creating a concordance from a text file

I have a text file and need to build a concordance from it. I believe I need a method in my WordCount class that updates the word count and line numbers, but I'm having trouble writing it. I know the method should return void, since it only updates state and doesn't return a value, but I'm stuck on what to write. I've put a comment in the Tester class where I think the call should go. My Tester, CircularList, and WordCount classes are below. I appreciate any help on this, thanks.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Tester
{
public static final int WordsPerLine = 10;
public static void main() throws FileNotFoundException
{
//build then output hash table
HashTable ht = new HashTable();
System.out.println(ht.toString());
String word; //read from input file
WordCount wordToFind; //search for this in the bst
WordCount wordInTree; //found in the bst
//create generic BST, of WordCount here
BSTree<WordCount> t = new BSTree<WordCount>();
//want to read word at a time from input file
Scanner wordsIn = new Scanner(new File("Hamlet.txt"));
wordsIn.useDelimiter("[^A-Za-z']+");
int wordCount = 0;
int lineNum = 1;
System.out.printf("%3d: ", lineNum);
while (wordsIn.hasNext()) {
word = wordsIn.next();
++wordCount;
System.out.print(word + " ");
word = word.toLowerCase();
if(t.find(new WordCount(word)) != null){
wordToFind= new WordCount(word);
wordInTree= t.find(wordToFind);
//I need to have a method here that update word count and line number
}
if (wordCount % WordsPerLine == 0) {
++lineNum;
System.out.printf("\n%3d: ", lineNum);
}
}
//EOF
System.out.println();
//print bst in alpha order
System.out.println(t.toString());
}
}
public class WordCount implements Comparable<WordCount>
{
protected String word;
protected int count;
protected CircularList lineNums;
//required for class to compile
public int compareTo(WordCount other)
{
return word.compareTo(other.word);
}
{
word = "";
count = 0;
lineNums= new CircularList();
}
public WordCount(String w)
{
word = w;
count = 0;
lineNums= new CircularList();
}
public String toString()
{
return String.format("%-12s %3d %3d", word, count, lineNums);
}
}
public class CircularList
{
private Item list;
public CircularList()
{
list = null;
}
public Boolean isEmpty()
{
return list == null;
}
public void append(int x)
{
Item r = new Item(x);
if (isEmpty()) {
r.next = r;
}
else {
r.next = list.next;
list.next = r;
}
list = r;
}
public int nextLine(int x)
{
Item r= new Item(x);
if (!isEmpty()) {
r = list.next;
while (r != list) {
r = r.next;
}
//append last item
}
return r.info;
}
public String toString()
{
StringBuilder s = new StringBuilder("");
if (!isEmpty()) {
Item r = list.next;
while (r != list) {
s.append(r.info + ", ");
r = r.next;
}
//append last item
s.append(r.info);
}
return s.toString();
}
}

Instead, I suggest you focus for now on putting WordCount objects into the BST. Then you'll have something that will print out.
So, to put WordCount objects into the bst, in pseudocode, I suggest doing:
create a new WordCount object to find. Set this to the word, count of 1, for lineNums create a new circular list object and use CL::append() to add the lineNum to it
try to find this in the tree
if word is found in the tree
//I need to have a method here that update word count and line number <- don't worry about this bit until you've got words into the tree
else // word is not found in bst, so insert it
use insertBST() to insert the new WordCount object into the tree
Once there is some data in the tree, it will be printed after the wordsIn.hasNext() while loop, by the t.toString() call.
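The find-or-insert steps above, together with the "update" method the question asks about, can be sketched roughly as follows. This is only an illustration under stated assumptions: the question does not show the BSTree API, so java.util.TreeMap stands in for the tree, a plain List stands in for CircularList, and the names ConcordanceSketch, Entry, update and build are hypothetical. Recording each line only once per word is also an assumption about what the concordance should show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConcordanceSketch {
    // Hypothetical stand-in for WordCount: a count plus the lines the word appeared on.
    static class Entry {
        int count = 0;
        final List<Integer> lineNums = new ArrayList<>();

        // The "update" step: returns nothing, just bumps the count and remembers the line.
        void update(int lineNum) {
            count++;
            // Assumption: store each line once, even if the word repeats on it.
            if (lineNums.isEmpty() || lineNums.get(lineNums.size() - 1) != lineNum) {
                lineNums.add(lineNum);
            }
        }
    }

    // TreeMap is used here in place of the question's BSTree; it keeps keys in alphabetical order.
    static Map<String, Entry> build(List<String> words, int wordsPerLine) {
        Map<String, Entry> tree = new TreeMap<>();
        int wordCount = 0;
        int lineNum = 1;
        for (String raw : words) {
            String word = raw.toLowerCase();
            Entry entry = tree.get(word);      // try to find the word in the tree
            if (entry == null) {               // not found: insert a fresh entry
                entry = new Entry();
                tree.put(word, entry);
            }
            entry.update(lineNum);             // found or new: update count and line number
            if (++wordCount % wordsPerLine == 0) {
                lineNum++;                     // same line-numbering rule as the Tester class
            }
        }
        return tree;
    }
}
```

With the question's own classes, the same idea would be a void method on WordCount that increments count and calls lineNums.append(lineNum) on the object returned by t.find(...).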

Related

Increment count if the object exists in arraylist, else Add the object into arraylist

I read words from a text file and then create a new Word object for each word and store the objects into an ArrayList. The text of the word is passed into the object as a parameter. I have overridden the equals(Object) and hashCode() method of the word class to check for the equality of objects based on text of a word instead of object memory location. I am trying to store all unique words in ArrayList as unique objects and increment the occurrence of the word object if the word repeats in the text file.
Scanner file = new Scanner(new File(textfile));
ArrayList<Word> words = new ArrayList<Word>();
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
w.increaseCount();
} else {
words.add(w);
}
}
The Word class is:
public class Word {
private String text;
private int count;
public Word(String wordText) {
text = wordText;
}
public void increaseCount() {
count += 1;
}
@Override
public boolean equals(Object wordToCompare) {
if (wordToCompare instanceof Word) {
Word castedWord = (Word) wordToCompare;
if (castedWord.text.equals(this.text)) {
return true;
}
}
return false;
}
@Override
public int hashCode() {
return text.hashCode();
}
}
Unique words get added to the ArrayList, but my count does not increment. How do I increment the count?

The problem is with this statement in your code:
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
w.increaseCount(); // Here's what goes wrong.
} else {
words.add(w);
}
}
You're invoking increaseCount() on the newly created object, which is replaced during the next iteration, so you lose the reference to it. The object that actually matters is the one already in the ArrayList, and that is the one whose count you should increase. So your code should be changed like this:
Scanner file = new Scanner(new File(textfile));
ArrayList<Word> words = new ArrayList<Word>();
while (file.hasNext()) {
Word w = new Word(file.next()); // read the next token; fileWord is never assigned in the posted snippet
if (words.contains(w)) {
words.get(words.indexOf(w)).increaseCount(); // Note the change here.
} else {
w.increaseCount(); // This is for the first occurrence as 'count' is 0 initially.
words.add(w);
}
}

The problem is that you create a new instance of Word on every loop iteration.
When the list already contains an equal Word, you increase the count of the new instance, not of the existing instance that was added to the list earlier.
Consider using a Map instead, where the key is the word and the value is its count.
package example.stackoverflow;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class WordCount {
public static void main(String[] args) {
List<String> sourceList = Arrays.asList("ABC", "XYZ", "HGK", "ABC", "PWT", "HGK", "ABC");
Map<String, Integer> wordCount = new HashMap<>();
for (String word : sourceList) {
if (wordCount.get(word) != null) {
wordCount.put(word, wordCount.get(word) +1);
} else {
wordCount.put(word, 1);
}
}
System.out.println(wordCount);//output: {ABC=3, XYZ=1, PWT=1, HGK=2}
}
}
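On Java 8 and later, the get/put branching in the Map answer can be collapsed into a single Map.merge call. The sketch below is just one way to write it; the class name MergeCountSketch is made up for illustration, and a TreeMap is used only so that the printed order is alphabetical and predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MergeCountSketch {
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            // Stores 1 if the key is absent, otherwise adds 1 to the existing value.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("ABC", "XYZ", "HGK", "ABC", "PWT", "HGK", "ABC");
        System.out.println(count(source)); // prints {ABC=3, HGK=2, PWT=1, XYZ=1}
    }
}
```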

Check this answer:
Scanner file = new Scanner(new File(textfile));
ArrayList<Word> words = new ArrayList<Word>();
while (file.hasNext()) {
Word w = new Word(fileWord);
if (words.contains(w)) {
int index = words.indexOf(w);
Word w1 = words.get(index); // the instance already stored in the list
w1.increaseCount();
} else {
w.increaseCount(); // first occurrence: count starts at 0
words.add(w);
}
}

Word Frequency Counter in my code error

I need to compute the occurrence frequency of each word in a given document. The occurrence frequency is the number of times a word is found in the document. My program takes a text file as input.
The basic functionality of computing word frequencies will be implemented using a linked list. Each node in the list contains a word, its frequency, and a pointer to the next node. The basic algorithm is as follows:
If the word is NOT found in the list
Add this word to the list
Set its frequency to 1
Else
Increment the word’s frequency by 1 //As this word already exists in the list
This is my Word class:
public class Word {
private String word ;
private int frequency;
public Word() {
}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
This is the DocumentWords class:
public class DocumentWords {
private LinkedList<Word> list = new LinkedList<>();
private Word word = new Word();
String test = " ";
Boolean flag = false ;
int count=0;
int c ;
File document = new File("E:\\Jooo\\Project 2 (All Files)\\Project 2 (All Files)\\aa.txt");
Scanner sc;
public DocumentWords() {
try {
this.sc = new Scanner(document);
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
while (sc.hasNext()) {
test = sc.next();
// System.out.println(test);
if (list.isEmpty()) {
word.setWord(test);
word.setFrequency(1);
list.add(word);
}
else {
System.out.println(list.size());
while(count!=list.size()){
if ((list.get(count).getWord()).equals(test)) {
flag = true;
c=count;
}
count++;
}
if(flag){
list.get(c).setFrequency((list.get(c).getFrequency())+1);
}
else {
word.setWord(test);
word.setFrequency(1);
list.add(word);
}
}
}
System.out.println("Word : "+ list.get(5).getWord()+"---Freq : "+list.get(5).getFrequency());
}
}
This code does not run correctly.
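For comparison, here is a minimal sketch of the algorithm described above (add the word with frequency 1 if it is not in the list, otherwise increment its frequency). It is not a drop-in fix for the posted class: FrequencyListSketch and Entry are hypothetical names, and the input is a list of tokens rather than a file. It does differ from the posted loop in three ways that appear to matter: a new entry object is created for every new word (the posted code keeps adding the same single Word instance), the search starts from the head of the list for every token (the posted count is never reset to 0), and the "found" state is local to each token (the posted flag is never reset to false).

```java
import java.util.LinkedList;
import java.util.List;

class FrequencyListSketch {
    // Hypothetical node payload: a word and how many times it has been seen.
    static class Entry {
        final String word;
        int frequency;

        Entry(String word) {
            this.word = word;
            this.frequency = 1; // a word is first added with frequency 1
        }
    }

    static LinkedList<Entry> count(List<String> tokens) {
        LinkedList<Entry> list = new LinkedList<>();
        for (String token : tokens) {
            Entry found = null;
            // Search the whole list from the start for every token.
            for (Entry e : list) {
                if (e.word.equals(token)) {
                    found = e;
                    break;
                }
            }
            if (found != null) {
                found.frequency++;          // word already in the list
            } else {
                list.add(new Entry(token)); // new word: fresh object each time
            }
        }
        return list;
    }
}
```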

How would I search for a user determined word and count the occurrences in a text file using java?

I've gotten to a point where I can read the file and output its text, but I'm not sure how to proceed with searching for a specific word and displaying the word count.

There are many ways. If you're reading the file line by line, you can use the indexOf method of the String class to search each line for the text. You'd need to call it repeatedly, each time starting after the previous match, to find additional occurrences in the line.
See documentation on indexOf at:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String,%20int)
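The repeated indexOf calls could look roughly like this. The class and method names are made up for the example. Note that this counts substring matches that do not overlap, so searching for "concept" would also count "concepts"; matching whole words only would need extra checks.

```java
class IndexOfCountSketch {
    // Counts non-overlapping occurrences of word in line using String.indexOf(String, int).
    static int countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0; // an empty search string would otherwise match at every position
        }
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(word, from)) != -1) {
            count++;
            from += word.length(); // continue searching after this match
        }
        return count;
    }
}
```

Calling this once per line and summing the results gives the total for the file.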

As I understand your question, if you are reading the text line by line you can use recursion to count how many occurrences of the word appear in a line.
The following method counts how many times a word appears in a single line:
private static int numberOfLineOccurences;
public static int countNumberOfTimesInALine(String line, String word) {
if (line.indexOf(word) == -1) {
return numberOfLineOccurences;
} else {
numberOfLineOccurences++;
if (line.indexOf(word) + word.length() > line.length() -1 ) {
return numberOfLineOccurences;
}
return countNumberOfTimesInALine(
line.substring(line.indexOf(word) + word.length()), word );
}
}
In order to keep track of the first occurrence of my word in the file, along with the number of occurrences, I created a WordInfo class like this:
class WordInfo {
private int firstOccurenceLineNumber;
private int firstOccurenceColumnNumber;
private String word;
private int numberOfOccurences;
public String getWord() {
return word;
}
public int getNumberOfOccurences() {
return numberOfOccurences;
}
public WordInfo(String word) {
this.word = word;
}
public void upOccurrence() {
numberOfOccurences++;
}
public void upOccurrence(int numberOfTimes) {
numberOfOccurences+= numberOfTimes;
}
public int getFirstOccurenceLineNumber() {
return firstOccurenceLineNumber;
}
public void setFirstOccurenceLineNumber(int firstOccurenceLineNumber) {
this.firstOccurenceLineNumber = firstOccurenceLineNumber;
}
public int getFirstOccurenceColumnNumber() {
return firstOccurenceColumnNumber;
}
public void setFirstOccurenceColumnNumber(int firstOccurenceColumnNumber) {
this.firstOccurenceColumnNumber = firstOccurenceColumnNumber;
}
}
Now I can create my searchWord method. I pass it the word to look for, the file path, and a WordInfo object to fill in as input parameters:
public static boolean searchWord(String word, String filePath, WordInfo wInfo) throws IOException {
boolean result = false;
boolean firstOccurenceFound = false;
int lineNumber = 0;
BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)));
String line = null;
while ( (line = reader.readLine()) != null) {
lineNumber++;
numberOfLineOccurences= 0;
if (line.indexOf(word) != -1) {
if (!result) {
result = true;
}
if (!firstOccurenceFound) {
firstOccurenceFound = true;
wInfo.setFirstOccurenceLineNumber(lineNumber);
wInfo.setFirstOccurenceColumnNumber(line.indexOf(word) + 1);
}
wInfo.upOccurrence(countNumberOfTimesInALine(line, word));
}
}
reader.close();
return result;
}
Here is an illustration and the result below
I have the following content in a file called DemoFile.txt
And I test the code using the following main method (I am looking for the word concept, for example):
public static void main(String[] args) throws IOException {
WordInfo wInfo = new WordInfo("concept");
if ( searchWord("concept", FILE_PATH, wInfo)) {
System.out.println("Searching for " + wInfo.getWord());
System.out.println("First line where found : " + wInfo.getFirstOccurenceLineNumber());
System.out.println("First column found: " + wInfo.getFirstOccurenceColumnNumber());
System.out.println("Number of occurrences " + wInfo.getNumberOfOccurences());
}
}
And I obtain the following results:

Filtering/sorting a collection through object fields?

I'm not sure why this isn't working. I'm not sure if it's a problem with the printing, or if it's a problem with the methods themselves.
I am making a program that takes a collection of songs and filters or sorts it according to a given user input. The user should be able to input multiple commands to further narrow down the list.
My filterRank and filterYear methods work perfectly fine, but the other methods end up printing a seemingly random selection of songs that do not change regardless of what is inputted as the title or artist to be filtered by, which generally appears only after an extremely long waiting period and a long series of spaces.
Even after this amalgam of songs is printed, the program does not terminate, and periodically outputs a space in the console, as if a System.out.println() statement were being continuously run.
If I remove the code that configures the output file, which is a requirement for the project, the methods fail to print entirely. Regardless of either of these changes, filterRank and filterYear continue to work perfectly.
This problem also occurs with my sort methods. No matter what sort method I run, it still prints out the spaces and the random songs, or nothing at all.
Is there something I'm missing? I've tried printing out variables and strategically inserting System.out.println("test") in my program to determine what the program is doing, but it seems as though it's parsing the input correctly, and the methods are indeed being successfully run.
I've been otherwise unable to isolate the problem.
Can I get assistance in determining what I'm missing? Despite poring over my code for two hours, I just can't figure out what the logical error on my part is.
Here is the relevant code:
The main class:
public static void main(String[] args) throws FileNotFoundException, IOException{
//user greeting statements and instructions
//scanning file, ArrayList declaration
Scanner input = new Scanner(System.in);
while (input.hasNextLine()) {
int n = 0;
SongCollection collection = new SongCollection(songs);
String inputType = input.nextLine();
String delims = "[ ]";
String[] tokens = inputType.split(delims);
for (int i = 0; i < tokens.length; i++) {
n = 0;
if (n == 0) {
if ((tokens[i]).contains("year:")) {
collection.filterYear(Range.parse(tokens[i]));
n = 1;
}// end of year loop
if ((tokens[i]).contains("rank:")) {
collection.filterRank(Range.parse(tokens[i]));
n = 1;
}// end of rank
if ((tokens[i]).contains("artist:")) {
collection.filterArtist(tokens[i]);
n = 1;
}// end of artist
if ((tokens[i]).contains("title:")) {
collection.filterTitle(tokens[i]);
n = 1;
}// end of title
if ((tokens[i]).contains("sort:")) {
if ((tokens[i]).contains("title")) {
collection.sortTitle();
n = 1;
}// end of sort title
if ((tokens[i]).contains("artist")) {
collection.sortArtist();
n = 1;
}// end of sort artist
if ((tokens[i]).contains("rank")) {
collection.sortRank();
n = 1;
}// end of sort rank
if ((tokens[i]).contains("year")) {
collection.sortYear();
n = 1;
}// end of sort year
}//end of sort
}// end of for loop
}// end of input.hasNextline loop
/*final PrintStream console = System.out; //saves original System.out
File outputFile = new File("output.txt"); //output file
PrintStream out = new PrintStream(new FileOutputStream(outputFile)); //new FileOutputStream
System.setOut(out); //changes where data will be printed
*/ System.out.println(collection.toString());
/*System.setOut(console); //changes output to print back to console
Scanner outputFileScanner = new Scanner(outputFile); //inputs data from file
while ((outputFileScanner.hasNextLine())) { //while the file still has data
System.out.println(outputFileScanner.nextLine()); //print
}
outputFileScanner.close();
out.close();*/
}
}// end of main
}// end of class
The SongCollection Class, with all of its respective filter and sort methods:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.*;
public class SongCollection {
ArrayList<Song> songs2;
ArrayList<Song> itemsToRemove = new ArrayList<Song>(); // second collection
// for items to
// remove
public SongCollection(ArrayList<Song> songs) { // constructor for SongCollection
System.out.println("Test");
this.songs2 = songs;
}
public void filterYear(Range r) {
int n = 0;
if (n == 0) {
System.out.println("Program is processing.");
n++;
for (Song song1 : songs2) {
if (song1.year > (r.getMax()) || (song1.year) < (r.getMin())) {
itemsToRemove.add(song1);
}
}
songs2.removeAll(itemsToRemove);
itemsToRemove.clear();
}
}
public void filterRank(Range r) {
int n = 0;
if (n == 0) {
System.out.println("Program is processing.");
n++;
for (Song song1 : songs2) {
if (song1.rank > (r.getMax()) || (song1.rank) < (r.getMin())) {
itemsToRemove.add(song1);
}
}
songs2.removeAll(itemsToRemove);
itemsToRemove.clear();
}
}
public void filterArtist(String s) {
int n = 0;
if (n == 0) {
System.out.println("Program is processing.");
n++;
for (Song song1 : songs2) {
if ((!(((song1.artist).contains(s))))) {
itemsToRemove.add(song1);
}
}
songs2.removeAll(itemsToRemove);
itemsToRemove.clear();
}
}
public void filterTitle(String s) {
int n = 0;
if (n == 0) {
System.out.println("Program is processing.");
n++;
for (Song song1 : songs2) {
if ((!(((song1.title).contains(s))))) {
itemsToRemove.add(song1);
}
}
songs2.removeAll(itemsToRemove);
itemsToRemove.clear();
}
}
public void sortTitle() {
Collections.sort(songs2, SongComparator.byTitle()); // now we have a sorted list
}
public void sortRank() {
Collections.sort(songs2, SongComparator.byRank()); // now we have a sorted list
}
public void sortArtist() {
Collections.sort(songs2, SongComparator.byArtist()); // now we have a sorted list
}
public void sortYear() {
Collections.sort(songs2, SongComparator.byYear()); // now we have a sorted list
}
public String toString() {
String result = "";
for (int i = 0; i < songs2.size(); i++) {
result += " " + songs2.get(i);
}
return result;
}
}
SongComparator Class:
import java.util.Comparator;
public class SongComparator implements Comparator<Song> {
public enum Order{
YEAR_SORT, RANK_SORT, ARTIST_SORT, TITLE_SORT
}
private Order sortingBy;
public SongComparator(Order sortingBy){
this.sortingBy = sortingBy;
}
public static SongComparator byTitle() {
return new SongComparator(SongComparator.Order.TITLE_SORT);
}
public static SongComparator byYear() {
return new SongComparator(SongComparator.Order.YEAR_SORT);
}
public static SongComparator byArtist() {
return new SongComparator(SongComparator.Order.ARTIST_SORT);
}
public static SongComparator byRank() {
return new SongComparator(SongComparator.Order.RANK_SORT);
}
@Override
public int compare(Song song1, Song song2) {
switch (sortingBy) {
case YEAR_SORT:
System.out.println("test");
return Integer.compare(song1.year, song2.year);
case RANK_SORT:
System.out.println("test");
return Integer.compare(song1.rank, song2.rank);
case ARTIST_SORT:
System.out.println("test");
return song1.artist.compareTo(song2.artist);
case TITLE_SORT:
System.out.println("test");
return song1.title.compareTo(song2.title);
}
throw new RuntimeException(
"Practically unreachable code, can't be thrown");
}
}

After you output the filtered collection, your program doesn't terminate because you are still in a while loop looking for the next user input line. This is basically what your program is doing:
while (input.hasNextLine()) {
// stuff happens here
System.out.println(collection.toString());
/*
* System.setOut(console); //changes output to print back to console Scanner outputFileScanner = new Scanner(outputFile); //inputs data from file while ((outputFileScanner.hasNextLine()))
* { //while the file still has data System.out.println(outputFileScanner.nextLine()); //print } outputFileScanner.close(); out.close();
*/
}
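One way to let such a loop end is to stop on an explicit sentinel. The sketch below is only an illustration of that idea: the "quit" command and the empty-line rule are assumptions, not part of the original program, and CommandLoopSketch and readCommands are hypothetical names. It collects the command lines so that filtering and printing can happen after the loop has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommandLoopSketch {
    // Reads command lines until end of input, an empty line, or the (assumed) "quit" command.
    static List<String> readCommands(Scanner input) {
        List<String> commands = new ArrayList<>();
        while (input.hasNextLine()) {
            String line = input.nextLine().trim();
            if (line.isEmpty() || line.equalsIgnoreCase("quit")) {
                break; // leave the loop instead of waiting for more input
            }
            commands.add(line);
        }
        return commands;
    }
}
```

For console use this would be called as readCommands(new Scanner(System.in)).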

cant understand why arraylist hasElement method not working in java

The repeated items in the text file should not be added to the list, but this program outputs every word from the list, and I don't know why the hasElement method is not working. I need to create a program called MTFencoder.java that accepts the name of a text file as a command-line argument, such that if there exists a text file called story.txt then the program could be invoked with the following command:
java MTFencoder test.txt
It should produce one line of output for each word of the input file, such that when a word is first encountered then the output is:
0 word
and if the word has been encountered before then the output is a single integer specifying the index of that word in a list of known words ordered according to the most recently used (MRU order).
import java.util.*;
import java.io.*;
class extmycase
{
public static void main(String [] args)
{
Scanner scan=null;
Scanner scan1=null;
wordlist word=null;
String s;
int count=0;
try
{
scan=new Scanner(new File(args[0]));
scan1=new Scanner(new File(args[0]));
while(scan1.hasNext())
{
scan1.next();
count++;
}
System.out.println("No.of words : " + count);
word = new wordlist(count);
while(scan.hasNext())
{
s=scan.next();
if(word.hasElement(s)==true)
{
System.out.println("has element");
}
else
{
word.add(s);
}
}
word.showlist();
}
catch(Exception e)
{
System.err.println("unable to read from file");
}
finally
{
// Close the stream
if(scan != null)
{
scan.close( );
}
if(scan1 !=null)
{
scan1.close();
}
}
}
}
The wordlist class is:
import java.lang.*;
import java.util.*;
public class wordlist
{
public String data [];
private int count;
private int MAX;
public wordlist(int n)
{
MAX = n;
data = new String[MAX];
count = 0;
}
// Adds x to the set if it is not already there
public void add(String x)
{
if (count<MAX)
{
data[count++] = x;
}
}
// Removes x from set by replacing with last item and reducing size
public void replace(String x)
{
for(int i=0;i<count;i++)
{
if(data[i]==x)
{
data[count]=data[i];
for(int j=i;j<count;j++)
data[j]=data[j+1];
}
}
}
// Checks if value x is a member of the set
public boolean hasElement(String x)
{
for(int i=0;i<=count;i++)
{
if(data[i].equals(x))
{
return true;
}
}
return false;
}
public int findIndex(String x)
{
for(int i=0;i<=count;i++)
{
if(data[i].equals(x))
{
return i;
}
}
return 0;
}
public void showlist()
{
int l=0;
for(int i=0;i<count;i++)
{
System.out.println(data[i]);
l++;
}
System.out.println(l);
}
}

Your wordlist will never contain any elements. It is constructed, which sets everything to 0, and then you check whether it contains words, which of course it never will. Also, both scanners read the same file, so every word read from one also exists in the other, and all words would be found, making this check semi-redundant in the first place.
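Separately from the point above, the loops in hasElement and findIndex run while i <= count, so they read data[count], which is still null, and calling equals on it throws a NullPointerException that the catch block in main then reports as "unable to read from file". As a point of comparison, the move-to-front encoding described in the question can be sketched with a LinkedList instead of a fixed-size array. This is a sketch under assumptions: the class name MtfSketch is made up, the input is a list of words rather than a file, and positions are taken to be 1-based (so that 0 can mean "new word"), which the question does not state explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class MtfSketch {
    // Returns one output line per input word.
    static List<String> encode(List<String> words) {
        LinkedList<String> known = new LinkedList<>(); // most recently used word first
        List<String> out = new ArrayList<>();
        for (String word : words) {
            int index = known.indexOf(word);
            if (index < 0) {
                out.add("0 " + word);               // first time this word is seen
            } else {
                out.add(String.valueOf(index + 1)); // assumed 1-based position in MRU order
                known.remove(index);                // take it out of its old position
            }
            known.addFirst(word);                   // the word is now the most recently used
        }
        return out;
    }
}
```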
