Counting distinct words V2 [duplicate] - java

I've asked this question before (Counting distinct words with Threads) and have since cleaned up the code. As described in the first question, I need to count the distinct words in a file.
Debugging shows that all my words are stored and sorted correctly, but the issue now is an infinite while loop in the Test class that keeps running after all the words have been read.
I'm testing the code on a small file with no more than 10 words.
Most of the changes are in the DataSet class.
I need some advice on how to get out of the loop.
The test looks like this:
package test;
import java.io.File;
import java.io.IOException;
import junit.framework.Assert;
import junit.framework.TestCase;
import main.DataSet;
import main.WordReader;
public class Test extends TestCase
{
public void test2() throws IOException
{
File words = new File("resources" + File.separator + "test2.txt");
if (!words.exists())
{
System.out.println("File [" + words.getAbsolutePath()
+ "] does not exist");
Assert.fail();
}
WordReader wr = new WordReader(words);
DataSet ds = new DataSet();
String nextWord = wr.readNext();
// This is the loop
while (nextWord != "" && nextWord != null)
{
if (!ds.member(nextWord))
{
ds.insert(nextWord);
}
nextWord = wr.readNext();
}
wr.close();
System.out.println(ds.toString());
System.out.println(words.toString() + " contains " + ds.getLength()
+ " distinct words");
}
}
Here is my updated DataSet class. I'm still not sure about the member() method in particular, because at some point I was getting a NullPointerException (I don't know why):
package main;
import sort.Sort;
public class DataSet
{
private String[] data;
private static final int DEFAULT_VALUE = 200;
private int nextIndex;
private Sort bubble;
public DataSet(int initialCapacity)
{
data = new String[initialCapacity];
nextIndex = 0;
bubble = new Sort();
}
public DataSet()
{
this(DEFAULT_VALUE);
nextIndex = 0;
bubble = new Sort();
}
public void insert(String value)
{
if (nextIndex < data.length)
{
data[nextIndex] = value;
nextIndex++;
bubble.bubble_sort(data, nextIndex);
}
else
{
expandCapacity();
insert(value);
}
}
public int getLength()
{
return nextIndex + 1;
}
public boolean member(String value)
{
for (int i = 0; i < data.length; i++)
{
if (data[i] != null && nextIndex != 10)
{
if (data[i].equals(value))
return true;
}
}
return false;
}
private void expandCapacity()
{
String[] larger = new String[data.length * 2];
for (int i = 0; i < data.length; i++)
{
data = larger;
}
}
}
The WordReader class didn't change much. The ArrayList was replaced with a plain array, and the storing method has been modified as well:
package main;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class WordReader
{
private File file;
private String[] words;
private int nextFreeIndex;
private BufferedReader in;
private int DEFAULT_SIZE = 200;
private String word;
public WordReader(File file) throws IOException
{
words = new String[DEFAULT_SIZE];
in = new BufferedReader(new FileReader(file));
nextFreeIndex = 0;
}
public void expand()
{
String[] newArray = new String[words.length * 2];
// System.arraycopy(words, 0, newArray, 0, words.length);
for (int i = 0; i < words.length; i++)
newArray[i] = words[i];
words = newArray;
}
public void read() throws IOException
{
}
public String readNext() throws IOException
{
char nextCharacter = (char) in.read();
while (in.ready())
{
while (isWhiteSpace(nextCharacter) || !isCharacter(nextCharacter))
{
// word = "";
nextCharacter = (char) in.read();
if (!in.ready())
{
break;
}
}
word = "";
while (isCharacter(nextCharacter))
{
word += nextCharacter;
nextCharacter = (char) in.read();
}
storeWord(word);
return word;
}
return word;
}
private void storeWord(String word)
{
if (nextFreeIndex < words.length)
{
words[nextFreeIndex] = word;
nextFreeIndex++;
}
else
{
expand();
storeWord(word);
}
}
private boolean isWhiteSpace(char next)
{
if ((next == ' ') || (next == '\t') || (next == '\n'))
{
return true;
}
return false;
}
private boolean isCharacter(char next)
{
if ((next >= 'a') && (next <= 'z'))
{
return true;
}
if ((next >= 'A') && (next <= 'Z'))
{
return true;
}
return false;
}
public boolean fileExists()
{
return file.exists();
}
public boolean fileReadable()
{
return file.canRead();
}
public Object wordsLength()
{
return words.length;
}
public void close() throws IOException
{
in.close();
}
public String[] getWords()
{
return words;
}
}
And the bubble sort class has been changed to work on strings:
package sort;
public class Sort
{
public void bubble_sort(String a[], int length)
{
for (int j = 0; j < length; j++)
{
for (int i = j + 1; i < length; i++)
{
if (a[i].compareTo(a[j]) < 0)
{
String t = a[j];
a[j] = a[i];
a[i] = t;
}
}
}
}
}

I suspect the method that actually blocks is WordReader.readNext(). My suggestion is to use a Scanner instead of a BufferedReader; it is better suited to splitting a file into words.
Your readNext() method could be rewritten like this (where scan is a Scanner):
public String readNext() {
if (scan.hasNext()) {
String word = scan.next();
if (!word.matches("[A-Za-z]+"))
word = "";
storeWord(word);
return word;
}
return null;
}
This has the same functionality as your code without needing isCharacter() or isWhiteSpace(): the regex inside matches() checks that a word contains only letters, and the whitespace handling is built into next(), which splits the input into tokens. The added functionality is that it returns null when there are no more words in the file.
You'll have to change the while loop in your Test class for this to work properly. Check for null first, and compare against the empty string with equals() or isEmpty() rather than !=, because != on strings compares references, not contents. Once you call a method on nextWord, a null check that comes second is useless: the call has already thrown a NullPointerException. Also note that this readNext() returns "" for tokens that are not purely letters, so the loop should skip empty words instead of stopping at the first one.
To create the Scanner, you can pass it either a BufferedReader or the File directly:
Scanner scan = new Scanner(file);
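To make the Scanner suggestion concrete, here is a minimal sketch of the whole read-and-count loop. It is not the asker's DataSet/WordReader code: the class name DistinctWords and the method distinctWords are made up for illustration, a TreeSet stands in for the hand-written sorted DataSet, and the Scanner reads from a String so the example runs without a file.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class DistinctWords {
    // Hypothetical helper: collects the distinct, purely alphabetic tokens.
    // TreeSet is an assumption standing in for the sorted DataSet class.
    static Set<String> distinctWords(Scanner scan) {
        Set<String> words = new TreeSet<>();
        // hasNext() returns false at end of input, so the loop always terminates.
        while (scan.hasNext()) {
            String word = scan.next();
            // Skip tokens such as "42" or "end." instead of stopping at them.
            if (word.matches("[A-Za-z]+")) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A Scanner over a String; new Scanner(file) works the same way.
        try (Scanner scan = new Scanner("the cat and the dog and 42 birds")) {
            Set<String> words = distinctWords(scan);
            System.out.println(words + " contains " + words.size() + " distinct words");
        }
    }
}
```

Because the loop is driven by hasNext() rather than by a sentinel return value, there is no null or empty-string comparison left to get wrong.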

Related

avoid repetition in a Java array

I'm trying to get my code to remove a char only when it appears directly next to the same char, instead of removing every char that is already present in the array. So, if the input is hannah, the output should be hanah.
import java.util.*;
public class test {
static void removeDuplicate(char str[], int length) {
int index = 0;
for (int i = 0; i < length; i++) {
int j;
for (j = 0; j < i; j++) {
if (str[i] == str[j])
{
break;
}
}
if (j == i)
{
str[index++] = str[i];
}
}
System.out.println(String.valueOf(Arrays.copyOf(str, index)));
}
public static void main(String[] args) {
String info = "hannahmontana";
char str[] = info.toCharArray();
int len = str.length;
removeDuplicate(str, len);
}
}
This is my solution:
static String removeDuplicate(char str[], int length) {
if (length == 0) return "";
List<Character> list = new ArrayList<>();
list.add(str[0]);
for (int i = 1; i < length; i++) {
if (list.get(list.size() - 1) != str[i]) {
list.add(str[i]);
}
}
return list.stream()
.map(Object::toString)
.collect(Collectors.joining());
}
You can do a recursive call here:
import java.util.*;
public class test {
static String removeDuplicate(String input) {
if(input.length()<=1)
return input;
if(input.charAt(0)==input.charAt(1))
return removeDuplicate(input.substring(1));
else
return input.charAt(0) + removeDuplicate(input.substring(1));
}
public static void main(String[] args) {
String info = "hannahmontana";
System.out.println(removeDuplicate(info));
}
}
You can also try a regular expression. It may not be as fast, but I consider it simpler and more readable.
static String removeDuplicate(char[] chars, int ignored) {
return new String(chars).replaceAll("(.)\\1+", "$1");
}
Thanks for all the great answers! It turns out the solution was really simple. I just needed to change str[j] to str[i-1].
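The idea of comparing each char with its immediate predecessor (str[i - 1]) can be sketched in a single pass. This is an illustrative version, not the asker's final code; the class and method names are made up, and it works on a String with a StringBuilder rather than on a char array.

```java
public class AdjacentDuplicates {
    // Hypothetical helper: keeps a char only if it differs from the one before it.
    static String removeAdjacentDuplicates(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            // The first char has no predecessor, so it is always kept.
            if (i == 0 || input.charAt(i) != input.charAt(i - 1)) {
                out.append(input.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeAdjacentDuplicates("hannahmontana")); // prints "hanahmontana"
    }
}
```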

Confused with why I am getting Index out of bounds error?

So I am trying to create a program which takes a text file, creates an index (by line numbers) for all the words in the file and writes the index into the output file. Here is the main class:
import java.util.Scanner;
import java.io.*;
public class IndexMaker
{
public static void main(String[] args) throws IOException
{
Scanner keyboard = new Scanner(System.in);
String fileName;
// Open input file:
if (args.length > 0)
fileName = args[0];
else
{
System.out.print("\nEnter input file name: ");
fileName = keyboard.nextLine().trim();
}
BufferedReader inputFile =
new BufferedReader(new FileReader(fileName), 1024);
// Create output file:
if (args.length > 1)
fileName = args[1];
else
{
System.out.print("\nEnter output file name: ");
fileName = keyboard.nextLine().trim();
}
PrintWriter outputFile =
new PrintWriter(new FileWriter(fileName));
// Create index:
DocumentIndex index = new DocumentIndex();
String line;
int lineNum = 0;
while ((line = inputFile.readLine()) != null)
{
lineNum++;
index.addAllWords(line, lineNum);
}
// Save index:
for (IndexEntry entry : index)
outputFile.println(entry);
// Finish:
inputFile.close();
outputFile.close();
keyboard.close();
System.out.println("Done.");
}
}
The program contains two more classes: IndexEntry, which represents one index entry, and DocumentIndex, which represents the entire index for a document, that is, the list of all its index entries. The index entries should always be kept in alphabetical order. The implementations of these two classes are shown below:
import java.util.ArrayList;
public class IndexEntry {
private String word;
private ArrayList<Integer> numsList;
public IndexEntry(String w) {
word = w.toUpperCase();
numsList = new ArrayList<Integer>();
}
public void add(int num) {
if (!numsList.contains(num)) {
numsList.add(num);
}
}
public String getWord() {
return word;
}
public String toString() {
String result = word + " ";
for (int i=0; i<numsList.size(); i++) {
if (i == 0) {
result += numsList.get(i);
} else {
result += ", " + numsList.get(i);
}
}
return result;
}
}
import java.util.ArrayList;
public class DocumentIndex extends ArrayList<IndexEntry> {
public DocumentIndex() {
super();
}
public DocumentIndex(int c) {
super(c);
}
public void addWord(String word, int num) {
super.get(foundOrInserted(word)).add(num);
}
private int foundOrInserted(String word) {
int result = 0;
for (int i=0; i<super.size(); i++) {
String w = super.get(i).getWord();
if (word.equalsIgnoreCase(w)) {
result = i;
} else if (w.compareTo(word) > 0) {
super.add(i, new IndexEntry(w));
result = i;
}
}
return result;
}
public void addAllWords(String str, int num) {
String[] arr = str.split("[^A-Za-z]+");
for (int i=0; i<arr.length; i++) {
if (arr[i].length() > 0 ) {
addWord(arr[i], num);
}
}
}
}
When I run this program I'm getting an error and I'm not sure where the error came from.
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:372)
at java.base/java.util.ArrayList.get(ArrayList.java:459)
at DocumentIndex.addWord(DocumentIndex.java:14)
at DocumentIndex.addAllWords(DocumentIndex.java:35)
at Main.main(Main.java:53)
Look at this loop in main:
String line;
int lineNum = 0;
while ((line = inputFile.readLine()) != null)
{
lineNum++;
index.addAllWords(line, lineNum);
}
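You increment lineNum before passing it to addAllWords, so the index records 1-based line numbers. Be aware that lineNum is never used as a list index, so this is not what throws: the trace shows the exception coming from ArrayList.get inside addWord, because foundOrInserted returns 0 without inserting anything while the list is still empty.
If you want the line numbers themselves to start at 0, use: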
String line;
int lineNum = 0;
while ((line = inputFile.readLine()) != null)
{
index.addAllWords(line, lineNum);
lineNum++;
}
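Since the stack trace points at ArrayList.get in addWord, one possible repair is to make foundOrInserted always leave an entry at the index it returns, including when the list is empty or the word sorts after every existing entry. The following is a self-contained sketch under that assumption; IndexSketch, Entry and Index are stand-in names for illustration, not the original IndexEntry and DocumentIndex classes.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexSketch {
    // Stand-in for IndexEntry: an upper-cased word plus its line numbers.
    static class Entry {
        final String word;
        final List<Integer> lines = new ArrayList<>();

        Entry(String w) {
            word = w.toUpperCase();
        }
    }

    // Stand-in for DocumentIndex: entries are kept in alphabetical order.
    static class Index extends ArrayList<Entry> {
        // Returns the index of the entry for word, inserting one if needed.
        int foundOrInserted(String word) {
            String key = word.toUpperCase();
            int i = 0;
            while (i < size()) {
                int cmp = get(i).word.compareTo(key);
                if (cmp == 0) {
                    return i;      // already present
                }
                if (cmp > 0) {
                    break;         // first entry that sorts after key
                }
                i++;
            }
            // Reached for an empty list, a word that sorts last, or a gap.
            add(i, new Entry(key));
            return i;
        }

        void addWord(String word, int lineNum) {
            List<Integer> lines = get(foundOrInserted(word)).lines;
            if (!lines.contains(lineNum)) {
                lines.add(lineNum);
            }
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.addWord("pear", 1);
        index.addWord("apple", 2);
        index.addWord("Pear", 3);
        for (Entry e : index) {
            System.out.println(e.word + " " + e.lines);
        }
    }
}
```

Returning as soon as a match or an insertion point is found also avoids modifying the list while the loop keeps scanning it.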

Why am I getting an out of bounds exception error?

I am currently working on a project and have finally finished writing my code, but I am getting a lot of errors and I am not sure where they come from. I know posting the whole code here is frowned upon, but I am new to Java and find it very difficult to track down errors. My project partner has not been helpful, so I am carrying the work alone.
Here is my code:
public class Main {
public static int Statement_Number = 1;
public static String Current_Statement;
public static int i = 0; //Index for Script
public static int j = 0; //Index for Statements
public static String Missing_Word;
public static ArrayList<String> Letters_of_the_missing_word = new ArrayList<String>();
public static char WhiteSpace = ' ';
public static String Movie_Script;
public static char Missing_Word_Characters[] = new char[20];
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
int count = 0;
int length = 0; //Length to compute LPS
char Underscore = '_';
ArrayList<String> Statements_List = new ArrayList<String>();
boolean word_missing = false;
BufferedReader Statements = new BufferedReader(new FileReader("C:\\Users\\hasan\\Desktop\\Programming\\Java Programs\\Analysis of Algorithms Project\\term_project\\statements.txt"));
BufferedReader Script = new BufferedReader(new FileReader("C:\\Users\\hasan\\Desktop\\Programming\\Java Programs\\Analysis of Algorithms Project\\term_project\\the_truman_show_script.txt"));
Movie_Script = Script.readLine();
int Script_Length = Movie_Script.length();
//System.out.println(Script_Length); //81,902
//System.out.println(Movie_Script);
while((Current_Statement = Statements.readLine()) != null)
{
Statements_List.add(Current_Statement);
int Current_Statement_length = Current_Statement.length();
for(Statement_Number = 0; Statement_Number < 6; Statement_Number++)
{
//System.out.println(Statements_List.get(i));
Current_Statement = Statements_List.get(Statement_Number);
System.out.println(Statement_Number + ". " + Current_Statement);
KMPSearch(Current_Statement, Movie_Script);
}
}
}
static void KMPSearch(String Current_Statement, String Movie_Script)
{
int Current_Statement_Length = Current_Statement.length();
int Movie_Script_Length = Movie_Script.length();
int lps[] = new int[Current_Statement_Length];
int j = 0; //Index for Current_Statement
char Underscore = '_';
Calculating_LPS_Array(Current_Statement, Current_Statement_Length, lps);
int i = 0;
while(i < Movie_Script_Length)
{
if(Current_Statement.charAt(j) == Movie_Script.charAt(i))
{
i++;
j++;
}
else if(Current_Statement.charAt(j) != Movie_Script.charAt(i) && Current_Statement.charAt(j) == Underscore)
{
//Replace the underscores with the word
Word_Getter();
String New_Statement = Current_Statement.replaceAll("___", Missing_Word);
System.out.println(New_Statement);
System.out.println("");
}
else if(i < Movie_Script_Length && Current_Statement.charAt(j) != Movie_Script.charAt(i))
{
if(j != 0)
{
j = lps[j - 1];
}
else
{
i = i + 1;
}
}
}
if(i == Movie_Script_Length)
{
System.out.println("STATEMENT NOT FOUND");
}
}
static void Calculating_LPS_Array(String Current_Statement, int Current_Statement_Length, int lps[])
{
int len = 0;
int i = 1;
lps[0] = 0;
while(i < Current_Statement_Length)
{
if(Current_Statement.charAt(i) == Current_Statement.charAt(len))
{
len++;
lps[i] = len;
i++;
}
else //Current_Statement.charAt(i) != Current_Statement.charAt(len)
{
if(len != 0)
{
len = lps[len - 1];
}
else
{
lps[i] = len;
i++;
}
}
}
}
static void Word_Getter()
{
if(Movie_Script.charAt(j) != WhiteSpace)
{
Movie_Script.getChars(j, WhiteSpace, Missing_Word_Characters, 0);
j++;
Missing_Word = new String(Missing_Word_Characters);
}
}
}
Here are the errors that I am getting:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: offset 0, count 32, length 20
at java.base/java.lang.String.checkBoundsOffCount(String.java:3304)
at java.base/java.lang.String.getChars(String.java:855)
at Main.Word_Getter(Main.java:142)
at Main.KMPSearch(Main.java:85)
at Main.main(Main.java:57)
Your help would be much appreciated, thank you in advance.
You just need to make the array large enough for the 32 characters that getChars is asked to copy:
public static char Missing_Word_Characters[] = new char[32];
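It may help to see why the count is 32 in the first place. String.getChars takes a begin index and an end index, so passing the char ' ' as the second argument makes Java read it as the number 32. If the goal is to pull out one word, a sketch like the following finds the next space with indexOf and copies only up to it. The class WordAt and the method wordAt are hypothetical names, and the sketch assumes start is a valid index into the text.

```java
public class WordAt {
    // Hypothetical helper: returns the run of non-space characters that
    // begins at index start. Assumes 0 <= start <= text.length().
    static String wordAt(String text, int start) {
        int end = text.indexOf(' ', start);
        if (end == -1) {
            end = text.length(); // no further space: the word runs to the end
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(wordAt("the truman show", 4)); // prints "truman"
    }
}
```

With substring there is no fixed-size char array to overflow, whatever the length of the word.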

I keep getting string index out of bounds exception and I have no idea why

Here is my input file
So I am reading in a .txt file and I keep getting a string index out of bounds exception. I am trying to find duplicate words and keep the array sorted as I add words to it. I thought my problem was trying to sort and search the array when it has no words or only one word in it.
The line wrapped in ** is the problem line; it is line 129.
import java.io.*;
import java.util.Scanner;
import java.util.regex.*;
public class BuildDict
{
static String dict[] = new String[20];
static int index = 0;
public static void main(String args [])
{
readIn();
print();
}
public static void readIn()
{
File inFile = new File("carol.txt");
try
{
Scanner scan = new Scanner(inFile);
while(scan.hasNext())
{
String word = scan.next();
if(!Character.isUpperCase(word.charAt(0)))
{
checkRegex(word);
}
}
scan.close();
}
catch(IOException e)
{
System.out.println("Error");
}
}
public static void addToDict(String word)
{
if(index == dict.length)
{
String newAr[] = new String[dict.length*2];
for(int i = 0; i < index; i++)
{
newAr[i] = dict[i];
}
if(dict.length < 2)
{
newAr[index] = word;
index++;
}
else
{
bubbleSort(word);
if(!wordHasDuplicate(word))
{
newAr[index] = word;
index++;
}
}
dict = newAr;
}
else
{
dict[index] = word;
index++;
}
}
public static void checkRegex(String word)
{
String regex = ("[^A-Za-z]");
Pattern check = Pattern.compile(regex);
Matcher regexMatcher = check.matcher(word);
if(!regexMatcher.find())
{
addToDict(word);
}
}
public static void print()
{
try
{
FileWriter outFile = new FileWriter("dict.txt");
for(int i = 0; i < index; i++)
{
outFile.write(dict[i]);
outFile.write(" \n ");
}
outFile.close();
}
catch (IOException e)
{
System.out.println("Error ");
}
}
public static void bubbleSort(String word)
{
boolean swap = true;
String temp;
int wordBeforeIndex = 0;
String wordBefore;
while(swap)
{
swap = false;
wordBefore = dict[wordBeforeIndex];
for(int i = 0; (i < word.length()) && (i < wordBefore.length()); i++)
{
**if(word.charAt(i) < wordBefore.charAt(i))**
{
temp = wordBefore;
dict[wordBeforeIndex] = word;
dict[wordBeforeIndex++] = temp;
wordBeforeIndex++;
swap = true;
}
}
}
}
public static boolean wordHasDuplicate(String word)
{
int low = 0;
int high = dict.length - 1;
int mid = low + (high - low) /2;
while (low <= high && dict[mid] != word)
{
if (word.compareTo(dict[mid]) < 0)
{
low = mid + 1;
}
else
{
high = mid + 1;
}
}
return true;
}
}
Error is shown below:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.lang.String.charAt(String.java:658)
at BuildDict.bubbleSort(BuildDict.java:129)
at BuildDict.addToDict(BuildDict.java:60)
at BuildDict.checkRegex(BuildDict.java:90)
at BuildDict.readIn(BuildDict.java:30)
at BuildDict.main(BuildDict.java:14)
Check the length of wordBefore as a second condition of your for loop:
for(int i = 0; (i < word.length()) && (i < wordBefore.length()); i++)
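Beyond the loop bound, the stated goal (keep the array sorted and skip duplicates while adding) can be sketched without a character-by-character sort. The version below is a swapped-in approach, not a repair of the original bubbleSort: it uses Arrays.binarySearch on the filled part of the array to find either the existing word or its insertion point. The class SortedDict and its methods are made-up names for illustration.

```java
import java.util.Arrays;

public class SortedDict {
    private String[] dict = new String[4];
    private int size = 0;

    // Inserts word in sorted position; returns false if it is already stored.
    boolean add(String word) {
        // Search only the filled prefix [0, size); the rest is still null.
        int pos = Arrays.binarySearch(dict, 0, size, word);
        if (pos >= 0) {
            return false; // duplicate
        }
        int insertAt = -pos - 1; // binarySearch encodes the insertion point
        if (size == dict.length) {
            dict = Arrays.copyOf(dict, dict.length * 2); // grow and keep contents
        }
        // Shift the tail one slot to the right to open a gap.
        System.arraycopy(dict, insertAt, dict, insertAt + 1, size - insertAt);
        dict[insertAt] = word;
        size++;
        return true;
    }

    // Returns a copy of the stored words in sorted order.
    String[] words() {
        return Arrays.copyOf(dict, size);
    }

    public static void main(String[] args) {
        SortedDict d = new SortedDict();
        for (String w : new String[] {"pear", "apple", "fig", "apple", "kiwi", "banana"}) {
            d.add(w);
        }
        System.out.println(String.join(" ", d.words()));
    }
}
```

Strings are compared with compareTo inside binarySearch, so the reference comparison problem of dict[mid] != word does not arise.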

Parameter passing in Java problems

I am new to Java and have been writing a program to check whether a given string is periodic. A string is periodic if it can be represented as a smaller string concatenated some number of times. For example, "1010" is periodic but "1011" is not. Here is my code. It compiles, but the problem is that it reports every string as not periodic. I guess the problem is with the for loop in the isPeriodic function. Please help me get it right.
import java.io.*;
import java.util.*;
public class Test {
/**
* @param args
*/
public static void main(String[] args) throws java.lang.Exception {
java.io.BufferedReader R = new java.io.BufferedReader
(new java.io.InputStreamReader(System.in));
//String st = R.readLine();
String st = "10101010";
if (isPeriodic(st) == false) {
System.out.println(" Non Periodic");
}
else {
System.out.println("Periodic");
}
}
private static boolean isPeriodic(String s)
{
String temp = s;
int i;
boolean pflag = false;
for ( i = 1; i <= (s.length()/2); i++) {
s = rotateNltr(s,i);
if (s == temp) {
pflag = true;
break;
}
}
return pflag;
}
private static String rotateNltr(String s, int n) {
if( n > s.length()) {
return null;
}
for ( int i = 0; i < n; i++) {
s = leftRotatebyOne(s);
}
//System.out.println(s);
return s;
}
private static String leftRotatebyOne(String s) {
char[] temp = s.toCharArray();
char t = temp[0];
for ( int i = 0 ; i < s.length()-1 ;i++ ) {
temp[i] = temp [i+1];
}
temp[s.length()-1] = t;
String r = new String(temp);
//System.out.println(r);
return r;
}
}
You can't compare the contents of objects (and that includes Strings) with ==; it only checks whether both references point to the same object. You have to use the equals method.
Unlike C++ (which I assume is your language of preference), Java's == operator on String objects compares references rather than characters. Use the equals method to compare the strings.
if (s.equals(temp)) {
pflag = true;
break;
}
The check in your isPeriodic() is wrong. Do it like this:
if (s.equals(temp)) {
pflag = true;
break;
}
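Using s.equals(temp) alone won't solve the problem. It makes the code give the right result for the input in the main method, but not for 1010: rotateNltr(s, i) rotates the already rotated string by i more positions, so the total rotation skips values.
Rotate by one position per iteration instead: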
private static boolean isPeriodic(String s) {
String temp = s;
int i;
boolean pflag = false;
for (i = 1; i <= (s.length() / 2); i++) {
s = leftRotatebyOne(s);
if (s.equals(temp)) {
pflag = true;
break;
}
}
return pflag;
}
This ensures that every rotation from 1 up to half the length is checked, so the program works for all inputs.
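As a cross-check for the rotation approach, there is a well-known alternative test that needs no loop: a string of length two or more is periodic exactly when it occurs inside its own doubling at some offset between 1 and length - 1. This is a different technique from the one in the answers above, shown only as a sketch; the class name Periodic is made up.

```java
public class Periodic {
    // A string is periodic if it equals a shorter block repeated two or more
    // times. That is the case exactly when s appears in s + s at an offset
    // strictly between 0 and s.length().
    static boolean isPeriodic(String s) {
        if (s.length() < 2) {
            return false; // a single char or empty string has no shorter block
        }
        return (s + s).indexOf(s, 1) < s.length();
    }

    public static void main(String[] args) {
        System.out.println(isPeriodic("1010")); // prints "true"
        System.out.println(isPeriodic("1011")); // prints "false"
    }
}
```

The search always finds s at offset s.length() in the doubled string, so indexOf never returns -1 here; only an earlier hit indicates a period.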
