Word Count from a text file using Java

I am trying to write a simple program that gives me the word count of a text file. The code is as follows:
import java.io.File; // to read file
import java.util.Scanner;
public class ReadTextFile {
    public static void main(String[] args) throws Exception {
        String filename = "textfile.txt";
        File f = new File(filename);
        Scanner scan = new Scanner(f);
        int wordCnt = 1;
        while (scan.hasNextLine()) {
            String text = scan.nextLine();
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == ' ' && text.charAt(i - 1) != ' ') {
                    wordCnt++;
                }
            }
        }
        System.out.println("Word count is " + wordCnt);
    }
}
This code compiles but does not give the correct word count. What am I doing wrong?

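Right now you only increment wordCnt when the current character is a space and the character before it is not. That misses several cases, for example when a word is followed by a line break instead of a space: nextLine() strips the line terminator, so the last word on each line is never counted, and starting wordCnt at 1 compensates for only one of them. In addition, if a line starts with a space, text.charAt(i-1) is evaluated with i = 0 and throws a StringIndexOutOfBoundsException. Consider a file like this:
This is a text file\n
with a bunch of\n
words.
Your method should return 10, but because there is no space after the words "file" and "of", it does not count them as words, so it prints 8.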
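If you just want the word count, you can do something along these lines (with wordCnt starting at 0):
while (scan.hasNextLine()) {
    String text = scan.nextLine().trim();
    if (!text.isEmpty()) { // an empty line would otherwise count as one "word"
        wordCnt += text.split("\\s+").length;
    }
}
This splits each line on runs of whitespace and adds the number of tokens in the resulting array. Trimming first matters: split() returns a leading empty string when the line starts with whitespace, and "".split("\\s+") has length 1.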
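To try the split-based count on the three-line example above without creating a file, here is a minimal sketch that reads from a String through a Scanner; in the real program you would pass new Scanner(new File(filename)) instead. The class name SplitWordCount is hypothetical.

```java
import java.util.Scanner;

class SplitWordCount {
    // Counts whitespace-separated words in one line.
    // trim() removes leading/trailing blanks so split() yields no empty first token,
    // and a blank line contributes 0 instead of 1.
    static int countWords(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Sums the per-line counts over every line the Scanner returns.
    static int countWords(Scanner scan) {
        int wordCnt = 0; // start at 0: every word is counted explicitly
        while (scan.hasNextLine()) {
            wordCnt += countWords(scan.nextLine());
        }
        return wordCnt;
    }

    public static void main(String[] args) {
        String text = "This is a text file\nwith a bunch of\nwords.";
        System.out.println("Word count is " + countWords(new Scanner(text))); // Word count is 10
    }
}
```

Because the line terminator is no longer treated specially, the last word on each line is counted like any other.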

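First of all, remember to close your resources; try-with-resources does that automatically.
Since Java 8 you can count words with streams. Scanner has no lines() method, but Files.lines() does, and flatMap() is needed so that words rather than lines are counted:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;
String regex = "\\s+";
long wordCnt = 0;
try (Stream<String> lines = Files.lines(Paths.get("textfile.txt"))) {
    wordCnt = lines.map(String::trim)
            .filter(line -> !line.isEmpty())                   // skip blank lines
            .flatMap(line -> Arrays.stream(line.split(regex))) // one stream element per word
            .count();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Word count is " + wordCnt);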


Finishing File Class

I keep getting an error saying that lineNumber cannot be resolved to a variable. I'm not really sure how to fix this. Am I missing an import?
Also, how would I count the number of characters with spaces and without spaces?
I also need a method to count unique words, but I'm not really sure what unique words are.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.ArrayList;
import java.util.List;
public class LineWordChar {
    public void main(String[] args) throws IOException {
        // Convert our text file to string
        String text = new Scanner(new File("way to your file"), "UTF-8").useDelimiter("\\A").next();
        BufferedReader bf = new BufferedReader(new FileReader("way to your file"));
        String lines = "";
        int linesi = 0;
        int words = 0;
        int chars = 0;
        String s = "";
        // while next lines are present in file int linesi will add 1
        while ((lines = bf.readLine()) != null) {
            linesi++;
        }
        // Tokenizer separate our big string "Text" to little string and count them
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            s = st.nextToken();
            words++;
            // We take every word during separation and count number of char in this words
            for (int i = 0; i < s.length(); i++) {
                chars++;
            }
        }
        System.out.println("Number of lines: " + linesi);
        System.out.println("Number of words: " + words);
        System.out.print("Number of chars: " + chars);
    }
}
abstract class WordCount {
    /**
     * @return HashMap a map containing the Character count, Word count and
     * Sentence count
     * @throws FileNotFoundException
     *
     */
    public static void main() throws FileNotFoundException {
        lineNumber=2; // as u want
        File f = null;
        ArrayList<Integer> list = new ArrayList<Integer>();
        f = new File("file_stats.txt");
        Scanner sc = new Scanner(f);
        int totalLines = 0;
        int totalWords = 0;
        int totalChars = 0;
        int totalSentences = 0;
        while (sc.hasNextLine()) {
            totalLines++;
            if (totalLines == lineNumber) {
                String line = sc.nextLine();
                totalChars += line.length();
                totalWords += new StringTokenizer(line, " ,").countTokens(); //line.split("\\s").length;
                totalSentences += line.split("\\.").length;
                break;
            }
            sc.nextLine();
        }
        list.add(totalChars);
        list.add(totalWords);
        list.add(totalSentences);
        System.out.println(lineNumber + ";" + totalWords + ";" + totalChars + ";" + totalSentences);
    }
}
To get your code running you have to make at least two changes:
Replace:
lineNumber=2; // as u want
with
int lineNumber=2; // as u want
Also, you need to modify your main method: a program's entry point has to be declared public static void main(String[] args), and the one in LineWordChar is missing static. (Declaring throws IOException on main is actually allowed; an uncaught exception just ends the program with a stack trace. Handling the exceptions inside the method lets you report them yourself.)
public static void main(String[] args) {
    // try-with-resources closes the Scanner and the BufferedReader automatically
    try (Scanner scanner = new Scanner(new File("way to your file"), "UTF-8");
         BufferedReader bf = new BufferedReader(new FileReader("way to your file"))) {
        // Convert our text file to a string
        String text = scanner.useDelimiter("\\A").next();
        String lines = "";
        int linesi = 0;
        int words = 0;
        int chars = 0;
        String s = "";
        // while next lines are present in the file, linesi is incremented
        while ((lines = bf.readLine()) != null) {
            linesi++;
        }
        // StringTokenizer splits the big string "text" into small strings (words) and counts them
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            s = st.nextToken();
            words++;
            chars += s.length(); // characters of this word (tokens contain no whitespace)
        }
        System.out.println("Number of lines: " + linesi);
        System.out.println("Number of words: " + words);
        System.out.print("Number of chars: " + chars);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I've used a single catch for Exception; you can split it into several catch blocks to handle each exception type separately. When I run it, I get a FileNotFoundException (expected, since "way to your file" is a placeholder path), but apart from that your code now runs.
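As a sketch of splitting the catch-all into separate catch blocks (the class name SeparateCatches and the -1/-2 return codes are made up for illustration), note that the more specific FileNotFoundException has to be caught before IOException; in the opposite order the code does not compile, because FileNotFoundException is a subclass of IOException.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class SeparateCatches {
    // Counts the lines of a file. Returns -1 if the file is missing and -2 on any
    // other I/O error (an arbitrary convention chosen for this sketch).
    static int countLines(String path) {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (FileNotFoundException e) {
            // Specific handler first: the file does not exist or cannot be opened
            System.err.println("File not found: " + path);
            return -1;
        } catch (IOException e) {
            // General handler: the file was opened but reading failed
            System.err.println("Could not read " + path + ": " + e.getMessage());
            return -2;
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines("way to your file"));
    }
}
```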
The lineNumber variable must be declared with a data type:
int lineNumber=2; // as u want
Change the first line of the main method from lineNumber=2 to int lineNumber = 2. Every variable in Java has to be declared with its data type before it is used.
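The question also asks how to count characters with and without spaces, and what "unique words" are. Usually the term means distinct words: each different word is counted once, however often it appears. Here is a minimal sketch under two assumptions: words are compared case-insensitively, and a word is a run of ASCII letters (so digits and accented letters are ignored). The class TextStats is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TextStats {
    // All characters, whitespace included.
    static int charsWithSpaces(String text) {
        return text.length();
    }

    // Only characters that are not whitespace (spaces, tabs, line breaks).
    static int charsWithoutSpaces(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Number of distinct words; a Set keeps each word only once.
    static int uniqueWords(String text) {
        Set<String> seen = new HashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) { // a leading non-letter produces one empty token
                seen.add(word);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String text = "The cat saw the dog.";
        System.out.println(charsWithSpaces(text));    // 20
        System.out.println(charsWithoutSpaces(text)); // 16
        System.out.println(uniqueWords(text));        // 4 (the, cat, saw, dog)
    }
}
```

For a whole file, you could pass the text read with useDelimiter("\\A") as in the code above.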

Counting words from a text-file in Java

I'm writing a program that scans a text file and counts the number of words in it. The assignment defines a word as: 'A word is a non-empty string consisting only of letters (a, ..., z, A, ..., Z), surrounded by blanks, punctuation, hyphenation, line start, or line end.'
I'm very new to Java programming, and so far I've managed to write this instance method, which I expected to work. But it doesn't.
public int wordCount() {
    int countWord = 0;
    String line = "";
    try {
        File file = new File("testtext01.txt");
        Scanner input = new Scanner(file);
        while (input.hasNext()) {
            line = line + input.next() + " ";
            input.next();
        }
        input.close();
        String[] tokens = line.split("[^a-zA-Z]+");
        for (int i = 0; i < tokens.length; i++) {
            countWord++;
        }
        return countWord;
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return -1;
}
Quoting from the question "Counting words in text file?":
int wordCount = 0;
while (input.hasNextLine()) {
    String nextLine = input.nextLine();
    Scanner word = new Scanner(nextLine); // scans the words of one line
    while (word.hasNext()) {
        wordCount++;
        word.next();
    }
    word.close();
}
input.close();
If the only word separators in your file are whitespace and hyphens, you can use a regex with the split() method:
int num_words = line.split("[\\s\\-]+").length; // stores number of words
System.out.print("Number of words in file is " + num_words);
REGEX (Regular Expression):
\\s matches whitespace (spaces, tabs, line breaks) and \\- matches a hyphen; the + treats a run of them as a single separator. Wherever there is a space, line break or hyphen, the text is split, and the words are returned as an array whose length is the number of words in your file.
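The difference between "[\\s\\-]" and "[\\s\\-]+" shows up as soon as two separators are adjacent. A small sketch (the class HyphenSplit and its helper are hypothetical) that splits the same line both ways:

```java
import java.util.Arrays;

class HyphenSplit {
    // Number of tokens produced by splitting line around regex.
    static int countTokens(String line, String regex) {
        return line.split(regex).length;
    }

    public static void main(String[] args) {
        String line = "a well-known  example"; // two spaces before "example"
        // Without '+', every separator character is a split point, so two adjacent spaces give an empty token
        System.out.println(Arrays.toString(line.split("[\\s\\-]")));  // [a, well, known, , example]
        // With '+', a run of spaces and hyphens counts as one separator
        System.out.println(Arrays.toString(line.split("[\\s\\-]+"))); // [a, well, known, example]
        System.out.println(countTokens(line, "[\\s\\-]") + " vs " + countTokens(line, "[\\s\\-]+")); // 5 vs 4
    }
}
```

Even with +, a line that starts with a separator still yields an empty first token, so trim the line (or skip empty strings) if that can happen.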
You can use Java regular expressions.
See http://docs.oracle.com/javase/tutorial/essential/regex/groups.html to learn about groups.
import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public int wordCount() {
    String patternToMatch = "([a-zA-Z]+)"; // A-Z, not A-z: A-z would also match [ \ ] ^ _ and `
    int countWord = 0;
    Pattern pattern = Pattern.compile(patternToMatch);
    try (Scanner sc = new Scanner(new File("abc.txt"))) {
        while (sc.hasNextLine()) {
            Matcher matcher = pattern.matcher(sc.nextLine());
            while (matcher.find()) { // each match is one run of letters, i.e. one word
                countWord++;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    // -1 signals that no word was found, like the error value in the question
    return countWord > 0 ? countWord : -1;
}
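A typo such as [a-zA-z] (lower-case z at the end of the second range) instead of [a-zA-Z] is easy to miss, because it compiles and mostly works. This sketch (class and method names are hypothetical) counts matches with Matcher.find() and shows the difference on an identifier containing an underscore:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRunCount {
    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The range A-z runs from ASCII 65 to 122, so it also contains [ \ ] ^ _ and `
        System.out.println(countMatches("[a-zA-z]+", "snake_case")); // 1: "snake_case" is a single match
        System.out.println(countMatches("[a-zA-Z]+", "snake_case")); // 2: "snake" and "case"
    }
}
```

With [A-Za-z]+, any character that is not an ASCII letter (space, punctuation, hyphen, apostrophe) ends a word, which matches the assignment's definition.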
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// run(), pattern and countWords() are members of an enclosing class
void run(String path)
    throws Exception
{
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8")))
    {
        int result = 0;
        while (true)
        {
            String line = reader.readLine();
            if (line == null)
            {
                break;
            }
            result += countWords(line);
        }
        System.out.println("Words in text: " + result);
    }
}
final Pattern pattern = Pattern.compile("[A-Za-z]+");
int countWords(String text)
{
    Matcher matcher = pattern.matcher(text);
    int result = 0;
    while (matcher.find())
    {
        ++result;
        System.out.println("Matcher found [" + matcher.group() + "]");
    }
    System.out.println("Words in line: " + result);
    return result;
}

Checking each character from a text file, Java

I have a text file; I read each line of the file and write it into an array. I want to check whether each character is a dollar sign '$'. If it is, I want to skip it and write the following part, up to the next dollar sign, into the next array, so that each line is divided into 3 parts, each stored in a different array.
I appreciate your time and help!
public void retrieveFromTxt() throws IOException
{
    textArea.setText(" ");
    String fileName = "Name_Of_City.txt";
    String line = " ";
    String entry = null;
    y = 0;
    char letter = '$', value = ' ';
    FileReader fileReader = new FileReader(fileName);
    BufferedReader br1 = new BufferedReader(fileReader);
    String TextLine = br.readLine();
    x = 1; // Helper variable to jump from character to character in the array.
    // To get the values back from the file, each value into the correct array, I separated each value with a "$" sign.
    // Checking each letter and writing it to position array[y] until the $ sign is met, then jumping over it and continuing in the other array.
    try {
        while (y < 19) {
            while (TextLine != null) {
                while (TextLine.charAt(x) != (letter)) {
                    line = line + TextLine.charAt(x);
                    Country[y] = (line);
                    x++;
                }
            }
            while (TextLine != null) {
                while (TextLine.charAt(x) != (letter)) {
                    line = line + TextLine.charAt(x);
                    City[y] = (line);
                    x++;
                }
            }
            while ((line = br1.readLine()) != null) {
                while (line.charAt(x) != (letter)) {
                    Population[y] = (textArea.getText() + entry);
                    x++;
                }
            }
            y++;
            textArea.setText(textArea.getText() + "\n");
        }
    }
    catch (FileNotFoundException error) {
        // This exception occurs if the program cannot find the file; in that case a message dialog pops up with a help message
        JOptionPane.showMessageDialog(null, "Missing file" + ", " + fileName + ", for a support contact oleninshelvijs#gmail.com!");
    } catch (StringIndexOutOfBoundsException err) {}
    br1.close();
}
}
Try using String.split(). You can do:
String s = "my$sign";
String[] parts = s.split("\\$");
Now parts[0] will hold "my" and parts[1] will hold "sign".
Edit: As pointed out by user Ukfub below, you need to escape the symbol. I've updated my code to reflect that.
String[] parts = s.split("$");
That will not work, because the parameter of split() is a regular expression, in which $ means "end of input" (so s.split("$") just returns the whole string in a one-element array). You have to escape the special character '$':
String[] parts = s.split("\\$");
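Applied to the question's file, where each line holds three values separated by $, a sketch could split every line and store the fields in three lists instead of walking the characters by hand. Using lists rather than fixed-size arrays, the class name DollarSplit and the sample lines are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DollarSplit {
    final List<String> countries = new ArrayList<>();
    final List<String> cities = new ArrayList<>();
    final List<String> populations = new ArrayList<>();

    // Stores the three '$'-separated fields of each line in their own list.
    void load(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\\$"); // '$' escaped: split() takes a regular expression
            if (parts.length < 3) {
                continue; // skip blank or malformed lines instead of failing
            }
            countries.add(parts[0]);
            cities.add(parts[1]);
            populations.add(parts[2]);
        }
    }

    public static void main(String[] args) {
        DollarSplit data = new DollarSplit();
        // With a real file: data.load(Files.readAllLines(Paths.get("Name_Of_City.txt")));
        data.load(Arrays.asList("Country1$City1$100", "Country2$City2$200"));
        System.out.println(data.cities); // [City1, City2]
    }
}
```

Files.readAllLines() (java.nio.file) returns every line of a file as a List<String>, which fits the load() signature directly.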

Storing input into file

I'm having trouble searching a text file for the exact input a user enters. I want to output a sentence based not only on the direct user input: I want the program to recognize certain words that signal the desired text. I've got the keyword search working, but I can only search the text based on the keyword. I want to search based on both the keyword and the entire input sentence. For example, if the keyword is e-mail, the user enters "what is mars e-mail?", and the text file contains "mars e-mail is mars3433#aol.com, john e-mail is anonymous", I want to output "mars e-mail is ..." instead of both sentences. I'm completely stuck on this issue; can anyone help me?
public static class DicEntry {
    String key;
    String[] syns;
    Pattern pattern;
    public DicEntry(String key, String... syns) {
        this.key = key;
        this.syns = syns;
        pattern = Pattern.compile(".*(?:"
                + Stream.concat(Stream.of(key), Stream.of(syns))
                        .map(x -> "\\b" + Pattern.quote(x) + "\\b")
                        .collect(Collectors.joining("|")) + ").*");
    }
}
public static void removedata(String s) throws IOException {
    File f = new File("data.txt");
    File f1 = new File("data2.txt");
    BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
    BufferedReader br = new BufferedReader(new FileReader(f));
    PrintWriter pr = new PrintWriter(f1);
    String line;
    while ((line = br.readLine()) != null) {
        if (line.contains(s)) {
            System.out.println("Enter new Text :");
            String newText = input.readLine();
            line = newText;
            System.out.println("Thank you, Have a good Day!");
        }
        pr.println(line);
    }
    br.close();
    pr.close();
    input.close();
    Files.move(f1.toPath(), f.toPath(), StandardCopyOption.REPLACE_EXISTING);
}
public static void parseFile(String s) throws IOException {
    File file = new File("data.txt");
    Scanner forget = new Scanner(System.in);
    Scanner scanner = new Scanner(file);
    int flag_found = 0;
    while (scanner.hasNextLine()) {
        final String lineFromFile = scanner.nextLine();
        if (lineFromFile.contains(s)) {
            // a match!
            System.out.println(lineFromFile);
            flag_found = 1;
            System.out.println(" Would you like to update this information ? ");
            String yellow = forget.nextLine();
            if (yellow.equals("yes")) {
                removedata(lineFromFile);
            } else if (yellow.equals("no")) {
                System.out.println("Have a good day");
                // break;
            }
        }
    }
    if (flag_found == 0) { // input is not found in the txt file, so flag_found remains 0
        writer();
    }
}
public static void writer() {
    Scanner Keyboard = new Scanner(System.in);
    Scanner input = new Scanner(System.in);
    File file = new File("data.txt");
    // Creates a writer object called wr; the FileWriter is opened in append mode (true),
    // so the old data is kept and new lines are added at the end
    try (BufferedWriter wr = new BufferedWriter(new FileWriter(file.getAbsoluteFile(), true))) {
        System.out.println("I Do not know, Perhaps you want to teach me?" + "...");
        while ((Keyboard.hasNext())) {
            String lines = Keyboard.nextLine();
            System.out.print(" is this correct ? ");
            String go = input.nextLine();
            if (go.equals("no")) {
                System.out.println("enter line again");
                lines = Keyboard.nextLine();
                System.out.print(" is this correct ? ");
                go = input.nextLine();
            }
            else if (go.equals("yes")) {
                wr.write(lines);
                // wr.write("\n");
                wr.newLine();
                wr.close();
            }
            System.out.println("Thankk you");
            break;
        }
    } catch (IOException e) {
        System.out.println(" cannot write to file " + file.toString());
    }
}
private static List<DicEntry> populateSynonymMap() {
    List<DicEntry> responses = new ArrayList<>();
    responses.add(new DicEntry("student", "pupil", "scholar"));
    responses.add(new DicEntry("office", "post", "room"));
    responses.add(new DicEntry("topics", "semester talk"));
    return responses;
}
public static void getinput() throws IOException {
    List<DicEntry> synonymMap = populateSynonymMap(); // populate the map
    Scanner scanner = new Scanner(System.in);
    String input = null;
    /* End Initialization */
    System.out.println("Welcome ");
    System.out.println("What would you like to know?");
    System.out.print("> ");
    input = scanner.nextLine().toLowerCase();
    String[] inputs = input.split(" ");
    int flag_found = 0;
    for (DicEntry entry : synonymMap) { // iterate over each dictionary entry
        if (entry.pattern.matcher(input).matches()) {
            // System.out.println(entry.key);
            parseFile(entry.key);
            flag_found = 1; // Input is found
        }
    }
    if (flag_found == 0) { // input is not found in the txt file, so flag_found remains 0
        writer();
    }
}
public static void main(String args[]) throws ParseException, IOException {
    /* Initialization */
    getinput();
}
}
My methods work like this: parseFile() searches the text file for the keyword in the sentence, writer() writes to the file if the input is not found, removedata() erases the line and replaces it with the new string on user request, and getinput() just reads the user's input from the scanner.
In my opinion, an additional obstacle is that some words can repeat in unrelated sentences. My solution seems quite long to me, but it works. However, when I tested it, I didn't use your DicEntry. It is impossible to hard-code all synonyms, so you should reconsider that approach.
I added one class, just as a data holder for the int repetition count (see below) and the particular sentence:
public class Pair {
    int repetitions;
    String sentence;
    public Pair(int rep, String string) {
        repetitions = rep;
        sentence = string;
    }
    public int getRepetitions() {
        return repetitions;
    }
    public String getSentence() {
        return sentence;
    }
}
Then I wrote a method that loops through the input sentence and the file content, looking for the sentence from the file in which the most input words are repeated. I'm pretty sure it's not the most efficient way, but I don't know another :P.
public static String getMostAppropriate(String[] input) throws IOException {
    File file = new File("data.txt");
    ArrayList<Pair> pairs = new ArrayList<>();
    try (Scanner scanner = new Scanner(file)) {
        while (scanner.hasNextLine()) {
            String newLine = scanner.nextLine();
            String[] line = newLine.split(","); // this regex depends on your file format style
            for (String sentence : line) { // for each sentence in the file line
                int repetitions = 0;
                for (String string : sentence.split(" ")) { // for words in this sentence
                    for (String word : input) { // for words from input
                        // exact comparison: "e-mail?" from the input does not equal "e-mail"
                        if (word.equals(string)) {
                            repetitions += 1;
                        }
                    }
                }
                // one Pair per sentence, so each sentence is scored on its own
                pairs.add(new Pair(repetitions, sentence));
            }
        }
    }
    return mostCommon(pairs);
}
The argument is the String[] inputs from your getinput method. In the return statement I call another new method, which looks for the sentences with the most repetitions:
public static String mostCommon(ArrayList<Pair> pairs) {
    Pair max = new Pair(0, "");
    String result = "";
    for (Pair pair : pairs) {
        if (pair.getRepetitions() > max.getRepetitions()) {
            result = pair.getSentence();
            max = pair;
        } else if (pair.getRepetitions() == max.getRepetitions() && pair.getRepetitions() > 0) {
            // ties are joined; sentences without any matching word are never added
            result += "; " + pair.getSentence();
        }
    }
    return result;
}
If several sentences have the same number of repetitions, it returns all of them joined into one string (sentence; sentence; etc.).
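A more compact sketch of the same scoring idea, with two changes that are assumptions rather than part of the answer above: words are compared case-insensitively with leading and trailing punctuation stripped (so the question mark in "e-mail?" no longer prevents a match), and only the first best-scoring sentence is returned instead of joining ties. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class BestSentence {
    // Lower-cases the text, splits on whitespace and strips leading/trailing punctuation,
    // so "e-mail?" and "e-mail" compare equal; inner characters such as '-' or '#' are kept.
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .map(w -> w.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }

    // Splits a file line into comma-separated sentences and returns the one that shares
    // the most distinct words with the question ("" if no sentence shares any word).
    static String bestMatch(String question, String fileLine) {
        Set<String> asked = words(question);
        String best = "";
        long bestScore = 0;
        for (String sentence : fileLine.split(",")) {
            long score = words(sentence).stream().filter(asked::contains).count();
            if (score > bestScore) { // strictly greater: the first sentence wins a tie
                bestScore = score;
                best = sentence.trim();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String line = "mars e-mail is mars3433#aol.com, john e-mail is anonymous";
        System.out.println(bestMatch("what is mars e-mail?", line)); // mars e-mail is mars3433#aol.com
    }
}
```

Here the first sentence shares "is", "mars" and "e-mail" with the question (score 3), while the second shares only "is" and "e-mail" (score 2).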
I've left integrating it into your code to you, if you are interested.
As I said, I didn't use your DicEntry; you could still add it as an additional loop, but checking the whole dictionary would not be very efficient with my method.
Also, if I were you, I would split some of your methods into smaller ones: for example, read the file in one and ask for additional input in another. That makes changes easier to implement, because you don't need to keep an eye on the whole method, just on the arguments the methods pass to each other.
I hope you will find something useful in my post.

Error while counting the number of characters, lines and words in Java

I have written the following code to count the number of characters excluding white space, the number of words, and the number of lines. But my code does not show the proper output.
import java.io.*;
class FileCount
{
    public static void main(String args[]) throws Exception
    {
        FileInputStream file = new FileInputStream("sample.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(file));
        int i;
        int countw = 0, countl = 0, countc = 0;
        do
        {
            i = br.read();
            if ((char) i == (' '))
                countw++;
            else if ((char) i == ('\n'))
                countl++;
            else
                countc++;
        } while (i != -1);
        System.out.println("Number of words:" + countw);
        System.out.println("Number of lines:" + countl);
        System.out.println("Number of characters:" + countc);
    }
}
My file sample.txt contains:
hi my name is john
hey whts up
and my output is:
Number of words:6
Number of lines:2
Number of characters:26
You need to handle the other whitespace characters as well, including repeats, if any. Splitting around \\s+ separates words on any whitespace character and also treats a run of whitespace characters as a single separator.
Once you have all the words of a line in an array, updating the word and character counts is easy with the length of the array and of each String.
Something like this will give you the result:
String line = null;
String[] words = null;
while ((line = br.readLine()) != null) {
    countl++;
    line = line.trim(); // avoids an empty first token when the line starts with whitespace
    if (line.isEmpty()) {
        continue; // a blank line has no words
    }
    words = line.split("\\s+");
    countw += words.length;
    for (String word : words) {
        countc += word.length();
    }
}
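Wrapped into a complete method, the readLine()/split("\\s+") approach can be checked against the sample file from the question. This sketch reads from a String so it runs without sample.txt; the class and method names are hypothetical. Characters are counted excluding whitespace, as the question asks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineWordCharCount {
    // Returns {lines, words, non-whitespace characters} for everything the reader yields.
    static int[] count(BufferedReader br) throws IOException {
        int countl = 0, countw = 0, countc = 0;
        String line;
        while ((line = br.readLine()) != null) {
            countl++;
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank line: no words and no characters to count
            }
            for (String word : trimmed.split("\\s+")) {
                countw++;
                countc += word.length(); // words contain no whitespace
            }
        }
        return new int[] {countl, countw, countc};
    }

    // Convenience wrapper for in-memory text (a StringReader does not really fail).
    static int[] countText(String text) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            return count(br);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] c = countText("hi my name is john\nhey whts up\n");
        System.out.println("Number of words:" + c[1]);      // 8
        System.out.println("Number of lines:" + c[0]);      // 2
        System.out.println("Number of characters:" + c[2]); // 23
    }
}
```

For the real file, pass new BufferedReader(new FileReader("sample.txt")) to count() inside a try-with-resources block.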
A newline also ends a word.
=> There is not always a ' ' after each word.
do
{
    i = br.read();
    if ((char) i == (' '))
        countw++;
    else if ((char) i == ('\n')) {
        countl++;
        countw++; // a newline also ends a word
    }
    else
        countc++;
} while (i != -1);
End of file should also increase the number of words (if neither ' ' nor '\n' was the last character).
More than one space between words is still not handled correctly either.
=> You should think about further changes to your approach to handle this.
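One way to handle repeated spaces, tabs and a missing final newline in the character-by-character approach is to count a word when a non-whitespace character follows whitespace, instead of counting separators. A minimal sketch of that idea; the names are hypothetical, and the input comes from a String so it runs without sample.txt:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CharByCharCount {
    // Returns {lines, words, characters}; like wc, characters include spaces and newlines.
    static int[] count(Reader in) throws IOException {
        int lines = 0, words = 0, chars = 0;
        boolean inWord = false; // true while inside a run of non-whitespace characters
        int c;
        while ((c = in.read()) != -1) { // EOF (-1) is never counted
            chars++;
            if (c == '\n') {
                lines++;
            }
            if (Character.isWhitespace(c)) {
                inWord = false; // any whitespace (space, tab, '\r', '\n') ends the current word
            } else if (!inWord) {
                inWord = true;  // first character of a new word
                words++;
            }
        }
        return new int[] {lines, words, chars};
    }

    static int[] countText(String text) {
        try (Reader in = new StringReader(text)) {
            return count(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] c = countText("hi my name is john\nhey whts up\n");
        System.out.println(c[0] + " " + c[1] + " " + c[2]); // 2 8 31
    }
}
```

Because words are counted at their first character, the last word is counted even when the file does not end with a space or newline.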
import java.io.*;
class FileCount {
    public static void main(String args[]) throws Exception {
        FileInputStream file = new FileInputStream("sample.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(file));
        int i;
        int countw = 0, countl = 0, countc = 0;
        do {
            i = br.read();
            if ((char) i == (' ')) { // You should also check for other delimiters, such as tabs, etc.
                countw++;
            }
            if ((char) i == ('\n')) { // Also ends Windows "\r\n" lines; the '\r' is then counted as a character
                countw++; // Newlines also delimit words
                countl++;
            } // Removed else. Newlines and spaces are also characters
            if (i != -1) {
                countc++; // Don't count EOF as character
            }
        } while (i != -1);
        br.close();
        System.out.println("Number of words " + countw);
        System.out.println("Number of lines " + countl); // Print lines instead of words
        System.out.println("Number of characters " + countc);
    }
}
Output:
Number of words 8
Number of lines 2
Number of characters 31
Validation
$ wc sample.txt
2 8 31 sample.txt
Try this:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
public class FileCount {
    /**
     *
     * @param filename
     * @return three-element int array. Index 0 is number of lines,
     * index 1 is number of words, index 2 is number of characters
     * (excluding newlines)
     */
    public static int[] getStats(String filename) throws IOException {
        int[] stats = new int[3];
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filename)))) {
            String line;
            while ((line = br.readLine()) != null) {
                stats[0]++;
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) { // a blank line would otherwise count as one word
                    stats[1] += trimmed.split("\\s+").length; // runs of spaces/tabs are one separator
                }
                stats[2] += line.length();
            }
        }
        return stats;
    }
    public static void main(String[] args) {
        int[] stats = new int[3];
        try {
            stats = getStats("sample.txt");
        } catch (IOException e) {
            System.err.println(e.toString());
        }
        System.out.println("Number of words:" + stats[1]);
        System.out.println("Number of lines:" + stats[0]);
        System.out.println("Number of characters:" + stats[2]);
    }
}
