Java - Counting words, lines, and characters from a file

I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, no punctuation, spaces, or non-alphabetic characters). The character count should only include the characters inside those words.
This is what I have so far. I'm unsure of how to count the characters. Every time I run the program, it jumps to the catch mechanism as soon as I enter the file name (and it should have no issues with the file path, as I've tried using it before). I tried to create the program without the try/catch to see what the error was, but it wouldn't work without it.
Why is it jumping to the catch function when I enter the file name? How can I fix this program to properly count words, lines, and characters in the text file?

I don't get any exception with your code if I give a proper file name. As for counting the characters, you should modify the logic a little. Instead of only adding the token count to the word count, create a tokenizer, StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");, iterate through all the tokens, and sum the length of each one. This should give you the number of characters. Something like this:
while (fileScan.hasNextLine()) {
    lineC++;
    tempo = fileScan.nextLine();
    StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
    wordC += st.countTokens();
    while (st.hasMoreTokens()) {
        String stt = st.nextToken();
        System.out.println(stt); // Display each token to confirm the line is split as expected
        charC += stt.length();
    }
}
// Print the totals once, after the whole file has been read
System.out.println("Lines: " + lineC + "\nWords: " + wordC + " \nChars: " + charC);
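Note: StringTokenizer does not treat its delimiter argument as a regular expression. Every character in that string is a separate delimiter, so in the code above the characters [, ] and + are delimiters too. For the same reason, escaping does not work: you might expect \\s to split on any whitespace character, but it splits on the literal characters \ and s instead. If you need regular expressions, I suggest using java.util.regex.Pattern and java.util.regex.Matcher, calling matcher.find() to identify words and count their characters.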
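The Pattern/Matcher suggestion could look roughly like the sketch below. It assumes a "word" is a maximal run of ASCII letters; the class name MatcherWordCount and the method countLine are made up for illustration and are not part of the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts words and the characters inside them for one line.
final class MatcherWordCount {
    // Assumption: a word is one or more ASCII letters and nothing else.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Returns {number of words, number of characters inside those words}. */
    static int[] countLine(String line) {
        int words = 0;
        int chars = 0;
        Matcher m = WORD.matcher(line);
        while (m.find()) {          // each find() call moves to the next word
            words++;
            chars += m.group().length();
        }
        return new int[] {words, chars};
    }

    public static void main(String[] args) {
        int[] counts = countLine("Hello, world!");
        System.out.println("Words: " + counts[0] + ", Chars: " + counts[1]);
    }
}
```

Inside the existing loop, the two values returned by countLine(tempo) would be added to wordC and charC in place of the StringTokenizer logic.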

I tried your code but I didn't receive any exception here. However, I suspect that when you input the file name, maybe you forgot the extension of the file.

You probably forgot the file extension while giving input, but there is a much simpler way of doing this. You also mention you don't know how to count the characters. You can try something like this:
import java.util.Scanner;
import java.io.*;
import java.nio.file.*; // Files and Path live in java.nio.file, not java.io
import java.util.stream.*;
public class WordCount
{
    public static void main(String[] args)
    {
        Scanner userInput = new Scanner(System.in);
        try {
            // Input file
            System.out.println("Please enter the name of the file.");
            String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
            System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
        }
        catch (IOException ex1) {
            System.out.println("Error.");
            System.exit(0);
        }
    }
}
Going through the code
import java.util.stream.*;
Note we use the streams package, for filtering out empty strings while finding words. Now let's skip forward a bit.
String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
The above part gets all of the text in the file and stores it as a string.
System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
Okay, this is a long line. Let's break it down.
"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced with the corresponding argument in the printf function. The first %d will be replaced by content.split("\n").length, which is the number of lines. We get the number of lines by splitting the string.
The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array, and the array holds the strings you get after splitting on anything non-alphabetic (you said words contain only alphabetic letters). Next, we filter out all the empty values, since String.split keeps the empty strings between adjacent delimiters. Finally, .count() gives the number of words left after filtering.
The third and last %d is the simplest. It is replaced by content.length(), the length of the whole string. Note that this counts every character in the file, including spaces, punctuation, and line breaks; if only the characters inside words should count, sum the lengths of the filtered words instead.
I left your catch block intact, but I feel like the System.exit(0) is a bit redundant.
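If the character count should cover only the letters inside words, as the question asks, one possible variation is sketched below. The class TextStats and its method names are hypothetical; String.lines() needs Java 11 or later, which Files.readString requires anyway.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper working on the already-loaded file content.
final class TextStats {
    /** Splits on runs of non-letters and drops the empty leading string, if any. */
    static List<String> words(String content) {
        return Arrays.stream(content.split("[^A-Za-z]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    /** Counts lines; a trailing line break does not add an extra line. */
    static long lineCount(String content) {
        return content.lines().count();
    }

    /** Sums the lengths of the words, so punctuation and whitespace are excluded. */
    static int wordChars(String content) {
        return words(content).stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        String content = "Hi, there!\nBye.";
        System.out.printf("Lines: %d%nWords: %d%nCharacters: %d%n",
                lineCount(content), words(content).size(), wordChars(content));
    }
}
```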

Related

My Java program for removing special characters and white space from user input of name fails when I test it with spaces

I'm a student and am working on an assignment that has us trying a few techniques we haven't covered. Removing special characters and spaces from user input, I figured out how to remove the special characters but when I insert a space in my name when I try my program, my name is returned to me missing whatever letters were after the space. Like, "Pat ricia" becomes "Pat". I had to go with a different technique for removing special characters because "replaceAll" didn't work for me there, and it isn't working for me with removing white space. Can I have another set of eyes to tell me what I'm missing?
package edu.gmc.Course_Project;
import java.util.Scanner;
public class Input_Name {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        //asking for user input
        System.out.print("enter your name: ");
        String name = scan.next();
        //Java won't do all this at once (2 step process) so I'm converting the whole name to lower case first
        name=name.toLowerCase();
        //now to get the first character
        char first=name.charAt(0);
        //and changing it to upper case
        first=Character.toUpperCase(first);
        String newName="";
        newName+=first;
        //I'm not really sure about this part, but it has to do with the length...without it, my name was returned to me as "P"
        for(int i=1;i<name.length();i++)
        {
            //this part ignores anything other than A-Z or a-z...I had a different method earlier but I couldn't get it to work properly
            char ch=name.charAt(i);
            if(ch>='a' && ch<='z')
            {
                newName+=ch;
            }
        }
        //removing white spaces
        newName = newName.replaceAll("\\s+", "");
        //the program is complete, now what does it do?
        System.out.println("your name is "+newName);
    }
}
Your program was not working because you were using String name = scan.next(); at the very start. That call reads only the next token, and because tokens are separated by whitespace, it stores just the part of the name before the space. Use String name = scan.nextLine(); instead: it reads the entire line, including the whitespace and everything after it.
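A minimal sketch of the corrected flow follows. The class NameCleaner and its clean method are hypothetical names; the logic keeps only the letters a-z and A-Z, lower-cases them, and capitalizes the first one, so the separate whitespace-removal step is no longer needed.

```java
import java.util.Scanner;

// Hypothetical rewrite of the name-cleaning step.
final class NameCleaner {
    /** Keeps ASCII letters only, lower-cases them, then capitalizes the first letter. */
    static String clean(String rawName) {
        StringBuilder letters = new StringBuilder();
        for (char ch : rawName.toCharArray()) {
            boolean isLetter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (isLetter) {
                letters.append(Character.toLowerCase(ch));
            }
        }
        if (letters.length() == 0) {
            return ""; // nothing usable was typed
        }
        letters.setCharAt(0, Character.toUpperCase(letters.charAt(0)));
        return letters.toString();
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.print("enter your name: ");
        // nextLine() reads the whole line, spaces included; next() would stop at the first space
        String name = scan.hasNextLine() ? scan.nextLine() : "";
        System.out.println("your name is " + clean(name));
    }
}
```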

Java: Parse lines from files

I am in need of some ideas. I have a file with some information like this:
AAA222BBB%
CC333DDDD%
EEEE444FF%
The '%' sign is like an indicator of "end of line"
I would like to read every line, and then parse it to fit a certain format (4 letters, 3 digits and 4 letters again) - So if it looks like the above, it should insert a special sign or whitespace to fill, like this:
AAA-222BBB-%
CC--333DDDD%
EEEE444FF--%
My first and immediate idea was to read every line as a string. And then some huge if-statement saying something like
For each line:
{
if (first symbol !abc...xyz) {
insert -
}
if (second symbol !abc...xyz) {
insert -
}
}
However, I am sure there must be a more elegant and effective way to do real parsing of the text, but I'm not sure how. So if anyone has a good idea, please enlighten me :-)
Best
My first advice is to read this other post (nice explanation of scanner and regex):
How do I use a delimiter in Java Scanner?
Then my solution (it is surely not the cleverest, but it should work):
For each line:
{
    // Drop the trailing '%' marker so it is not counted as part of the last field
    Scanner scan = new Scanner(line.replace("%", ""));
    scan.useDelimiter(Pattern.compile("[0-9]"));
    String first = scan.next();
    scan.useDelimiter(Pattern.compile("[A-Z]"));
    String second = scan.next();
    scan.useDelimiter(Pattern.compile("[0-9]"));
    String third = scan.next();
    scan.close();
    System.out.println(addMissingNumber(first, 4) + addMissingNumber(second, 3) + addMissingNumber(third, 4) + "%");
}
// Pad the word with "-" until it reaches the requested size
private static String addMissingNumber(String word, int size) {
    while (word.length() < size) {
        word = word.concat("-");
    }
    return word;
}
The output for the first line: "AAA-222BBB-%"
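The same idea can also be expressed with a single regular expression and capture groups instead of switching Scanner delimiters. This is only a sketch: FixedWidthFormatter, formatLine, and pad are made-up names, and it assumes every line has the shape letters, digits, letters, then '%', with at most 4, 3, and 4 characters per field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical formatter for lines such as "AAA222BBB%".
final class FixedWidthFormatter {
    // Three capture groups: up to 4 letters, up to 3 digits, up to 4 letters, then the '%' marker.
    private static final Pattern RECORD =
            Pattern.compile("([A-Za-z]{0,4})(\\d{0,3})([A-Za-z]{0,4})%");

    /** Right-pads the field with '-' up to the given width. */
    static String pad(String field, int width) {
        StringBuilder sb = new StringBuilder(field);
        while (sb.length() < width) {
            sb.append('-');
        }
        return sb.toString();
    }

    /** Pads each of the three fields and restores the end-of-line marker. */
    static String formatLine(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return pad(m.group(1), 4) + pad(m.group(2), 3) + pad(m.group(3), 4) + "%";
    }

    public static void main(String[] args) {
        for (String line : new String[] {"AAA222BBB%", "CC333DDDD%", "EEEE444FF%"}) {
            System.out.println(formatLine(line));
        }
    }
}
```

Rejecting lines that do not match keeps malformed input from being padded silently; whether that is the right policy depends on the data.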

Java Scanner - Ignore Subsequent Letters

My program needs to accept integer numbers, individual characters, or one specific string (I'll use "pear" for this example). Whilst each of these can be separated by whitespace, there shouldn't be any need to.
Currently, my parsing code, which relies on a Scanner, looks something like this:
Scanner scanner = new Scanner(System.in);
while (scanner.hasNext()) {
    if (scanner.hasNext("\\s+")) {
        // Ignore whitespace…
    } else if (scanner.hasNext("[-]?\\d+")) {
        // Get a number
        String nextNumberString = scanner.next("[-]?\\d+");
        // Process the string representing the number…
    } else if (scanner.hasNext("pear")) {
        scanner.next("pear");
        // Do something special…
    } else {
        // Get the next character
        Pattern oldDelimiter = scanner.delimiter();
        scanner.useDelimiter("");
        String nextCharAsString = scanner.next();
        scanner.useDelimiter(oldDelimiter);
        char nextCharacter = nextCharAsString.charAt(0);
        if (Character.isWhitespace(nextCharacter)) {
            // Ignore whitespace…
        } else {
            // Process character…
        }
    }
}
At present, my program will accept input like 123 d 456 r pear without any problems. However, it should also accept the same input without any whitespace (123d456rpear), and interpret it the same way, and with my current code, the individual digits are incorrectly interpreted as characters.
I feel like the cause might be the regular expressions that I'm using. However, adding .* to the end of them will cause all of the subsequent characters to be parsed, along with the input that I'm trying to parse. For example, [-]?\d+.* will try to parse the entirety of 123d456rpear as a number, when I really just want 123, leaving the rest to be parsed later. I've also tried wrapping my desired input into a group, and then appending ? or {1}, which hasn't worked, either.
I've also experimented with scanner.findInLine(), but in my testing, this doesn't seem to work either. For example, when I tried this, pearpear would cause an infinite loop, despite my attempts to skip the first instance of pear.
I've also tried setting the delimiter to "", like I do when extracting individual characters (which, in that case, works as expected). However, this causes each individual number to be processed individually, parsing 1, 2, and 3 instead of 123. pear also gets interpreted as individual characters.
So, could someone help me figure out where I'm going wrong? Does this issue lie with my regular expressions? Am I using the wrong methods? Or am I misunderstanding how the Scanner class is designed to work?
To my understanding, the idea of the Scanner class is to extract tokens and throw the delimiters away. You, however, do not want to throw anything away except whitespace, and whitespace is not even required in your input. Here is an implementation idea that uses an outer and an inner Scanner. The outer one tokenizes at whitespace, if there is any. The inner one uses findInLine() to bypass delimiters altogether.
findInLine
Attempts to find the next occurrence of a pattern constructed from the
specified string, ignoring delimiters.
public void scan(Scanner scanner) {
    while (scanner.hasNext()) {
        String next = scanner.next();
        System.out.println("opening inner scanner: " + next);
        Scanner innerScanner = new Scanner(next);
        do {
            next = innerScanner.findInLine("([-]?\\d+)|(pear)|([a-zA-Z])");
            if (next == null) {
                // Nothing useful in there
            } else if (next.equals("pear")) {
                System.out.println("pear");
            } else if (next.matches("[a-zA-Z]")) {
                System.out.println("char: " + next);
            } else {
                System.out.println("number: " + next);
            }
        } while (next != null);
        innerScanner.close();
    }
}
public void run() {
    scan(new Scanner("123 d 456 pear"));
    scan(new Scanner("123d456pear"));
}
The output of the run() method is as follows:
opening inner scanner: 123
number: 123
opening inner scanner: d
char: d
opening inner scanner: 456
number: 456
opening inner scanner: pear
pear
opening inner scanner: 123d456pear
number: 123
char: d
number: 456
pear
The individual digits are incorrectly interpreted as characters because Scanner's hasNext method tests the next complete token, and tokens are bounded by the delimiter, which defaults to whitespace.
From the Java docs:
A Scanner breaks its input into tokens using a delimiter pattern,
which by default matches whitespace. The resulting tokens may then be
converted into values of different types using the various next
methods
Hence the whole 123d456rpear is extracted which is not a number but a string
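Since the delimiter is what gets in the way, one alternative is to skip Scanner for the tokenizing step and walk the input with a Matcher. The sketch below is an assumption about what would fit the question, not the asker's code: PearTokenizer and tokenize are hypothetical names, and the alternation order (number, then "pear", then any single non-whitespace character) decides which rule wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer that does not depend on whitespace between tokens.
final class PearTokenizer {
    // Tried left to right at each position: signed integer, the word "pear", one other character.
    private static final Pattern TOKEN = Pattern.compile("-?\\d+|pear|\\S");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {        // find() skips whitespace because no alternative matches it
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("123 d 456 r pear"));
        System.out.println(tokenize("123d456rpear"));
    }
}
```

Both calls in main produce the same token list, which is the behavior the question asks for. Each token can then be classified with the same checks used in the findInLine answer.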

How can i split the user's input with tokens?

I need your advice on how to do something. I want to read the user's input line, as I already do in my program with Scanner, but I want to split each command (word) into tokens. I do not know how to do that. Until now I have been playing with substring, but if the user presses the space bar twice, for example, everything goes wrong!
For example:
Please insert a command : I am 20 years old
With substring or .split(" ") this works, but think about having:
Please insert a command : I am  20   years old
That is why I need your advice. The question is: how can I split the user's input into tokens?
Well, you need to normalize your string before splitting it into tokens. The simplest way is to collapse repeated whitespace characters:
line = line.replaceAll("\\s+", " ");
(this also replaces tabs and any other whitespace with a single " ").
Use the StringTokenizer class. From the API :
"[It] allows an application to break a string into tokens... A
StringTokenizer object internally maintains a current position within
the string to be tokenized."
The following sample code from the API will give you an idea:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
It produces the following result:
this
is
a
test
Okay, so the code that I have written asks you to insert a command, then takes your command and splits it around any number of spaces (thanks to the regex). It then saves these words, numbers, speech marks, etc. as tokens. You can then manipulate each word in the for loop.
import java.util.Scanner;
public class CommandReaderProgram{
    public static void main(String[] args){
        Scanner userInput = new Scanner(System.in);
        System.out.println("Please insert a command:");
        String temp = userInput.nextLine();
        while(temp != null){
            String [] tokens = temp.split("\\s+");
            for(String word : tokens){
                System.out.print(word);
                System.out.print(" ");
            }
            break;
        }
        System.out.print("\n");
    }
}
To test that each word, number, and speech mark has actually been saved as a token, change the string in the System.out.print(" "); call to anything else, e.g. System.out.print(""); or System.out.print("abcdefg");. That text will then be printed between the tokens, which shows that the tokens are indeed separate.
Unfortunately I am unable to call the token array outside of the for loop at the moment, but will let you know when I figure it out.
I'd like to hear what type of program you are trying to make as I think we are both trying to make something very similar.
Hope this is what you are looking for.
Regards.
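For reference, here is a small sketch that combines the ideas from the answers above: trim the line, split on runs of whitespace, and return the array so it can be used outside the loop. CommandSplitter and tokenize are hypothetical names chosen for this example.

```java
import java.util.Arrays;

// Hypothetical helper: turns one input line into an array of tokens.
final class CommandSplitter {
    static String[] tokenize(String line) {
        // trim() first, otherwise leading whitespace yields an empty first token
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // blank input has no tokens
        }
        // "\\s+" treats any run of spaces or tabs as one separator
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("I am  20   years old");
        // The array is declared in this scope, so it is usable after any loop over it
        System.out.println(tokens.length + " tokens: " + Arrays.toString(tokens));
    }
}
```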

How do I change the delimiter from a text file?

Let's say I got a textfile.txt that I want to read from. This is the text in the file:
23:years:old
15:years:young
Using the useDelimiter method, how can I tell my program that : and newlines are delimiters? Putting the text on one line and using useDelimiter(":"); works. The problem arises when I have several lines of text.
Scanner input = new Scanner(new File("textfile.txt"));
input.useDelimiter(":");
while (input.hasNextLine()) {
    int age = input.nextInt();
    String something = input.next();
    String somethingelse = input.next();
}
Using this code I get an InputMismatchException.
Try
scanner.useDelimiter("[:]+");
The complete code is
Scanner scanner = new Scanner(new File("C:/temp/text.txt"));
scanner.useDelimiter("[:]+");
while (scanner.hasNext()) {
System.out.println(scanner.next());
}
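With this delimiter, "old" and "15" are still a single token that contains the line break (use "[:\\r\\n]+" to split on line breaks as well), but the printed output is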
23
years
old
15
years
young
Use this code
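Scanner input;
String tokenizer[];
try {
    input = new Scanner(new File("D:\\textfile.txt"));
    input.useDelimiter("\\n");
    // hasNext() pairs with next(); hasNextLine() can report true for a trailing empty line
    while (input.hasNext()) {
        tokenizer = input.next().split(":");
        System.out.println(tokenizer[0]+" |"+tokenizer[1]+" | "+tokenizer[2]);
    }
    input.close();
} catch (Exception e) {
    e.printStackTrace(); // report problems instead of swallowing them silently
}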
It will give you output like
23 |years | old
15 |years | young
You have two ways to do this:
Concatenate the lines so that all the text is on one line.
Split on newlines first, then split each returned token on ":".
If all you want is to get everything split up at once, then I guess you can use
useDelimiter("[:\\n]")
That splits on both : and newline (the pattern ":\\n" without the brackets would only match a colon immediately followed by a newline). It is not the most efficient way of processing data, though, especially if each line of text is laid out in the same format and represents a complete entry. If that is the case, then my suggestion would be to split only on newlines to begin with, like this:
s.useDelimiter("\\n");
while (s.hasNext()) {
    String[] result = s.next().split(":");
    // do whatever you need to with the data and store it somewhere
}
This will allow you to process the data line by line and will also split it at the required places. However if you do plan on going through line by line I recommend you look at BufferedReader as it has a readLine() function that makes things a lot easier.
As long as all the lines have all three fields, you can just use input.useDelimiter("[:\n]");
You probably want to create a delimiter pattern that includes both ':' and newline.
I didn't test it, but [\s:]+ is a regular expression that matches one or more whitespace characters or ':' characters (a | inside the brackets is not needed; there it would be matched as a literal character).
Try:
input.useDelimiter("[\\s:]+");
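A self-contained sketch of that delimiter in action is shown below. It reads from a String instead of textfile.txt so it can run anywhere; ColonRecords and parse are hypothetical names, and it assumes every record has exactly three fields and that the text fields themselves contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical parser for "age:word:word" records, one per line.
final class ColonRecords {
    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        try (Scanner input = new Scanner(text)) {
            // Any run of ':' or whitespace (including line breaks) separates two fields
            input.useDelimiter("[:\\s]+");
            while (input.hasNextInt()) {
                int age = input.nextInt();
                String something = input.next();     // throws NoSuchElementException if the record is cut short
                String somethingElse = input.next();
                records.add(new String[] {String.valueOf(age), something, somethingElse});
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] r : parse("23:years:old\n15:years:young\n")) {
            System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
        }
    }
}
```

To read the real file, new Scanner(text) would be replaced with new Scanner(new File("textfile.txt")) and the FileNotFoundException handled.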
