Java: Parse lines from files - java

I am in need of some ideas. I have a file with some information like this:
AAA222BBB%
CC333DDDD%
EEEE444FF%
The '%' sign is like an indicator of "end of line"
I would like to read every line, and then parse it to fit a certain format (4 letters, 3 digits and 4 letters again) - So if it looks like the above, it should insert a special sign or whitespace to fill, like this:
AAA-222BBB-%
CC--333DDDD%
EEEE444FF--%
My first and immediate idea was to read every line as a string. And then some huge if-statement saying something like
For each line:
{
if (first symbol !abc...xyz) {
insert -
}
if (second symbol !abc...xyz) {
insert -
}
}
However, I am sure there must be a more elegant and effective way to do real parsing of the text, but I'm not sure how. So if anyone has a good idea, please enlighten me :-)
Best

My first advice is to read this other post (nice explanation of scanner and regex):
How do I use a delimiter in Java Scanner?
Then my solution (surely not the cleverest, but it should work):
For each line:
{
Scanner scan = new Scanner(line);
scan.useDelimiter(Pattern.compile("[0-9]"));
String first = scan.next();
scan.useDelimiter(Pattern.compile("[A-Z]"));
String second = scan.next();
scan.useDelimiter(Pattern.compile("[0-9]"));
String third = scan.next();
scan.close();
System.out.println(addMissingNumber(first, 4) + addMissingNumber(second, 3) + addMissingNumber(third, 4));
}
//For each missing char add "-"
private static String addMissingNumber(String word, int size) {
while (word.length() < size) {
word = word.concat("-");
}
return word;
}
The output for "AAA222BBB" (with the trailing % removed before scanning): "AAA-222BBB-"
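As an alternative, here is a minimal sketch that does the same padding with a single regular expression instead of switching Scanner delimiters. It assumes every line consists of uppercase letters, then digits, then uppercase letters, optionally followed by the % marker. The class and method names (LinePadder, pad, format) are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinePadder {
    // Assumed layout: letters, digits, letters, then an optional '%' end marker.
    private static final Pattern LINE = Pattern.compile("([A-Z]*)([0-9]*)([A-Z]*)%?");

    // Right-pads s with '-' until it is width characters long.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append('-');
        }
        return sb.toString();
    }

    // Pads the three groups to 4, 3 and 4 characters and re-appends the '%' marker.
    static String format(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return pad(m.group(1), 4) + pad(m.group(2), 3) + pad(m.group(3), 4) + "%";
    }

    public static void main(String[] args) {
        for (String line : new String[] {"AAA222BBB%", "CC333DDDD%", "EEEE444FF%"}) {
            System.out.println(format(line));
        }
    }
}
```

Because the whole line has to match the pattern, a line that does not follow the letters-digits-letters layout is rejected instead of being padded silently.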

Related

Java - Counting words, lines, and characters from a file

I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, no punctuation, spaces, or non-alphabetic characters). The character count should only include the characters inside those words.
This is what I have so far. I'm unsure of how to count the characters. Every time I run the program, it jumps to the catch mechanism as soon as I enter the file name (and it should have no issues with the file path, as I've tried using it before). I tried to create the program without the try/catch to see what the error was, but it wouldn't work without it.
Why is it jumping to the catch function when I enter the file name? How can I fix this program to properly count words, lines, and characters in the text file?
I don't get any exception with your code if I give a proper file name. As for counting the characters, you should modify the logic a little. Instead of only adding the token count to the word count, create a StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+"); for each line, iterate through all the tokens, and sum the length of each token. This should give you the number of characters. Something like this:
while (fileScan.hasNextLine()) {
lineC++;
tempo = fileScan.nextLine();
StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
wordC += st.countTokens();
while(st.hasMoreTokens()) {
String stt = st.nextToken();
System.out.println(stt); // Display the token to confirm that the line is split as expected
charC += stt.length();
}
System.out.println("Lines: " + lineC + "\nWords: " + wordC+" \nChars: "+charC);
}
Note: Escape sequences will not work with StringTokenizer, because its delimiter argument is a plain set of characters rather than a regular expression. For example, you might expect \\s to delimit on any whitespace character, but it instead delimits on the literal characters \ and s. If you need regular expressions, I suggest using java.util.regex.Pattern and java.util.regex.Matcher, and calling matcher.find() to identify words and characters.
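A rough sketch of the Pattern/Matcher approach mentioned in the note, assuming a word is any run of ASCII letters (so "It's" would count as two words). The class name WordStats and the method count are only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordStats {
    // Assumption: a word is one or more ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns {wordCount, charCount} for a single line of text.
    static int[] count(String line) {
        Matcher m = WORD.matcher(line);
        int words = 0;
        int chars = 0;
        while (m.find()) {          // each find() advances to the next word
            words++;
            chars += m.group().length();
        }
        return new int[] {words, chars};
    }

    public static void main(String[] args) {
        int[] result = count("Hello, world! 42 times.");
        System.out.println("Words: " + result[0] + ", Chars: " + result[1]);
    }
}
```

Calling count once per line inside the while (fileScan.hasNextLine()) loop and adding the two values to wordC and charC would replace the StringTokenizer.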
I tried your code but I didn't receive any exception here. However, I suspect that when you input the file name, maybe you forgot the extension of the file.
You probably forgot the file extension while giving input, but there is a much simpler way of doing this. You also mention you don't know how to count the characters. You can try something like this:
import java.util.Scanner;
import java.io.*;
import java.nio.file.*; // Files and Path live here
import java.util.stream.*;
public class WordCount
{
public static void main(String[] args)
{
Scanner userInput = new Scanner(System.in);
try {
// Input file
System.out.println("Please enter the name of the file.");
String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
}
catch (IOException ex1) {
System.out.println("Error.");
System.exit(0);
}
}
}
Going through the code
import java.util.stream.*;
Note we use the streams package, for filtering out empty strings while finding words. Now let's skip forward a bit.
String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
The above part gets all of the text in the file and stores it as a string.
System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
Okay, this is a long line. Let's break it down.
"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced with the corresponding argument in the printf function. The first %d will be replaced by content.split("\n").length, which is the number of lines. We get the number of lines by splitting the string.
The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array, and the array holds the strings left after splitting on anything non-alphabetic (you said words contain only alphabetic letters). Next, we filter out all the empty values, since String.split can keep empty strings. The .count() is self-explanatory: it returns the number of words left after filtering.
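The third and last %d is the simplest. It is replaced by content.length(), the length of the whole string. Note that this counts every character, including whitespace and punctuation; to count only the characters inside words, sum the lengths of the filtered words instead.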
I left your catch block intact, but I feel like the System.exit(0) is a bit redundant.
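For reference, here is a small sketch of the same stream-based counting applied to a string that is already in memory, with the character count restricted to letters inside words as the question asks. The class TextCounts and its method names are invented for this example, and it assumes lines end with \n.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TextCounts {
    // Splits on runs of non-letters and drops the empty strings split() may leave.
    static List<String> words(String content) {
        return Arrays.stream(content.split("[^A-Za-z]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    // Sums the lengths of the words only, ignoring spaces, digits and punctuation.
    static int lettersInWords(String content) {
        return words(content).stream().mapToInt(String::length).sum();
    }

    // "".split("\n") yields one empty element, so empty content is handled separately.
    static int lines(String content) {
        return content.isEmpty() ? 0 : content.split("\n").length;
    }

    public static void main(String[] args) {
        String content = "one two\nthree, 4!\n";
        System.out.printf("Lines: %d%nWords: %d%nCharacters: %d%n",
                lines(content), words(content).size(), lettersInWords(content));
    }
}
```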

Is there a function in Java that allows you to transform a string to an Int considering the string has chars you want to ignore?

I'm doing a project for a Uni course where I need to read an input of an int followed by a '+' in the form of (for example) "2+".
However when using nextInt() it throws an InputMismatchException
What are the workarounds for this as I only want to store the int, but the "user", inputs an int followed by the char '+'?
I've already tried a lot of stuff including parseInt and valueOf but none seemed to work.
Should I just do it manually and analyze char by char?
Thanks in advance for your help.
Edit: just to clear it up. All the user will input is an int followed by a +. The project is themed around a Netflix-style program. This parameter will be used as the age rating for a movie. However, I don't want to store the entire string in the movie, as it would make it harder to check whether a user is eligible to watch a certain movie.
UPDATE: Managed to get substring combined with parseInt to work:
String x = in.nextLine();
x = x.substring(0, x.length()-1);
int i = Integer.parseInt(x);
Thanks for your help :)
Try out Scanner#useDelimiter():
try(Scanner sc=new Scanner(System.in)){
sc.useDelimiter("\\D"); /* use non-digit as separator */
while(sc.hasNextInt()){
System.out.println(sc.nextInt());
}
}
Input: 2+33-599
Output:
2
33
599
Or, with your current code x = x.substring(0, x.length()-1);, you can make it more robust by using x = x.replaceAll("\\D",""); instead.
Yes you should manually do it. The methods that are there will throw a parse exception. Also do you want to remove all non digit characters or just plus signs? For example if someone inputs "2 plus 5 equals 7" do you want to get 257 or throw an error? You should define strict rules.
You can do something like: Integer.parseInt(stringValue.replaceAll("[^\\d]","")); to remove all characters that are not digits.
Hard way is the only way!
From my Git repo, line 290.
Also useful: the Javadoc on regular expressions.
It takes in an input String, strips every non-digit character with .replaceAll(), and then reads the remaining digits one at a time.
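// Excerpt from a larger method; cardIndexes is declared elsewhere (assumed to be a List<Integer>).
int inputLimit = 1;
Scanner scan = new Scanner(System.in);
try{
String userInput = scan.nextLine();
// strip everything that is not a digit
String tokens = userInput.replaceAll("[^0-9]", "");
//get integers from String input
if(!tokens.equals("")){
for(int i = 0; i < tokens.length() && i < inputLimit; ++i){
// each digit is parsed as its own one-digit number
String token = "" + tokens.charAt(i);
int index = Integer.parseInt(token);
if(0 == index){
return;
}
cardIndexes.add(index);
}
}else{
System.out.println("Please enter integers 0 to 9.");
System.out.print(">");
}
}catch(java.util.NoSuchElementException e){
// no input line was available
}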
Possible solutions have already been given. Here is one more:
Scanner sc = new Scanner(System.in);
String numberWithPlusSign = sc.next();
String onlyNumber = numberWithPlusSign.substring(0, numberWithPlusSign.indexOf('+'));
int number = Integer.parseInt(onlyNumber);
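If you want stricter rules than "drop the last character" or "drop everything that is not a digit", one option is to validate the whole input against a pattern first. This is only a sketch, assuming the rating is digits with an optional trailing +; the AgeRating class and parse method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgeRating {
    // Assumed format: optional whitespace, digits, optional '+', optional whitespace.
    private static final Pattern RATING = Pattern.compile("\\s*(\\d+)\\+?\\s*");

    // Returns the numeric part of inputs such as "2+" or "16"; rejects anything else.
    static int parse(String input) {
        Matcher m = RATING.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an age rating: " + input);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(parse("2+"));
        System.out.println(parse("16"));
    }
}
```

With this approach an input like "2 plus 5 equals 7" is rejected instead of being turned into 257.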

Java Scanner - Ignore Subsequent Letters

My program needs to accept integer numbers, individual characters, or one specific string (I'll use "pear" for this example). Whilst each of these can be separated by whitespace, there shouldn't be any need to.
Currently, my parsing code, which relies on a Scanner, looks something like this:
Scanner scanner = new Scanner(System.in);
while (scanner.hasNext()) {
if (scanner.hasNext("\\s+")) {
// Ignore whitespace…
} else if (scanner.hasNext("[-]?\\d+")) {
// Get a number
String nextNumberString = scanner.next("[-]?\\d+");
// Process the string representing the number…
} else if (scanner.hasNext("pear")) {
scanner.next("pear");
// Do something special…
} else {
// Get the next character
Pattern oldDelimiter = scanner.delimiter();
scanner.useDelimiter("");
String nextCharAsString = scanner.next();
scanner.useDelimiter(oldDelimiter);
char nextCharacter = nextCharAsString.charAt(0);
if (Character.isWhitespace(nextCharacter)) {
// Ignore whitespace…
} else {
// Process character…
}
}
}
At present, my program will accept input like 123 d 456 r pear without any problems. However, it should also accept the same input without any whitespace (123d456rpear), and interpret it the same way, and with my current code, the individual digits are incorrectly interpreted as characters.
I feel like the cause might be the regular expressions that I'm using. However, adding .* to the end of them will cause all of the subsequent characters to be parsed, along with the input that I'm trying to parse. For example, [-]?\d+.* will try to parse the entirety of 123d456rpear as a number, when I really just want 123, leaving the rest to be parsed later. I've also tried wrapping my desired input into a group, and then appending ? or {1}, which hasn't worked, either.
I've also experimented with scanner.findInLine(), but in my testing, this doesn't seem to work either. For example, when I tried this, pearpear would cause an infinite loop, despite my attempts to skip the first instance of pear.
I've also tried setting the delimiter to "", like I do when extracting individual characters (which, in that case, works as expected). However, this causes each individual number to be processed individually, parsing 1, 2, and 3 instead of 123. pear also gets interpreted as individual characters.
So, could someone help me figure out where I'm going wrong? Does this issue lie with my regular expressions? Am I using the wrong methods? Or am I misunderstanding how the Scanner class is designed to work?
To my understanding, the idea of the Scanner class is to extract tokens and throw the delimiters away. But you don't want to throw anything away except whitespace, and whitespace is not required in your input. Here is an implementation idea using an outer and an inner Scanner. The outer one tokenizes at whitespace, if there is any. The inner one uses findInLine() to bypass delimiters altogether.
findInLine
Attempts to find the next occurrence of a pattern constructed from the
specified string, ignoring delimiters.
public void scan(Scanner scanner) {
while (scanner.hasNext()) {
String next = scanner.next();
System.out.println("opening inner scanner: " + next);
Scanner innerScanner = new Scanner(next);
do {
next = innerScanner.findInLine("([-]?\\d+)|(pear)|([a-zA-Z])");
if (next == null) {
// Nothing useful in there
} else if (next.equals("pear")) {
System.out.println("pear");
} else if (next.matches("[a-zA-Z]")) {
System.out.println("char: " + next);
} else {
System.out.println("number: " + next);
}
} while (next != null);
innerScanner.close();
}
}
public void run() {
scan(new Scanner("123 d 456 pear"));
scan(new Scanner("123d456pear"));
}
The output of the run() method is as follows:
opening inner scanner: 123
number: 123
opening inner scanner: d
char: d
opening inner scanner: 456
number: 456
opening inner scanner: pear
pear
opening inner scanner: 123d456pear
number: 123
char: d
number: 456
pear
Well, the individual digits are incorrectly interpreted as characters because the hasNext method of Scanner tests the next complete token, and tokens are separated by the delimiter, which defaults to whitespace.
From the Java docs:
A Scanner breaks its input into tokens using a delimiter pattern,
which by default matches whitespace. The resulting tokens may then be
converted into values of different types using the various next
methods
Hence the whole 123d456rpear is extracted as a single token, which is not a number but a string.
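A quick way to see this behaviour for yourself is the sketch below; TokenDemo and its helper methods are names made up for this illustration.

```java
import java.util.Scanner;

class TokenDemo {
    // True only if the next whitespace-delimited token, as a whole, matches the number pattern.
    static boolean nextTokenIsNumber(String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.hasNext("[-]?\\d+");
        }
    }

    // Returns the first token using the default (whitespace) delimiter.
    static String firstToken(String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.next();
        }
    }

    public static void main(String[] args) {
        System.out.println(nextTokenIsNumber("123 d 456 r pear")); // first token is "123"
        System.out.println(nextTokenIsNumber("123d456rpear"));     // first token is the whole input
        System.out.println(firstToken("123d456rpear"));
    }
}
```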

Count words in a String that is NOT in a string array without using split method

I need to count the words in a String. For many of you that seems pretty simple but from what I've read in similar questions people are saying to use arrays but I'd rather not. It complicates my program more than it helps as my string is coming from an input file and the program cannot be hardwired to a specific file.
I have this so far:
while(input.hasNext())
{
String sentences = input.nextLine();
int countWords;
char c = " ";
for (countWords = 0; countWords < sentences.length(); countWords++)
{
if (input.hasNext(c))
countWords++;
}
System.out.println(sentences);
System.out.println(countWords);
}
The problem is that what I have here ends up counting the number of characters in the string. I thought it would treat char c as a delimiter. I've also tried using String c instead with input.hasNext but the compiler tells me:
Program04.java:39: incompatible types
found : java.lang.String[]
required: java.lang.String
String token = sentences.split(delim);
I've since deleted the .split method from the program.
How do I delimit (is that the right word?) without using a String array with a scanned in file?
Don't use the Scanner (input) for more than one thing. You're using it to read lines from a file, and also trying to use it to count words in those lines. Use a second Scanner to process the line itself, or use a different method.
The problem is that the scanner consumes its buffer as it reads it. input.nextLine() returns sentences, but after that it no longer has them. Calling input.hasNext() on it gives you information about the characters after sentences.
The simplest way to count the words in sentences is to do:
int wordCount = sentences.split(" ").length;
Using Scanner, you can do:
int wordCount = 0;
Scanner scanner = new Scanner(sentences);
while(scanner.hasNext())
{
scanner.next();
wordCount++;
}
Or use a for loop for best performance (as mentioned by BlackPanther).
Another tip I'd give you is how to better name your variables. countWords should be wordCount. "Count words" is a command, a verb, while a variable should be a noun. sentences should simply be line, unless you know both that the line is composed of sentences and that this fact is relevant to the rest of your code.
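Putting the two-Scanner idea together, a minimal sketch could look like this. It reads from a hard-coded string instead of a file so that it is self-contained, and LineWordCounter and countWords are names chosen for the example.

```java
import java.util.Scanner;

class LineWordCounter {
    // Counts whitespace-separated words in one line with its own Scanner, no arrays involved.
    static int countWords(String line) {
        int wordCount = 0;
        try (Scanner words = new Scanner(line)) {
            while (words.hasNext()) {
                words.next();
                wordCount++;
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // The outer Scanner only reads lines; in the real program it would wrap the input file.
        try (Scanner input = new Scanner("first line here\n\ntwo  words\n")) {
            while (input.hasNextLine()) {
                String line = input.nextLine();
                System.out.println(countWords(line) + ": " + line);
            }
        }
    }
}
```

Since the inner Scanner uses the default whitespace delimiter, repeated spaces and empty lines do not inflate the count.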
Maybe, this is what you are looking for.
while(input.hasNext())
{
String sentences = input.nextLine();
System.out.println ("count : " + sentences.split (" ").length);
}
What you are trying to achieve is not quite clear, but if you are trying to count the number of words in your text file, then try this:
int countWords = 0;
while(input.hasNextLine())
{
String sentences = input.nextLine();
boolean inWord = false;
for(int i = 0; i < sentences.length(); i++) {
if(sentences.charAt(i) == ' ') {
inWord = false;
} else if(!inWord) {
// first character of a new word
inWord = true;
countWords++;
}
}
}
System.out.println(countWords);

How can i split the user's input with tokens?

I need your advice on how to do something. I want to take the user's input line, as I already do in my program with Scanner, but I want to split each command (word) into tokens. I do not know how to do that. Until now I have been playing with substring, but if the user, for example, presses the space bar twice, everything goes wrong!
For example:
Please insert a command : I am 20 years old
With substring or .split(" ") it runs, but think about having extra spaces between the words:
Please insert a command : I am 20 years old
That is why I need your advice. The question is: how can I split the user's input into tokens?
Well, you need to normalize your input line before splitting it into tokens. The simplest way is to collapse repeated whitespace characters:
line = line.replaceAll("\\s+", " ");
(this also turns tabs, and runs of tabs, into a single " ").
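Here is a short sketch of normalizing and then splitting. The trim() call is an addition of mine, so that leading spaces do not produce an empty first token, and CommandSplitter and tokens are illustrative names.

```java
class CommandSplitter {
    // Collapses whitespace runs to single spaces, then splits on that single space.
    static String[] tokens(String line) {
        String normalized = line.trim().replaceAll("\\s+", " ");
        if (normalized.isEmpty()) {
            return new String[0]; // "".split(" ") would return one empty token
        }
        return normalized.split(" ");
    }

    public static void main(String[] args) {
        for (String token : tokens("  I am  20   years old ")) {
            System.out.println("[" + token + "]");
        }
    }
}
```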
Use the StringTokenizer class. From the API :
"[It] allows an application to break a string into tokens... A
StringTokenizer object internally maintains a current position within
the string to be tokenized."
The following sample code from the API will give you an idea:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
It produces the following result:
this
is
a
test
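With its default delimiters (space, tab, newline and a few other whitespace characters), StringTokenizer treats a run of delimiters as one separator, so doubled spaces do not produce empty tokens. A small sketch to illustrate this; TokenizerDemo is just a name for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects the tokens of a line using StringTokenizer's default whitespace delimiters.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("I am  20\tyears   old"));
    }
}
```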
Okay, so the code that I have made asks you to insert a command, then takes your command and splits it around any number of spaces (due to the regex). It then saves these words, numbers, speech marks, etc. as tokens. You can then manipulate each word in the for loop.
import java.util.Scanner;
public class CommandReaderProgram{
public static void main(String[] args){
Scanner userInput = new Scanner(System.in);
System.out.println("Please insert a command:");
String temp = userInput.nextLine();
if(temp != null){
// split around any run of whitespace
String [] tokens = temp.split("\\s+");
for(String word : tokens){
System.out.print(word);
System.out.print(" ");
}
}
System.out.print("\n");
}
}
To test that each word, number and speech mark has actually been saved as a token, change the string in the System.out.print(" "); call to anything, e.g. System.out.print(""); or System.out.print("abcdefg");, and it will put this text between the tokens, to prove that the tokens are indeed separate.
Unfortunately I am unable to call the token array outside of the for loop at the moment, but will let you know when I figure it out.
I'd like to hear what type of program you are trying to make as I think we are both trying to make something very similar.
Hope this is what you are looking for.
Regards.
