Why is my word counter sometimes off by one? - java

Most of the time it works correctly, but occasionally the count is off by one. Any ideas?
public static int countWords(File file) throws FileNotFoundException, IOException{
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
List<String> strList = new ArrayList<>();
while ((line=br.readLine())!=null){
String[] strArray= line.split("\\s+");
for (int i=0; i<strArray.length;i++){
strList.add(strArray[i]);
}
}
return strList.size();
}
In particular, for the input below it gives 3 instead of 2:
\n
k

My guess is that the second line is split into two strings, "" and "k". See the code below:
import java.util.Arrays;

public class SplitTest {
    public static void main(String[] args) {
        String str = " k";
        String[] array = str.split("\\s+"); // split on runs of whitespace
        System.out.println("length of array is " + array.length); // length is 2
        System.out.println(Arrays.toString(array)); // array is [, k]
    }
}
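To see exactly where the extra tokens come from, here is a small sketch (the class name SplitDemo is just for illustration) that compares what split("\\s+") returns for a line with leading whitespace, the same line after trim(), and an empty line. Note that trim() alone does not help with blank lines: splitting an empty string still yields one (empty) element.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // Leading whitespace produces an empty first element: prints [, k]
        System.out.println(Arrays.toString(" k".split("\\s+")));
        // Trimming first removes that empty element: prints [k]
        System.out.println(Arrays.toString(" k".trim().split("\\s+")));
        // An empty line still splits into one element, the empty string: prints 1
        System.out.println("".split("\\s+").length);
    }
}
```

So a blank line and a line starting with whitespace each add one phantom "word", which matches the 3 reported for the example input.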

If you are using Java 8, you can use streams and filter for whatever you consider a "word". For example:
try (Stream<String> lines = Files.lines(Paths.get("files/input.txt"))) { // Read all lines of your input text (closed automatically)
    List<String> l = lines
            .flatMap(s -> Stream.of(s.split("\\s+"))) // Split each line on whitespace
            .filter(s -> s.matches("\\w+"))           // Keep only the "words" (change this test as you want)
            .collect(Collectors.toList());            // Put the stream in a List
    System.out.println(l);
}
In this specific case, it will output [k].
You can of course do the same in Java 7 by adding this condition to your for loop:
if (strArray[i].matches("\\w+"))
    strList.add(strArray[i]); // Keep only the "words" - again, use your own criteria
It is just more cumbersome.
I hope it helps.
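For reference, here is a minimal sketch of the question's countWords with the empty-token fix applied. It is an illustration, not the poster's code: the class name WordCounter and the extra Reader overload (added so it can be tried without a file) are assumptions. It counts every non-empty whitespace-separated token instead of using the \\w+ test, so punctuation-only tokens still count, and it closes the reader with try-with-resources.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class WordCounter {

    // Counts whitespace-separated tokens, ignoring the empty strings that
    // split("\\s+") produces for blank lines and for leading whitespace.
    public static int countWords(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                for (String token : line.split("\\s+")) {
                    if (!token.isEmpty()) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    // Same signature idea as the question: count the words in a file.
    public static int countWords(File file) throws IOException {
        return countWords(new FileReader(file));
    }

    public static void main(String[] args) throws IOException {
        // An empty line followed by " k": the original code reports 3 here.
        System.out.println(countWords(new StringReader("\n k"))); // prints 1
    }
}
```

Counting directly also avoids building a List just to call size() on it.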

Related

searching for words in a text file in java

I am trying to search for words within a text file and replace all uppercase characters with lowercase ones. The problem is that when I use the replaceAll method with a regular expression, I get a syntax error. I have tried different tactics, but nothing works. Any tips? I think that maybe I should create my own replaceAll method to invoke, but I don't really see the point of it.
public static void main() throws FileNotFoundException {
ArrayList<String> inputContents = new ArrayList<>();
Scanner inFile =
new Scanner(new FileReader("H:\\csc8001\\data.txt"));
while(inFile.hasNextLine())
{
String line = inFile.nextLine();
inputContents.add(inFile.nextLine());
}
inFile.close();
ArrayList<String> dictionary = new ArrayList<>();
for(int i= 0; i <inputContents.size(); i++)
{
String newLine = inFile.nextLine();
newLine = newLine(i).replaceAll("[^A-Za-z0-9]");
dictionary.add(inFile.nextLine());
}
// PrintWriter outFile =
// new PrintWriter("H:\\csc8001\\results.txt");
}
There is a compilation error on this line:
newLine = newLine(i).replaceAll("[^A-Za-z0-9]");
Because replaceAll takes 2 parameters: a regex and a replacement.
(And because newLine(i) is nonsense: newLine is a String variable, not a method.)
This should be closer to what you need:
newLine = newLine.replaceAll("[^A-Za-z0-9]+", " ");
That is, replace non-empty sequences of non-[A-Za-z0-9] characters with a space.
To convert all uppercase letters to lowercase, it's simpler and better to use toLowerCase.
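There are many other issues in your code too. For example, the first loop calls inFile.nextLine() twice per iteration and keeps only the second result, so every other line of the input is skipped (and a file with an odd number of lines ends with a NoSuchElementException). Also, the Scanner is closed after the first loop, yet the second loop still calls inFile.nextLine(), which makes no sense: a closed Scanner throws an IllegalStateException.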
With these and a few other issues cleaned up, this should be closer to what you want:
Scanner inFile = new Scanner(new FileReader("H:\\csc8001\\data.txt"));
List<String> inputContents = new ArrayList<>();
while (inFile.hasNextLine()) {
inputContents.add(inFile.nextLine());
}
inFile.close();
List<String> dictionary = new ArrayList<>();
for (String line : inputContents) {
dictionary.add(line.replaceAll("[^A-Za-z0-9]+", " ").toLowerCase());
}
If you want to add words to the dictionary instead of lines, you also need to split the lines on spaces. One simple way to achieve that:
dictionary.addAll(Arrays.asList(line.replaceAll("[^A-Za-z0-9]+", " ").toLowerCase().split(" ")));
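One caveat with that one-liner: a line that starts with punctuation becomes a string with a leading space, and split(" ") then returns an empty first element (the same effect as in the word-counter question), while a blank line adds a single "" entry. A minimal sketch that trims first and skips empty lines follows; the class DictionaryDemo and its toWords helper are hypothetical names, and the input lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryDemo {

    // Normalizes each line (runs of non-alphanumerics become one space,
    // then lowercase) and collects the individual words, skipping the empty
    // tokens that a leading separator or a blank line would otherwise produce.
    public static List<String> toWords(List<String> lines) {
        List<String> dictionary = new ArrayList<>();
        for (String line : lines) {
            String normalized = line.replaceAll("[^A-Za-z0-9]+", " ").trim().toLowerCase();
            if (normalized.isEmpty()) {
                continue; // blank or punctuation-only line
            }
            // After the replace and trim, words are separated by exactly one space
            dictionary.addAll(Arrays.asList(normalized.split(" ")));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello, World!", "", "  The END.");
        System.out.println(toWords(lines)); // prints [hello, world, the, end]
    }
}
```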

Comma-Delimited String of Integers

This is the original prompt:
I need to write a program that gets a comma-delimited String of integers (e.g. “4,8,16,32,…”) from the user at the command line and then converts the String to an ArrayList of Integers (using the wrapper class) with each element containing one of the input integers in sequence. Finally, use a for loop to output the integers to the command line, each on a separate line.
This is the code that I have so far:
import java.util.Scanner;
import java.util.ArrayList;
public class Parser {
public static void main(String[] args) {
Scanner scnr = new Scanner(System.in);
ArrayList<String> myInts = new ArrayList<String>();
String integers = "";
System.out.print("Enter a list of delimited integers: ");
integers = scnr.nextLine();
for (int i = 0; i < myInts.size(); i++) {
integers = myInts.get(i);
myInts.add(integers);
System.out.println(myInts);
}
System.out.println(integers);
}
}
I am confused about where to go with the rest of this program. If someone could explain what I need to do, that would be much appreciated!
As Matthew and Marc pointed out, you first have to split the string into tokens and then parse each token into an Integer.
You could try it with something like this:
String stringOfInts = "1,2,3,4,5";
List<Integer> integers = new ArrayList<>();
String[] splittedStringOfInts = stringOfInts.split(",");
for(String strInt : splittedStringOfInts) {
integers.add(Integer.parseInt(strInt));
}
// do something with integers
The argument you pass to split() defines how the string is broken into tokens. In your case it is simply the comma (,).
Hope this helps.
Regards Patrick
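Putting the split-and-parse idea together with the assignment's other requirements (read from the command line, store in an ArrayList of Integers, print each one with a for loop), a complete program might look like the sketch below. The parseInts helper is a hypothetical name introduced here; trimming each token (to accept input like "4, 8, 16") and silently skipping empty fields such as "4,,8" are assumptions about what the assignment tolerates, and you may prefer to report an error instead.

```java
import java.util.ArrayList;
import java.util.Scanner;

public class Parser {

    // Splits a comma-delimited string and converts each token to an Integer.
    public static ArrayList<Integer> parseInts(String input) {
        ArrayList<Integer> ints = new ArrayList<>();
        for (String token : input.split(",")) {
            String t = token.trim();        // tolerate spaces around commas
            if (!t.isEmpty()) {             // skip empty fields like in "4,,8"
                ints.add(Integer.parseInt(t)); // autoboxed to Integer
            }
        }
        return ints;
    }

    public static void main(String[] args) {
        Scanner scnr = new Scanner(System.in);
        System.out.print("Enter a list of comma-delimited integers: ");
        ArrayList<Integer> myInts = parseInts(scnr.nextLine());
        // Output each integer on a separate line, as the assignment asks
        for (int i = 0; i < myInts.size(); i++) {
            System.out.println(myInts.get(i));
        }
    }
}
```

Integer.parseInt throws a NumberFormatException for a token that is not a valid integer, so you may want to catch that around the call to parseInts.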

String Arrays not returning expected value

I'm very new to Java, and this has had me stumped for the last half hour or so. I'm reading lines from a text file and storing them as String arrays. From there I'm trying to use the values in the arrays to initialise another class I have. To initialise my Route class (hence routeName) I need to take the first value from the array and pass it as a string. When I try to return s[0] for routeName, I get the last line of my text file. Any ideas on how to fix this would be greatly appreciated. I'm still testing, which is why my code is barely finished.
My text file is as follows.
66
Uq Lakes, Southbank
1,2,3,4,5
2,3,4,5,6
and my code:
import java.io.*;
import java.util.*;
public class Scan {
public static void main(String args[]) throws IOException {
String routeName = "";
String stationName = " ";
Scanner timetable = new Scanner(new File("fileName.txt"));
while (timetable.hasNextLine()) {
String[] s = timetable.nextLine().split("\n");
routeName = s[0];
}
System.out.println(routeName);
}
}
The call timetable.nextLine().split("\n") returns an array of Strings.
Every time you call it, your array is overwritten with the next line of the file, so when the loop finishes you are left with the last line.
Below is code you can use:
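public static void main(String[] args) throws FileNotFoundException {
    String routeName = "";
    Scanner timetable;
    int count = 0;
    String[] s = new String[10]; // holds at most 10 lines; later lines are ignored
    timetable = new Scanner(new File("fileName.txt"));
    while (timetable.hasNextLine() && count < s.length) { // stop before overflowing s
        String line = timetable.nextLine();
        s[count++] = line; // keep every line instead of overwriting one variable
    }
    timetable.close();
    routeName = s[0]; // the first line of the file
    System.out.println(routeName);
}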
Scanner.nextLine() returns a single line, so splitting by '\n' will always give a single-element array, e.g.:
timetable.nextLine().split("\n"); // e.g., "1,2,3,4,5" => ["1,2,3,4,5"]
Try splitting by the ',' instead, e.g.:
timetable.nextLine().split(","); // e.g., "1,2,3,4,5" => ["1", "2", "3", "4", "5"]
NOTE: If you are intending for the array to contain individual lines, then check out this SO post.
Scanner s = new Scanner(new File(filename));
List<String> lines = new ArrayList<String>(); // A List can be dynamically resized
while(s.hasNextLine()) lines.add(s.nextLine()); // Store each line in the list
String[] arr = lines.toArray(new String[0]); // If you really need an Array, use this
Your while loop iterates over all lines and assigns each one to routeName. That's why you end up with the last line in your string. What you could do is call break once you have read the first line of the file; then routeName will hold the first line.
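The break suggestion can be sketched as follows. To keep it self-contained, the hypothetical firstLine helper takes a Scanner, and main feeds it the file contents from the question as an in-memory string; with the real file you would pass new Scanner(new File("fileName.txt")) instead.

```java
import java.util.Scanner;

public class FirstLine {

    // Returns the first line the scanner yields, or "" if there is none.
    public static String firstLine(Scanner timetable) {
        String routeName = "";
        while (timetable.hasNextLine()) {
            routeName = timetable.nextLine();
            break; // leave the loop before later lines overwrite routeName
        }
        return routeName;
    }

    public static void main(String[] args) {
        String text = "66\nUq Lakes, Southbank\n1,2,3,4,5\n2,3,4,5,6";
        System.out.println(firstLine(new Scanner(text))); // prints 66
    }
}
```

Because the loop body always breaks on its first pass, an if (timetable.hasNextLine()) would do the same job and arguably reads more directly.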

split the text only 5 times

I want to split the string, but only for the first 5 splits; the rest should stay in one string.
I tried this:
public class FileRead
{
public static void main(String[] args) throws IOException
{
StringBuffer strBuff = new StringBuffer();
String str = null;
File file = new File("D:\\wokies\\5_dataset.txt");
try {
Scanner scanner = new Scanner(file);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
System.out.println(line);
String [] splitSt =line.split(" ");
System.out.println("split happing");
for (int i = 0 ; i < splitSt.length ; i++)
{
System.out.println(splitSt[i]);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
I won't write the code for you, but the solution is very simple: keep a counter, initialize it to 0, and increment it on each iteration. When it reaches 5, don't split.¹
¹ I assume you want to split each new input line, not the same one repeatedly.
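Under that reading (split only the first few lines, keep later lines whole), the counter approach might look like the sketch below. The class SplitFirstLines, the process method, and its maxSplits parameter are hypothetical names, and the in-memory input stands in for the dataset file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class SplitFirstLines {

    // Splits each of the first maxSplits lines on spaces; every later line is
    // kept whole as a single-element array.
    public static List<String[]> process(Scanner scanner, int maxSplits) {
        List<String[]> result = new ArrayList<>();
        int count = 0; // number of lines read so far
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (count < maxSplits) {
                result.add(line.split(" "));
            } else {
                result.add(new String[] { line }); // limit reached: don't split
            }
            count++;
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner("a b\nc d\ne f");
        for (String[] parts : process(scanner, 2)) {
            System.out.println(Arrays.toString(parts));
        }
    }
}
```

If instead you want to limit the splits within a single line, the limit parameter of split is the better tool.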
java.lang.String.split(String regex, int limit) accepts a limit: How often do you want to split the input?
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
So
line.split(" ", 5);
would solve your problem.
You can use the overload of split that takes a limit, split(regex, limit). Try split(" ", 5): this creates an array with at most 5 elements, and the last element is not split any further. For example, "a b c d".split(" ", 3) gives ["a", "b", "c d"].
The split method of String can take a second parameter "limit" that "controls the number of times the pattern is applied and therefore affects the length of the resulting array".
Just call line.split(" ", 5);
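A quick way to check the limit behaviour is the sketch below (the class name and sample line are made up). One detail worth double-checking against the question: a limit of n applies the pattern at most n - 1 times, so split(" ", 5) performs 4 splits. If you literally want 5 splits followed by the remainder, use split(" ", 6).

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String line = "one two three four five six seven";
        // limit 5: the pattern is applied at most 4 times, giving at most 5 parts;
        // everything after the 4th space stays together in the last part.
        String[] parts = line.split(" ", 5);
        System.out.println(parts.length);           // prints 5
        System.out.println(Arrays.toString(parts)); // prints [one, two, three, four, five six seven]
    }
}
```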

useDelimiter, read up till first delimiter and then change line

I'm trying to use a Delimiter to pull out the first numbers in a document with 31 rows looking something like "105878-798##176000##JDOE" and put it in an int array.
The number I'm interested in is "105878798", and the number of digits varies from line to line.
I wrote this but can't figure out how to move on to the next line once I reach the first delimiter of the current line.
import java.util.*;
import java.io.*;
public class Test {
public static void main(String[] args) throws Exception {
int n = 0;
String rad;
File fil = new File("accounts.txt");
int[] accountNr = new int[31];
Scanner sc = new Scanner(fil).useDelimiter("##");
while (sc.hasNextLine()) {
rad = sc.nextLine();
rad.replaceAll("-","");
accountNr[n] = Integer.parseInt(rad);
System.out.println(accountNr[n]);
n++;
System.out.println(rad);
}
}
}
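Don't use a Scanner for this; use a StringTokenizer with the delimiter set to "##", then keep calling .nextElement() and you will get the next number no matter how long it is. (Note that StringTokenizer treats each character of the delimiter string as a separate delimiter, so "##" behaves the same as "#".)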
StringTokenizer st2 = new StringTokenizer(str, "##");
while (st2.hasMoreElements()) {
log.info(st2.nextElement());
}
(Of course, you can iterate in different ways..)
I would suggest using line.split("[#][#]")[0] for each line (and of course handle your exceptions).
Also, rad.replaceAll(...) returns a new String, because String is immutable. You should call parseInt on the returned String, not on rad.
Just use the following instead of the equivalent 2 lines in your code:
String newRad = rad.replaceAll("-","");
accountNr[n] = Integer.parseInt(newRad);
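Combining the split("##")[0] idea with the replaceAll fix, a per-line parser might look like this sketch. The class Accounts and the accountNumber helper are hypothetical; using long rather than int is an assumption, made because the question says the number of digits varies and an int overflows past 2147483647. With the real file you would read each line with a plain Scanner (no useDelimiter) and pass sc.nextLine() to the helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Accounts {

    // Takes the part of the line before the first "##", removes the dashes,
    // and parses the remaining digits.
    public static long accountNumber(String line) {
        String first = line.split("##")[0];        // "105878-798##176000##JDOE" -> "105878-798"
        String digits = first.replaceAll("-", ""); // keep the returned String; rad itself is unchanged
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("105878-798##176000##JDOE");
        List<Long> accountNr = new ArrayList<>();
        for (String rad : lines) {
            accountNr.add(accountNumber(rad));
        }
        System.out.println(accountNr); // prints [105878798]
    }
}
```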
