BufferedReader ( scanner ) - java

This is what I have so far. I want to know how many times a given word occurs in a .txt document. I am trying to use BufferedReader, but I haven't managed to get it working. I guess there is an easier way to solve this, but I don't know it.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
public class TekstiAnalüsaator {
public static void main(String[] args) throws Exception {
InputStream baidid = new FileInputStream("test.txt");
InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
BufferedReader puhverdab = new BufferedReader(tekst);
String rida = puhverdab.readLine();
while (rida != null){
System.out.println("Reading: " + rida);
rida = puhverdab.readLine();
}
puhverdab.close();
}
}
I want to search for words using the structure below: specify the file, then the word to find, and get back how many times that word occurs in the file.
TekstiAnalüsaator analüsaator = new TekstiAnalüsaator("kiri.txt");
int esinemisteArv = analüsaator.sõneEsinemisteArv("kala");

Please see the code example below. This should solve the issue you are facing.
import java.io.*;
public class CountWords {
public static void main(String args[]) throws IOException {
System.out.println(count("Test.java", "static"));
}
public static int count(String filename, String wordToSearch) throws IOException {
int tokencount = 0;
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
String s;
while ((s = br.readLine()) != null) {
// Note: this counts the lines that contain the word, not every occurrence
if (s.contains(wordToSearch))
tokencount++;
}
}
return tokencount;
}
}

It is a bit of a tricky question, because counting words in a string is not such a simple task. Your approach to reading the file line by line is fine, so the remaining problem is how to count the word matches.
For example, you can do a simple check for matches like this:
public static int getCountOFWordsInLine(String line, String test){
int count=0;
int index=0;
while(line.indexOf(test,index ) != -1) {
count++;
index=line.indexOf(test,index)+1;
}
return count;
}
The problem with that approach is that if your word is "test" and your string is "Next word matches asdfatestsdf", it will still count a match. So you can try a more advanced approach using a regex:
public static int getCountOFWordsInLine(String line, String word) {
int count = 0;
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b"); // quote() treats the word as literal text
Matcher matcher = pattern.matcher(line);
while (matcher.find())
count++;
return count;
}
It checks for the word surrounded by \b, which is a word boundary.
It still won't find the word if it starts with an uppercase letter, though. If you want the search to be case insensitive, you can modify the previous method to lowercase both the line and the word before searching, or compile the pattern with Pattern.CASE_INSENSITIVE. But it depends on your definition of a word.
The whole program will become:
import java.io.*;
import java.util.regex.*;
public class MainClass {
public static void main(String[] args) {
try {
InputStream baidid = new FileInputStream("c:\\test.txt");
InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
BufferedReader puhverdab = new BufferedReader(tekst);
String rida = puhverdab.readLine();
String word="test";
int count=0;
while (rida != null){
System.out.println("Reading: " + rida);
count+=getCountOFWordsInLine(rida,word );
rida = puhverdab.readLine();
}
System.out.println("count:"+count);
puhverdab.close();
}catch(Exception e) {
e.printStackTrace();
}
}
public static int getCountOFWordsInLine(String line, String test) {
int count = 0;
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(test) + "\\b"); // quote() treats the word as literal text
Matcher matcher = pattern.matcher(line);
while (matcher.find())
count++;
return count;
}
}
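As a minimal sketch of the case-insensitive variant mentioned above, the pattern can be compiled with Pattern.CASE_INSENSITIVE instead of lowercasing the input. The class and method names here are illustrative and not from the original posts, and "word" is assumed to mean whatever \b delimits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatchSketch {
    // Counts whole-word, case-insensitive occurrences of word in line.
    static int countWholeWord(String line, String word) {
        // Pattern.quote keeps regex metacharacters in the search word literal.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Test" and "test" match; "asdfatestsdf" does not. Prints 2.
        System.out.println(countWholeWord("Test the test, not asdfatestsdf", "test"));
    }
}
```

Compiling the pattern once outside the read loop, rather than once per line, would avoid repeated work on large files.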

import java.io.*;
import java.util.regex.*;
public class TA
{
public static void main(String[] args) throws Exception
{
InputStream baidid = new FileInputStream("test.txt");
InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
BufferedReader puhverdab = new BufferedReader(tekst);
String rida;
String word = args[0]; // search word passed via command line
int count1=0, count2=0, count3=0, count4=0;
Pattern P1 = Pattern.compile("\\b" + word + "\\b");
Pattern P2 = Pattern.compile("\\b" + word + "\\b", Pattern.CASE_INSENSITIVE);
while ((rida = puhverdab.readLine()) != null)
{
System.out.println("Reading: " + rida);
// Version 1: counts lines containing [word]
if (rida.contains(word)) count1++;
// Version 2: counts every instance of [word], including inside longer words
int pos = 0;
while ((pos = rida.indexOf(word, pos)) != -1) { count2++; pos++; }
// Version 3: counts [word] only where it is delimited by word boundaries (\b)
Matcher m1 = P1.matcher(rida);
while (m1.find()) count3++;
// Version 4: same as version 3, but case insensitive
Matcher m2 = P2.matcher(rida);
while (m2.find()) count4++;
}
System.out.println("Found exactly " + count1 + " line(s) containing word: \"" + word + "\"");
System.out.println("Found word \"" + word + "\" exactly " + count2 + " time(s)");
System.out.println("Found word \"" + word + "\" as a whole word " + count3 + " time(s).");
System.out.println("Found, case insensitive search, word \"" + word + "\" as a whole word " + count4 + " time(s).");
puhverdab.close();
}
}

This reads line-by-line as you've already done, splits a line by whitespace to obtain individual words, and checks each word for a match.
int countWords(String filename, String word) throws Exception {
InputStream inputStream = new FileInputStream(filename);
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
BufferedReader reader = new BufferedReader(inputStreamReader);
int count = 0;
String line = reader.readLine();
while (line != null) {
String[] words = line.split("\\s+");
for (String w : words)
if (w.equals(word))
count++;
line = reader.readLine();
}
reader.close();
return count;
}
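The question asked for an object that takes a file name and a method that returns the count for a word. The sketch below puts the line-by-line reading and the word-boundary regex into that shape. TextAnalyzer and countOccurrences are hypothetical ASCII stand-ins for the TekstiAnalüsaator and sõneEsinemisteArv names in the question, and the match is assumed to be case sensitive and whole-word only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the TekstiAnalüsaator class from the question.
class TextAnalyzer {
    private final String filename;

    TextAnalyzer(String filename) {
        this.filename = filename;
    }

    // Counts whole-word, case-sensitive occurrences of word in the file.
    int countOccurrences(String word) throws IOException {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        int count = 0;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = pattern.matcher(line);
                while (matcher.find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TextAnalyzer kiri.txt kala
        TextAnalyzer analyzer = new TextAnalyzer(args[0]);
        System.out.println(analyzer.countOccurrences(args[1]));
    }
}
```

The file is re-read on every call, which keeps the sketch short; caching the lines in the constructor would be a reasonable alternative if many words are looked up.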

Related

Storing an array of strings without initializing the size

Background: This program reads in a text file and replaces a word in the file with user input.
Problem: I am trying to read in a line of text from a text file and store the words into an array.
Right now the array size is hard-coded with a fixed number of elements for test purposes, but I want the array to be able to hold a text file of any size instead.
Here is my code.
public class FTR {
public static Scanner input = new Scanner(System.in);
public static Scanner input2 = new Scanner(System.in);
public static String fileName = "C:\\Users\\...";
public static String userInput, userInput2;
public static StringTokenizer line;
public static String array_of_words[] = new String[19]; //hard-coded
/* main */
public static void main(String[] args) {
readFile(fileName);
wordSearch(fileName);
replace(fileName);
}//main
/*
* method: readFile
*/
public static void readFile(String fileName) {
try {
FileReader file = new FileReader(fileName);
BufferedReader read = new BufferedReader(file);
String line_of_text = read.readLine();
while (line_of_text != null) {
System.out.println(line_of_text);
line_of_text = read.readLine();
}
} catch (Exception e) {
System.out.println("Unable to read file: " + fileName);
System.exit(0);
}
System.out.println("**************************************************");
}
/*
* method: wordSearch
*/
public static void wordSearch(String fileName) {
int amount = 0;
System.out.println("What word do you want to find?");
userInput = input.nextLine();
try {
FileReader file = new FileReader(fileName);
BufferedReader read = new BufferedReader(file);
String line_of_text = read.readLine();
while (line_of_text != null) { //there is a line to read
System.out.println(line_of_text);
line = new StringTokenizer(line_of_text); //tokenize the line into words
while (line.hasMoreTokens()) { //check if line has more words
String word = line.nextToken(); //get the word
if (userInput.equalsIgnoreCase(word)) {
amount += 1; //count the word
}
}
line_of_text = read.readLine(); //read the next line
}
} catch (Exception e) {
System.out.println("Unable to read file: " + fileName);
System.exit(0);
}
if (amount == 0) { //if userInput was not found in the file
System.out.println("'" + userInput + "'" + " was not found.");
System.exit(0);
}
System.out.println("Search for word: " + userInput);
System.out.println("Found: " + amount);
}//wordSearch
/*
* method: replace
*/
public static void replace(String fileName) {
int amount = 0;
int i = 0;
System.out.println("What word do you want to replace?");
userInput2 = input2.nextLine();
System.out.println("Replace all " + "'" + userInput2 + "'" + " with " + "'" + userInput + "'");
try {
FileReader file = new FileReader(fileName);
BufferedReader read = new BufferedReader(file);
String line_of_text = read.readLine();
while (line_of_text != null) { //there is a line to read
line = new StringTokenizer(line_of_text); //tokenize the line into words
while (line.hasMoreTokens()) { //check if line has more words
String word = line.nextToken(); //get the word
if (userInput2.equalsIgnoreCase(word)) {
amount += 1; //count the word
word = userInput;
}
array_of_words[i] = word; //add word to index in array
System.out.println("WORD: " + word + " was stored in array[" + i + "]");
i++; //increment array index
}
//THIS IS WHERE THE PRINTING HAPPENS
System.out.println("ARRAY ELEMENTS: " + Arrays.toString(array_of_words));
line_of_text = read.readLine(); //read the next line
}
BufferedWriter outputWriter = null;
outputWriter = new BufferedWriter(new FileWriter("C:\\Users\\..."));
for (i = 0; i < array_of_words.length; i++) { //go through the array
outputWriter.write(array_of_words[i] + " "); //write word from array to file
}
outputWriter.flush();
outputWriter.close();
} catch (Exception e) {
System.out.println("Unable to read file: " + fileName);
System.exit(0);
}
if (amount == 0) { //if userInput was not found in the file
System.out.println("'" + userInput2 + "'" + " was not found.");
System.exit(0);
}
}//replace
}//FTR
You can use java.util.ArrayList (which grows dynamically, unlike a fixed-size array) to store the String objects by replacing your array declaration with the code below:
public static List<String> array_of_words = new java.util.ArrayList<>();
Use add(string) to append a string and get(index) to retrieve it.
Please refer to the link below for more details:
http://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html
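A minimal sketch of how the word list can grow without a fixed size is shown here. WordListSketch and collectWords are illustrative names, and the lines are passed in as a list rather than read from the file to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class WordListSketch {
    // Splits each line into words and collects them in a list that grows as needed.
    static List<String> collectWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                words.add(tokens.nextToken()); // no index bookkeeping, no capacity limit
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> words = collectWords(Arrays.asList("the quick brown", "fox"));
        System.out.println(words.size() + " words: " + words);
        System.out.println("first word: " + words.get(0));
    }
}
```

When writing the words back out, iterating with for (String w : words) avoids the array's length bound and never emits unused null slots.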
You may want to give ArrayList a try.
In Java, plain arrays cannot be created without an initial size, and they cannot be expanded at run time, whereas ArrayList is a resizable-array implementation of the List interface. ArrayList also comes with a number of useful built-in methods such as
size()
isEmpty()
contains()
clone()
and others. On top of these, you can always convert your ArrayList to a plain array using the ArrayList method toArray(). Hope this answers your question. I'll prepare some code and share it with you to further explain what you can achieve using the List interface.
Don't use native [] arrays; use any of the Java collections instead:
List<String> fileContent = Files.readAllLines(Paths.get(fileName));
fileContent.stream().forEach(System.out::println);
long amount = fileContent.stream()
.flatMap(line -> Arrays.stream(line.split(" +")))
.filter(word -> word.equalsIgnoreCase(userInput))
.count();
List<String> words = fileContent.stream()
.flatMap(line -> Arrays.stream(line.split(" +")))
.filter(word -> word.length() > 0)
.map(word -> word.equalsIgnoreCase(userInput2) ? userInput : word) // userInput2 is the word to replace, userInput its replacement
.collect(Collectors.toList());
Files.write(Paths.get(fileName), String.join(" ", words).getBytes());
Of course, you can also work with such lists more traditionally, with loops:
for(String line: fileContent) {
...
}
or even
for (int i = 0; i < fileContent.size(); ++i) {
String line = fileContent.get(i);
...
}
I just like streams :)

Counting words from a text-file in Java

I'm writing a program that scans in a text file and counts the number of words in it. The definition of a word for the assignment is: 'A word is a non-empty string consisting only of letters (a, ..., z, A, ..., Z), surrounded by blanks, punctuation, hyphenation, line start, or line end.'
I'm very new to Java programming, and so far I've managed to write this instance method, which presumably should work. But it doesn't.
public int wordCount() {
int countWord = 0;
String line = "";
try {
File file = new File("testtext01.txt");
Scanner input = new Scanner(file);
while (input.hasNext()) {
line = line + input.next()+" ";
input.next();
}
input.close();
String[] tokens = line.split("[^a-zA-Z]+");
for (int i=0; i<tokens.length; i++){
countWord++;
}
return countWord;
} catch (Exception ex) {
ex.printStackTrace();
}
return -1;
}
Quoting from Counting words in text file?
int wordCount = 0;
while (input.hasNextLine()){
String nextLine = input.nextLine();
Scanner word = new Scanner(nextLine);
while(word.hasNext()){
wordCount++;
word.next();
}
word.close();
}
input.close();
The only usable word separators in your file are spaces and hyphens. You can use a regex with the split() method.
int num_words = line.split("[\\s\\-]+").length; // number of words; + treats a run of separators as one
System.out.print("Number of words in file is "+num_words);
Regex (regular expression):
\\s matches whitespace, including line breaks, and \\- matches a hyphen, so the string is split wherever there is a space, line break, or hyphen. The extracted words are returned as an array whose length is the number of words in your file, provided the text does not start with a separator (a leading separator produces an empty first element).
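To illustrate how split() behaves around runs of separators and leading separators, here is a small sketch. SplitCountSketch and its countWords method are illustrative names, and the separator set (whitespace and hyphens) is taken from the answer above rather than from the assignment's full definition of a word.

```java
class SplitCountSketch {
    // Counts words separated by runs of whitespace or hyphens.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // splitting "" would otherwise yield one empty token
        }
        int count = 0;
        for (String token : trimmed.split("[\\s\\-]+")) {
            if (!token.isEmpty()) { // a leading hyphen still yields one empty token
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("well-known  fact")); // prints 3
        System.out.println(countWords("- a"));              // prints 1
    }
}
```

Skipping empty tokens is what keeps the count correct when the text begins with a separator.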
You can use a Java regular expression.
See http://docs.oracle.com/javase/tutorial/essential/regex/groups.html to learn about groups.
public int wordCount(){
String patternToMatch = "([a-zA-Z]+)"; // A-Z, not A-z: the range A-z also matches [ \ ] ^ _ and `
int countWord = 0;
try {
Pattern pattern = Pattern.compile(patternToMatch);
File file = new File("abc.txt");
Scanner sc = new Scanner(file);
while(sc.hasNextLine()){
Matcher matcher = pattern.matcher(sc.nextLine());
while(matcher.find()){
countWord++;
}
}
sc.close();
}catch(Exception e){
e.printStackTrace();
}
return countWord > 0 ? countWord : -1;
}
void run(String path)
throws Exception
{
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8")))
{
int result = 0;
while (true)
{
String line = reader.readLine();
if (line == null)
{
break;
}
result += countWords(line);
}
System.out.println("Words in text: " + result);
}
}
final Pattern pattern = Pattern.compile("[A-Za-z]+");
int countWords(String text)
{
Matcher matcher = pattern.matcher(text);
int result = 0;
while (matcher.find())
{
++result;
System.out.println("Matcher found [" + matcher.group() + "]");
}
System.out.println("Words in line: " + result);
return result;
}

Replace a substring by a changing replacement string

I'm trying to write a small program that detects comments in a code file and tags them with an index tag, meaning a tag with an increasing value.
For example this input:
method int foo (int y) {
int temp; // FIRST COMMENT
temp = 63; // SECOND COMMENT
// THIRD COMMENT
}
should be changed to:
method int foo (int y) {
int temp; <TAG_0>// FIRST COMMENT</TAG>
temp = 63; <TAG_1>// SECOND COMMENT</TAG>
<TAG_2>// THIRD COMMENT</TAG>
}
I tried the following code:
String prefix, suffix;
String pattern = "(//.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);
int i = 0;
suffix = "</TAG>";
while (m.find()) {
prefix = "<TAG_" + i + ">";
System.out.println(m.replaceAll(prefix + m.group() + suffix));
i++;
}
The output for the above code is:
method int foo (int y) {
int temp; <TAG_0>// FIRST COMMENT</TAG>
temp = 63; <TAG_0>// SECOND COMMENT</TAG>
<TAG_0>// THIRD COMMENT</TAG>
}
To replace occurrences of detected patterns, you should use the Matcher#appendReplacement method which fills a StringBuffer:
StringBuffer sb = new StringBuffer();
while (m.find()) {
prefix = "<TAG_" + i + ">";
m.appendReplacement(sb, Matcher.quoteReplacement(prefix + m.group() + suffix)); // keeps $ and \ in a comment literal
i++;
}
m.appendTail(sb); // append the rest of the contents
The reason replaceAll does the wrong replacement is that it resets the Matcher and scans the whole string, replacing every match with the same <TAG_0>...</TAG> wrapper. That leaves the Matcher at the end of the input, so in effect the loop only executes once.
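Put together as a self-contained sketch, the appendReplacement/appendTail loop might look like this. CommentTaggerSketch and tagComments are illustrative names, and the pattern is the "//.*" from the question, so a "//" inside a string literal (for example in a URL) would be tagged as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentTaggerSketch {
    // Wraps each "// ..." comment in <TAG_n>...</TAG>, with n increasing per match.
    static String tagComments(String fileText) {
        Matcher m = Pattern.compile("//.*").matcher(fileText);
        StringBuffer sb = new StringBuffer();
        int i = 0;
        while (m.find()) {
            String replacement = "<TAG_" + i + ">" + m.group() + "</TAG>";
            // quoteReplacement stops '$' and '\' in a comment from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            i++;
        }
        m.appendTail(sb); // copy whatever follows the last comment
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "int temp; // FIRST COMMENT\n"
                + "temp = 63; // SECOND COMMENT\n"
                + "// THIRD COMMENT\n"
                + "}\n";
        System.out.print(tagComments(input));
    }
}
```

Because "." does not match line terminators by default, each match stops at the end of its own line.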
Have you tried reading the file line by line, like this:
String prefix, suffix;
suffix = "</TAG>";
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
int i = 0;
for (String line; (line = br.readLine()) != null;) {
int pos = line.indexOf("//"); // start of the comment, or -1 if the line has none
if (pos >= 0) {
prefix = "<TAG_" + i + ">";
line = line.substring(0, pos) + prefix + line.substring(pos) + suffix;
i++;
}
System.out.println(line); // lines without a comment are printed unchanged
}
} catch (IOException e) {
e.printStackTrace();
}
fichiertexte.txt :
method int foo (int y) {
int temp; // FIRST COMMENT
temp = 63; // SECOND COMMENT
// THIRD COMMENT
}
App.java :
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class App {
public static void main(String[] args) {
String fileText = "";
String fichier = "fichiertexte.txt";
// read the text file
try {
InputStream ips = new FileInputStream(fichier);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
String ligne;
while ((ligne = br.readLine()) != null) {
//System.out.println(ligne);
fileText += ligne + "\n";
}
br.close();
} catch (Exception e) {
System.err.println(e.toString());
}
String prefix, suffix;
String pattern = "(//.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);
int i = 0;
suffix = "</TAG>";
StringBuffer sb = new StringBuffer();
while (m.find()) {
prefix = "<TAG_" + i + ">";
m.appendReplacement(sb, prefix + m.group() + suffix);
i++;
}
m.appendTail(sb); // append the text after the last match (here, the closing brace)
System.out.println(sb.toString());
}
}
System.out :
method int foo (int y) {
int temp; <TAG_0>// FIRST COMMENT</TAG>
temp = 63; <TAG_1>// SECOND COMMENT</TAG>
<TAG_2>// THIRD COMMENT</TAG>
}

Removing back to back dashes and asterisks in a string

I am having some trouble reading in a file and removing all of the punctuation from the file.
Below is what I currently have and I can not figure out why "----" and "*****" would still occur.
Can anyone point me in a direction to figure out how I need to adjust my replaceAll() in order to make sure repeated occurrences of punctuation can be removed?
public void analyzeFile(File filepath) {
try {
FileInputStream fStream = new FileInputStream(filepath);
DataInputStream in = new DataInputStream(fStream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String textFile = "";
String regex = "[a-zA-Z0-9\\s]";
String putString = "";
wordCount = 0;
while ((textFile = br.readLine()) != null) {
if (!textFile.equals("") && textFile.length() > 0) {
String[] words = textFile.split(" ");
wordCount += words.length;
for (int i = 0; i < words.length; i++) {
putString = cleanString(regex, words[i]);
if(putString.length() > 0){
mapInterface.put(putString, 1);
}
}
putString = "";
}
}
in.close();
} catch (Exception e) {
System.out.println("Error while attempting to read file: "
+ filepath + " " + e.getMessage());
}
}
private String cleanString(String regex, String str){
String newString = "";
Pattern regexChecker = Pattern.compile(regex);
Matcher regexMatcher = regexChecker.matcher(str);
while(regexMatcher.find()){
if(regexMatcher.group().length() != 0){
newString += regexMatcher.group().toString();
}
}
return newString;
}
Surely you can use the \w word-character class? It matches letters, digits, and the underscore, but not punctuation.
putString = words[i].replaceAll("[^\\w]+", "");
This removes every run of non-word characters by replacing it with an empty string.
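A quick sketch of what that replaceAll call does to the inputs from the question follows. StripPunctuationSketch and clean are illustrative names, and the backslash is doubled because the regex sits inside a Java string literal.

```java
class StripPunctuationSketch {
    // Removes every character that is not a letter, digit, or underscore.
    static String clean(String word) {
        return word.replaceAll("[^\\w]+", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("----") + "]");        // prints []
        System.out.println("[" + clean("*****") + "]");       // prints []
        System.out.println("[" + clean("well-known,") + "]"); // prints [wellknown]
    }
}
```

Tokens made up only of punctuation come back as empty strings, so the existing putString.length() > 0 check is still needed before storing them.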

Error while counting number of character,lines and words in java

I have written the following code to count the number of characters excluding whitespace, the number of words, and the number of lines, but my code is not showing the proper output.
import java.io.*;
class FileCount
{
public static void main(String args[]) throws Exception
{
FileInputStream file=new FileInputStream("sample.txt");
BufferedReader br=new BufferedReader(new InputStreamReader(file));
int i;
int countw=0,countl=0,countc=0;
do
{
i=br.read();
if((char)i==(' '))
countw++;
else if((char)i==('\n'))
countl++;
else
countc++;
}while(i!=-1);
System.out.println("Number of words:"+countw);
System.out.println("Number of lines:"+countl);
System.out.println("Number of characters:"+countc);
}
}
My file sample.txt contains:
hi my name is john
hey whts up
and my output is:
Number of words:6
Number of lines:2
Number of characters:26
You need to discard other whitespace characters as well including repeats, if any. A split around \\s+ gives you words separated by not only all whitespace characters but also any appearance of those characters in succession.
Having got a list of all words in the line it gets easier to update the count of words and characters using length methods of array and String.
Something like this will give you the result:
String line = null;
String[] words = null;
while ((line = br.readLine()) != null) {
countl++;
words = line.split("\\s+");
countw += words.length;
for (String word : words) {
countc += word.length();
}
}
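As a runnable sketch of the readLine() and split("\\s+") approach above, the version below also trims each line first, so that blank lines and leading whitespace do not add empty tokens. FileStatsSketch and stats are illustrative names, and the input is read from a StringReader holding the sample text instead of sample.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FileStatsSketch {
    // Returns {lines, words, non-whitespace characters} for the given input.
    static int[] stats(Reader source) throws IOException {
        int lines = 0, words = 0, chars = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines++;
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // a blank line contributes no words or characters
                }
                for (String word : trimmed.split("\\s+")) {
                    words++;
                    chars += word.length();
                }
            }
        }
        return new int[] {lines, words, chars};
    }

    public static void main(String[] args) throws IOException {
        int[] s = stats(new StringReader("hi my name is john\nhey whts up\n"));
        System.out.println("Number of words:" + s[1]);
        System.out.println("Number of lines:" + s[0]);
        System.out.println("Number of characters:" + s[2]);
    }
}
```

For the two sample lines this reports 8 words, 2 lines, and 23 characters excluding whitespace. Passing a FileReader for sample.txt instead of the StringReader would read the real file.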
A newline also means that a word has ended.
=> There is not always a ' ' after each word.
do
{
i=br.read();
if((char)i==(' '))
countw++;
else if((char)i==('\n')){
countl++;
countw++; // new line means also end of word
}
else
countc++;
}while(i!=-1);
End of file should also increase the number of words (if the last character was not a ' ' or a '\n').
Also, more than one space between words is still not handled correctly.
=> You should think about further changes to your approach to handle this.
import java.io.*;
class FileCount {
public static void main(String args[]) throws Exception {
FileInputStream file = new FileInputStream("sample.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(file));
int i;
int countw = 0, countl = 0, countc = 0;
do {
i = br.read();
if ((char) i == (' ')) { // You should also check for other delimiters, such as tabs
countw++;
}
if ((char) i == ('\n')) { // This is for Linux line endings; Windows uses \r\n
countw++; // Newlines also delimit words
countl++;
} // Removed the else: newlines and spaces are also characters
if (i != -1) {
countc++; // Don't count EOF as character
}
} while (i != -1);
System.out.println("Number of words " + countw);
System.out.println("Number of lines " + countl); // Print lines instead of words
System.out.println("Number of characters " + countc);
}
}
Output:
Number of words 8
Number of lines 2
Number of characters 31
Validation
$ wc sample.txt
2 8 31 sample.txt
Try this:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
public class FileCount {
/**
*
* @param filename
* @return three-element int array. Index 0 is the number of lines,
* index 1 is the number of words, index 2 is the number of characters
* (excluding newlines)
*/
public static int[] getStats(String filename) throws IOException {
FileInputStream file = new FileInputStream(filename);
BufferedReader br = new BufferedReader(new InputStreamReader(file));
int[] stats = new int[3];
String line;
while ((line = br.readLine()) != null) {
stats[0]++;
stats[1] += line.split(" ").length;
stats[2] += line.length();
}
return stats;
}
public static void main(String[] args) {
int[] stats = new int[3];
try {
stats = getStats("sample.txt");
} catch (IOException e) {
System.err.println(e.toString());
}
System.out.println("Number of words:" + stats[1]);
System.out.println("Number of lines:" + stats[0]);
System.out.println("Number of characters:" + stats[2]);
}
}
