Count number of sentences in a text file - java

Sentences, I guess, being strings that end in ! ? .
Except for things like Dr. and Mr.
It is true that you cannot really detect a sentence in Java because of grammar.
But I guess what I mean is a period, exclamation mark, or question mark followed by a capital letter.
How would one do this?
This is what I have,
but it's not working:
BufferedReader Compton = new BufferedReader(new FileReader(fileName));
int sentenceCount=0;
String violet;
String limit="?!.";
while(Compton.ready())
{
violet=Compton.readLine();
for(int i=0; i<violet.length()-1;i++)
{
if(limit.indexOf(violet.charAt(i)) != -1 && i>0 && limit.indexOf(violet.charAt(i-1)) != -1)
{
sentenceCount++;
}
}
}
System.out.println("the amount of sentence is " + sentenceCount);
EDIT
New way that works better
String violet;
while(Compton.ready())
{
violet=Compton.readLine();
sentenceCount=violet.split("[!?.:]+").length;
System.out.println("the number of words in line is " +
sentenceCount);
}

BufferedReader reader = new BufferedReader(new FileReader(fileName));
int sentenceCount = 0;
String line;
String delimiters = "?!.";
while ((line = reader.readLine()) != null) { // Continue reading until end of file is reached
for (int i = 0; i < line.length(); i++) {
if (delimiters.indexOf(line.charAt(i)) != -1) { // If the delimiters string contains the character
sentenceCount++;
}
}
}
reader.close();
System.out.println("The number of sentences is " + sentenceCount);

One liner:
int n = new String (Files.readAllBytes(Paths.get(path))).split ("[\\.\\?!]").length;
Uses Java 7 constructs to read the whole file into a byte array, create a string from it, split it into an array of sentences, and take the length of that array.
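Counting the pieces returned by split has a few edge cases worth knowing. The sketch below (the class and method names are made up for illustration) compares the character class from the one-liner with a variant that uses a + quantifier, so that a run such as "..." counts as one delimiter:

```java
class SplitCountDemo {
    // Counts the pieces produced by splitting the text on the given regex.
    // Note: String.split drops trailing empty strings, but keeps empty
    // strings that appear between two adjacent delimiters.
    static int countBySplit(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(countBySplit("One. Two? Three!", "[.?!]"));  // prints 3
        System.out.println(countBySplit("Hi... there.", "[.?!]"));      // prints 4
        System.out.println(countBySplit("Hi... there.", "[.?!]+"));     // prints 2
        System.out.println(countBySplit("No terminator", "[.?!]"));     // prints 1
    }
}
```

Note that text without any terminator still counts as one piece, so the result is better read as "number of fragments" than as an exact sentence count.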

A potential way to do this is to scan your file as words and then count words that are not in your exception list that end in your given punctuation.
Here's a possible implementation using Java 8 streams:
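List<String> exceptions = Arrays.asList("Dr.", "Mr.");
// assumes filename is a String; new Scanner(String) would scan the name itself, not the file
Scanner scanner = new Scanner(new File(filename)); // throws FileNotFoundException
Iterable<String> iterableScanner = () -> scanner;
long sentenceCount = StreamSupport.stream(iterableScanner.spliterator(), false)
.filter(word -> word.matches(".*[.?!]"))
.filter(word -> !exceptions.contains(word))
.count();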
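The rule from the question (a terminator followed by a capital letter, except after abbreviations) can also be sketched with a regular expression. This is only a heuristic under stated assumptions: the abbreviation list is a made-up example, SentenceCounter is a hypothetical name, and the whole text is expected in one string (for instance read with Files.readAllBytes as in the one-liner above).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceCounter {
    // Hypothetical exception list; extend it for your own texts.
    private static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Dr.", "Mr.", "Mrs.", "Ms."));

    // A word ending in one or more of . ! ? that is followed either by the
    // end of the text or by whitespace and an uppercase letter.
    private static final Pattern SENTENCE_END =
            Pattern.compile("(\\S*[.!?]+)(?=\\s*$|\\s+\\p{Lu})");

    static int count(String text) {
        int count = 0;
        Matcher m = SENTENCE_END.matcher(text);
        while (m.find()) {
            // Skip words such as "Dr." that end in a period but do not end a sentence.
            if (!ABBREVIATIONS.contains(m.group(1))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Dr. Smith arrived. Was he late? No! He was early.";
        System.out.println(count(text)); // prints 4
    }
}
```

A known limitation: a sentence that really ends with a listed abbreviation ("I met the Dr. He left.") is undercounted, which is the grammar problem the question already mentions.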

Related

Doesn't save the words in array

I've got a probably simple question. I am trying to read a file and add each single word to my array "phrase". The problem occurs in the for loop: I get the exception "index 0 out of bounds for length 0".
Can you please help me with that?
String [] tokens;
String line;
String hash = " ";
int n = 0;
String [] phrase = new String [n];
public void loadFile()
{
try
{
@SuppressWarnings("resource")
BufferedReader br = new BufferedReader(new FileReader("z3data1.txt"));
while((line = br.readLine()) != null)
{
tokens = line.split("[ ]");
n += tokens.length;
}
for(int j = 0; j<tokens.length; j++)
{
phrase[j] = tokens[j];
}
}
catch(IOException ex)
{
ex.printStackTrace();
}
}
A couple of observations.
You are getting the error because your array is not large enough and the index j exceeds its size.
You keep overwriting tokens in the while loop. The while loop needs to encompass the copying of the tokens to the phrase array.
So try the following:
while((line = br.readLine()) != null) {
tokens = line.split("[ ]");
n += tokens.length; // don't really need this.
//starting offset to write into phrase
int len = phrase.length;
phrase = Arrays.copyOf(phrase,phrase.length + tokens.length);
for(int j = 0; j<tokens.length; j++) {
phrase[j + len] = tokens[j];
}
}
This statement
phrase = Arrays.copyOf(phrase,phrase.length + tokens.length)
Copies the contents of phrase and increases the array size to handle the writing of tokens.
Another (and probably preferred) alternative is to use a List<String> which grows as you need it.
List<String> phrase = new ArrayList<>();
for(int j = 0; j<tokens.length; j++) {
phrase.add(tokens[j]);
}
// or skip the loop and just do
Collections.addAll(phrase,tokens);
One observation. I don't know what you are splitting on but your split statement looks suspicious.
You're setting n to 0, so phrase also has length 0 when you say String[] phrase = new String[n]. Therefore, you can't add anything to it.
If you want something of variable length, you can use an ArrayList. In the code below, Collections.addAll splits each line and puts every token directly into the phrase ArrayList.
String line;
//Note that you can get rid of tokens here, since it's being inlined below
ArrayList<String> phrase = new ArrayList<>();
public void loadFile()
{
try
{
@SuppressWarnings("resource")
BufferedReader br = new BufferedReader(new FileReader("z3data1.txt"));
while((line = br.readLine()) != null)
{
//No need for a for-loop below, you can do everything in one line
Collections.addAll(phrase, line.split("[ ]"));
}
}
catch(IOException ex)
{
ex.printStackTrace();
}
}
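As for the split pattern: "[ ]" splits on every single space, so two spaces in a row produce an empty token and tabs are not treated as separators. A minimal sketch of the same loading step with "\\s+" instead follows; WordLoader and loadWords are hypothetical names, and a StringReader stands in for z3data1.txt so the example runs anywhere.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class WordLoader {
    // Collects every whitespace-separated word of every line into one list.
    static List<String> loadWords(BufferedReader br) {
        return br.lines()
                // trim first so leading whitespace does not create an empty first token
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // a blank line still yields one empty token; drop it
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String data = "one  two\n\n three\tfour \n";
        // try-with-resources closes the reader, so no @SuppressWarnings is needed
        try (BufferedReader br = new BufferedReader(new StringReader(data))) {
            System.out.println(loadWords(br)); // prints [one, two, three, four]
        }
    }
}
```

For a real file, the reader could be created with new BufferedReader(new FileReader("z3data1.txt")) instead.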

How can I pass all integers in this loop and stop before the last line

this is my input
first line
5 6
3 4
2 3
2 5
1 0
word 2 2 4
I need to add all the integers to a graph, but not the last line (word 1 2 4).
I have split the first line (first line etc.) and put the parts in an ArrayList. No problem there.
But then I have this for loop:
for (int i = 0; i < (amount of lines); i++) {
StringTokenizer st = new StringTokenizer(in.readLine());
graph.addEdge(Integer.parseInt(st.nextToken()), Integer.parseInt(st.nextToken()));
}
I can't hard-code how many times it should read integers, because my code should work in general with other inputs.
How can I make it stop before the last line? I still need to be able to use the last line read from the BufferedReader.
Assuming that you always have one first line to create the graph and a last line to do whatever with, what you want to iterate over to add edges to the graph are the lines between the first and the last.
I hope I'm guessing right.
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String line;
List<String> lines = new ArrayList<>();
/* Iterating over the whole input first*/
while((line = in.readLine()) != null) {
lines.add(line);
}
String firstLine = lines.get(0);
String lastLine = lines.get(lines.size() - 1);
StringTokenizer stok = new StringTokenizer(firstLine);
ArrayList<String> pax = new ArrayList<String>();
while(stok.hasMoreTokens()){
pax.add(stok.nextToken());
}
int v = pax.size();
Graaph graph = new Graaph(v);
/* Create the edges */
for (int i = 1; i < lines.size()-1; i++) {
StringTokenizer st = new StringTokenizer(lines.get(i)); // the input is already consumed, so use the stored line
graph.addEdge(Integer.parseInt(st.nextToken()),
Integer.parseInt(st.nextToken()));
}
This assumes that your input is not empty, but I guess you can handle that.
I hope this helps.
I would just use try-catch:
ArrayList<Integer> nums = new ArrayList<Integer>();
String lastLine = "";
try {
while ((lastLine = in.readLine()) != null) {
StringTokenizer st2 = new StringTokenizer(lastLine);
graph.addEdge(Integer.parseInt(st2.nextToken()),
Integer.parseInt(st2.nextToken()));
}
}
catch (NumberFormatException e) {}
System.out.println(lastLine);
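Another option that avoids both storing every line and relying on an exception is to read one line ahead: a line is only processed once a following line is known to exist, so the final line is left over at the end. A minimal sketch, assuming the header line has already been consumed; EdgeReader and processAllButLast are hypothetical names, and the edges are collected as int pairs instead of calling the asker's graph class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;
import java.util.function.Consumer;

class EdgeReader {
    // Hands every line except the last one to the handler and returns the last line
    // (or null if there are no lines at all).
    static String processAllButLast(Iterator<String> lines, Consumer<String> handler) {
        if (!lines.hasNext()) {
            return null;
        }
        String current = lines.next();
        while (lines.hasNext()) {
            handler.accept(current); // another line follows, so current is not the last
            current = lines.next();
        }
        return current;
    }

    public static void main(String[] args) throws IOException {
        String input = "5 6\n3 4\n2 3\nword 2 2 4\n";
        List<int[]> edges = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String last = processAllButLast(in.lines().iterator(), line -> {
                StringTokenizer st = new StringTokenizer(line);
                edges.add(new int[] {
                        Integer.parseInt(st.nextToken()),
                        Integer.parseInt(st.nextToken()) });
            });
            System.out.println(edges.size() + " edges, last line: " + last);
            // prints "3 edges, last line: word 2 2 4"
        }
    }
}
```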

java bufferedReader. how to read parts of a line

Ok now, here's my question. I wrote an algorithm to do specific things. Currently I create my processes myself in the class constructor and store them in a priority queue. However I want to be able to write a .txt file with multiple lines. Each line will represent a process with its different attributes separated by space. Here's what my .txt will look like:
P1 0 8
P2 1 4
P3 2 9
P4 3 3
END 4 9999
p1, p2... etc are the names of each process. Then the second column is the first attribute and the third column is the second attribute.
I need to be able to read each column at a time and store the value in my processes. How can I read those values and distinguish between them? (treat them as separate things)
So you want to read the file line-by-line and separate each line?
BufferedReader in=new BufferedReader...
String line;
while ((line=in.readLine())!=null) {
String[] data=line.split(" ");
//now, data will be an array which contains the data
//data[0] = the first item in the line
//data[1] = the first number
//data[2] = the second number
}
Have a look at the java.util.Scanner class, it can help to read separate tokens from a Reader.
It has methods to read the next token as an integer, as a string or many other types. There are also some examples in the class Javadoc...
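A minimal sketch of the Scanner approach for the file format in the question. The Proc class is a hypothetical stand-in for the asker's process class, and a StringReader replaces the .txt file so the example is self-contained; a FileReader could be passed in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ProcessParser {
    // Hypothetical value type: a name plus the two numeric attributes of one line.
    static final class Proc {
        final String name;
        final int first;
        final int second;

        Proc(String name, int first, int second) {
            this.name = name;
            this.first = first;
            this.second = second;
        }
    }

    // Reads "name int int" triples; Scanner treats spaces and newlines alike as separators.
    static List<Proc> parse(Readable source) {
        List<Proc> result = new ArrayList<>();
        try (Scanner sc = new Scanner(source)) {
            while (sc.hasNext()) {
                String name = sc.next();
                int first = sc.nextInt();   // throws InputMismatchException on a non-number
                int second = sc.nextInt();
                result.add(new Proc(name, first, second));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "P1 0 8\nP2 1 4\nEND 4 9999\n";
        for (Proc p : parse(new StringReader(data))) {
            System.out.println(p.name + " -> " + p.first + ", " + p.second);
        }
    }
}
```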
You have both whitespace (separating the attributes) and newlines (separating the information of one process from the next) as separators.
Using a BufferedReader, you could read a whole line (reader.readLine()) to get the information for one process and then use String.split() to separate the attributes (edit: see answer from dyslabs).
A possibly faster (but less intuitive) approach is to read single characters (reader.read()) and check whether you have read a whitespace or a newline character:
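// caution: code is not tested but shows the general approach
// (ProcessInfo is assumed to be your own class with a constructor taking a String[])
List<ProcessInfo> processInfos = new ArrayList<>();
int processInfoIndex = 0;
String[] processInfoHolder = new String[3];
String processInfoStr = "";
int c;
while( (c = reader.read()) != -1 ) {
if (c == '\r') {
continue; // skip the carriage return of Windows line endings
}
if (Character.isWhitespace(c)) { // a space or newline ends the current attribute
processInfoHolder[processInfoIndex++] = processInfoStr;
processInfoStr = "";
if (c == '\n') { // code point 10: the line, i.e. one process, is complete
processInfos.add(new ProcessInfo(processInfoHolder.clone()));
processInfoIndex = 0;
}
}
else {
processInfoStr += (char) c;
}
}
// note: a last line without a trailing newline still has to be added after the loop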
You could optimize this method even further by using a StringBuilder.
To read a file line by line I use readLine() != null as the loop condition; to retrieve the values separated by whitespace, I use the split method and store the values of each line in an array.
Here's how I implemented your example:
public static void main(String[] args) {
// TODO Auto-generated method stub
BufferedReader buffer;
FileReader fileReader;
String p1[] = new String[4];
String p2[] = new String[4];
String p3[] = new String[4];
String p4[] = new String[4];
String end[] = new String[4];
try {
fileReader = new FileReader(new File("file.txt"));
buffer = new BufferedReader(fileReader);
String line;
line = buffer.readLine();
// ============= Read the first line =============
p1 = line.split("\\s+");
while((line = buffer.readLine()) != null) {
// ============= Read the second line =============
p2 = line.split("\\s+");
// ============= Read the third line =============
if((line = buffer.readLine()) != null) {
p3 = line.split("\\s+");
}
// ============= Read the fourth line =============
if((line = buffer.readLine()) != null) {
p4 = line.split("\\s+");
}
// ============= Read the last line =============
if((line = buffer.readLine()) != null) {
end = line.split("\\s+");
}
}
fileReader.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
int v1[] = new int[3];
int v2[] = new int[3];
int v3[] = new int[3];
int v4[] = new int[3];
int v_end[] = new int[3];
for (int i = 0 ; i < p1.length; i++)
System.out.print(p1[i]+ " ");
System.out.println();
for (int i = 0 ; i < p2.length; i++)
System.out.print(p2[i]+ " ");
System.out.println();
for (int i = 0 ; i < p3.length; i++)
System.out.print(p3[i]+ " ");
System.out.println();
for (int i = 0 ; i < p4.length; i++)
System.out.print(p4[i]+ " ");
System.out.println();
for (int i = 0 ; i < end.length; i++)
System.out.print(end[i]+ " ");
}

In Java how to check if the next record in a file is the end?

I want to sequentially read each line of an unsorted input file into consecutive elements of an array until there are no more records in
the file or until the input size is reached, whichever occurs first. But I can't think of a way to check whether the next line is the end of the file.
This is my code:
Scanner cin = new Scanner(System.in);
System.out.print("Max number of items: ");
int max = cin.nextInt();
String[] input = new String[max];
try {
BufferedReader br = new BufferedReader(new FileReader("src/ioc.txt"));
for(int i=0; i<max; i++){ //to do:check for empty record
input[i] = br.readLine();
}
}
catch (IOException e){
System.out.print(e.getMessage());
}
for(int i=0; i<input.length; i++){
System.out.println((i+1)+" "+input[i]);
}
The file has 205 lines; if I input 210 as max, the array prints with five null elements, like so:
..204 Seychelles
205 Algeria
206 null
207 null
208 null
209 null
210 null
Thanks for your responses in advance!
From the docs:
public String readLine()
Returns: A String containing the contents of the line, not including
any line-termination characters, or null if the end of the stream has
been reached
In other words, you should do
String aux = br.readLine();
if(aux == null)
break;
input.add(aux);
I recommend you use a variable-size list (you can pre-allocate it with the requested size if reasonable), so that you get either the expected size or the actual number of lines and can check which one later.
(depending on how long your file is, you might want to look at readAllLines() too.)
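As a sketch of the "expected size or actual number of lines, whichever is smaller" idea, BufferedReader.lines() (Java 8) can be combined with limit. LimitedReader and readAtMost are hypothetical names, and a StringReader stands in for src/ioc.txt so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class LimitedReader {
    // Returns at most max lines; fewer if the input ends first, and never a null entry.
    static List<String> readAtMost(BufferedReader br, int max) {
        return br.lines().limit(max).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String data = "Seychelles\nAlgeria\n";
        try (BufferedReader br = new BufferedReader(new StringReader(data))) {
            List<String> input = readAtMost(br, 5);
            for (int i = 0; i < input.size(); i++) {
                System.out.println((i + 1) + " " + input.get(i));
            }
            // prints "1 Seychelles" and "2 Algeria", with no null rows
        }
    }
}
```

The size of the returned list tells you how many lines were actually read.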
Please refer to "Number of lines in a file in Java" and modify your for loop to run for the smaller of the entered max value and the number of lines in the file.
Use List<String>
List<String> lines = new ArrayList<>(); // Growing array.
try (BufferedReader br = new BufferedReader(new FileReader("src/ioc.txt"))) {
for(;;) {
String line = br.readLine();
if (line == null) {
break;
}
lines.add(line);
}
} catch (IOException e) {
System.out.print(e.getMessage());
} // Closes automatically.
// If lines wanted as array:
String[] input = lines.toArray(new String[lines.size()]);
Using a dynamically growing ArrayList is the normal way to deal with such a problem.
P.S.
FileReader reads using the current platform's default encoding, which suits a local file that was created locally.
You could do a null check in your first for-loop like:
public static void main(String[] args) throws IOException {
Scanner cin = new Scanner(System.in);
System.out.print("Max number of items: ");
int max = cin.nextInt();
BufferedReader br = new BufferedReader(new FileReader("src/ioc.txt"));
List<String> input = new ArrayList<>();
String nextString;
int i;
for (i = 0; i < max && ((nextString = br.readLine()) != null); i++) {
input.add(nextString);
}
for (int j = 0; j < i; j++) {
System.out.println((j + 1) + " " + input.get(j));
}
}
Try:
for(int i=0; i<max; i++){ //to do:check for empty record
String line = br.readLine(); // read once; a second readLine() call would skip a line
if(line != null)
input[i] = line;
else
break;
}
int i=0;
for(; i<max; i++){ //to do:check for empty record
String line=br.readLine();
if(line==null){
break;
}
input[i] = line;
}
//i will contain the count of lines read. indexes 0...(i-1) represent the data.

Trying to find the nth element to the last from a list (with a file input)

Input would look like
a b c d 4
e f g h 2
where each line is read as a list followed by an integer that serves as an index into the list.
I first try to read the file line by line and store it in the list. Here's what I have:
public class FileReader {
public static void main(String[] args) {
String line = null;
List<String> list = new ArrayList<String>();
try {
FileInputStream fstream = new FileInputStream("test.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
// File file = new File("test.txt");
// Scanner scanner = new Scanner(file);
while ((line = br.readLine()) != null) {
list.add(line);
}
System.out.println(list);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Now I want to remove the whitespace from the list and store the values in a char array; then I was planning on traversing that array backwards to the nth element, depending on the input for n.
String[] elements = line.trim().split("\\s");
char[] chars = new char[elements.length - 1];
int n = Integer.parseInt(elements[elements.length - 1]);
for (int i = 0; i < elements.length - 1; i++)
chars[i] = elements[i].charAt(0);
Someone provided me with this piece of code earlier; I tried it and it throws a NullPointerException at String[] elements.
It's because you are running until line is null here
while((line = br.readLine()) != null)
{
list.add(line);
}
And then you are trying to call .trim() on it.
Do you mean to be processing the strings in list instead?
If so, try looping over your list. You are already splitting each line correctly and getting the last element. All you need to do is calculate the offset into your String[] elements, which in this case is length - 1 - the last element, and print out the value at that position.
for (int i = 0; i < list.size(); i++)
{
String currentLine = list.get(i);
String[] elements = currentLine.trim().split("\\s");
int lastElement = Integer.parseInt(elements[elements.length - 1]);
String desiredValue = elements[elements.length - 1 - lastElement];
System.out.println("desiredValue = " + desiredValue);
}
You can avoid most of the work you're doing. I don't know if your input will require much flexibility (code to that if necessary) but in your example you only have 1 digit for the index.
Just avoid all the traversing and looping entirely:
String currentLine = file.nextLine();
//Find value from last space in the string, until the end of the string (will be the number)
int index = Integer.parseInt(currentLine.substring(
currentLine.lastIndexOf(' ') + 1, currentLine.length()));
//Remove all spaces from the current line
currentLine = currentLine.replaceAll("\\s+","");
//Remove the index at the end from the string, leaving only the characters
currentLine = currentLine.substring(0, currentLine.lastIndexOf(index + ""));
char desiredValue = currentLine.charAt(currentLine.length() - index);
System.out.println("desiredValue = " + desiredValue);
This saves a lot of adding stuff to arrays if none of that is needed later, just do it all the first time through.
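If the elements may be longer than one character or the index may have several digits, a token-based variant is a bit more forgiving. A minimal sketch, assuming (as in the answers above) that n = 1 means the last element of the list; NthFromLast is a hypothetical name.

```java
class NthFromLast {
    // Parses a line such as "a b c d 4" and returns the element n places from the end,
    // where n is the trailing integer and n = 1 denotes the last element.
    static String nthFromLast(String line) {
        String[] tokens = line.trim().split("\\s+");
        int n = Integer.parseInt(tokens[tokens.length - 1]);
        int count = tokens.length - 1; // number of list elements, excluding the index
        if (n < 1 || n > count) {
            throw new IllegalArgumentException("index " + n + " is outside 1.." + count);
        }
        return tokens[count - n];
    }

    public static void main(String[] args) {
        System.out.println(nthFromLast("a b c d 4")); // prints a
        System.out.println(nthFromLast("e f g h 2")); // prints g
    }
}
```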
