Counting occurrences of a substring in a file in Java - java

I need to count how many times a substring occurs in a file.
In brief, the file contains a number of articles, and I need to know how many.
Each article starts with: #ARTICLE{
or with #ARTICLE{(series of integers)
Useful info:
- I have 10 files to look in
- No files are empty
- This code gives me a StringIndexOutOfBoundsException
Here is the code I have so far:
//To read through all files
for(int i=1; i<=10; i++)
{
try
{
//To look through all the bib files
reader = new Scanner(new FileInputStream("C:/Assg_3-Needed-Files/Latex"+i+".bib"));
System.out.println("Reading Latex"+i+".bib->");
//To read through the whole file
while(reader.hasNextLine())
{
String line = reader.nextLine();
String articles = line.substring(1, 7);
if(line.equals("ARTICLE"))
count+=1;
}
}
catch(FileNotFoundException e)
{
System.err.println("Error opening the file Latex"+i+".bib");
}
}
System.out.print("\n"+count);

Try just using String#contains on each line:
while(reader.hasNextLine()) {
String line = reader.nextLine();
if (line.contains("ARTICLE")) {
count += 1;
}
}
This would at least get around the problem of having to take a substring in the first place. substring(1, 7) works on any line with at least 7 characters, whether or not it matches, but it throws the out-of-bounds exception on lines with fewer than 7 characters.
You could also use a regex pattern to make sure that you match ARTICLE as a standalone word. Note that String#matches tests the whole line, so the pattern needs .* on both sides:
while(reader.hasNextLine()) {
String line = reader.nextLine();
if (line.matches(".*\\bARTICLE\\b.*")) {
count += 1;
}
}
This would ensure that you don't count a line having something like articles in it, which is not your exact target.
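As a small sketch of the word-boundary idea, the pattern can also be compiled once and searched with Matcher.find(), which looks for the word anywhere in the line instead of requiring the whole line to match. The class and method names here (ArticleWordMatch, hasArticleWord) are made up for illustration.

```java
import java.util.regex.Pattern;

class ArticleWordMatch {
    // Word-boundary pattern, compiled once and reused for every line.
    private static final Pattern ARTICLE = Pattern.compile("\\bARTICLE\\b");

    // True if ARTICLE occurs as a standalone word anywhere in the line.
    static boolean hasArticleWord(String line) {
        return ARTICLE.matcher(line).find();
    }

    public static void main(String[] args) {
        // A line carrying the article marker.
        System.out.println(hasArticleWord("#ARTICLE{12345"));
        // ARTICLES is a longer word, so the boundary check rejects it.
        System.out.println(hasArticleWord("see ARTICLES below"));
        // String#matches compares the whole line against the pattern.
        System.out.println("#ARTICLE{12345".matches("\\bARTICLE\\b"));
    }
}
```

The # and { around ARTICLE are non-word characters, so they count as word boundaries and the marker line is still found.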

You can check whether the line starts with the needed sequence (note the leading #, since each article starts with #ARTICLE):
if (line.startsWith("#ARTICLE")) {
count += 1;
}

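You're getting a StringIndexOutOfBoundsException from this line of code:
String articles = line.substring(1, 7);
The line read in can be empty or have fewer than 7 characters. To avoid the exception, add a conditional check before taking the substring:
line.length() >= 7
Note also that the result is stored in articles while the code then compares line, and that substring(1, 7) yields only six characters ("ARTICL" for a line starting with #ARTICLE{), so the equals check can never succeed. Aside from that, it's better to use the approaches recommended in the other answers (i.e. contains or startsWith).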

Since you are reading line by line, String.contains is a good choice instead of substring. Also, every article starts with "#ARTICLE", so use "#ARTICLE" in the condition. To test, please try this:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class test {
public static void main(String[] args) {
int count = 0;
for (int i = 1; i <= 10; i++) {
try {
//To look through all the bib files
Scanner reader = new Scanner(new FileInputStream("C:/Assg_3-Needed-Files/Latex" + i + ".bib"));
System.out.println("Reading Latex" + i + ".bib->");
//To read through the whole file
while (reader.hasNextLine()) {
String line = reader.nextLine();
if (line.contains("#ARTICLE")) {
count += 1;
}
}
} catch (FileNotFoundException e) {
System.err.println("Error opening the file Latex" + i + ".bib");
}
}
System.out.print("\n" + count);
} }
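The counting logic can also be pulled into a small method, which makes it easy to try without the ten files. This is only a sketch: the names ArticleCounter and countArticles are hypothetical, the file paths are assumed to come in as command-line arguments, and it assumes each article marker sits at the start of its own line (leading whitespace is ignored).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class ArticleCounter {
    // Counts lines that begin with "#ARTICLE", ignoring leading whitespace.
    // Short and empty lines are safe because no substring is taken.
    static int countArticles(Scanner reader) {
        int count = 0;
        while (reader.hasNextLine()) {
            if (reader.nextLine().trim().startsWith("#ARTICLE")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int total = 0;
        for (String path : args) { // assumption: paths are passed by the caller
            // try-with-resources closes each Scanner once its file is done.
            try (Scanner reader = new Scanner(new File(path))) {
                total += countArticles(reader);
            } catch (FileNotFoundException e) {
                System.err.println("Error opening the file " + path);
            }
        }
        System.out.println(total);
    }
}
```

Because countArticles takes a Scanner, it can be checked against an in-memory string such as new Scanner("#ARTICLE{\nauthor\n#ARTICLE{123") before pointing it at real files.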

Related

Java trying to read indentation as an Integer string

I'm trying to parse a data file. This code has worked successfully for my other data files; however, I'm now getting an error. These data files are indented, so the program tries to parse the leading space as a number. How would I skip over this space?
String line = br.readLine();
while (line != null) {
String[] parts = line.split(" ");
if (linecounter != 0) {
for (int j=0; j<parts.length; j++) {
if (j==parts.length-1)
truepartition.add(Integer.parseInt(parts[j]));
else {
tempvals.add(Double.parseDouble(parts[j]));
numbers.add(Double.parseDouble(parts[j]));
}
}
Points.add(tempvals);
tempvals = new ArrayList<Double>();
} else {
//Initialize variables with values in the first line
// Reads each elements in the text file into the program 1 by 1
for (int i=0; i<parts.length; i++) {
if (i==0)
numofpoints = Integer.parseInt(parts[i]);
else if (i==1)
dim = Integer.parseInt(parts[i]) - 1;
else if (i==2)
numofclus = Integer.parseInt(parts[i]);
}
}
linecounter++;
line = br.readLine();
}
Data File
75 3 4
4 53 0
5 63 0
10 59 0
This NumberFormatException occurs because the indented lines contain extra whitespace: split(" ") splits on every single space, so leading or repeated spaces produce empty strings, and an empty string cannot be parsed as a number.
Replace the single-space split with a regex split on runs of whitespace.
The code below takes care of the extra whitespace in your input:
String[] parts = line.trim().split("\\s+");
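To see the difference between the two splits, here is a minimal sketch on a made-up indented line (the sample string and the names SplitDemo and tokens are assumptions for illustration, not taken from the data file).

```java
import java.util.Arrays;

class SplitDemo {
    // Trim first, then split on runs of whitespace (spaces, tabs, etc.).
    static String[] tokens(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "  4  53 0"; // hypothetical indented data line

        // Single-space split: every extra space yields an empty string.
        System.out.println(Arrays.toString(line.split(" ")));

        // Regex split after trim: only the three numbers remain.
        System.out.println(Arrays.toString(tokens(line)));
    }
}
```

Any of the empty strings from the first split would make Integer.parseInt or Double.parseDouble throw, which is the error described in the question.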
Have you considered using the Scanner class? It skips over whitespace and helps with parsing files.
For example:
try {
Scanner inp = new Scanner(new File("path/to/dataFile"));
while(inp.hasNext()) {
int value = inp.nextInt();
System.out.println(value);
}
inp.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}

Java FileInputStream Issues(Extra Whitespace)

When I use FileInputStream to input text from my spawn.txt file for a tile-based map, it adds extra whitespace in-between each line.
package mapInit;
import java.io.FileInputStream;
public class SpawnMap {
static char[][] spawnWorld = new char[30][30];
public static void main(String[] args) {
try {
FileInputStream spawn = new FileInputStream("Resources/Map/spawn.txt");
int i = 0;
int h = 0;
int k = 0;
while((i=spawn.read())!=-1){
if(h == 30) {
h = 0;
k++;
}
spawnWorld[k][h] = (char)i;
h++;
}
spawn.close();
} catch (Exception e) {
}
for (int i=0; i<30; i++) {
for (int j=0;j<30;j++) {
System.out.println(spawnWorld[i][j]);
}
}
}
}
This is the result of the output loop:
This is a picture of the text file:
GitHub Link: https://github.com/WeaponGod243/Machlandia
I think the Scanner class is more suitable for your task.
Scanner class in Java is found in the java.util package. Java provides
various ways to read input from the keyboard, the java.util.Scanner
class is one of them.
The Java Scanner class breaks the input into tokens using a delimiter
which is whitespace by default. It provides many methods to read and
parse various primitive values.
The Java Scanner class is widely used to parse text for strings and
primitive types using a regular expression. It is the simplest way to
get input in Java. By the help of Scanner in Java, we can get input
from the user in primitive types such as int, long, double, byte,
float, short, etc.
The Java Scanner class extends Object class and implements Iterator
and Closeable interfaces.
The Java Scanner class provides nextXXX() methods to return the type
of value such as nextInt(), nextByte(), nextShort(), next(),
nextLine(), nextDouble(), nextFloat(), nextBoolean(), etc. To get a
single character from the scanner, you can call next().charAt(0)
method which returns a single character.
Source
Scanner Java Doc
public static void main(String arg[]) {
try (Scanner sc = new Scanner(new File("Resources/Map/spawn.txt"))) {
// Checking if sc has another token in the file
while(sc.hasNext()) {
// Print the next whitespace-delimited token
System.out.println(sc.next());
}
} catch (Exception ex) {
// Use a Logger to log exceptions in real projects
ex.printStackTrace();
}
}
You could use Apache Commons IO library too
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.5</version>
</dependency>
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;
import java.util.List;
public class ReadTextFile {
public static void main(String[] args) throws IOException {
try {
File f = new File("Resources/Map/spawn.txt");
List<String> lines = FileUtils.readLines(f, "UTF-8");
for (String line : lines) {
System.out.println(line);
}
} catch (IOException e) {
// Use a Logger to log exceptions in real projects
e.printStackTrace();
}
}
}
Change the System.out.println inside your loop to just System.out.print.
System.out.println will automatically add a newline character every time, even if you're only printing a single character. System.out.print will take the String you pass in, and print it as is.
Here's a link to the official Javadoc:
https://docs.oracle.com/javase/8/docs/api/java/io/PrintStream.html#print-java.lang.String-
Also, unless it's on purpose, verify you're not printing the \n (newline character) at the end of each line. If you actually want to start a new line at the end of each line, simply put a System.out.println(); line after the end of the innermost loop.
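A minimal sketch of that output loop, using print for each character and a single println at the end of each row. The names GridPrinter and printGrid are hypothetical, and the PrintStream parameter is only there so the output can be captured for checking.

```java
import java.io.PrintStream;

class GridPrinter {
    // Prints each row on one line: print() per character, println() per row.
    static void printGrid(char[][] grid, PrintStream out) {
        for (char[] row : grid) {
            for (char c : row) {
                out.print(c);   // no newline after individual characters
            }
            out.println();      // one newline once the row is complete
        }
    }

    public static void main(String[] args) {
        char[][] sample = { {'a', 'b'}, {'c', 'd'} }; // made-up 2x2 map
        printGrid(sample, System.out);
    }
}
```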
I have looked at your code. Using the debug version below, I found that reading with FileInputStream produces blank characters in between the values.
static char[][] spawnWorld = new char[30][30];
public static void main(String[] args) {
try {
FileInputStream spawn = new FileInputStream("C:/Daniel/spawn.txt");
int i = 0;
int h = 0;
int k = 0;
while ((i = spawn.read()) != -1) {
if (h == 30) {
h = 0;
k++;
}
spawnWorld[k][h] = (char) i;
if (h == 19) System.out.println((char) i);
System.out.println("[" + k + "][" + h + "] : " + i);
h++;
}
spawn.close();
} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Start Printing");
for (int i = 0; i < 30; i++) {
for (int j = 0; j < 30; j++) {
System.out.println("[" + i + "][" + j + "] : " + spawnWorld[i][j]);
}
}
System.exit(0);
}
From there you can debug it accordingly.
At position 19, your 'i' holds a blank (whitespace) character rather than a map value.
I would suggest using Scanner to read the file instead.
A text file is made up of characters on a line followed by end-of-line sequences, typically newlines or carriage returns followed by newlines. (These can also be thought of as line separators rather than line endings, and the last line might not be followed by one, depending on how the file was created.) You are reading the file character by character but you make no allowance for these end of line characters.
It doesn't make any difference whether you use FileInputStream or Scanner; the problem is, you read the file one character at a time but don't account for carriage return and newline.
For example:
while ((i=spawn.read()) != -1) {
if (Character.isWhitespace((char)i)) {
continue;
}
if (h == 30) {
h = 0;
k++;
}
spawnWorld[k][h] = (char)i;
h++;
}
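A variation on the loop above, sketched as a self-contained method. It skips only the line terminators (\r and \n) instead of all whitespace, in case spaces are meaningful tiles, and it stops once the grid is full so a longer file cannot overflow the array. The names MapLoader and load are hypothetical, and the map is assumed to be a fixed rows x cols rectangle.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class MapLoader {
    // Fills a rows x cols grid from the reader, ignoring end-of-line characters.
    static char[][] load(Reader in, int rows, int cols) {
        char[][] grid = new char[rows][cols];
        int r = 0;
        int c = 0;
        try {
            int ch;
            while (r < rows && (ch = in.read()) != -1) {
                if (ch == '\r' || ch == '\n') {
                    continue; // line terminators are not part of the map
                }
                grid[r][c] = (char) ch;
                if (++c == cols) { // row complete, move to the next one
                    c = 0;
                    r++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return grid;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a 2x2 map file with Windows and Unix line endings.
        char[][] grid = load(new StringReader("ab\r\ncd\n"), 2, 2);
        for (char[] row : grid) {
            System.out.println(new String(row));
        }
    }
}
```

For the real file, a FileReader or InputStreamReader over the map file could be passed in place of the StringReader.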

Having an issue reading in a CSV File for a project

Hi, I'm getting a NumberFormatException when reading in this CSV text file for a project. Here's the CSV:
12345,Left-Handed Bacon Stretcher,125.95,PGH,2
24680,Smoke Shifter,0.98,PGH,48
86420,Pre-dug Post Hole,2.49,ATL,34
25632,Acme Widget,98.29,LOU,342
97531,Anti-Gravity Turbine,895.29,ATL,3
24680,Battery-Powered Battery Charger,252.98,ATL,2
12345,Left-Handed Bacon Stretcher,125.95,LOU,35
97531,Anti-Gravity Turbine,895.29,PHL,8
00000,Glass Hammer,105.90,PGH,8
01020,Inflatable Dartboard,32.95,PGH,453
86420,Pre-dug Post Hole,2.49,LOU,68
86420,Pre-dug Post Hole,2.49,PGH,124
24680,Battery-Powered Battery Charger,252.98,PHL,5
I have a general understanding of what is going on. I believe the error appears when the scanner reaches the end of the first line:
Exception in thread "main" java.lang.NumberFormatException: For input
string: "2 24680"
This is what I have so far:
import java.io.*;
import java.util.*;
public class Prog7
{
public static void main(String[] args)
{
String warehouseID = null;
String city = null;
String state = null;
int partNumber = 0;
String description = null;
double price = 0.0;
int quantity = 0;
int count = 0;
int numWarehouse = 4;
int numParts = 13;
Scanner warehouseFile = null;
Scanner partFile = null;
Warehouse[] warehouse = new Warehouse[10];
Part[] parts = new Part[20];
try
{
warehouseFile = new Scanner(new File("warehouse.txt"));
while (warehouseFile.hasNext())
{
warehouseID = warehouseFile.next();
city = warehouseFile.next();
state = warehouseFile.next();
warehouse[count] = new Warehouse(warehouseID, city, state);
count++;
}
partFile = new Scanner(new File("parts.txt"));
partFile.useDelimiter(",");
while (partFile.hasNext())
{
partNumber = Integer.parseInt(partFile.next());
description = partFile.next();
price = Double.parseDouble(partFile.next());
warehouseID = partFile.next();
quantity = Integer.parseInt(partFile.next());
parts[count] = new Part(partNumber, description, price, warehouseID, quantity);
count++;
}
}
catch (FileNotFoundException e)
{
System.err.print("warehouse.txt or parts.txt not found");
}
for (int i = 0; i < numWarehouse; i++)
{
System.out.printf("%5s %5s %5s\n", warehouse[i].getWarehouseID(), warehouse[i].getCity(),
warehouse[i].getState());
for (int j = 0; j < numParts; j++)
{
if (parts[j].getWarehouseID().equals(warehouse[i].getWarehouseID()))
{
System.out.printf("%5s %5s %10.2f %5d\n", parts[j].getPartNumber(), parts[j].getDescription(),
parts[j].getPrice(), parts[j].getQuantity());
}
}
}
}
}
I think it has something to do with the program reading in each value but then having nothing to move it on to the next line. I have tried a partFile.nextLine() call and a hasNextLine() while loop, and I still get the same error. Is there something I could do with a newline character?
I think the problem is here:
partFile.useDelimiter(",");
You are telling the scanner to split only on commas. Once it reaches the last item in the first line of parts.txt, it reads onwards until it finds the first comma in the next line, and hence returns 2 followed by an end-of-line followed by 24680 as the next item.
You don't want to split only on commas; you also want to split on newline characters. useDelimiter takes a regular expression: the following tells the scanner to split on either a comma or any run of newline characters:
partFile.useDelimiter(",|[\\r\\n]+");
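To illustrate the effect of that delimiter, here is a small sketch that tokenizes two lines of the sample data from an in-memory string. The class and method names (PartsDelimiterDemo, tokens) are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PartsDelimiterDemo {
    // Splits the input on commas and on runs of \r / \n characters.
    static List<String> tokens(String csv) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(csv)) {
            sc.useDelimiter(",|[\\r\\n]+");
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "12345,Left-Handed Bacon Stretcher,125.95,PGH,2\n"
                      + "24680,Smoke Shifter,0.98,PGH,48\n";
        // With the combined delimiter, "2" and "24680" are separate tokens.
        for (String token : tokens(sample)) {
            System.out.println(token);
        }
    }
}
```

With the comma-only delimiter, the fifth token would instead be "2", a line break, and "24680" glued together, which is the string reported in the exception.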
In addition to Luke Woodward's answer, perhaps you should take into account that the year is 2018, not 1999. The following code should do what you're looking for, with the knowledge of how to parse a line moved into the class where it belongs. I made it a constructor, but it could also be a static valueOf method.
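// needs java.nio.file.Files, Path, Paths; java.util.List; java.util.stream.Collectors
Path warehouseFile = Paths.get("warehouse.txt");
List<Warehouse> warehouses = Files.lines(warehouseFile)
.map(Warehouse::new)
.collect(Collectors.toList());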
static class Warehouse {
public Warehouse(String line) {...}
}

Reading from a text file into an array - getting "nulls"

I'm reading from a file and copying that into an array. My file has five lines of text, a sentence each. I get my output "Array size is 5" but nothing after that. If I do add a print line of the array, it gives me 5 nulls...
Can someone help explain what I did wrong? Thanks!
public static int buildArray() throws Exception
{
System.out.println("BuildArray is starting ");
java.io.File textFile; // declares a variable of type File
textFile = new java.io.File ("textFile.txt"); //reserves the memory
Scanner input = null;
try
{
input = new Scanner(textFile);
}
catch (Exception ex)
{
System.out.println("Exception in method");
System.exit(0);
}
int arraySize = 0;
while(input.hasNextLine())
{
arraySize = arraySize + 1;
if (input.nextLine() == null)
break;
}
System.out.println("Array size is " + arraySize);
// Move the lines into the array
String[] linesInRAM = new String[arraySize];// reserve the memory
int count = 0;
if (input.hasNextLine())
{
while(count < arraySize)
{
System.out.println("test");
linesInRAM[count] = input.nextLine();
System.out.println(linesInRAM[count]);
count = count + 1;
}
}
In this code
int count = 0;
if (input.hasNextLine())
The above hasNextLine() will always be false, as you have already read all the way through the file.
Either create a new Scanner so that reading starts again from the beginning of the file, or use a dynamic list, e.g. an ArrayList, and add the lines to it in a single pass.
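The list-based option could look roughly like the sketch below: the lines are collected in one pass, so no second read of the file is needed, and an array can still be produced at the end. LineCollector and readLines are hypothetical names, and the sample text stands in for the real file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineCollector {
    // Reads every remaining line from the scanner into a growable list.
    static List<String> readLines(Scanner input) {
        List<String> lines = new ArrayList<>();
        while (input.hasNextLine()) {
            lines.add(input.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the text file; new Scanner(file) works the same way.
        try (Scanner input = new Scanner("first sentence.\nsecond sentence.\n")) {
            List<String> lines = readLines(input);
            System.out.println("Array size is " + lines.size());

            // Convert to an array only if one is really required.
            String[] linesInRAM = lines.toArray(new String[0]);
            for (String line : linesInRAM) {
                System.out.println(line);
            }
        }
    }
}
```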
My Java is a bit rusty, but the basic gist of my answer is that you should create a new Scanner object so that it reads from the beginning of the file again. This is the easiest way to "reset" to the beginning.
Your code is currently not working because each call to input.nextLine() advances the scanner, so at the end of that first while() loop input is sitting at the end of the file. The following input.hasNextLine() check is therefore false, the copy loop never runs, and the array keeps its default null entries.
Scanner newScanner = new Scanner(textFile);
Then in the bottom of your code, your loop should look like this instead:
if (newScanner.hasNextLine())
{
while(count < arraySize)
{
System.out.println("test");
linesInRAM[count] = newScanner.nextLine();
System.out.println(linesInRAM[count]);
count = count + 1;
}
}

A Good Method to read files in Java

I would really appreciate it if someone could help me with this. I am trying to do external sorting and I am stuck on the merging part. I understand how the merge should work; I'm just not sure what function to use.
Right now I am trying to read in the first word of each of multiple small text files and store them in a string array whose size is the number of files. So basically I will have a string array holding the first word of each file. Then I determine which one is the smallest alphabetically and write that to a new file; after that I read the next word from the file that the smallest word came from. This word is placed in the array position of the word that was just output and compared with the current words from the other files. This keeps repeating until all words are sorted.
The main problem I am running into is that I was using Scanner, and after the first round of comparing I can't replace the smallest word with the next word in its file, because Scanner doesn't keep track of what it has read. I know readLine does, but since my files are all words separated only by white space I can't use readLine. Can someone please guide me to a suitable reading function that can help me solve this problem?
for (int i = 0; i<B;i++)
{
try
{
BufferedReader ins = new BufferedReader(new FileReader("Run-"+ i + ".txt"));
Scanner scanner2 = new Scanner(ins);
temp3[i] = scanner2.next();
System.out.println(temp3[i]);
}
catch(IOException e)
{
}
}
for(int i=0;i<N;i++)
{
String smallest = temp3[0];
int smallestfile = 0;
for(j=0;j<B;j++)
{
int comparisonResult = smallest.compareTo(temp3[j]);
if(comparisonResult>0)
{
smallest = temp3[j];
smallestfile = j;
}
}
BufferedReader ins = new BufferedReader(new FileReader("C:/Run-"+ smallestfile + ".txt"));
Scanner scanner2 = new Scanner(ins);
if(scanner2.hasNext())
{
temp3[smallestfile]=scanner2.next();
}
}
}
catch(Exception e)
{
}
If the files are small enough, read each entire file into memory, use String.split() to separate the words into arrays, and do your magic.
If the files are bigger, keep them open and read each byte until you find a space; do that for all the files, compare the strings, do your magic, and repeat until all the files reach the end.
EDIT:
Read the files with BufferedReader and split the lines with String.split():
String line = readOneLineFromTheCurrentFile();
String[] words = line.split(" ");
As for temporarily sorting/storing the words, use a PriorityQueue (not an array). Sorry, I'm too busy watching baseball to add more.
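The PriorityQueue suggestion could be sketched roughly as follows. This is an illustration under assumptions, not a full external sort: each source is assumed to be already sorted and whitespace-separated, the merged words are collected in a list rather than written to a file, and the names WordMerge, Head and merge are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Scanner;

class WordMerge {
    // One queue entry per open source: its current word and where to get the next one.
    private static final class Head implements Comparable<Head> {
        final String word;
        final Scanner source;

        Head(String word, Scanner source) {
            this.word = word;
            this.source = source;
        }

        @Override
        public int compareTo(Head other) {
            return word.compareTo(other.word);
        }
    }

    // Repeatedly takes the smallest current word and refills from the same source.
    static List<String> merge(List<Scanner> sources) {
        PriorityQueue<Head> heads = new PriorityQueue<>();
        for (Scanner s : sources) {
            if (s.hasNext()) {
                heads.add(new Head(s.next(), s));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head smallest = heads.poll();
            merged.add(smallest.word);
            if (smallest.source.hasNext()) {
                heads.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for three sorted run files.
        List<Scanner> runs = Arrays.asList(
                new Scanner("a d g j"),
                new Scanner("b e h k m"),
                new Scanner("c f i"));
        System.out.println(String.join(" ", merge(runs)));
    }
}
```

Each Scanner stays open for the whole merge, so it remembers its position; for real run files, new Scanner(new File(...)) could replace the string-backed scanners.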
I'm not sure if I understood you right, but a Scanner does keep its position in a file. You just need as many of them as there are files:
import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;
public class so {
// returns the index of the smallest word
// returns -1 if there are no more words
private static int smallest(String[] words) {
int min = -1;
for (int i = 0; i < words.length; ++i)
if (words[i] != null) {
if (min == -1 || words[i].compareTo(words[min]) < 0)
min = i;
}
return min;
}
public static void main(String[] args) throws FileNotFoundException {
// open all files
Scanner[] files = new Scanner[args.length];
for (int i = 0; i < args.length; ++i) {
File f = new File(args[i]);
files[i] = new Scanner(f);
}
// initialize first words
String[] first = new String[args.length];
for (int i = 0; i < args.length; ++i)
first[i] = files[i].next();
// compare words and read following words from scanners
int min = smallest(first);
while (min >= 0) {
System.out.println(first[min]);
if (files[min].hasNext()) {
first[min] = files[min].next();
} else {
first[min] = null;
files[min].close();
files[min] = null;
}
min = smallest(first);
}
}
}
Tested with
a.txt: a d g j
b.txt: b e h k m
c.txt: c f i
Update:
In your example, you open and close the file inside the outer for loop. When you reopen a file the next time, it starts at the beginning of the file, of course.
To prevent this, you must keep the file open and move the scanner2 variable and its initialization in front of the outer for loop. You also need multiple Scanner variables, i.e. an array, to keep multiple files open simultaneously.
