I'm trying to parse a data file. This code has worked for my other data files, but now I'm getting an error. The lines in these data files are indented, so the parser tries to read the leading space as a value. How would I skip over this space?
String line = br.readLine();
while (line != null) {
String[] parts = line.split(" ");
if (linecounter != 0) {
for (int j=0; j<parts.length; j++) {
if (j==parts.length-1)
truepartition.add(Integer.parseInt(parts[j]));
else {
tempvals.add(Double.parseDouble(parts[j]));
numbers.add(Double.parseDouble(parts[j]));
}
}
Points.add(tempvals);
tempvals = new ArrayList<Double>();
} else {
//Initialize variables with values in the first line
// Reads each elements in the text file into the program 1 by 1
for (int i=0; i<parts.length; i++) {
if (i==0)
numofpoints = Integer.parseInt(parts[i]);
else if (i==1)
dim = Integer.parseInt(parts[i]) - 1;
else if (i==2)
numofclus = Integer.parseInt(parts[i]);
}
}
linecounter++;
line = br.readLine();
}
Data File
75 3 4
4 53 0
5 63 0
10 59 0
This NumberFormatException occurs because the extra spaces produce empty tokens, and an empty string cannot be parsed as a base-10 number.
Your input contains many extra whitespace characters, so split(" ") will not work.
Replace the single-space split with a regex split.
Use the code below and it will take care of the extra whitespace in your input.
String[] parts = line.trim().split("\\s+");
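The difference between the two splits can be seen with a minimal sketch. The class and method names below are illustrative and not part of the original code; the sample line imitates the indented rows in the data file.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits a line on runs of whitespace after removing leading/trailing blanks.
    public static String[] tokens(String line) {
        String trimmed = line.trim();
        // An all-blank line would otherwise yield a single empty token.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "  4   53 0";
        // split(" ") keeps empty strings for the leading and repeated spaces
        System.out.println(Arrays.toString(line.split(" ")));
        // trim() plus "\\s+" yields only the three numeric tokens
        System.out.println(Arrays.toString(tokens(line)));
    }
}
```

The empty-line guard is an addition for safety: without it, a blank line would still produce one empty token and parseInt would fail on it.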
Have you considered using the Scanner class? It skips over whitespace and helps with parsing files.
For example:
try {
Scanner inp = new Scanner(new File("path/to/dataFile"));
while(inp.hasNext()) {
int value = inp.nextInt();
System.out.println(value);
}
inp.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
I am trying to count the occurrences of a substring in a file.
In brief, each file contains a certain number of articles, and I need to know how many.
Each article starts with: #ARTICLE{
or with #ARTICLE{(series of integers)
Useful info:
- I have 10 files to look in
- No files are empty
- This code gives me a StringIndexOutOfBoundsException
Here is the code I have so far:
//To read through all files
for(int i=1; i<=10; i++)
{
try
{
//To look through all the bib files
reader = new Scanner(new FileInputStream("C:/Assg_3-Needed-Files/Latex"+i+".bib"));
System.out.println("Reading Latex"+i+".bib->");
//To read through the whole file
while(reader.hasNextLine())
{
String line = reader.nextLine();
String articles = line.substring(1, 7);
if(line.equals("ARTICLE"))
count+=1;
}
}
catch(FileNotFoundException e)
{
System.err.println("Error opening the file Latex"+i+".bib");
}
}
System.out.print("\n"+count);
Try just using String#contains on each line:
while(reader.hasNextLine()) {
String line = reader.nextLine();
if (line.contains("ARTICLE")) {
count += 1;
}
}
This would at least get around the problem of having to take a substring in the first place. The substring call does not throw for lines of 7 or more characters, whether or not they match, but any line shorter than 7 characters causes the out-of-bounds exception.
You could also use a regex pattern to make sure that you match ARTICLE as a standalone word:
while(reader.hasNextLine()) {
String line = reader.nextLine();
if (line.matches(".*\\bARTICLE\\b.*")) {
count += 1;
}
}
This would ensure that you don't count a line containing something like ARTICLES, which is not your exact target.
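Note that String.matches only succeeds when the whole line matches the pattern, so a search within the line needs either a surrounding .* or Matcher.find(). A minimal sketch of the find() variant follows; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class ArticleCounter {
    // Compiled once; \b marks a word boundary on each side of ARTICLE.
    private static final Pattern ARTICLE = Pattern.compile("\\bARTICLE\\b");

    // Counts the lines that contain ARTICLE as a standalone word.
    public static int countMatchingLines(Iterable<String> lines) {
        int count = 0;
        for (String line : lines) {
            // find() looks for the pattern anywhere in the line
            if (ARTICLE.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }
}
```

Because '#' and '{' are not word characters, a line such as #ARTICLE{1234 still matches, while ARTICLES does not.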
You can check whether the line starts with the needed sequence:
if (line.startsWith("#ARTICLE")) {
count += 1;
}
You're getting a StringIndexOutOfBoundsException from this line of code:
String articles = line.substring(1, 7);
The line read in can be empty or have fewer than 7 characters. To avoid the exception, add a conditional check such as
line.length() >= 7
Aside from that, it's better to use the approaches recommended in the other answers (i.e. contains or startsWith).
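Both options can be sketched side by side. The helper names are hypothetical. One detail worth noting: substring(1, 7) returns only six characters ("ARTICL"), so the guarded version below uses substring(1, 8) to cover all seven letters of ARTICLE after the '#'.

```java
public class ArticleLine {
    // Returns true when the line begins an article entry such as "#ARTICLE{" or "#ARTICLE{1234".
    public static boolean isArticleStart(String line) {
        // startsWith never throws on short or empty lines, unlike substring
        return line != null && line.trim().startsWith("#ARTICLE{");
    }

    // The substring approach from the question, made safe with a length check.
    public static boolean isArticleStartWithSubstring(String line) {
        return line != null
                && line.length() >= 8
                && line.substring(1, 8).equals("ARTICLE");
    }
}
```

The startsWith version is shorter and needs no index arithmetic, which is why it is usually the easier one to maintain.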
Since you are reading line by line, String.contains is a good choice instead of substring. Also, every article starts with "#ARTICLE", so use "#ARTICLE" in the condition. To test, please try this:
public class test {
public static void main(String[] args) {
int count = 0;
for (int i = 1; i <= 10; i++) {
try {
//To look through all the bib files
Scanner reader = new Scanner(new FileInputStream("C:/Assg_3-Needed-Files/Latex" + i + ".bib"));
System.out.println("Reading Latex" + i + ".bib->");
//To read through the whole file
while (reader.hasNextLine()) {
String line = reader.nextLine();
if (line.contains("#ARTICLE")) {
count += 1;
}
}
} catch (FileNotFoundException e) {
System.err.println("Error opening the file Latex" + i + ".bib");
}
}
System.out.print("\n" + count);
} }
I'm writing a program to read in a file and store the strings in an ArrayList and the ints in an array. The file contains strings and ints in the format: String int
I have already got the string part working. I'd like to know why the following code populates my array with the number 7 six times rather than the correct numbers.
Correct output would be:
12, 14, 16, 31, 42, 7
But it gives:
7, 7, 7, 7, 7, 7
Code:
BufferedReader buffy = new BufferedReader(new FileReader(fileName));
while((str = buffy.readLine()) != null) {
for(int i = 0; i <= arrayInt.length - 1; i++) {
for(int k = 0; k <= str.length()-1; k++) {
if(str.substring(k, k + 1).equals(" ")) {
String nums = str.substring(k+1);
arrayInt[i] = Integer.parseInt(nums);
}
}
}
}
buffy.close();
This happens because, for each line in the file, you fill the whole array.
Try this:
int i = 0;
BufferedReader buffy = new BufferedReader(new FileReader(fileName));
while((str = buffy.readLine()) != null) {
if(i < arrayInt.length) {
for(int k = 0; k <= str.length()-1; k++) {
if(str.substring(k, k + 1).equals(" ")) {
String nums = str.substring(k+1);
arrayInt[i] = Integer.parseInt(nums);
break;
}
}
i++;
}
}
buffy.close();
You can also use indexOf:
int i = 0;
BufferedReader buffy = new BufferedReader(new FileReader(fileName));
while((str = buffy.readLine()) != null) {
if(i < arrayInt.length) {
int k = str.indexOf(" ");
if(k!=-1) {
String nums = str.substring(k+1);
arrayInt[i] = Integer.parseInt(nums);
}
i++;
}
}
buffy.close();
File reads are typically batch/ETL jobs. If this code is going to production and will run many times rather than once, I would stress performance and ease of maintenance:
Only read as many characters as needed to find the space index
@talex added a very good line of code, break; inside the loop, so you don't have to read to the end of the line. But this only works if the string itself has no spaces. If the string can contain spaces, you need lastIndexOf(" ") or no break; at all.
I prefer the library method lastIndexOf (assuming you are using Java) because:
it searches from the right instead of the left, and assuming the number is always shorter than the string, it will find the space faster in most cases than searching from the start.
the second benefit is that library methods already handle many edge cases, so why reinvent the wheel?
int k = str.lastIndexOf(" ");
Last but not least, if someone else is going to maintain this code it will be easier for them, as plenty of documentation is available.
Only read the required lines from the file
It seems you only need to read arrayInt.length lines. If so, you should break; out of the while loop once the counter i reaches the array length.
I/O operations are costly, and although you would get the right output, you would end up scanning the whole file even when it's not required.
Don't forget try-catch-finally
The code assumes nothing will go wrong and that it can close the file when done, but any number of errors could crash the application and leave the file locked.
See the below example:
private Integer[] readNumbers(String fileName) throws Exception {
Integer[] arrayInt = new Integer[7];
String str = null;
BufferedReader buffy = new BufferedReader(new FileReader(fileName));
try {
int i=0;
while ((str = buffy.readLine()) != null) {
if(i >= arrayInt.length){
break;
}
//get last index of " "
int k = str.lastIndexOf(" ");
if(k > -1){
String nums = str.substring(k+1);
arrayInt[i] = Integer.parseInt(nums);
}
//increment the line counter
i++;
}
} catch (Exception ex) {
//handle exception
} finally {
buffy.close();
}
return arrayInt;
}
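On Java 7 and later, the same idea can be written with try-with-resources, which closes the reader without an explicit finally block. This is a sketch under a few assumptions: the class name is invented, a List replaces the fixed-size array, and the method takes a Reader so that a StringReader can stand in for the FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TrailingNumbers {
    // Reads "name number" lines and returns the trailing integer of each line, up to maxLines.
    public static List<Integer> read(Reader source, int maxLines) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        // try-with-resources closes the reader even when parsing fails
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // stop as soon as enough numbers were collected, so the rest of the file is not scanned
            while (numbers.size() < maxLines && (line = reader.readLine()) != null) {
                String trimmed = line.trim();
                int k = trimmed.lastIndexOf(' ');
                if (k > -1) {
                    numbers.add(Integer.parseInt(trimmed.substring(k + 1)));
                }
            }
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for a FileReader so the sketch runs without a file
        String data = "alpha 12\nbeta gamma 14\ndelta 16\n";
        System.out.println(read(new StringReader(data), 2));
    }
}
```

Lines without a space are skipped here, and a non-numeric last token still raises NumberFormatException; whether that should be caught or propagated depends on the application.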
I am struggling to use scanner class to read in a text file while skipping the blank lines.
Any suggestions?
Scanner sc = new Scanner(new BufferedReader(new FileReader("training2.txt")));
trainingData = new double[48][2];
while(sc.hasNextLine()) {
for (int i=0; i<trainingData.length; i++) {
String[] line = sc.nextLine().trim().split(" ");
if(line.length==0)
{
sc.nextLine();
}else{
for (int j=0; j<line.length; j++) {
trainingData[i][j] = Double.parseDouble(line[j]);
}
}
}
}
if(sc.hasNextLine())
{
sc.nextLine();
}
sc.close();
I am currently trying to get it working like this. But it is not working
Scanner sc = new Scanner(new BufferedReader(new FileReader("training.txt")));
trainingData = new double[48][2];
while(sc.hasNextLine()) {
String line = sc.nextLine().trim();
if(line.length()!=0)
{
for (int i=0; i<trainingData.length; i++) {
String[] line2 = sc.nextLine().trim().split(" ");
for (int j=0; j<line2.length; j++) {
trainingData[i][j] = Double.parseDouble(line2[j]);
}
}
}
}
return trainingData;
while(sc.hasNextLine()) {
for (int i=0; i<trainingData.length; i++) {
String[] line = sc.nextLine().trim().split(" ");
You can't just check the scanner once to see if it has data and then use a loop to read the lines of data. You can't assume that you have 48 lines of data just because you define your array to hold 48 lines of data.
You need to go back to the basics and learn how to read data from a file one line at a time and then you process that data.
Here is a simple example to get you started:
import java.util.*;
public class ScannerTest2
{
public static void main(String args[])
throws Exception
{
String data = "1 2\n\n3 4\n\n5 6\n7 8";
// First attempt
System.out.println("Display All Lines");
Scanner s = new Scanner( data );
while (s.hasNextLine())
{
String line = s.nextLine();
System.out.println( line );
}
// Second attempt
System.out.println("Display non blank lines");
s = new Scanner( data );
while (s.hasNextLine())
{
String line = s.nextLine();
if (line.length() != 0)
{
System.out.println( line );
}
}
// Final attempt
String[][] values = new String[5][2];
int row = 0;
System.out.println("Add data to 2D Array");
s = new Scanner( data );
while (s.hasNextLine())
{
String line = s.nextLine();
if (line.length() != 0)
{
String[] digits = line.split(" ");
values[row] = digits;
row++;
}
}
for (int i = 0; i < values.length; i++)
System.out.println( Arrays.asList(values[i]) );
}
}
The example uses a String variable to simulate data from a file.
The first block of code shows how to read all lines of data from the file. The logic simply:
invokes the hasNextLine() method to see if there is data
invokes the nextLine() method to get the line of data
displays the data that was read
repeats steps 1-3 until there is no data.
The next block of code simply adds an "if condition" so that you only display non-blank lines.
Finally the 3rd block of code is closer to what you want. As it reads each line of data, it splits the data into an array and then adds this array to the 2D array.
This is the part of the code you will need to change. You will need to convert the String array to a double array before adding it to your 2D array. So change this code first to get it working. Then, once this works and you understand the concept, make the necessary changes to your real application.
Note in my code how the last row displays [null, null]. This is why it is not a good idea to use arrays: you never know how big the array should be. If you have fewer than 5 rows you get the null values. If you have more than 5 you will get an out of bounds exception.
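One way around the fixed-size array is to collect rows in a List and convert at the end. The following is a minimal sketch of that idea with the conversion to double included; the class and method names are illustrative, and a String again simulates the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TrainingDataReader {
    // Parses whitespace-separated doubles, one row per non-blank line.
    public static double[][] parse(Scanner sc) {
        List<double[]> rows = new ArrayList<>();
        while (sc.hasNextLine()) {
            String line = sc.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\\s+");
            double[] row = new double[parts.length];
            for (int j = 0; j < parts.length; j++) {
                row[j] = Double.parseDouble(parts[j]);
            }
            rows.add(row);
        }
        // The list grows as needed, so no row count has to be guessed up front
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = parse(new Scanner("1 2\n\n3 4\n\n5 6\n7 8"));
        System.out.println(data.length + " rows");
    }
}
```

Each loop iteration checks hasNextLine() before the single nextLine() call, which is the pattern the question's nested loop was missing.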
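Try adding this to your code before reading each line:
sc.skip("(\r?\n)*");
It skips any line breaks at the current position, so empty lines are ignored. The trailing * lets the pattern match nothing, which avoids the NoSuchElementException that skip() throws when the pattern does not match. For more information: Scanner.skip()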
I'm having a problem counting the number of words in a file. The approach that I am taking is when I see a space or a newLine then I know to count a word.
The problem is that if I have multiple lines between paragraphs then I ended up counting them as words also. If you look at the readFile() method you can see what I am doing.
Could you help me out and guide me in the right direction on how to fix this?
Example input file (including a blank line):
word word word
word word
word word word
You can use a Scanner with a FileInputStream instead of BufferedReader with a FileReader. For example:-
File file = new File("sample.txt");
try(Scanner sc = new Scanner(new FileInputStream(file))){
int count=0;
while(sc.hasNext()){
sc.next();
count++;
}
System.out.println("Number of words: " + count);
}
I would change your approach a bit. First, I would use a BufferedReader to read the file line by line using readLine(). Then trim each line, skip it if it is empty, split it on runs of whitespace using String.split("\\s+"), and use the length of the resulting array to see how many words are on that line. To get the number of characters you could look at either the length of each line or the length of each split word (depending on whether you want to count whitespace as characters).
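A minimal sketch of this line-by-line approach follows; the class name is invented for the example. Blank lines are skipped explicitly because splitting an empty string still returns one (empty) element.

```java
import java.io.BufferedReader;
import java.io.IOException;

public class LineWordCounter {
    // Counts whitespace-separated words, ignoring blank lines between paragraphs.
    public static int countWords(BufferedReader reader) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                // "\\s+" treats any run of spaces or tabs as one separator
                count += trimmed.split("\\s+").length;
            }
        }
        return count;
    }
}
```

A caller would wrap a FileReader in the BufferedReader and close it afterwards, for example with try-with-resources.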
This is just a thought. There is one very easy way to do it. If you just need number of words and not actual words then just use Apache WordUtils
import org.apache.commons.lang.WordUtils;
public class CountWord {
public static void main(String[] args) {
String str = "Just keep a boolean flag around that lets you know if the previous character was whitespace or not pseudocode follows";
String initials = WordUtils.initials(str);
System.out.println(initials);
//so number of words in your file will be
System.out.println(initials.length());
}
}
Just keep a boolean flag around that lets you know if the previous character was whitespace or not (pseudocode follows):
boolean prevWhitespace = false;
int wordCount = 0;
while (char ch = getNextChar(input)) {
if (isWhitespace(ch)) {
if (!prevWhitespace) {
prevWhitespace = true;
wordCount++;
}
} else {
prevWhitespace = false;
}
}
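A runnable Java take on the flag idea could look like the sketch below. It is a variant, not a literal translation: it counts at the start of each word rather than at the whitespace that follows, so a final word with no trailing whitespace is still counted and leading whitespace does not add a word. The class name is illustrative.

```java
public class FlagWordCounter {
    // Counts words by tracking whether the previous character was part of a word.
    public static int countWords(String text) {
        boolean inWord = false;
        int wordCount = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                // first non-whitespace character after whitespace starts a new word
                inWord = true;
                wordCount++;
            }
        }
        return wordCount;
    }
}
```

Consecutive newlines only keep the flag false, so blank lines between paragraphs do not inflate the count.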
I think a correct approach would be by means of Regex:
String fileContent = <text from file>;
String[] words = Pattern.compile("\\s+").split(fileContent);
System.out.println("File has " + words.length + " words");
Hope it helps. The "\s+" meaning is in Pattern javadoc
import java.io.BufferedReader;
import java.io.FileReader;
public class CountWords {
public static void main (String args[]) throws Exception {
System.out.println ("Counting Words");
FileReader fr = new FileReader ("c:\\Customer1.txt");
BufferedReader br = new BufferedReader (fr);
String line = br.readLine();
int count = 0;
while (line != null) {
String []parts = line.split(" ");
for( String w : parts)
{
count++;
}
line = br.readLine();
}
System.out.println(count);
}
}
Hack solution
You can read the text file into a String variable, then split the String into an array using a single space as the delimiter: stringVar.split(" ").
The array length would equal the number of "words" in the file.
Of course this wouldn't give you a count of lines.
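3 steps: consume all the whitespace (counting line breaks as you go), stop at end of file, consume all the non-whitespace.
int c = inFile.read();
while (c != -1) {
// consume whitespace, counting line breaks along the way
while (c != -1 && Character.isWhitespace(c)) {
if (c == '\n') { numberLines++; }
c = inFile.read();
}
if (c == -1) { break; }
// consume one word
while (c != -1 && !Character.isWhitespace(c)) {
numberChars++;
c = inFile.read();
}
numberWords++;
}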
File Word-Count
If the words are separated by symbols, you can split on those symbols and count the resulting words.
Scanner sc = new Scanner(new FileInputStream(new File("Input.txt")));
int count = 0;
while (sc.hasNext()) {
String[] s = sc.next().split("d*[.#:=#-]");
for (int i = 0; i < s.length; i++) {
if (!s[i].isEmpty()){
System.out.println(s[i]);
count++;
}
}
}
System.out.println("Word-Count : "+count);
Take a look at my solution here; it should work. The idea is to remove all the unwanted symbols from the words, then separate those words and store them in another variable (I used an ArrayList). By adjusting the "excludedSymbols" variable you can add more symbols that you would like to exclude from the words.
public static void countWords () {
String textFileLocation ="c:\\yourFileLocation";
String readWords ="";
ArrayList<String> extractOnlyWordsFromTextFile = new ArrayList<>();
// excludedSymbols can be extended to whatever you want to exclude from the file
String[] excludedSymbols = {" ", "," , "." , "/" , ":" , ";" , "<" , ">", "\n"};
String readByteCharByChar = "";
boolean testIfWord = false;
try {
InputStream inputStream = new FileInputStream(textFileLocation);
byte byte1 = (byte) inputStream.read();
while (byte1 != -1) {
readByteCharByChar +=String.valueOf((char)byte1);
for(int i=0;i<excludedSymbols.length;i++) {
if(readByteCharByChar.equals(excludedSymbols[i])) {
if(!readWords.equals("")) {
extractOnlyWordsFromTextFile.add(readWords);
}
readWords ="";
testIfWord = true;
break;
}
}
if(!testIfWord) {
readWords+=(char)byte1;
}
readByteCharByChar = "";
testIfWord = false;
byte1 = (byte)inputStream.read();
if(byte1 == -1 && !readWords.equals("")) {
extractOnlyWordsFromTextFile.add(readWords);
}
}
inputStream.close();
System.out.println(extractOnlyWordsFromTextFile);
System.out.println("The number of words in the choosen text file are: " + extractOnlyWordsFromTextFile.size());
} catch (IOException ioException) {
ioException.printStackTrace();
}
}
This can be done in a very concise way using Java 8:
Files.lines(Paths.get(file))
.flatMap(str->Stream.of(str.split("[ ,.!?\r\n]")))
.filter(s->s.length()>0).count();
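For reference, a self-contained sketch of the stream pipeline is shown below. The class and method names are assumptions made for the example; the delimiter set is the one from the snippet with general whitespace added, and the file-based overload closes the stream that Files.lines opens.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamWordCount {
    // Counts non-empty tokens across all lines.
    public static long countWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("[ ,.!?\\s]+")))
                .filter(word -> !word.isEmpty())
                .count();
    }

    // File-based wrapper; try-with-resources releases the file handle held by Files.lines.
    public static long countWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return countWords(lines);
        }
    }
}
```

Usage would be something like StreamWordCount.countWords(Paths.get("sample.txt")), where the path is a placeholder.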
BufferedReader bf= new BufferedReader(new FileReader("G://Sample.txt"));
String line=bf.readLine();
while(line!=null)
{
String[] words=line.split(" ");
System.out.println("this line contains " +words.length+ " words");
line=bf.readLine();
}
The code below works in Java 8:
//Read file into String
String fileContent = new String(Files.readAllBytes(Paths.get("MyFile.txt")), StandardCharsets.UTF_8);
//Split on runs of non-letter characters and keep the pieces in a list of strings
List<String> words = Arrays.asList(fileContent.split("\\PL+"));
int count=0;
for(String x: words){
//skip the empty string that split can produce at the start
if(x.length()>0) count++;
}
System.out.println(count);
It's easy: get the String from the file with a method such as getText(), then count the spaces:
public class Main {
static int countOfWords(String str) {
if (str == null || str.isEmpty()) {
return 0;
}else{
int numberWords = 0;
for (char c : str.toCharArray()) {
if (c == ' ') {
numberWords++;
}
}
return ++numberWords;
}
}
}
I want to read in a grid of numbers (n*n) from a file and copy them into a multidimensional array, one int at a time. I have the code to read in the file and print it out, but I don't know how to extract each int. I think I need the split method with an empty delimiter "" in order to take every character, but after that I'm not sure. I would also like to change blank characters to 0, but that can wait!
This is what I have so far, although it doesn't work.
while (count <81 && (s = br.readLine()) != null)
{
count++;
String[] splitStr = s.split("");
String first = splitStr[number];
System.out.println(s);
number++;
}
fr.close();
}
A sample file is like this(the spaces are needed):
26 84
897 426
4 7
492
4 5
158
6 5
325 169
95 31
Basically I know how to read in the file and print it out, but I don't know how to take the data from the reader and put it in a multidimensional array.
I have just tried this, but it says 'cannot convert from String[] to String'
while (count <81 && (s = br.readLine()) != null)
{
for (int i = 0; i<9; i++){
for (int j = 0; j<9; j++)
grid[i][j] = s.split("");
}
Based on your file this is how I would do it:
List<int[]> ret = new ArrayList<int[]>();
Scanner fIn = new Scanner(new File("pathToFile"));
while (fIn.hasNextLine()) {
// read a line, and turn it into the characters
String[] oneLine = fIn.nextLine().split("");
int[] intLine = new int[oneLine.length];
// we turn the characters into ints
for(int i =0; i < intLine.length; i++){
if (oneLine[i].trim().equals(""))
intLine[i] = 0;
else
intLine[i] = Integer.parseInt(oneLine[i].trim());
}
// and then add the int[] to our output
ret.add(intLine);
}
At the end of this code, you will have a list of int[] which can be easily turned into an int[][].
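The final conversion step might be sketched as follows. The names are hypothetical, and the sketch assumes each character position is one cell and that every row should be padded to a fixed width passed in by the caller (blank or missing cells become 0).

```java
import java.util.ArrayList;
import java.util.List;

public class GridConvert {
    // Turns one line of single-character cells into ints; blanks and missing cells become 0.
    public static int[] parseRow(String line, int width) {
        int[] row = new int[width]; // int arrays start out filled with 0
        for (int i = 0; i < width && i < line.length(); i++) {
            char c = line.charAt(i);
            if (c >= '0' && c <= '9') {
                row[i] = c - '0';
            }
        }
        return row;
    }

    // Collects the rows in a list, then converts the list to an int[][].
    public static int[][] toGrid(List<String> lines, int width) {
        List<int[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(parseRow(line, width));
        }
        return rows.toArray(new int[0][]);
    }
}
```

Using charAt avoids split("") altogether, which sidesteps the differences in how older Java versions handle a leading empty element.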
private static int[][] readMatrix(BufferedReader br) throws IOException {
List<int[]> rows = new ArrayList<int[]>();
for (String s = br.readLine(); s != null; s = br.readLine()) {
String items[] = s.split(" ");
int[] row = new int[items.length];
for (int i = 0; i < items.length; ++i) {
row[i] = Integer.parseInt(items[i]);
}
rows.add(row);
}
return rows.toArray(new int[rows.size()][]);
}
EDIT: You just updated your post to include a sample input file, so the following won't work as-is for your case. However, the principle is the same -- tokenize the line you read based on whatever delimiter you want (spaces in your case) then add each token to the columns of a row.
You didn't include a sample input file, so I'll make a few basic assumptions.
Assuming that the first line of your input file is "n", and the remainder is the n x n integers you want to read, you need to do something like the following:
public static int[][] parseInput(final String fileName) throws Exception {
BufferedReader reader = new BufferedReader(new FileReader(fileName));
int n = Integer.parseInt(reader.readLine());
int[][] result = new int[n][n];
String line;
int i = 0;
while ((line = reader.readLine()) != null) {
String[] tokens = line.split("\\s");
for (int j = 0; j < n; j++) {
result[i][j] = Integer.parseInt(tokens[j]);
}
i++;
}
return result;
}
In this case, an example input file would be:
3
1 2 3
4 5 6
7 8 9
which would result in a 3 x 3 array with:
row 1 = { 1, 2, 3 }
row 2 = { 4, 5, 6 }
row 3 = { 7, 8, 9 }
If your input file doesn't have "n" as the first line, then you can just wait to initialize your final array until you've counted the tokens on the first line.
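A sketch of that variant, where n is inferred from the first non-blank line, is given below. The class name and the choice to throw IllegalArgumentException on a non-square input are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;

public class SquareMatrixReader {
    // Reads an n x n grid of whitespace-separated ints; n comes from the first data line.
    public static int[][] parse(BufferedReader reader) throws IOException {
        int[][] result = null;
        int i = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            if (result == null) {
                // the first line fixes the size of the square array
                result = new int[tokens.length][tokens.length];
            }
            if (i >= result.length || tokens.length != result.length) {
                throw new IllegalArgumentException("Input is not an n x n grid");
            }
            for (int j = 0; j < tokens.length; j++) {
                result[i][j] = Integer.parseInt(tokens[j]);
            }
            i++;
        }
        return result == null ? new int[0][0] : result;
    }
}
```

If the file has fewer than n rows, the remaining rows simply stay at 0 in this sketch; a stricter version could check i against n after the loop.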