I'm trying to find the number of characters in a given text file.
I've tried using both a Scanner and a BufferedReader, but I get conflicting results. With the Scanner, I append a newline character to each line and concatenate the lines, like this:
FileReader reader = new FileReader("sampleFile.txt");
Scanner lineScanner = new Scanner(reader);
String totalLines = "";
while (lineScanner.hasNextLine()){
String line = lineScanner.nextLine()+'\n';
totalLines += line;
}
System.out.println("Count "+totalLines.length());
This returns the true character count for my file, which is 5799.
Whereas when I use:
BufferedReader reader = new BufferedReader(new FileReader("sample.txt"));
int i;
int count = 0;
while ((i = reader.read()) != -1) {
count++;
}
System.out.println("Count "+count);
I get 5892.
I know using the lineScanner will be off by one if there is only one line, but for my text file I get the correct output.
Also, in Notepad++ the file length in bytes is 5892, but the character count without blanks is 5706.
Your file may have lines terminated with \r\n rather than \n. That could cause your discrepancy.
You have to consider the newline and carriage return characters in a text file. Each of these also counts as a character.
I would suggest using the BufferedReader as it will return more accurate results.
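The gap between the two counts can be reproduced without a file. Below is a minimal sketch, assuming the text uses \r\n line endings; the class name LineEndingCount and the sample text are made up for illustration. If \r\n is indeed the cause, the difference of 93 (5892 - 5799) would simply be the number of line breaks in the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class LineEndingCount {

    // Counts every character the reader delivers, including '\r' and '\n'.
    static int countWithRead(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mimics the Scanner approach from the question: nextLine() strips the
    // line terminator, and exactly one '\n' is added back per line.
    static int countWithScanner(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                count += scanner.nextLine().length() + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample with Windows line endings.
        String windowsText = "first\r\nsecond\r\nthird\r\n";
        System.out.println("read():  " + countWithRead(windowsText));
        System.out.println("Scanner: " + countWithScanner(windowsText));
    }
}
```

For the three-line sample, read() sees 22 characters and the Scanner version reports 19, one less per line.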
Related
I am reading a file with comma-separated values which, when split into an array, will have 10 values for each line. I expected the file to have line breaks so that
line = bReader.readLine()
will give me each line. But my file doesn't have line breaks. Instead, after the first set of values there are lots of spaces (465 to be precise) and then the next line begins.
So my readLine() call above reads the entire file in one go, as there are no line breaks. Please suggest how best to tackle this scenario efficiently.
One way is to replace each run of 465 spaces in your text with a newline character "\n" before iterating over it for reading.
I second Ninan's answer: replace the 465 spaces with a newline, then run the function you were planning on running earlier.
For aesthetics and readability, I would suggest using a regex Pattern to replace the spaces instead of a long, unreadable String.replace() call with 465 literal spaces.
Your code could look like the example below, but replace 6 with 465:
import java.util.regex.Pattern;

public static void main(String[] args)
{
    // six spaces separate the two records in this sample
    String content = "DOG,CAT      MOUSE,CHEESE";
    Pattern p = Pattern.compile("[ ]{6}");
    String newString = p.matcher(content).replaceAll("\n");
    System.out.println(newString);
}
My suggestion is to read file f1.txt and write to another file f2.txt, removing all empty lines and surrounding spaces, and then read f2.txt. Something like:
FileReader fr = new FileReader("f1.txt");
BufferedReader br = new BufferedReader(fr);
FileWriter fw = new FileWriter("f2.txt");
String line;
while ((line = br.readLine()) != null)
{
    line = line.trim(); // remove leading and trailing whitespace
    if (!line.equals("")) // don't write out blank lines
    {
        fw.write(line, 0, line.length());
        fw.write('\n'); // keep each non-blank line on its own line
    }
}
fw.close(); // flush the output to f2.txt
br.close();
Then try using your code.
You might create your own subclass of a FilterInputStream or a PushbackInputStream and pass that to an InputStreamReader. One overrides int read().
Such a class unfortunately needs a bit of typing. (A nice exercise, so to say.)
private static final int NO_CHAR = -2;
private boolean fromCache;
private int cachedSpaces;
private int cachedNonSpaceChar = NO_CHAR;
int read() throws IOException {
if (fromCache) {
if (cachedSpaces > 0) ...
if (cachedNonSpaceChar != NO_CHAR) ...
...
}
int ch = super.read();
if (ch != -1) {
...
}
return ch;
}
The idea is to cache spaces until a non-space character is read. In read(), either return the next character from the cache, return \n in place of the run of spaces, or call super.read() when nothing is cached, reading again recursively whenever the character is a space.
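One way the caching idea could be filled in is sketched below. It is only an illustration under a few assumptions: it extends FilterReader (character based) rather than FilterInputStream, the class name SpaceRunReader and the threshold parameter are made up, a run of at least threshold spaces becomes a single '\n', shorter runs are passed through unchanged, and skip() and mark() are not handled.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SpaceRunReader extends FilterReader {
    private static final int NO_CHAR = -2;  // marker: no character cached
    private final int threshold;            // run length treated as a line break
    private int cachedSpaces = 0;           // spaces still owed to the caller
    private int cachedChar = NO_CHAR;       // character read after a run of spaces

    SpaceRunReader(Reader in, int threshold) {
        super(in);
        this.threshold = threshold;
    }

    @Override
    public int read() throws IOException {
        if (cachedSpaces > 0) {             // replay a short run of spaces
            cachedSpaces--;
            return ' ';
        }
        if (cachedChar != NO_CHAR) {        // replay the character after the run
            int ch = cachedChar;
            cachedChar = NO_CHAR;
            return ch;
        }
        int ch = in.read();
        if (ch != ' ') {
            return ch;
        }
        int run = 1;
        int next;
        while ((next = in.read()) == ' ') { // measure the whole run
            run++;
        }
        cachedChar = next;                  // may be -1 at end of input
        if (run >= threshold) {
            return '\n';                    // long run: emit one line break
        }
        cachedSpaces = run - 1;             // short run: emit the spaces as they were
        return ' ';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int ch = read();
            if (ch == -1) {
                break;
            }
            buf[off + n++] = (char) ch;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    // Convenience helper for trying the reader on an in-memory string.
    static String convert(String text, int threshold) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new SpaceRunReader(new StringReader(text), threshold)) {
            int ch;
            while ((ch = r.read()) != -1) {
                out.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("DOG,CAT      MOUSE,CHEESE", 6));
    }
}
```

Wrapping such a reader in a BufferedReader should let the original readLine() loop stay as it is, since the bulk read(char[], int, int) is overridden as well.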
My understanding is that you have a flat CSV file without proper line breaks, which is supposed to have 10 values on each line.
Updated:
1. (Recommended) You can use Scanner class with useDelimiter to parse csv effectively, assuming you are trying to store 10 values from a line:
public static void parseCsvWithScanner() throws IOException {
Scanner scanner = new Scanner(new File("test.csv"));
// set your delimiter for scanner, "," for csv
scanner.useDelimiter(",");
// storing 10 values as a "line"
int LINE_LIMIT = 10;
// implement your own data structure to store each value of CSV
int[] tempLineArray = new int[LINE_LIMIT];
int lineBreakCount = 0;
while(scanner.hasNext()) {
// trim start and end spaces if there is any
String temp = scanner.next().trim();
tempLineArray[lineBreakCount++] = Integer.parseInt(temp);
if (lineBreakCount == LINE_LIMIT) {
// replace your own logic for handling the full array
for(int i=0; i<tempLineArray.length; i++) {
System.out.print(tempLineArray[i]);
} // end replace
// resetting array and counter
tempLineArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
}
scanner.close();
}
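One caveat with a plain "," delimiter: if the tenth value of a record is followed only by spaces (as described in the question) rather than by a comma, it ends up in the same token as the first value of the next record, and Integer.parseInt fails. A possible variation, sketched under the assumption that all values are integers, is to let a run of whitespace end a value as well. The class name FlatCsvScanner and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class FlatCsvScanner {

    // Splits a flat stream of integers into records of recordSize values.
    // An incomplete trailing record is ignored in this sketch.
    static List<int[]> parse(String content, int recordSize) {
        List<int[]> records = new ArrayList<>();
        int[] current = new int[recordSize];
        int filled = 0;
        try (Scanner scanner = new Scanner(content)) {
            // A value ends at a comma (with optional whitespace around it)
            // or at a run of whitespace on its own.
            scanner.useDelimiter("\\s*,\\s*|\\s+");
            while (scanner.hasNext()) {
                // An empty field (two commas in a row) raises NumberFormatException here.
                current[filled++] = Integer.parseInt(scanner.next());
                if (filled == recordSize) {
                    records.add(current);
                    current = new int[recordSize];
                    filled = 0;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Three values per record here; the question would use 10.
        for (int[] record : parse("1,2,3   4,5,6", 3)) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```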
Or use the BufferedReader.
If memory is an issue, you might not need the ArrayList that stores all values; replace it with your own logic for handling each "line".
public static void parseCsv() throws IOException {
    // assumes the same "test.csv" input as the Scanner example
    BufferedReader br = new BufferedReader(new FileReader("test.csv"));
    // your delimiter
    char TOKEN = ',';
    // your requirement of storing 10 values for each "line"
    int LINE_LIMIT = 10;
    // tmp for storing from BufferedReader.read()
    int tmp;
    // a counter for line break
    int lineBreakCount = 0;
    // array for storing 10 values, assuming the values of CSV are integers
    int[] tempArray = new int[LINE_LIMIT];
    // storing tempArray of each line to ArrayList
    ArrayList<int[]> lineList = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    while ((tmp = br.read()) != -1) {
        if ((char) tmp == TOKEN) {
            // storing current value from buffer with trim of spaces
            tempArray[lineBreakCount] =
                Integer.parseInt(sb.toString().trim());
            lineBreakCount++;
            // clear the buffer
            sb.setLength(0);
            if (lineBreakCount == LINE_LIMIT) {
                // your logic to handle the current "line" here.
                lineList.add(tempArray);
                // new "line"
                tempArray = new int[LINE_LIMIT];
                lineBreakCount = 0;
            }
        }
        else {
            // add current char from BufferedReader if not delimiter
            sb.append((char) tmp);
        }
    }
    // the last value has no delimiter after it, so store it once the input ends
    String last = sb.toString().trim();
    if (!last.isEmpty()) {
        tempArray[lineBreakCount++] = Integer.parseInt(last);
    }
    if (lineBreakCount > 0) {
        lineList.add(tempArray);
    }
    br.close();
}
Basically I've got an assignment which reads multiple lines from a .txt file.
There are 4 values in the text file per line and each value is separated by 2 spaces.
There are about 10 lines of data in the file.
After taking the input from the file the program then puts it onto a Database. The database connection functionality works fine.
My issue now is with reading from the file using a BufferedReader.
The issue is that if I uncomment any 1 of the 3 lines at the bottom the BufferedReader reads every other line. And if I don't use them then there's an exception as the next input is of type String.
I have contemplated using a Scanner with the .hasNextLine() method.
Any thoughts on what could be the problem and how to fix it?
Thanks.
File file = new File(FILE_INPUT_NAME);
FileReader fr = new FileReader(file);
BufferedReader readFile = new BufferedReader(fr);
String line = null;
while ((line = readFile.readLine()) != null) {
String[] split = line.split("  ", 4);
String id = split[0];
nameFromFile = split[1];
String year = split[2];
String mark = split[3];
idFromFile = Integer.parseInt(id);
yearOfStudyFromFile = Integer.parseInt(year);
markFromFile = Integer.parseInt(mark);
//line = readFile.readLine();
//readFile.readLine();
//System.out.println(readFile.readLine());
}
Edit: There was an error in the formatting of the .txt file: a missing value.
But now I get an ArrayIndexOutOfBoundsException.
Edit edit: Another error in the .txt file! Turns out there was a single space instead of a double. It seems to be working now. But any advice on how to deal with file errors like this in the future?
The issue is that if I uncomment any 1 of the 3 lines at the bottom the BufferedReader reads every other line.
Correct. You're already reading a line in the while condition, so you don't need another read. If you put any of those lines of code in, the line of text they read will be thrown away and not processed.
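The effect is easy to see on an in-memory string. This is a small sketch, with a made-up class name SkippedLines and a flag that stands in for uncommenting one of the extra readLine() calls:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SkippedLines {

    // Returns the lines that the loop body actually gets to process.
    static List<String> processed(String text, boolean extraRead) {
        List<String> seen = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) { // the only read that is needed
                seen.add(line);
                if (extraRead) {
                    reader.readLine(); // consumes the next line and discards it
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String text = "a\nb\nc\nd\n";
        System.out.println(processed(text, false));
        System.out.println(processed(text, true));
    }
}
```

With the extra read enabled, only "a" and "c" reach the loop body; without it, all four lines do.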
A compilable version of the code posted could be
public void read() throws IOException {
File file = new File(FILE_INPUT_NAME);
FileReader fr = new FileReader(file);
BufferedReader readFile = new BufferedReader(fr);
String line;
while ((line = readFile.readLine()) != null) {
String[] split = line.split(" ", 4);
if (split.length != 4) { // Not enough tokens (e.g., empty line) read
continue;
}
String id = split[0];
String nameFromFile = split[1];
String year = split[2];
String mark = split[3];
int idFromFile = Integer.parseInt(id);
int yearOfStudyFromFile = Integer.parseInt(year);
int markFromFile = Integer.parseInt(mark);
//line = readFile.readLine();
//readFile.readLine();
//System.out.println(readFile.readLine());
}
}
The above uses a single space (" " instead of the original "  "). To split on any amount of whitespace, a regular expression can be used, e.g. "\\s+". Of course, exactly 2 spaces can also be used, if that reflects the structure of the input data.
What the method should do with the extracted values (e.g., returning them in an object of some type, or saving them to a database directly), is up to the application using it.
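On the follow-up question about coping with malformed lines in the future: one option is to validate each line before using it and to collect a message with the line number instead of letting the exception escape. Below is a rough sketch, assuming fields are separated by two or more spaces and that the first, third and fourth fields are integers as in the question. LineChecker and findProblems are illustrative names, and the input is passed as a string only to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineChecker {

    // Returns one message per line that cannot be parsed; blank lines are skipped.
    static List<String> findProblems(String fileContent) {
        List<String> problems = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(fileContent))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.trim().isEmpty()) {
                    continue;
                }
                // Two or more whitespace characters separate the fields.
                String[] parts = line.trim().split("\\s{2,}");
                if (parts.length != 4) {
                    problems.add("line " + lineNumber + ": expected 4 fields but found "
                            + parts.length);
                    continue;
                }
                try {
                    Integer.parseInt(parts[0]); // id
                    Integer.parseInt(parts[2]); // year of study
                    Integer.parseInt(parts[3]); // mark
                } catch (NumberFormatException e) {
                    problems.add("line " + lineNumber + ": " + e.getMessage());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The second line uses a single space after the id, the fourth has a bad year.
        String sample = "1  Ann  2  70\n2 Bob  3  65\n\n3  Cy  x  50\n";
        for (String problem : findProblems(sample)) {
            System.out.println(problem);
        }
    }
}
```

Reporting the line number makes it quicker to find a single space or a missing value in the input file than reading a stack trace.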
I am taking a programming class where we have to compress a file using a Huffman tree and decompress it.
I am running into a problem where I am unable to capture the last newline character of a txt file.
E.G.
This is a line
This is a secondline
//empty line
So if I compress and decompress the above text in a file, I end up with a file with this
This is a line
This is a secondline
Right now I'm doing
while(file.hasNextLine()){
char[] cArr = file.nextLine().toCharArray();
//count amount of times a character appears with a hashmap
if(file.hasNextLine()){
//add an occurrence of \n to the hashmap
}
}
I understand the problem is that the last line technically does not have a "Scanner.hasNextline()" since I just consumed the last '\n' of the file with the nextLine() call.
Upon realizing that I have tried doing useDelimiter("") and Scanner.next() instead of Scanner.nextLine() and both still lead to similar problems.
So is there a way to fix this?
Thanks in advance.
Not to completely change your code or approach, using StringBuilder seems to work well.
File testfile = new File("test.txt");
StringBuilder stringBuffer = new StringBuilder();
try{
BufferedReader reader = new BufferedReader(new FileReader(testfile));
char[] buff = new char[500];
for (int charsRead; (charsRead = reader.read(buff)) != -1; ) {
stringBuffer.append(buff, 0, charsRead);
}
}
catch(Exception e){
System.out.print(e);
}
System.out.println(stringBuffer);
Checking the total bytes read:
System.out.println(stringBuffer.length()); // 51
List of file size:
ls -l .
51 test.txt
Bytes read match, so it appears it got all lines including the blank line.
note: I use Java 6, modify to suit your version.
Hope this helps.
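For the Huffman frequency table itself, the same character-by-character reading can feed the map directly, so that every '\n' (including the last one) is counted like any other character. This is a minimal sketch, assuming Java 8 or later for Map.merge; CharFrequency is a made-up name, and a FileReader could be passed in place of the StringReader used here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class CharFrequency {

    // Counts how often each character occurs, line terminators included.
    static Map<Character, Integer> count(Reader source) {
        Map<Character, Integer> frequencies = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                frequencies.merge((char) ch, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        String text = "This is a line\nThis is a secondline\n";
        Map<Character, Integer> frequencies = count(new StringReader(text));
        System.out.println("newlines: " + frequencies.get('\n'));
    }
}
```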
Came across this code which replaces all the characters of the given value.
File temp = File.createTempFile("newfile", ".txt");
FileWriter fw = new FileWriter(temp);
Reader reader = new FileReader(file);
BufferedReader br = new BufferedReader(reader);
while (br.ready()) {
fw.write(br.readLine().replaceAll("n", "j") + "\n");
}
fw.close();
br.close();
reader.close();
temp.renameTo(file);
}
Instead of replacing all 'n's with 'j's, is there a way to specify only the index I want to change?
You can use replaceFirst (that accepts a regex), using the following pattern:
(?<=.{N-1}).
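Where "N" is the 1-based position of the character you want to replace, so that N-1 characters precede it. Write the number itself in the pattern, e.g. .{4} to replace the fifth character.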
Of course there are many alternatives, look at the String API to fuel your creative fire.
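As a rough illustration of the lookbehind pattern, here is a small helper. The name replaceAt is made up, the index is zero-based (so it equals N-1 from the pattern above), and a ^ anchor is added inside the lookbehind to pin the count to the start of the string. Note that "." does not match line terminators by default, which is fine for a line returned by readLine().

```java
import java.util.regex.Matcher;

class ReplaceAtIndex {

    // Replaces the character at the given zero-based index; returns the text
    // unchanged if the index is out of range.
    static String replaceAt(String text, int index, char replacement) {
        if (index < 0 || index >= text.length()) {
            return text;
        }
        // Matches the one character that has exactly `index` characters before it.
        String regex = "(?<=^.{" + index + "}).";
        // quoteReplacement guards against '$' or '\' in the replacement.
        return text.replaceFirst(regex, Matcher.quoteReplacement(String.valueOf(replacement)));
    }

    public static void main(String[] args) {
        System.out.println(replaceAt("banana", 4, 'x'));
    }
}
```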
The line within the while statement is as follows:
fw.write(br.readLine().replaceAll("n", "j") + "\n");
This can be expanded to:
String str = br.readLine(); //get text
String replace = str.replaceAll("n", "j"); //replace content
replace = replace + "\n"; //add new line
fw.write(replace); //write to file
For your instance of replacing a certain index, you would want to do something similar:
StringBuilder str = new StringBuilder(br.readLine()); //Read line into StringBuilder
if (str.length() > 4) //Check that index 4 exists
    str.setCharAt(4, 'x'); //Replace character in line at index 4 with 'x'
fw.write(str.toString() + "\n"); //write to file, adding the new line back
I believe I am not using StringTokenizer correctly. Here is my code:
buffer = new byte[(int) (end - begin)];
fin.seek(begin);
fin.read(buffer, 0, (int) (end - begin));
StringTokenizer strk = new StringTokenizer(new String(buffer),
DELIMS,true);
As you can see, I am reading a chunk of lines from a file (end and begin are line numbers) and passing the data to a StringTokenizer. My delimiters are:
DELIMS = "\r\n ";
because I want to separate words that have a space between them, or are on the next line.
However, this code sometimes splits whole words as well. What could be the explanation? Is my DELIMS string wrong?
Also, I am passing "true" as an argument to the tokenizer because I want the delimiters to be returned as tokens as well (I want this so that I can count which line I am currently on).
Could you please help me? Thanks a lot.
To start with, your method for converting bytes into a String is a bit suspect, and this overall method will be less-than-efficient, especially for a larger file.
Are you required to use StringTokenizer? If not, I'd strongly recommend using Scanner instead. I'd provide you with an example, but will ask that you just refer to the Javadocs instead, which are quite comprehensive and already contain good examples. That said, it accepts delimiters as well - but as Regular Expressions, so just be aware.
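For what it's worth, a minimal sketch of the Scanner variant with the same delimiter characters could look like the following. WordScanner is a made-up name, and unlike StringTokenizer with returnDelims set to true, this does not hand the delimiters back as tokens, so line counting would have to be done separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordScanner {

    // Splits text into words on any run of spaces, '\r' or '\n'.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // The delimiter is a regular expression, not a set of characters.
            scanner.useDelimiter("[\\r\\n ]+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("one two\r\nthree  four\n"));
    }
}
```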
You could always wrap your input stream in a LineNumberReader. That will keep track of the line number for you. LineNumberReader extends BufferedReader, which has a readLine() method. With that, you could use a regular StringTokenizer to get your words as tokens. You could use regular expressions or Scanner, but for this case, StringTokenizer is simpler for beginners to understand and quicker.
You must have a RandomAccessFile. You didn't specify that, but I'm guessing based on the methods you used. Try something like:
byte [] buffer = ...; // you know how to get this.
ByteArrayInputStream stream = new ByteArrayInputStream(buffer);
// if you have java.util.Scanner
{
int lineNum = 0;
Scanner s = new Scanner(stream);
while (s.hasNextLine()) {
lineNum++;
String line = s.nextLine();
System.out.format("I am on line %s%n", lineNum);
Scanner lineScanner = new Scanner(line);
while (lineScanner.hasNext()) {
String word = lineScanner.next();
// do whatever with word
}
}
}
// if you don't have java.util.Scanner, or want to use StringTokenizer
{
LineNumberReader reader = new LineNumberReader(
new InputStreamReader(stream));
String line = null;
while ((line = reader.readLine()) != null) {
System.out.println("I am on line " + reader.getLineNumber());
StringTokenizer tok = new StringTokenizer(line);
while (tok.hasMoreTokens()) {
String word = tok.nextToken();
// do whatever with word
}
}
}