How is the method read() in FileReader moving through a file? - java

So I just wrote a program that reads a specific file and returns the frequency of each character used. This was done by using a singly linked list (not Java's LinkedList, but very similar). What I want to know is why this:
while(txtFile.read() != -1){
Character letter = (char) txtFile.read();
freqBag.add(Character.toLowerCase(letter));
}
doesn't work (it doesn't return the correct frequency of the given character), and why this:
int c;
while((c = txtFile.read()) != -1){
Character letter = (char) c;
freqBag.add(Character.toLowerCase(letter));
}
works. I wrote the first one, and a friend helped me fix it.

It doesn't work because you're discarding characters. Each call to read() consumes and returns the next character (as an int, or -1 at the end of the stream), so your code is dropping every even-indexed character (0, 2, 4...).
while(txtFile.read() != -1){ // Read and discard a character
Character letter = (char) txtFile.read(); // Read a character into letter
freqBag.add(Character.toLowerCase(letter)); // Store this letter
}
Your friend's code shouldn't be working either:
int c; // variable outside the loop
while((c = txtFile.read()) != -1){ // Read a character into c, compare to -1
Character letter = (char) txtFile.read(); // Read another character
freqBag.add(Character.toLowerCase(letter)); // Store this letter
}
The correct method would be to read just once:
int c;
while((c = txtFile.read()) != -1) {
freqBag.add(Character.toLowerCase((char)c));
}
I suspect either you have a typo, or you used a different file and didn't realize that letters were still being dropped.

First of all, keep in mind that every call to read() consumes one character from the file, so if you call it inside the while condition without storing the result, you lose that character.
Second, to me (considering operator precedence) these two pieces of code do exactly the same thing, so the problem might be in another part of the code.
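To see the difference between the two loops without a file, here is a minimal sketch that runs both over an in-memory StringReader. The class and method names (ReadTwiceDemo, readTwice, readOnce) are made up for illustration, and a StringBuilder stands in for freqBag.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadTwiceDemo {

    // Mirrors the first loop: read() is called twice per iteration, so the
    // character consumed by the loop condition is never stored.
    static String readTwice(String text) {
        StringBuilder kept = new StringBuilder();
        try (Reader in = new StringReader(text)) {
            while (in.read() != -1) {
                kept.append((char) in.read());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept.toString();
    }

    // Mirrors the fixed loop: read() is called once and its result is reused.
    static String readOnce(String text) {
        StringBuilder kept = new StringBuilder();
        try (Reader in = new StringReader(text)) {
            int c;
            while ((c = in.read()) != -1) {
                kept.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(readTwice("abcdef")); // prints "bdf"
        System.out.println(readOnce("abcdef"));  // prints "abcdef"
    }
}
```

With an odd number of characters the first loop also stores one bogus entry: the inner read() returns -1 at the end, and (char) -1 is '\uffff'.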

Related

Reading wrong characters from file Java

I am reading characters from a file by skipping twice:
fis = new FileInputStream("C:/data/25130.in ");
fis.skip(24305);//This position contains _(UnderScore)
l=fis.read();
fis.skip(24312);//This position also contains _(Underscore)
i = fis.read();
ch= (char)l;
c = (char)i;
System.out.print("Ch: "+ch);//Returns Underscore
System.out.print("C: "+c); // Returns 9 instead of UnderScore
If I delete the first skip, like the following:
fis = new FileInputStream("C:/data/25130.in ");
fis.skip(24312);//This position also contains _(underscore)
i = fis.read();
c = (char)i;
System.out.print("C: "+c); // Now it returns Underscore
I intend to read 2 characters at 2 positions. Where is the problem?
fis.skip(24312) skips that many bytes from the current position (it reads 24312 bytes and throws them away).
What you want to do is "position" the input stream, and throw away only (24312 - 24305) bytes, or fis.skip(7)
EDIT: hmmm, lutzh is right, you want to fis.skip(6) but....
what you really want to do is use a RandomAccessFile and use the seek(position) method...
I think FileInputStream.skip does not go to the given position, it skips the given number of bytes. So after your second skip you will end up at 48617, plus one more that you actually read.
Try 6 as parameter for your second skip.
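A small sketch of the two approaches discussed above, under some assumptions: a ByteArrayInputStream stands in for the FileInputStream so the example is self-contained, and the class and method names (SkipVersusSeek, readTwoWithSkip, readAt) are hypothetical. The point is only that skip() is relative to the current position while RandomAccessFile.seek() is absolute.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class SkipVersusSeek {

    // Reads the bytes at two absolute offsets (first < second) from a stream.
    // skip() is relative, so the second skip has to subtract what was already consumed.
    static char[] readTwoWithSkip(byte[] data, int first, int second) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        in.skip(first);                // now at offset 'first'
        char a = (char) in.read();     // consumes one byte, now at first + 1
        in.skip(second - first - 1);   // remaining distance to 'second'
        char b = (char) in.read();
        return new char[] { a, b };
    }

    // With a real file, seek() jumps straight to an absolute offset.
    static char readAt(RandomAccessFile file, long position) throws IOException {
        file.seek(position);
        return (char) file.read();
    }

    public static void main(String[] args) {
        // Underscores sit at offsets 5 and 12.
        byte[] data = "abcde_fghijk_lmn".getBytes(StandardCharsets.US_ASCII);
        char[] found = readTwoWithSkip(data, 5, 12);
        System.out.println(found[0] + " " + found[1]); // prints "_ _"
    }
}
```

For the offsets in the question, second - first - 1 is 24312 - 24305 - 1 = 6, which matches the suggested skip(6). On a real stream skip() may skip fewer bytes than requested, so production code should check its return value.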

Comparing String Integers Issue

I have a scanner that reads a 7-character alphanumeric code (entered by the user). The String variable is called "code".
The last character of the code (7th character, index 6) MUST BE NUMERIC, while the rest may be either numeric or alphabetical.
So I set out to add a check that would stop the rest of the method from executing if the last character in the code was anything but a digit (0 - 9).
However, my code does not work as expected: even if the code ends in a digit between 0 and 9, the if condition is met and the program prints "last character in code is non-numerical".
Example code: 45m4av7
characterAtEnd prints out as the string character 7, as it should.
However, my program still tells me my code ends non-numerically.
I'm aware that my number values are string characters, but it shouldn't matter, should it?
Also, I apparently cannot compare actual integer values with "|", which is mainly why I'm using String.valueOf and taking the string characters of 0-9.
String characterAtEnd = String.valueOf(code.charAt(code.length()-1));
System.out.println(characterAtEnd);
if(!characterAtEnd.equals(String.valueOf(0|1|2|3|4|5|6|7|8|9))){
System.out.println("INVALID CRC CODE: last character in code is non-numerical.");
System.exit(0);
}
I cannot for the life of me figure out why my program is telling me my code (which has a 7 at the end) ends non-numerically. It should skip the if statement and continue on, right?
The String contains method will work here:
String digits = "0123456789";
digits.contains(characterAtEnd); // true if ends with digit, false otherwise
String.valueOf(0|1|2|3|4|5|6|7|8|9) is actually "15", which of course can never be equal to the last character. This should make sense, because 0|1|2|3|4|5|6|7|8|9 evaluates to 15 using integer math, which then gets converted to a String.
Alternatively, try this:
String code = "45m4av7";
char characterAtEnd = code.charAt(code.length() - 1);
System.out.println(characterAtEnd);
if(characterAtEnd < '0' || characterAtEnd > '9'){
System.out.println("INVALID CRC CODE: last character in code is non-numerical.");
System.exit(0);
}
You are doing bitwise operations here: if(!characterAtEnd.equals(String.valueOf(0|1|2|3|4|5|6|7|8|9)))
Check out the difference between | (bitwise OR) and || (logical OR).
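To illustrate the point about |, here is a short sketch showing what the expression actually evaluates to, next to a plain range check on the last character. DigitCheck and endsWithDigit are names invented for this example.

```java
public class DigitCheck {

    // True only if the last character is one of the ASCII digits '0'..'9'.
    static boolean endsWithDigit(String code) {
        if (code.isEmpty()) {
            return false;
        }
        char last = code.charAt(code.length() - 1);
        return last >= '0' && last <= '9';
    }

    public static void main(String[] args) {
        // Bitwise OR merges the bits of all operands: 1 | 2 | 4 | 8 already gives 15.
        int combined = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;
        System.out.println(combined);                 // prints 15
        System.out.println(endsWithDigit("45m4av7")); // prints true
        System.out.println(endsWithDigit("45m4avx")); // prints false
    }
}
```

The range check is used here rather than Character.isDigit because isDigit also accepts digits from other scripts, which may or may not be wanted for a code like this.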
This bit of code should accomplish your task using regular expressions:
String code = "45m4av7";
if (!code.matches("^.+?\\d$")){
System.out.println("INVALID CRC CODE");
}
Also, for reference, this method sometimes comes in handy in similar situations:
/* returns true if someString actually ends with the specified suffix */
someString.endsWith(suffix);
As endsWith(suffix) does not take regular expressions, if you wanted to go through all possible lower-case alphabet values, you'd need to do something like this:
/* ASCII approach */
String s = "hello";
boolean endsInLetter = false;
for (int i = 97; i <= 122; i++) {
if (s.endsWith(String.valueOf(Character.toChars(i)))) {
endsInLetter = true;
}
}
System.out.println(endsInLetter);
/* String approach */
String alphabet = "abcdefghijklmnopqrstuvwxyz";
boolean endsInLetter2 = false;
for (int i = 0; i < alphabet.length(); i++) {
if (s.endsWith(String.valueOf(alphabet.charAt(i)))) {
endsInLetter2 = true;
}
}
System.out.println(endsInLetter2);
Note that neither of the aforementioned approaches is a good idea: they are clunky and rather inefficient.
Going off of the ASCII approach, you could even do something like this:
ASCII reference : http://www.asciitable.com/
int i = (int)code.charAt(code.length() - 1);
/* Corresponding ASCII values to digits */
if(i <= 57 && i >= 48){
System.out.println("Last char is a digit!");
}
If you want a one-liner, stick to regular expressions, for example:
System.out.println((!code.matches("^.+?\\d$")? "Invalid CRC Code" : "Valid CRC Code"));
I hope this helps!
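If the whole format should be checked in one go, the regular expression can be tightened. This is only a sketch under the assumption that "alphanumeric" means ASCII letters and digits; CrcCodeFormat and isValid are hypothetical names. Note that the pattern "^.+?\\d$" from above accepts any string of two or more characters that ends in a digit, not just 7-character codes.

```java
import java.util.regex.Pattern;

public class CrcCodeFormat {

    // Six ASCII letters or digits followed by one digit. matches() anchors
    // the pattern at both ends, so ^ and $ are not needed.
    private static final Pattern CODE = Pattern.compile("[A-Za-z0-9]{6}[0-9]");

    static boolean isValid(String code) {
        return code != null && CODE.matcher(code).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("45m4av7")); // prints true
        System.out.println(isValid("45m4avx")); // prints false
        System.out.println(isValid("m7"));      // prints false (too short)
    }
}
```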

Encoding-aware RandomAccessReader implementation?

The default implementation of RandomAccessFile is 'broken', in the sense that you can't specify which encoding your file is in.
I'm looking for an alternative which matches the following criteria:
Encoding-aware
Random access! (dealing with very big files, need to be able to position the cursor using a byte offset without streaming the whole thing).
I had a poke around in Commons IO, but there's nothing there. I'd rather not have to implement this myself, because there are entirely too many places it could go wrong.
RandomAccessFile is intended for accessing binary data. It is not possible to efficiently create a random access encoded file which is appropriate in all situations.
Even if you find such a solution I would check it carefully to ensure it suits your needs.
If you were to write it, I would suggest considering a random position of row and column rather than character offset from the start of the file.
This has the advantage that you only have to remember where the start of each line is and you can scan the line to get your character. If you index the position of every character, this could use 4 bytes for every character (assuming the file is < 4 GB)
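The row/column idea could be sketched roughly as follows. This is an illustration only, with several assumptions: the data is UTF-8 (or another encoding where the byte 0x0A only ever means a line feed, which rules out UTF-16), it is held in a byte array rather than a file, and the names LineIndex, lineStarts and charAt are made up. For a real large file the offsets would be long values and each line would be fetched with RandomAccessFile.seek().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineIndex {

    // Byte offsets at which each line starts: one entry per line,
    // instead of one entry per character.
    static List<Integer> lineStarts(byte[] data) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n' && i + 1 < data.length) {
                starts.add(i + 1);
            }
        }
        return starts;
    }

    // Decodes only the requested line, then picks the character by column.
    static char charAt(byte[] data, List<Integer> starts, int row, int column) {
        int from = starts.get(row);
        int to = row + 1 < starts.size() ? starts.get(row + 1) : data.length;
        String line = new String(data, from, to - from, StandardCharsets.UTF_8);
        return line.charAt(column);
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9\nw\u00f6rld\n".getBytes(StandardCharsets.UTF_8);
        List<Integer> starts = lineStarts(data);
        System.out.println(starts);                     // prints [0, 4]
        System.out.println(charAt(data, starts, 1, 1)); // prints the second character of line 2
    }
}
```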
The answer turned out to be less painful than I assumed:
// This gives me access to buffering and charset magic
new BufferedReader(new InputStreamReader(Channels.newInputStream(randomAccessFile.getChannel()), encoding));
....
I can then implement a readLine() method which reads character by character. Using String.getBytes(encoding) I can keep track of the offset in the file. Calling seek() on the underlying RandomAccessFile allows me to reposition the cursor at will. There are probably some bugs lurking in there, but the basic tests seem to work.
public String readLine() throws IOException {
eol = "";
lastLineByteCount = 0;
StringBuilder builder = new StringBuilder();
char[] characters = new char[1];
int status = reader.read(characters, 0, 1);
if (status == -1) {
return null;
}
char c = characters[0];
while (status != -1) {
if (c == '\n') {
eol += c;
break;
}
if (c == '\r') {
eol += c;
} else {
builder.append(c);
}
status = reader.read(characters, 0, 1);
c = characters[0];
}
String line = builder.toString();
lastLineByteCount = line.getBytes(encoding).length + eol.getBytes(encoding).length;
return line;
}

In java, when reading in a file one character at a time, how do I determine EOF?

I have to read in a file, use an algorithm to encode each letter, and then print the result to another file. I know that generally, to find the end of a file, you would use readLine and check whether it returns null. I am using a BufferedReader. Is there any way to check whether there is another character to read? Basically, how do I know that I just read the last character of the file?
I guess I could use readLine and see if there was another line, if I knew how to determine when I was at the end of my current line.
I found that the File class has a method called length() that supposedly returns the length of the file in bytes. Would that tell me how many characters are in the file? Could I do while(charCount<length)?
I don't exactly understand what you want to do. I guess you may want to read a file character by character. If so, you can do:
FileInputStream fileInput = new FileInputStream("file.txt");
int r;
while ((r = fileInput.read()) != -1) {
char c = (char) r;
// do something with the character c
}
fileInput.close();
FileInputStream.read() returns -1 when there are no more bytes to read. It returns an int and not a char, so a cast is needed.
Please note that this won't work if your file is in UTF-8 format and contains multi-byte characters. In that case you have to wrap the FileInputStream in an InputStreamReader and specify the appropriate charset. I'm omitting it here for the sake of simplicity.
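For completeness, here is roughly what the wrapped version looks like. This is a sketch assuming the data is UTF-8; a ByteArrayInputStream stands in for the FileInputStream so it runs on its own, and the names CharsetAwareRead, readChars and readBytesAsChars are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CharsetAwareRead {

    // Decodes through an InputStreamReader: each read() returns one char
    // (a UTF-16 code unit), or -1 at the end of the stream.
    static String readChars(byte[] utf8Bytes) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            int r;
            while ((r = reader.read()) != -1) {
                out.append((char) r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // The naive version: every byte is cast to a char, so a multi-byte
    // character turns into several wrong characters.
    static String readBytesAsChars(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        int r;
        while ((r = in.read()) != -1) {
            out.append((char) r);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(readChars(bytes).length());        // prints 5
        System.out.println(readBytesAsChars(bytes).length()); // prints 6
    }
}
```

This also answers the question about the file length: the length in bytes only equals the number of characters when every character is encoded as exactly one byte.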
From my understanding, a buffered stream's read() returns -1 once there are no bytes left. So you could write:
BufferedInputStream in = new BufferedInputStream(new FileInputStream("filename"));
int currentChar;
while ((currentChar = in.read()) != -1) {
//do something
}
in.close();

Efficient ByteArrayInputStream manipulation

I am working with a ByteArrayInputStream that contains an XML document consisting of one element with a large base 64 encoded string as the content of the element. I need to remove the surrounding tags so I can decode the text and output it as a pdf document.
What is the most efficient way to do this?
My knee-jerk reaction is to read the stream into a byte array, find the end of the start tag, find the beginning of the end tag and then copy the middle part into another byte array; but this seems rather inefficient and the text I am working with can be large at times (128KB). I would like a way to do this without the extra byte arrays.
Base 64 does not use the characters < or > so I'm assuming you are using a web-safe base64 variant meaning you do not need to worry about HTML entities or comments inside the content.
If you are really sure that the content has this form, then do the following:
Scan from the right looking for a '<'. This will be the beginning of the close tag.
Scan left from that position looking for a '>'. This will be the end of the start tag.
The base 64 content is between those two positions, exclusive.
You can presize your second array by using
((end - start + 3) / 4) * 3
as an upper bound on the decoded content length, and then b64decode into it. This works because each 4 base64 digits encodes 3 bytes.
If you want to get really fancy, since you know the first few bytes of the array contain ignorable tag data and the encoded data is smaller than the input, you could destructively decode the data over your current byte buffer.
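The scanning steps above might look like this in code. It is a sketch, not a hardened parser: it assumes the input really has the form <tag>content</tag> with the standard base64 alphabet, it uses the MIME decoder from java.util.Base64 so that line breaks inside the content are tolerated, and TagStripper and decodeElementContent are hypothetical names. The encoded text is handed to the decoder as a view on the original array, so the only new array is the decoded output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TagStripper {

    static byte[] decodeElementContent(byte[] xml) {
        // Scan from the right for '<': the beginning of the close tag.
        int end = xml.length - 1;
        while (end >= 0 && xml[end] != '<') {
            end--;
        }
        // Scan left from there for '>': the end of the start tag.
        int start = end - 1;
        while (start >= 0 && xml[start] != '>') {
            start--;
        }
        if (start < 0) {
            throw new IllegalArgumentException("expected <tag>content</tag>");
        }
        // The base64 text lies strictly between the two positions.
        ByteBuffer content = ByteBuffer.wrap(xml, start + 1, end - start - 1);
        ByteBuffer decoded = Base64.getMimeDecoder().decode(content);
        byte[] result = new byte[decoded.remaining()];
        decoded.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] xml = "<tag>aGVsbG8=</tag>".getBytes(StandardCharsets.US_ASCII);
        byte[] pdfBytes = decodeElementContent(xml);
        System.out.println(new String(pdfBytes, StandardCharsets.US_ASCII)); // prints "hello"
    }
}
```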
Do your search and conversion while you are reading the stream.
// find the start tag
byte[] startTag = new byte[]{'<', 't', 'a', 'g', '>'};
int fnd = 0;
int tmp = 0;
while((tmp = stream.read()) != -1) {
if(tmp == startTag[fnd])
fnd++;
else
fnd = (tmp == startTag[0]) ? 1 : 0; // restart, but let this byte begin a new match
if(fnd == startTag.length) break;
}
// get base64 bytes
while(true) {
int a = stream.read();
int b = stream.read();
int c = stream.read();
int d = stream.read();
byte o1,o2,o3; // output bytes
if(a == -1 || a == '<') break;
//
...
outputStream.write(o1);
outputStream.write(o2);
outputStream.write(o3);
}
Note: the above was written in my web browser, so syntax errors may exist.
