Java Reading from a File using FileChannel

I've been getting some strange output from this code when reading from a large file. The file was written using a while loop counting up to 99,999, one entry per line; however, upon reading the file back and printing the contents, it only outputs 99,988 lines. Also, is using a ByteBuffer the only option for reading back the file? I've seen some other code using a CharBuffer, but I'm not sure which one I should use and in what cases.
NOTE: filePath is a Path object pointing to a file on the disk.
private void byteChannelTrial() throws Exception {
    try (FileChannel channel = (FileChannel) Files.newByteChannel(filePath, READ)) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        String encoding = System.getProperty("file.encoding");
        while (channel.read(buffer) != -1) {
            buffer.rewind();
            System.out.print(Charset.forName(encoding).decode(buffer));
            buffer.clear();
        }
    }
}

Usually, flip() is called before buffer data is read. The rewind() method does the following:
public final Buffer rewind() {
    position = 0;
    mark = -1;
    return this;
}
It does not set the limit the way flip() does:
public final Buffer flip() {
    limit = position;
    position = 0;
    mark = -1;
    return this;
}
So, try using flip() instead of rewind() before reading. With rewind(), the limit stays at the buffer's capacity, so each decode() pass also decodes whatever stale bytes sit between the end of the freshly read data and the end of the buffer, which explains the mangled output.

For reading text, BufferedReader is the best choice:
try (BufferedReader rdr = Files.newBufferedReader(Paths.get("path"),
        Charset.defaultCharset())) {
    for (String line; (line = rdr.readLine()) != null;) {
        System.out.println(line);
    }
}
BTW
String encoding = System.getProperty("file.encoding");
Charset.forName(encoding);
is equivalent to
Charset.defaultCharset();

Well, it actually turns out that this combination works:
private void byteChannelTrial() throws Exception {
    try (FileChannel channel = (FileChannel) Files.newByteChannel(this.filePath, READ)) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (channel.read(buffer) != -1) {
            buffer.flip();
            System.out.print(Charset.defaultCharset().decode(buffer));
            buffer.clear();
        }
    }
}
As to why it works: as explained above, flip() sets the limit to the number of bytes just read before resetting the position, so decode() stops at the end of the valid data instead of running on to the buffer's capacity the way it did with rewind().

Related

Filter an InputStream line-by-line

I am retrieving large gzipped files from Amazon S3. I would like to be able to transform each line of these files on-the-fly and upload the output to another S3 bucket.
The upload API takes an InputStream as input.
S3Object s3object = s3.fetch(bucket, key);
InputStream is = new GZIPInputStream(s3object.getObjectContent());
// . . . ?
s3.putObject(new PutObjectRequest(bucket, key, is, metadata));
I believe that the most efficient way of doing this is to create my own custom input stream which transforms the original input stream into another input stream. I am not very familiar with this approach and curious to find out more.
The basic idea is as follows.
It's not terribly efficient but should get the job done.
public class MyInputStream extends InputStream {
    private final BufferedReader input;
    private final Charset encoding = StandardCharsets.UTF_8;
    private ByteArrayInputStream buffer;

    public MyInputStream(InputStream is) throws IOException {
        input = new BufferedReader(new InputStreamReader(is, this.encoding));
        nextLine();
    }

    @Override
    public int read() throws IOException {
        if (buffer == null) {
            return -1;
        }
        int ch = buffer.read();
        if (ch == -1) {
            if (!nextLine()) {
                return -1;
            }
            return read();
        }
        return ch;
    }

    private boolean nextLine() throws IOException {
        String line;
        while ((line = input.readLine()) != null) {
            line = filterLine(line);
            if (line != null) {
                line += '\n';
                buffer = new ByteArrayInputStream(line.getBytes(encoding));
                return true;
            }
        }
        return false;
    }

    @Override
    public void close() throws IOException {
        input.close();
    }

    private String filterLine(String line) {
        // Filter the line here ... return null to skip the line
        // For example:
        return line.replace("ABC", "XYZ");
    }
}
nextLine() pre-fills the line buffer with a (filtered) line. Then read() (called by the upload job) fetches bytes from the buffer one-by-one and calls nextLine() again to load the next line.
Use as:
s3.putObject(new PutObjectRequest(bucket, key, new MyInputStream(is), metadata));
A performance improvement could be to also implement the int read(byte[] b, int off, int len) method (if CPU use is high), and to wrap the stream in a BufferedInputStream in case the S3 client doesn't buffer internally (I don't know whether it does).
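For example, a bulk-read override might look like this (a sketch against the MyInputStream class above; untested):
@Override
public int read(byte[] b, int off, int len) throws IOException {
    if (buffer == null) {
        return -1;
    }
    int n = buffer.read(b, off, len); // drain bytes from the current line's buffer
    if (n == -1) {
        if (!nextLine()) {
            return -1;
        }
        return read(b, off, len); // recurse into the freshly filled buffer
    }
    return n;
}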
Or, in Java 8, simply: new BufferedReader(new InputStreamReader(is)).lines()
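Spelled out, that stream-based variant could look like the sketch below. Note that it collects the whole transformed payload in memory before uploading, unlike the streaming class above, so it only suits files that fit comfortably in the heap:
byte[] out = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))
        .lines()
        .map(line -> line.replace("ABC", "XYZ")) // same sample transformation as above
        .collect(Collectors.joining("\n", "", "\n"))
        .getBytes(StandardCharsets.UTF_8);
s3.putObject(new PutObjectRequest(bucket, key, new ByteArrayInputStream(out), metadata));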

InputStream returns unexpected -1/empty

I seem to be hitting a constant unexpected end of my file. My file contains first a couple of strings, then byte data.
The file contains a few separated strings, which my code reads correctly.
However when I begin to read the bytes, it returns nothing. I am pretty sure it has to do with me using the Readers. Does the BufferedReader read the entire stream? If so, how can I solve this?
I have checked the file, and it does contain plenty of data after the strings.
InputStreamReader is = new InputStreamReader(in);
BufferedReader br = new BufferedReader(is);
String line;
{
    line = br.readLine();
    String split[] = line.split(" ");
    if (!split[0].equals("#binvox")) {
        ErrorHandler.log("Not a binvox file");
        return false;
    }
    ErrorHandler.log("Binvox version: " + split[1]);
}
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead, cnt = 0;
byte[] data = new byte[16384];
while ((nRead = in.read(data, 0, data.length)) != -1) {
    buffer.write(data, 0, nRead);
    cnt += nRead;
}
buffer.flush();
// cnt is always 0
The binvox format is as followed:
#binvox 1
dim 64 40 32
translate -3 0 -2
scale 6.434
data
[byte data]
I'm basically trying to convert the following C code to Java:
http://www.cs.princeton.edu/~min/binvox/read_binvox.html
For reading all the Strings you should do this:
ArrayList<String> lines = new ArrayList<String>();
while ((line = br.readLine()) != null) {
    lines.add(line);
}
and then you can loop over the lines to split each one, or just do the work inside that loop.
As icza has already written, you can't create an InputStream and a BufferedReader and use both. The BufferedReader will read ahead from the InputStream as much as it wants, and then you can't access your data from the InputStream.
You have several ways to fix it:
Don't use any Reader. Read the bytes yourself from an InputStream and call new String(bytes) on it.
Store your data encoded (e.g. Base64). Encoded data can be read from a Reader. I would recommend this solution. It would look like this:
public byte[] readBytes(BufferedReader in) throws IOException {
    // Note that a basic Base64 representation never contains '\n'
    String base64 = in.readLine();
    return Base64.getDecoder().decode(base64);
}
You can't wrap an InputStream in a BufferedReader and use both.
As its name hints, BufferedReader might read ahead and buffer data from the underlying InputStream which then will not be available when reading from the underlying InputStream directly.
The suggested solution is not to mix text and binary data in one file. They should be stored in two separate files which can then be read separately. If the remaining data is not binary, you should not read it via the InputStream but via your wrapper BufferedReader, just as you read the first lines.
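A tiny sketch of the read-ahead effect described above (the file name and buffer size are made up for illustration):
InputStream in = new FileInputStream("mixed.bin");
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String header = br.readLine(); // may pull e.g. 8 KB from `in` into the reader's buffer
int b = in.read();             // reads *past* the buffered region, not right after the
                               // first line, or returns -1 if buffering consumed the file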
I recommend creating a BinvoxDetectorStream that pre-reads some bytes:
public class BinvoxDetectorStream extends InputStream {
    private InputStream orig;
    private byte[] buffer = new byte[4096];
    private int buflen;
    private int bufpos = 0;

    public BinvoxDetectorStream(InputStream in) throws IOException {
        this.orig = new BufferedInputStream(in);
        this.buflen = orig.read(this.buffer, 0, this.buffer.length);
    }

    public BinvoxInfo getBinvoxVersion() throws IOException {
        // creating a reader for the buffered bytes, to read a line, and compare the header
        ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
        BufferedReader rdr = new BufferedReader(new InputStreamReader(bais));
        String line = rdr.readLine();
        String split[] = line.split(" ");
        if (split[0].equals("#binvox")) {
            BinvoxInfo info = new BinvoxInfo();
            info.version = split[1];
            split = rdr.readLine().split(" ");
            // ... parse all properties ...
            // seek for "data\r\n" in the buffered data
            while (!(bufpos >= 6 &&
                     buffer[bufpos - 6] == 'd' &&
                     buffer[bufpos - 5] == 'a' &&
                     buffer[bufpos - 4] == 't' &&
                     buffer[bufpos - 3] == 'a' &&
                     buffer[bufpos - 2] == '\r' &&
                     buffer[bufpos - 1] == '\n')) {
                bufpos++;
            }
            return info;
        }
        return null;
    }

    @Override
    public int read() throws IOException {
        if (bufpos < buflen) {
            return buffer[bufpos++];
        }
        return orig.read();
    }
}
Then, you can detect the Binvox version without touching the original stream:
BinvoxDetectorStream bds = new BinvoxDetectorStream(in);
BinvoxInfo info = bds.getBinvoxVersion();
if (info == null) {
    return false;
}
...
[moving bytes in the usual way, but using bds!!! ]
This way we preserve the original bytes in bds, so we'll be able to copy it later.
I saw someone else's code that solved exactly this.
He/she used DataInputStream, which can do a readLine (although deprecated) and readByte.
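A sketch of that approach (the header parsing is abbreviated; DataInputStream.readLine() is deprecated because it mishandles non-Latin-1 text, which is harmless for an ASCII header like binvox's):
DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
String line;
while ((line = dis.readLine()) != null && !line.equals("data")) {
    // parse "#binvox 1", "dim ...", "translate ...", "scale ..." here
}
// readLine() consumed bytes only up to the newline, so the stream now sits
// exactly at the start of the binary voxel data:
ByteArrayOutputStream voxels = new ByteArrayOutputStream();
byte[] chunk = new byte[16384];
int n;
while ((n = dis.read(chunk)) != -1) {
    voxels.write(chunk, 0, n);
}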

How to safely read a text file that might be binary?

We have some Java code that processes a user-provided file by looping through the file using BufferedReader.readline() to read in each line.
The problem is that when the user uploads a file that has extremely long lines, like an arbitrary binary JPG or such, this can cause out-of-memory issues. Even the first readline() may not return. We want to reject the files with long lines before it OOMs.
Is there a standard Java idiom to handle this, or do we just change to read() and write our own safe version of readLine()?
You will need to read the file character by character (or chunk by chunk) yourself (via some form of read()), and then form the lines into Strings when you encounter a newline character. This way you can throw an Exception (avoiding the OOM error) if some maximum number of characters is hit before a newline is encountered.
If you use a Reader instance it should not be too difficult to implement this code, just read from the Reader into a buffer (which you allocate to your maximum possible line length), and then convert the buffer to String when you encounter a newline (or throw an exception if you don't).
There doesn't appear to be any way to set a line length limit for BufferedReader.readLine(), so it will accumulate the entire line before feeding it to your code, however long that line may be.
Therefore, you'll have to do the line-splitting part yourself, and give up once a line is too long.
You might use the following as a starting point:
class LineTooLongException extends Exception {}

class ShortLineReader implements AutoCloseable {
    final Reader reader;
    final char[] buf = new char[8192];
    int nextIndex = 0;
    int maxIndex = 0;
    boolean eof;

    public ShortLineReader(Reader reader) {
        this.reader = reader;
    }

    public String readLine() throws IOException, LineTooLongException {
        if (eof) {
            return null;
        }
        for (;;) {
            for (int i = nextIndex; i < maxIndex; i++) {
                if (buf[i] == '\n') {
                    String result = new String(buf, nextIndex, i - nextIndex);
                    nextIndex = i + 1;
                    return result;
                }
            }
            if (maxIndex - nextIndex > 6000) {
                throw new LineTooLongException();
            }
            System.arraycopy(buf, nextIndex, buf, 0, maxIndex - nextIndex);
            maxIndex -= nextIndex;
            nextIndex = 0;
            int c = reader.read(buf, maxIndex, buf.length - maxIndex);
            if (c == -1) {
                eof = true;
                return new String(buf, nextIndex, maxIndex - nextIndex);
            } else {
                maxIndex += c;
            }
        }
    }

    @Override
    public void close() throws Exception {
        reader.close();
    }
}

public class Test {
    public static void main(String[] args) throws Exception {
        File file = new File("D:\\t\\output.log");
//        try (OutputStream fos = new BufferedOutputStream(new FileOutputStream(file))) {
//            for (int i = 0; i < 10000000; i++) {
//                fos.write(65);
//            }
//        }
        try (ShortLineReader r = new ShortLineReader(new FileReader(file))) {
            String s;
            while ((s = r.readLine()) != null) {
                System.out.println(s);
            }
        }
    }
}
Note: This assumes unix-style line termination.
Use BufferedInputStream to read binary data rather than BufferedReader...
For example, if it is an image file, using ImageIO and an InputStream you can do it like this:
File file = new File("image.gif");
BufferedImage image = ImageIO.read(file);
// or, reading from a stream:
InputStream is = new BufferedInputStream(new FileInputStream("image.gif"));
image = ImageIO.read(is);
hope it helps...
There doesn't appear to be a definite way, but there are a few things you can do:
Check file headers. jMimeMagic seems to be a pretty good library for this purpose.
Check the type of characters the file contains. Essentially, do statistical analysis on the first 'x' bytes of the file and use that to estimate the rest of the content (see the sketch below).
Check for newlines ('\n' or '\r') in the file; binary files usually won't contain newlines.
Hope that helps.
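A rough sketch of the last two checks (the 8 KB sample size and the 5% threshold are arbitrary assumptions; tune them for your data):
static boolean looksBinary(InputStream in) throws IOException {
    byte[] head = new byte[8192];
    int n = in.read(head);
    int control = 0;
    for (int i = 0; i < n; i++) {
        int b = head[i] & 0xFF;
        if (b == 0) return true; // NUL bytes essentially never occur in text
        if (b < 0x09 || (b > 0x0D && b < 0x20)) control++; // non-whitespace control characters
    }
    return n > 0 && control * 100 / n > 5; // more than 5% control bytes: call it binary
}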

BufferedReader for large ByteBuffer?

Is there a way to read a ByteBuffer with a BufferedReader without having to turn it into a String first? I want to read through a fairly large ByteBuffer as lines of text and for performance reasons I want to avoid writing it to the disk. Calling toString on the ByteBuffer doesn't work because the resulting String is too large (it throws java.lang.OutOfMemoryError: Java heap space). I would have thought there would be something in the API to wrap a ByteBuffer in a suitable reader, but I can't seem to find anything suitable.
Here's an abbreviated code sample that illustrates what I am doing:
// input stream is from Process getInputStream()
public String read(InputStream istream) throws IOException
{
    ReadableByteChannel source = Channels.newChannel(istream);
    ByteArrayOutputStream ostream = new ByteArrayOutputStream(bufferSize);
    WritableByteChannel destination = Channels.newChannel(ostream);
    ByteBuffer buffer = ByteBuffer.allocateDirect(writeBufferSize);
    while (source.read(buffer) != -1)
    {
        buffer.flip();
        while (buffer.hasRemaining())
        {
            destination.write(buffer);
        }
        buffer.clear();
    }
    // this data can be up to 150 MB.. won't fit in a String.
    String result = ostream.toString();
    source.close();
    destination.close();
    return result;
}
// after the process is run, we call this method with the String
public void readLines(String text) throws IOException
{
    BufferedReader reader = new BufferedReader(new StringReader(text));
    String line;
    while ((line = reader.readLine()) != null)
    {
        // do stuff with line
    }
}
It's not clear why you're using a byte buffer to start with. If you've got an InputStream and you want to read lines for it, why don't you just use an InputStreamReader wrapped in a BufferedReader? What's the benefit in getting NIO involved?
Calling toString() on a ByteArrayOutputStream sounds like a bad idea to me even if you had the space for it: better to get it as a byte array and wrap it in a ByteArrayInputStream and then an InputStreamReader, if you really have to have a ByteArrayOutputStream. If you really want to call toString(), at least use the overload which takes the name of the character encoding to use - otherwise it'll use the system default, which probably isn't what you want.
EDIT: Okay, so you really want to use NIO. You're still writing to a ByteArrayOutputStream eventually, so you'll end up with a BAOS with the data in it. If you want to avoid making a copy of that data, you'll need to derive from ByteArrayOutputStream, for instance like this:
public class ReadableByteArrayOutputStream extends ByteArrayOutputStream
{
    /**
     * Converts the data in the current stream into a ByteArrayInputStream.
     * The resulting stream wraps the existing byte array directly;
     * further writes to this output stream will result in unpredictable
     * behavior.
     */
    public InputStream toInputStream()
    {
        // buf and count are the protected fields of ByteArrayOutputStream
        return new ByteArrayInputStream(buf, 0, count);
    }
}
Then you can create the input stream, wrap it in an InputStreamReader, wrap that in a BufferedReader, and you're away.
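Hypothetical usage, wiring it into the code from the question:
ReadableByteArrayOutputStream ostream = new ReadableByteArrayOutputStream();
// ... fill ostream via the channel-copy loop from the question ...
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(ostream.toInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // do stuff with line
    }
}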
You can use NIO, but there's no real need here. As Jon Skeet suggested:
public byte[] read(InputStream istream) throws IOException
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024]; // Experiment with this value
    int bytesRead;
    while ((bytesRead = istream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}
// after the process is run, we call this method with the byte array
public void readLines(byte[] data) throws IOException
{
    BufferedReader reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data)));
    String line;
    while ((line = reader.readLine()) != null)
    {
        // do stuff with line
    }
}
This is a sample:
public class ByteBufferBackedInputStream extends InputStream {
    ByteBuffer buf;

    public ByteBufferBackedInputStream(ByteBuffer buf) {
        this.buf = buf;
    }

    @Override
    public synchronized int read() throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }
        return buf.get() & 0xFF;
    }

    @Override
    public int available() throws IOException {
        return buf.remaining();
    }

    @Override
    public synchronized int read(byte[] bytes, int off, int len) throws IOException {
        if (!buf.hasRemaining()) {
            return -1;
        }
        len = Math.min(len, buf.remaining());
        buf.get(bytes, off, len);
        return len;
    }
}
And you can use it like this:
String text = "this is text"; // It can be Unicode text
ByteBuffer buffer = ByteBuffer.wrap(text.getBytes("UTF-8"));
InputStream is = new ByteBufferBackedInputStream(buffer);
InputStreamReader r = new InputStreamReader(is, "UTF-8");
BufferedReader br = new BufferedReader(r);

Efficient way of handling file pointers in Java? (Using BufferedReader with file pointer)

I have a log file which gets updated every second. I need to read the log file periodically, and once I do a read, I need to store the file pointer position at the end of the last line I read and in the next periodic read I should start from that point.
Currently, I am using a RandomAccessFile in Java, using the getFilePointer() method to get the offset value and the seek() method to return to that offset.
However, I have read in most articles, and even in the Java doc recommendations, that BufferedReader should be used for efficient reading of a file. How can I achieve this (getting the file pointer and moving to the last line) using a BufferedReader, or is there any other efficient way to achieve this task?
A couple of ways that should work:
open the file using a FileInputStream, skip() the relevant number of bytes, then wrap the BufferedReader around the stream (via an InputStreamReader);
open the file (with either FileInputStream or RandomAccessFile), call getChannel() on the stream/RandomAccessFile to get an underlying FileChannel, call position() on the channel, then call Channels.newInputStream() to get an input stream from the channel, which you can pass to InputStreamReader -> BufferedReader.
I haven't honestly profiled these to see which is better performance-wise, but you should see which works better in your situation.
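A sketch of the first approach (logFile and offset are assumed to come from your own bookkeeping; note that skip() may skip fewer bytes than requested, so it has to be called in a loop):
FileInputStream fis = new FileInputStream(logFile);
long remaining = offset;
while (remaining > 0) {
    long skipped = fis.skip(remaining); // may legitimately skip less than asked
    if (skipped <= 0) break;            // EOF or a stream that refuses to skip
    remaining -= skipped;
}
BufferedReader reader = new BufferedReader(new InputStreamReader(fis));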
The problem with RandomAccessFile is essentially that its readLine() method is very inefficient. If it's convenient for you to read from the RAF and do your own buffering to split the lines, then there's nothing wrong with RAF per se; it's just that its readLine() is poorly implemented.
Neil Coffey's solution is good if you are reading fixed-length files. However, for files that have variable length (data keeps coming in) there are some problems with using a BufferedReader directly on a FileInputStream or FileChannel InputStream via an InputStreamReader. For example, consider these cases:
1) You want to read data from some offset to the current file length. So you use a BR on the FileInputStream/FileChannel (via an InputStreamReader) and use its readLine() method. But while you are busy reading, some data gets appended, which causes the BR's readLine() to read more data than you expected (up to the previous file length).
2) You finish the readLine() work, but when you then read the current file length/channel position, some data has suddenly been appended, which increases the file length/channel position beyond the amount of data you have actually read.
In both of the above cases it is difficult to know how much data you have actually read (you cannot just use the length of the data returned by readLine(), because it skips some characters, like the carriage return).
So it is better to read the data as buffered bytes and use a BufferedReader wrapper around them. I wrote some methods like this:
/**
 * Read data from offset to length bytes in RandomAccessFile using BufferedReader
 * @param offset
 * @param length
 * @param accessFile
 * @throws IOException
 */
public static void readBufferedLines(long offset, long length, RandomAccessFile accessFile) throws IOException {
    if (accessFile == null) return;
    int bufferSize = BYTE_BUFFER_SIZE; // constant, say 4096
    if (offset < length && offset >= 0) {
        int index = 1;
        long curPosition = offset;
        /*
         * iterate (length-from)/BYTE_BUFFER_SIZE times to read into buffer no matter where new line occurs
         */
        while ((curPosition + (index * BYTE_BUFFER_SIZE)) < length) {
            accessFile.seek(offset); // seek to last parsed data rather than last data read into buffer
            byte[] buf = new byte[bufferSize];
            int read = accessFile.read(buf, 0, bufferSize);
            index++; // Increment whether or not read successful
            if (read > 0) {
                int lastnewLine = getLastLine(read, buf);
                if (lastnewLine <= 0) { // no new line found in the buffer; grow the buffer and continue
                    bufferSize = bufferSize + read;
                    continue;
                } else {
                    bufferSize = BYTE_BUFFER_SIZE;
                }
                readLine(buf, 0, lastnewLine); // read the lines from buffer and parse the line
                offset = offset + lastnewLine; // update the last data read
            }
        }
        // Read last chunk. The last chunk size in the worst case is the total file when no newline occurs
        if (offset < length) {
            accessFile.seek(offset);
            byte[] buf = new byte[(int) (length - offset)];
            int read = accessFile.read(buf, 0, buf.length);
            if (read > 0) {
                readLine(buf, 0, read);
                offset = offset + read; // update the last data read
            }
        }
    }
}

private static void readLine(byte[] buf, int from, int lastnewLine) throws IOException {
    String readLine = "";
    BufferedReader reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(buf, from, lastnewLine)));
    while ((readLine = reader.readLine()) != null) {
        // do something with readLine
        System.out.println(readLine);
    }
    reader.close();
}

private static int getLastLine(int read, byte[] buf) {
    if (buf == null) return -1;
    if (read > buf.length) read = buf.length;
    while (read > 0 && !(buf[read - 1] == '\n' || buf[read - 1] == '\r')) read--;
    return read;
}

public static void main(String[] args) throws IOException {
    RandomAccessFile accessFile = new RandomAccessFile("C:/sri/test.log", "r");
    readBufferedLines(0, accessFile.length(), accessFile);
    accessFile.close();
}
I had a similar problem, and I created this class to take lines from a BufferedReader and count how many bytes have been read so far using getBytes(). We assume the line separator is a single byte by default, and we re-instantiate the BufferedReader for seek() to work. (Note that the getBytes() count is only exact when the platform default charset matches the file's encoding.)
public class FileCounterIterator {
    public Long position() {
        return _position;
    }

    public Long fileSize() {
        return _fileSize;
    }

    public FileCounterIterator newlineLength(Long newNewlineLength) {
        this._newlineLength = newNewlineLength;
        return this;
    }

    private Long _fileSize = 0L;
    private Long _position = 0L;
    private Long _newlineLength = 1L;
    private RandomAccessFile fp;
    private BufferedReader itr;

    public FileCounterIterator(String filename) throws IOException {
        fp = new RandomAccessFile(filename, "r");
        _fileSize = fp.length();
        this.seek(0L);
    }

    public FileCounterIterator seek(Long newPosition) throws IOException {
        this.fp.seek(newPosition);
        this._position = newPosition;
        itr = new BufferedReader(new InputStreamReader(new FileInputStream(fp.getFD())));
        return this;
    }

    public Boolean hasNext() throws IOException {
        return this._position < this._fileSize;
    }

    public String readLine() throws IOException {
        String nextLine = itr.readLine();
        this._position += nextLine.getBytes().length + _newlineLength;
        return nextLine;
    }
}
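Hypothetical usage for the periodic-read scenario (the file name and how you persist savedPosition between runs are up to you):
FileCounterIterator it = new FileCounterIterator("/var/log/app.log").seek(savedPosition);
while (it.hasNext()) {
    String line = it.readLine();
    // process the newly appended line ...
}
long positionForNextRun = it.position(); // persist this until the next poll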
