Read newly appended file content into an InputStream in Java

I have a writer program that writes a huge serialized Java object (on the scale of 1 GB) into a binary file on local disk at a specific speed. The writer program (implemented in C) is actually a network receiver that receives the bytes of the serialized object from a remote server. The implementation of the writer is fixed.
Now, I want to implement a Java reader program that reads the file and deserializes it to a Java object. Since the file can be very large, it is beneficial to reduce the latency of deserialization. In particular, I want the Java reader to start reading/deserializing the object as soon as its first byte has been written to the disk file, so that deserialization can begin even before the entire serialized object has been written. The reader knows the size of the file ahead of time (before the first byte is written to it).
I think what I need is something like a blocking file InputStream that blocks when it reaches end-of-file but has not yet read the expected number of bytes (the final size of the file). Then, whenever new bytes are written to the file, the reader's InputStream could keep reading the new content. However, FileInputStream in Java does not support this.
I probably also need a file listener that monitors changes made to the file to achieve this.
I am wondering if there is any existing solution/library/package that can achieve this. The question may be similar to some questions about monitoring log files.
The flow of the bytes is like this:
FileInputStream -> SequenceInputStream -> BufferedInputStream -> JavaSerializer

You need two threads: Thread1 to download from the server and write to a File, and Thread2 to read the File as it becomes available.
Both threads should share a single RandomAccessFile, so access to the OS file can be synchronized correctly. You could use a wrapper class like this:
import java.io.*;

public class ReadWriteFile {
    ReadWriteFile(File f, long size) throws IOException {
        _raf = new RandomAccessFile(f, "rw");
        _size = size;
        _writer = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                write(new byte[] { (byte) b });
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                if (len < 0)
                    throw new IllegalArgumentException();
                synchronized (_raf) {
                    _raf.seek(_nw);
                    _raf.write(b, off, len);
                    _nw += len;
                    _raf.notify(); // wake up a reader waiting for new bytes
                }
            }
        };
    }

    void close() throws IOException {
        _raf.close();
    }

    InputStream reader() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                if (_pos >= _size)
                    return -1;
                byte[] b = new byte[1];
                if (read(b, 0, 1) != 1)
                    throw new IOException();
                return b[0] & 255;
            }

            @Override
            public int read(byte[] buff, int off, int len) throws IOException {
                synchronized (_raf) {
                    while (true) {
                        if (_pos >= _size)
                            return -1;
                        if (_pos >= _nw) {
                            // Block until the writer appends more bytes.
                            try {
                                _raf.wait();
                                continue;
                            } catch (InterruptedException ex) {
                                throw new IOException(ex);
                            }
                        }
                        _raf.seek(_pos);
                        len = (int) Math.min(len, _nw - _pos);
                        int nr = _raf.read(buff, off, len);
                        _pos += Math.max(0, nr);
                        return nr;
                    }
                }
            }

            private long _pos;
        };
    }

    OutputStream writer() {
        return _writer;
    }

    private final RandomAccessFile _raf;
    private final long _size;
    private final OutputStream _writer;
    private long _nw;
}
The following code shows how to use ReadWriteFile from two threads:
public static void main(String[] args) throws Exception {
    File f = new File("test.bin");
    final long size = 1024;
    final ReadWriteFile rwf = new ReadWriteFile(f, size);

    Thread t1 = new Thread("Writer") {
        public void run() {
            try {
                OutputStream w = new BufferedOutputStream(rwf.writer(), 16);
                for (int i = 0; i < size; i++) {
                    w.write(i);
                    sleep(1);
                }
                System.out.println("Write done");
                w.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    };

    Thread t2 = new Thread("Reader") {
        public void run() {
            try {
                InputStream r = new BufferedInputStream(rwf.reader(), 13);
                for (int i = 0; i < size; i++) {
                    int b = r.read();
                    assert (b == (i & 255));
                }
                int eof = r.read();
                assert (eof == -1);
                r.close();
                System.out.println("Read done");
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    };

    t1.start();
    t2.start();
    t1.join();
    t2.join();
    rwf.close();
}


Reading large log files in real time in Java

What can I use to read a log file in real time in Java 8?
From reading blogs, I understand that BufferedReader is a good option for reading a file.
I tried the following:
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String line;
while (true) {
    line = reader.readLine(); // blocks until next line available
    // do whatever you want with line
}
However, it keeps returning null regardless of whether the file has been updated. Any idea what could be going wrong?
Any other options?
Details are as below:
I am trying to create a utility in Java 8 or above that reads the log file of an application in real time (as live transactions occur and get printed to the logs).
I can access the log file, as I am on the same server, so that is not an issue.
Some of the specifics are below:
-> I don't want to poll the log file for changes; I want to keep the bridge open and read the log file in a "while true" loop. Ideally, I want my reader to block when no new lines are being printed.
-> I don't want to keep the entire content of the file in memory at any time, as I want it to be memory efficient.
-> My code will run as a separate application, reading the log file of another application.
-> The only job of my code is to read the log, match it against a pattern, and, if matched, send a message with the log content.
Kindly let me know if any detail is ambiguous.
Any help is appreciated, thanks.
For this to work, your inputStream must block until new data becomes available, which a standard FileInputStream does not do when it reaches end-of-file.
I suppose you initialize inputStream as just new FileInputStream("my-logfile.log");. This stream will only read up to the current end of the log file and then signal the end-of-file condition to the BufferedReader, which in turn signals it by returning null from readLine().
Have a look at the utility org.apache.commons.io.input.Tailer. It lets you write programs like the Unix utility tail -f.
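For illustration, a minimal Tailer sketch (this assumes commons-io is on the classpath; the file name and poll interval are placeholders):

import java.io.File;
import org.apache.commons.io.input.Tailer;
import org.apache.commons.io.input.TailerListenerAdapter;

public class LogTail {
    public static void main(String[] args) {
        // Callback invoked for each new line appended to the file.
        TailerListenerAdapter listener = new TailerListenerAdapter() {
            @Override
            public void handle(String line) {
                System.out.println("new line: " + line);
            }
        };
        // Poll every 500 ms, starting from the beginning of the file (end = false).
        // Tailer.create starts its own daemon thread; call tailer.stop() to end it.
        Tailer tailer = Tailer.create(new File("my-logfile.log"), listener, 500, false);
    }
}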
To make your original code work, you would have to use an "infinite" input stream, which can be realized using a RandomAccessFile as in the following example:
package test;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

public class TestRead {

    public static void main(String[] args) throws IOException, InterruptedException {
        File logFile = new File("my-log.log");

        // Make sure to start from a defined condition.
        logFile.delete();
        try (OutputStream out = Files.newOutputStream(logFile.toPath(), StandardOpenOption.CREATE)) {
            // Just create an empty file to append to later on.
        }

        Thread analyzer = Thread.currentThread();

        // Simulate log file writing.
        new Thread() {
            @Override
            public void run() {
                try {
                    for (int n = 0; n < 16; n++) {
                        try (OutputStream out = Files.newOutputStream(logFile.toPath(), StandardOpenOption.APPEND)) {
                            PrintWriter printer = new PrintWriter(out);
                            String line = "Line " + n;
                            printer.println(line);
                            printer.flush();
                            System.out.println("wrote: " + line);
                        }
                        Thread.sleep(1000);
                    }
                } catch (Exception ex) {
                    ex.printStackTrace();
                } finally {
                    // Interrupt the reader so it stops polling for new data.
                    analyzer.interrupt();
                }
            }
        }.start();

        // The original code reading the log file.
        try (InputStream inputStream = new InfiniteInputStream(logFile)) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream), 8);
            String line;
            while (true) {
                line = reader.readLine();
                if (line == null) {
                    System.out.println("End-of-file.");
                    break;
                }
                System.out.println("read: " + line);
            }
        }
    }

    public static class InfiniteInputStream extends InputStream {

        private final RandomAccessFile _in;

        public InfiniteInputStream(File file) throws IOException {
            _in = new RandomAccessFile(file, "r");
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (b == null) {
                throw new NullPointerException();
            } else if (off < 0 || len < 0 || len > b.length - off) {
                throw new IndexOutOfBoundsException();
            } else if (len == 0) {
                return 0;
            }

            // Block for the first byte, then take whatever else is available.
            int c = read();
            if (c == -1) {
                return -1;
            }
            b[off] = (byte) c;

            int i = 1;
            try {
                for (; i < len; i++) {
                    c = readDirect();
                    if (c == -1) {
                        break;
                    }
                    b[off + i] = (byte) c;
                }
            } catch (IOException ee) {
                // Return the bytes read so far.
            }
            return i;
        }

        @Override
        public int read() throws IOException {
            int result;
            while ((result = readDirect()) < 0) {
                // Poll until more data becomes available.
                try {
                    Thread.sleep(500);
                } catch (InterruptedException ex) {
                    return -1;
                }
            }
            return result;
        }

        private int readDirect() throws IOException {
            return _in.read();
        }
    }
}

Read and store server logs in real time

I am using the following code to read a log file and store lines matching a pattern in a database.
public class MIScript {

    // DB
    public static void db(String email, String ip, String pdate, String hostname, String im) {
        // DATABASE INSERT
    }

    public void pop(File f, String IM) throws FileNotFoundException, IOException, InterruptedException {
        int pos = 0;
        RandomAccessFile file = new RandomAccessFile(f, "r");
        pos = (int) file.length() - (int) Math.min(file.length() - 1, file.length());
        file.seek(pos);
        for (; true; Thread.sleep(1000)) {
            int l = (int) (file.length() - pos);
            if (l <= 0) {
                continue;
            }
            byte[] buf = new byte[l];
            int read = file.read(buf, 0, l);
            pos += read; // advance past the bytes just consumed
            String out = new String(buf, 0, read);
            // System.out.println(out);
            InputStream is = new ByteArrayInputStream(out.getBytes());
            BufferedReader in = new BufferedReader(new InputStreamReader(is));
            String line = null;
            while ((line = in.readLine()) != null) {
                if (line.contains("LOG")) {
                    // SOME CODE
                    // INSERT INTO DATABASE
                    MIScript.db(// parameters //);
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            File pop = new File("d://ABC.log");
            MIScript tail1 = new MIScript();
            tail1.pop(pop, "TEST");
        } catch (ArrayIndexOutOfBoundsException ar) {
            System.out.println("Errrrr------" + ar);
            System.exit(1);
        } catch (Exception io) {
            io.printStackTrace();
            System.out.println("Errrrr2------" + io);
            System.exit(1);
        }
    }
}
It works great on a single file, but I need to read 4 files simultaneously. Please show me a way to do this.
I tried to do this with 2 files, but it's not working.
You need to read each file in a separate thread, and ensure that the code to write to database is thread safe.
Edit: I put this in a comment, but actually it's part of the answer: from Java 7 you can get the filesystem to call you back when a file changes: http://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchService.html
That way you don't need to poll the file size like you're doing... but you do still need 1 thread per file.
Tutorial for WatchService is here: http://docs.oracle.com/javase/tutorial/essential/io/notification.html
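As a rough sketch of the WatchService approach (the directory and file names here are placeholders; you would still hand each changed file to its own reader thread):

import java.nio.file.*;

public class LogWatcher {
    public static void main(String[] args) throws Exception {
        // WatchService watches directories, not individual files.
        Path dir = Paths.get("d:/");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
        while (true) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path changed = (Path) event.context(); // relative to dir
                if (changed.toString().endsWith(".log")) {
                    System.out.println("modified: " + changed);
                    // notify the reader thread responsible for this file
                }
            }
            if (!key.reset()) {
                break; // directory no longer accessible
            }
        }
    }
}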

compression on java nio direct buffers

The gzip input/output streams don't operate on Java direct buffers.
Is there any compression algorithm implementation out there that operates directly on direct buffers?
This way there would be no overhead of copying a direct buffer to a java byte array for compression.
I don't mean to detract from your question, but is this really a good optimization point in your program? Have you verified with a profiler that you indeed have a problem? Your question as stated implies you have not done any research, but are merely guessing that you will have a performance or memory problem by allocating a byte[]. Since all the answers in this thread are likely to be hacks of some sort, you should really verify that you actually have a problem before fixing it.
Back to the question: if you want to compress the data "in place" in a ByteBuffer, the answer is no, there is no capability to do that built into Java.
If you allocated your buffer like the following:
byte[] bytes = getMyData();
ByteBuffer buf = ByteBuffer.wrap(bytes);
You can filter your byte[] through a ByteBufferInputStream as the previous answer suggested.
Wow, old question, but I stumbled upon this today.
Probably some libs like zip4j can handle this, but you can get the job done with no external dependencies since Java 11:
If you are interested only in compressing data, you can just do:
void compress(ByteBuffer src, ByteBuffer dst) {
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        def.deflate(dst, Deflater.SYNC_FLUSH);
        if (src.hasRemaining()) {
            throw new RuntimeException("dst too small");
        }
    } finally {
        def.end();
    }
}
Both src and dst will change positions, so you might have to flip them after compress returns.
In order to recover compressed data:
void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        inf.inflate(dst);
        if (src.hasRemaining()) {
            throw new RuntimeException("dst too small");
        }
    } finally {
        inf.end();
    }
}
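For instance, a roundtrip using the two methods above might look like this (the buffer sizes are arbitrary, and the caller must handle DataFormatException):

ByteBuffer src = ByteBuffer.wrap("stackexchange".getBytes());
ByteBuffer compressed = ByteBuffer.allocateDirect(128);
ByteBuffer restored = ByteBuffer.allocateDirect(128);

compress(src, compressed);
compressed.flip(); // switch the buffer from writing to reading
decompress(compressed, restored);
restored.flip();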
Note that both methods expect (de)compression to happen in a single pass; however, we could use slightly modified versions in order to stream it:
void compress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) {
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        int cmp;
        do {
            cmp = def.deflate(dst, Deflater.SYNC_FLUSH);
            if (cmp > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }
        } while (cmp > 0);
    } finally {
        def.end();
    }
}

void decompress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) throws DataFormatException {
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        int dec;
        do {
            dec = inf.inflate(dst);
            if (dec > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }
        } while (dec > 0);
    } finally {
        inf.end();
    }
}
Example:
void compressLargeFile() throws IOException {
    try (var in = FileChannel.open(Paths.get("large"));
         var out = FileChannel.open(Paths.get("large.zip"),
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
        var temp = ByteBuffer.allocateDirect(1024 * 1024);
        var start = 0L;
        var rem = in.size();
        while (rem > 0) {
            var mapped = Math.min(16 * 1024 * 1024, rem);
            var src = in.map(MapMode.READ_ONLY, start, mapped);
            compress(src, temp, (bb) -> {
                try {
                    out.write(bb);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            start += mapped; // advance the mapping window
            rem -= mapped;
        }
    }
}
If you want fully zip compliant data:
void zip(ByteBuffer src, ByteBuffer dst) {
    var u = src.remaining();
    var crc = new CRC32();
    crc.update(src.duplicate());
    writeHeader(dst);
    compress(src, dst);
    writeTrailer(crc, u, dst);
}
Where:
void writeHeader(ByteBuffer dst) {
    var header = new byte[] { (byte) 0x8b1f, (byte) (0x8b1f >> 8), Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };
    dst.put(header);
}
And:
void writeTrailer(CRC32 crc, int uncompressed, ByteBuffer dst) {
    // The gzip trailer is little-endian: CRC32, then the uncompressed size.
    if (dst.order() == ByteOrder.LITTLE_ENDIAN) {
        dst.putInt((int) crc.getValue());
        dst.putInt(uncompressed);
    } else {
        dst.putInt(Integer.reverseBytes((int) crc.getValue()));
        dst.putInt(Integer.reverseBytes(uncompressed));
    }
}

So, gzip imposes 10 + 8 bytes of overhead.
In order to unzip a direct buffer into another, you can wrap the src buffer into an InputStream:
class ByteBufferInputStream extends InputStream {

    final ByteBuffer bb;

    public ByteBufferInputStream(ByteBuffer bb) {
        this.bb = bb;
    }

    @Override
    public int available() throws IOException {
        return bb.remaining();
    }

    @Override
    public int read() throws IOException {
        return bb.hasRemaining() ? bb.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        var rem = bb.remaining();
        if (rem == 0) {
            return -1;
        }
        len = Math.min(rem, len);
        bb.get(b, off, len);
        return len;
    }

    @Override
    public long skip(long n) throws IOException {
        var rem = bb.remaining();
        if (n > rem) {
            bb.position(bb.limit());
            n = rem;
        } else {
            bb.position((int) (bb.position() + n));
        }
        return n;
    }
}
and use:
void unzip(ByteBuffer src, ByteBuffer dst) throws IOException {
    try (var is = new ByteBufferInputStream(src); var gis = new GZIPInputStream(is)) {
        var tmp = new byte[1024];
        var r = gis.read(tmp);
        while (r > 0) {
            dst.put(tmp, 0, r);
            r = gis.read(tmp);
        }
    }
}
Of course, this is not cool since we are copying data to a temporary array, but nevertheless, it is sort of a roundtrip check that proves that nio-based zip encoding writes valid data that can be read from standard io-based consumers.
So, if we just ignore CRC consistency checks, we can simply drop the header/trailer:
void unzipNoCheck(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    src.position(src.position() + 10).limit(src.limit() - 8);
    decompress(src, dst);
}
If you are using ByteBuffers you can use some simple Input/OutputStream wrappers such as these:
public class ByteBufferInputStream extends InputStream {

    private ByteBuffer buffer = null;

    public ByteBufferInputStream(ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public int read() throws IOException {
        // Signal end-of-stream instead of throwing BufferUnderflowException.
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }
}

public class ByteBufferOutputStream extends OutputStream {

    private ByteBuffer buffer = null;

    public ByteBufferOutputStream(ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.put((byte) (b & 0xFF));
    }
}
Test:
ByteBuffer buffer = ByteBuffer.allocate(1000);

ByteBufferOutputStream bufferOutput = new ByteBufferOutputStream(buffer);
GZIPOutputStream output = new GZIPOutputStream(bufferOutput);
output.write("stackexchange".getBytes());
output.close();

buffer.position(0);

byte[] result = new byte[1000];
ByteBufferInputStream bufferInput = new ByteBufferInputStream(buffer);
GZIPInputStream input = new GZIPInputStream(bufferInput);
input.read(result);
System.out.println(new String(result));

Limit number of bytes written in a PrintStream

I'm running some unsecure code whose stdout and stderr I have redirected to FileOutputStreams wrapped in PrintStreams. (Standard output/error MUST be redirected.)
Is there any way to configure those redirected FileOutputStreams/PrintStreams to enforce a maximum of, say, 10 MB written, so that, for example,
while (true) System.out.print("lots of bytes");
doesn't write excessive amounts of data to the server's disk?
The code does have a time limit of 15 s, but I'd like a separate guard here.
One way to do it is to define a FilterOutputStream that you wrap the file stream in, which keeps an internal counter that it increments on every write, and after reaching a set threshold, starts throwing Exceptions or simply ignores the writes.
Something along the lines of:
import java.io.*;
class LimitOutputStream extends FilterOutputStream {

    private long limit;

    public LimitOutputStream(OutputStream out, long limit) {
        super(out);
        this.limit = limit;
    }

    @Override
    public void write(byte[] b) throws IOException {
        long left = Math.min(b.length, limit);
        if (left <= 0)
            return;
        limit -= left;
        out.write(b, 0, (int) left);
    }

    @Override
    public void write(int b) throws IOException {
        if (limit <= 0)
            return;
        limit--;
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        long left = Math.min(len, limit);
        if (left <= 0)
            return;
        limit -= left;
        out.write(b, off, (int) left);
    }
}
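To apply it to the question's redirected streams, you could wrap the file stream before handing it to the PrintStream; something like this (the file name and limit are examples):

PrintStream limited = new PrintStream(
        new LimitOutputStream(new FileOutputStream("stdout.log"), 10L * 1024 * 1024));
System.setOut(limited);
System.setErr(limited);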
I had a similar task, but reading InputStreams from a DB, and made a small method.
Not to be Captain Obvious, but it can also be used with input streams like FileInputStream :)
public static void writeBytes2File(InputStream is, String name, long limit) {
    byte[] buf = new byte[8192];
    int len = 0;
    long size = 0;
    FileOutputStream fos = null;
    try {
        fos = new FileOutputStream(name);
        while ((len = is.read(buf)) != -1) {
            fos.write(buf, 0, len);
            size += len;
            if (size > limit * 1024 * 1024) {
                System.out.println("The file size exceeded " + size + " Bytes");
                break;
            }
        }
        System.out.println("File written: " + name);
    } catch (FileNotFoundException fnone) {
        fnone.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        try {
            if (is != null) {
                is.close();
            }
            if (fos != null) {
                fos.flush();
                fos.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Hope somebody finds it useful.

Multi-threading in Java

If I have a file containing 4000 bytes, can I have 4 threads read from the file at the same time, with each thread accessing a different section of the file?
Thread 1 reads bytes 0-999, thread 2 reads bytes 1000-1999, etc.
Please give an example in Java.
The file is very very small, and will be very fast to load. What I would do is create a thread-safe data class that loads the data. Each processing thread can then request an ID from the data class and receive a unique one with a guarantee of no other thread sending the same ID to your remote service.
In this manner, you remove the need to have all the threads accessing the file, and trying to figure out who has read and sent what ID.
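A minimal sketch of such a data class, assuming one ID per line in the file (the class and method names here are mine):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class IdSource {
    private final Queue<String> ids = new ConcurrentLinkedQueue<>();

    IdSource(String file) throws IOException {
        // Load the whole (small) file once at startup.
        ids.addAll(Files.readAllLines(Paths.get(file)));
    }

    // Hands out each ID exactly once across all threads; null when exhausted.
    String next() {
        return ids.poll();
    }
}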
RandomAccessFile or FileChannel will let you access bytes within a file. For waiting until your threads finish, look at CyclicBarrier or CountDownLatch.
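For example, a sketch combining FileChannel's positional reads (which are safe to call from multiple threads) with a CountDownLatch (the file name and slice sizes are assumptions):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class SlicedRead {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"), StandardOpenOption.READ)) {
            CountDownLatch done = new CountDownLatch(4);
            for (int i = 0; i < 4; i++) {
                final long offset = i * 1000L;
                new Thread(() -> {
                    try {
                        ByteBuffer buf = ByteBuffer.allocate(1000);
                        // Absolute read: does not touch the channel's shared position.
                        ch.read(buf, offset);
                        // process buf ...
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                }).start();
            }
            done.await(); // wait until all threads finish
        }
    }
}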
Given this comment by the question's author:
"I want to run a batch file which contains thousands of unique IDs. Each unique ID will be sent as a request to the remote system. I want to send the requests in parallel using threads to speed up the process. But if I use multiple threads, all the threads read the complete data and duplicate requests are sent. I want to avoid these duplicate requests."
I would suggest that you load the file into memory as some kind of data structure - an array of ids perhaps. Have the threads consume ids from the array. Be sure to access the array in a synchronized manner.
If the file is larger than you'd like to load in memory or the file is constantly being appended to then create a single producer thread that watches and reads from the file and inserts ids into a queue type structure.
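A sketch of that producer/consumer variant with a bounded BlockingQueue (the file name, capacity, and thread count are arbitrary):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class IdPipeline {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        // Producer: reads IDs from the file and enqueues them.
        new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new FileReader("ids.txt"))) {
                String id;
                while ((id = r.readLine()) != null) {
                    queue.put(id); // blocks while the queue is full, bounding memory use
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }).start();

        // Consumers: each ID is taken from the queue exactly once.
        for (int i = 0; i < 4; i++) {
            new Thread(() -> {
                try {
                    while (true) {
                        String id = queue.take(); // blocks while the queue is empty
                        // send the request for this id ...
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}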
Sorry, here is the working code. Now I've tested it myself :-)
package readfilemultithreading;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class MultiThreadFileReader {

    public MultiThreadFileReader(File fileToRead, int numThreads, int numBytesForEachThread) {
        this.file = fileToRead;
        this.numThreads = numThreads;
        this.bytesForEachThread = numBytesForEachThread;
        this.bytes = new byte[(int) file.length()];
    }

    private File file;
    private int numThreads;
    private byte[] bytes;
    int bytesForEachThread;

    public byte[] getResult() {
        return bytes;
    }

    public void startReading() {
        List<ReaderThread> readers = new ArrayList<ReaderThread>();
        for (int i = 0; i < numThreads; i++) {
            ReaderThread rt = new ReaderThread(i * bytesForEachThread, bytesForEachThread, file);
            readers.add(rt);
            rt.start();
        }
        // Each thread is reading... collect the pieces in order.
        int resultIndex = 0;
        for (int i = 0; i < numThreads; i++) {
            ReaderThread thread = readers.get(i);
            while (!thread.done) {
                try {
                    Thread.sleep(1);
                } catch (Exception e) {
                }
            }
            for (int b = 0; b < thread.len; b++, resultIndex++) {
                bytes[resultIndex] = thread.rb[b];
            }
        }
    }

    private class ReaderThread extends Thread {

        public ReaderThread(int off, int len, File f) {
            this.off = off;
            this.len = len;
            this.f = f;
        }

        public int off, len;
        private File f;
        public byte[] rb;
        // volatile so the polling loop in startReading() sees the update
        public volatile boolean done = false;

        @Override
        public void run() {
            done = false;
            rb = readPiece();
            done = true;
        }

        private byte[] readPiece() {
            try {
                BufferedInputStream reader = new BufferedInputStream(new FileInputStream(f));
                if (off + len > f.length()) {
                    len = (int) (f.length() - off);
                    if (len < 0) {
                        len = 0;
                    }
                    System.out.println("Correct length to: " + len);
                }
                if (len == 0) {
                    System.out.println("No bytes to read");
                    return new byte[0];
                }
                byte[] b = new byte[len];
                System.out.println("Length: " + len);
                setName("Thread for " + len + " bytes");
                reader.skip(off);
                for (int i = off, index = 0; i < len + off; i++, index++) {
                    b[index] = (byte) reader.read();
                }
                reader.close();
                return b;
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }
    }
}
Here is the usage code:
package readfilemultithreading;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class Main {

    public static void main(String[] args) {
        new Main().start(args);
    }

    public void start(String[] args) {
        try {
            MultiThreadFileReader reader = new MultiThreadFileReader(new File("C:\\Users\\Martijn\\Documents\\Test.txt"), 4, 2500);
            reader.startReading();
            byte[] result = reader.getResult();
            FileOutputStream stream = new FileOutputStream(new File("C:\\Users\\Martijn\\Documents\\Test_cop.txt"));
            for (byte b : result) {
                System.out.println(b);
                stream.write((int) b);
            }
            stream.close();
        } catch (IOException ex) {
            System.err.println("Reading failed");
        }
    }
}
Can I get my +1 back now ;-)
You must somehow synchronize read access to the file. I suggest using an ExecutorService:
Your main thread reads the IDs from the file and passes them to the executor service one at a time. The executor will run N threads to process N IDs concurrently.
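A minimal sketch of that approach (sendRequest and the file name stand in for whatever the remote call and input actually are):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchSender {
    static void sendRequest(String id) {
        // placeholder for the remote call
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (BufferedReader reader = new BufferedReader(new FileReader("ids.txt"))) {
            String id;
            while ((id = reader.readLine()) != null) {
                final String current = id;
                pool.submit(() -> sendRequest(current)); // each ID is submitted exactly once
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}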
