I have an InputStream of a file and I use Apache POI components to read from it like this:
POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);
The problem is that I need to use the same stream multiple times, and the POIFSFileSystem closes the stream after use.
What is the best way to cache the data from the input stream and then serve more input streams to different POIFSFileSystem instances?
EDIT 1:
By cache I meant store for later use, not as a way to speed up the application. Also, is it better to just read the input stream into an array or string and then create input streams for each use?
EDIT 2:
Sorry to reopen the question, but the conditions are somewhat different when working inside a desktop application and a web application.
First of all, the InputStream I get from the org.apache.commons.fileupload.FileItem in my Tomcat web app doesn't support marking and thus cannot be reset.
Second, I'd like to be able to keep the file in memory for faster access and fewer I/O problems when dealing with files.
You can decorate the InputStream being passed to POIFSFileSystem with a version that responds to close() by calling reset():
class ResetOnCloseInputStream extends InputStream {
private final InputStream decorated;
public ResetOnCloseInputStream(InputStream anInputStream) {
if (!anInputStream.markSupported()) {
throw new IllegalArgumentException("marking not supported");
}
anInputStream.mark( 1 << 24); // magic constant: BEWARE
decorated = anInputStream;
}
@Override
public void close() throws IOException {
decorated.reset();
}
@Override
public int read() throws IOException {
return decorated.read();
}
}
Test case:
static void closeAfterInputStreamIsConsumed(InputStream is)
throws IOException {
int r;
while ((r = is.read()) != -1) {
System.out.println(r);
}
is.close();
System.out.println("=========");
}
public static void main(String[] args) throws IOException {
InputStream is = new ByteArrayInputStream("sample".getBytes());
ResetOnCloseInputStream decoratedIs = new ResetOnCloseInputStream(is);
closeAfterInputStreamIsConsumed(decoratedIs);
closeAfterInputStreamIsConsumed(decoratedIs);
closeAfterInputStreamIsConsumed(is);
}
EDIT 2
You can read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.
Try BufferedInputStream, which adds mark and reset functionality to another input stream, and just override its close method:
public class UnclosableBufferedInputStream extends BufferedInputStream {
public UnclosableBufferedInputStream(InputStream in) {
super(in);
super.mark(Integer.MAX_VALUE);
}
@Override
public void close() throws IOException {
super.reset();
}
}
So:
UnclosableBufferedInputStream bis = new UnclosableBufferedInputStream (inputStream);
and use bis wherever inputStream was used before.
This works correctly:
byte[] bytes = getBytes(inputStream);
POIFSFileSystem fileSystem = new POIFSFileSystem(new ByteArrayInputStream(bytes));
where getBytes is like this:
private static byte[] getBytes(InputStream is) throws IOException {
byte[] buffer = new byte[8192];
ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
int n;
baos.reset();
while ((n = is.read(buffer, 0, buffer.length)) != -1) {
baos.write(buffer, 0, n);
}
return baos.toByteArray();
}
Use the implementation below if you need to limit how many times the stream can be reused:
public class ReusableBufferedInputStream extends BufferedInputStream
{
private int totalUse;
private int used;
public ReusableBufferedInputStream(InputStream in, Integer totalUse)
{
super(in);
if (totalUse > 1)
{
super.mark(Integer.MAX_VALUE);
this.totalUse = totalUse;
this.used = 1;
}
else
{
this.totalUse = 1;
this.used = 1;
}
}
@Override
public void close() throws IOException
{
if (used < totalUse)
{
super.reset();
++used;
}
else
{
super.close();
}
}
}
What exactly do you mean by "cache"? Do you want each POIFSFileSystem to start at the beginning of the stream? If so, there's absolutely no point caching anything in your Java code; the OS will do it, so just open a new stream.
Or do you want to continue reading at the point where the first POIFSFileSystem stopped? That's not caching, and it's very difficult to do. If you can't avoid the stream getting closed, the only way I can think of is to write a thin wrapper that counts how many bytes have been read, then open a new stream and skip that many bytes. But that could fail when POIFSFileSystem internally uses something like a BufferedInputStream, because the wrapper would then count bytes that were buffered but never consumed.
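The counting wrapper described above could look roughly like this. It is only a sketch: `PositionTrackingInputStream`, `reopenAt` and `resumeAfterFourBytes` are hypothetical names, and a byte array stands in for the file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ResumeDemo {

    /** Hypothetical wrapper that remembers how many bytes have been consumed. */
    static class PositionTrackingInputStream extends FilterInputStream {
        private long position;

        PositionTrackingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                position++;
            }
            return b;
        }

        // read(byte[]) is inherited and delegates to this method, so bytes are counted once.
        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                position += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            position += skipped;
            return skipped;
        }

        long getPosition() {
            return position;
        }
    }

    /** Opens a fresh stream over the same data and skips to the given offset. */
    static InputStream reopenAt(byte[] data, long offset) throws IOException {
        InputStream fresh = new ByteArrayInputStream(data);
        long remaining = offset;
        while (remaining > 0) {
            long skipped = fresh.skip(remaining);
            if (skipped <= 0) {
                break; // nothing left to skip
            }
            remaining -= skipped;
        }
        return fresh;
    }

    /** The first consumer reads four bytes; the second one resumes right after them. */
    static char resumeAfterFourBytes() {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        try (PositionTrackingInputStream first =
                 new PositionTrackingInputStream(new ByteArrayInputStream(data))) {
            first.read(new byte[4]);
            try (InputStream second = reopenAt(data, first.getPosition())) {
                return (char) second.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resumeAfterFourBytes()); // prints 4
    }
}
```

The recorded position is only meaningful if the consumer reads exactly what it uses; a consumer that buffers ahead will have advanced the counter further than it actually parsed.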
If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.
If the file is big, then you shouldn't care, since the OS will do the caching for you as best as it can.
[EDIT] Use Apache commons-io to read the File into a byte array efficiently. Do not use int read(), since it reads the file byte by byte, which is very slow!
If you want to do it yourself, use a File object to get the length, create the array, and then write a loop which reads bytes from the file. You must loop, since read(byte[], int offset, int len) can read fewer than len bytes (and usually does).
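A minimal sketch of the do-it-yourself loop, assuming the file fits into a single array; `readFully` and `roundTrip` are hypothetical helper names. On Java 7 and later, `java.nio.file.Files.readAllBytes` does the same job in one call.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class ReadFileDemo {

    /** Reads the whole file into a byte array sized from File.length(). */
    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for a single array: " + file);
        }
        byte[] data = new byte[(int) length];
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            while (offset < data.length) {
                // read may return fewer bytes than requested, so keep looping
                int n = in.read(data, offset, data.length - offset);
                if (n == -1) {
                    throw new IOException("File shrank while reading: " + file);
                }
                offset += n;
            }
        }
        return data;
    }

    /** Writes the text to a temporary file and reads it back with readFully. */
    static String roundTrip(String text) {
        try {
            File tmp = File.createTempFile("read-demo", ".bin");
            try {
                Files.write(tmp.toPath(), text.getBytes(StandardCharsets.UTF_8));
                return new String(readFully(tmp), StandardCharsets.UTF_8);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sample")); // prints sample
    }
}
```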
This is how I would implement it so that it can be used safely with any InputStream:
write your own InputStream wrapper that creates a temporary file to mirror the original stream content
dump everything read from the original input stream into this temporary file
once the stream has been read completely, all the data is mirrored in the temporary file
use InputStream.reset to switch (initialize) the internal stream to a FileInputStream(mirrored_content_file)
from now on you no longer hold a reference to the original stream (it can be garbage-collected)
add a new method release() which removes the temporary file and releases any open stream.
you can even call release() from finalize to make sure the temporary file is released in case you forget to call release() (most of the time you should avoid finalize and always call a method to release object resources). See Why would you ever implement finalize()?
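The steps above can be sketched in a simplified form. Note the swap: instead of mirroring bytes lazily while a wrapper is being read, this version copies the whole source into the temporary file up front and then hands out a fresh stream per consumer. `TempFileStreamCache`, `newStream` and `readTwice` are hypothetical names, and `finalize` is left out in favour of `Closeable`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TempFileStreamCache implements Closeable {
    private final Path mirror;

    /** Copies the source into a temporary file and closes the source. */
    TempFileStreamCache(InputStream source) throws IOException {
        mirror = Files.createTempFile("stream-cache", ".tmp");
        try (InputStream in = source) {
            // The temp file already exists, so REPLACE_EXISTING is required.
            Files.copy(in, mirror, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(mirror);
            throw e;
        }
    }

    /** Every call returns an independent stream positioned at the start. */
    InputStream newStream() throws IOException {
        return new BufferedInputStream(Files.newInputStream(mirror));
    }

    /** Removes the temporary file; streams handed out earlier should be closed first. */
    void release() throws IOException {
        Files.deleteIfExists(mirror);
    }

    @Override
    public void close() throws IOException {
        release();
    }

    private static String drain(InputStream in) throws IOException {
        try (InputStream s = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Reads the cached content twice to show that the data can be replayed. */
    static String readTwice(String text) {
        try (TempFileStreamCache cache = new TempFileStreamCache(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))) {
            return drain(cache.newStream()) + "|" + drain(cache.newStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTwice("Foobar")); // prints Foobar|Foobar
    }
}
```

Using try-with-resources on the cache guarantees that the temporary file is removed, which is the role `release()` plays in the description.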
public static void main(String[] args) throws IOException {
BufferedInputStream inputStream = new BufferedInputStream(IOUtils.toInputStream("Foobar"));
inputStream.mark(Integer.MAX_VALUE);
System.out.println(IOUtils.toString(inputStream));
inputStream.reset();
System.out.println(IOUtils.toString(inputStream));
}
This works. IOUtils is part of commons IO.
This answer iterates on the previous ones (1, 2) based on BufferedInputStream. The main changes are that it allows infinite reuse and that it takes care of closing the original source input stream to free up system resources. Your OS defines a limit on those, and you don't want the program to run out of file handles (that's also why you should always 'consume' responses, e.g. with the Apache EntityUtils.consumeQuietly()). EDIT: Updated the code to handle greedy consumers that use read(buffer, offset, length); in that case BufferedInputStream may try hard to read from the source, and this code protects against that.
public class CachingInputStream extends BufferedInputStream {
public CachingInputStream(InputStream source) {
super(new PostCloseProtection(source));
super.mark(Integer.MAX_VALUE);
}
@Override
public synchronized void close() throws IOException {
if (!((PostCloseProtection) in).decoratedClosed) {
in.close();
}
super.reset();
}
private static class PostCloseProtection extends InputStream {
private volatile boolean decoratedClosed = false;
private final InputStream source;
public PostCloseProtection(InputStream source) {
this.source = source;
}
@Override
public int read() throws IOException {
return decoratedClosed ? -1 : source.read();
}
@Override
public int read(byte[] b) throws IOException {
return decoratedClosed ? -1 : source.read(b);
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
return decoratedClosed ? -1 : source.read(b, off, len);
}
@Override
public long skip(long n) throws IOException {
return decoratedClosed ? 0 : source.skip(n);
}
@Override
public int available() throws IOException {
// A closed source may throw from available(), so report 0 once it is closed
return decoratedClosed ? 0 : source.available();
}
@Override
public void close() throws IOException {
decoratedClosed = true;
source.close();
}
@Override
public void mark(int readLimit) {
source.mark(readLimit);
}
@Override
public void reset() throws IOException {
source.reset();
}
@Override
public boolean markSupported() {
return source.markSupported();
}
}
}
To reuse it just close it first if it wasn't.
One limitation though is that if the stream is closed before the whole content of the original stream has been read, then this decorator will have incomplete data, so make sure the whole stream is read before closing.
I just add my solution here, as this works for me. It basically is a combination of the top two answers :)
private String convertStreamToString(InputStream is) {
Writer w = new StringWriter();
char[] buf = new char[1024];
Reader r;
is.mark(1 << 24);
try {
r = new BufferedReader(new InputStreamReader(is, "UTF-8"));
int n;
while ((n=r.read(buf)) != -1) {
w.write(buf, 0, n);
}
is.reset();
} catch(UnsupportedEncodingException e) {
Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
} catch(IOException e) {
Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
}
return w.toString();
}
Related
Let's assume that I have a Java program that creates a report by having multiple threads write to a file:
public File report = new File("C:\\somewhere\\file");
public FileWriter fileWriter = new FileWriter("C:\\somewhere\\file");
//Some thread executed the following statement
fileWriter.write("creating report for this thread");
Instead of using a file, I want to use some type of String buffer to create the report so I can return it in a REST response. What can I use that has the same outcome as using a File?
Update: I want to completely omit the file implementation as I can't store it in cloud.
You can use the output stream of the HttpServletResponse to send the file as a stream. Don't forget to set the relevant headers. You can write a method that writes the file to the response:
public static void writeFileToOutputStream(HttpServletResponse response, File file) {
String type = "application/octet-stream";
response.setContentType(type);
response.setHeader("Content-Disposition", "inline; filename=\"" + file.getName() + "\"");
response.setContentLength((int) file.length());
InputStream inputStream = null;
try {
inputStream = new BufferedInputStream(new FileInputStream(file));
FileCopyUtils.copy(inputStream, response.getOutputStream());
} catch (IOException e) {
log.info("------couldn't write file------");
}
}
Several threads writing to the same file has one obvious solution: use java.util.logging and write to a log file. The content of a log file can also easily be returned as a REST response.
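Since the question rules out files, here is a sketch of the same idea with an in-memory target. It assumes a `StreamHandler` writing into a `ByteArrayOutputStream` is acceptable in place of a log file; `LogReportDemo` and `collectReport` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

final class LogReportDemo {

    /** Lets two threads log into an in-memory handler and returns the collected text. */
    static String collectReport() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(buffer, new SimpleFormatter());
        Logger logger = Logger.getAnonymousLogger();
        logger.setLevel(Level.INFO);
        logger.setUseParentHandlers(false); // keep the report off the console
        logger.addHandler(handler);

        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> logger.info("creating report for thread " + id));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        handler.flush(); // StreamHandler buffers internally, so flush before reading
        logger.removeHandler(handler);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(collectReport());
    }
}
```

`SimpleFormatter` adds a timestamp and source line to every record; a custom `Formatter` would be needed if the report should contain only the messages.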
If you use a string buffer instead: StringBuilder is faster, but not thread-safe. The older StringBuffer is thread-safe for each individual call, but a chain of two appends is not atomic, as in:
sb.append("The size is ").append(size); // Not thread-safe.
You could do:
private final StringBuilder sb = new StringBuilder(4096);
public void printf(String messageFormat, Object... args) {
String s = MessageFormat.format(messageFormat, args);
synchronized(sb) {
sb.append(s);
}
}
public String extract() {
String s;
synchronized(sb) {
s = sb.toString();
sb.setLength(0);
}
return s;
}
If you want to stay implementation agnostic then you should design to an interface. I'd suggest just plain old Writer. You could have something like:
public abstract class AbstractReportWriter {
protected final Writer writer;
// The constructor name must match the class name
protected AbstractReportWriter(Writer w) {
writer = w;
}
public void write(String text) throws IOException {
writer.write(text);
}
}
public class FileReportWriter extends AbstractReportWriter {
public FileReportWriter(String path) throws IOException {
super(new FileWriter(path));
}
}
public class StringReportWriter extends AbstractReportWriter {
public StringReportWriter() {
super(new StringWriter());
}
public String getValue() {
return ((StringWriter) writer).toString();
}
}
public class CloudReportWriter extends AbstractReportWriter {
public CloudReportWriter() {
super(new YourCloudWriterClass());
}
}
Then you can pick and choose your writer by just swapping the implementation.
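As a usage sketch for the in-memory case: `StringWriter` is backed by a `StringBuffer`, so each individual `write` call is atomic, and several threads can share one instance as long as every report line is written in a single call. `InMemoryReportDemo` and `buildReport` are hypothetical names.

```java
import java.io.StringWriter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class InMemoryReportDemo {

    /** Lets several tasks append one line each to a shared in-memory report. */
    static String buildReport(int tasks) {
        StringWriter report = new StringWriter();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            // One write call per line keeps lines from interleaving.
            pool.execute(() -> report.write("creating report for task " + id + "\n"));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildReport(8));
    }
}
```

The order of the lines depends on thread scheduling; only the integrity of each line is guaranteed.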
I'm looking at an open source Swift library that is able to split an InputStream into multiple BoundedInputStream objects, given a total stream size (to know when to stop creating bounded input streams). I don't see why there should be no option to stop the BoundedInputStream creation automatically once the initial InputStream is closed.
The code looks something like this:
protected Long segmentationSize = 5368709120L;
protected Long currentSegment = 0L;
private InputStream inputStream; // supplied externally
private long inputStreamSize; // supplied externally
public void uploadSegmentedObjects() {
InputStream segmentStream = getNextSegment();
while (segmentStream != null) {
// do something
segmentStream = getNextSegment(); // advance, otherwise the loop never ends
}
}
public InputStream getNextSegment() {
if (done()) {
return null;
}
InputStream segment = createSegment();
currentSegment++;
return segment;
}
protected boolean done() {
return currentSegment * segmentationSize > inputStreamSize;
}
@Override
protected InputStream createSegment() throws IOException {
BoundedInputStream stream = new BoundedInputStream(inputStream, segmentationSize);
stream.setPropagateClose(false);
return stream;
}
Essentially, I need to know how to rewrite the done() method such that it is not reliant on the inputStreamSize variable and instead returns null when the stream closes.
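One possible approach, sketched with JDK classes only: wrap the source in a `PushbackInputStream` and peek one byte before creating each segment. Whether a stream has been closed cannot be observed portably (many implementations throw on a read after close), so this sketch detects the end of the data instead. `SegmentSplitter` and `BoundedStream` are hypothetical; `BoundedStream` merely stands in for the library's `BoundedInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SegmentSplitter {
    private final PushbackInputStream source;
    private final long segmentSize;

    SegmentSplitter(InputStream in, long segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        this.source = new PushbackInputStream(in, 1);
        this.segmentSize = segmentSize;
    }

    /** True once the source has no more bytes; no total size is needed. */
    boolean done() throws IOException {
        int next = source.read();
        if (next == -1) {
            return true;
        }
        source.unread(next); // put the peeked byte back for the next segment
        return false;
    }

    /** The previous segment must be fully consumed before calling this again. */
    InputStream getNextSegment() throws IOException {
        return done() ? null : new BoundedStream(source, segmentSize);
    }

    /** Minimal stand-in for a bounded stream that never closes its source. */
    static final class BoundedStream extends InputStream {
        private final InputStream in;
        private long remaining;

        BoundedStream(InputStream in, long max) {
            this.in = in;
            this.remaining = max;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = in.read();
            if (b != -1) {
                remaining--;
            }
            return b;
        }
    }

    /** Splits the text into segments of at most {@code size} bytes. */
    static List<String> split(String text, long size) {
        List<String> parts = new ArrayList<>();
        try {
            SegmentSplitter splitter = new SegmentSplitter(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)), size);
            InputStream segment;
            while ((segment = splitter.getNextSegment()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int b;
                while ((b = segment.read()) != -1) {
                    out.write(b);
                }
                parts.add(new String(out.toByteArray(), StandardCharsets.US_ASCII));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("abcdefgh", 3)); // prints [abc, def, gh]
    }
}
```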
I have a BufferedWriter as shown below:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
new GZIPOutputStream( hdfs.create(filepath, true ))));
String line = "text";
writer.write(line);
I want to find out the number of bytes written to the file without querying the file like this:
hdfs = FileSystem.get( new URI( "hdfs://localhost:8020" ), configuration );
filepath = new Path("path");
hdfs.getFileStatus(filepath).getLen();
as that adds overhead, which I want to avoid.
Also, I can't do this:
line.getBytes().length;
as it gives the size before compression.
You can use the CountingOutputStream from the Apache Commons IO library.
Place it between the GZIPOutputStream and the file OutputStream (hdfs.create(..)).
After writing the content to the file you can read the number of written bytes from the CountingOutputStream instance.
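The wiring can be sketched like this. To keep the example free of external dependencies, a small hand-written `ByteCounter` stands in for Commons IO's `CountingOutputStream`, and a `ByteArrayOutputStream` stands in for the stream returned by `hdfs.create(...)`; both are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class CompressedSizeDemo {

    /** Hypothetical stand-in for a counting stream: counts bytes passed through. */
    static final class ByteCounter extends FilterOutputStream {
        private long count;

        ByteCounter(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        // write(byte[]) is inherited and delegates here, so bytes are counted once.
        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }

        long getCount() {
            return count;
        }
    }

    /** Returns {bytes seen by the counter, bytes that reached the sink}. */
    static long[] compress(String line) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file stream
        ByteCounter counter = new ByteCounter(sink);
        // The counter sits below the GZIP stream, so it sees compressed bytes only.
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(counter), StandardCharsets.UTF_8))) {
            writer.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The count is complete only after close(), once the GZIP trailer is written.
        return new long[] { counter.getCount(), sink.size() };
    }

    public static void main(String[] args) {
        long[] sizes = compress("text");
        System.out.println(sizes[0] + " compressed bytes counted, " + sizes[1] + " in sink");
    }
}
```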
If this isn't too late, and you are using Java 1.7+ and don't want to pull in an entire library like Guava or Commons-IO, you can just extend GZIPOutputStream and obtain the data from the associated Deflater like so:
public class MyGZIPOutputStream extends GZIPOutputStream {
public MyGZIPOutputStream(OutputStream out) throws IOException {
super(out);
}
public long getBytesRead() {
return def.getBytesRead();
}
public long getBytesWritten() {
return def.getBytesWritten();
}
public void setLevel(int level) {
def.setLevel(level);
}
}
You can write your own subclass of OutputStream and count the bytes passed to its write methods.
This is similar to the response by Olaseni, but I moved the counting into the BufferedOutputStream rather than the GZIPOutputStream, and this is more robust, since def.getBytesRead() in Olaseni's answer is not available after the stream has been closed.
With the implementation below, you can supply your own AtomicLong to the constructor so that you can assign the CountingBufferedOutputStream in a try-with-resources block, but still retrieve the count after the block has exited (i.e. after the file is closed).
public static class CountingBufferedOutputStream extends BufferedOutputStream {
private final AtomicLong bytesWritten;
public CountingBufferedOutputStream(OutputStream out) throws IOException {
super(out);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize) throws IOException {
super(out, bufSize);
this.bytesWritten = new AtomicLong();
}
public CountingBufferedOutputStream(OutputStream out, int bufSize, AtomicLong bytesWritten)
throws IOException {
super(out, bufSize);
this.bytesWritten = bytesWritten;
}
// write(byte[]) is deliberately not overridden: FilterOutputStream.write(byte[])
// delegates to write(byte[], int, int), so counting here as well would count every byte twice.
@Override
public void write(byte[] b, int off, int len) throws IOException {
super.write(b, off, len);
bytesWritten.addAndGet(len);
}
@Override
public synchronized void write(int b) throws IOException {
super.write(b);
bytesWritten.incrementAndGet();
}
public long getBytesWritten() {
return bytesWritten.get();
}
}
I'm using BufferedWriter with the default size of 8192 characters to write lines to a local file. The lines are read from socket inputstream using BufferedReader readLine method, blocking I/O.
Average line length is 50 characters. It all works well and fast enough (over 1 mln lines per second) however if the client stops writing, lines that are currently stored in BufferedWriter buffer won't be flushed to disk. In fact the buffered characters won't be flushed to disk until the client resumes writing or the connection is closed. This translates into a delay between the time line is transmitted by client and the time this line is committed to file, so long-tail latency goes up.
Is there a way to flush incomplete BufferedWriter buffer on timeout, e.g. within 100 milliseconds?
What about something like this? It's not a real BufferedWriter, but it's a Writer. It works by periodically checking the time of the last write to the underlying (hopefully unbuffered) writer, then flushing the BufferedWriter if that write was longer ago than the timeout.
public class PeriodicFlushingBufferedWriter extends Writer {
protected final MonitoredWriter monitoredWriter;
protected final BufferedWriter writer;
protected final long timeout;
protected final Thread thread;
public PeriodicFlushingBufferedWriter(Writer out, long timeout) {
this(out, 8192, timeout);
}
public PeriodicFlushingBufferedWriter(Writer out, int sz, final long timeout) {
monitoredWriter = new MonitoredWriter(out);
writer = new BufferedWriter(monitoredWriter, sz);
this.timeout = timeout;
thread = new Thread(new Runnable() {
@Override
public void run() {
long deadline = System.currentTimeMillis() + timeout;
while (!Thread.interrupted()) {
try {
Thread.sleep(Math.max(deadline - System.currentTimeMillis(), 0));
} catch (InterruptedException e) {
return;
}
synchronized (PeriodicFlushingBufferedWriter.this) {
if (Thread.interrupted()) {
return;
}
long lastWrite = monitoredWriter.getLastWrite();
if (System.currentTimeMillis() - lastWrite >= timeout) {
try {
writer.flush();
} catch (IOException e) {
}
}
deadline = lastWrite + timeout;
}
}
}
});
thread.start();
}
@Override
public synchronized void write(char[] cbuf, int off, int len) throws IOException {
this.writer.write(cbuf, off, len);
}
@Override
public synchronized void flush() throws IOException {
this.writer.flush();
}
@Override
public synchronized void close() throws IOException {
try {
thread.interrupt();
} finally {
this.writer.close();
}
}
private static class MonitoredWriter extends FilterWriter {
protected final AtomicLong lastWrite = new AtomicLong();
protected MonitoredWriter(Writer out) {
super(out);
}
@Override
public void write(int c) throws IOException {
lastWrite.set(System.currentTimeMillis());
super.write(c);
}
@Override
public void write(char[] cbuf, int off, int len) throws IOException {
lastWrite.set(System.currentTimeMillis());
super.write(cbuf, off, len);
}
@Override
public void write(String str, int off, int len) throws IOException {
lastWrite.set(System.currentTimeMillis());
super.write(str, off, len);
}
@Override
public void flush() throws IOException {
lastWrite.set(System.currentTimeMillis());
super.flush();
}
public long getLastWrite() {
return this.lastWrite.get();
}
}
}
@copeg is right: flush it after every line. It is easy to flush on a timer, but what is the point of having only half a record written and not being able to process it?
You might apply Observer, Manager, and Factory patterns here and have a central BufferedWriterManager produce your BufferedWriters and maintain a list of active instances. An internal thread might wake periodically and flush the active instances. This might also be an opportunity for Weak references so there is no requirement for your consumers to explicitly free the object. Instead, the GC will do the work and your Manager simply needs to handle the case when its internal reference becomes null (i.e. when all strong references are dropped).
Don't try this complex scheme, it's too hard. Just reduce the size of the buffer, by specifying it when constructing the BufferedWriter. Reduce it till you find the balance between performance and latency that you need.
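A minimal sketch of that suggestion, assuming a `StringWriter` as a stand-in for the file writer and an arbitrary buffer size of 64 characters; `SmallBufferDemo` and `writeWithSmallBuffer` are hypothetical names. The buffer size is an upper bound on how much data can sit unflushed while the client is idle.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class SmallBufferDemo {

    /** Returns how many chars reached the target before and after an explicit flush. */
    static int[] writeWithSmallBuffer() {
        StringWriter target = new StringWriter(); // stands in for the file
        // Second constructor argument is the buffer size in chars (default is 8192).
        try (BufferedWriter writer = new BufferedWriter(target, 64)) {
            writer.write("short line"); // 10 chars, fits in the buffer
            int beforeFlush = target.getBuffer().length();
            writer.flush();
            int afterFlush = target.getBuffer().length();
            return new int[] { beforeFlush, afterFlush };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = writeWithSmallBuffer();
        System.out.println(counts[0] + " chars before flush, " + counts[1] + " after");
        // prints 0 chars before flush, 10 after
    }
}
```

With 50-character lines, a 64-character buffer holds back at most one line, at the cost of far more frequent writes to the underlying stream.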
I have this InputStream:
InputStream inputStream = new ByteArrayInputStream(myString.getBytes(StandardCharsets.UTF_8));
How can I convert this to ServletInputStream?
I have tried:
ServletInputStream servletInputStream = (ServletInputStream) inputStream;
but it does not work.
EDIT:
My method is this:
private static class LowerCaseRequest extends HttpServletRequestWrapper {
public LowerCaseRequest(final HttpServletRequest request) throws IOException, ServletException {
super(request);
}
@Override
public ServletInputStream getInputStream() throws IOException {
ServletInputStream servletInputStream;
StringBuilder jb = new StringBuilder();
String line;
String toLowerCase = "";
BufferedReader reader = new BufferedReader(new InputStreamReader(super.getInputStream()));
while ((line = reader.readLine()) != null) {
toLowerCase = jb.append(line).toString().toLowerCase();
}
InputStream inputStream = new ByteArrayInputStream(toLowerCase.getBytes(StandardCharsets.UTF_8));
servletInputStream = (ServletInputStream) inputStream;
return servletInputStream;
}
}
I'm trying to convert all my requests to lowercase.
My advice: don't create the ByteArrayInputStream, just use the byte array you got from the getBytes method already. This should be enough to create a ServletInputStream.
Most basic solution
Unfortunately, aksappy's answer only overrides the read method. While this may be enough in Servlet API 3.0 and below, in the later versions of Servlet API there are three more methods you have to implement.
Here is my implementation of the class, although with it becoming quite long (due to the new methods introduced in Servlet API 3.1), you might want to think about factoring it out into a nested or even top-level class.
final byte[] myBytes = myString.getBytes(StandardCharsets.UTF_8);
ServletInputStream servletInputStream = new ServletInputStream() {
private int lastIndexRetrieved = -1;
private ReadListener readListener = null;
@Override
public boolean isFinished() {
return (lastIndexRetrieved == myBytes.length-1);
}
@Override
public boolean isReady() {
// This implementation will never block
// We also never need to call the readListener from this method, as this method will never return false
return true;
}
@Override
public void setReadListener(ReadListener readListener) {
this.readListener = readListener;
if (!isFinished()) {
try {
readListener.onDataAvailable();
} catch (IOException e) {
readListener.onError(e);
}
} else {
try {
readListener.onAllDataRead();
} catch (IOException e) {
readListener.onError(e);
}
}
}
@Override
public int read() throws IOException {
int i;
if (!isFinished()) {
i = myBytes[lastIndexRetrieved+1] & 0xFF; // mask to 0..255 so a byte such as 0xFF is not mistaken for end of stream
lastIndexRetrieved++;
if (isFinished() && (readListener != null)) {
try {
readListener.onAllDataRead();
} catch (IOException ex) {
readListener.onError(ex);
throw ex;
}
}
return i;
} else {
return -1;
}
}
};
Adding expected methods
Depending on your requirements, you may also want to override other methods. As romfret pointed out, it's advisable to override some methods, such as close and available. If you don't implement them, the stream will always report that there are 0 bytes available to be read, and the close method will do nothing to affect the state of the stream. You can probably get away without overriding skip, as the default implementation will just call read a number of times.
@Override
public int available() throws IOException {
return (myBytes.length-lastIndexRetrieved-1);
}
@Override
public void close() throws IOException {
lastIndexRetrieved = myBytes.length-1;
}
Writing a better close method
Unfortunately, due to the nature of an anonymous class, it's going to be difficult for you to write an effective close method because as long as one instance of the stream has not been garbage-collected by Java, it maintains a reference to the byte array, even if the stream has been closed.
However, if you factor out the class into a nested or top-level class (or even an anonymous class with a constructor which you call from the line in which it is defined), the myBytes can be a non-final field rather than a final local variable, and you can add a line like:
myBytes = null;
to your close method, which will allow Java to free memory taken up by the byte array.
Of course, this will require you to write a constructor, such as:
private byte[] myBytes;
public StringServletInputStream(String str) {
try {
myBytes = str.getBytes("UTF-8");
} catch (UnsupportedEncodingException e) {
throw new IllegalStateException("JVM did not support UTF-8", e);
}
}
Mark and Reset
You may also want to override mark, markSupported and reset if you want to support mark/reset. I am not sure if they are ever actually called by your container though.
private int readLimit = -1;
private int markedPosition = -2; // -2 means "no mark set"; -1 is a valid mark at the very start of the stream
@Override
public boolean markSupported() {
return true;
}
@Override
public synchronized void mark(int readLimit) {
this.readLimit = readLimit;
this.markedPosition = lastIndexRetrieved;
}
@Override
public synchronized void reset() throws IOException {
if (markedPosition == -2) {
throw new IOException("No mark found");
} else {
lastIndexRetrieved = markedPosition;
readLimit = -1;
}
}
// Replacement of earlier read method to cope with readLimit
@Override
public int read() throws IOException {
int i;
if (!isFinished()) {
i = myBytes[lastIndexRetrieved+1] & 0xFF; // mask so bytes >= 0x80 are not returned as negative values
lastIndexRetrieved++;
if (isFinished() && (readListener != null)) {
try {
readListener.onAllDataRead();
} catch (IOException ex) {
readListener.onError(ex);
throw ex;
}
readLimit = -1;
}
if (readLimit != -1) {
if ((lastIndexRetrieved - markedPosition) > readLimit) {
// This part is actually not necessary in our implementation
// as we are not storing any data. However we need to respect
// the contract.
markedPosition = -2;
readLimit = -1;
}
}
return i;
} else {
return -1;
}
}
Try this code.
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(myString.getBytes(StandardCharsets.UTF_8));
ServletInputStream servletInputStream = new ServletInputStream() {
@Override
public int read() throws IOException {
return byteArrayInputStream.read();
}
};
You can only cast something like this:
ServletInputStream servletInputStream = (ServletInputStream) inputStream;
if the inputStream you are trying to cast is actually a ServletInputStream already. It will complain if it's some other implementation of InputStream. You can't cast an object to something it isn't.
In a Servlet container, you can get a ServletInputStream from a ServletRequest:
ServletInputStream servletInputStream = request.getInputStream();
So, what are you actually trying to do?
EDIT
I'm intrigued as to why you want to convert your request to lower-case - why not just make your servlet case-insensitive? In other words, your code to lower-case the request data can be copied into your servlet, then it can process it there... always look for the simplest solution!