What's the best way to monitor an InputStream?


I'm reading a file via Apache Commons Net's FTPClient.
This works fine 99.9% of the time, but sometimes it just hangs in the read() method:
InputStream inStream = ftp.retrieveFileStream(path + file.getName());
String fileAsString = "";
int c;
if (inStream == null) {
    return;
}
while ((c = inStream.read()) != -1) { // this is where the code sometimes just hangs
    fileAsString += Character.valueOf((char) c);
}
My question is: what is the most reliable way to keep this from locking up the system indefinitely? Should I set a timer in a separate thread, or is there a simpler way to do it?

If your code hangs, it means your FTP server has not sent the entire file. You can use a Timer, but I believe FTPClient allows you to set a timeout.
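If you do want the "timer in a separate thread" route, here is a minimal sketch using only the JDK. Instead of java.util.Timer it runs the read on a worker thread and waits on the Future with a deadline; on timeout it interrupts the worker and closes the stream. Closing a socket-backed stream usually makes a blocked read() fail with an IOException, but that is not guaranteed for every InputStream implementation. The class and method names are made up for the example. With Commons Net specifically, FTPClient.setDataTimeout(...), which sets a read timeout on the data connection, is probably the simpler fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ReadWithDeadline {

    /** Reads the whole stream, or gives up and closes it after timeoutMillis. */
    static byte[] readAll(InputStream in, long timeoutMillis) throws IOException, InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> result = worker.submit(() -> {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            });
            try {
                return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt the worker thread
                in.close();          // best effort: unblocks a read waiting on a socket
                throw new IOException("read did not finish within " + timeoutMillis + " ms", e);
            } catch (ExecutionException e) {
                throw new IOException("read failed", e.getCause());
            }
        } finally {
            worker.shutdownNow();
        }
    }
}
```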
BTW: the way you read the file is very inefficient. If your file is larger than a few KB, it will use rapidly increasing amounts of CPU, because every += copies the whole string built so far.
You are creating a Character from a byte (which is a bad idea in itself) and a new String object for every byte in the file.
I suggest using the copy method provided, or the one that comes with the commons-io library, to copy the data into a ByteArrayOutputStream.
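A sketch of that chunked copy using only the JDK (IOUtils.copy from commons-io would do the same job). The bytes are decoded once at the end with an explicit charset; which charset is right depends on the remote file and is an assumption here. On Java 9+, in.readAllBytes() is a one-line alternative to the loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class StreamText {

    /** Copies the stream in 8 KB chunks, then decodes all bytes in one go. */
    static String readAsString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // one copy per chunk, not one String per byte
        }
        return new String(out.toByteArray(), charset);
    }
}
```

With Commons Net, remember that after reading from retrieveFileStream(...) you still close the stream and call ftp.completePendingCommand() to finish the transfer.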

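Just from a quick look at the docs, if you did...
while (inStream.available() > 0 && (c = inStream.read()) != -1)
it seems like it would check that you can read without blocking before you actually read. I'm not certain about this, though. (Note that available() returning 0 only means no bytes are buffered right now, not that the stream has ended, so this loop may stop before the whole file has arrived.)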

Related

Reading external process Error Stream heavily impacts performance

I have a Java program that (among other things) reads from an external Python application using an InputStream.
Here is the code I use to read it:
InputStreamReader isr = new InputStreamReader(p.getInputStream()),
isrError = new InputStreamReader(p.getErrorStream());
BufferedReader br = new BufferedReader(isr), brError = new BufferedReader(isrError);
new Thread() {
@Override
public void run() {
try {
while (brError.readLine() != null);
} catch (Exception e) {
}
}
}.start();
while ((line = br.readLine()) != null) { //line is a previously declared String
//do whatever with line
}
I create the thread to read the error stream too, because the Python application throws errors when something goes wrong (I can't edit it; it is third-party software), and for some reason the InputStream eventually blocks if I don't read the error stream.
Is there any way to make while (brError.readLine() != null); have less impact on performance?
Right now I am looking at performance with VisualVM. The Java software usually stays between 0-5% CPU usage, which is pretty nice, but around 60-65% of that usage comes from this loop in this thread, whose only function is to prevent the main loop from blocking. I need to improve performance as much as possible (this is going into industrial lines, so using resources correctly is really important).
Thank you all.

For easier handling (if you don't need the contents while running), use redirectError(File) in ProcessBuilder.
ProcessBuilder pb = new ProcessBuilder("foo", "-bar");
pb.redirectError(new File("/tmp/errors.log"));
pb.start();
If you're getting cpu spinning from while (brError.readLine() != null);, you should look at what the error stream is returning. Since readLine() is a blocking call, it would mean that the error stream is pumping a lot of lines out.
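If the error output is not needed at all, ProcessBuilder.Redirect.DISCARD (Java 9+) hands it to the operating system's null device, so no Java thread has to drain it; Redirect.appendTo(File) (Java 7+) keeps it in a log instead. The sketch only configures the ProcessBuilder; the command list is whatever starts the third-party application.

```java
import java.io.File;
import java.util.List;

final class QuietProcess {

    /** stderr is thrown away by the OS, not by a Java thread. */
    static ProcessBuilder discardingErrors(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.DISCARD); // Java 9+
        return pb;
    }

    /** Alternative: keep the errors, appended to a log file. */
    static ProcessBuilder loggingErrors(List<String> command, File errorLog) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.appendTo(errorLog));
        return pb;
    }
}
```

After start(), only p.getInputStream() has to be read from Java.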

You're converting the throw-away stream to characters needlessly, which may be a bit costly, especially when you're using UTF-8 (and relying on the platform default encoding is usually wrong anyway).
Drop the Reader and use a BufferedInputStream for the throw-away stream.
However, for external processes, the redirection is surely superior as there's no processing in Java at all.
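When the error stream still has to be consumed in Java, a sketch of the byte-level approach: no Reader, no per-line String. InputStream.transferTo (Java 9+) does the chunked read and OutputStream.nullOutputStream() (Java 11+) discards the data; making the thread a daemon is a choice of this sketch, not a requirement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamDrainer {

    /** Reads and discards the whole stream; returns the number of bytes dropped. */
    static long drain(InputStream in) throws IOException {
        try (InputStream s = in) {
            return s.transferTo(OutputStream.nullOutputStream());
        }
    }

    /** Drains the stream on a background daemon thread. */
    static Thread drainInBackground(InputStream in) {
        Thread t = new Thread(() -> {
            try {
                drain(in);
            } catch (IOException ignored) {
                // the process went away; nothing left to drain
            }
        }, "stderr-drainer");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Usage would be StreamDrainer.drainInBackground(p.getErrorStream()) right after starting the process.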

Usefulness of DELETE_ON_CLOSE

There are many examples on the internet showing how to use StandardOpenOption.DELETE_ON_CLOSE, such as this:
Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE);
Other examples similarly use Files.newOutputStream(..., StandardOpenOption.DELETE_ON_CLOSE).
I suspect all of these examples are probably flawed. The purpose of writing a file is that you're going to read it back at some point; otherwise, why bother writing it? But wouldn't DELETE_ON_CLOSE cause the file to be deleted before you have a chance to read it?
If you create a work file (to work with large amounts of data that are too large to keep in memory) then wouldn't you use RandomAccessFile instead, which allows both read and write access? However, RandomAccessFile doesn't give you the option to specify DELETE_ON_CLOSE, as far as I can see.
So can someone show me how DELETE_ON_CLOSE is actually useful?

First of all, I agree with you: in Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE) the use of DELETE_ON_CLOSE is meaningless. After a (not so intense) search of the internet, the only example I could find showing that usage was the one you probably got it from (http://softwarecave.org/2014/02/05/create-temporary-files-and-directories-using-java-nio2/).
This option is not intended only for Files.write(...). The API makes its purpose quite clear:
This option is primarily intended for use with work files that are used solely by a single instance of the Java virtual machine. This option is not recommended for use when opening files that are open concurrently by other entities.
Sorry, I can't give you a meaningful short example, but think of such a file as being like a swap file/partition used by an operating system: the current JVM needs to temporarily store data on disk, and after shutdown the data are of no use anymore. As a practical example, it is similar to a JEE application server that might decide to serialize some entities to disk to free up memory.
Edit: Maybe the following (oversimplified) code can serve as an example to demonstrate the principle. (So please, nobody start a discussion about how this "data management" could be done differently, that using a fixed temporary file name is bad, and so on.)
In the try-with-resources block you need, for some reason, to externalize data (the reasons are not the subject of the discussion).
You have random read/write access to this externalized data.
This externalized data is of use only inside the try-with-resources block.
With the StandardOpenOption.DELETE_ON_CLOSE option you don't need to handle the deletion yourself after use; the JVM will take care of it (the limitations and edge cases are described in the API).
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.Random;

public class ExternalData {
    static final int RECORD_LENGTH = 20;
    static final String RECORD_FORMAT = "%-" + RECORD_LENGTH + "s";

    // add exception handling, left out only for the example
    public static void main(String[] args) throws Exception {
        EnumSet<StandardOpenOption> options = EnumSet.of(
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.READ,
                StandardOpenOption.DELETE_ON_CLOSE
        );
        Path file = Paths.get("/tmp/external_data.tmp");
        try (SeekableByteChannel sbc = Files.newByteChannel(file, options)) {
            // during your business processing the two cases below might happen
            // several times in random order

            // example of a huge data structure to externalize
            String[] sampleData = {"some", "huge", "datastructure"};
            for (int i = 0; i < sampleData.length; i++) {
                byte[] buffer = String.format(RECORD_FORMAT, sampleData[i])
                        .getBytes(StandardCharsets.US_ASCII);
                ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
                sbc.position((long) i * RECORD_LENGTH);
                sbc.write(byteBuffer);
            }

            // example of processing that needs the externalized data
            Random random = new Random();
            byte[] buffer = new byte[RECORD_LENGTH];
            ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
            for (int i = 0; i < 10; i++) {
                sbc.position((long) RECORD_LENGTH * random.nextInt(sampleData.length));
                byteBuffer.clear(); // reset position and limit before every read
                sbc.read(byteBuffer);
                System.out.printf("loop: %d %s%n", i,
                        new String(buffer, StandardCharsets.US_ASCII));
            }
        } // the channel is closed here, and DELETE_ON_CLOSE removes the file
    }
}

DELETE_ON_CLOSE is intended for temporary work files.
If you need to perform an operation whose data must be temporarily stored in a file, but you don't need to use the file outside the current execution, DELETE_ON_CLOSE is a good solution.
One example is when you need to store information that can't be kept in memory, for example because it is too large.
Another example is when you need to store information temporarily, you only need it later, and you don't want it to occupy memory in the meantime.
Imagine also a process that takes a long time to complete. You store information in a file and only use it later (perhaps many minutes or hours afterwards). This guarantees that memory is not used for that information while you don't need it.
DELETE_ON_CLOSE tries to delete the file when you explicitly close it by calling close(), or when the JVM shuts down if it was not closed before.
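A small sketch of that work-file pattern, assuming a default file system with a writable temp directory: a single channel opened with READ, WRITE and DELETE_ON_CLOSE stores the data and reads it back, and the file disappears when the channel is closed. The names are illustrative only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WorkFileDemo {

    /** Writes text into an existing work file, reads it back, and lets close() delete the file. */
    static String writeThenRead(Path work, String text) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(work,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE,
                StandardOpenOption.DELETE_ON_CLOSE)) {
            ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (out.hasRemaining()) {
                ch.write(out);
            }
            ch.position(0); // rewind to read back what was just written
            ByteBuffer in = ByteBuffer.allocate((int) ch.size());
            while (in.hasRemaining() && ch.read(in) != -1) {
                // keep reading until the buffer is full or EOF
            }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        } // closing the channel deletes the file (best effort, per the Javadoc)
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempFile("work", ".tmp");
        System.out.println(writeThenRead(work, "Hello, work file"));
        System.out.println("still exists: " + Files.exists(work));
    }
}
```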

Here are two possible ways it can be used:
1. When calling Files.newByteChannel
This method returns a SeekableByteChannel suitable for both reading and writing, in which the current position can be modified.
Seems quite useful for situations where some data needs to be stored out of memory for read/write access and doesn't need to be persisted after the application closes.
2. Write to a file, read back, delete:
An example using an arbitrary text file:
Path p = Paths.get("C:\\test", "foo.txt");
System.out.println(Files.exists(p));
try {
Files.createFile(p);
System.out.println(Files.exists(p));
try (BufferedWriter out = Files.newBufferedWriter(p, Charset.defaultCharset(), StandardOpenOption.DELETE_ON_CLOSE)) {
out.append("Hello, World!");
out.flush();
try (BufferedReader in = Files.newBufferedReader(p, Charset.defaultCharset())) {
String line;
while ((line = in.readLine()) != null) {
System.out.println(line);
}
}
}
} catch (IOException ex) {
ex.printStackTrace();
}
System.out.println(Files.exists(p));
This outputs (as expected):
false
true
Hello, World!
false
This example is obviously trivial, but I imagine there are plenty of situations where such an approach may come in handy.
However, I still believe the old File.deleteOnExit method may be preferable as you won't need to keep the output stream open for the duration of any read operations on the file, too.

Best way to design Java file download manager

I would like to write a simple Java downloader for my backup website. Importantly, the applet should be able to download many files at once.
So here is my problem: such an applet seems to me easy to hack or infect. What is more, it will surely need a lot of system resources to run. So I would like to hear your opinions on the best, most efficient, and most secure way to do it.
I thought about something like this:
//user chose directory to download his files
//fc is a JFileChooser and file is the chosen directory, obtained after
//fc.showSaveDialog(this) == JFileChooser.APPROVE_OPTION
for (int i = 0; i < urls.length; i++) {
    String fileName = "..."; // obtaining filename and extension
    fileName = fileName.replaceAll(" ", "_");
    // I am not sure if the line above resolves all problems with file names...
    String path = file.getAbsolutePath() + File.separator + fileName;
    try {
        FileOutputStream os = new FileOutputStream(path);
        URLConnection uc = urls[i].openConnection();
        InputStream is = uc.getInputStream();
        int a = is.read();
        while (a != -1) {
            os.write(a);
            a = is.read();
        }
        is.close();
        os.close();
    } catch (InterruptedIOException iioe) {
        // TODO User cancelled.
    } catch (IOException ioe) {
        // TODO
    }
}
but I am sure that there is a better solution.
There is one more thing: when a user wants to download a really huge number of files (e.g. 1000 files, each between 10 MB and 1 GB), there will be several problems. So I thought about setting a limit, but I don't really know how to decide how many files at once is OK. Should I check the user's Internet connection or the computer's load?
Thanks in advance
BroMan

I would like to write a simple Java downloader for my backup website.
Importantly, the applet should be able to download many files at once.
I hope you mean sequentially, as your code is written. In this situation there would be no advantage in running multiple download streams in parallel.
Such an applet seems to me easy to hack or infect.
Make sure to encrypt your communication stream. Since it looks like you are just accessing URLs on the server, maybe configure your server to use HTTPS.
What is more, it will surely need a lot of system resources to run.
Why do you assume that? Network bandwidth will be the limiting factor; you are not going to tax your other resources very much. Maybe you meant avoiding saturating the user's bandwidth. You can implement simple throttling by giving the user a configurable delay that you insert between every file, or even between iterations of your read/write loop. Use Thread.sleep to implement the delay.
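A sketch of that throttling idea: a chunked copy loop with a user-configurable pause after each chunk. The delayMillis parameter and the 8 KB chunk size are illustrative assumptions; with a pause after every chunk, throughput is capped at roughly 8 KB per delay period. An interrupt during the pause is reported as InterruptedIOException, which the question's code already treats as "user cancelled".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;

final class ThrottledCopy {

    /** Copies in to out, sleeping delayMillis after every chunk; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, long delayMillis) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
            if (delayMillis > 0) {
                try {
                    Thread.sleep(delayMillis); // crude throttle: at most 8 KB per pause
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("download cancelled");
                }
            }
        }
        return total;
    }
}
```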
So I thought about setting a limit, but I don't really know how to decide how many files at once is OK.
Assuming you download sequentially, setting limits isn't a technical question; it's more about what kind of service you want to provide. More files just means the download takes longer.
int a=is.read();
Your implementation of stream read/write is very inefficient. You want to read/write in chunks rather than single bytes. See the versions of read/write methods that take byte[].
Here is the basic logic flow to copy data from an input stream to an output stream.
InputStream in = null;
OutputStream out = null;
try
{
in = ...
out = ...
final byte[] buf = new byte[ 1024 ];
for( int count = in.read( buf ); count != -1; count = in.read( buf ) )
{
out.write( buf, 0, count );
}
}
finally
{
if( in != null )
{
in.close();
}
if( out != null )
{
out.close();
}
}
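On Java 7+ the same flow is usually written with try-with-resources, which closes the streams even when an exception is thrown, and Files.copy(InputStream, Path, CopyOption...) runs the buffered loop for you. A sketch for a single URL; url.openStream() is shorthand for openConnection().getInputStream(). The file-name whitelist (letters, digits, dot, dash, underscore, no leading dots) is an assumption of this example, not something the question specifies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class Downloader {

    /** Replaces risky characters with '_' and neutralizes names like "..". */
    static String safeFileName(String name) {
        String cleaned = name.replaceAll("[^A-Za-z0-9._-]", "_")
                .replaceAll("^\\.+", "_");
        return cleaned.isEmpty() ? "_" : cleaned;
    }

    /** Downloads one URL into dir, overwriting an existing file; returns bytes written. */
    static long download(URL url, Path dir, String fileName) throws IOException {
        Path target = dir.resolve(safeFileName(fileName));
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```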

Java - RandomAccessFile (Emulating the Linux tail function)

The existing question Java IO implementation of unix/linux "tail -f" describes a similar problem, but its solution is not viable for log files that generate about 50-100 lines per second.
I have an algorithm that emulates the tail functionality in Linux. For example,
File _logFile = new File("/tmp/myFile.txt");
long _filePtr = _logFile.length();
while (true)
{
long length = _logFile.length();
if (length < _filePtr)
{
// means file was truncated
}
else if (length > _filePtr)
{
// means something was added to the file
}
// we ignore len = _filePtr ... nothing was written to file
}
My problem is when: "something was added to the file" (referring to the else if() statement).
else if (length > _filePtr)
{
    RandomAccessFile raf = new RandomAccessFile(_logFile, "r");
    raf.seek(_filePtr);
    String curLine;
    while ((curLine = raf.readLine()) != null)
        myTextPane.append(curLine);
    _filePtr = raf.getFilePointer();
    raf.close();
}
The program blocks at while ((curLine = raf.readLine()) ... after 15 seconds of run time! (Note that the program runs correctly for the first 15 seconds.)
It appears that raf.readLine() never returns null, because I believe this log file is being written so fast that we go into an endless cat-and-mouse loop.
What's the best way to emulate Linux's tail?

I think you would be best served by grabbing a block of bytes based on the file's length, then releasing the file and parsing a ByteArrayInputStream (instead of trying to read directly from the file).
So use RandomAccessFile#read(byte[]), and size the buffer using the returned file length. You won't always show the exact end of the file, but that is to be expected with this sort of polling algorithm.
As an aside, this algorithm is horrible - you are running IO operations in a crazy tight loop - the calls to File#length() will block, but not very much. Expect this routine to bring your app to its knees CPU-wise. I don't necessarily have a better solution for you (well, actually I do: have the source application write to a stream instead of a file, but I recognize that isn't always feasible).
In addition to the above, you may want to introduce a polling delay (sleep the thread by 100ms each loop - it looks to me like you are displaying to a GUI - a 100ms delay won't hurt anyone, and will greatly improve the performance of the swing operations).
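A sketch of the "grab a block based on the file's length" idea: each call snapshots the current length, copies only the bytes between the last position and that snapshot, and returns the new position, so it can never chase a file that keeps growing. The names are made up for the example; the caller would sleep about 100 ms between calls as suggested above. A line that is only half written at snapshot time will be split across two calls, so a real tail would carry the incomplete last line over.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class TailSnapshot {

    /**
     * Copies the bytes between fromPos and the file length observed at call
     * time into sink and returns the new position. A file now shorter than
     * fromPos is treated as truncated and read again from the start.
     */
    static long readAppended(Path file, long fromPos, ByteArrayOutputStream sink) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length();            // snapshot: ignore data written after this
            long pos = fromPos > end ? 0 : fromPos;
            raf.seek(pos);
            byte[] buf = new byte[8192];
            while (pos < end) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, end - pos));
                if (n < 0) {
                    break;                      // file shrank while we were reading
                }
                sink.write(buf, 0, n);
                pos += n;
            }
            return pos;
        }
    }
}
```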
OK, final beef: you are adjusting a Swing component from what (I hope) is code not running on the EDT. Use SwingUtilities#invokeLater() to update your text pane.

It appears I have found the problem and created a solution.
Under the else if statement:
while ((curLine = raf.readLine()) != null)
myTextPane.append(curLine);
This was the problem: the append(String) method of myTextPane (a class derived from JTextPane) invoked setCaretPosition() on every appended line, which IS BAD!!
That meant setCaretPosition() was called 50-100 times per second trying to "scroll down", which caused blocking overhead in the interface.
A simple solution was to create a StringBuffer and append curLine to it until raf.readLine() returned null.
Then, append the StringBuffer and voila ... no more blocking from setCaretPosition()!
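A sketch of that fix, using a JTextArea (which has an append(String) method) as a stand-in for the question's custom JTextPane subclass, and a BufferedReader instead of RandomAccessFile for simplicity: all new lines are collected first, then the component is touched once, on the Event Dispatch Thread. StringBuilder replaces StringBuffer because the buffer is local to one thread, and the '\n' after each line is an assumption, since readLine() strips line terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

final class BatchedAppend {

    /** Reads every remaining line and joins them, one '\n' after each. */
    static String collectLines(BufferedReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    /** One append, and therefore one caret update, per batch; runs on the EDT. */
    static void appendOnEdt(JTextArea area, String batch) {
        if (!batch.isEmpty()) {
            SwingUtilities.invokeLater(() -> area.append(batch));
        }
    }
}
```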
Thanks to Kevin for bringing me towards the correct direction.

You could always exec the tail program:
BufferedReader in = new BufferedReader(new InputStreamReader(
Runtime.getRuntime().exec("tail -F /tmp/myFile.txt").getInputStream()));
String line;
while ((line = in.readLine()) != null) {
// process line
}

Java Heap Space (CMS with huge files)

EDIT:
Got the directory working. Now there's another issue in sight:
The files in the storage are stored with their DB id as a prefix
to their file names. Of course I don't want the users to see those.
Is there a way to combine the response.redirect and the header settings
for file name and size?
best,
A
Hi again,
new approach:
Is it possible to create an IIS-like virtual directory within Tomcat in order
to avoid streaming and only use a header redirect? I played around with
contexts but couldn't get it going...
any ideas?
thx
A
Hi all,
I'm facing a weird issue with the Java heap space which is close
to bringing me to the ropes.
The short version is:
I've written a content management system which needs to handle
huge files (>600 MB) too. Tomcat heap settings:
-Xmx700m
-Xms400m
The issue is that uploading huge files works, even though it's
slow. Downloading files results in a Java heap space exception.
Trying to download a 370 MB file makes Tomcat jump to 500 MB heap
(which should be OK) and end in a Java heap space exception.
I don't get it: why does upload work and download not?
Here's my download code:
byte[] byt = new byte[1024*1024*2];
response.setHeader("Content-Disposition", "attachment;filename=\"" + fileName + "\"");
FileInputStream fis = new FileInputStream(new File(filePath));
OutputStream os = response.getOutputStream();
BufferedInputStream buffRead = new BufferedInputStream(fis);
int read;
while ((read = buffRead.read(byt)) > 0)
{
    os.write(byt, 0, read);
    os.flush();
}
buffRead.close();
os.close();
If I'm getting it right, the buffered reader should take care of any
memory issue, right?
Any help would be highly appreciated, since I've run out of ideas.
Best regards,
W

If I'm getting it right the buffered
reader should take care of any memory
issue, right?
No, that has nothing to do with memory issues; it's actually unnecessary, since you're already using a buffer to read the file. Your problem is with writing, not with reading.
I can't see anything immediately wrong with your code. It looks as though Tomcat is buffering the entire response instead of streaming it. I'm not sure what could cause that.
What does response.getBufferSize() return? And you should try setting response.setContentLength() to the file's size; I vaguely remember that a web container under certain circumstances buffers the entire response in order to determine the content length, so maybe that's what's happening. It's good practice to do it anyway since it enables clients to display the download size and give an ETA for the download.

Try using the setBufferSize and flushBuffer methods of the ServletResponse.

You'd better use java.nio for that, so you can read resources partially and free resources that have already been streamed!
Otherwise you'll end up with memory problems despite the settings you've made to the JVM environment.

My suggestions:
The Quick-n-easy: Use a smaller array! Yes, it loops more, but this will not be a problem. 5 kilobytes is just fine. You'll know if this works adequately for you in minutes.
byte[] byt = new byte[1024*5];
A little bit harder: if you have access to sendfile (as in Tomcat with the Http11NioProtocol), then use it.
A little bit harder, again: switch your code to Java NIO's FileChannel. I have very, very similar code running on equally large files with hundreds of concurrent connections and similar memory settings with no problem. NIO is faster than plain old Java streams in these situations: FileChannel.transferTo() can use the operating system's zero-copy support, so the data goes from the file system cache to the network socket without being copied through the JVM heap. Here is a code snippet from my own code base; I've ripped out a lot to show the basics. FileChannel.transferTo() is not guaranteed to send every byte, so it is called in a loop.
WritableByteChannel destination = Channels.newChannel(response.getOutputStream());
FileChannel source = new FileInputStream(file).getChannel(); // file: the java.io.File being served
long total = 0; // start and length describe the byte range to send
while (total < length) {
    long sent = source.transferTo(start + total, length - total, destination);
    total += sent;
}
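A self-contained version of that loop, assuming the whole file is sent (start = 0, length = file size) to any OutputStream, such as the one returned by response.getOutputStream(). Whether the zero-copy path is actually taken depends on the target channel; in the OpenJDK implementation, a channel wrapped around an ordinary OutputStream falls back to a copy through a small temporary buffer, which still keeps heap usage low.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelSender {

    /** Sends the whole file to out; returns the number of bytes transferred. */
    static long send(Path file, OutputStream out) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            WritableByteChannel destination = Channels.newChannel(out); // caller owns and closes out
            long length = source.size();
            long total = 0;
            while (total < length) {
                // transferTo may send fewer bytes than requested, hence the loop
                long sent = source.transferTo(total, length - total, destination);
                if (sent <= 0) {
                    break; // nothing more could be read (e.g. the file shrank); avoid spinning
                }
                total += sent;
            }
            return total;
        }
    }
}
```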

The following code streams data to the client while allocating only a small buffer (BUFFER_SIZE; this is a soft point, since you may want to adjust it):
private static final int OUTPUT_SIZE = 1024 * 1024 * 50; // 50 Mb
private static final int BUFFER_SIZE = 4096;
@Override
protected void doGet(HttpServletRequest request,HttpServletResponse response)
throws ServletException, IOException {
String fileName = "42.txt";
// build response headers
response.setStatus(200);
response.setContentLength(OUTPUT_SIZE);
response.setContentType("text/plain");
response.setHeader("Content-Disposition",
"attachment;filename=\"" + fileName + "\"");
response.flushBuffer(); // write HTTP headers to the client
// streaming result
InputStream fileInputStream = new InputStream() { // fake input stream
int i = 0;
@Override
public int read() throws IOException {
if (i++ < OUTPUT_SIZE) {
return 42;
} else {
return -1;
}
}
};
ReadableByteChannel input = Channels.newChannel(fileInputStream);
WritableByteChannel output = Channels.newChannel(
response.getOutputStream());
ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
while (input.read(buffer) != -1) {
buffer.flip();
output.write(buffer);
buffer.clear();
}
input.close();
output.close();
}

Are you required to serve files using Tomcat? For this kind of task we have used a separate download mechanism. We chained Apache -> Tomcat -> storage and then added rewrite rules for downloads. Then you simply bypass Tomcat, and Apache serves the file to the client (Apache -> storage). But this works only if your files are stored as files; if you read from a DB or another type of non-file storage, this solution cannot be used. The overall scenario is that you generate download links for files, e.g. domain/binaries/xyz..., and write a redirect rule for domain/files using Apache mod_rewrite.

Do you have any filters in the application, or do you use the tcnative library? You could try to profile it with jvisualvm.
Edit: Small remark: note that you have an HTTP response splitting vulnerability in the setHeader call if you do not sanitize fileName.
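One way to sanitize it, which also hides the DB-id prefix mentioned in the question's edit, is to derive the download name from the stored name and replace anything that could break out of the quoted header value (control characters including CR/LF, double quotes, backslashes). The "digits followed by an underscore" prefix format is a hypothetical storage convention; adapt it to whatever scheme the storage actually uses.

```java
final class DownloadName {

    /**
     * Turns a stored name such as "1234_report.pdf" into a value that is safe
     * inside Content-Disposition: attachment; filename="...".
     */
    static String forHeader(String storedName) {
        String visible = storedName.replaceFirst("^\\d+_", ""); // hide the (hypothetical) DB id prefix
        // control chars (incl. CR/LF) allow response splitting; quotes and backslashes end the quoted string
        String safe = visible.replaceAll("[\\x00-\\x1F\\x7F\"\\\\]", "_");
        return safe.isEmpty() ? "download" : safe;
    }

    static String contentDisposition(String storedName) {
        return "attachment; filename=\"" + forHeader(storedName) + "\"";
    }
}
```

In the servlet this would be used as response.setHeader("Content-Disposition", DownloadName.contentDisposition(storedName)).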

Why don't you use Tomcat's own DefaultServlet (the servlet that serves static files)?
It can surely serve files much better than you can possibly imagine.

A 2 MB buffer is way too large! A few KB should be ample. Megabyte-sized objects are a real issue for the garbage collector, since they often need to be treated separately from "normal" objects (normal == much smaller than a heap generation). To optimize I/O, your buffer only needs to be slightly larger than your I/O buffer size, i.e. at least as large as a disk block or network packet.
