I have a piece of code that retrieves a file either from a remote server or from the local disk.
I understand that URLConnection can handle both cases, so I was wondering if there was any performance advantage if I used FileInputStream to read the local file rather than just handing it off to URLConnection to read from disk?
No, there's no performance advantage to using FileInputStream over a URLConnection (well, unless you're counting the milliseconds of a handful of extra method calls).
Reading a file via a file:// URL eventually gets you a FileURLConnection (note that this is not part of the official Java library spec, just the Sun-based JREs). If you look at the code, you'll see that it's creating a FileInputStream to work with the file on disk. So other than walking a few layers further down in the stack, the code ends up exactly the same.
The reason why you'd want to use a FileInputStream directly is for clarity of your code. Turning a file path into a URL is a little ugly, and it'd be confusing to do it if you were only ever going to work with files.
In your case, where you need to work with URLs some of the time, it's quite convenient that you can use a file URL and only work with URLs. I imagine you've abstracted nearly all of the interesting logic to work on URLs and can do the ugly business of constructing a file or non-file URL elsewhere.
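For illustration, here is what the two routes look like side by side (a minimal sketch; the file name is just a placeholder):

// Direct and file-URL routes to the same bytes; "data.txt" is a placeholder.
File file = new File("data.txt");

// Clear and direct:
InputStream direct = new FileInputStream(file);

// The slightly uglier URL route; on Sun-based JREs openStream()
// goes through FileURLConnection and ends up in a FileInputStream anyway.
InputStream viaUrl = file.toURI().toURL().openStream();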
A FileInputStream obtains input bytes from a file in a file system. FileInputStream is meant for reading streams of raw bytes such as image data.
FileReader is meant for reading streams of characters.
In general, creating a connection to a URL is a multistep process, sketched in code below:

1. The connection object is created by invoking the openConnection method on a URL.
2. The setup parameters and general request properties are manipulated.
3. The actual connection to the remote object is made, using the connect method.
4. The remote object becomes available. The header fields and the contents of the remote object can be accessed.
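As a rough sketch of those four steps (the URL, timeout, and header values here are placeholders, not anything mandated):

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class UrlConnectionSteps {
    public static void main(String[] args) throws IOException {
        // 1. Create the connection object.
        URLConnection conn = new URL("http://example.com/data.txt").openConnection();

        // 2. Manipulate setup parameters and request properties.
        conn.setConnectTimeout(5000);
        conn.setRequestProperty("Accept", "text/plain");

        // 3. Make the actual connection.
        conn.connect();

        // 4. The remote object is available: header fields and contents.
        System.out.println("Content-Type: " + conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            System.out.println("first byte: " + in.read());
        }
    }
}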
I think a good rule of thumb is to use the simplest code (object) possible in order to remain the most efficient. Think minimalist!
P.S. Not sure if you're just moving the file or reading its contents.
I have a question regarding reading images in Java. I am trying to read an image using threads, and I was curious whether, by doing this:
myInputFile = new FileInputStream(myFile);
I have already read the whole data or not. I read the file in 4 chunks using threads, and I am curious whether I have now read it twice, once with the threads and once with the FileInputStream, or what exactly FileInputStream does. Thanks in advance!
The FileInputStream does not read your file just by constructing it like myInputFile = new FileInputStream(myFile);.
It basically only gives you a handle to the underlying file and prepares to read from it by opening a connection to that file. It also runs some basic checks, including whether the file exists and whether it is a regular file rather than a directory.
The JavaDocs for the constructor state:
Creates a FileInputStream by opening a connection to an actual file, the file named by the File object file in the file system. A new FileDescriptor object is created to represent this file connection.

First, if there is a security manager, its checkRead method is called with the path represented by the file argument as its argument.

If the named file does not exist, is a directory rather than a regular file, or for some other reason cannot be opened for reading then a FileNotFoundException is thrown.
Only when you call one of the FileInputStream.read methods does it start to read and return the contents of the file.
The no-argument FileInputStream.read() reads a single byte of the file, and FileInputStream.read(byte[] b) reads up to as many bytes as the size of the array b, as sketched below.
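To make that concrete, here is a minimal read loop (the file name is a placeholder); read(byte[]) returns the number of bytes actually read, or -1 at end of stream:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadChunks {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        try (InputStream in = new FileInputStream("myFile.dat")) { // placeholder
            int bytesRead;
            // read(byte[]) returns the number of bytes actually read, or -1 at EOF.
            while ((bytesRead = in.read(buffer)) != -1) {
                total += bytesRead;
            }
        }
        System.out.println("read " + total + " bytes");
    }
}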
Edit:
Because reading a file byte by byte is pretty slow, and the usage of the plain FileInputStream.read(byte[] b) method can be a bit cumbersome, it's good practice to use a BufferedInputStream to process files in Java.
By default it reads the next 8192 bytes of the file and buffers them in memory for faster access. The BufferedInputStream.read method still returns only a single byte per call, but it is mainly served from an internal buffer: as long as the requested bytes are in that buffer, they are served from it, and the underlying file is accessed again only when really needed (i.e. when the requested byte is no longer in the buffer). This drastically reduces the number of read accesses to the hardware (which is, by comparison, the slowest operation in this process) and therefore boosts reading performance a lot.
The initialization looks like this:
InputStream i = new BufferedInputStream(new FileInputStream(myFile));
The handling is exactly the same as with the 'plain' FileInputStream, since they share the same InputStream interface.
I create PDF docs in memory as OutputStreams. These should be uploaded to S3. My problem is that it's not possible to create a PutObjectRequest from an OutputStream directly (according to this thread in the AWS dev forum). I use aws-java-sdk-s3 v1.10.8 in a Dropwizard app.
The two workarounds I can see so far are:
Copy the OutputStream to an InputStream and accept that twice the amount of RAM is used.
Pipe the OutputStream to an InputStream and accept the overhead of an extra thread (see this answer; a sketch follows below).
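For reference, a minimal sketch of the piping approach (generatePdf is a hypothetical stand-in for whatever produces the PDF):

// Workaround #2 sketched with java.io pipes; the producer runs on its own thread.
PipedInputStream in = new PipedInputStream();
PipedOutputStream out = new PipedOutputStream(in);

new Thread(() -> {
    try (OutputStream o = out) {
        generatePdf(o); // hypothetical PDF-producing call
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}).start();

// 'in' can now be handed to the S3 client while the PDF is still being produced.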
If I don't find a better solution I'll go with #1, because it looks as if I can afford the extra memory more easily than threads/CPU in my setup.
Is there any other, possibly more efficient way to achieve this that I have overlooked so far?
Edit:
My OutputStreams are ByteArrayOutputStreams
I solved this by subclassing ByteArrayOutputStream:
public class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Creates an InputStream without actually copying the buffer and using up memory for that.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}
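A usage sketch (bucket and key are placeholders; s3Client is assumed to be an AmazonS3 client from aws-java-sdk-s3 1.x). Setting the content length up front means the SDK doesn't have to buffer the stream to determine it:

ConvertibleOutputStream out = new ConvertibleOutputStream();
// ... write the PDF into 'out' ...

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(out.size()); // known up front, so the SDK needn't buffer
metadata.setContentType("application/pdf");

s3Client.putObject(new PutObjectRequest("my-bucket", "docs/report.pdf",
        out.toInputStream(), metadata));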
What's the actual type of your OutputStream? Since it's an abstract class, there's no saying where the data actually goes (or if it even goes anywhere).
But let's assume that you're talking about a ByteArrayOutputStream, since it at least keeps the data in memory (unlike many others).
If you create a ByteArrayInputStream out of its buffer, there's no duplicated memory. That's the whole idea of streaming.
Another workaround is to use the presigned URL feature of S3.
Since a presigned URL allows you to upload files to S3 with HTTP PUT or POST, you can send your output stream through an HttpURLConnection.
Amazon provides sample code for this.
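A minimal sketch of that route with aws-java-sdk-s3 1.x (bucket, key, and expiry are placeholders; pdfBytes is assumed to be your ByteArrayOutputStream):

// Generate a presigned PUT URL.
GeneratePresignedUrlRequest request =
        new GeneratePresignedUrlRequest("my-bucket", "docs/report.pdf")
                .withMethod(HttpMethod.PUT)
                .withExpiration(new Date(System.currentTimeMillis() + 15 * 60 * 1000));
URL url = s3Client.generatePresignedUrl(request);

// Stream the bytes straight into the PUT request body.
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.setRequestMethod("PUT");
try (OutputStream out = connection.getOutputStream()) {
    pdfBytes.writeTo(out); // ByteArrayOutputStream.writeTo copies without an extra array
}
System.out.println("S3 responded: " + connection.getResponseCode());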
I am using Spring-MVC and I need to send a MP4 file back to the user. The MP4 files are, of course, very large in size (> 2 GB).
I found this SO thread, Downloading a file from spring controllers, which shows how to stream back a binary file; that should theoretically work for my case. However, what I am concerned about is efficiency.
In one case, an answer suggests loading all the bytes into memory:
byte[] data = SomeFileUtil.loadBytes(new File("somefile.mp4"));
In another case, an answer suggests using IOUtils:
InputStream is = new FileInputStream(new File("somefile.mp4"));
OutputStream os = response.getOutputStream();
IOUtils.copy(is, os);
I wonder if either of these is more memory efficient than simply defining a resource mapping?
<resources mapping="/videos/**" location="/path/to/videos/"/>
The resource mapping may work, except that I need to protect all requests to these videos, and I do not think resource mapping will lend itself to logic that protects the content.
Is there another way to stream back binary data (namely, MP4)? I'd like something that's memory efficient.
I would think that defining a resource mapping would be the cleanest way of handling it. With regards to protecting access, you can simply add /videos/** to your security configuration and define what access you allow for it via something like
<security:intercept-url pattern="/videos/**" access="ROLE_USER, ROLE_ADMIN"/>
or whatever access you desire.
Also, you might consider saving these large MP4s to cloud storage and/or a CDN such as Amazon S3 (with or without CloudFront).
Then you can generate unique URLs which will last as long as you want them to. The download is then handled by Amazon rather than using the computing power, disk space, and memory of your web server to serve up the large resource files. Also, if you use something like CloudFront, you can configure it for streaming rather than download.
Loading the entire file into memory is the worse option: it uses more memory, it doesn't scale, and you don't transmit any data until you've loaded it all, which adds all that latency.
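For completeness, a hedged sketch of the streaming approach in a controller (the mapping, path, and content type are placeholders; assumes Commons IO for IOUtils). Memory use stays bounded by the copy buffer regardless of file size:

@RequestMapping(value = "/videos/{name:.+}", method = RequestMethod.GET)
public void streamVideo(@PathVariable String name, HttpServletResponse response)
        throws IOException {
    // Validate 'name' against path traversal in real code.
    File video = new File("/path/to/videos", name); // placeholder location
    response.setContentType("video/mp4");
    response.setHeader("Content-Length", String.valueOf(video.length()));
    try (InputStream in = new FileInputStream(video)) {
        IOUtils.copy(in, response.getOutputStream()); // copies in small chunks
    }
}

Note that this sketch ignores HTTP Range requests, which video players typically need for seeking; a production version would have to honor the Range header.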
I have some text in a String that I can't write to disk. I need to pass it to a method which only accepts File as a type. What would be the chain of conversions through which I could do this? I imagine I start with ByteArrayInputStream, but where next?
The problem is that java.io.File does not provide any method to read from it; it's just a reference to a file in a file system, and that file may not even exist.
The method you are calling may just get the full address of the file using file.getAbsolutePath() and use that to open an InputStream.
If there's a method that receives an InputStream then you could send your ByteArrayInputStream.
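That conversion is a one-liner; a sketch (the charset is an explicit assumption here):

String text = "some text that never touches the disk";
InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));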
Even if the method could receive a URL, you could open up a little HTTP server and serve the data... but with a File it's kind of difficult.
I'm writing a web application and want the user to be able click a link and get a file download.
I have an interface in a third-party library that I can't alter:
writeFancyData(File file, Data data);
Is there an easy way to create a File object that I can pass to this method, such that writing to it streams to the HTTP response?
Notes:
Obviously I could just write a temporary file, read it back in, and then write it to the output stream of the HTTP response. However, what I'm looking for is a way to avoid the file system I/O, ideally by creating a fake file that, when written to, will instead write to the output stream of the HTTP response.
e.g.
writeFancyData(new OutputStreamBackedFile(response.getOutputStream()), data);
I need to use the writeFancyData method as it writes a file in a very specific format that I can't reproduce.
Assuming writeFancyData is a black box, it's not possible. As a thought experiment, consider an implementation of writeFancyData that did something like this:
public void writeFancyData(File file, Data data) {
    File localFile = new File(file.getPath());
    ...
    // process data from file
    ...
}
Given that the only thing you can return from any extended version of File is the path name, you're just not going to be able to get the data you want into that method. If the signature included some sort of stream you would be in a much better position, but since all you can pass in is a File, this can't be done.
In practice the implementation is probably one of the FileInputStream or FileReader classes, which use the File object really just for the name and then call out to native methods to get a file descriptor and handle the actual I/O.
As dlawrence writes, it is impossible to determine from the API alone what it is doing with the File.
A non-Java approach is to create a named pipe. You could establish a reader for the pipe in your program, create a File on that path, and pass it to the API, as sketched below.
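A POSIX-only sketch of that idea (the pipe path is a placeholder, mkfifo must exist on the system, response is assumed to be in scope, and exception handling is elided). A second thread has to drain the pipe concurrently or the writer will block:

// Create a named pipe and read from it while the API writes to it.
File pipe = new File("/tmp/fancy-data.pipe"); // placeholder path
new ProcessBuilder("mkfifo", pipe.getAbsolutePath()).start().waitFor();

// Reader thread: forward whatever the API writes into the HTTP response.
Thread reader = new Thread(() -> {
    try (InputStream in = new FileInputStream(pipe)) {
        IOUtils.copy(in, response.getOutputStream());
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
});
reader.start();

writeFancyData(pipe, data); // blocks until the reader drains the pipe
reader.join();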
Before doing anything so fancy, I would recommend analyzing performance and verify that disk i/o is indeed a bottleneck.
Given that API, the best you can do is to give it the File for a file in a RAM-disk filesystem (for example tmpfs, typically mounted at /dev/shm on Linux).
And lodge a bug / defect report against the API asking for an overload that takes a Writer or OutputStream argument.