How can I read a Base64 file that comes as a string? - java

I am currently developing a REST service that receives in its request a field containing a file encoded in Base64 (a string of "n" characters). Inside the service logic I convert that character string to a File and save it to a predetermined path.
The problem is that when the file is large (3MB) the service becomes slow and takes a long time to respond.
This is the code I am using:
String filename = "TEXT.DOCX";
BufferedOutputStream stream = null;
// THE FIELD base64file IS THE BASE64 STRING THAT COMES FROM THE REQUEST
byte[] fileByteArray = java.util.Base64.getDecoder().decode(base64file);
// VALIDATE FILE SIZE
if (1 * 1024 * 1024 < fileByteArray.length) {
    logger.info("The file [" + filename + "] is too large");
} else {
    stream = new BufferedOutputStream(new FileOutputStream(new File("C:\\" + filename)));
    stream.write(fileByteArray);
    stream.close();
}
What can I do to avoid this problem, so that my service does not take so long to convert the string into a File?

Buffering does not improve your performance here, as all you are trying to do is write the file as fast as possible. Generally it looks fine; change your code to use the FileOutputStream directly and see if it improves things:
try (FileOutputStream stream = new FileOutputStream(path)) {
    stream.write(bytes);
}
Alternatively, you could use something like Apache Commons to do the task for you:
FileUtils.writeByteArrayToFile(new File(path), bytes);

Try the following, also for large files.
Path outPath = Paths.get(filename);
try (InputStream in = Base64.getDecoder().wrap(base64file)) {
    Files.copy(in, outPath);
}
This keeps only a small buffer in memory. Your code might be slow because it holds the whole decoded file in memory.
wrap takes an InputStream which you should provide, not the entire String.
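A minimal sketch of that streaming approach (assuming, as in the question, that base64file is the Base64 String from the request and filename the target file name; the class and method names are only illustrative). Note that when the whole String is already in memory, the saving is modest; the real win comes from wrapping the request's raw InputStream so the Base64 text never has to be fully materialized.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class Base64FileWriter {

    // Decodes the Base64 content and streams it to disk through a small buffer.
    public static Path writeDecoded(String base64file, String filename) throws IOException {
        Path outPath = Paths.get(filename);
        try (InputStream encoded = new ByteArrayInputStream(base64file.getBytes(StandardCharsets.US_ASCII));
             InputStream decoded = Base64.getDecoder().wrap(encoded)) {
            Files.copy(decoded, outPath, StandardCopyOption.REPLACE_EXISTING);
        }
        return outPath;
    }
}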

From a network point of view:
Both JSON and XML can support large amounts of data exchange. And 3MB is not really huge. But there is a limit on how much the browser can handle (if this call comes from a user interface).
Also, a web server like Tomcat accepts only 2MB POST bodies by default (check maxPostSize: http://tomcat.apache.org/tomcat-6.0-doc/config/http.html#Common_Attributes)
You can also try chunking the request payload (although it shouldn't be required for a 3MB file)
From an implementation point of view:
Write operations on your disk could be slow; this also depends on your OS.
If your file size is really large, you can use Java's FileChannel class with a ByteBuffer (see the sketch after this answer).
To find the cause of the slowness (network delay or code), compare the performance of a simple standalone Java program with the web service call.
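For the FileChannel suggestion above, a minimal sketch (names are illustrative; fileByteArray stands for the decoded byte array from the question):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelWriter {

    // Writes the decoded bytes with a FileChannel; the loop handles partial writes.
    public static void write(byte[] fileByteArray, String filename) throws IOException {
        Path target = Paths.get(filename);
        ByteBuffer buffer = ByteBuffer.wrap(fileByteArray);
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
    }
}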

Related

Java | method to write a datahandler to a file takes more time than expected

I am trying to read mails from MS Exchange using Camel and getting the attachments as DataHandler objects. A 10MB file takes around 3 hours to write to the target location.
File outputFile = new File(someDirectory, someFileName);
DataHandler attachment_data = destination1Attachments.get("someFileName.txt");
try (FileOutputStream fos = new FileOutputStream(outputFile)) {
    attachment_data.writeTo(fos);
}
I have also noticed that sometimes a 6 to 7MB file takes around 2 to 3 minutes, and when another mail comes just after that it takes more time than expected.
Could this be because of GC?
Trying to find the exact root cause or any other method to write the data to the file.
Update 1:
Tried using a BufferedOutputStream around the FileOutputStream as mentioned by @user207421 in the comments. Not much change (just a second or so faster).
This could be due to the default implementation of the write mechanism behind
attachment_data.writeTo(fos);
If DataHandler.getDataSource() != null, then this theory applies.
In that implementation, 8 bytes are read at a time and written to the stream. The number of reads and writes is therefore high, and this might be causing the issue.
Try reading from DataHandler.getInputStream() yourself and write to the file with a larger buffer, as sketched below.
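A minimal sketch of that approach, assuming the attachment is a javax.activation.DataHandler as in the question (the 64 KB buffer size and the surrounding class are illustrative):
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.activation.DataHandler;

public class AttachmentWriter {

    // Copies the attachment with a 64 KB buffer instead of relying on writeTo().
    public static void copyAttachment(DataHandler attachment, File outputFile) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = attachment.getInputStream();
             OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFile))) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}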
One must assume that the object is loaded in memory or that writeTo is very inefficient. Hence specify the DataFlavor and inspect attachment_data.getTransferDataFlavors().
DataFlavor flavor = new DataFlavor(InputStream.class, "application/octetstream");
try (InputStream in = (InputStream) attachment_data.getTransferData(flavor)) {
    // copy 'in' to the output file here
}
Some fiddling may be needed.

Creating Zip file while client is downloading

I'm trying to develop something like Dropbox (a very basic version). Downloading a single file is easy: just use a ServletOutputStream. What I want is: when the client asks for multiple files, I zip the files on the server side and then send them to the user. But if the files are big, it takes too much time to zip them and send them to the user.
Is there any way to send files while they are being compressed?
Thanks for your help.
Part of the Java API for ZIP files is actually designed to provide "on the fly" compression. It fits nicely with both the java.io API and the servlet API, which means this is even... kind of easy (no multithreading required, even for performance reasons, because usually your CPU will be faster at ZIPping than your network will be at sending the contents).
The part you'll be interacting with is ZipOutputStream. It is a FilterOutputStream (which means it is designed to wrap an OutputStream that already exists; in your case, that would be the response's OutputStream), and will compress every byte you send it, using ZIP compression.
So, say you have a get request
protected void doGet(HttpServletRequest req, HttpServletResponse response)
        throws ServletException, IOException {
    // Your code to handle the request
    List<YourFileObject> responseFiles = ... // Whatever you need to do

    // We declare that the response will contain raw bytes
    response.setContentType("application/octet-stream");

    // We open a ZIP output stream
    try (ZipOutputStream zipStream = new ZipOutputStream(response.getOutputStream())) { // This is Java 7, but not that different from Java 6
        // We need to loop over each file you want to send
        for (YourFileObject fileToSend : responseFiles) {
            // We give a name to the file
            zipStream.putNextEntry(new ZipEntry(fileToSend.getName()));
            // and we copy its content
            copy(fileToSend, zipStream);
        }
    }
}
Of course, you should do proper exception handling. A couple of quick notes though:
The ZIP file format mandates that each file has a name, so you must create a new ZipEntry each time you start a new file (you'll probably get an IllegalStateException if you do not, anyway).
Proper use of the API would be to close each entry once you are done writing to it (at the end of the file). BUT: the Java implementation does that for you; each time you call putNextEntry it closes the previous one (if need be) all by itself.
Likewise, you must not forget to close the ZIP stream, because this will properly close the last entry AND flush everything that is needed to create a proper ZIP file. Failure to do so will result in a corrupt file. Here, the try-with-resources statement does this: it closes the ZipOutputStream once everything is written to it.
The copy method here is just what you would use to transfer all the bytes from the original file to the output stream; there is nothing ZIP-specific about it. Just call outputStream.write(byte[] bytes).
**EDIT:** to clarify...
For example, given a YourFileType that has the following methods :
public interface YourFileType {
    public byte[] getContent();
    public InputStream getContentAsStream();
}
Then the copy method could look like this (this is all very basic Java IO; you could use a library such as Commons IO to avoid reinventing the wheel...):
public void copy(YourFileType file, OutputStream os) throws IOException {
    os.write(file.getContent());
}
Or, for a full streaming implementation :
public void copy(YourFileType file, OutputStream os) throws IOException {
    try (InputStream fileContent = file.getContentAsStream()) {
        byte[] buffer = new byte[4096]; // 4096 is kind of a magic number
        int readBytesCount = 0;
        while ((readBytesCount = fileContent.read(buffer)) >= 0) {
            os.write(buffer, 0, readBytesCount);
        }
    }
}
Using this kind of implementation, your client will start receiving a response almost as soon as you start writing to the ZipOutputStream (the only delay being that of internal buffers), meaning it should not time out (unless you spend too long building the content to send, but that would not be the ZIPping part's fault).

Java - using an InputStream as a File

I'm trying to generate a PDF document from an uploaded ".docx" file using JODConverter.
The call to the method that generates the PDF is something like this:
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
I'm using Apache Commons FileUpload to handle uploading the docx file, from which I can get an InputStream object. I'm aware that java.io.File is just an abstract reference to a file in the system.
I want to avoid the disk write (saving the InputStream to disk) and the disk read (reading the saved file in JODConverter).
Is there any way I can get a File object referring to an input stream? Any other way to avoid disk IO would also do!
EDIT: I don't care if this will end up using a lot of system memory. The application is going to be hosted on a LAN with very little to zero number of parallel users.
File-based conversions are faster than stream-based ones (provided by StreamOpenOfficeDocumentConverter), but they require the OpenOffice.org service to be running locally and to have the correct permissions on the files.
Try this method to avoid writing to disk:
convert(java.io.InputStream inputStream, DocumentFormat inputFormat, java.io.OutputStream outputStream, DocumentFormat outputFormat)
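A sketch of what a stream-based conversion could look like, assuming JODConverter 2.x (the com.artofsolving package names and the getFormatByFileExtension lookup are assumptions; check them against your version):
import java.io.InputStream;
import java.io.OutputStream;
import com.artofsolving.jodconverter.DefaultDocumentFormatRegistry;
import com.artofsolving.jodconverter.DocumentFormat;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.StreamOpenOfficeDocumentConverter;

public class StreamConversion {

    // Converts an uploaded docx stream straight to a PDF stream, no temp files.
    public static void convertDocxToPdf(InputStream docxIn, OutputStream pdfOut) throws Exception {
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
        connection.connect();
        try {
            DefaultDocumentFormatRegistry registry = new DefaultDocumentFormatRegistry();
            DocumentFormat docx = registry.getFormatByFileExtension("docx");
            DocumentFormat pdf = registry.getFormatByFileExtension("pdf");
            StreamOpenOfficeDocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
            converter.convert(docxIn, docx, pdfOut, pdf);
        } finally {
            connection.disconnect();
        }
    }
}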
There is no way to do it and keep the code solid. For one, the .convert() method only takes two Files as arguments.
So this would mean you'd have to extend File, which is possible in theory, but very fragile, as you would have to delve into the library code, which can change at any time and make your extended class non-functional.
(well, there is a way to avoid disk writes if you use a RAM-backed filesystem and read/write from that filesystem, of course)
Chances are that Commons FileUpload has written the upload to the filesystem anyhow.
Check if your FileItem is an instance of DiskFileItem. If this is the case, the write implementation of DiskFileItem will try to move the file to the File object you pass. You are not causing any extra disk IO then, since the write already happened.
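A small sketch of that check with Commons FileUpload (the helper class name is illustrative):
import java.io.File;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItem;

public class UploadPersister {

    // FileItem.write() lets a DiskFileItem move its spooled temp file into place,
    // so no extra read/write pass over the bytes is needed.
    public static void persist(FileItem item, File target) throws Exception {
        if (item instanceof DiskFileItem && !item.isInMemory()) {
            item.write(target); // typically just a rename of the temp file
        } else {
            item.write(target); // small in-memory uploads are written out once
        }
    }
}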

Image IO strange behavior resets the input stream

I am seeing strange behavior with the ImageIO.read() method.
I pass an InputStream to this method, and when I try to read it a second time it fails and returns null.
I am trying to upload images to Amazon S3 and I want to create 3 versions of the image: the original and 2 thumbnails. My problem is that to create the 2 thumbnails I need to read the InputStream using ImageIO.read(). If I run this method twice on the same InputStream, I get null for the second read.
I can work around this problem by reading only once and passing the same BufferedImage to the scaling methods. However, I still need the InputStream my method receives so I can pass it to the Amazon S3 service in order to upload the original file as well.
So my question is does anyone have any idea what happens to the input stream after ImageIO reads it for the first time?
Code sample below
public String uploadImage(InputStream stream, String filePath, String fileName, String fileExtension) {
    try {
        String originalKey = filePath + fileName + "." + fileExtension;
        String smallThumbKey = filePath + fileName + ImageConst.Path.SMALL_THUMB + "." + fileExtension;
        String largetThumbKey = filePath + fileName + ImageConst.Path.LARGE_THUMB + "." + fileExtension;
        BufferedImage image = ImageIO.read(stream);
        InputStream smallThumb = createSmallThumb(image, fileExtension);
        InputStream largeThumb = createLargeThumb(image, fileExtension);
        uploadFileToS3(originalKey, stream);
        uploadFileToS3(smallThumbKey, smallThumb);
        uploadFileToS3(largetThumbKey, largeThumb);
        return originalKey;
    } catch (IOException ex) {
        Logger.getLogger(ManageUser.class.getName()).log(Level.SEVERE, null, ex);
    }
    return null;
}
ImageIO.read is going to read to the end of the input stream, meaning there's no data left, which is why you're getting null when you try to read more data from it.
If you want to reuse the input stream, you'll have to call reset() on it; but that'll only work if the underlying InputStream implementation supports resetting, see markSupported() of InputStream.
That's the simple, but naive fix.
Keep in mind, you've already read the image into memory, so you don't really need to do that. This is a little clumsy, but you can write it out to a ByteArrayOutputStream, then build a new ByteArrayInputStream off of that.
If I were doing this, I'd probably read it into a byte array to begin with. Check out Commons IOUtils.read() for that. Then I'd build a new ByteArrayInputStream and reset() that as needed since it definitely supports marking.
I suppose, you could wrap your original input stream in a BufferedInputStream and use mark() to save the starting position within the stream, and reset() to return to it.
To fully answer your question: every stream gets read sequentially (often there is some kind of internal pointer pointing to the current position). After the first ImageIO.read(), the stream is read until its end, thus, any further read operation won't return any data (usually -1 to indicate the end of the stream). Using mark() you can save a certain position and later on return to it using reset().
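To make the buffering approach from the answers above concrete, a minimal sketch (the thumbnail and S3 steps are placeholders for the code in the question):
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class ReusableStream {

    // Buffers the whole upload once; each consumer then gets a fresh stream
    // over the same bytes, so nothing depends on mark()/reset() support.
    public static void process(InputStream stream) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = stream.read(buffer)) != -1) {
            copy.write(buffer, 0, read);
        }
        byte[] bytes = copy.toByteArray();

        BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
        InputStream original = new ByteArrayInputStream(bytes);
        // create the thumbnails from 'image' and upload 'original' to S3 here
    }
}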

Java Heap Space (CMS with huge files)

EDIT:
Got the directory working. Now there's another issue in sight:
The files in the storage are stored with their DB id as a prefix to their file names. Of course I don't want the users to see those.
Is there a way to combine the response redirect and the header settings for filename and size?
best,
A
Hi again,
new approach:
Is it possible to create an IIS-like virtual directory within Tomcat in order to avoid streaming and only make use of a header redirect? I played around with contexts but couldn't get it going...
any ideas?
thx
A
Hi %,
I'm facing a weird issue with the Java heap space which is close to bringing me to the ropes.
The short version is:
I've written a ContentManagementSystem which also needs to handle huge files (>600MB). Tomcat heap settings:
-Xmx700m
-Xms400m
The issue is that uploading huge files works, even though it's slow. Downloading files results in a Java heap space exception.
Trying to download a 370MB file makes Tomcat jump to a 500MB heap (which should be ok) and end in a Java heap space exception.
I don't get it, why does upload work and download not?
Here's my download code:
byte[] byt = new byte[1024 * 1024 * 2];
int read;
response.setHeader("Content-Disposition", "attachment;filename=\"" + fileName + "\"");
FileInputStream fis = null;
OutputStream os = null;
fis = new FileInputStream(new File(filePath));
os = response.getOutputStream();
BufferedInputStream buffRead = new BufferedInputStream(fis);
while ((read = buffRead.read(byt)) > 0) {
    os.write(byt, 0, read);
    os.flush();
}
buffRead.close();
os.close();
If I'm getting it right the buffered reader should take care of any
memory issue, right?
Any help would be highly appreciated since I ran out of ideas
Best regards,
W
If I'm getting it right the buffered
reader should take care of any memory
issue, right?
No, that has nothing to do with memory issues, it's actually unnecessary since you're already using a buffer to read the file. Your problem is with writing, not with reading.
I can't see anything immediately wrong with your code. It looks as though Tomcat is buffering the entire response instead of streaming it. I'm not sure what could cause that.
What does response.getBufferSize() return? And you should try setting response.setContentLength() to the file's size; I vaguely remember that a web container under certain circumstances buffers the entire response in order to determine the content length, so maybe that's what's happening. It's good practice to do it anyway since it enables clients to display the download size and give an ETA for the download.
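A minimal sketch of that suggestion (names are illustrative; it assumes the file is under 2 GB so the int cast is safe):
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;

public class DownloadHelper {

    // Declares the size up front so the container does not buffer the whole
    // response to compute Content-Length, then streams with a small buffer.
    public static void send(File file, String fileName, HttpServletResponse response) throws IOException {
        response.setContentLength((int) file.length());
        response.setHeader("Content-Disposition", "attachment;filename=\"" + fileName + "\"");
        byte[] buffer = new byte[8 * 1024];
        try (InputStream in = new FileInputStream(file);
             OutputStream out = response.getOutputStream()) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}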
Try using the setBufferSize and flushBuffer methods of the ServletResponse.
You'd better use java.nio for that, so you can read resources partially and free resources that have already been streamed!
Otherwise, you end up with memory problems despite the settings you've applied to the JVM environment.
My suggestions:
The quick-n-easy fix: use a smaller array! Yes, it loops more, but this will not be a problem. 5 kilobytes is just fine. You'll know within minutes whether this works adequately for you.
byte[] byt = new byte[1024*5];
A little bit harder: If you have access to sendfile (like in Tomcat with the Http11NioProtocol -- documentation here), then use it
A little bit harder, again: switch your code to Java NIO's FileChannel. I have very, very similar code running on equally large files with hundreds of concurrent connections and similar memory settings with no problem. NIO is faster than plain old Java streams in these situations. It uses DMA (Direct Memory Access), allowing the data to go from disk to the NIC without being copied through the JVM. Here is a code snippet from my own code base; I've ripped out much to show the basics. FileChannel.transferTo() is not guaranteed to send every byte, so it is called in a loop.
// 'file' wraps the stored file; 'start' and 'length' are the offset and the number of bytes to send
long total = 0;
WritableByteChannel destination = Channels.newChannel(response.getOutputStream());
FileChannel source = file.getFileInputStream().getChannel();
while (total < length) {
    long sent = source.transferTo(start + total, length - total, destination);
    total += sent;
}
The following code streams data to the client while allocating only a small buffer (BUFFER_SIZE; this is a soft point, since you may want to adjust it):
private static final int OUTPUT_SIZE = 1024 * 1024 * 50; // 50 MB
private static final int BUFFER_SIZE = 4096;

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String fileName = "42.txt";
    // build response headers
    response.setStatus(200);
    response.setContentLength(OUTPUT_SIZE);
    response.setContentType("text/plain");
    response.setHeader("Content-Disposition",
            "attachment;filename=\"" + fileName + "\"");
    response.flushBuffer(); // write HTTP headers to the client

    // streaming result
    InputStream fileInputStream = new InputStream() { // fake input stream
        int i = 0;

        @Override
        public int read() throws IOException {
            if (i++ < OUTPUT_SIZE) {
                return 42;
            } else {
                return -1;
            }
        }
    };
    ReadableByteChannel input = Channels.newChannel(fileInputStream);
    WritableByteChannel output = Channels.newChannel(response.getOutputStream());
    ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
    while (input.read(buffer) != -1) {
        buffer.flip();
        output.write(buffer);
        buffer.clear();
    }
    input.close();
    output.close();
}
Are you required to serve files using Tomcat? For this kind of task we used a separate download mechanism. We chained Apache -> Tomcat -> storage and then added rewrite rules for downloads. That way you bypass Tomcat and Apache serves the file to the client (Apache -> storage). But it only works if you store files as files; if you read from a DB or another type of non-file storage, this solution cannot be used. The overall scenario is that you generate download links for files as e.g. domain/binaries/xyz... and write a redirect rule for domain/files using Apache mod_rewrite.
Do you have any filters in the application, or do you use the tcnative library? You could try profiling it with jvisualvm.
Edit: Small remark: note that you have an HTTP response splitting attack possibility in the setHeader call if you do not sanitize fileName.
Why don't you use Tomcat's own FileServlet?
It can surely serve files much better than you can possibly imagine.
A 2 MB buffer is way too large! A few KB should be ample. Megabyte-sized objects are a real issue for the garbage collector, since they often need to be treated separately from "normal" objects (normal == much smaller than a heap generation). To optimize I/O, your buffer only needs to be slightly larger than your I/O buffer size, i.e. at least as large as a disk block or network packet.
