I have a Struts 1 web application that needs to upload fairly large files (>50 MBytes) from the client to the server. I'm currently using its built-in org.apache.struts.upload.DiskMultipartRequestHandler to handle the HTTP POST multipart/form-data requests. It's working properly, but it's also very very slow, uploading at about 100 KBytes per second.
Downloading the same large files from server to client is more than 10 times faster. I don't think this is just the difference between my ISP's upload and download speeds, because transferring the same file to the same server with a simple FTP client takes less than a third of the time.
I've looked at replacing the built-in DiskMultipartRequestHandler with the newer org.apache.commons.fileupload package, but I'm not sure how to modify this to create the MultipartRequestHandler that Struts 1 requires.
Barend commented below that there is a 'bufferSize' parameter that can be set in web.xml. I increased the size of the buffer to 100 KBytes, but it didn't improve the performance. Looking at the implementation of DiskMultipartRequestHandler, I suspect that its performance could be limited because it reads the stream one byte at a time looking for the multipart boundary characters.
Is anyone else using Struts to upload large files?
Has anyone customized the default DiskMultipartRequestHandler supplied with Struts 1?
Do I just need to be more patient while uploading the large files? :-)
The page StrutsFileUpload on the Apache wiki contains a bunch of configuration settings you can use. The one that stands out for me is the default buffer size of 4096 bytes. If you haven't already, try setting this to something much larger (but not excessively large as a buffer is allocated for each upload). A value of 2MB seems reasonable. I suspect this will improve the upload rate a great deal.
Use Apache Commons FileUpload;
it gives you more flexibility when uploading files. You can configure the maximum upload file size and a temporary file location for swapping the file to disk (this improves performance).
Please see this link: http://commons.apache.org/fileupload/using.html
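A minimal sketch of parsing a multipart request with Commons FileUpload, assuming commons-fileupload is on the classpath; the threshold, temp directory and size limit values here are just illustrative:

import java.io.File;
import java.util.List;

import javax.servlet.http.HttpServletRequest;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class CommonsUploadExample {

    // Parses a multipart/form-data request; items above the threshold are
    // streamed to the temp directory instead of being buffered in memory.
    public List<FileItem> parseUpload(HttpServletRequest request) throws Exception {
        DiskFileItemFactory factory = new DiskFileItemFactory();
        factory.setSizeThreshold(1024 * 1024);                 // 1 MB in-memory threshold
        factory.setRepository(new File("/tmp/uploads"));       // illustrative temp location

        ServletFileUpload upload = new ServletFileUpload(factory);
        upload.setSizeMax(200L * 1024 * 1024);                 // reject uploads over 200 MB

        return upload.parseRequest(request);
    }
}

Wiring this into Struts still means implementing the MultipartRequestHandler interface around it; if I remember correctly, Struts 1.1 and later already ship an org.apache.struts.upload.CommonsMultipartRequestHandler built on this library, selectable via the multipartClass attribute of the controller configuration, so check your version first.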
I need to download multiple files through FTP in Java. For this I wrote code using FTPClient that downloads the files one by one.
I need to take files from a server and download them to another network. After writing the code, I found that downloading each file takes a lot of time because the files are huge (more than 10 GB). I decided to multithread the process, i.e. download multiple files at a time. Can anybody help me write the FTP code for a multithreaded environment?
Although I feel that multithreading won't help, as the network bandwidth would remain the same and would be divided among multiple threads, leading to slow downloads again. Please suggest!
There are several things to check first:
your download speed
remote server's upload speed
maximum server upload speed for each connection
If the server limits the transfer speed for a single file to a threshold lower than its maximum transfer speed, you can gain something by using multi-threading (e.g. with a limit of 10 Kb/s per connection and a maximum upload of 100 Kb/s, you can theoretically run 10 downloads in parallel). If not, multi-threading will not help you.
Also, if your download link is already saturated (a single download fills all your bandwidth, or the server's upload bandwidth is greater than your download bandwidth), multi-threading will not help either.
If multi-threading would be useful in your case, just instantiate a new connection for each file and run it in a separate thread.
I feel that multithreading won't help, as the network bandwidth would remain the same and would be divided among multiple threads, leading to slow downloads again.
That could well be true. Indeed, if you have too many threads trying to download files at the same time, you are likely to either overload the FTP server or cause network congestion. Both can result in a net decrease in the overall data rate.
The solution is to use a bounded thread pool for the download threads, and tune the pool size.
It is also a good idea to reuse connections where possible, since creating a connection and authenticating the user take time ... and CPU resources at both ends.
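A rough sketch of that idea using Apache Commons Net and a fixed-size pool; the host, credentials and local target path are placeholders, and the pool size is something you would tune:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

public class ParallelFtpDownloader {

    // Downloads each remote file in its own task, with at most poolSize
    // transfers running at the same time.
    public static void downloadAll(List<String> remoteFiles, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (final String remote : remoteFiles) {
            pool.submit(new Runnable() {
                public void run() {
                    FTPClient ftp = new FTPClient();
                    try {
                        ftp.connect("ftp.example.com");                  // placeholder host
                        ftp.login("user", "password");                   // placeholder credentials
                        ftp.setFileType(FTP.BINARY_FILE_TYPE);
                        ftp.enterLocalPassiveMode();
                        OutputStream out = new BufferedOutputStream(
                                new FileOutputStream("/local/dir/" + remote)); // placeholder target
                        try {
                            ftp.retrieveFile(remote, out);
                        } finally {
                            out.close();
                        }
                        ftp.logout();
                    } catch (Exception e) {
                        e.printStackTrace();                             // real code should log and retry
                    } finally {
                        try { ftp.disconnect(); } catch (Exception ignored) { }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}

Start with a small pool (two to four threads) and increase it only while the aggregate throughput still improves; beyond that point you are just splitting the same bandwidth.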
I have been using Apache POI to process data coming from big files. Reading was done via the SAX event API which makes it efficient for large sets of data without consuming much memory.
However, there is also a requirement that I need to update an existing template for the final report. This template may be more than 10 MB (even 20 MB in some cases).
Do you know a way to efficiently update a big template file (xlsx)? Currently I am reading the entire contents of the template into memory and modifying those contents (using XSSF from POI). My current method works for small files (under 5 MB), but for bigger files it fails with an out-of-memory exception.
Is there a solution for this in Java (not necessarily using Apache POI)? Open-source/free solutions are preferred, but commercial is also fine as long as the price is reasonable.
Thank you,
Iulian
For large spreadsheet handling, it is advisable to use SXSSF.
As far as I can tell, the streaming classes are a bit slower than HSSF and XSSF, but far superior when it comes to memory management (feel free to correct me).
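A minimal sketch of writing a large sheet with SXSSF; the window of 100 rows is just an example value, and rows that fall outside the window are flushed to a temporary file instead of staying on the heap:

import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class StreamingWriteExample {

    public static void main(String[] args) throws Exception {
        // Keep only the last 100 rows in memory; older rows are flushed to disk.
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        Sheet sheet = workbook.createSheet("data");

        for (int rownum = 0; rownum < 1000000; rownum++) {
            Row row = sheet.createRow(rownum);
            for (int col = 0; col < 10; col++) {
                Cell cell = row.createCell(col);
                cell.setCellValue("r" + rownum + "c" + col);
            }
        }

        FileOutputStream out = new FileOutputStream("/tmp/big-report.xlsx"); // illustrative path
        workbook.write(out);
        out.close();
        workbook.dispose();   // deletes the temporary files backing the flushed rows
    }
}

Note that SXSSF is a write-oriented streaming API; it also has a constructor that takes an existing XSSFWorkbook as a template, though as far as I know you can only append new rows that way, not rewrite rows already present in the template.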
This guy has made a few classes that can read xlsx files and process them as XML. They return an array of Strings; these Strings are effectively the rows of the xlsx file.
Link:
You could then use these arrays to load the rows one at a time, as a stream, instead of all of them at once.
Most likely the error you are facing is about heap space (java.lang.OutOfMemoryError: Java heap space), which is thrown when you try to put more data into the heap than the JVM can accommodate. In many cases you are good to go with just increasing the heap size by specifying (or altering, if present) the -Xmx parameter, similar to the following:
-Xmx1024m
We are working with an enterprise application that stores XML logs to secondary storage as part of each request's processing. Writing the logs to secondary storage is consuming a lot of CPU time, which affects the performance of the entire application.
I tried to improve this with the following known methods:
Using ZipOutputStream to write the bytes to a ZIP file. This gives a smaller file size, but it takes more time than writing the bytes to a plain text file.
Using GZIPOutputStream to write the compressed data to a GZIP file (roughly as in the sketch below); performance-wise this is also relatively slow.
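A minimal sketch of the second variant, assuming the XML log entry is already available as a String (the buffer size and output path are arbitrary):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class XmlLogWriter {

    // Writes one XML log entry to a .gz file; the buffer keeps the number of
    // small writes to the underlying file handle low.
    public static void writeCompressed(String xml, String path) throws IOException {
        GZIPOutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(new FileOutputStream(path), 64 * 1024));
        try {
            out.write(xml.getBytes("UTF-8"));
        } finally {
            out.close();   // also finishes the GZIP stream
        }
    }
}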
So can anybody please suggest the best way to achieve good I/O performance together with a small file size?
Server loading static resources too slowly - what server optimizations can I make?
Images and CSS content are loading way too slowly; relatively small files are taking over 1 second each to load. What optimizations can I make server-side to reduce those load times (other than increasing server processing power or network speed)?
The server is WebSphere.
There are plenty of possibilities (sorted by importance):
Set proper Expires and Last-Modified headers for all static resources (a sketch of a caching filter follows this list). This can dramatically reduce the number of requests for static resources and thus the server load. A request that is never made is the fastest request, with no payload at all.
Serve static resources from a separate cookie-less (sub-)domain.
Use CSS sprites to combine often-used graphics like logos and icons into one single large image.
Combine all your CSS into a single file or just a few files. This reduces the overall request count and improves frontend performance, too.
Optimize your image sizes losslessly with tools like PngOut.
Pre-gzip your CSS (and JS) files and serve them directly from memory. Do not read them from hard disk and compress them on the fly.
Use a library like jawr if you do not want to do all these things on your own. jawr can handle many of them for you without a negative impact on your development.
Let an Apache web server serve this static content for you.
Use something like mod_proxy, which relies on your caching headers to serve the content for you. Apache is faster at serving static resources and, more importantly, it can run on another system in front of your WebSphere server.
Use a CDN for serving your static content.
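As promised above, a minimal sketch of a caching filter for the Expires/Cache-Control point, assuming it is mapped to your static paths in web.xml (the one-week lifetime is just an example):

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter, mapped e.g. to /static/* in web.xml, that adds
// far-future caching headers so browsers stop re-requesting unchanged files.
public class StaticCacheFilter implements Filter {

    private static final long ONE_WEEK_SECONDS = 7L * 24 * 60 * 60;

    public void init(FilterConfig config) {
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        response.setHeader("Cache-Control", "public, max-age=" + ONE_WEEK_SECONDS);
        response.setDateHeader("Expires", System.currentTimeMillis() + ONE_WEEK_SECONDS * 1000);
        chain.doFilter(req, res);
    }

    public void destroy() {
    }
}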
Is it possible to wrap these file resources in a .jar file, then use the Java Zip and/or Java Jar APIs to read them?
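It should be possible; a minimal sketch with java.util.jar, where the jar path and entry name are placeholders:

import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarResourceExample {

    public static void main(String[] args) throws Exception {
        JarFile jar = new JarFile("/opt/app/static-resources.jar");   // placeholder path
        try {
            JarEntry entry = jar.getJarEntry("css/site.css");         // placeholder entry
            if (entry == null) {
                throw new IllegalArgumentException("No such entry in jar");
            }
            InputStream in = jar.getInputStream(entry);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    System.out.write(buf, 0, n);   // copy the entry to stdout as a demo
                }
            } finally {
                in.close();
            }
        } finally {
            jar.close();
        }
    }
}

This only shows that reading from a jar works; whether it is actually faster than serving the files straight from disk would need measuring.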
If you employ a gzip filter to compress the output of static resources, make sure to exclude images: they are already compressed, so gzipping them on the server side before responding just adds overhead.
You may want to read this Using IBM HTTP Server diagnostic capabilities with WebSphere
and this WebSphere tuning for the impatient: How to get 80% of the performance improvement with 20% of the effort
Make sure keep-alive is on and functioning; it reduces the overall network overhead required. Please refer to this.
Also, make sure you have enough memory allocated to the VM running the server. Using GC stats to log memory usage and garbage collection activity is a good idea, e.g. add these flags to the Java VM:
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
Simply put, our system consists of a Server and an Agent. The Agent generates a huge binary file, which may need to be transferred to the Server.
Given:
The system must cope with files up to 1 GB now, which is likely to grow to 10 GB within 2 years.
The transfer must be over HTTP, because other ports may be closed.
This is not a file sharing system - the Agent just needs to push the file to the Server.
Both the Agent and the Server are written in Java.
The binary file may contain sensitive information, so the transfer must be secure.
I am looking for techniques and libraries to help me with transferring huge files. Some of the topics which I am aware of are:
Compression Which one to choose? We do not limit ourselves to gzip or deflate, just because they are the most popular for HTTP traffic. If there is some unusual compression scheme, which yields better results for our task - so be it.
Splitting Obviously, the file needs to be split and transferred in several parallel sessions.
Background Transferring a huge file takes a long time. Does it affect the solution, if at all?
Security Is HTTPS the way to go? Or should we take another approach, given the volume of data?
off-the-shelf I am fully prepared to code it myself (should be fun), but I cannot avoid the question whether there are any off-the-shelf solutions satisfying my demands.
Has anyone encountered this problem in their products and how was it dealt with?
Edit 1
Some may question the choice of HTTP as the transfer protocol. The thing is that the Server and the Agent may be quite remote from each other, even if located in the same corporate network. We have already faced numerous issues related to the fact that customers keep only HTTP ports open on the nodes in their corporate networks. That does not leave us much choice but to use HTTP. Using FTP is fine, but it would have to be tunneled through HTTP - does that mean we still have all the benefits of FTP, or will it be crippled to the point where other alternatives are more viable? I do not know.
Edit 2
Correction - HTTPS is always open and sometimes (but not always) HTTP is open as well. But that is it.
You can use any protocol on port 80. Using HTTP is a good choice, but you don't have to use it.
Compression Which one to choose? We do not limit ourselves to gzip or deflate, just because they are the most popular for HTTP traffic. If there is some unusual compression scheme, which yields better results for our task - so be it.
The best compression depends on the content. I would use Deflater for simplicity; however, BZIP2 can give better results (it requires a library).
For your file type, you may find that applying some compression specific to that type first can make the data sent smaller.
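A minimal sketch of the Deflater route, compressing stream to stream at the highest level; the paths and buffer size are placeholders:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class FileCompressor {

    // Compresses the source file into target using DEFLATE at best compression.
    public static void compress(String source, String target) throws IOException {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        InputStream in = new BufferedInputStream(new FileInputStream(source));
        OutputStream out = new DeflaterOutputStream(
                new BufferedOutputStream(new FileOutputStream(target)), deflater, 64 * 1024);
        try {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            out.close();     // finishes the deflate stream
            in.close();
            deflater.end();  // releases native zlib resources
        }
    }
}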
Splitting Obviously, the file needs to be split and transfered in several parallel sessions.
This is not obvious to me. Downloading data in parallel improves performance by grabbing more of the available bandwidth (i.e. squeezing out other users of the same bandwidth). This may be undesirable or even pointless (if there are no other users).
Background Transfering a huge file takes a long time. Does it affect the solution, if at all?
You will want the ability to restart the transfer from any point.
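One simple way to get that over HTTP is to let the Server report how many bytes it already has and have the Agent continue from that offset; a rough sketch, where the endpoint and the offset parameter are entirely hypothetical:

import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumableUpload {

    // Streams the file to the server starting at the given offset, without
    // buffering the whole file in memory (chunked transfer encoding).
    public static void uploadFrom(String file, long offset, String endpoint) throws Exception {
        URL url = new URL(endpoint + "?offset=" + offset);           // hypothetical resume parameter
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(64 * 1024);

        RandomAccessFile in = new RandomAccessFile(file, "r");
        OutputStream out = conn.getOutputStream();
        try {
            in.seek(offset);
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            out.close();
            in.close();
        }

        if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IllegalStateException("Upload failed: " + conn.getResponseCode());
        }
    }
}

With an https:// URL the same code runs over TLS, since HttpsURLConnection extends HttpURLConnection.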
Security Is HTTPS the way to go? Or should we take another approach, given the volume of data?
I am sure it's fine, regardless of the volume of data.
off-the-shelf I am fully prepared to code it myself (should be fun), but I cannot avoid the question whether there are any off-the-shelf solutions satisfying my demands.
I would try using existing web servers to see if they are up to the job. I would be surprised if there isn't a free web server which does all the above.
Here is a selection http://www.java-sources.net/open-source/web-servers