Using REST web services (WS) to upload files whose size is between 10 and 50 MB
At the moment, we use Java, JAX-RS and CXF for this.
The behavior of this stack is to buffer uploaded files by writing them into a temporary file (because we have large files). This is fine for most users.
Is it possible to stream directly from the socket input?
(not from a whole file in memory nor from a temporary file)
My purpose is to have less overhead on I/O and CPU (each file is written twice: once to the buffer and once to the final location). The WS only has to write the files (sometimes several in the same HTTP request) to a path that I calculate from the HTTP query string.
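For what it's worth, here is a minimal sketch of what such a resource could look like with plain JAX-RS, assuming a non-multipart PUT whose raw entity is consumed as an InputStream and copied once to the final path (computeTargetPath and the /data/uploads base directory are hypothetical):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.ws.rs.Consumes;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/files")
public class FileUploadResource {

    @PUT
    @Consumes(MediaType.APPLICATION_OCTET_STREAM)
    public Response upload(@QueryParam("name") String name, InputStream body) throws IOException {
        // Single write: the request entity stream is copied straight to the final location.
        java.nio.file.Path target = computeTargetPath(name);
        Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        return Response.ok().build();
    }

    // Hypothetical helper deriving the target path from the query string.
    private java.nio.file.Path computeTargetPath(String name) {
        return Paths.get("/data/uploads", name);
    }
}

If the temporary files come from the multipart attachment handling, reading a raw octet-stream entity as an InputStream is one way to sidestep them.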
Thanks for your attention
Related
I have a requirement where :
I have multiple files on the server location
My response size limit is 100 MB max.
So my use case here is to merge the files, produce zip archives and send them as attachments to the client browser. But the client browser has only one button, "DOWNLOAD ALL". On clicking that button, all files located on the server should get downloaded to the client as multiple zip files.
For example, I have five files:
1.txt - 24 MB
2.txt - 30 MB
3.txt - 30 MB
4.txt - 30 MB
5.txt - 40 MB
So, by clicking the button, two zip files should get downloaded: 1.zip contains 1.txt, 2.txt and 3.txt because together they come to around 100 MB, and 2.zip will contain 4.txt and 5.txt.
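To make the intended split concrete, a greedy grouping like the sketch below would produce exactly that result (the 100 MB cap comes from my requirement; everything else, including the class and method names, is just an assumption, and the zipping/streaming part is not shown):

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ZipGrouping {

    // Greedily split files into groups whose combined size stays within maxBytes;
    // each group would then be zipped and sent as 1.zip, 2.zip, and so on.
    static List<List<File>> groupBySize(List<File> files, long maxBytes) {
        List<List<File>> groups = new ArrayList<>();
        List<File> current = new ArrayList<>();
        long currentSize = 0;
        for (File f : files) {
            if (!current.isEmpty() && currentSize + f.length() > maxBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f);
            currentSize += f.length();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}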
I came across multiple things on the web, like zipping a file and sending it as a response, but that sends only a single response, and the channel gets closed after the response is transferred.
http://techblog.games24x7.com/2017/07/18/streaming-large-data-as-zipped-file-from-rest-services/
https://javadigest.wordpress.com/2012/02/13/downloading-multiple-files-using-multipart-response/
Moreover, the UI can make multiple requests to the endpoint, but I have multiple users, so I may need to keep track of users and files. Any idea or implementation will be appreciated.
Thanks in advance.
"Moreover, UI can have multiple requests to the endpoint, but I have multiple users, so I may need to keep track of users and files. Any idea or implementation will be appreciated."
This can be solved by using Spring Boot Actuator and Log4j. Using the actuator endpoints you can monitor where a request is coming from.
It can be done using this dependency:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
and in the application.properties file:
management.endpoints.web.exposure.include=*
since by default only two endpoints (health and info) are exposed over the web.
Regarding the multiple file download... is there a specific limit? Like, must each zip file be at most 100 MB in size?
If yes, then you need to orchestrate your solution by calling multiple endpoints one by one and downloading each result.
Maybe I have a workaround for this. If you take a look at the video-streaming concept, it is the same as your use case, since huge files are served to users; so I would like to suggest it as a possible solution.
The concept of streaming is to load specific chunks of the required file, based on the client's request, until the stream ends. So, assuming the file size is 100 MB, the client can specify the desired chunk by passing the Range HTTP header, and your backend should understand this header and respond with the required data until the file ends.
Take a look at the link below; maybe it will give you more details:
https://saravanastar.medium.com/video-streaming-over-http-using-spring-boot-51e9830a3b8
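If it helps, here is a hedged sketch of what Range-based streaming could look like with Spring MVC and ResourceRegion (the /data base directory, the controller name and the 1 MB chunk size are assumptions, not anything from your setup):

import java.io.IOException;
import java.nio.file.Paths;
import org.springframework.core.io.UrlResource;
import org.springframework.core.io.support.ResourceRegion;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpRange;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.MediaTypeFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FileStreamController {

    private static final long CHUNK_SIZE = 1024 * 1024; // serve at most 1 MB per ranged request

    @GetMapping("/files/{name}")
    public ResponseEntity<?> stream(@PathVariable String name,
                                    @RequestHeader HttpHeaders headers) throws IOException {
        UrlResource file = new UrlResource(Paths.get("/data", name).toUri()); // assumed base directory
        long length = file.contentLength();
        MediaType mediaType = MediaTypeFactory.getMediaType(file).orElse(MediaType.APPLICATION_OCTET_STREAM);

        // No Range header: send the whole file in one response.
        if (headers.getRange().isEmpty()) {
            return ResponseEntity.ok().contentType(mediaType).body(file);
        }

        // Range header present: answer with 206 Partial Content for the requested slice,
        // capped at CHUNK_SIZE so the client keeps asking for the next piece.
        HttpRange range = headers.getRange().get(0);
        long start = range.getRangeStart(length);
        long end = range.getRangeEnd(length);
        long count = Math.min(CHUNK_SIZE, end - start + 1);
        ResourceRegion region = new ResourceRegion(file, start, count);
        return ResponseEntity.status(HttpStatus.PARTIAL_CONTENT).contentType(mediaType).body(region);
    }
}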
I have a server-side application that runs through a large number of image URLs and uploads the images from these URLs to S3.
The files are served over HTTP. I download them using the InputStream I get from an HttpURLConnection via its getInputStream method. I hand the InputStream to the AWS S3 client's putObject method (AWS Java SDK v1) to upload the stream to S3. So far so good.
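A minimal sketch of that flow with the v1 SDK, assuming the source server sends a Content-Length header (the class name and the bucketName, key and imageUrl parameters are placeholders):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class S3UrlUploader {

    // Stream the image straight from the HTTP connection into S3; the length taken
    // from the Content-Length header lets the SDK validate the upload.
    void uploadFromUrl(AmazonS3 s3Client, String bucketName, String key, String imageUrl) throws Exception {
        HttpURLConnection connection = (HttpURLConnection) new URL(imageUrl).openConnection();
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(connection.getContentLengthLong());
        try (InputStream in = connection.getInputStream()) {
            s3Client.putObject(bucketName, key, in, metadata);
        }
    }
}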
I am trying to introduce a new external image data source. The problem with this data source is that the HTTP server serving these images does not return a Content-Length HTTP header. This means I cannot tell how many bytes the image will be, which is a number the AWS S3 client requires to validate that the image was correctly uploaded from the stream to S3.
The only ways I can think of to deal with this issue are to either get the server owner to add a Content-Length HTTP header to their response (unlikely), or to download the file to a memory buffer first and then upload it to S3 from there.
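For the second option, the buffer-first variant could look like this sketch (same placeholder names as above; the whole image is held in a byte[] so the length is known before the upload starts):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class S3BufferedUploader {

    // Download the image fully into memory first, then upload it with a known length.
    void uploadBuffered(AmazonS3 s3Client, String bucketName, String key, String imageUrl) throws Exception {
        HttpURLConnection connection = (HttpURLConnection) new URL(imageUrl).openConnection();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = connection.getInputStream()) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        byte[] bytes = buffer.toByteArray();
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        s3Client.putObject(bucketName, key, new ByteArrayInputStream(bytes), metadata);
    }
}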
These are not big files, but I have many of them.
When considering downloading the file first, I am worried about the memory footprint and concurrency implications (not being able to upload and download chunks of the same file at the same time).
Since I am dealing with many small files, I suspect the concurrency issues might be "resolved" if I focus on concurrency across multiple files instead of within a single file. So instead of concurrently downloading and uploading chunks of the same file, I would use my I/O effectively by downloading one file while uploading another.
I would love your ideas on how to do this, best practices, pitfalls or any other thought on how to best tackle this issue.
We are using the Commons FileUpload API for handling file uploads. We use a disk item factory, where the file is written to a temporary location; we then get an InputStream from the file item to encrypt the file and write it to the final location. My question is this: when we run the encryption as a standalone application, it takes 25 seconds (for a 1 GB file), but when we use the same code in the web application it takes 12 minutes. Stranger still, this works fine on a different server (both the standalone and the web application take the same time to encrypt). So, is there any issue with the FileUpload API that causes some kind of lock on the file even after it is completely written to the temp location, which in turn slows down our encryption?
The issue was that the encryption block of the code had log statements, so a log line was written out for each chunk that was encrypted; once the logging was commented out, it was really fast.
I was tasked at work with sending our clients a file via Finatra, directly from disk without loading it into memory (these are very large files). Here are my questions:
0) How do I interact with the disk I/O without ever loading the information into memory?
1) When connecting a file InputStream to an HTTP OutputStream, does that actually load the data into RAM?
2) I thought everything has to be loaded into memory to be worked with, transported, and sent. How can one actually send contents directly to a network port without loading them into memory?
3) Would the flow of data be from the disk, to the CPU registers, onto the network adapter's buffer to be sent? How do I ensure that this is the flow without loading RAM?
4) Is it possible to do this in Finatra?
It's unfortunately not possible with Finatra 1.6. Streaming seems to be on the roadmap for 2.0, but there is no official word on its release. Right now, Request => Response relies on fully materialized inputs and outputs. The most efficient way to deal with bodies in Finatra is to keep them in ChannelBuffers so that there is at least only one instance of the bytes materialized.
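Just to illustrate the idea behind question 1) in plain Java (this is not Finatra API, only a conceptual sketch): a streaming copy only ever holds one small buffer in RAM, no matter how large the file is:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Copy from disk to a network (or any other) OutputStream using a fixed 8 KB buffer:
    // only the buffer lives in RAM, never the whole file.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        out.flush();
    }
}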
I am trying to upload a file. My front-end application is in PHP and the backend engine is in Java. They communicate through the PHP-Java-Bridge.
My first approach was: when a file is posted to the PHP page, it retrieves the file's content:
$filedata= file_get_contents($tmpUploadedLocation);
and then passes this information to a Java EJB façade which accepts a byte array: saveFileContents(byte[] contents).
Here is how, in PHP, I converted $filedata into a byte array:
$bytearrayData = unpack("C*",$filedata);
and finally called the Java service (the Java service object was retrieved using php-java-bridge):
$javaService->saveFileContents($bytearrayData);
This works fine if the file size is small, but if the size goes above 2.9 MB, I receive an error and hence the file contents are not saved to disk.
Fatal error: Allowed memory size of 134217728 bytes exhausted //This is PHP side error due to unpack
I am not sure how to do this; the above method is clearly not the right one. Please note that I have a few constraints:
The engine (Java) is responsible for saving and retrieving the contents.
PHP-HTML is the front-end application; it could be anything, for now it's just PHP.
PHP communicates with Java using PHP-Java-Bridge.
The EJB's methods are accessed by PHP for saving and retrieving information.
Everything was working fine with the above combination, but now it's about uploading and saving documents. It is the EJB (the application engine's access point) that will be used by any front-end application (PHP, or another Java application through remote interface lookups).
My question is: how can the file contents be sent from PHP to Java without breaking anything (memory-wise)?
Instead of converting the file into an array, I'd try to pass it as a string. Encode the string into base64 in PHP and decode it into a byte array in Java.
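On the Java side, the decoding could be as small as this sketch (saveFileContentsBase64 is a hypothetical extra façade method that delegates to the existing saveFileContents(byte[]) from the question):

import java.util.Base64;

public class FileFacadeBase64 {

    // Hypothetical overload: accepts the base64 string sent by PHP and decodes it
    // back into the byte array expected by the existing saveFileContents(byte[]).
    public void saveFileContentsBase64(String base64Contents) {
        byte[] contents = Base64.getDecoder().decode(base64Contents);
        saveFileContents(contents);
    }

    private void saveFileContents(byte[] contents) {
        // existing EJB method from the question (body omitted here)
    }
}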
Another option is to pass the file through the filesystem. Some Linux systems have /dev/shm or /run/shm mounted as tmpfs, which is often a good way to pass temporary data between programs without incurring a hard-drive overhead. A typical tmpfs workflow is: 1) create a folder; 2) remove old files from it (e.g. files older than a minute); 3) save the new file; 4) pass the file path to Java; 5) remove the file. Step 2 is important in order not to waste RAM if steps 3-5 are not completed for some reason.
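Steps 4 and 5 on the Java side could then look roughly like this sketch (saveFileFromPath is a hypothetical method; it assumes PHP passes the tmpfs file path as a plain string and that the existing saveFileContents(byte[]) does the real work):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TmpfsFileFacade {

    // Hypothetical method: read the file PHP dropped into tmpfs, hand the bytes to
    // the existing saveFileContents(byte[]), then delete the temporary file (step 5).
    public void saveFileFromPath(String tmpfsPath) throws IOException {
        Path tmp = Paths.get(tmpfsPath); // e.g. a file under /dev/shm or /run/shm
        byte[] contents = Files.readAllBytes(tmp);
        saveFileContents(contents);
        Files.deleteIfExists(tmp);
    }

    private void saveFileContents(byte[] contents) {
        // existing EJB method from the question (body omitted here)
    }
}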