I have a requirement where:
I have multiple files on a server location.
My response size limit is 100 MB max.
My use case is to merge the files into zip archives and send them as attachments to the client browser. The client browser has only one button, "DOWNLOAD ALL". On clicking that button, all files located on the server should be downloaded to the client as multiple zip files.
For example, I have 5 files:
1.txt - 24 MB
2.txt - 30 MB
3.txt - 30 MB
4.txt - 30 MB
5.txt - 40 MB
So, on clicking the button, two zip files should be downloaded: 1.zip containing 1.txt, 2.txt, and 3.txt (together about 84 MB, under the 100 MB limit), and 2.zip containing 4.txt and 5.txt.
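The grouping described above can be sketched as a simple greedy pass over the file list. This is a minimal illustration only; the class and method names are made up for this sketch, and a file larger than the limit simply gets a batch of its own.

```java
import java.util.ArrayList;
import java.util.List;

public class ZipBatcher {
    // Greedily pack files (parallel lists of names and sizes in MB)
    // into batches whose total size stays at or under the limit.
    public static List<List<String>> batch(List<String> names, List<Long> sizes, long limit) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < names.size(); i++) {
            // Close the current batch when the next file would overflow it.
            if (!current.isEmpty() && currentSize + sizes.get(i) > limit) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(names.get(i));
            currentSize += sizes.get(i);
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }
}
```

With the sizes from the question (24, 30, 30, 30, 40 MB and a 100 MB limit), this yields exactly the two batches described: [1.txt, 2.txt, 3.txt] and [4.txt, 5.txt]. Each batch would then be written out with its own ZipOutputStream.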
I came across several approaches on the web, like zipping files and sending them as a response, but they send only a single response, and the channel is closed once that response has been transferred.
http://techblog.games24x7.com/2017/07/18/streaming-large-data-as-zipped-file-from-rest-services/
https://javadigest.wordpress.com/2012/02/13/downloading-multiple-files-using-multipart-response/
Moreover, the UI can make multiple requests to the endpoint, but I have multiple users, so I may need to keep track of users and files. Any idea or implementation would be appreciated.
Thanks in advance.
"Moreover, UI can have multiple requests to the endpoint, but I have multiple users, so I may need to keep track of users and files. Any idea or implementation will be appreciated."
This can be solved by using Spring Boot Actuator and Log4j. Using Actuator, you can monitor where each request is coming from.
It can be done by adding the dependency:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
and in the application.properties file:
management.endpoints.web.exposure.include=*
since by default only a few endpoints are exposed over the web.
Regarding the multiple-file download: is there a specific limit, i.e. must each zip file be at most 100 MB in size?
If yes, then you need to orchestrate your solution by calling multiple endpoints one by one and downloading each file.
Maybe I have a workaround for this: if you look at the video-streaming concept, it is the same as your use case, since a huge file is served to users. So I would like to suggest it as a possible solution.
The concept of streaming is to load specific chunks of the required file based on the client's request until the stream ends. Assuming the file size is 100 MB, the client can specify the desired chunk by passing the Range HTTP header in the request, and your backend should understand this header and respond with the requested bytes until the file ends.
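As a minimal sketch of the server-side half of this idea, the following parses the simple single-range form of the header (`bytes=start-end` or `bytes=start-`). This is an illustration only: real Range requests can carry multiple ranges and suffix forms, which this deliberately ignores.

```java
public class RangeHeader {
    // Parse a simple "bytes=start-end" Range header into {start, end}.
    // If the end is omitted ("bytes=100-"), fall back to fileSize - 1.
    public static long[] parse(String header, long fileSize) {
        String spec = header.replace("bytes=", "");
        String[] parts = spec.split("-", -1);       // keep trailing empty part
        long start = Long.parseLong(parts[0]);
        long end = parts[1].isEmpty() ? fileSize - 1 : Long.parseLong(parts[1]);
        return new long[]{start, Math.min(end, fileSize - 1)};
    }
}
```

The backend would then respond with status 206 (Partial Content), a Content-Range header of the form `bytes start-end/fileSize`, and only the requested byte span of the file.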
Take a look at the link below; it may give you more details:
https://saravanastar.medium.com/video-streaming-over-http-using-spring-boot-51e9830a3b8
I have the most basic problem ever. The user wants to export some data, which is around 20-70k records, can take 20-40 seconds to execute, and the resulting file can be around 5-15 MB.
Currently my code is as such:
User clicks a button, which makes an API call to a Java Lambda.
The AWS Lambda handler calls a method to get the data from the DB and generate an Excel file using Apache POI.
It sets response headers and sends the file as XLSX in the response body.
I am now faced with two bottlenecks:
API Gateway times out after 29 seconds; if the file takes longer to generate, the request fails and the user gets a 504 in the browser.
The response from Lambda can only be 6 MB; if the file is bigger, the user gets a 413/502 in the browser.
What should be my approach to download just a runtime-generated file (not pre-built in S3) using AWS?
If you want to keep it simple (no additional queues or async processing) this is what I'd recommend to overcome the two limitations you describe:
Use the new AWS Lambda Function URLs. Since that option doesn't go through API Gateway, you shouldn't be restricted to the 29-second timeout (not 100% sure about this).
Write the file to S3, then get a temporary presigned URL to the file and return a redirect (HTTP 302) to the client. This way you won't be restricted by the 6 MB response size.
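A hedged sketch of the redirect half of this: the map below is the shape of response a proxy-integration Lambda handler could return to make the browser follow a presigned URL. The presigned URL itself would come from the AWS SDK's S3 presigner, which is omitted here; the URL argument is just a placeholder.

```java
import java.util.Map;

public class RedirectResponse {
    // Build an API-Gateway/Function-URL style proxy response that
    // redirects the browser to a presigned S3 URL.
    public static Map<String, Object> redirectTo(String presignedUrl) {
        return Map.of(
            "statusCode", 302,
            "headers", Map.of("Location", presignedUrl),
            "body", ""
        );
    }
}
```

The browser follows the 302 and downloads straight from S3, so the file bytes never pass through the Lambda response at all.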
Here are the possible options for you.
Use your JavaScript skills to the rescue. Accept the request from the browser/client and immediately respond from the server that file preparation is in progress. Meanwhile, continue preparing the file in the background (a separate job). Using JavaScript, keep polling the status of the file with a separate request. Once the file is ready, return it.
Smarter front-end clients use WebSockets to solve such problems.
In case the DB query is the culprit, cache the data on the server side if possible.
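The accept-then-poll option above can be sketched with a tiny in-memory job registry (names and structure are invented for this sketch; a real service would persist status somewhere durable rather than in a map):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExportJobs {
    // Minimal in-memory job registry: the endpoint starts the export in a
    // background thread and immediately returns a job id; the client polls
    // getStatus(id) until it reads "READY".
    private final Map<String, String> status = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public String start(Runnable export) {
        String id = UUID.randomUUID().toString();
        status.put(id, "IN_PROGRESS");
        pool.submit(() -> {
            export.run();                 // generate the Excel file here
            status.put(id, "READY");
        });
        return id;
    }

    public String getStatus(String id) {
        return status.getOrDefault(id, "UNKNOWN");
    }
}
```

The first request returns the id immediately (so no gateway timeout), and the polling request is cheap; once the status flips to READY, the client fetches the finished file.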
When your script takes more than 30 s to run on your server, you should implement queues; you can get help from this tutorial on how to implement queues using SQS or any other service:
https://mikecroft.io/2018/04/09/use-aws-lambda-to-send-to-sqs.html
Once you implement queues, your timeout issue will be solved, because now you are fetching your big data records in a background worker on your server.
Once the Excel file is ready in the background, you save it in your S3 bucket (or on your server's disk) and create a downloadable link for your user.
Once the download link is created, you send it to your user via email. For this you need the user's email address.
So the summary is: apply a queue -> send a mail with the downloadable file link.
Instead of a sophisticated solution (though that would be interesting), consider splitting the work.
Split the Excel export into portions of, say, 10k rows each, and calculate the number of documents up front.
For every Excel generation call you then have a reduced workload.
Whether you deliver via email, a page with links, or a queue is up to you.
The advantage is staying below email limits, response timeouts, and denial of service.
(In Excel one could also create a master document, but I have no experience with that.)
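The splitting step above amounts to computing row ranges up front; a minimal sketch (the class name is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class RowChunks {
    // Split totalRows into [start, endExclusive) ranges of at most
    // chunkSize rows, so each Excel generation call handles a bounded
    // workload that stays under the timeout and size limits.
    public static List<long[]> ranges(long totalRows, long chunkSize) {
        List<long[]> out = new ArrayList<>();
        for (long start = 0; start < totalRows; start += chunkSize) {
            out.add(new long[]{start, Math.min(start + chunkSize, totalRows)});
        }
        return out;
    }
}
```

For 25,000 rows with a 10,000-row chunk size this produces three ranges, and each range becomes one small Excel document (or one link on the results page).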
I have a servlet that accepts large (up to 4GB) binary file uploads. The submitted file is transmitted as the body of an HTTP POST.
The servlet has to perform some time-consuming processing as it receives the file, and it has to finish doing that before sending the response. As a result, it can appear to a fast client that the server has hung, because the client can be waiting for a minute or two after sending the last few bytes before getting the response.
Is there a way either within Tomcat or within the servlet API to throttle back the speed at which the server accepts the file? I would like it to appear to the client that the server is accepting the file at (for example) 10MB/second rather than it accepting the file at 50MB/second and then taking a few minutes after receiving the body to return a response.
Thanks.
I'm extending on the comment of Mark Thomas here because I feel that this is worth being an answer (or the answer), rather than a comment. Mark, let me know if you want to convert the comment yourself and I'll happily delete mine.
John, you're trying to solve your problem in a way that imposes severe limitations: What's the bandwidth that you want to throttle to? What happens when the server is upgraded to a beefier CPU and can process more quickly? What if multiple uploads happen at the same time?
You probably want to have an upload of 4G in as quick a time as possible - imagine the connection going down in the middle - in a web application this typically means you'll have to restart the upload from the beginning. Thus you should decouple your processing from the upload procedure as much as possible.
You also don't mention the file format that gets uploaded: If it happens to be a zip file, note that the server can't do anything with the file until it's fully transmitted, as zip files have the directory of contents at their end. (this might be old knowledge, but at least the old spec had it this way. Someone correct me if this changed)
So: The proper way: Accept the file for processing, signal that you received it and are processing. If you like: Implement Ajax updates once you're done. In the simplest case: "click here to see if processing finished" or frequently reload the page. Anything works and everything is better than throttling throughput on this layer.
I have built a backup and restore application for Java phones, including Nokia. It works fine, but pictures larger than 1 MB cannot be uploaded. Is it possible to upload a file larger than 1 MB? If so, please suggest whether it is possible over HTTP or FTP.
Thank you.
Have a look at this step-by-step tutorial. What you need is to send the file in multiple parts over a persistent HTTP connection.
Uploading files to HTTP server using POST on Android.
Using REST web services (WS) to upload files whose size is between 10 and 50 MB.
At the moment, we use Java, Jax-RS and CXF for doing it.
The behavior of this stack is to buffer the uploaded files by writing them to a temporary file (because we have large files). This is fine for most users.
Is it possible to stream directly from the socket input?
(not from a whole file in memory nor from a temporary file)
My purpose is to have less overhead on IO and CPU (each file is currently written twice: once to the buffer and once to the final location). The WS only has to write the files (sometimes several in the same HTTP request) to a path that I calculate from the HTTP query string.
Thanks for your attention
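The single-pass write being asked about can be sketched with plain JDK streams: read the request body's InputStream and copy it straight to the destination, with no intermediate temporary file. Whether CXF will hand you the raw stream depends on how the resource is declared (e.g. an InputStream parameter instead of a multipart attachment), which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamingWriter {
    // Copy the request's InputStream straight to the destination in one
    // pass, so the bytes are written exactly once. transferTo (JDK 9+)
    // loops over an internal buffer without materializing the whole file.
    public static long write(InputStream body, OutputStream dest) throws IOException {
        long copied = body.transferTo(dest);
        dest.flush();
        return copied;
    }
}
```

The returned count can be checked against a Content-Length header, if one was sent, to detect truncated uploads.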
I am very new to Apache Camel, and I am exploring how to create a route which pulls data from FTP, for instance every 15 minutes, and pulls only new or updated files; so if some files were downloaded earlier and are still the same (unchanged), the FTP loader should not load them to the destination folder again.
Any advice is warmly appreciated.
UPDATE #1
I've already noticed that I need to look at FTP2, and I've actually already made some progress. The last thing I want to clarify is what consumer.delay defines. Is it the delay between each download attempt? For instance, with consumer.delay = 5s: on the first attempt the FTP server contains 5 files, the consumer pulls the data somewhere and waits 5 s; on the second attempt the FTP content is still the same, so Camel just does nothing; then 5 more files arrive on the FTP server, and 5 seconds later the consumer downloads just these newly arrived files. Or does consumer.delay just make the consumer wait between each individual file download (file#1 -> 5s -> file#2 -> 5s -> etc.)?
I want to achieve the first scenario.
Also, I observed that once some files have been downloaded to the destination folder (from FTP to the local file system), these files are ignored in subsequent data loads, even if they were deleted from the local file system. How can I tell Camel to download deleted files again? Where does it store information about already-loaded files? It also seems that it downloads all files each time, even files that were downloaded during the first data pull. Do I need to write a filter to exclude already-downloaded files?
There is an FTP component for Apache Camel: http://camel.apache.org/ftp.html
Use the "consumer.delay" option to set the delay in milliseconds between each poll.
For implementation details look here: http://architects.dzone.com/articles/apache-camel-integration
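A minimal route configuration sketch of the setup described above (host, credentials, and paths are placeholders, not real values):

```java
import org.apache.camel.builder.RouteBuilder;

public class FtpPollRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Poll the FTP server every 15 minutes. "idempotent=true" makes
        // the consumer remember already-processed file names, so files
        // seen on an earlier poll are not downloaded again.
        from("ftp://user@ftp.example.com/inbox?password=secret"
             + "&consumer.delay=900000"   // 15 min between polls
             + "&idempotent=true")
            .to("file:/data/downloads");
    }
}
```

Note on the question about deleted local files: by default the idempotent repository is in-memory, so Camel forgets processed names on restart; it also never looks at the destination folder, which is why deleting a local file does not trigger a re-download. To re-download, you would need to clear or replace the idempotent repository.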