I need to read millions of XML files (a few GBs in total) and stream them over HTTP via a REST GET call with low latency. What would be the options to achieve this with Java and/or open source tools?
Thank you
One option is to stream the data as an attachment over SOAP using MTOM. See link
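MTOM is SOAP-specific; for the plain REST GET described in the question, the core idea is the same: write straight to the response stream and flush as you go, never buffering everything in memory. A minimal servlet sketch, assuming the XML files sit in a local directory (/data/xml and the /xml-stream path are placeholders) and can safely be concatenated under a synthetic root element (i.e., the individual files carry no XML prolog):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet("/xml-stream")
    public class XmlStreamServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            resp.setContentType("application/xml");
            // No Content-Length is set, so the container uses chunked transfer encoding.
            OutputStream out = resp.getOutputStream();
            try (DirectoryStream<Path> dir =
                         Files.newDirectoryStream(Paths.get("/data/xml"), "*.xml")) {
                out.write("<documents>".getBytes());
                for (Path xml : dir) {
                    Files.copy(xml, out); // stream each file directly, no in-memory buffering
                    out.flush();          // push bytes to the client early for low latency
                }
                out.write("</documents>".getBytes());
            }
        }
    }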
I have the most basic problem ever. The user wants to export some data, around 20-70k records; the query can take 20-40 seconds to execute, and the resulting file can be around 5-15MB.
Currently my code is as such:
The user clicks a button, which makes an API call to a Java Lambda.
The AWS Lambda handler calls a method to get the data from the DB and generate an Excel file using Apache POI.
The handler sets response headers and sends the file as XLSX in the response body.
I am now faced with two bottlenecks:
API Gateway times out after 29 seconds; if the file takes longer to generate, the request fails and the user gets a 504 in the browser.
The response from Lambda can only be 6MB; if the file is bigger, the user gets a 413/502 in the browser.
What should my approach be to download a file generated at runtime (not pre-built in S3) using AWS?
If you want to keep it simple (no additional queues or async processing) this is what I'd recommend to overcome the two limitations you describe:
Use the new AWS Lambda Function URLs. Since that option doesn't go through API Gateway, you shouldn't be restricted to the 29-second timeout (not 100% sure about this).
Write the file to S3, then get a temporary presigned URL to the file and return a redirect (HTTP 302) to the client. This way you won't be restricted to the 6MB response size.
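A minimal sketch of that second option with the AWS SDK for Java v2 (bucket and key names are placeholders): after writing the file to S3, generate a short-lived presigned URL and return a 302 whose Location header points at it.

    import java.time.Duration;
    import java.util.Map;
    import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.presigner.S3Presigner;
    import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

    public class RedirectToExport {
        public APIGatewayProxyResponseEvent presignAndRedirect(String bucket, String key) {
            try (S3Presigner presigner = S3Presigner.create()) {
                GetObjectPresignRequest presignReq = GetObjectPresignRequest.builder()
                        .signatureDuration(Duration.ofMinutes(10)) // how long the link stays valid
                        .getObjectRequest(GetObjectRequest.builder()
                                .bucket(bucket).key(key).build())
                        .build();
                String url = presigner.presignGetObject(presignReq).url().toString();
                return new APIGatewayProxyResponseEvent()
                        .withStatusCode(302)
                        .withHeaders(Map.of("Location", url)); // browser follows this to S3
            }
        }
    }

The browser follows the redirect and downloads directly from S3, so the 6MB Lambda response limit never applies to the file itself.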
Here are the possible options for you.
Use JavaScript to the rescue. Accept the request from the browser/client and immediately respond from the server that file preparation is in progress. Meanwhile, continue preparing the file in the background (a separate job). Using JavaScript, keep polling the file's status with a separate request (a sketch of this pattern follows after this list). Once the file is ready, return it.
Smarter front-end clients use web-sockets to solve such problems.
If the DB query is the culprit, cache the data on the server side if possible.
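A rough sketch of the polling option above, with made-up class and method names: the first request starts generation and returns a job id, and the client polls the status endpoint until the job is DONE.

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    public class ExportJobs {
        enum State { RUNNING, DONE, FAILED }
        private final Map<String, State> jobs = new ConcurrentHashMap<>();

        public String start() {
            String id = UUID.randomUUID().toString();
            jobs.put(id, State.RUNNING);
            CompletableFuture.runAsync(() -> {       // long-running work off the request thread
                try {
                    generateExcel(id);
                    jobs.put(id, State.DONE);
                } catch (Exception e) {
                    jobs.put(id, State.FAILED);
                }
            });
            return id;                               // client polls status(id) with this
        }

        public State status(String id) { return jobs.get(id); }

        private void generateExcel(String id) { /* Apache POI work goes here */ }
    }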
When a script takes more than 30 seconds to run on your server, you should implement queues; this tutorial shows how to implement queues using SQS (or any other service):
https://mikecroft.io/2018/04/09/use-aws-lambda-to-send-to-sqs.html
Once you implement queues, your timeout issue is solved, because the big data records are now fetched in a background job on your server.
Once the Excel file is ready in the background, save it to your S3 bucket (or to disk on your server) and create a downloadable link for your user.
Once the download link is created, send it to the user via email; this assumes you have the user's email address.
So the summary is: apply a queue -> send a mail with a link to the downloadable file.
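A hedged sketch of the enqueue step with the AWS SDK for Java v2; the queue URL and message fields are illustrative only.

    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

    public class ExportQueue {
        public static void enqueue(String userEmail) {
            try (SqsClient sqs = SqsClient.create()) {
                sqs.sendMessage(SendMessageRequest.builder()
                        // hypothetical queue URL; a worker Lambda consumes these messages
                        .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/export-jobs")
                        .messageBody("{\"email\":\"" + userEmail + "\",\"format\":\"xlsx\"}")
                        .build());
            }
        }
    }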
Instead of some sophisticated solution (though that would be interesting), a simple one:
Inventory: split the Excel output into portions of, say, 10k rows, and calculate the number of documents needed.
Every Excel generation call then has a reduced workload.
Whether you deliver the parts via e-mail, a page with links, or a queue is up to you.
The advantage is staying below e-mail limits, response timeouts, and denial of service.
(In Excel one could also create a master document linking the parts, but I have no experience with that.)
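A small sketch of the splitting idea; countRows, fetchRows, and writeWorkbook are stand-ins for your real data access and Apache POI code.

    import java.util.List;

    public class SplitExport {
        static final int PAGE_SIZE = 10_000;

        public static void exportInParts() {
            int totalRows = countRows();
            int parts = (totalRows + PAGE_SIZE - 1) / PAGE_SIZE; // ceiling division
            for (int p = 0; p < parts; p++) {
                List<String[]> rows = fetchRows(p * PAGE_SIZE, PAGE_SIZE);
                writeWorkbook("export-part-" + (p + 1) + ".xlsx", rows);
            }
        }

        // Stubs standing in for real DB access and POI code:
        static int countRows() { return 65_000; }
        static List<String[]> fetchRows(int offset, int limit) { return List.of(); }
        static void writeWorkbook(String name, List<String[]> rows) { /* Apache POI here */ }
    }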
I have a central server, to which many distributed servers need to transmit data in the form of somewhat large files, 500MB - 10GB+. The servers are not on the same physical network and can't be connected to one another via a VPN. While we're trying to get other ports opened, currently we can only talk over 443, HTTPS, which is great for our REST services but terrible for file transfer between servers.
I know this isn't as specific a question as one would like for Stack Overflow, but I would like to know: what methods might work better than the ones I've tried?
Server A -> generate file -> transfer over https -> DMZ -> proxy pass -> receive at Server B
Both servers use Java 1.8, Tomcat, and Spring 4.1.4.RELEASE. The DMZ is just Apache and pretty much out of our control.
Things I've tried...
Make RPC calls to a service using Spring's HttpInvokerProxyFactoryBean (this works fine for smaller sites, but the larger sites often drop connections while transferring data)
Multipart form post using Apache HttpPost (this also works, but we have to configure file limits in Apache/Tomcat, and its connection is unreliable as well)
Using a library called RMIIO, which basically simulates RMI over HTTP if configured properly. This seemed promising, as it requests a stream from the server and writes to the stream from the remote server. I haven't really gotten it to work over HTTPS yet, and the library was written in 2007 (with some updates up through July 2016); it feels very dated and not highly maintained, and I suspect there are better ways to do this sort of thing nowadays (not that I can find them)
Looked at gRPC but realized it's just a binary protocol and I'd have to basically handle chunking the files if I wanted to get a streaming effect.
Read an article about developing non-blocking REST services with Spring MVC (http://callistaenterprise.se/blogg/teknik/2014/04/22/c10k-developing-non-blocking-rest-services-with-spring-mvc/), which again looked interesting if we were receiving a lot of files at the same time, but I don't see how it helps with a single file transfer.
I've looked at a lot of other things and tried a few more, but it all seems wrong. When I read about big data and Spark streams or any of the million streaming options that I see, I feel like there should be something similar for transferring a single file from one server to another without a broker in the middle. Maybe there are, just not over HTTPS.
It would be nice to know the progress of the transfer (on both ends) and be able to recover should there be connectivity issues or transfer errors.
But any direction or thoughts would be immensely helpful. Thanks for your time and input.
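For what it's worth, a rough sketch of the manual chunk-and-resume approach hinted at above; the /upload endpoints, the offset query parameter, and the 8 MB chunk size are all invented for illustration, not an existing API. Each chunk is an independent PUT, so a dropped connection costs at most one chunk, and the offset endpoint gives both ends a progress marker.

    import java.io.RandomAccessFile;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ResumableUploader {
        static final int CHUNK = 8 * 1024 * 1024; // 8 MB per request

        public static void upload(String path, String name, String baseUrl) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                long total = raf.length();
                long offset = queryOffset(baseUrl, name); // ask the server where to resume
                byte[] buf = new byte[CHUNK];
                while (offset < total) {
                    raf.seek(offset);
                    int n = raf.read(buf);
                    HttpURLConnection con = (HttpURLConnection)
                            new URL(baseUrl + "/upload/" + name + "?offset=" + offset)
                                    .openConnection();
                    con.setRequestMethod("PUT");
                    con.setDoOutput(true);
                    con.setFixedLengthStreamingMode(n);
                    con.getOutputStream().write(buf, 0, n);
                    if (con.getResponseCode() == 200) {
                        offset += n;                      // advance only on success
                        System.out.printf("progress: %d/%d bytes%n", offset, total);
                    }                                     // else the loop retries the chunk
                                                          // (a real client would cap retries)
                }
            }
        }

        static long queryOffset(String baseUrl, String name) throws Exception {
            HttpURLConnection con = (HttpURLConnection)
                    new URL(baseUrl + "/upload/" + name + "/offset").openConnection();
            try (java.util.Scanner s = new java.util.Scanner(con.getInputStream())) {
                return s.nextLong();                      // bytes the server already holds
            }
        }
    }

On the receiving side, the server would append each chunk at the stated offset (e.g. with a RandomAccessFile) and report the stored length from the offset endpoint.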
I am working on a Java-based web application. This application consumes a REST API for its data via a custom-made REST client, which is a subcomponent of the application. Now I am stuck in a situation where I have to write code for downloading a large file. The REST client calls the API, obtains a data stream from the response, and writes the file to a particular location on the server.
Should I read the file and write it to the output stream? Or is there a more efficient way to achieve this functionality?
Thanks in advance
In the end I am using the same approach of reading the large file and writing it to the output stream. The REST client subcomponent reads the file data from the REST API and writes it to a temporary location on the server.
After that, I read that file and write it to the output stream.
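A minimal sketch of that approach, streaming in both hops so the whole file is never held in memory; the URL and temp-file naming are placeholders.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    public class LargeFileRelay {

        // Hop 1: download from the REST API to a temp file on the server.
        public static Path fetchToTemp(String url) throws Exception {
            Path tmp = Files.createTempFile("download-", ".bin");
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = con.getInputStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return tmp;
        }

        // Hop 2: stream the temp file to whatever OutputStream serves the end user
        // (e.g. HttpServletResponse#getOutputStream()).
        public static void sendTo(Path tmp, OutputStream out) throws Exception {
            Files.copy(tmp, out);
        }
    }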
I'm developing a Java applet for downloading/uploading large files from/to a server. The server app can be written in any language, but for now I'm interested in the most popular ones - Java, C#, and PHP.
Server applications will use my applet and must provide a download/upload API for it. My goal is to specify the requirements. As my files can be very large, I want to transfer them in chunks. I reinvented the wheel - my own chunking protocol - and now I'm wondering whether there are built-in language/web-server tools that could replace it.
For clarity, here is a short description of my homemade protocol:
The applet calls the API method initDownload(fileName, chunkSize); the server stores this setting somewhere and returns the total number of chunks to the applet.
The applet calls the API method getFilePart(fileName, chunkNumber) for each chunk, and on failure it may retry that chunk. The server's implementation is simply a RandomAccessFile, seeking to the desired position and returning the appropriate bytes (a sketch follows below).
For uploads, the applet calls the API method setFilePart(fileName, chunkNumber) for each chunk.
The applet works via HttpURLConnection.
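A hypothetical sketch of the server side of getFilePart as described, using RandomAccessFile:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class ChunkServer {
        // Seek to the requested chunk and return its bytes.
        public static byte[] getFilePart(String fileName, long chunkNumber, int chunkSize)
                throws IOException {
            try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
                long offset = chunkNumber * chunkSize;
                int len = (int) Math.min(chunkSize, raf.length() - offset); // last chunk may be short
                byte[] part = new byte[len];
                raf.seek(offset);
                raf.readFully(part);
                return part;
            }
        }
    }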
I read about HttpURLConnection's setRequestHeader("Transfer-Encoding", "chunked") and setChunkedStreamingMode(chunkSize), but I didn't understand how to configure the consumer side to accept chunked mode. Where is the chunk size set up? Or is a chunk size unnecessary because of the protocol? I mean, with chunked mode, can the server really just open an InputStream and read? AFAIK it should turn on HTTP/1.1 or call some setter on the @RequestMapping side - am I right?
So, is my hypothesis below correct? Hypothesis:
Upload to server - the applet side can enable chunking with two lines of code: a header to turn on chunked mode and a setter to specify the chunk size. The server side can be simple open-InputStream-and-read-all code, with simple configuration for HTTP/1.1. Does any web server provide such configuration? I'm interested in well-known servers like XAMPP for PHP; IIS for C#; GlassFish, Tomcat, and JBoss for Java; and others I don't know or forgot to mention.
Download to the applet - now the server should somehow turn on chunking in the response. Do PHP and C# have methods like setChunkedStreamingMode? How do I turn on chunking in Java's HttpURLConnection for receiving chunks? As I understand it, setChunkedStreamingMode works only for sending data - is that incorrect?
In other words, I just want to open an InputStream/OutputStream and rely on the protocol, the way I rely on TCP for handshakes - is that possible?
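A sketch of the sending side with HttpURLConnection (the URL and file name are placeholders). Note: setChunkedStreamingMode applies only to the request body you send; chunked responses and chunked requests are de-chunked transparently by HttpURLConnection and by HTTP/1.1 servlet containers respectively, so the consumer side really does just read a plain InputStream.

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ChunkedUploadDemo {
        public static void main(String[] args) throws Exception {
            HttpURLConnection con =
                    (HttpURLConnection) new URL("https://host/upload").openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            // This sets the Transfer-Encoding: chunked header itself; do not also call
            // setRequestHeader("Transfer-Encoding", "chunked") manually.
            con.setChunkedStreamingMode(64 * 1024); // internal buffer size, not a protocol contract
            try (OutputStream out = con.getOutputStream();
                 InputStream in = new FileInputStream("big.bin")) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);       // bytes leave as soon as the buffer fills
                }
            }
            System.out.println("server responded: " + con.getResponseCode());
            // On the server, chunk boundaries are invisible: a servlet just calls
            // request.getInputStream() and reads until EOF.
        }
    }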
I am using the AWS SDK for Java to transfer large files to S3. Currently I am using the upload method of the TransferManager class to enable multipart uploads. I am looking for a way to throttle the rate at which these files are transferred, to ensure I don't disrupt other services running on this CentOS server. Is there something I am missing in the API, or some other way to achieve this?
Without support in the API for this, one approach is to wrap the s3 command with trickle.
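trickle shapes bandwidth at the OS level around the whole process. If staying inside the JVM is preferable, here is an alternative sketch (plainly a different technique than the answer's): wrap the source stream in a rate limiter, e.g. Guava's RateLimiter, and hand it to TransferManager. Bucket, key, file name, and the 512 KB/s cap are illustrative; note that uploading from a stream rather than a File changes how TransferManager parallelizes multipart parts.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
    import com.google.common.util.concurrent.RateLimiter;

    public class ThrottledUpload {
        // Blocks reads until enough byte-permits accumulate, capping effective upload speed.
        static class ThrottledInputStream extends FilterInputStream {
            private final RateLimiter limiter;
            ThrottledInputStream(InputStream in, double bytesPerSecond) {
                super(in);
                this.limiter = RateLimiter.create(bytesPerSecond);
            }
            @Override public int read(byte[] b, int off, int len) throws IOException {
                int n = super.read(b, off, len);
                if (n > 0) limiter.acquire(n); // one permit per byte read
                return n;
            }
        }

        public static void main(String[] args) throws Exception {
            File file = new File("big.bin");
            ObjectMetadata meta = new ObjectMetadata();
            meta.setContentLength(file.length()); // required when uploading from a stream
            TransferManager tm = TransferManagerBuilder.defaultTransferManager();
            tm.upload("my-bucket", "big.bin",
                      new ThrottledInputStream(new FileInputStream(file), 512 * 1024),
                      meta)
              .waitForCompletion();
            tm.shutdownNow();
        }
    }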