I have a requirement to send data from a server (Tomcat: Java process, OData APIs) to a client (React based).
The data can range from a few KB to hundreds of MB (say 700 MB); it is retrieved from the DB (Redshift), processed, and sent to the client.
There can be multiple clients accessing at the same time as well, which puts more stress on the system.
We added pagination so that only the data for the current page is loaded, but we also have a feature to export the complete data set in CSV format.
Processing all the data consumes a lot of memory, and the application's heap sometimes gets exhausted. Increasing the heap is not the expected solution; I want to know whether anything can be done on the application side to optimize system resources.
Kindly suggest what could be the best way to transfer the data. I would also like to know whether there is any other kind of API (streaming) that can help me here.
Can you change the integration between the client and your system?
Something like: the client sends the request to export a CSV with a callback URL in the payload.
You put this request in a queue (RabbitMQ). The queue consumer processes the request, generates the CSV, and puts it in a temporary area (S3 or behind an NGINX). Then your consumer notifies the client at the callback URL with the new URL for the client to download the full CSV.
This way, the system that processes the incoming requests doesn't use too much heap. You only need to scale the queue consumers, and that is easier, because the concurrency is your configuration of how many consumers are consuming the messages, not the incoming requests from the clients.
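For the consumer side, here is a minimal sketch (the generateCsv name and the SQL are illustrative, and it assumes an open JDBC Connection to Redshift): it streams rows with a small fetch size and writes the CSV to a temporary file, so the full result set never sits in the heap.

```java
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CsvExportConsumer {

    // Stream rows straight from the DB cursor into a CSV file; only one
    // fetch batch is ever held in memory.
    public Path generateCsv(Connection db, String sql) throws Exception {
        Path csv = Files.createTempFile("export-", ".csv");
        db.setAutoCommit(false); // PostgreSQL/Redshift drivers need this for cursor-based fetching
        try (PreparedStatement ps = db.prepareStatement(sql);
             BufferedWriter out = Files.newBufferedWriter(csv)) {
            ps.setFetchSize(1000); // fetch in batches instead of buffering the whole result
            try (ResultSet rs = ps.executeQuery()) {
                int cols = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= cols; i++) {
                        if (i > 1) row.append(',');
                        row.append(rs.getString(i)); // naive: real CSV needs quoting/escaping
                    }
                    out.write(row.toString());
                    out.newLine();
                }
            }
        }
        return csv; // upload to S3, then call the client's callback URL
    }
}
```

After that, the consumer only has to upload the temporary file and POST the download URL to the callback.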
I have the most basic problem ever. The user wants to export some data, which is around 20-70k records, can take 20-40 seconds to execute, and the file can be around 5-15 MB.
Currently my code works as follows:
User clicks a button which makes an API call to a Java Lambda
The AWS Lambda handler calls a method to get the data from the DB and generate an Excel file using Apache POI
Set the response headers and send the file as XLSX in the response body
I am now faced with two bottlenecks:
API Gateway times out after 29 seconds; if the file takes longer to generate, it will not work and the user gets a 504 in the browser
The response from Lambda can only be 6 MB; if the file is bigger, the user will get a 413/502 in the browser
What should be my approach to download a file generated at runtime (not pre-built in S3) using AWS?
If you want to keep it simple (no additional queues or async processing), this is what I'd recommend to overcome the two limitations you describe:
Use the new AWS Lambda Function URLs. Since that option doesn't go through API Gateway, you shouldn't be restricted to the 29-second timeout (not 100% sure about this).
Write the file to S3, then get a temporary presigned URL to the file and return a redirect (HTTP 302) to the client. This way you won't be restricted to the 6 MB response size.
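A minimal sketch of the presigned-URL step with the AWS SDK for Java v2 (the bucket, key, and 15-minute expiry are placeholders):

```java
import java.net.URL;
import java.time.Duration;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

public class PresignedDownload {

    // Returns a time-limited download URL for an object already written to S3.
    public static URL presign(String bucket, String key) {
        try (S3Presigner presigner = S3Presigner.create()) {
            GetObjectRequest get = GetObjectRequest.builder()
                    .bucket(bucket)
                    .key(key)
                    .build();
            GetObjectPresignRequest presignReq = GetObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofMinutes(15)) // link expiry
                    .getObjectRequest(get)
                    .build();
            return presigner.presignGetObject(presignReq).url();
        }
    }
}
```

The Lambda handler would then return a 302 response with this URL in the Location header, and the browser downloads the file directly from S3.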
Here are the possible options for you.
Use JavaScript skills to the rescue. Accept the request from the browser/client and immediately respond from the server that the file preparation is in progress. Meanwhile, continue preparing the file in the background (separate job). Using JavaScript, keep polling the status of the file with a separate request. Once the file is ready, return it. (A server-side sketch of this accept-then-poll idea follows this list of options.)
Smarter front-end clients use WebSockets to solve such problems.
In case the DB query is the culprit, cache the data on the server side, if that is possible for you.
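A minimal server-side sketch of the accept-then-poll idea (HTTP wiring omitted; the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Accept-then-poll: start file generation in the background and let the
// client ask for the status until the file is ready.
public class ExportJobs {
    private final Map<String, CompletableFuture<byte[]>> jobs = new ConcurrentHashMap<>();

    // Called when the export request arrives; returns immediately with a job id.
    public String submit() {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(this::buildFile));
        return id;
    }

    // Called by the client's polling request.
    public String status(String id) {
        CompletableFuture<byte[]> job = jobs.get(id);
        if (job == null) return "UNKNOWN";
        return job.isDone() ? "READY" : "IN_PROGRESS";
    }

    private byte[] buildFile() {
        // assumption: whatever produces the Excel/CSV bytes goes here
        return new byte[0];
    }
}
```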
When your script takes more than 30 s to run on your server, you implement queues. You can get help from this tutorial on how to implement queues using SQS or any other service:
https://mikecroft.io/2018/04/09/use-aws-lambda-to-send-to-sqs.html
Once you implement queues, your timeout issue is solved, because now you are fetching your big data records in a background thread on your server.
Once the Excel file is ready in the background, you have to save it in your S3 bucket or on your server's hard disk and create a downloadable link for your user.
Once the download link is created, you send it to your user via email. In this case, you need to have your user's email address.
So the summary is: apply a queue -> send a mail with the downloadable file.
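A minimal sketch of the enqueue step with the AWS SDK for Java v2 (the queue URL and message format are placeholders; the worker behind the queue builds the file and mails the link):

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class ExportEnqueuer {

    // Hand the export job to a queue instead of doing it in the request thread.
    public static void enqueue(String userEmail, String reportId) {
        try (SqsClient sqs = SqsClient.create()) {
            sqs.sendMessage(SendMessageRequest.builder()
                    .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/export-jobs")
                    .messageBody(userEmail + "|" + reportId) // worker parses this, builds the file, mails the link
                    .build());
        }
    }
}
```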
Instead of some sophisticated solution (though that would be interesting), split the work.
Inventory: split the Excel into portions of, say, 10k rows, and calculate the number of documents.
For every Excel generation call you then have a reduced workload.
Whether you deliver via e-mail, a page with links, or a queue is up to you.
The advantage is staying below e-mail size limits, response time-outs, and denial-of-service thresholds.
(In Excel one could also create a master document, but I have no experience with that.)
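One way to keep each portion cheap to produce (a sketch, assuming the chunk's rows are already fetched as strings) is Apache POI's streaming SXSSFWorkbook, which keeps only a small window of rows in memory:

```java
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class ChunkedExcelWriter {

    // Write one ~10k-row portion per file; rows beyond the window are
    // flushed to temporary files on disk instead of staying on the heap.
    public static void writeChunk(String[][] rows, int chunkNo) throws Exception {
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory
             FileOutputStream out = new FileOutputStream("export-" + chunkNo + ".xlsx")) {
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < rows.length; r++) {
                Row row = sheet.createRow(r);
                for (int c = 0; c < rows[r].length; c++) {
                    row.createCell(c).setCellValue(rows[r][c]);
                }
            }
            wb.write(out);
            wb.dispose(); // delete the temporary flush files
        }
    }
}
```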
I have a huge text file which is continuously being appended to from a common place. I need to read it line by line from my Java application and update a SQL RDBMS such that if the Java application crashes, it starts from where it left off and not from the beginning.
It is a plain text file. Each row contains:
<Datatimestamp> <service name> <paymentType> <success/failure> <session ID>
Also, the data retrieved from the database should be real time, without any performance or availability issues in the web application.
Here is my approach:
Deploy the application on two boxes, each with a heartbeat that pings the other system for service availability.
When you get a success response to a heartbeat, you also get the timestamp that was last successfully read.
When the next heartbeat response fails, the application on the other system can take over, based on:
1. the failed response
2. the last successful timestamp
Also, since the data retrieval needs to be very close to real time and the data is huge, can I crawl the database and put the data into Solr or Elasticsearch for faster retrieval, instead of making the database calls?
There are various ways to do it; what is the best way?
I would put a messaging system between the text file and the DB-writing applications (for example RabbitMQ). In this case the messaging system functions as a queue: one application constantly reads the file and inserts the rows as messages to the broker (sketched below); on the other side, multiple "DB-writing applications" can read from the queue and write to the DB.
The advantage of the messaging system is its support for multiple clients reading from the queue. The messaging system takes care of synchronizing between the clients, dealing with errors, dead letters, etc. The clients don't care about what payload was processed by other instances.
Regarding maintaining multiple instances of "DB-writing applications": I would go for ready-made cluster solutions, perhaps a Docker cluster managed by Kubernetes?
Another viable alternative is a streaming platform, like Apache Kafka.
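A minimal sketch of the file-reading side with the RabbitMQ Java client (file name, queue name, and checkpoint handling are illustrative); persisting the byte offset after each publish is what lets a crashed reader resume where it left off:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogTailer {
    public static void main(String[] args) throws Exception {
        // Read the last committed offset, or start from the beginning.
        Path checkpoint = Path.of("offset.chk");
        long offset = Files.exists(checkpoint)
                ? Long.parseLong(Files.readString(checkpoint).trim()) : 0L;

        ConnectionFactory factory = new ConnectionFactory();
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel();
             RandomAccessFile file = new RandomAccessFile("payments.log", "r")) {
            channel.queueDeclare("payment-lines", true, false, false, null); // durable queue
            file.seek(offset); // resume where the last run stopped
            String line;
            while ((line = file.readLine()) != null) {
                channel.basicPublish("", "payment-lines", null, line.getBytes());
                // Persist the new offset only after a successful publish.
                Files.writeString(checkpoint, Long.toString(file.getFilePointer()));
            }
            // A production tailer would batch checkpoint writes and wait for
            // new data at EOF instead of exiting.
        }
    }
}
```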
You can use software like Filebeat to read the file and direct the Filebeat output to RabbitMQ or Kafka. From there a Java program can subscribe to and consume the data and put it into an RDBMS.
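A minimal sketch of such a consumer with the Kafka Java client (the topic name and the DB insert are placeholders); committing offsets only after the DB writes means a crash resumes from the last committed position:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DbWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "db-writers"); // all instances share the work
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // we commit manually below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payment-log")); // topic Filebeat ships to
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    insertIntoDb(record.value());
                }
                consumer.commitSync(); // commit only after the DB writes succeed
            }
        }
    }

    private static void insertIntoDb(String line) {
        // hypothetical: parse the line and run a JDBC INSERT
    }
}
```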
I have a requirement to send huge data through WebSockets. My actual requirement is: the client asks my server for huge test data, and my server sends it; assume the data size is 1 GB. It is very hard to send this 1 GB of data in a single response, so I chose WebSockets. I am very new to this topic. I read about WebSockets in multiple blogs and everyone gave a chat application example. But in my case the client asks once and my server needs to send continuous data. Is it possible to send continuous test data to the client with WebSockets? Can anyone help me with this and, if possible, provide an example?
Note: I am using Java.
You already have the chat example. Use it; try it on a single computer. You will probably need to create your own protocol for sending/receiving the data. Try to send in limited-size blocks (for example 10 KB). So you will have two applications, client and server, that use WebSockets. I think the main problem here is: what do you do if the connection is lost?
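A minimal sketch of the server side with the javax.websocket API, sending the payload in 10 KB binary frames (the endpoint path and data source are placeholders you would supply):

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

// On each client request, stream the large payload back in 10 KB frames
// instead of one huge message.
@ServerEndpoint("/testdata")
public class TestDataEndpoint {

    @OnMessage
    public void onMessage(String request, Session session) throws Exception {
        byte[] buffer = new byte[10 * 1024];
        try (InputStream in = openTestData(request)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                session.getBasicRemote().sendBinary(ByteBuffer.wrap(buffer, 0, read));
            }
        }
    }

    private InputStream openTestData(String request) {
        // assumption: however the 1 GB test data set is produced or stored
        throw new UnsupportedOperationException("supply your data source");
    }
}
```

For recovering from lost connections, the client would have to tell the server, in your own protocol, which offset it already received so the server can resume from there.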
An application has a JMS queue responsible for delivering audit logs. The application sends logs to a JMS queue, and this queue is consumed by an MDB.
However, the messages sent are big XML files that vary from 20 MB to 100 MB. The problem is that the JMS queue takes too long to consume the messages, leading to an OutOfMemory error.
What should I do to solve this problem?
This answer may or may not help jguilhermemv; I just want to share an idea for those who read this post: a workaround for big messages.
The first thing is to try not to send too-big messages. Now we have two options (but these require implementation changes, and can be done at the start, or later if system implementation changes are allowed):
Save the log in the DB and send just log IDs in the JMS messages. (Saving logs in the DB is not recommended, as size and time to save will again become a problem at a later stage.)
Save the logs as files (at a common location) and the file names in the DB, and share those file-name IDs via JMS. The consumer can then read the log file after consuming the message (sketched below).
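A minimal sketch of the second option's producer side (a claim-check style message) with the JMS 2.0 API; the connection factory, queue, and file-naming scheme are assumed to be configured elsewhere:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class AuditLogProducer {

    // Send only the file name over JMS; the 20-100 MB XML stays on shared
    // storage and is read by the MDB after it consumes the message.
    public void sendAuditReference(ConnectionFactory factory, Queue queue,
                                   String logFileName) throws Exception {
        try (Connection conn = factory.createConnection()) { // AutoCloseable in JMS 2.0
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage msg = session.createTextMessage(logFileName); // reference, not payload
            producer.send(msg);
        }
    }
}
```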
I have several PCs, and on each of them I installed a small Swing application that gets data with a JSON request to one web server. Can I receive the data from the web server without sending a request to it? In other words, can the web server send the data without the Java application asking for it?
If you have enough server resources, you can consider the usage of WebSockets.
Every PC can open a socket to the server.
When you open the socket, you need to send the PC's unique ID to the server.
Then you need to store this ID in some database or file that contains all online PCs and sockets.
The server will then be aware of which PCs are online and which socket to use to communicate with each PC. After this you can send whatever information you need to that PC, depending on your application. A minimal endpoint sketch follows.
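A sketch with the javax.websocket API (the endpoint path and the first-message-is-the-ID protocol are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.websocket.OnClose;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

// Each PC opens a socket and sends its unique ID as the first message;
// the server keeps an ID -> session map and can push data at any time.
@ServerEndpoint("/push")
public class PushEndpoint {
    private static final Map<String, Session> online = new ConcurrentHashMap<>();

    @OnMessage
    public void onMessage(String pcId, Session session) {
        online.put(pcId, session); // register this PC as online
    }

    @OnClose
    public void onClose(Session session) {
        online.values().remove(session); // drop the PC when it disconnects
    }

    // Called by server-side code whenever there is data for a given PC.
    public static void pushTo(String pcId, String json) throws Exception {
        Session s = online.get(pcId);
        if (s != null && s.isOpen()) s.getBasicRemote().sendText(json);
    }
}
```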
This can be implemented in several ways. One common way would be to open a connection and do a blocking read in the client application. On receiving something, it will look like a push from the server. Then you process the push and do another blocking read.
Another option would be doing regular checks whether there is something for you on the web server. You set the retry interval frequently enough so that it looks like a real-time push from your app's point of view.
If you use HTTP, I think the smartest way is to drop the real-time requirement and use a thread that polls the server every 5 seconds. Keeping an HTTP connection open all the time is expensive, as it blocks a request-processor thread and limits the number of clients you can have.
You might also consider moving to something like a registration mechanism if you really need near-real-time updates, which is often not the case. You would have to open a server on the clients and have the server push the updates after clients registered their address with the server. A sketch of the simpler polling approach follows.
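A minimal sketch of such a polling thread with Java 11's HttpClient (the endpoint URL and the 5-second interval are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Poller {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://example.com/updates")) // hypothetical endpoint
                .build();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println("update: " + response.body()); // hand off to the Swing UI
            } catch (Exception e) {
                e.printStackTrace(); // keep polling on transient failures
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```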