How to send large JSON data to the Spring controller - Java

How can I send a large volume of JSON data to a Spring controller? Say I have large JSON data of about 100k or 1,000k records which I need to send to my REST controller in Spring or Spring Boot; what is the best/most efficient approach to the problem?
I know that the data can be sent in the request body, but I think sending such a large volume of data in the request body of a REST API is not efficient. I may be wrong here, please correct me if I am.
The data also needs to be stored in the database as quickly as possible, so I need a quick and reliable approach to the problem.

There are two parts to your problem.
1. How to receive such a huge volume: if a huge volume of data is being received, it's generally a good idea to save it locally as a file (from the input stream of the request) and process that data asynchronously (see the sketch after this list). Make sure you set an appropriately high read timeout, or the data stream might be interrupted.
2. How to process such a huge file: with big files, the memory footprint needs to be minimal. For XML, SAX parsers are the gold standard. I found this library, which is very similar to SAX parsing, but for JSON:
http://rapidjson.org/md_doc_sax.html
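In Java, a rough sketch of both steps could use Jackson's streaming JsonParser, which gives the same event-based, low-memory model as SAX. MyRecord, RecordService, the /records path and the assumption that the payload is a top-level JSON array are illustrative only, and the javax.servlet import assumes a pre-Spring-Boot-3 setup:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;
import javax.servlet.http.HttpServletRequest;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RecordUploadController {

    private final RecordService recordService;            // assumed persistence service
    private final ObjectMapper mapper = new ObjectMapper();

    public RecordUploadController(RecordService recordService) {
        this.recordService = recordService;
    }

    @PostMapping("/records")
    public ResponseEntity<Void> upload(HttpServletRequest request) throws IOException {
        // Step 1: spool the request body to a local file instead of keeping it in memory.
        Path tempFile = Files.createTempFile("records-", ".json");
        Files.copy(request.getInputStream(), tempFile, StandardCopyOption.REPLACE_EXISTING);

        // Step 2: parse and store the file asynchronously.
        CompletableFuture.runAsync(() -> parseAndStore(tempFile));
        return ResponseEntity.accepted().build();
    }

    private void parseAndStore(Path file) {
        try (JsonParser parser = new JsonFactory().createParser(file.toFile())) {
            // Expecting a top-level JSON array of record objects.
            if (parser.nextToken() == JsonToken.START_ARRAY) {
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    MyRecord rec = mapper.readValue(parser, MyRecord.class); // one object at a time
                    recordService.save(rec);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}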

You can use a reactive approach and stream the data.
With Spring, use a producer with MediaType.APPLICATION_STREAM_JSON_VALUE and Flux as the return type.
On the client side, subscribe to the stream and process the data, or you can use Spring Batch to save the data to the database.
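For the original question (pushing a large number of records into the controller), a minimal WebFlux sketch of both directions might look like the following; Item and the reactive ItemRepository are assumed names, and the batch size of 1000 is arbitrary:

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@RestController
public class ItemStreamController {

    private final ItemRepository repository;   // assumed ReactiveCrudRepository<Item, Long>

    public ItemStreamController(ItemRepository repository) {
        this.repository = repository;
    }

    // Receiving side: the client streams items and the server persists them in chunks.
    @PostMapping(value = "/items", consumes = MediaType.APPLICATION_STREAM_JSON_VALUE)
    public Mono<Void> ingest(@RequestBody Flux<Item> items) {
        return items.buffer(1000)                  // group incoming items into batches
                    .flatMap(repository::saveAll)  // reactive bulk save per batch
                    .then();
    }

    // Producing side described above: returning a Flux streams the response as well.
    @GetMapping(value = "/items", produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
    public Flux<Item> streamItems() {
        return repository.findAll();
    }
}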

Related

Spark: sparkSession read from the result of an http response

Small question regarding Spark and how to read from the result of an HTTP response, please.
It is well known that Spark can take a database, a CSV file, etc. as a data source:
sparkSession.read().format("csv").load("path/to/people.csv");
sparkSession.read().format("org.apache.spark.sql.cassandra").options(properties).load()
May I ask how to read from the result of an HTTP call directly, please?
Without having to dump the data back into another intermediate CSV file / intermediate database table.
For instance, the CSV file and database would contain millions of rows, and once read, the job needs to perform some kind of map-reduce operation.
Now, the exact same data comes from the result of an HTTP call. It is small enough for the network layer, but the information contained inside the payload is big, so I would like to apply the same map-reduce.
How can I read from the response of an HTTP call, please?
Thank you
You have two options for reading data in Spark:
Read directly to the driver and distribute to the executors (not scalable, as everything passes through the driver)
Read directly from the executors
The built-in data sources like CSV, Parquet etc. all implement reading from the executors so the job can scale with the data. They define how each partition of the data should be read - e.g. if we have 10 executors, how do you cut the data source into 10 sections so each executor can directly read one section.
If you want to load from an HTTP request you will either have to read through the driver and distribute, which may be OK if you know the data is going to be less than ~10 MB. Otherwise you would need to implement a custom data source so that the executors can each read a partition; you can read more here: https://aamargajbhiye.medium.com/speed-up-apache-spark-job-execution-using-a-custom-data-source-fd791a0fa4b0
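For the first route, a rough sketch (assuming Java 11+, a recent Spark version and an endpoint that returns newline-delimited JSON; the URL and the country field are illustrative) could be:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HttpToSpark {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("http-source").getOrCreate();

        // Fetch the payload on the driver - only reasonable for small responses.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/api/people")).build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // Distribute the lines to the executors and let them parse the JSON there.
        List<String> lines = Arrays.asList(body.split("\n"));
        Dataset<Row> people = spark.read().json(spark.createDataset(lines, Encoders.STRING()));

        // The map-reduce style work now runs on the executors.
        people.groupBy("country").count().show();

        spark.stop();
    }
}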
I will finish by saying that this second option is almost certainly an anti-pattern. You will likely be much better off providing an intermediate staging environment (e.g. S3/GCS), calling the server to load the data into the intermediate store and then reading it into Spark on completion. In scenario 2, you will likely end up putting too much load on the server, amongst other issues.
In previous lifetimes, I created a custom datasource. It is not the most trivial thing to do, but this GitHub repo explains it: https://github.com/jgperrin/net.jgp.books.spark.ch09.
When it comes to reading from a network stream, make sure that only one executor does it.

Read large data from database using JdbcTemplate and expose via api?

I have a requirement to read a large data set from a Postgres database, which needs to be accessible via a REST API endpoint. The client consuming the data will then need to transform the data into CSV format (we might need to support JSON and XML later on).
On the server side we are using Spring Boot v2.1.6.RELEASE and spring-jdbc v5.1.8.RELEASE.
I tried using paging, looping through all the pages, storing the results in a list and returning the list, but this resulted in an OutOfMemoryError because the data set does not fit into memory.
Streaming the large data set looks like a good way to handle the memory limits.
Is there any way that I can just return a Stream of all the database entities and also have the REST API return the same to the client? How would the client deserialize this stream?
Are there any alternatives other than this?
If your data is so huge that it doesn't fit into memory - I'm thinking gigabytes or more - then it's probably too big to reasonably provide as a single HTTP response. You would hold the connection open for a very long time, and if there is a problem mid-way through, the client has to start all over from the beginning, losing potentially minutes of work.
A more user-friendly API would introduce pagination. Your caller could specify a page size and the index of the page to fetch as part of their request.
For example
/my-api/some-collection?size=100&page=50
This would represent fetching 100 items, starting from the 5000th (items 5000-5099).
Perhaps you could place some reasonable constraints on the page size based on what you are able to load into memory at one time.
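A minimal sketch of such a paginated endpoint with JdbcTemplate, assuming a Postgres people table and the URL shape above (both illustrative), might be:

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PeoplePageController {

    private final JdbcTemplate jdbcTemplate;

    public PeoplePageController(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // GET /my-api/some-collection?size=100&page=50
    @GetMapping("/my-api/some-collection")
    public List<Map<String, Object>> page(@RequestParam int size, @RequestParam int page) {
        // LIMIT/OFFSET keeps only one page of rows in memory at a time.
        return jdbcTemplate.queryForList(
                "SELECT * FROM people ORDER BY id LIMIT ? OFFSET ?",
                size, (long) page * size);
    }
}

The client can then walk the pages and append each one to its CSV output until an empty page comes back.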

How to redesign use of the Elasticsearch Scroll API because of memory limits

I have a restriction on memory that my application uses.
I'm generating files consisting of data returned from Elasticsearch.
As I need to get all the data stored in ES, I'm using the Scroll API to fetch it and then put it into some Collection<Foo>. For now ES contains several thousand Foo records that get returned.
So it is more of an imperative approach, because I'm loading all the data like:
List<Foo> allFoos = FooSource.loadAllFoo();
And then doing some processing, after which I'm saving the results to different files.
These files are then accessible from a REST endpoint.
So what I'm looking for is some advice on how to limit the memory usage that this approach causes.
I was thinking that instead of putting all the data in memory and then operating on it, I could do some sequential processing as the data becomes available.
Do I need something like RxJava or Spring Batch in this case? It would help if you could provide some code samples; even pseudocode would be fine.
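One sequential variant, sketched below with the Elasticsearch high-level REST client, writes each batch of hits to the output file as it arrives instead of collecting everything into a Collection<Foo>; the index name, page size and newline-delimited output are assumptions:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class FooScrollExport {

    // Streams all Foo documents batch by batch; memory holds only one scroll page at a time.
    static void exportAll(RestHighLevelClient client, Path outFile) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(outFile)) {
            SearchRequest searchRequest = new SearchRequest("foo-index");   // assumed index name
            searchRequest.source(new SearchSourceBuilder().size(1000));     // page size per scroll
            searchRequest.scroll("1m");

            SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
            String scrollId = response.getScrollId();

            while (response.getHits().getHits().length > 0) {
                for (SearchHit hit : response.getHits().getHits()) {
                    writer.write(hit.getSourceAsString());   // transform/process one record at a time
                    writer.newLine();
                }
                SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
                scrollRequest.scroll("1m");
                response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                scrollId = response.getScrollId();
            }
        }
    }
}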

Java - Batch Processing

I'm trying to generate a CSV file based on a list of objects returned by a web service method.
The problem is that I want to retrieve all of the objects available, but the call will 'fail' if I try to get more than 100 entries (the method has two parameters which let me specify the interval of objects I want to retrieve, e.g. from 10 to 50, from 45 to 120, etc.).
I thought of making sequential calls while incrementing the two indexes which represent the interval, but someone suggested that I should use batch processing for this. As far as I have searched the internet, I have only found examples of how to export database data or XML files to CSV using Spring Batch.
Could someone explain to me how I should handle this situation? Or at least point me to an example/tutorial similar to what I need? Thank you very much!
If you try to load all the data in a single request through a web service, you are exposed to memory or timeout exceptions because the response is too large. Maybe you should make several calls to your web service, something like a paginated request, and after each response you can insert the result into your local database.
When all the calls are done, run a process to build your CSV file (a rough sketch of the paginated calls is below).
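As a rough sketch of those paginated calls (ItemService, Item and the 100-entry page size stand in for the real web service; it writes straight to the CSV rather than through an intermediate database, for brevity):

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CsvExporter {

    interface ItemService {
        List<Item> getItems(int from, int to);   // the interval-based web-service call
    }

    record Item(long id, String name) {}          // assumed shape of the returned objects

    private static final int PAGE_SIZE = 100;     // the service fails above 100 entries per call

    static void export(ItemService service, Path csvFile) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(csvFile))) {
            out.println("id,name");                                   // CSV header
            int from = 0;
            List<Item> page;
            do {
                page = service.getItems(from, from + PAGE_SIZE);      // one bounded interval per call
                for (Item item : page) {
                    out.printf("%d,%s%n", item.id(), item.name());
                }
                from += PAGE_SIZE;
            } while (!page.isEmpty());
        }
    }
}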
regards.

Android JSON limiting rows

Currently in my application I get a JSON object with a JSON array in it. Is it possible to limit the rows (via a request header?) before the data is returned from the server to the client?
Thanks,
David
There is no general way to limit the amount of data returned in an HTTP request.
If your API specifies a way to do it (such as a request header) then use that. If not, there is no way to prevent the server from sending more data than you want.
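Purely as an illustration of the API-specific case the answer mentions: the query parameter and header below are hypothetical and only work if the server documents them.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class LimitedRequest {
    static int fetchWithLimit() throws IOException {
        // Both the "limit" parameter and the header are assumptions; there is no standard equivalent.
        URL url = new URL("https://example.com/api/items?limit=50");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("X-Row-Limit", "50");
        return connection.getResponseCode();
    }
}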
