I want to build a reports-based application that retrieves very large amounts of data from an Oracle DB and displays it to the user, so my solution was to put a Java-based web service in place that returns a large amount of data. Is there a standard way to stream a response rather than trying to return a huge chunk of data at once?
You can consider a paging mechanism: display only the required set of rows at a time, then move to the next set of rows on request.
On the database side, you can limit the query so that only a certain number of rows is fetched at a time.
If you are on Oracle 12c or later, Top-N and paging queries are supported directly by the row-limiting clause (OFFSET ... ROWS FETCH NEXT ... ROWS ONLY).
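A minimal JDBC sketch of this paging approach on Oracle 12c or later, assuming a hypothetical REPORT_ROWS table with ID and DESCRIPTION columns and a zero-based page index. The ORDER BY keeps page boundaries stable between requests, and the offset and page size are bound as parameters rather than concatenated into the SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch: fetch one page of a report with Oracle 12c's row-limiting clause.
// REPORT_ROWS, ID and DESCRIPTION are hypothetical table/column names.
final class OraclePageQuery {
    // ORDER BY makes pages deterministic; OFFSET/FETCH NEXT requires Oracle 12c or later.
    static final String PAGE_SQL =
        "SELECT id, description FROM report_rows "
      + "ORDER BY id "
      + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";

    // Zero-based page index -> number of rows to skip.
    static long offsetFor(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        return (long) pageIndex * pageSize;
    }

    // Runs the paged query on an open connection; only this page's rows reach the client.
    static List<String> fetchPage(Connection conn, int pageIndex, int pageSize) throws SQLException {
        List<String> page = new ArrayList<>(pageSize);
        try (PreparedStatement ps = conn.prepareStatement(PAGE_SQL)) {
            ps.setLong(1, offsetFor(pageIndex, pageSize)); // OFFSET ? ROWS
            ps.setInt(2, pageSize);                        // FETCH NEXT ? ROWS ONLY
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(rs.getLong("id") + "," + rs.getString("description"));
                }
            }
        }
        return page;
    }
}
```

For example, `fetchPage(conn, 2, 50)` would skip the first 100 rows and return rows 101 to 150 of the ordered result.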
I have a requirement to read a large data set from a PostgreSQL database, which needs to be accessible via a REST API endpoint. The client consuming the data will then need to transform the data into CSV format (we might need to support JSON and XML later on).
On the server side we are using Spring Boot v2.1.6.RELEASE and spring-jdbc v5.1.8.RELEASE.
I tried paging: looping through all the pages, storing the results in a list and returning the list. But this resulted in an OutOfMemoryError, because the data set does not fit into memory.
Streaming the large data set looks like a good way to handle memory limits.
Is there any way I can just return a Stream of all the database entities and have the REST API return the same to the client? How would the client deserialize this stream?
Are there any alternatives to this?
If your data is so huge that it doesn't fit into memory (I'm thinking gigabytes or more), then it's probably too big to reasonably provide as a single HTTP response. You will hold the connection open for a very long time, and if a problem occurs midway through, the client will need to start all over from the beginning, possibly losing minutes of progress.
A more user-friendly API would introduce pagination. Your caller could specify a page size and the index of the page to fetch as part of their request.
For example:
/my-api/some-collection?size=100&page=50
This would represent fetching 100 items starting from item 5000 (items 5000 to 5099, with zero-based page indexes).
Perhaps you could place some reasonable constraints on the page size based on what you are able to load into memory at one time.
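One way to apply such constraints, sketched under assumptions: the parameters arrive as raw strings (as they would from a servlet's getParameter or an unparsed query string), the page index is zero-based, and DEFAULT_PAGE_SIZE and MAX_PAGE_SIZE are hypothetical limits you would tune to what fits in memory.

```java
// Sketch: turn ?size=...&page=... into a bounded offset/limit pair.
// DEFAULT_PAGE_SIZE and MAX_PAGE_SIZE are hypothetical values; tune them to your memory budget.
final class PageRequest {
    static final int DEFAULT_PAGE_SIZE = 100;
    static final int MAX_PAGE_SIZE = 500;

    final int page; // zero-based page index
    final int size; // items per page, after clamping

    private PageRequest(int page, int size) {
        this.page = page;
        this.size = size;
    }

    // Parses raw query-string values; missing or malformed values fall back to safe defaults.
    static PageRequest of(String rawPage, String rawSize) {
        int page = Math.max(parseOr(rawPage, 0), 0);
        int size = parseOr(rawSize, DEFAULT_PAGE_SIZE);
        size = Math.min(Math.max(size, 1), MAX_PAGE_SIZE); // never load more than MAX_PAGE_SIZE rows
        return new PageRequest(page, size);
    }

    // Number of items to skip, e.g. for LIMIT ? OFFSET ?.
    long offset() {
        return (long) page * size;
    }

    private static int parseOr(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

With the example URL above, `PageRequest.of("50", "100")` gives an offset of 5000 and a size of 100, while a request for `size=100000` would be clamped to 500 items.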
How does fetchLazy work in jOOQ?
Is it equivalent to doing a paginated SELECT with LIMIT and OFFSET?
They're different.
fetchLazy()
... returns a Cursor type, which is jOOQ's equivalent of the JDBC ResultSet type. The query will fully materialise in the database, but jOOQ (JDBC) will fetch rows one by one. This is useful:
when large result sets need to be fetched without waiting for the data transfer between server and client to finish, as opposed to a simple fetch(), which loads all rows from the server in one go.
when the client doesn't know in advance how many rows they really want to fetch from the server.
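Since the Cursor is jOOQ's equivalent of a JDBC ResultSet, the same idea can be shown in plain JDBC. The sketch below is that plain-JDBC analogue, not jOOQ's own API: rows are handed to a consumer one at a time instead of being collected into a list, and setFetchSize hints how many rows the driver transfers per round trip. The RowMapper interface and the method names are hypothetical. One driver-specific assumption worth checking: the PostgreSQL JDBC driver only honours the fetch size when auto-commit is off and the result set is forward-only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

// Plain-JDBC analogue of a lazy cursor (not jOOQ's API): rows are handed to a consumer
// one at a time instead of being materialised in a List on the client.
final class LazyRowReader {

    // Hypothetical callback that maps the current row; it should not keep the ResultSet.
    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    // Executes `sql` and streams its rows, asking the driver for `fetchSize` rows per round trip.
    // For PostgreSQL, call this with auto-commit disabled, otherwise the driver loads every row at once.
    static <T> long streamQuery(Connection conn, String sql, int fetchSize,
                                RowMapper<T> mapper, Consumer<? super T> sink) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                return drain(rs, mapper, sink);
            }
        }
    }

    // Walks the ResultSet forward; this method itself holds no rows, it only passes each one on.
    static <T> long drain(ResultSet rs, RowMapper<T> mapper, Consumer<? super T> sink) throws SQLException {
        long rows = 0;
        while (rs.next()) {
            sink.accept(mapper.map(rs));
            rows++;
        }
        return rows;
    }
}
```

A sink that writes each mapped row straight to an output stream (for example as one CSV line) keeps client-side memory roughly constant regardless of the result size.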
LIMIT .. OFFSET
... will reduce the number of returned rows already in the database, without them ever surfacing in the client. This can heavily improve execution speed in the server, as the server:
may choose a different execution plan (e.g. using nested loops instead of hash joins for low values of LIMIT);
doesn't need to keep an open cursor for a long data transfer time, as only a few rows are transferred over the wire.
I have a MongoDB database that contains a Poll collection.
The Poll collection has a number of Poll documents. This could be a large number of documents.
I am using Java Servlets to serve HTTP requests.
How can I implement a feed-style retrieval mechanism on the server side?
For example, in the first request I want to retrieve documents 1 to 10, then 11 to 20, and so on.
As the user scrolls the view, I want to get the next batch of data from the server and send it to the client.
Does MongoDB provide a way to do this?
I think what you are looking for is pagination. You can use the limit and skip methods with your find query.
First request
db.Poll.find().skip(0).limit(10)
Second request
db.Poll.find().skip(10).limit(10)
...
...
Note: you should also sort the results by some field; otherwise the order of documents across pages is not guaranteed.
db.Poll.find().skip(10).limit(10).sort({_id:-1})
For more information on the cursor methods, see: http://docs.mongodb.org/manual/reference/method/js-cursor/
I'm new to open source stacks and have been playing with Hibernate/JPA/JDBC and memcached. Each JDBC query returns a large data set, and I will possibly have a number of these large data sets, which I eventually bind to a chart.
However, I'm very focused on performance, and I'd rather not hit the database on every page load to display the data on my web page chart.
Are there examples of how (memcached, Redis, local or distributed) and where to cache this data (JSON or raw result data) so that it is loaded in memory? I also need to figure out how to refresh the cache: either time-based eviction (e.g. entries expire after 30 minutes, so new data is fetched from the database query instead of the cache) or perhaps an automated feed of data into the cache every X hours/minutes.
This is a typical problem, and the solution is not straightforward. Many factors determine your design. Here is what we did some time ago:
Since our queries to extract data were fairly complex (taking around a minute to execute) and returned large data sets, we populated memcached from a batch job that pulled data from the database every hour and pushed it to memcached. By keeping the cache expiry longer than the batch interval, we made sure there would always be data in the cache.
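A rough in-process sketch of that batch approach, assuming a local ConcurrentHashMap in place of memcached and hypothetical `Supplier` loaders standing in for the slow queries. The constructor enforces the rule described above: the entry lifetime must be longer than the refresh interval, so entries do not expire between successful batch runs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: a batch job refreshes every key on a fixed schedule; readers only ever hit the cache.
// ConcurrentHashMap stands in for memcached; the loaders are hypothetical slow database queries.
final class ScheduledCacheRefresher<V> {
    private static final class Entry<V> {
        final V value;
        final Instant expiresAt;
        Entry(V value, Instant expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Map<String, Supplier<V>> loaders;
    private final Duration interval;
    private final Duration ttl;

    ScheduledCacheRefresher(Map<String, Supplier<V>> loaders, Duration interval, Duration ttl) {
        // Expiry must outlive the batch interval, otherwise entries vanish between runs.
        if (ttl.compareTo(interval) <= 0) {
            throw new IllegalArgumentException("ttl must be longer than the refresh interval");
        }
        this.loaders = loaders;
        this.interval = interval;
        this.ttl = ttl;
    }

    // One batch run: re-query every data set and overwrite its cache entry.
    void refreshAll() {
        for (Map.Entry<String, Supplier<V>> e : loaders.entrySet()) {
            cache.put(e.getKey(), new Entry<>(e.getValue().get(), Instant.now().plus(ttl)));
        }
    }

    // Readers never query the database; a missing or expired entry is reported as empty.
    Optional<V> get(String key) {
        Entry<V> entry = cache.get(key);
        if (entry == null || Instant.now().isAfter(entry.expiresAt)) {
            return Optional.empty();
        }
        return Optional.ofNullable(entry.value);
    }

    // Runs refreshAll immediately, then every `interval`; the caller shuts the executor down.
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refreshAll();
            } catch (RuntimeException ex) {
                // An uncaught exception would cancel all future runs, so log it and keep the old values.
                System.err.println("cache refresh failed: " + ex);
            }
        }, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With memcached the same structure applies, except that the expiry would be passed to the cache server on each put instead of being checked locally.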
We had another use case for dynamic caching: on receiving a request for data, we first checked memcached; if the data was not found, we queried the database, pushed the data to memcached and returned the results. But I would recommend this approach only when your database queries are simple and fast enough not to cause poor overall response times.
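The dynamic (cache-aside) flow just described, as a minimal sketch: a ConcurrentHashMap stands in for memcached, and the `queryDatabase` function is a hypothetical placeholder for the real query. The four commented steps mirror the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of cache-aside: check the cache, fall back to the database, then populate the cache.
// ConcurrentHashMap stands in for memcached; queryDatabase is a hypothetical loader.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> queryDatabase;
    private final AtomicInteger databaseHits = new AtomicInteger();

    CacheAside(Function<K, V> queryDatabase) {
        this.queryDatabase = queryDatabase;
    }

    V get(K key) {
        V cached = cache.get(key);              // 1. check the cache first
        if (cached != null) {
            return cached;
        }
        V fresh = queryDatabase.apply(key);     // 2. on a miss, query the database
        databaseHits.incrementAndGet();
        if (fresh != null) {
            cache.put(key, fresh);              // 3. push the result into the cache
        }
        return fresh;                           // 4. return the result to the caller
    }

    // Number of times the database was actually queried (useful for monitoring the hit rate).
    int databaseHits() {
        return databaseHits.get();
    }
}
```

With this simple check-then-put, two concurrent misses for the same key can both hit the database. That is usually acceptable when the queries are fast, which is exactly the situation where this approach is recommended.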
You can also use Hibernate's second-level cache. Whether you can use this feature efficiently depends on your database schema, queries, etc.
Hibernate has built-in support for second-level caching. Take a look at Ehcache, for example.
Also see: http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-cache
I have 3000 records in an employee table, which I have fetched from my database with a single query. I can show 20 records per page, so there will be 150 pages, each showing 20 records. I have two questions about pagination and sortable columns:
1) If I implement simple pagination without sortable columns, should I send all 3000 records to the client and do the pagination on the client side using JavaScript or jQuery? That way, if the user clicks the second page, no call goes to the server and it will be faster. However, I am not sure what the impact of sending 3000 or more records to the browser would be. So what is the best approach: sending all the records to the client in a single go and doing the pagination there, or sending a call to the server on each page click and returning just that page's results?
2) In this scenario, I also need to provide pagination along with sortable columns (6 columns). The user can click any column, such as employee name or department name, and the rows should then be arranged in ascending or descending order of that column. Again, what is the best approach in terms of response time and memory?
Sending data to your client is almost certainly going to be your bottleneck (especially for mobile clients), so you should always strive to send as little data as possible. With that said, it is almost certainly better to do your pagination on the server side. This is a much more scalable solution: the amount of data is likely to grow, so paginating on the server is a safer bet for the future.
Also, remember that it is fairly unlikely that any user will actually bother looking through hundreds of result pages, so transferring all the data is likely wasteful as well. This may be a relevant read for you.
I assume you have a bean class representing records in this table, with instances loaded from whatever ORM you have in place.
If you haven't already, you should implement caching of these beans in your application. This can be done locally, perhaps using Guava's CacheBuilder, or remotely using calls to Memcached for example (the latter would be necessary for multiple app servers/load balancing). The cache for these beans should be keyed on a unique id, most likely the mapping to the primary key column of the corresponding table.
Getting to the pagination: simply write your queries to return only IDs of the selected records. Include LIMIT and OFFSET or your DB language's equivalent to paginate. The caller of the query can also filter/sort at will using WHERE, ORDER BY etc.
Back in the Java layer, iterate through the resulting IDs of these queries and build your List of beans by calling the cache. If the cache misses, it will call your ORM to individually query and load that bean. Once the List is built, it can be processed/serialized and sent to the UI.
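A minimal sketch of that ID-then-cache assembly, with the assumptions made explicit: Employee and the `loadById` function are hypothetical stand-ins for your bean and your ORM's find-by-primary-key call, and a ConcurrentHashMap with computeIfAbsent replaces Guava's CacheBuilder so the example has no dependencies (it has no eviction, which a real cache would need).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Sketch: build a page of beans from a list of IDs, loading only the cache misses.
// Employee and loadById are hypothetical; ConcurrentHashMap stands in for Guava's cache or memcached.
final class EmployeePageAssembler {
    static final class Employee {
        final long id;
        final String name;
        Employee(long id, String name) { this.id = id; this.name = name; }
    }

    private final Map<Long, Employee> cacheById = new ConcurrentHashMap<>();
    private final LongFunction<Employee> loadById; // e.g. an ORM find-by-primary-key call

    EmployeePageAssembler(LongFunction<Employee> loadById) {
        this.loadById = loadById;
    }

    // `pageIds` is the result of an ID-only query such as
    // SELECT id FROM employee ORDER BY name LIMIT 20 OFFSET 40; its order is preserved.
    List<Employee> assemble(List<Long> pageIds) {
        List<Employee> page = new ArrayList<>(pageIds.size());
        for (Long id : pageIds) {
            // Cache hit: reuse the bean. Cache miss: load it individually and remember it.
            page.add(cacheById.computeIfAbsent(id, loadById::apply));
        }
        return page;
    }
}
```

Because the list is built in the order of `pageIds`, whatever ORDER BY the ID query used carries through to the page sent to the UI.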
I know this doesn't directly answer the client-side vs server-side pagination question, but I would recommend using DataTables.net to both display and paginate your data. It provides a very nice display, sorting and pagination, a built-in search function, and a lot more. The first time I used it was on the first web project I worked on, and I, as a complete newbie, was able to get it to work. The forums also provide very good information and help, and the creator will answer your questions.
DataTables can be used both client-side and server-side, and can support thousands of rows.
As for speed, I only had a few hundred rows, but used the client-side processing and never noticed a delay.
USE SERVER PAGINATION!
Sure, you could probably get away with sending down a JSON array of 3000 elements and using JavaScript to page and sort on the client. But a good web programmer should know how to page and sort records on the server (they should really know a couple of ways). So think of it as good practice :)
If you want a slick user interface, consider using a JavaScript grid component that uses AJAX to fetch data. Typically, these components pass back the following parameters (or some variant of them):
Start Record Index
Number of Records to Return
Sort Column
Sort Direction
Columns to Fetch (sometimes)
It is up to the developer to implement a handler or interface that returns a result set based on these input parameters.
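A sketch of such a handler's query-building step, assuming a hypothetical employee/department schema and MySQL/PostgreSQL-style LIMIT/OFFSET (Oracle 12c would use OFFSET ... FETCH NEXT instead). The sort column from the grid is mapped through a whitelist rather than concatenated directly, since ORDER BY columns cannot be bound as JDBC parameters; the page length and start index are bound as parameters. The UI column keys and the page-length cap are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;

// Sketch: turn grid parameters (start, length, sort column, direction) into parameterised SQL.
// Table names, column names and the UI column keys are hypothetical.
final class GridQueryBuilder {
    static final int MAX_PAGE_LENGTH = 100; // hypothetical cap on rows per request
    private static final String DEFAULT_ORDER = "e.employee_id";

    // Whitelist: UI column key -> SQL expression. Anything else falls back to DEFAULT_ORDER.
    private static final Map<String, String> SORTABLE = Map.of(
        "name", "e.employee_name",
        "department", "d.department_name",
        "hireDate", "e.hire_date");

    static final class Query {
        final String sql;
        final List<Object> params; // bind in order: LIMIT, OFFSET
        Query(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    static Query build(int start, int length, String sortColumn, String sortDirection) {
        // Map.of rejects null lookups, so handle a missing sort column explicitly.
        String column = sortColumn == null ? DEFAULT_ORDER : SORTABLE.getOrDefault(sortColumn, DEFAULT_ORDER);
        String direction = "desc".equalsIgnoreCase(sortDirection) ? "DESC" : "ASC";
        // Tie-break on the primary key so rows with equal sort values keep a stable page order.
        String tieBreaker = column.equals(DEFAULT_ORDER) ? "" : ", " + DEFAULT_ORDER;
        int limit = Math.min(Math.max(length, 1), MAX_PAGE_LENGTH);
        int offset = Math.max(start, 0);
        String sql = "SELECT e.employee_id, e.employee_name, d.department_name, e.hire_date"
            + " FROM employee e JOIN department d ON d.department_id = e.department_id"
            + " ORDER BY " + column + " " + direction + tieBreaker
            + " LIMIT ? OFFSET ?";
        return new Query(sql, List.<Object>of(limit, offset));
    }
}
```

An unknown sort column such as `name; DROP TABLE employee` simply falls back to ordering by employee_id, so user input never reaches the SQL text.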