I am trying to implement pagination using nextPageToken.
I have a table:
CREATE TABLE IF NOT EXISTS categories
(
id BIGINT PRIMARY KEY,
name VARCHAR(30) NOT NULL,
parent_id BIGINT REFERENCES categories (id)
);
So I have an entity Category:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;

@Column(name = "name")
private String name;

@OneToOne
@JoinColumn(name = "parent_id", referencedColumnName = "id")
private Category category;
I don't really understand what I have to do next. The client requests a token, but what exactly does that token hold?
Assume I have a controller:
@GetMapping
public ResponseEntity<CategoriesTokenResponse> getCategories(
        @RequestParam String nextPageToken
) {
return ResponseEntity.ok(categoryService.getCategories(nextPageToken));
}
Service:
public CategoriesTokenResponse getCategories(String nextPageToken) {
return new CategoriesTokenResponse(categoryDtoList, "myToken");
}
@Data
@AllArgsConstructor
public class CategoriesTokenResponse {
private final List<CategoryDto> categories;
private final String token;
}
How do I implement the SQL query for that? And how do I generate the nextPageToken?
SELECT * FROM categories WHERE parent_id = what?
AND id > (max id from previous page = the token?)
ORDER BY id LIMIT 20;
First, you need to understand what you are working with here. Every implementation has some sort of limitation or inefficiency. For example, using page tokens like that is only good for infinite scroll pages. You can't jump to any specific page. So if my browser crashes and I'm on page 100, I have to scroll through 100 pages AGAIN. It is faster for massive data sets for sure, but does that matter if you need access to all pages? Or if you limit the return to begin with? Such as only getting the first 30 pages?
Basically decide this: Do you only care about the first few pages because search/sort is always in use (like a user never going past the first 1-5 pages of Google results), and is that data set large? Then it's a great use case. Will a user select "all items in the last 6 months" and actually need all of them, or is the sort/search weak? Will you return all pages rather than capping the return at, say, 30 pages? Or is development speed more important than a 0.1-3 second speed increase (depending on data size)? Then go with the built-in JPA Page objects.
I have used Page objects on 700k records with less than a second of speed change compared to 70k records. Based on that, I don't see removing OFFSET adding a ton of value unless you plan for a huge data set. I just tested a new system I'm making with Pageable: it returned 10 items on page 1 in 84 milliseconds with no page limiter, for 27k records, over a VPN into my work network from my house. A table with over 500k records took 131 milliseconds. That's pretty fast. Want to make it faster? Force a total max return of 30 pages and a max of 100 results per page, because normally users don't need all the data in that table. They want something else? Refine the search. The speed difference is less than a second between this and seek/key-style paging. This assumes a normal SQL database too; NoSQL is a bit different here. Baeldung has a ton of articles on JPA paging, like the following: https://www.baeldung.com/rest-api-pagination-in-spring
JPA paging should take no more than 30 minutes to learn and implement; it's extremely easy and comes stock on JPA repositories. I strongly suggest using that over seek/key-style paging, as you likely aren't building a system like Google's or Facebook's.
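For reference, a minimal sketch of the stock approach against the Category entity from the question (the repository name is mine, and PageRequest.of is the newer Spring Data factory; treat this as an outline, not the poster's code):

public interface CategoryRepository extends JpaRepository<Category, Long> {}

// In the service: the client sends a page number instead of a token.
Page<Category> page = categoryRepository.findAll(
        PageRequest.of(pageNumber, 20, Sort.by("id")));
List<Category> categories = page.getContent(); // the current page of rows
boolean hasMore = page.hasNext();              // tells the client whether page + 1 exists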
If you absolutely want to go with the seek/key style paging there's a good informational page here:
https://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/
In general, what you are looking for is using jOOQ with Spring. Implementation example here:
https://docs.spring.io/spring-boot/docs/1.3.5.RELEASE/reference/html/boot-features-jooq.html
Basically, create a DSL context:

private final DSLContext dsl;

@Autowired
public JooqExample(DSLContext dslContext) {
    this.dsl = dslContext;
}
Then use it like so:

dsl.select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15) // (!) seek past the last row of the previous page: (SCORE, PLAYER_ID)
   .limit(10)
   .fetch();
Instead of explicitly phrasing the seek predicate, just pass the last record from the previous query, and jOOQ will see that all records before and including this record are skipped, given the ORDER BY clause.
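To connect this back to the original question: the nextPageToken can simply encode the last id of the page (e.g. base64), and the next request resumes the scan after that id. A hedged sketch, where jdbcTemplate, rowMapper, parentId and decodeToken are all assumed or hypothetical names, not from the original post:

// Resume after the id carried by the incoming token (keyset/seek paging).
long lastId = decodeToken(nextPageToken); // hypothetical helper: base64 -> long, 0 for the first page
List<CategoryDto> page = jdbcTemplate.query(
        "SELECT * FROM categories WHERE parent_id = ? AND id > ? ORDER BY id LIMIT 20",
        rowMapper, parentId, lastId);

// The next token just encodes the largest id returned on this page.
String nextToken = page.isEmpty() ? null
        : Base64.getUrlEncoder().encodeToString(
                Long.toString(page.get(page.size() - 1).getId())
                        .getBytes(StandardCharsets.UTF_8));
return new CategoriesTokenResponse(page, nextToken);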
Related
I have a performance issue when loading entities that I'm looking to speed up. So I have the following classes:
@MappedSuperclass
public abstract class MySuperList {
    ...
    @OneToMany(... fetch = FetchType.LAZY)
    private Set<Attributes> myAttributes;

    @OneToMany(... fetch = FetchType.LAZY)
    private Set<Tag> myTags;
    ...
}

public class MyList extends MySuperList {
    ...
    @OneToMany(... fetch = FetchType.LAZY)
    private Set<MetadataItems> myMetadataItems;
    ...
}
When I load several of these items using code similar to the below:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<MyList> cq = cb.createQuery(MyList.class);
Root<MyList> myList = cq.from(MyList.class);
cq.select(myList).distinct(true).where(buildPredicate()).orderBy(buildOrder());
List<MyList> result = entityManager.createQuery(cq)
        .setFirstResult(1)
        .setMaxResults(50)
        .setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
        .getResultList();
Then I see 151 calls to the database:
One to get the MyList records - 50 records are returned.
50 calls to load the Attributes for the 50 MyList records returned.
50 calls to load the Tags for the 50 MyList records returned.
50 calls to load the MetadataItems for the 50 MyList records returned.
This seems to be a normal n+1 issue. It is quite slow so I decide to eagerly load using the fetch operation prior to the get result list:
myList.fetch("myAttributes", JoinType.LEFT)
myList.fetch("myTags", JoinType.LEFT)
myList.fetch("myMetadataItems", JoinType.LEFT)
This then does a Cartesian product and returns everything in one SQL call, loading all entities and their child entities. This is slightly faster than the n+1 calls but is still slow.
The SQL itself, when run manually, is fast.
So I think: what if I load each of the children manually (I've tried both JPA query and native query)? I.e. I don't fetch the children but build 3 separate "criteria builders" that load the Attribute, Tag and MyMetadataItem rows for the returned MyList ids using the IN operator. So something like this:
SELECT * FROM Attribute WHERE MyListId IN ( <the 50 MyListIds> )
SELECT * FROM Tag WHERE MyListId IN ( <the 50 MyListIds> )
SELECT * FROM MyMetadataItem WHERE MyListId IN ( <the 50 MyListIds> )
So I find the initial load of MyList quite speedy and so is the loading of the Attribute, Tag and MyMetadataItem...
but unfortunately when I access a MyList record it once again lazy-loads the Attribute, Tag and MyMetadataItem. I was hoping that if I loaded the child entities manually they would be in the Hibernate context and would magically be linked to the MyList records (and vice versa).
I've even tried to manually "link" it. I.e. I loop through the MyList and call setMyAttribute() with the Attribute records returned that are specific to the MyList record but unfortunately still the same issue.
So I guess my questions are:
When we have an n+1 issue, is the best thing to do to eagerly load the children?
If I load the child items manually, is there a way to link them to the parent somehow?
Is the above approach even right? :) Or is there a better way to speed things up?
Java 1.8
Hibernate 5.4.15
Oracle database. ojdbc 18.3
Any words of wisdom would be greatly appreciated.
Thanks.
Thanks to Chris's comment I found the @BatchSize annotation in Hibernate. This batches the fetching of the child entities and has helped to speed up the calls.
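For anyone else hitting this, a short sketch of what that change looks like on the classes above (size 50 is an arbitrary choice, not from the original post):

import org.hibernate.annotations.BatchSize;

@MappedSuperclass
public abstract class MySuperList {
    ...
    @OneToMany(... fetch = FetchType.LAZY)
    @BatchSize(size = 50) // Hibernate initializes this collection for up to 50 parent
                          // rows with one IN query, instead of one query per parent
    private Set<Attributes> myAttributes;
    ...
}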
Try constructing the JPA query and returning it as a List<Map<String, Object>> directly to the client. This saves a lot of time.
I have written an application to scrape a huge set of reviews. For each review I store the review itself in Review_Table(User_Id, Trail_Id, Rating), the user (Id, Username, UserLink) and the Trail, which is built previously in the code (Id, ...60 other attributes).
for (Element card : reviewCards) {
    String userName = card.select("expression").text();
    String userLink = card.select("expression").attr("href");
    String userRatingString = card.select("expression").attr("aria-label");
    Double userRating;
    if (userRatingString.equals("NaN Stars")) {
        userRating = 0.0;
    } else {
        userRating = Double.parseDouble(userRatingString.replaceAll("[^0-9.]", ""));
    }
    User u;
    Rating r;
    // probably this is the bottleneck
    if (userService.getByUserLink(userLink) != null) {
        u = new User(userName, userLink, new HashSet<Rating>());
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    } else {
        u = userService.getByUserLink(userLink);
        r = Rating.builder()
                .user(u)
                .userRating(userRating)
                .trail(t)
                .build();
    }
    i = i + 1;
    ratingSet.add(r);
    userSet.add(u);
}
saveToDb(userSet, t, link, ratingSet);
savedEntities = savedEntities + 1;
log.info(savedEntities + " Saved Entities");
}
The code works fine for small to medium-sized datasets, but I hit a huge bottleneck for larger ones. Suppose I have 13K user entities already stored in the Postgres DB and another batch of 8500 reviews comes in to be scraped: I have to check, for every review, whether the user of that review is already stored. This takes forever.
I tried to define an index on the UserLink attribute in Postgres, but the speed didn't improve at all.
I tried to fetch all the users stored in the DB into a set and use the contains method to check whether a particular user already exists (this way I thought I could bypass the database bottleneck of ~8k reads and writes, albeit riskily, because with too many users in the DB table I would run into a memory overflow). The speed, again, didn't improve.
At this point I don't have any other ideas to improve this.
Well, for one, you would certainly benefit from not querying for each user individually in a loop. What you can do is query for and cache only the UserLink or UserName values, i.e. get and cache the complete set of just one of them, because that's all you seem to need for the differentiation in the if-else.
You can actually query for individual fields with a Spring Data JPA @Query directly, or even use Spring Data JPA projections to select a subset of fields if needed, and cache and use those for the lookup (see the sketch below). If you think the users could run into millions or billions, then you could consider a distributed cache like Apache Ignite, where your collection could scale easily.
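For example, a minimal sketch of such a single-field query (the repository and entity names are assumptions, not from the original post):

public interface UserRepository extends JpaRepository<User, Long> {

    // One query up front loads only the link strings, not full entities.
    @Query("select u.userLink from User u")
    Set<String> findAllUserLinks();
}

// In the scraper loop, the per-review lookup then becomes an in-memory check:
Set<String> knownLinks = userRepository.findAllUserLinks();
boolean alreadyStored = knownLinks.contains(userLink);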
Btw, the if-else seems to be inverted, does it not?
Next, you shouldn't store each review individually, which is what the above code appears to do; you can write in batches. Also, since you are using Postgres, you can use the CopyManager that Postgres provides for bulk data transfer, wired in through a Spring Data custom repository. So you can keep writing to a new text/csv file locally on a set schedule (every x minutes), use it to write that batched text/csv to the table (after those x minutes), and then remove the file. This would be really quick.
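A rough sketch of the CopyManager route, assuming the pooled JDBC connection can be unwrapped to a PGConnection; the file name, table and columns are made up:

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

try (Connection conn = dataSource.getConnection();
     Reader csv = new FileReader("/tmp/ratings-batch.csv")) { // the batched csv written every x minutes
    CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
    // COPY streams the whole file to the server in one round trip,
    // far faster than row-by-row INSERTs.
    long rows = copy.copyIn(
            "COPY rating (user_id, trail_id, rating) FROM STDIN WITH (FORMAT csv)", csv);
}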
The other option is write a stored procedure that combines the above & invoke it again in a custom repository.
Please let me know which one you would like elaborated.
UPDATE (Jan 12 2022):
One other item I missed: when querying for UserLink or UserName you can use a very efficient form of select query that Postgres supports, instead of an IN clause, like below:
#Select("select u from user u where u.userLink = ANY('{:userLinks}'::varchar[])", nativeQuery = true)
List<Users> getUsersByLinks(#Param("userLinks") String[] userLinks);
I am using Objectify v5.1.11 from auto-scaled App Engine instances in the Java 8 runtime environment.
I have an API which IOT devices call periodically to upload statistics information. In this API, I insert an entity to datastore to store the statistics information. This entity uses the auto generated IDs of datastore. Entity definition is as follows:
#Entity(name = "Stats")
public class StatsEntity {
#Id
private Long statisticsId;
#Index
private Long deviceId;
#Index
private String statsKey;
#Index
private Date creationTime;
}
But I had a requirement of checking for duplicates before inserting the entity, so I switched to custom-generated (String) ids. I came up with a mechanism of appending the deviceId to the statsKey string (unique for each statistic within a device) provided by the device to generate the ID.
This is to avoid the eventual-consistency behaviour I would get if I used a query to check whether the entity already exists. Since get-by-ID is strongly consistent, I can use it to check for duplicates.
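With Objectify, that duplicate check looks roughly like this (after the switch the @Id field is the custom String; the variable names and concatenation order are mine, the scheme is the one described above):

// Strongly consistent get-by-key, so there is no eventual-consistency window.
String id = deviceId + "_" + statsKey;   // the composite custom ID described above
StatsEntity existing = ofy().load().type(StatsEntity.class).id(id).now();
if (existing == null) {
    ofy().save().entity(stats).now();    // not a duplicate, safe to insert
}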
There is another API to fetch the statistics uploaded by a device. In this API, I list the entities by filtering on deviceId and ordering by creationTime in descending order (newest first), with a page size of 100. This request times out since it exceeds App Engine's 60s limit. I see the following exception in the logs:
Task was cancelled.
java.util.concurrent.CancellationException: Task was cancelled.
at com.google.common.util.concurrent.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1355)
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:555)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:436)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:99)
at com.google.appengine.tools.development.TimedFuture.get(TimedFuture.java:42)
at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:62)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:93)
at com.google.appengine.api.datastore.FutureHelper.getInternal(FutureHelper.java:69)
at com.google.appengine.api.datastore.FutureHelper.quietGet(FutureHelper.java:33)
at com.google.appengine.api.datastore.BaseQueryResultsSource.loadMoreEntities(BaseQueryResultsSource.java:243)
at com.google.appengine.api.datastore.BaseQueryResultsSource.loadMoreEntities(BaseQueryResultsSource.java:180)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.ensureLoaded(QueryResultIteratorImpl.java:173)
at com.google.appengine.api.datastore.QueryResultIteratorImpl.hasNext(QueryResultIteratorImpl.java:70)
at com.googlecode.objectify.impl.KeysOnlyIterator.hasNext(KeysOnlyIterator.java:29)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:580)
at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
at com.googlecode.objectify.impl.ChunkIterator.hasNext(ChunkIterator.java:39)
at com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:50)
at com.google.common.collect.MultitransformedIterator.hasNext(MultitransformedIterator.java:50)
at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1105)
at com.googlecode.objectify.impl.ChunkingIterator.hasNext(ChunkingIterator.java:51)
at com.ittiam.cvml.dao.repository.PerformanceStatsRepositoryImpl.list(PerformanceStatsRepositoryImpl.java:154)
at com.ittiam.cvml.service.PerformanceStatsServiceImpl.listPerformanceStats(PerformanceStatsServiceImpl.java:227)
The statsKey provided by the device is based on time and hence monotonically increasing (a step increase of 15 mins), which is bad as per this link.
But my traffic is not large enough to warrant this behaviour. Each device makes 2 to 3 requests every 15 minutes and there are about 300 devices.
When I try to list entities for devices which haven't made any request since I made this switch to custom ID, I still observe this issue.
Edit
My code to list the entity is as follows:
Query<StatsEntity> query = ofy().load().type(StatsEntity.class);
List<StatsEntity> entityList = new ArrayList<>();
query = query.filter("deviceId", deviceId);
query = query.order("-creationTime");
query = query.limit(100);
QueryResultIterator<StatsEntity> iterator = query.iterator();
while (iterator.hasNext()) {
    entityList.add(iterator.next());
}
This error usually occurs because of write contention: multiple transactions, such as writes and reads, hitting the same entity group concurrently.
There are various approaches to solve this problem:
A query lives for only 30 secs, but you can extend that by converting your API into a task queue. For handling such write contention issues you should generally use a task queue, which can run for around 10 mins (see the sketch after this list).
If possible, make your entity group smaller.
You can find more approaches here.
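For the first option, a minimal sketch using the App Engine task queue API (the worker URL and parameter are made up):

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Enqueue the slow listing as a push task, which gets roughly 10 minutes
// of processing time instead of the interactive request deadline.
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder
        .withUrl("/tasks/list-stats")                 // hypothetical worker endpoint
        .param("deviceId", String.valueOf(deviceId))
        .method(TaskOptions.Method.POST));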
Hope this answers your question!
I am a newbie server-side web developer. I have developed an app for shops to manage their orders. Here comes a question.
I have an order table like:
orderId, orderNumber, …
and an orderProduct table like:
orderProductId, productId, productNumber, productName, productDescription.
I have a search function that gets all orders matching the search strings.
The API is like:
GET /api/orders?productNumberSearch={searchStr}&productNameSearch={searchStr2}&productDescriptionSearch={searchStr3}
My implementation is like:
String queryStr1 = getParameterFromRequestWithDefault("productNumberSearch", "");
String queryStr2 = getParameterFromRequestWithDefault("productNameSearch", "");
String queryStr3 = getParameterFromRequestWithDefault("productDescriptionSearch", "");
List<OrderProduct> orderProducts = getAllOrderProductsFromDatabase();
List<Integer> filterOrderIds = orderProducts.stream()
        .filter(item -> item.getNumber().contains(queryStr1)
                && item.getName().contains(queryStr2)
                && item.getDescription().contains(queryStr3))
        .map(item -> item.getOrderId()) // collect the order ids, not the products themselves
        .collect(Collectors.toList());
List<Order> orders = getOrdersByIds(filterOrderIds);
I use Spring MVC and MySQL. The code above works. However, if many requests arrive at the same time, an out-of-memory exception is thrown. And since there are Chinese characters in the database, MySQL full-text search does not work well.
So is there another way to implement the search function without Elasticsearch?
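One possible direction, sketched below with assumed repository and field names: push the filtering into MySQL with LIKE (which matches substrings per character and therefore also works with Chinese text), so only matching rows are ever loaded into memory.

public interface OrderProductRepository extends JpaRepository<OrderProduct, Integer> {

    // Only matching order ids leave the database, so memory no longer grows with the table.
    @Query("select op.orderId from OrderProduct op "
         + "where op.productNumber like concat('%', :number, '%') "
         + "and op.productName like concat('%', :name, '%') "
         + "and op.productDescription like concat('%', :description, '%')")
    List<Integer> findOrderIds(@Param("number") String number,
                               @Param("name") String name,
                               @Param("description") String description);
}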
Using Spring Data 1.4.2 and Spring Security 3.1.4.RELEASE.
DAO:
public interface NewsDao extends
JpaRepository<News, Long>, JpaSpecificationExecutor<News>{}
I want to get 5 newest news that user has access to:
@Transactional(readOnly = true)
@PostFilter("hasPermission(filterObject, 'VIEW')")
public List<News> findNewestGlobalNews() {
Sort orderByDate = getSort();
NewsDao newsDao = getDao();
PageRequest newestOnly = new PageRequest(0, 5, orderByDate);
List<News> news = newsDao.findAll(newestOnly).getContent();
// because the list returned by Page is immutable and we do the filtering
// according to ACL, return a copy of the list
return new ArrayList<>(news);
}
This code works, but it suffers from an obvious problem: we select 5 items from the database and then filter out the ones the user has no access to. As a result, one user sees 3 news items and another sees 4, although there are at least 5 items in the database that each of them could see.
I can think of selecting all items from the database, filtering them, and then taking the top 5, but I wonder if there is a more elegant way to do this.
The clean solution would be to directly query only the last 5 for a specific user. This obviously works only if you have this information in the database too.
If you have this access info only in the service layer, you are left with querying more: if the list is smaller than 5 after the first query, keep querying until you reach 5 in total.
Assuming that the query for news returns fast, it will not matter much to query 25 or X results instead, so the chance of not reaching the final 5 for a user is low enough, and you live with the consequence of not reaching 5 in some cases :)
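A hedged sketch of that fallback loop (filterByAcl is a hypothetical stand-in for the @PostFilter/ACL check; PageRequest's constructor matches the Spring Data version in the question):

List<News> visible = new ArrayList<>();
int pageNo = 0;
while (visible.size() < 5) {
    List<News> batch = newsDao.findAll(new PageRequest(pageNo++, 25, orderByDate)).getContent();
    if (batch.isEmpty()) {
        break; // fewer than 5 visible news items exist in total
    }
    visible.addAll(filterByAcl(batch)); // hypothetical ACL filter hook
}
return new ArrayList<>(visible.subList(0, Math.min(5, visible.size())));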