Display N items that user has access to - java

Using Spring Data 1.4.2 and Spring Security 3.1.4.RELEASE.
DAO:
public interface NewsDao extends
JpaRepository<News, Long>, JpaSpecificationExecutor<News>{}
I want to get the 5 newest news items that the user has access to:
@Transactional(readOnly = true)
@PostFilter("hasPermission(filterObject, 'VIEW')")
public List<News> findNewestGlobalNews() {
Sort orderByDate = getSort();
NewsDao newsDao = getDao();
PageRequest newestOnly = new PageRequest(0, 5, orderByDate);
List<News> news = newsDao.findAll(newestOnly).getContent();
// because the list returned by Page is immutable and we do the filtering
// according to ACL, return a copy of the list
return new ArrayList<>(news);
}
This code works, but it suffers from an obvious problem: we select 5 items from the database and then filter out the ones the user does not have access to. As a result, one user may see 3 news items and another 4, even though there are at least 5 news items in the database that both users could see.
I could select all items from the database, filter them, and then take the top 5, but I wonder if there is a more elegant way to do this.

The clean solution would be to query only the last 5 for a specific user directly in the database. This obviously works only if the access information is stored in the database too.
If the access information lives only in the service layer, you are left with two options. Either keep querying further pages while the filtered list still has fewer than 5 items, until you reach 5 in total, or, assuming the news query returns fast, query 25 (or X) results instead of 5, which will not matter that much: the chance of a user not reaching the final 5 becomes low enough, and you live with not reaching 5 in some cases :)
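The "keep querying until you have 5" option could look roughly like the sketch below. It is plain Java so that it runs on its own: `fetchPage` stands in for one paged repository call and `canView` for the ACL check; both parameters, like the class name `NewestVisible`, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

final class NewestVisible {
    private NewestVisible() {}

    /**
     * Collects up to {@code limit} items that pass {@code canView}, requesting
     * page 0, 1, 2, ... from {@code fetchPage} until enough items are found or
     * a page comes back empty (no more rows).
     */
    static <T> List<T> firstVisible(IntFunction<List<T>> fetchPage,
                                    Predicate<T> canView,
                                    int limit) {
        List<T> result = new ArrayList<>();
        for (int page = 0; result.size() < limit; page++) {
            List<T> batch = fetchPage.apply(page);
            if (batch.isEmpty()) {
                break; // source exhausted: return fewer than limit items
            }
            for (T item : batch) {
                if (canView.test(item)) {
                    result.add(item);
                    if (result.size() == limit) {
                        break; // stop as soon as we have enough
                    }
                }
            }
        }
        return result;
    }
}
```

In the question's service, `fetchPage` might wrap `newsDao.findAll(new PageRequest(page, 5, orderByDate)).getContent()` and `canView` might delegate to Spring Security's `PermissionEvaluator`; that wiring is an assumption. Since the method filters by itself, the PostFilter annotation would no longer be needed on it.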

Related

Spring Data JPA: Efficiently Query The Database for A Large Dataset

I have written an application to scrape a huge set of reviews. For each review I store the review itself, Review_Table(User_Id, Trail_Id, Rating), the Username (Id, Username, UserLink), and the Trail, which is built earlier in the code (Id, ...60 other attributes).
for(Element card: reviewCards){
String userName = card.select("expression").text();
String userLink = card.select("expression").attr("href");
String userRatingString = card.select("expression").attr("aria-label");
Double userRating;
if(userRatingString.equals("NaN Stars")){
userRating = 0.0;
}else {
userRating = Double.parseDouble(userRatingString.replaceAll("[^0-9.]", ""));
}
User u;
Rating r;
//probably this is the bottleneck
if(userService.getByUserLink(userLink)!=null){
u = new User(userName, userLink, new HashSet<Rating>());
r = Rating.builder()
.user(u)
.userRating(userRating)
.trail(t)
.build();
}else {
u = userService.getByUserLink(userLink);
r = Rating.builder()
.user(u)
.userRating(userRating)
.trail(t)
.build();
}
i = i +1;
ratingSet.add(r);
userSet.add(u);
}
saveToDb(userSet, t, link, ratingSet);
savedEntities = savedEntities + 1;
log.info(savedEntities + " Saved Entities");
}
The code works fine for small-to-medium datasets, but I hit a huge bottleneck with larger ones. Suppose I have 13K user entities already stored in the Postgres DB and another batch of 8,500 reviews comes in to be scraped: for every review I have to check whether its user is already stored. This takes forever.
I tried defining an index on the UserLink attribute in Postgres, but the speed didn't improve at all.
I tried loading all the users stored in the DB into a set and using its contains method to check whether a particular user already exists (this way I thought I could bypass the database bottleneck of 8k reads and writes, although it is risky, because if the users table grew too large I would run out of memory). Again, the speed didn't improve.
At this point I have no other ideas for improving this.
Well, for one, you would certainly benefit from not querying for each user individually in a loop. Instead, query and cache only the UserLink or the UserName, i.e. fetch and cache the complete set of just one of them, because that is all you seem to need to differentiate in the if-else.
You can query for individual fields with a Spring Data JPA @Query, either directly or with Spring Data JPA projections to fetch a subset of fields, and then cache them and use them for the lookup. If you think the users could run into millions or billions, you could consider a distributed cache like Apache Ignite, where your collection could scale easily.
By the way, the if-else seems to be inverted, doesn't it?
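A minimal sketch of the in-memory lookup with the branches the right way round: an existing user is reused, and only an unknown link produces a new User. It is plain Java so it runs standalone; the `User` class and the `UserLookup` name are hypothetical, and this variant caches whole users keyed by UserLink (rather than links only) so the existing-user branch has something to reuse. The list passed to the constructor is assumed to come from a single query issued before the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UserLookup {
    /** Hypothetical stand-in for the question's User entity. */
    static final class User {
        final String name;
        final String link;

        User(String name, String link) {
            this.name = name;
            this.link = link;
        }
    }

    private final Map<String, User> byLink = new HashMap<>();
    private final List<User> newUsers = new ArrayList<>();

    /** existingUsers is assumed to come from ONE query issued before the scraping loop. */
    UserLookup(List<User> existingUsers) {
        for (User u : existingUsers) {
            byLink.put(u.link, u);
        }
    }

    /** Reuses the stored user for this link, or creates and remembers a new one. */
    User resolve(String name, String link) {
        return byLink.computeIfAbsent(link, l -> {
            User created = new User(name, l);
            newUsers.add(created); // only these need to be inserted later
            return created;
        });
    }

    List<User> newUsers() {
        return newUsers;
    }
}
```

Inside the loop, `u = lookup.resolve(userName, userLink);` replaces the whole if-else, and a link that appears twice in the same batch still maps to a single User.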
Next, you don't need to store each review individually, as the above code appears to do; you can write in batches. Also, since you are using Postgres, you can use the CopyManager provided by the Postgres JDBC driver for bulk data transfer, exposed through a Spring Data custom repository. You can keep writing rows to a local text/CSV file on a set schedule (every x minutes), then use CopyManager to load that batched file into the table and remove the file. This would be really quick.
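For the CopyManager route, the rows have to be rendered in a format COPY understands. A minimal sketch, assuming COPY's CSV mode: every value is wrapped in double quotes with embedded quotes doubled, and a Java null becomes an unquoted empty field, which CSV mode reads as NULL. The class name `CopyCsv` is hypothetical.

```java
import java.util.List;

final class CopyCsv {
    private CopyCsv() {}

    /** Quotes one value for COPY's CSV mode; null becomes an unquoted empty field (read as NULL). */
    static String quote(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Renders rows as CSV text for COPY ... FROM STDIN WITH (FORMAT csv). */
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(quote(row.get(i)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

With the PostgreSQL JDBC driver, the resulting text could then be streamed in with something like `connection.unwrap(PGConnection.class).getCopyAPI().copyIn("COPY rating (user_id, trail_id, rating) FROM STDIN WITH (FORMAT csv)", new StringReader(csv))`; the table and column names there are assumptions.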
The other option is to write a stored procedure that combines the above and invoke it, again through a custom repository.
Please let me know which one you would like elaborated.
UPDATE (Jan 12 2022):
One other item I missed: when querying for UserLink or UserName, you can use a very efficient form of select query that Postgres supports instead of an IN clause, like below:
// table and column names (users, user_link) are assumptions; adjust them to your schema
@Query(value = "select u.* from users u where u.user_link = ANY(CAST(:userLinks AS varchar[]))", nativeQuery = true)
List<User> getUsersByLinks(@Param("userLinks") String[] userLinks);

How to implement pagination by nextPageToken?

I am trying to implement pagination using nextPageToken.
I have a table:
CREATE TABLE IF NOT EXISTS categories
(
id BIGINT PRIMARY KEY,
name VARCHAR(30) NOT NULL,
parent_id BIGINT REFERENCES categories (id)
);
So I have a Category entity:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;
@Column(name = "name")
private String name;
@ManyToOne // many categories can share the same parent
@JoinColumn(name = "parent_id", referencedColumnName = "id")
private Category category;
I don't really understand what I have to do next.
The client sends a token, but what should the token contain?
Assume I have this controller:
@GetMapping
public ResponseEntity<CategoriesTokenResponse> getCategories(
@RequestParam String nextPageToken
) {
return ResponseEntity.ok(categoryService.getCategories(nextPageToken));
}
Service:
public CategoriesTokenResponse getCategories(String nextPageToken) {
return new CategoriesTokenResponse(categoryDtoList, "myToken");
}
@Data
@AllArgsConstructor
public class CategoriesTokenResponse {
private final List<CategoryDto> categories;
private final String token;
}
How do I implement the SQL query for that? And how do I generate the nextPageToken for each id?
SELECT * FROM categories WHERE parent_id = what?
AND max(category id from previous page = token?)
ORDER BY id LIMIT 20;
First, you need to understand what you are working with here. Every implementation has some sort of limitation or inefficiency. For example, page tokens like that are only good for infinite-scroll pages: you can't jump to a specific page. So if my browser crashes while I'm on page 100, I have to scroll through 100 pages AGAIN. It is certainly faster for massive data sets, but does that matter if you need access to all pages? Or if you limit the results to begin with, such as only returning the first 30 pages?
Basically, decide this: Do you only care about the first few pages because search/sort is always in use (like a user who never looks past the first 1-5 pages of Google), and is the data set large? Then it's a great use case. Will a user select "all items in the last 6 months" and actually need all of them, or is the search/sort weak? Will you return all pages instead of capping the results at 30 pages? Is development speed more important than a 0.1-3 second (depending on data size) speed increase? Then go with the built-in JPA Page objects.
I have used Page objects on 700k records with less than a second of difference compared to 70k records. Based on that, I don't see removing OFFSET adding a ton of value unless you plan for a huge data set. I just tested a new system I'm building with Pageable: it returned 10 items on page 1 in 84 milliseconds, with no page limiter, for 27k records, over a VPN into my work network from home. A table with over 500k records took 131 milliseconds. That's pretty fast. Want to make it faster? Force a maximum of 30 pages and 100 results per page, because users normally don't need all the data in that table; if they want something else, they can refine the search. The speed difference between this and seek/keyset-style paging is less than a second. This assumes a normal SQL database; NoSQL is a bit different here. Baeldung has many articles on JPA paging, such as this one: https://www.baeldung.com/rest-api-pagination-in-spring
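The caps suggested above (at most 30 pages, at most 100 results per page) amount to clamping the requested values before they reach the repository. A minimal sketch; the class and constant names are hypothetical, and the limits are the ones from this answer.

```java
final class PageLimits {
    static final int MAX_PAGES = 30;      // pages 0..29 are reachable
    static final int MAX_PAGE_SIZE = 100; // never return more than 100 rows per page

    private PageLimits() {}

    /** Clamps a requested page size into 1..MAX_PAGE_SIZE. */
    static int clampSize(int requested) {
        return Math.max(1, Math.min(requested, MAX_PAGE_SIZE));
    }

    /** Clamps a zero-based page index into 0..MAX_PAGES - 1. */
    static int clampPage(int requested) {
        return Math.max(0, Math.min(requested, MAX_PAGES - 1));
    }

    /** Row offset the database will skip for the clamped page. */
    static long offset(int page, int size) {
        return (long) clampPage(page) * clampSize(size);
    }
}
```

In a Spring controller the clamped values could be passed to `PageRequest.of(PageLimits.clampPage(page), PageLimits.clampSize(size))` (Spring Data 2.x and later). With these caps the largest OFFSET the database ever has to skip is 29 × 100 = 2,900 rows.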
JPA paging should take no more than 30 minutes to learn and implement; it's extremely easy and comes built into JPA repositories. I strongly suggest using it over seek/keyset-style paging, as you likely aren't building a system like Google's or Facebook's.
If you absolutely want to go with seek/keyset-style paging, there's a good informational page here:
https://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/
In general, what you are looking for is jOOQ with Spring. There's an implementation example here:
https://docs.spring.io/spring-boot/docs/1.3.5.RELEASE/reference/html/boot-features-jooq.html
Basically, inject a DSLContext:
// named dsl rather than DSL so the field does not obscure jOOQ's static DSL class
private final DSLContext dsl;
@Autowired
public JooqExample(DSLContext dslContext) {
this.dsl = dslContext;
}
Then use it like so:
// the injected DSLContext is already configured, so DSL.using(configuration) is not needed
dsl.select(PLAYERS.PLAYER_ID,
PLAYERS.FIRST_NAME,
PLAYERS.LAST_NAME,
PLAYERS.SCORE)
.from(PLAYERS)
.where(PLAYERS.GAME_ID.eq(42))
.orderBy(PLAYERS.SCORE.desc(),
PLAYERS.PLAYER_ID.asc())
.seek(949, 15) // (!)
.limit(10)
.fetch();
Instead of spelling out the seek predicate explicitly, you just pass the last record from the previous query, and jOOQ skips all records before and including that record, according to the ORDER BY clause.
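Applied to the question's categories table, which is ordered by id alone, the token only has to carry the last id of the previous page, and the query becomes `... WHERE id > :lastId ORDER BY id LIMIT 20`. A minimal plain-Java sketch under that assumption: the token is just the last id, Base64-encoded so clients treat it as opaque, and an in-memory sorted list stands in for the table. The class and method names are hypothetical; a parent_id filter would simply be an extra predicate next to `id > :lastId`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

final class KeysetToken {
    private KeysetToken() {}

    /** Encodes the last id of a page as an opaque, URL-safe token. */
    static String encode(long lastId) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(Long.toString(lastId).getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes a token; a null or empty token means "start from the beginning". */
    static long decode(String token) {
        if (token == null || token.isEmpty()) {
            return Long.MIN_VALUE;
        }
        String text = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        return Long.parseLong(text);
    }

    /**
     * In-memory stand-in for:
     *   SELECT * FROM categories WHERE id > :lastId ORDER BY id LIMIT :pageSize
     * sortedIds must be in ascending order.
     */
    static List<Long> page(List<Long> sortedIds, String token, int pageSize) {
        long lastId = decode(token);
        List<Long> out = new ArrayList<>();
        for (long id : sortedIds) {
            if (id > lastId) {
                out.add(id);
                if (out.size() == pageSize) {
                    break;
                }
            }
        }
        return out;
    }

    /** Token for the following page, or null when this page was not full (no more rows). */
    static String nextToken(List<Long> page, int pageSize) {
        return page.size() < pageSize ? null : encode(page.get(page.size() - 1));
    }
}
```

If the last page happens to be exactly full, the returned token leads to one empty page; fetching pageSize + 1 rows is a common way to avoid that extra round trip.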

Dynamodb AWS Java scan withLimit is not working

I am trying to use a DynamoDBScanExpression with a limit of 1, using the Java aws-sdk version 1.11.140.
Even if I use .withLimit(1), i.e.
List<DomainObject> result = mapper.scan(DomainObject.class, new DynamoDBScanExpression().withLimit(1));
it returns a list of all entries, i.e. 7. Am I doing something wrong?
P.S. I tried the same query with the CLI, and
aws dynamodb scan --table-name auditlog --limit 1 --endpoint-url http://localhost:8000
returns just 1 result.
DynamoDBMapper.scan returns a PaginatedScanList. Paginated results are loaded on demand when the user executes an operation that requires them. Some operations, such as size(), must fetch the entire list, but results are lazily fetched page by page when possible.
Hence, the limit parameter set on DynamoDBScanExpression is the maximum number of items to be fetched per page.
So in your case a PaginatedList is returned, and when you call PaginatedList.size() it attempts to load all items from DynamoDB. Under the hood, the items were loaded 1 per page (each page is a fetch request to DynamoDB) until it reached the end of the PaginatedList.
Since you're only interested in the first result, a good way to get it without fetching all 7 items from DynamoDB would be:
Iterator<DomainObject> it = mapper.scan(DomainObject.class, new DynamoDBScanExpression().withLimit(1)).iterator();
if (it.hasNext()) {
DomainObject dob = it.next();
}
With the above code, only the first item will be fetched from DynamoDB.
The takeaway is that the limit parameter in DynamoDBQueryExpression (and likewise in DynamoDBScanExpression) is used for pagination only: it limits the number of items per page, not the number of pages that can be requested.
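To see why size() and iterator() behave so differently, here is a toy model, in plain Java, of a lazily paginated list. It is not the AWS SDK's PaginatedScanList, only an illustration of the behavior described above: each page is one simulated request of `limit` items, the iterator fetches a page only when it runs out of items, and size() has to walk everything. The class name `LazyPages` is hypothetical.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of a lazily paginated result; NOT the AWS SDK's PaginatedScanList. */
final class LazyPages<T> implements Iterable<T> {
    private final List<T> table; // stands in for the DynamoDB table
    private final int limit;     // items per request, like withLimit(n)
    private int requests = 0;    // simulated round trips to the database

    LazyPages(List<T> table, int limit) {
        this.table = table;
        this.limit = limit;
    }

    int requests() {
        return requests;
    }

    /** Like PaginatedList.size(): has to walk every page. */
    int size() {
        int n = 0;
        for (T ignored : this) {
            n++;
        }
        return n;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int nextStart = 0;                     // first table index not yet fetched
            private List<T> page = Collections.emptyList();
            private int pos = 0;                           // position inside the current page

            @Override
            public boolean hasNext() {
                if (pos < page.size()) {
                    return true;
                }
                if (nextStart >= table.size()) {
                    return false;
                }
                requests++; // fetch the next page only when it is actually needed
                int end = Math.min(nextStart + limit, table.size());
                page = table.subList(nextStart, end);
                nextStart = end;
                pos = 0;
                return !page.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(pos++);
            }
        };
    }
}
```

With 7 items and a limit of 1, taking the first element through the iterator costs one simulated request, while size() costs seven more.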

Finding all entries where a collection field contains any of the given items

I have a Spring Boot application that contains an entity like this (some fields stripped for compactness):
@Entity
public class Message extends AbstractPersistable<Long> {
@ManyToMany(targetEntity = Tag.class, fetch = FetchType.EAGER)
private Set<Tag> tags;
// getter and setter for tags here
}
Tag is just a simple entity with a name field.
Now, I have another Set<Tag> in my code obtained from the user. I want to find all Messages that have any of the tags in this set. For example, if we have the following messages:
ID TAGS
1 1, 2, 3
2 2, 5, 7
3 2, 4, 7
Then, given the set [3, 4], the query should return messages 1 and 3.
I tried writing this repository:
public interface MessageRepository extends JpaRepository<Message, Long> {
List<Message> findByTags(Set<Tag> tag);
}
I enabled query logging and my code produced a query, which I've cleaned up a bit here. The query produces no results in the cases I tried, and I have no idea what the scalar = array comparison is doing.
SELECT message.id AS id FROM message
LEFT OUTER JOIN message_tags ON message.id = message_tags.message_id
LEFT OUTER JOIN tag ON message_tags.tags_id = tag.id
WHERE tag.id = (1,2,3,4,5) -- this is the input set
As suggested by @AliDehghani, I tried writing the method as findByTagsIn(Set<Tag> tag). This replaced the final = with IN in the query.
I got results, but there was another problem: the results were repeated for each matching tag as one might guess from the query. For example, searching for [2, 7] with the example messages above would return message 1, message 2 twice and message 3 twice.
As far as I know, adding some kind of GROUP BY clause might help here.
The predefined query method keywords don't seem to have any features related to this either, so I think I have to write my own using @Query or something else.
I can't seem to figure out a query that would work either, and I'm not very experienced in H2 so I don't really want to guess how one might work either.
I don't want to write a method that find all messages with a single tag, call it for each tag and combine the results, since that would be ugly and, given a lot of tags as input, very slow. How should I write my query method?
List<Message> findByTags(Set<Tag> tag);
As you can see from the query log, this query method will only find messages that are tagged with all tags in the given tag parameter, which is not what you want.
In order to find messages that are tagged with any of those tags, you can use the following query method:
List<Message> findByTagsIn(Set<Tag> tag);
I got results, but there was another problem: the results were repeated for each matching tag as one might guess from the query.
In order to get rid of those repeated messages, you can fetch only distinct values:
List<Message> findDistinctByTagsIn(Set<Tag> tag);
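The difference between the joined rows and the distinct result can be reproduced in plain Java with the example messages from the question. A minimal sketch: `joinedMatches` emits one entry per matching tag, as the join does, and `distinctMatches` collapses those entries the way DISTINCT does; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AnyTagMatch {
    private AnyTagMatch() {}

    /** One entry per (message, matching tag) pair, like the joined SQL rows. */
    static List<Integer> joinedMatches(Map<Integer, Set<Integer>> tagsByMessage, Set<Integer> wanted) {
        List<Integer> rows = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> e : tagsByMessage.entrySet()) {
            for (Integer tag : e.getValue()) {
                if (wanted.contains(tag)) {
                    rows.add(e.getKey()); // a message matching two tags appears twice
                }
            }
        }
        return rows;
    }

    /** Same rows with duplicates removed, keeping first-seen order (like DISTINCT). */
    static Set<Integer> distinctMatches(Map<Integer, Set<Integer>> tagsByMessage, Set<Integer> wanted) {
        return new LinkedHashSet<>(joinedMatches(tagsByMessage, wanted));
    }
}
```

For [2, 7] the joined version yields five entries (message 1 once, messages 2 and 3 twice), while the distinct version yields messages 1, 2 and 3; for [3, 4] it yields messages 1 and 3, as expected in the question.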

Spring Data PageImpl not returning page with the correct size?

I am trying to create a new Page from a list of objects retrieved from the database. First I get all the elements from the DB, convert them to a Stream, and then use a lambda to filter the results. Then I need a Page with a set number of elements; however, instantiating a new PageImpl doesn't seem to return a page with the correct size.
Here is my code:
List<Produtos> listaFinal;
Stream<Produtos> stream = produtosRepository.findAll().stream();
listaFinal = stream.filter(p -> p.getProdNome().contains("uio")).collect(Collectors.toList());
long total = listaFinal.size();
Page<Produtos> imp = new PageImpl<>(listaFinal,pageable,total);
In the debugger, the size in the Pageable object is set to 20, and it understands that it needs 4 pages to render the 70 elements, but the page returns the whole list.
What am I missing?
Edit answering the comment made by Thomas:
I understand how to use Page to return just a slice of the data. The code I showed was my attempt to use a lambda expression to filter my collection. What I really want is to use Java 8 lambdas to query the database via Spring Data JPA. I'm used to VB.NET and Entity Framework Function(x) query expressions and was wondering how to do the same with Spring JPA.
In my repository, I'm using extends JpaRepository<Produtos, Integer>, QueryDslPredicateExecutor<Produtos>, which gives me access to findAll(Predicate, Pageable). However, that Predicate is Querydsl's untyped Predicate, not a java.util.function.Predicate<Produtos>, so I can't simply use p -> p.getProdNome().contains("uio") in the query. I'm using SQL Server and Hibernate.
To extend stites' answer, a PagedListHolder is the way to go and here is how:
List<String> list = // ...
// Creation
PagedListHolder<String> page = new PagedListHolder<>(list);
page.setPageSize(10); // number of items per page
page.setPage(0); // set to first page
// Retrieval
page.getPageCount(); // number of pages
page.getPageList(); // a List which represents the current page
If you need sorting, use another PagedListHolder constructor with a MutableSortDefinition.
PageImpl is not intended to perform any kind of pagination of your list. From the docs you can see that it's just the "basic Page implementation" which almost sounds like what you want, but it's really misleading.
Use PagedListHolder which is a simple state holder for handling lists of objects, separating them into pages.
After learning more about how Spring Data works, I ended up using @Query annotations on the methods in my JpaRepository interfaces to query the DB and filter the results properly, eliminating the need to use a stream and then convert back to a Page.
Here's how the code above would look in case anyone needs an example:
@Query("select p from Produtos p where p.prodNome = ?1")
public Page<Produtos> productsListByName(String prodNome, Pageable pageable);
I'm aware of Spring's findBy methods, but sometimes the method names become really difficult to read depending on the number of parameters, so I just stuck to JPQL.
Doing it this way the Page's content will always have up to the maximum amount of elements defined by you in the Spring configuration.
I also use a custom implementation of PageImpl, I'm not at work right now and don't have access to the code, but I'll post it whenever I can.
Edit: Custom implementation can be found here
If I understood your code right, your intent is to load all records from the database and split them into x buckets that are collected in the PageImpl, right?
That's not how it is meant to work. The actual intent of the Pageable and Page abstractions is NOT having to query all the data, but just the "slice" of data that is needed.
In your case you could query the data via Page<X> page = repository.findAll(pageable); and simply return that.
Page holds the records for the current page alongside some additional information like e.g., the total number of records and whether there is a next page.
In your client code, you can use that information to render a list of records and generate next/prev links appropriately.
Note that a query with Page<X> as result type issues 2 queries (1 to determine the overall total count for the query and 1 for the actual page data).
If you don't need the total number of results but still want to be able to generate a next link, you should use Slice<X> as the return type, since it only issues 1 query.
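A Slice can offer a next link without a count query because it asks for one row more than the page size and checks whether that extra row came back; as far as I know, Spring Data JPA's sliced query execution works along these lines. A minimal plain-Java sketch of the idea; `SliceSketch`, `SliceResult` and the `fetch` function (standing in for a query with OFFSET/LIMIT) are hypothetical names.

```java
import java.util.List;
import java.util.function.BiFunction;

final class SliceSketch {
    /** Hypothetical result holder: the page content plus a hasNext flag. */
    static final class SliceResult<T> {
        final List<T> content;
        final boolean hasNext;

        SliceResult(List<T> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    private SliceSketch() {}

    /**
     * fetch.apply(offset, maxRows) stands in for "... OFFSET ? LIMIT ?".
     * Requests pageSize + 1 rows; if the extra row exists, there is a next page.
     */
    static <T> SliceResult<T> slice(BiFunction<Integer, Integer, List<T>> fetch, int page, int pageSize) {
        List<T> rows = fetch.apply(page * pageSize, pageSize + 1);
        boolean hasNext = rows.size() > pageSize;
        List<T> content = hasNext ? rows.subList(0, pageSize) : rows; // drop the probe row
        return new SliceResult<>(content, hasNext);
    }
}
```

The extra row costs almost nothing compared with the separate COUNT query that a Page needs.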
I was also facing the same issue and found the way for it.
The SimpleJpaRepository has the method:
public Page<T> findAll(Specification<T> spec, Pageable pageable) {
TypedQuery<T> query = getQuery(spec, pageable);
return pageable == null ? new PageImpl<T>(query.getResultList())
: readPage(query, getDomainClass(), pageable, spec);
}
This is the method that returns a Page<T> when you extend JpaRepository, so we can reuse the same functionality here (the code has to be rewritten, as Spring doesn't give you a public method with full pagination support).
If you look at the constructor PageImpl<>(List<T> content, Pageable pageable, long total), it just sets whatever values you pass in. Here you are passing the full list as content, whereas Spring internally passes only the rows of the current page.
You need to replace Page<Produtos> imp = new PageImpl<>(listaFinal,pageable,total);
with the following code:
TypedQuery<User> query = entityManager.createQuery(criteriaQuery);
// User can be replaced with any other entity type
query.setFirstResult((int) pageable.getOffset()); // getOffset() returns long in Spring Data 2.x
query.setMaxResults(pageable.getPageSize());
List<User> users = query.getResultList();
Page<User> result = PageableExecutionUtils.getPage(users, pageable,
() -> getCountForQuery(User.class));
Method getCountForQuery:
private Long getCountForQuery(Class<?> t) {
CriteriaBuilder criteriaBuilder=entityManager.getCriteriaBuilder();
CriteriaQuery<Long> countQuery = criteriaBuilder
.createQuery(Long.class);
countQuery.select(criteriaBuilder.count(
countQuery.from(t)));
Long count = entityManager.createQuery(countQuery)
.getSingleResult();
return count;
}
You can find PageableExecutionUtils.getPage in use in SimpleJpaRepository's
readPage(TypedQuery<S> query, final Class<S> domainClass,
Pageable pageable, final Specification<S> spec)
method, which is mostly used internally by findAll.
After trying a lot of methods, this was the working solution in my case:
int pageSize = pageable.getPageSize();
long pageOffset = pageable.getOffset();
long total = pageOffset + list.size() + (list.size() == pageSize ? pageSize : 0);
Page<listType> page = new PageImpl<listType>(list, pageable, total);
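The total computed above is an estimate rather than a real count: when the current page is full, it pretends there is exactly one more page, so the Page reports a next page; when the page is partial, it is the exact total. A small sketch of that arithmetic; the class and method names are hypothetical.

```java
final class EstimatedTotal {
    private EstimatedTotal() {}

    /** Same formula as above: offset + items on this page, plus one extra page if this one is full. */
    static long estimatedTotal(long pageOffset, int pageSize, int listSize) {
        return pageOffset + listSize + (listSize == pageSize ? pageSize : 0);
    }

    /** Number of pages a Page would derive from that total (ceiling division). */
    static long totalPages(long total, int pageSize) {
        return (total + pageSize - 1) / pageSize;
    }
}
```

For page index 2 with a page size of 10 (offset 20), a full list of 10 gives a total of 40, i.e. 4 pages, so a next page is reported; a list of 7 gives 27, i.e. 3 pages, so page 2 is the last. The trade-off is that a full page which happens to be the last one still advertises a next page that turns out empty.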
Previously, my code was written like this:
Pageable pageable = PageRequest.of(pageNo,size);
Query query = new Query().with(pageable);
and I was getting this for page sizes 5 and 10:
"pageNumber": 0,
"pageSize": 5,
"size": 5,
"numberOfElements": 5,
"pageNumber": 0,
"pageSize": 10,
"size": 10,
"numberOfElements": 8,
8 is the actual total number of elements I have in my DB.
I changed that to this
Pageable pageable = PageRequest.of(pageNo,size);
Query query = new Query();
Now I'm getting the actual number of items for any page size:
"pageNumber": 0,
"pageSize": 5,
"size": 5,
"numberOfElements": 8,
