Elasticsearch grab all but limit to a certain number with Java API - java

There isn't an example online that shows how to limit the number of rows returned by an Elasticsearch search for all items using the Java API. I tried the limit filter, but it didn't work: it brought back more than the limit. I know the limit is applied per shard, but is there no way around this? I can't find the from/size query or filter in the Java API either.
SearchQuery searchQuery = startQuery(limit, null).build();
Iterable<Statement> iterableStatements = esSpringDataRepository.search(searchQuery);
if (iterableStatements != null) {
    return IteratorUtils.toList(iterableStatements.iterator());
}

private NativeSearchQueryBuilder startQuery(int limit, QueryBuilder query) {
    NativeSearchQueryBuilder searchQueryBuilder = new NativeSearchQueryBuilder();
    if (query != null) {
        searchQueryBuilder = searchQueryBuilder.withQuery(query);
    }
    if (limit > 0) {
        searchQueryBuilder = searchQueryBuilder.withFilter(FilterBuilders.limitFilter(limit));
    }
    return searchQueryBuilder;
}

Well I got it to work perfectly with this instead of the limit filter:
searchQueryBuilder = searchQueryBuilder.withPageable(new PageRequest(0, limit));
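
Requesting page 0 with a page size of limit corresponds to a from/size window of from = 0 and size = limit, which applies to the merged result rather than to each shard. A minimal plain-Java sketch of that offset arithmetic follows; PageWindow is a hypothetical helper used only for illustration and is not part of Elasticsearch or Spring Data.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper (not a library class): models how a zero-based
// page number and a page size select one window of an ordered hit list.
final class PageWindow {

    // Offset of the first hit on the page; page 0 starts at offset 0.
    static int from(int page, int size) {
        return page * size;
    }

    // Returns the hits of the requested page, or an empty list past the end.
    static <T> List<T> slice(List<T> hits, int page, int size) {
        int start = Math.min(from(page, size), hits.size());
        int end = Math.min(start + size, hits.size());
        return hits.subList(start, end);
    }

    public static void main(String[] args) {
        List<Integer> hits = IntStream.range(0, 25).boxed().collect(Collectors.toList());
        System.out.println(slice(hits, 0, 10).size());
        System.out.println(slice(hits, 2, 10));
    }
}
```

With page fixed at 0, the window never returns more than size hits, which is the behaviour the question asked for.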

Related

Loading all contacts using the Microsoft Graph API sometimes loses/skips pages

We have an application that loads all contacts stored in an account using the Microsoft Graph API. The initial call we issue is https://graph.microsoft.com/v1.0/users/{userPrincipalName}/contacts?$count=true&$orderBy=displayName%20ASC&$top=100, but we use the Java SDK to do that. Then we iterate over all pages and store all loaded contacts in a Set (local cache).
We do this every 5 minutes using an account with over 3000 contacts, and sometimes the contact count returned via $count does not match the number of contacts we loaded and stored in the local cache.
Verifying the numbers manually, we can say that the count was always correct, but there are contacts missing.
We use the following code to achieve this.
public List<Contact> loadContacts() {
    Set<Contact> contacts = new TreeSet<>((contact1, contact2) -> StringUtils.compare(contact1.id, contact2.id));
    List<QueryOption> requestOptions = List.of(
        new QueryOption("$count", true),
        new QueryOption("$orderBy", "displayName ASC"),
        new QueryOption("$top", 100)
    );
    ContactCollectionRequestBuilder pageRequestBuilder = null;
    ContactCollectionRequest pageRequest;
    boolean hasNextPage = true;
    while (hasNextPage) {
        // initialize page request
        if (pageRequestBuilder == null) {
            pageRequestBuilder = graphClient.users(userId).contacts();
            pageRequest = pageRequestBuilder.buildRequest(requestOptions);
        } else {
            pageRequest = pageRequestBuilder.buildRequest();
        }
        // load
        ContactCollectionPage contactsPage = pageRequest.get();
        if (contactsPage == null) {
            throw new IllegalStateException("request returned a null page");
        } else {
            contacts.addAll(contactsPage.getCurrentPage());
        }
        // handle next page
        hasNextPage = contactsPage.getNextPage() != null;
        if (hasNextPage) {
            pageRequestBuilder = contactsPage.getNextPage();
        } else if (contactsPage.getCount() != null && !Objects.equals(contactsPage.getCount(), (long) contacts.size())) {
            throw new IllegalStateException(String.format("loaded %d contacts but response indicated %d contacts", contacts.size(), contactsPage.getCount()));
        } else {
            // done
        }
    }
    log.info("{} contacts loaded using graph API", contacts.size());
    return new ArrayList<>(contacts);
}
Initially, we did not put the loaded contacts in a Set keyed by ID but just in a List. With the List we very often got more contacts than $count. My idea was that there is some caching going on and some pages get fetched multiple times. Using the Set, we can make sure that we only have unique contacts in our local cache.
But using the Set, we sometimes have fewer contacts than $count, meaning some pages got skipped, and we end up in the condition that throws the IllegalStateException.
Currently, we use microsoft-graph 5.8.0 and azure-identity 1.4.2.
Have you experienced similar issues and can help us solve this problem?
Or do you have any idea what could be causing these inconsistent results?
Your help is very much appreciated!
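
The two symptoms described in this question (more items than $count with a List, fewer with a Set) can be made visible per run by recording which IDs repeat across pages and how many items are still missing. The following is a minimal plain-Java sketch of that bookkeeping. PagedLoadCheck, Item and Result are hypothetical stand-ins, not Microsoft Graph SDK types, and the sketch does not explain why pages overlap; it only reports that they do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the paging loop; Item and the page lists are
// hypothetical stand-ins, not Microsoft Graph SDK types.
final class PagedLoadCheck {

    static final class Item {
        final String id;
        final String name;

        Item(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final class Result {
        final Map<String, Item> uniqueById = new LinkedHashMap<>();
        final List<String> duplicateIds = new ArrayList<>();

        // Number of items still missing compared with the reported count.
        long missing(long reportedCount) {
            return reportedCount - uniqueById.size();
        }
    }

    // Collects all pages, keeping the first item seen for each ID and
    // recording every ID that shows up again on a later fetch.
    static Result collect(List<List<Item>> pages) {
        Result result = new Result();
        for (List<Item> page : pages) {
            for (Item item : page) {
                if (result.uniqueById.putIfAbsent(item.id, item) != null) {
                    result.duplicateIds.add(item.id);
                }
            }
        }
        return result;
    }
}
```

Logging duplicateIds together with missing(count) on each run would show whether repeated and missing contacts occur in the same run, which is information the exception alone does not provide.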

How can I paginate documents and handle a pageToken in Firestore?

I'm trying to implement a Java REST API that allows the UI to list documents from Firestore (potentially ordered on multiple fields).
I'm following the official documentation, but I'm struggling with how to handle/generate a next page token from the response (since the UI will potentially need to iterate over and over). Is there any way to implement this behavior with the gRPC client? Should I switch to the REST client (which seems to expose a nextPageToken field)?
Here is a workaround I found to mimic pagination-like behavior:
public <T extends InternalModel> Page<T> paginate(@NonNull Integer maxResults, @Nullable String pageToken) {
    try (Firestore db = getFirestoreService()) {
        CollectionReference collectionReference = db.collection(getCollection(type));
        Query query = collectionReference
            .limit(maxResults);
        // The page token is the id of the last document
        if (!Strings.isNullOrEmpty(pageToken)) {
            DocumentSnapshot lastDocument = collectionReference.document(pageToken).get().get();
            query = query.startAfter(lastDocument);
        }
        List<InternalModel> items = (List<InternalModel>) query.get().get().toObjects(type);
        String nextPageToken = "";
        if (!CollectionUtils.isEmpty(items) && maxResults.equals(items.size())) {
            nextPageToken = items.get(items.size() - 1).getId();
        }
        return Page.create(items, nextPageToken);
    }
}
I'm open to any better solution, since this might not be the optimal approach.
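
The workaround above uses the ID of the last returned document as the page token and starts the next query after it. The same idea can be sketched in plain Java over an in-memory sorted map. CursorPager and its Page class are hypothetical names and no Firestore calls are involved. As a variation on the workaround, this sketch only returns a token when at least one more entry exists, so a client does not request an empty trailing page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of cursor paging; CursorPager is a hypothetical name
// and the TreeMap stands in for a collection ordered by document ID.
final class CursorPager {

    static final class Page {
        final List<String> items;
        final String nextPageToken; // empty string when there is no further page

        Page(List<String> items, String nextPageToken) {
            this.items = items;
            this.nextPageToken = nextPageToken;
        }
    }

    private final NavigableMap<String, String> docsById = new TreeMap<>();

    void put(String id, String value) {
        docsById.put(id, value);
    }

    // Returns up to maxResults values whose IDs sort strictly after pageToken.
    Page paginate(int maxResults, String pageToken) {
        NavigableMap<String, String> remaining =
                (pageToken == null || pageToken.isEmpty())
                        ? docsById
                        : docsById.tailMap(pageToken, false);
        List<String> items = new ArrayList<>();
        String lastId = "";
        Iterator<Map.Entry<String, String>> it = remaining.entrySet().iterator();
        while (it.hasNext() && items.size() < maxResults) {
            Map.Entry<String, String> entry = it.next();
            items.add(entry.getValue());
            lastId = entry.getKey();
        }
        // Only hand out a token if another entry is still waiting.
        String next = it.hasNext() ? lastId : "";
        return new Page(items, next);
    }
}
```

A caller keeps passing the returned nextPageToken back into paginate until it comes back empty.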

Correct way to implement paging for Cassandra with CassandraRepository from Spring Data

I'm looking for a solution to implement paging for our Spring Boot based REST-Service with a Cassandra (version 3.11.3) database. We are using Spring Boot 2.0.5.RELEASE with spring-boot-starter-data-cassandra as a dependency.
As Spring Data's CassandraRepository<T, ID> interface does not extend the PagingAndSortingRepository we don't get the full paging functionality like we have with JPA.
I read the Spring Data Cassandra documentation and found a possible way to implement paging with Cassandra and Spring Data, as the CassandraRepository interface has the following method available: Slice<T> findAll(Pageable pageable);. I am aware that Cassandra is not able to fetch a specific page ad hoc and always needs to start at page zero and iterate through all pages, as documented in the CassandraPageRequest:
Cassandra-specific {@link PageRequest} implementation providing access to {@link PagingState}. This class allows creation of the first page request and represents through Cassandra paging is based on the progress of fetched pages and allows forward-only navigation. Accessing a particular page requires fetching of all pages until the desired page is reached.
In my use case we have more than 1,000,000 database entries and want to display them paged in our single-page application.
My current approach looks like the following:
@RestController
@RequestMapping("/users")
public class UsersResource {
    @Autowired
    UserRepository userRepository;

    @GetMapping
    public ResponseEntity<List<User>> getAllTests(
            @RequestParam(defaultValue = "0", name = "page") @Positive int requiredPage,
            @RequestParam(defaultValue = "500", name = "size") int size) {
        Slice<User> resultList = userRepository.findAll(CassandraPageRequest.first(size));
        int currentPage = 0;
        while (resultList.hasNext() && currentPage <= requiredPage) {
            System.out.println("Current Page Number: " + currentPage);
            resultList = userRepository.findAll(resultList.nextPageable());
            currentPage++;
        }
        return ResponseEntity.ok(resultList.getContent());
    }
}
BUT with this approach I have to find the requested page while fetching all database entries to memory and iterate until I found the correct page. Is there a different approach to find the correct page or do I have to use my current solution?
My Cassandra table definition looks like the following:
CREATE TABLE user (
    id int,
    firstname varchar,
    lastname varchar,
    code varchar,
    PRIMARY KEY(id)
);
What I did was create a page object that holds the content and the pagingState hash.
For the initial page, we use simple paging:
Pageable pageRequest = CassandraPageRequest.of(0, 5);
Once the find is performed, we get the slice:
Slice<Group> slice = groupRepository.findAll(pageRequest);
With the slice, you can get the paging state:
page.setPageHash(getPageHash((CassandraPageRequest) slice.getPageable()));
where
private String getPageHash(CassandraPageRequest pageRequest) {
    return Base64.toBase64String(pageRequest.getPagingState().toBytes());
}
Finally, return a Page object with the List content and the pagingState as pageHash.
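
The page hash is just the paging-state bytes in a text-safe encoding, so the client can send it back with the next request. Below is a minimal sketch of that round trip. It uses the JDK's java.util.Base64 in its URL-safe variant, which is an assumption made here so the hash can travel in a query parameter (the snippet above calls a different Base64 class). PageHashCodec is a hypothetical helper name, and the bytes are treated as opaque.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: turns opaque paging-state bytes into a URL-safe
// token and back. The byte content is treated as a black box.
final class PageHashCodec {

    static String encode(byte[] pagingState) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(pagingState);
    }

    static byte[] decode(String pageHash) {
        return Base64.getUrlDecoder().decode(pageHash);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real paging state would come from the driver.
        byte[] state = "opaque-state".getBytes(StandardCharsets.UTF_8);
        String hash = encode(state);
        System.out.println(hash);
        System.out.println(Arrays.equals(state, decode(hash)));
    }
}
```

On the next request, the decoded bytes would be turned back into a paging state and used to build the follow-up page request.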
See the code below; it may help.
@GetMapping("/loadData")
public Mono<DataTable> loadData(@RequestParam boolean reset, @RequestParam(required = false) String tag, WebSession session) {
    final String sessionId = session.getId();
    IMap<String, String> map = Context.get(HazelcastInstance.class).getMap("companygrouping-pageable-map");
    int pageSize = Context.get(EnvProperties.class).getPageSize();
    Pageable pageRequest;
    if (reset)
        map.remove(sessionId);
    String serializedPagingState = map.compute(sessionId, (k, v) -> (v == null) ? null : map.get(session.getId()));
    pageRequest = StringUtils.isBlank(serializedPagingState) ? CassandraPageRequest.of(0, pageSize)
            : CassandraPageRequest.of(PageRequest.of(0, pageSize), PagingState.fromString(serializedPagingState)).next();
    Mono<Slice<TagMerge>> sliceMono = StringUtils.isNotBlank(tag)
            ? Context.get(TagMergeRepository.class).findByKeyStatusAndKeyTag(Status.NEW, tag, pageRequest)
            : Context.get(TagMergeRepository.class).findByKeyStatus(Status.NEW, pageRequest);
    Flux<TagMerge> flux = sliceMono.map(t -> convert(t, map, sessionId)).flatMapMany(Flux::fromIterable);
    Mono<DataTable> dataTabelMono = createTableFrom(flux).doOnError(e -> log.error("{}", e));
    if (reset) {
        Mono<Long> countMono = Mono.empty();
        if (StringUtils.isNotBlank(tag))
            countMono = Context.get(TagMergeRepository.class).countByKeyStatusAndKeyTag(Status.NEW, tag);
        else
            countMono = Context.get(TagMergeRepository.class).countByKeyStatus(Status.NEW);
        dataTabelMono = dataTabelMono.zipWith(countMono, (t, k) -> {
            t.setTotalRows(k);
            return t;
        });
    }
    return dataTabelMono;
}

private List<TagMerge> convert(Slice<TagMerge> slice, IMap<String, String> map, String id) {
    PagingState pagingState = ((CassandraPageRequest) slice.getPageable()).getPagingState();
    if (pagingState != null)
        map.put(id, pagingState.toString());
    return slice.getContent();
}
Cassandra supports forward-only pagination: you can fetch the first n rows, then rows n+1 to 2n, and so on until your data ends, but you can't jump straight to rows n+1 to 2n without having fetched the first page.
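
Forward-only paging can be modelled in plain Java to show the two options a client has: walk from the first page every time, or keep the state returned with a page and resume from it. ForwardOnlyPager and its Slice class are hypothetical, and the integer state only mimics an opaque paging state; this is a sketch of the navigation pattern, not of the Cassandra driver.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// In-memory model of forward-only paging; ForwardOnlyPager and its
// integer "state" are hypothetical and only mimic an opaque paging state.
final class ForwardOnlyPager {

    static final class Slice {
        final List<Integer> content;
        final Integer nextState; // null when no rows are left

        Slice(List<Integer> content, Integer nextState) {
            this.content = content;
            this.nextState = nextState;
        }
    }

    private final List<Integer> rows;

    ForwardOnlyPager(int rowCount) {
        rows = IntStream.rangeClosed(1, rowCount).boxed().collect(Collectors.toList());
    }

    // Fetches the next n rows. state is null for the first page, otherwise
    // a value previously returned in Slice.nextState.
    Slice fetch(Integer state, int n) {
        int start = (state == null) ? 0 : state;
        int end = Math.min(start + n, rows.size());
        Integer next = (end < rows.size()) ? end : null;
        return new Slice(rows.subList(start, end), next);
    }

    // Without a stored state, reaching zero-based page pageIndex means
    // fetching every page before it; stops at the last page if the data ends.
    Slice walkTo(int pageIndex, int n) {
        Slice slice = fetch(null, n);
        for (int i = 0; i < pageIndex && slice.nextState != null; i++) {
            slice = fetch(slice.nextState, n);
        }
        return slice;
    }
}
```

Resuming with fetch(previous.nextState, n) costs one fetch per request, whereas walkTo(pageIndex, n) costs pageIndex + 1 fetches, which is the cost the question is trying to avoid.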

itunes-search API Android

Due to unexpected circumstances I find myself coding an app for Android in Java (I have no experience with Android or Java). It has gone well so far, but now I'm stuck on something. I'm trying to parse the iTunes Search API, and I've downloaded a library for doing so:
https://github.com/mdewilde/itunes-search
But I can't manage to get the individual fields after running the Search(). This is my code:
Response response = new Search("La+vereda+de+la+puerta+de+atras").execute();
response.setResults(response.getResults());
Result result = new Result();
System.out.println(result.getArtistName().toString()); // Returns nil
Thank you very much in advance.
This would be the correct way of doing that:
Response response = new Search("La+vereda+de+la+puerta+de+atras").execute();
List<Result> results = response.getResults();
if (results != null && results.size() > 0) {
    for (Result result : results) {
        System.out.println(result.getArtistName().toString());
    }
} else {
    System.out.println("No results found :(");
}

Working Lucene SearchAfter Example

I'm trying to use Lucene 4.8.1's SearchAfter methods to implement paging of search results in a web application.
A similar question has been asked before, but the accepted answer given there does not work for me:
Stack Overflow Question: Lucene web paging
When I create a Lucene ScoreDoc from scratch in this way to use as an argument for SearchAfter:
ScoreDoc sd = new ScoreDoc(14526, 0.0f);
TopDocs td = indexSearcher.searchAfter(sd, query, null, PAGEHITS);
I get this exception:
java.lang.IllegalArgumentException: after must be a FieldDoc
This appears contrary to the documentation. But in any case, when I create a FieldDoc instead, I get:
java.lang.IllegalArgumentException: after.fields wasn't set
after.fields is an Object array, so I can hardly set that with information I can pass in a URI!
I cannot find any working code examples using SearchAfter. My original plan was obviously to create a new ScoreDoc as the previous question suggests. Can anybody suggest what I might be doing wrong, or link to any working code examples of SearchAfter?
Thanks!
I don't believe you can create a ScoreDoc from scratch and then pass it to searchAfter. You need to use the ScoreDocs returned from a previous search.
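
A hit taken from the previous page works as a cursor because it records where that page ended in the result order. The following plain-Java model illustrates the idea under the assumption of relevance ordering (score descending, ties broken by ascending doc id). SearchAfterModel and Hit are hypothetical stand-ins and none of this is Lucene code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of "search after"; Hit is a hypothetical stand-in for
// a scored document and no Lucene classes are used.
final class SearchAfterModel {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Assumed result order: higher score first, then lower doc id first.
    static final Comparator<Hit> ORDER = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    // Returns up to n hits that come strictly after the cursor in ORDER.
    // Pass null as the cursor to get the first page.
    static List<Hit> searchAfter(List<Hit> index, Hit after, int n) {
        List<Hit> sorted = new ArrayList<>(index);
        sorted.sort(ORDER);
        List<Hit> page = new ArrayList<>();
        for (Hit hit : sorted) {
            if (page.size() >= n) {
                break;
            }
            if (after != null && ORDER.compare(hit, after) <= 0) {
                continue; // at or before the cursor: already delivered
            }
            page.add(hit);
        }
        return page;
    }
}
```

In a web application this means the values of the last hit on a page (here its score and doc id) have to be carried to the next request, for example in the session or in the URI, instead of inventing a cursor from a doc id alone.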
You can try this (it uses Elasticsearch's search_after rather than Lucene's searchAfter directly):
@Test
public void searchAfter() {
    // "size" is assumed to be a field holding the page size; it is not declared in this snippet.
    Object[] objects = new Object[]{"1"};
    List<Map<String, Object>> data = new ArrayList<Map<String, Object>>();
    boolean type = true;
    while (type) {
        SearchHits searchHits = searchAfter(objects);
        SearchHit[] hits = searchHits.getHits();
        if (hits == null || hits.length == 0) {
            // No more hits: stop, otherwise the loop would never end.
            break;
        }
        // The sort values of the last hit become the cursor for the next page.
        objects = hits[hits.length - 1].getSortValues();
        if (hits.length < size) type = false;
        for (SearchHit hit : hits) {
            data.add(hit.getSourceAsMap());
            System.out.println(JsonUtil.objectToJson(hit.getSourceAsMap()));
        }
    }
    Iterator<Map<String, Object>> iterator = data.iterator();
    while (iterator.hasNext()) {
        System.out.println(iterator.next().toString());
    }
    System.out.println(data.size() + "-----------------");
}

public SearchHits searchAfter(Object[] objects) {
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.termQuery("age", "33"));
    sourceBuilder.size(size);
    sourceBuilder.sort("account_number", SortOrder.ASC);
    sourceBuilder.searchAfter(objects);
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    searchRequest.source(sourceBuilder);
    ActionFuture<SearchResponse> response = elasticsearchTemplate.getClient().search(searchRequest);
    SearchHits searchHits = response.actionGet().getHits();
    return searchHits;
}
