I'm trying to use Lucene 4.8.1's SearchAfter methods to implement paging of search results in a web application.
A similar question has been asked before, but the accepted answer given there does not work for me:
Stack Overflow Question: Lucene web paging
When I create a Lucene ScoreDoc from scratch in this way to use as an argument for SearchAfter:
ScoreDoc sd = new ScoreDoc(14526, 0.0f);
TopDocs td = indexSearcher.searchAfter(sd, query, null, PAGEHITS);
I get this exception:
java.lang.IllegalArgumentException: after must be a FieldDoc
This appears contrary to the documentation. But in any case, when I create a FieldDoc instead, I get:
java.lang.IllegalArgumentException: after.fields wasn't set
after.fields is an Object array, so I can hardly set that with information I can pass in a URI!
I cannot find any working code examples using SearchAfter. My original plan was obviously to create a new ScoreDoc as the previous question suggests. Can anybody suggest what I might be doing wrong, or link to any working code examples of SearchAfter?
Thanks!
I don't believe you can create a ScoreDoc from scratch and then pass it to searchAfter. You need to use one of the ScoreDocs returned from a previous search.
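For example, here is a minimal sketch of that pattern, assuming the indexSearcher, query, and PAGEHITS from your snippet; the 'after' document is always taken from the previous page rather than constructed by hand:

// Page through all results by feeding the last ScoreDoc of each page
// back into searchAfter; nothing is constructed from scratch.
ScoreDoc after = null;
while (true) {
    TopDocs page = (after == null)
            ? indexSearcher.search(query, PAGEHITS)
            : indexSearcher.searchAfter(after, query, PAGEHITS);
    if (page.scoreDocs.length == 0) {
        break; // no more hits
    }
    for (ScoreDoc sd : page.scoreDocs) {
        Document doc = indexSearcher.doc(sd.doc);
        // process doc ...
    }
    after = page.scoreDocs[page.scoreDocs.length - 1]; // resume point for the next page
}

For a stateless web application that means either caching the last ScoreDoc server-side between requests, or (a common workaround) searching with a Sort on an indexed field so the returned FieldDoc carries primitive sort values you can serialize into the URI and use to rebuild a FieldDoc for the next request.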
You can have a try with this:
@Test
public void searchAfter() {
    // start paging from the first sort value; each later page resumes from
    // the sort values of the last hit on the previous page
    Object[] searchAfterValues = new Object[]{"1"};
    List<Map<String, Object>> data = new ArrayList<Map<String, Object>>();
    boolean hasMore = true;
    while (hasMore) {
        SearchHits searchHits = searchAfter(searchAfterValues);
        SearchHit[] hits = searchHits.getHits();
        if (hits != null && hits.length > 0) {
            searchAfterValues = hits[hits.length - 1].getSortValues();
            if (hits.length < size) hasMore = false; // short page means last page
            for (SearchHit hit : hits) {
                data.add(hit.getSourceAsMap());
                System.out.println(JsonUtil.objectToJson(hit.getSourceAsMap()));
            }
        } else {
            hasMore = false; // no hits at all: stop, otherwise the loop never ends
        }
    }
    for (Map<String, Object> row : data) {
        System.out.println(row.toString());
    }
    System.out.println(data.size() + "-----------------");
}

public SearchHits searchAfter(Object[] searchAfterValues) {
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.termQuery("age", "33"));
    sourceBuilder.size(size); // page size; 'size' is a field of the test class
    // search_after requires a deterministic sort order
    sourceBuilder.sort("account_number", SortOrder.ASC);
    sourceBuilder.searchAfter(searchAfterValues);
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("bank");
    searchRequest.source(sourceBuilder);
    ActionFuture<SearchResponse> response = elasticsearchTemplate.getClient().search(searchRequest);
    return response.actionGet().getHits();
}
This is my code, where I want to add pagination so that I don't fetch the whole bucket list and overload RAM. But it's still unclear to me:
What is 'afterKey' actually for, and what is its use case? I understand that the key of the 'aggregateAfter' map should be the field on which the results are aggregated, but what should the value of that map be? That is what I don't understand at all. Please have a look at this code and point out the changes needed to make pagination work for me.
Am I right in thinking that the response (searchResponse) should then contain only the paginated results, or do I need to do more to achieve that?
public BucketList getListOfBucket(final BucketListInfo bucketListInfo, int from, int size) {
    final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    for (final String aggrField : bucketListInfo.getAggrFieldList()) {
        CompositeAggregationBuilder aggregationBuilder = AggregationBuilders
                .composite(aggrField, List.of(new TermsValuesSourceBuilder(aggrField).field(aggrField)))
                .aggregateAfter(Map.of(aggrField, aggrField))
                .size(bucketListInfo.getTopResultsCount());
        searchSourceBuilder
                .from(from)
                .size(size)
                .aggregation(aggregationBuilder);
    }
    final SearchRequest searchRequest = new SearchRequest(bucketListInfo.getIndexName())
            .source(searchSourceBuilder);
    try {
        final SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // here all buckets come back in the response, instead of only the amount
        // specified by the pagination 'from' and 'size'
        return extractBucketsFromResponse(bucketListInfo, response);
    } catch (Exception e) {
        log.error(e.getMessage(), e);
        return null;
    }
}
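For what it's worth: afterKey is a cursor. It is the composite key of the last bucket on the previous page, and you pass the map returned by the previous response's afterKey() straight back into aggregateAfter; you do not build it by hand. Also note that from and size on the SearchSourceBuilder page the search hits, not the aggregation buckets, which is why all buckets come back. A minimal sketch of the paging loop under those assumptions, for a single aggregation field (names like aggName, aggrField, pageSize, and indexName are illustrative):

Map<String, Object> afterKey = null;
do {
    CompositeAggregationBuilder agg = AggregationBuilders
            .composite(aggName, List.of(new TermsValuesSourceBuilder(aggrField).field(aggrField)))
            .size(pageSize);
    if (afterKey != null) {
        agg.aggregateAfter(afterKey); // resume after the last bucket of the previous page
    }
    // size(0): we only want buckets, not hits; from/size would not page buckets anyway
    SearchSourceBuilder source = new SearchSourceBuilder().size(0).aggregation(agg);
    SearchResponse response = restHighLevelClient.search(
            new SearchRequest(indexName).source(source), RequestOptions.DEFAULT);
    CompositeAggregation composite = response.getAggregations().get(aggName);
    for (CompositeAggregation.Bucket bucket : composite.getBuckets()) {
        // process bucket.getKey() / bucket.getDocCount() ...
    }
    afterKey = composite.afterKey(); // null once the last page has been returned
} while (afterKey != null);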
I am developing a project that integrates Java and Elasticsearch.
I am using the scroll API because I am searching a large amount of data.
I want to see unique results (like DISTINCT in Oracle).
How do I remove duplicate search results in Elasticsearch?
I searched, but couldn't find a Java version.
My code is like this (it is just sample code):
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("posts");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQuery("title", "Elasticsearch"));
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();
while (searchHits != null && searchHits.length > 0) {
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    searchHits = searchResponse.getHits().getHits();
}
Is there any way to search the data in Elasticsearch without duplication?
Scroll will return all the documents that match the query; you cannot perform a DISTINCT operation inside Elasticsearch itself.
There are two ways to resolve your issue:
Field collapsing
It does a group-by on a field and returns the top document per group.
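As a rough Java sketch using CollapseBuilder (org.elasticsearch.search.collapse), assuming a keyword sub-field title.keyword exists on your index:

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .query(QueryBuilders.matchQuery("title", "Elasticsearch"))
        .collapse(new CollapseBuilder("title.keyword")); // keep only the top hit per distinct title
SearchResponse response = client.search(
        new SearchRequest("posts").source(sourceBuilder), RequestOptions.DEFAULT);

Note that field collapsing cannot be combined with scroll, so this replaces the scroll loop rather than slotting into it.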
Terms Aggregation
{
  "aggs": {
    "t": {
      "terms": {
        "script": "doc['title.keyword'] + ' ' + doc['description.keyword']"
      }
    }
  }
}
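The same aggregation could be expressed through the Java client roughly like this (a sketch; it assumes title.keyword and description.keyword sub-fields exist and scripting is enabled):

TermsAggregationBuilder agg = AggregationBuilders.terms("t")
        .script(new Script("doc['title.keyword'] + ' ' + doc['description.keyword']"));
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(agg);
SearchResponse response = client.search(
        new SearchRequest("posts").source(sourceBuilder), RequestOptions.DEFAULT);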
Scroll is intended for fetching a large number of documents, and the options above are not meant for bulk data. So for bulk data you need to perform the distinct operation client-side (outside of Elasticsearch).
I have a query for getting lastSeenTime for a single user,
but what I need is a map of IDs to their last-seen times for a list of users in Elasticsearch.
Can somebody help me convert this query to find the last seen time for a list of user ssoIds?
static Map<String, Object> getLastSeen(String ssoId) {
    SearchResponse response = transportClient.prepareSearch(ChatSettings.ELASTIC_LAST_SEEN_INDEX_NAME)
            .setTypes(ChatSettings.ELASTIC_DB_NAME)
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(QueryBuilders.idsQuery().addIds(ssoId))
            .setFrom(0).setSize(1).setExplain(true)
            .get();
    checkResponse(response);
    Map<String, Object> result = null;
    if (response.getHits().getTotalHits() > 0) {
        result = response.getHits().getAt(0).getSource();
    }
    return result;
}
Actually, I want something like this:
static Map<String, Object> getLastSeens(List<String> ssoIdList) {
    // elasticQuery
}
You can use source filtering in your Elasticsearch query to return only selected fields:
.setFetchSource(new String[]{"field1", "field2"}, null)
And for passing multiple IDs, you can pass an array of IDs to idsQuery().
So, in your case it will become:
SearchResponse response = transportClient
        .prepareSearch(ChatSettings.ELASTIC_LAST_SEEN_INDEX_NAME)
        .setTypes(ChatSettings.ELASTIC_DB_NAME)
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setFetchSource(new String[]{"id", "lastSeenTime"}, null) // return only these fields
        .setQuery(QueryBuilders.idsQuery().addIds(ssoIds)) // ssoIds is an array of IDs
        .setFrom(0).setSize(ssoIds.length) // size must cover all requested users, not just one
        .setExplain(true)
        .get();
After this, the rest of the code will work as it is.
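To get the map the question asks for, a minimal sketch might collect the hits like this, assuming each document's id is the ssoId and the source contains a lastSeenTime field (both names taken from the question, not verified):

Map<String, Object> lastSeens = new HashMap<>();
for (SearchHit hit : response.getHits().getHits()) {
    lastSeens.put(hit.getId(), hit.getSource().get("lastSeenTime"));
}
return lastSeens;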
I am trying to use ES as the index for my MongoDB. I've managed to integrate them successfully, but I find the search API rather complex and confusing. The Java API is not too helpful either.
I am able to find exact matches, but how can I read the results back as objects? Here is my code:
Node node = nodeBuilder().node();
SearchResponse sr = node.client().prepareSearch()
        .addAggregation(
                AggregationBuilders.terms("user").field("admin2san")
                        .subAggregation(AggregationBuilders.terms("SPT").field("64097"))
        )
        .execute().actionGet();
SearchHit[] results = sr.getHits().getHits();
// this line does not compile: SearchHit[] has no getSourceAsObjectList method
List<Firewall> myfirewall = results.getSourceAsObjectList(Firewall.class);
for (Firewall info : myfirewall) {
    System.out.println("search result is " + info);
}
I'm not quite sure I understood your question.
If you want to print the result of your SearchResponse, then going by your example it should be something like this:
SearchHit[] results = sr.getHits().getHits();
for (SearchHit hit : results) {
    String sourceAsString = hit.getSourceAsString();
    if (sourceAsString != null) {
        Gson gson = new GsonBuilder()
                .setDateFormat(dateFormat) // dateFormat: your date pattern string
                .create();
        System.out.println(gson.fromJson(sourceAsString, Firewall.class));
    }
}
I'm using Gson to convert the JSON response to the Firewall POJO.
I hope this is what you were looking for.
You could try something like this: response.getHits().getHits()[0].getSourceAsMap()
When executing queries on a standalone Neo4j server using the RestCypherQueryEngine, what is the best practice for retrieving a collection of nodes?
I have this code snippet running...
public DbService() {
    gd = new RestGraphDatabase("http://neo4jbox:7474/db/data/");
    engine = new RestCypherQueryEngine(gd.getRestAPI());
}

public String testData() {
    try (Transaction tx = gd.beginTx()) {
        QueryResult<Map<String, Object>> result;
        result = engine.query(
                "match (n:Person{username:'jomski2009'}) return n ",
                null);
        Iterator<Map<String, Object>> itr = result.iterator();
        while (itr.hasNext()) {
            Map<String, Object> item = itr.next();
            log.info(item.get("n"));
        }
        tx.success();
        return result.toString();
    }
}
When I run the code, I get the following result...
services.DbService : http://neo4jbox:7474/db/data/node/177
which is a link to the node rather than the node itself. Now I know that if I return just a subset of the node's properties in the same query, that works well. What I'd like to know is: how do I retrieve the complete node object without necessarily specifying the properties in the query?
Thanks for your help guys.
That is just the toString() representation of a RestNode; it still has the properties. The relationships are not fetched, though; those will be fetched on demand.
I would recommend fetching primitive values over the wire with Cypher; that works best, as it minimizes the transferred data and you only get what you need.
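A minimal sketch of both options, assuming the java-rest-binding types from the question (the cast to Node is illustrative; a RestNode implements Node):

// Option 1: the returned value is a RestNode; its properties are readable
// even though the log output shows only the node URI.
for (Map<String, Object> row : engine.query(
        "match (n:Person{username:'jomski2009'}) return n", null)) {
    Node person = (Node) row.get("n");
    log.info(String.valueOf(person.getProperty("username")));
}

// Option 2: fetch only the primitive values you need; less data on the wire.
for (Map<String, Object> row : engine.query(
        "match (n:Person{username:'jomski2009'}) return n.username as username", null)) {
    log.info(String.valueOf(row.get("username")));
}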