How to query multi-valued array fields in Elasticsearch using the Java client?

I'm using the Elasticsearch High Level REST Client for Java, v7.3.
I have a few fields in the schema that look like this:
{
"document_type" : ["Utility", "Credit"]
}
Basically, one field can have an array of strings as its value. I need to query not only for a specific document_type but also with a general string query.
I've tried the following code:
QueryBuilder query = QueryBuilders.boolQuery()
.must(QueryBuilders.queryStringQuery(terms))
.filter(QueryBuilders.termQuery("document_type", "Utility"));
...which does not return any results. If I remove the ".filter()" part, the query returns results as expected, so the filter appears to be what prevents any results from coming back. I suspect it's because document_type is a multi-valued array; maybe I'm wrong, though. How would I build a query that searches all documents for specific terms but also filters by document_type?

I think the reason is the wrong query type. Consider using the terms query instead of the term query; there is also an equivalent in the Java API.
Here is a good overview of the Query DSL queries and their equivalents in the High Level REST Client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-query-builders.html
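The matching rule can be illustrated with a small in-memory model in plain Java. It is not Elasticsearch code and does not use the client API; it only mimics how term and terms filters treat an array field: a document matches when any element equals a requested term exactly. The analyzeAsText step is an assumption standing in for a text-field mapping, where the standard analyzer lowercases tokens; whether it applies here depends on how document_type is mapped, which the question doesn't show.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified in-memory model (NOT Elasticsearch code) of how exact-match
// filters behave against a multi-valued field such as document_type.
class MultiValuedFilterModel {

    // term filter: matches if ANY value in the array equals the term exactly.
    // The term itself is not analyzed, so case must match the indexed token.
    static boolean termMatches(List<String> indexedValues, String term) {
        return indexedValues.contains(term);
    }

    // terms filter: matches if any indexed value equals any of the given terms.
    static boolean termsMatches(List<String> indexedValues, Set<String> terms) {
        return indexedValues.stream().anyMatch(terms::contains);
    }

    // Assumed stand-in for a text-field mapping: tokens are lowercased.
    // (Real analysis also splits on word boundaries; omitted here.)
    static List<String> analyzeAsText(List<String> rawValues) {
        return rawValues.stream()
                .map(v -> v.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }
}
```

In this model the array itself never blocks a match: ["Utility", "Credit"] matches a filter on "Utility" as long as the stored tokens keep their original case. If the field turns out to be mapped as text with a keyword sub-field (the default for dynamically mapped strings in 7.x), filtering on document_type.keyword is one thing worth trying.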

Related

Cannot use _source with pagination in Spring Data Elasticsearch

I am facing multiple weird problems when trying to use _source in a query with pagination.
If I use the stream API, the sourceFilter is discarded entirely, so this query does not generate the _source JSON attribute:
SourceFilter sourceFilter = new FetchSourceFilter(new String[]{"emails.sha256"}, null);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(query)
.withSourceFilter(sourceFilter)
.withPageable(PageRequest.of(0, pageSize))
.build();
elasticsearchTemplate.stream(searchQuery, clazz)
On the other hand, if I replace the stream method with queryForPage:
elasticsearchTemplate.queryForPage(searchQuery, clazz)
then the Elasticsearch query correctly generates the _source JSON attribute, but pagination breaks once the from attribute gets large. The error I get is:
{
"type": "query_phase_execution_exception",
"reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10002]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
}
I cannot just raise max_result_window, because the window would always have to be huge (I have billions of documents).
I also tried startScroll, which should solve the pagination problem, but I got a strange NoSuchMethodError:
java.lang.NoSuchMethodError: org.springframework.data.elasticsearch.core.ElasticsearchTemplate.startScroll(JLorg/springframework/data/elasticsearch/core/query/SearchQuery;Ljava/lang/Class;)Lorg/springframework/data/elasticsearch/core/ScrolledPage;
I am using Spring Data Elasticsearch 3.2.0.BUILD-SNAPSHOT and Elasticsearch 6.5.4
Any idea about how I can paginate a query but limiting the response data using _source?
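The limit in the error is plain arithmetic on from + size, so the number of reachable pages is fixed by the page size, not by how many documents exist. A small hypothetical helper (not part of Spring Data or Elasticsearch), assuming the offset is page × pageSize as with Spring's PageRequest and the default window of 10000:

```java
// Hypothetical helper: checks a from/size page request against
// index.max_result_window using the rule quoted in the error,
// from + size <= max_result_window.
class ResultWindowCheck {
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Offset sent for a zero-based page number (page * pageSize).
    static long fromFor(int page, int pageSize) {
        return (long) page * pageSize;
    }

    static boolean fitsInWindow(int page, int pageSize, int maxResultWindow) {
        return fromFor(page, pageSize) + pageSize <= maxResultWindow;
    }

    // Highest zero-based page that from/size pagination can still reach.
    static int lastReachablePage(int pageSize, int maxResultWindow) {
        return maxResultWindow / pageSize - 1;
    }
}
```

With pageSize = 2, page 5000 gives from = 10000 and from + size = 10002, the value in the error; page 4999 is the last page that fits. Past that point only a cursor-style approach such as the scroll API mentioned in the error can keep going.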

SOLR distinct query

I have to select distinct values from a specific field in Solr using Java. I have tried:
SolrQuery solrQuery = new SolrQuery();
solrQuery.add("q", "*:*");
solrQuery.setParam("fl", "field_name");
solrQuery.add("facet", "on");
solrQuery.add("facet.field_name", "field_name");
I have tried many different approaches, but none of them work.
"facet.field_name" has to be "facet.field". A slightly less error-prone way is:
solrQuery.setFacet(true);
solrQuery.addFacetField("field_name");
SolrJ has convenient methods to achieve this:
solrQuery.setQuery("your_query_field:value");
solrQuery.setFacet(true);
solrQuery.addFacetField("your specific field for distincts");
solrQuery.setRows(0);
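Once you send this request to the server, you can read the unique field values from the QueryResponse object: getFacetFields() returns a List of FacetField objects, and each FacetField's getValues() lists the distinct values with their counts.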
I have set solrQuery.setRows(0) so that no documents are returned, which makes the request more efficient. This doesn't affect the facet results.
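What a field facet gives back can be pictured as a map from each distinct value to the number of matching documents that carry it. The plain-Java model below is only an illustration of that shape, not SolrJ code; it assumes a single-valued field and orders values alphabetically for stable output, whereas Solr's own ordering is controlled by the facet.sort parameter.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified in-memory model (NOT SolrJ code) of a field facet result:
// each distinct value paired with the number of matching documents.
class FacetModel {

    // fieldValues holds one entry per matching document (single-valued field assumed).
    static Map<String, Long> facetCounts(List<String> fieldValues) {
        Map<String, Long> counts = new TreeMap<>(); // alphabetical order for stable output
        for (String value : fieldValues) {
            counts.merge(value, 1L, Long::sum);
        }
        return counts;
    }
}
```

The key set of this map is the list of distinct values; with rows set to 0 this is all the response needs to carry.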

Datastore query with IN operator

The new flexible environment datastore interface does not seem to support IN operation when running a query. I hope that I'm wrong, and if so, how can one use an IN operator in the new Java interface of Datastore?
A query like WHERE color IN ('RED', 'BLACK') is not supported by Datastore on the server side. The same is true of the OR operator (e.g. WHERE color = 'RED' OR color = 'BLACK'). Some client APIs have added this functionality by splitting the query into multiple queries and merging the results. The new google-cloud-java API does not support this yet, so for now you have to run one query per value in the IN clause and merge the results.
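A minimal sketch of that workaround, written against a hypothetical equalityQuery function that stands in for whatever single-filter query your client runs (it is not a google-cloud-java API). Results are merged by entity key, because an entity with a multi-valued property such as color = ['RED', 'BLACK'] would otherwise come back from both sub-queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of "one query per IN value, then merge". equalityQuery is a
// hypothetical stand-in for a real query with a single equality filter.
class InQueryEmulation {

    static <K, E> List<E> queryIn(List<String> values,
                                  Function<String, List<E>> equalityQuery,
                                  Function<E, K> keyOf) {
        // Keyed by entity key so an entity matched by several values appears once;
        // LinkedHashMap keeps the order in which entities were first seen.
        Map<K, E> merged = new LinkedHashMap<>();
        for (String value : values) {
            for (E entity : equalityQuery.apply(value)) {
                merged.putIfAbsent(keyOf.apply(entity), entity);
            }
        }
        return new ArrayList<>(merged.values());
    }
}
```

Each sub-query returns its own ordering, so if the combined result must be sorted, sort the merged list afterwards.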
Here’s an example from the documentation:
If you want to set more than one filter on a query, you must use CompositeFilter, which requires at least two filters.
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.*;
Filter tooShortFilter = new FilterPredicate("height", FilterOperator.LESS_THAN, minHeight);
Filter tooTallFilter = new FilterPredicate("height", FilterOperator.GREATER_THAN, maxHeight);
Filter heightOutOfRangeFilter = CompositeFilterOperator.or(tooShortFilter, tooTallFilter);
Query q = new Query("Person").setFilter(heightOutOfRangeFilter);
You can also use CompositeFilterOperator.and(). The code here is for Java 7; for Java 8 you can find the corresponding code in the documentation. I hope that helps.
Now to IN. I have not tried it myself recently, but the current documentation states that it can still be used as an operator. According to it, something like the code below should work:
Filter propertyFilter = new FilterPredicate("height", FilterOperator.IN, minHeights);
Query q = new Query("Person").setFilter(propertyFilter);
Alternatively, you could use GQL, Google's SQL-like query language, in which you can write IN (...).
I tried using the repository query methods, but I got an error saying it is not supported.
The only thing that worked for me was the @Query annotation.
Example:
@Query("select * from UserGroup where name IN @names")
List<Company> findAllByName(List<String> names);

How do you execute multiple operations on a property using the Query Builder API?

In AEM 6.2 I created a Java servlet where I use QueryBuilder to query the JCR for relevant content.
I want to limit the search results to nodes that either do not have the 'sling:resourceType' property or, if they do have it, where it is not equal to 'social/qna/components/hbs/post'. Writing the XPath query out like this works:
//*[jcr:contains(., 'searchTerm') and ((not (@sling:resourceType)) or (@sling:resourceType != 'social/qna/components/hbs/post')) and ((@jcr:primaryType = 'cq:Page') or (@jcr:primaryType = 'social:asiResource'))]
But I can't figure out how to use the QueryBuilder API to create this query. This is what my QueryBuilder code looks like.
group.p.or=true
group.1_path=/content/myproject
group.2_path=/content/usergenerated/asi/jcr/content/myproject
1_group.p.or=true
1_group.1_type=cq:Page
1_group.2_type=social:asiResource
fulltext=searchTerm
property.p.or=true
property=sling:resourceType
property.operation=exists
property.value=false
property.1_operation=unequals
property.1_value=social/qna/components/hbs/post
How can I rewrite the property section so it only returns results where sling:resourceType property doesn't exist or does not equal 'social/qna/components/hbs/post'?
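One way to express "missing OR not equal" with QueryBuilder predicates is to move both conditions into their own group with p.or=true, one property predicate per condition, instead of giving a single property predicate two operations. The untested sketch below only builds the predicate map in plain Java; passing it to PredicateGroup.create(...) and QueryBuilder.createQuery(...) needs the AEM APIs and is left out. The 2_group name is simply the next free group index and is an assumption, not something from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Untested sketch of the QueryBuilder predicate map. The sling:resourceType
// conditions live in their own OR group (2_group), one property predicate each.
class ResourceTypeQueryPredicates {

    static Map<String, String> build(String searchTerm) {
        Map<String, String> p = new LinkedHashMap<>();
        // Path restriction (unchanged from the question)
        p.put("group.p.or", "true");
        p.put("group.1_path", "/content/myproject");
        p.put("group.2_path", "/content/usergenerated/asi/jcr/content/myproject");
        // Node types (unchanged from the question)
        p.put("1_group.p.or", "true");
        p.put("1_group.1_type", "cq:Page");
        p.put("1_group.2_type", "social:asiResource");
        p.put("fulltext", searchTerm);
        // sling:resourceType must be missing OR differ from the Q&A post type
        p.put("2_group.p.or", "true");
        p.put("2_group.1_property", "sling:resourceType");
        p.put("2_group.1_property.operation", "exists");
        p.put("2_group.1_property.value", "false");
        p.put("2_group.2_property", "sling:resourceType");
        p.put("2_group.2_property.operation", "unequals");
        p.put("2_group.2_property.value", "social/qna/components/hbs/post");
        return p;
    }
}
```

The same key/value pairs can be written as plain predicate text in the question's own format by printing each entry as key=value.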

Stop the JPA (Hibernate) Criteria API creating repeated INs for a group of OR Predicates

I'm using the Criteria API with the Hibernate implementation of JPA. I want to improve the structure of my generated SQL query so that the same IN expression isn't repeated for each OR predicate.
I want to do this because the code runs on GAE, and I get a StackOverflowError when the list of names in the IN condition is long (it's due to Hibernate using StringBuilder to build the parameter list). I've pinned the problem down to this section of code: when I remove, say, 8 of the OR predicates, the code runs without error. The code runs fine in environments without memory restrictions (my PC). And yes, I have increased the instance memory allocation on GAE, but I still get the same error.
The Java code I have to build this part of the query is below (parameter names edited; I'm using @StaticMetamodel classes for the parameter names):
private void buildNamesToQuery(List<String> names, Root<ARoot> theRoot,
Join<Entity1, Entity2> aJoin, List<Predicate> orPredicates) {
orPredicates.add(aJoin.get(Entity1_.name1).in(names));
orPredicates.add(aJoin.get(Entity1_.name2).in(names));
orPredicates.add(aJoin.get(Entity1_.name3).in(names));
orPredicates.add(aJoin.get(Entity1_.name4).in(names));
orPredicates.add(theRoot.get(Entity2_.name5).in(names));
orPredicates.add(theRoot.get(Entity2_.name6).in(names));
orPredicates.add(theRoot.get(Entity2_.name7).in(names));
orPredicates.add(theRoot.get(Entity2_.name8).in(names));
orPredicates.add(theRoot.get(Entity2_.name9).in(names));
orPredicates.add(theRoot.get(Entity2_.name10).in(names));
}
This generates SQL with the following structure (parameter names edited):
select
//lots of detail for the select
where
//lots of condition detail
//but what I'm interested in is below
and (
Entity_.name1 in (?,?,?....
) //Lots of names
or Entity_.name2 in (?,?,?....
) //same list of names
or Entity_.name3 in (?,?,?...
) //same list of names again
//.
//continues like this for remainder of the query
//
);
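The size of that clause grows as predicates × names: with the ten predicates in buildNamesToQuery, every name adds ten bind parameters to the SQL string. The plain-Java model below is only an illustration of the clause shape shown above, not Hibernate's code; the column names in the usage are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

// Illustrative model (NOT Hibernate code) of the generated clause:
// one "column in (?,...)" per OR predicate, each repeating the full parameter list.
class RepeatedInClause {

    static String render(List<String> columns, int nameCount) {
        // Bind-parameter list for a single IN, e.g. "?,?,?"
        String params = String.join(",", Collections.nCopies(nameCount, "?"));
        StringJoiner ors = new StringJoiner(" or ", "(", ")");
        for (String column : columns) {
            ors.add(column + " in (" + params + ")");
        }
        return ors.toString();
    }

    // Total bind parameters = number of OR predicates x number of names.
    static long placeholderCount(int predicateCount, int nameCount) {
        return (long) predicateCount * nameCount;
    }
}
```

For example, render(List.of("e.name1", "e.name2"), 3) produces "(e.name1 in (?,?,?) or e.name2 in (?,?,?))", and 10 predicates over 1000 names yield 10000 placeholders.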
How can I change the Criteria API code so that the SQL becomes the following,
and (
joinEntity_.name1 or joinEntity.name2 or joinEntity.name3 ... etc IN (?,?,?...)
with the same long list of names? Hopefully that's clear enough for you to suggest a solution.
