I need to select the distinct values of a specific field in Solr using Java. I have tried:
SolrQuery solrQuery = new SolrQuery();
solrQuery.add("q", "*:*");
solrQuery.setParam("fl", "field_name");
solrQuery.add("facet", "on");
solrQuery.add("facet.field_name", "field_name");
I have tried several different approaches, but none of them work.
The parameter "facet.field_name" has to be "facet.field". A slightly less error-prone way is:
query.setFacet(true);
query.addFacetField("field_name");
SolrJ has convenient methods to achieve this.
solrQuery.setQuery("your_query_field:value"); // setQuery takes a single query string
solrQuery.setFacet(true);
solrQuery.addFacetField("your specific field for distincts");
solrQuery.setRows(0);
Once you hit the server with this request, you can get the unique field values as a
List of FacetField objects from the QueryResponse object.
I have set solrQuery.setRows(0) so that no documents are returned, which makes the request more efficient. This doesn't affect the facet results.
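To make the parameter names concrete, here is a minimal sketch of roughly what such a facet-only request looks like as a query string. It uses only the JDK, not SolrJ; the class name, the helper method, and the /select handler path are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of SolrJ: shows the raw request parameters.
final class FacetRequestSketch {

    // Builds the query string for a facet-only request on a single field.
    static String facetQueryString(String query, String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("rows", "0");               // no documents, only facet counts
        params.put("facet", "true");
        params.put("facet.field", facetField); // "facet.field", not "facet.field_name"

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // The "/select" path is an assumption about the request handler in use.
        System.out.println("/select?" + facetQueryString("*:*", "field_name"));
    }
}
```

The distinct values are then the names of the facet counts returned for that field.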
I am using the Elasticsearch High Level REST Client for Java v7.3.
I have a few fields in the schema that look like this:
{
"document_type" : ["Utility", "Credit"]
}
Basically one field could have an array of strings as the value. I not only need to query for a specific document_type, but also a general string query.
I've tried the following code:
QueryBuilder query = QueryBuilders.boolQuery()
.must(QueryBuilders.queryStringQuery(terms))
.filter(QueryBuilders.termQuery("document_type", "Utility"));
...which does not return any results. If I remove the ".filter()" part, the query returns results fine, but the filter appears to prevent anything from coming back. I suspect it's because document_type is a multi-valued array, though I may be wrong. How would I build a query that searches all documents for specific terms, but also filters by document_type?
I think the reason is the wrong query type. Consider using the terms query instead of the term query. There is also an equivalent in the Java API.
Here is a good overview of the Query DSL queries and their equivalents in the High Level REST Client: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-query-builders.html
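For reference, this is a rough sketch of the request body that a bool query with a query_string clause and a terms filter corresponds to. It builds the JSON by hand with the JDK only; the class and method names and the sample search text "invoice" are made up for illustration. Whether the filter actually matches also depends on how document_type is mapped, which the question does not show: on an analyzed text field, exact-term queries are compared against the indexed tokens rather than the original string.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: prints the Query DSL body, it does not call Elasticsearch.
final class TermsFilterSketch {

    // Minimal JSON string quoting; enough for plain text values in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // bool query: full-text search in "must", exact-value restriction in "filter".
    static String boolQueryWithTermsFilter(String queryString, String field, List<String> values) {
        String valueArray = values.stream()
                .map(TermsFilterSketch::quote)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{"
                + "\"must\":{\"query_string\":{\"query\":" + quote(queryString) + "}},"
                + "\"filter\":{\"terms\":{" + quote(field) + ":[" + valueArray + "]}}"
                + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(
                boolQueryWithTermsFilter("invoice", "document_type", List.of("Utility", "Credit")));
    }
}
```

A terms filter matches a document if any value in the field equals any of the listed values, so a multi-valued field is not a problem in itself.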
I am facing several odd problems when trying to use _source filtering in a query with pagination.
If I use the stream API, the sourceFilter is discarded entirely. So this query does not produce a _source JSON attribute in the request:
SourceFilter sourceFilter = new FetchSourceFilter(new String[]{"emails.sha256"}, null);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(query)
.withSourceFilter(sourceFilter)
.withPageable(PageRequest.of(0, pageSize))
.build();
elasticsearchTemplate.stream(searchQuery, clazz)
On the other hand, if I replace the stream method with queryForPage:
elasticsearchTemplate.queryForPage(searchQuery, clazz)
The Elasticsearch query then contains the _source JSON attribute as expected, but I run into pagination issues once the from attribute gets large. The error I get is:
{
"type": "query_phase_execution_exception",
"reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10002]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
}
Raising max_result_window is not an option, because it would always have to be huge (I have billions of documents).
I also tested startScroll, which should resolve the pagination problem, but I got a strange NoSuchMethodError:
java.lang.NoSuchMethodError: org.springframework.data.elasticsearch.core.ElasticsearchTemplate.startScroll(JLorg/springframework/data/elasticsearch/core/query/SearchQuery;Ljava/lang/Class;)Lorg/springframework/data/elasticsearch/core/ScrolledPage;
I am using Spring Data Elasticsearch 3.2.0.BUILD-SNAPSHOT and Elasticsearch 6.5.4
Any idea how I can paginate a query while limiting the response data using _source?
The new flexible environment datastore interface does not seem to support IN operation when running a query. I hope that I'm wrong, and if so, how can one use an IN operator in the new Java interface of Datastore?
A query like WHERE color IN('RED', 'BLACK') is not supported by Datastore (server side). The same is true of the OR operator (e.g. WHERE color='RED' OR color='BLACK'). Some client APIs have added this functionality by splitting the query into several queries and then merging the results of each. The new google-cloud-java API does not support this yet. For now, you would have to run one query per value in the IN clause and merge the results.
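The "one query per value, then merge" workaround could be sketched like this. It is a JDK-only illustration, not the google-cloud-java API: queryIn, runEqualityQuery and keyOf are hypothetical names, and the in-memory map stands in for the datastore. Results are de-duplicated by key, since an entity with a multi-valued property could match more than one value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper emulating IN on the client side.
final class InQuerySketch {

    // Runs one equality query per value and merges the results,
    // keeping the first entity seen for each key.
    static <V, K, E> List<E> queryIn(List<V> values,
                                     Function<V, List<E>> runEqualityQuery,
                                     Function<E, K> keyOf) {
        Map<K, E> merged = new LinkedHashMap<>();
        for (V value : values) {
            for (E entity : runEqualityQuery.apply(value)) {
                merged.putIfAbsent(keyOf.apply(entity), entity);
            }
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        // Stand-in for the datastore: entity key -> color.
        Map<String, String> colorByKey = new LinkedHashMap<>();
        colorByKey.put("car1", "RED");
        colorByKey.put("car2", "BLUE");
        colorByKey.put("car3", "BLACK");
        colorByKey.put("car4", "RED");

        // Stand-in for "WHERE color = ?": returns the keys with that color.
        Function<String, List<String>> byColor = color -> {
            List<String> keys = new ArrayList<>();
            for (Map.Entry<String, String> e : colorByKey.entrySet()) {
                if (e.getValue().equals(color)) {
                    keys.add(e.getKey());
                }
            }
            return keys;
        };
        Function<String, String> identityKey = key -> key;

        // Emulates WHERE color IN ('RED', 'BLACK').
        System.out.println(queryIn(List.of("RED", "BLACK"), byColor, identityKey));
    }
}
```

Note that the merged list comes back grouped by the order of the IN values, so any sort order has to be applied again after merging.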
Here’s an example from the documentation:
If you want to set more than one filter on a query, you must use CompositeFilter, which requires at least two filters.
Filter tooShortFilter = new FilterPredicate("height", FilterOperator.LESS_THAN, minHeight);
Filter tooTallFilter = new FilterPredicate("height", FilterOperator.GREATER_THAN, maxHeight);
Filter heightOutOfRangeFilter = CompositeFilterOperator.or(tooShortFilter, tooTallFilter);
Query q = new Query("Person").setFilter(heightOutOfRangeFilter);
You can also use .and(). The code here is for Java 7. For Java 8 you can find the corresponding code in the documentation referenced above. I hope that helps.
Now to IN. While I have not tried it myself recently, the current documentation states that it can still be used as an operator. According to it, something like the code below should work:
Filter propertyFilter = new FilterPredicate("height", FilterOperator.IN, minHeights);
Query q = new Query("Person").setFilter(propertyFilter);
Alternatively, you could use Google GQL. It will allow you to write SQL-like syntax, in which you can use in(...).
I tried using the repository query methods, but I got an error saying that this is not supported.
It only worked for me using the @Query annotation.
Example:
@Query("select * from UserGroup where name IN @names")
List<Company> findAllByName(List<String> names);
I am facing a strange issue. Looks like a bug in the SolrJ API:
When I try to run a search query with edismax, the "qf" field is not being encoded properly.
I am trying to use this as my "qf" value:
title^40+details_plain^20
The SolrQuery.set() method adds this to the query as-is, which doesn't work because it needs to be URL-encoded.
When I url encode it myself, it becomes:
qf=title%5E40+details_plain%5E20
However when I set that in the query, the resulting final query automatically encodes it again and makes it:
qf=title%255E40%2Bdetails_plain%255E20
This is also wrong, and the query fails with "undefined field text": Solr doesn't know which fields I want to search, so it tries to search the default "text" field.
Here is a snippet from the code:
SolrClient solr=null;
SolrQuery query = new SolrQuery();
solr = new CloudSolrClient(zookeepers, "/" );
query.set("deftype", searchConfig.getDeftype());
//query.set("df", "details_plain"); // unless I uncomment this, the query fails because qf is not applied
query.set("fl", searchConfig.getFl());
query.set("mm", searchConfig.getMm());
query.set("qf", searchConfig.getQf());
query.set("rows", searchConfig.getRows());
query.set("q", searchPhrase);
query.set("collection", searchConfig.getCollection_name());
query.set("indent", "on");
query.set("omitHeader", "true");
query.set("wt", "json");
QueryResponse response = solr.query(query);
Why doesn't it encode the original string, but encodes it again if I send it as an encoded string?
I might be overlooking something so let me know what you all think. Am I doing something wrong or should I just get Solr source code and try to fix this myself?
As far as I remember, you should not encode any field yourself. The encoding and decoding is handled transparently by SolrJ.
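The double encoding from the question can be reproduced with plain java.net.URLEncoder, used here only as a stand-in for whatever encoding the client performs when it sends the request (this is not SolrJ's internal code). The sketch assumes the raw qf value has a space between the two fields, which becomes "+" once encoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Demonstrates why a value that is already URL-encoded must not be encoded again.
final class DoubleEncodingSketch {

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "title^40 details_plain^20"; // assumed raw qf value
        String once = encode(raw);                // what should go over the wire
        String twice = encode(once);              // result of encoding by hand first

        System.out.println(once);  // title%5E40+details_plain%5E20
        System.out.println(twice); // title%255E40%2Bdetails_plain%255E20
    }
}
```

So the raw, unencoded value is what should be handed to query.set("qf", ...).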
Solved. Posting the solution here for anyone who might be unfortunate enough to have made the same silly mistake that I did.
The problem was in this line:
query.set("deftype", searchConfig.getDeftype());
The parameter name should be "defType", with a capital T instead of a lowercase t:
query.set("defType", searchConfig.getDeftype());
Ideally, in such services parameter names would be all lowercase so as not to waste people's time on issues like this, but it is what it is. Maybe in another Solr version they will make parameter names case-insensitive. One can hope!
I am trying to get the results of a Solr query, doing a simple /select?q=id:xx
The problem is that it returns nothing when I query Solr directly, but when I use SolrJ, like:
SolrQuery query = new SolrQuery();
query.setQuery(queryStr);
query.setRows(10);
QueryResponse rsp = solrServer.getSolrServer().query(query);
It returns the document that was added, with no problem.
How is that possible? I thought SolrJ might be sending an extra parameter internally, but I couldn't find one.
I am using Solr 4.2.1.
After some testing I solved the problem: I had to use HttpSolrServer instead of EmbeddedSolrServer. It seems EmbeddedSolrServer uses its own data, so I was working with two different data sets.
Using HttpSolrServer was the solution.