Solr range not filtering - java

I am trying to use price.facet.range, but it doesn't work: Solr always returns unfiltered results.
indexing:
<field name="price" type="sdouble" indexed="true" stored="false" multiValued="true" />
solrDoc.addField("price", 22.99);
creating query:
query.addNumericRangeFacet("price", 0.0 ,100.0, 0.01);
created query:
q=mobile+phone&
fl=productId+score&
payload=true&
payload.fl=full_text&
facet.range=price&
f.price.facet.range.start=0.0&
f.price.facet.range.end=100.0&
f.price.facet.range.gap=0.01&
facet=true&
/...

I found a solution (note that Solr range syntax requires the TO keyword):
query.addFilterQuery("price:[0 TO 100]");

Faceting always operates on the filtered result set (the q and fq params), unless you explicitly exclude a filter query, which you didn't do here. Can you reproduce the problem with the exampledocs data in Solr, or with a setup that is easy for another person to verify?
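For reference, here is a small self-contained sketch (plain Java, no SolrJ dependency) of the request parameters that combine the range facet from the question with the fq that actually restricts the results; the parameter names are taken from the generated query above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the raw request parameters for a range facet combined with a
// filter query, using the same names as the query in the question.
public class RangeFacetParams {

    // Builds a Solr range-filter clause, e.g. price:[0.0 TO 100.0].
    static String rangeFilter(String field, double start, double end) {
        return field + ":[" + start + " TO " + end + "]";
    }

    static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "mobile phone");
        p.put("facet", "true");
        p.put("facet.range", "price");
        p.put("f.price.facet.range.start", "0.0");
        p.put("f.price.facet.range.end", "100.0");
        p.put("f.price.facet.range.gap", "0.01");
        // The facet only counts buckets; fq is what restricts the result set.
        p.put("fq", rangeFilter("price", 0.0, 100.0));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(params());
    }
}
```

The key point is that facet.range on its own never filters; the fq clause does.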

Related

Merge/combine BaseX databases with upserts in memory constrained environment

I have two databases in BaseX, source_db and target_db, and would like to merge them by matching on the id attribute of each element and upserting the element with a replace or an insert depending on whether the element was found in the target_db. source_db has about 100,000 elements, and target_db has about 1,000,000 elements.
<!-- source_db contents -->
<root>
  <element id="1"/>
  <element id="2"/>
</root>
<!-- target_db contents -->
<root>
  <element id="1"/>
</root>
My query to merge the two databases looks like this:
for $e in (db:open("source_db")/root/element)
return (
  if (exists(db:open("target_db")/root/element[@id = data($e/@id)]))
  then replace node db:open("target_db")/root/element[@id = data($e/@id)] with $e
  else insert node $e into db:open("target_db")/root
)
When running the query, however, I keep getting memory constraint errors. Using a POST request to BaseX's REST interface I get Out of Main Memory and using the BaseX java client I get java.io.IOException: GC overhead limit exceeded.
Ideally I would like to just process one element from source_db at a time to avoid memory issues, but it seems like my query isn't doing this. I've tried using the db:copynode false pragma but it did not make a difference. Is there any way to accomplish this?
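One direction to try (a sketch, not tested against BaseX): fetch the source ids first, then run one small upsert query per batch of ids, so each query only materializes a handful of elements. The helper below only builds the per-batch XQuery text, mirroring the query above; actually sending it (via the REST interface or the Java client) is left out and assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split the 100,000 source ids into batches and build one small
// upsert query per batch, instead of a single query over everything.
public class BatchedUpsert {

    // Builds one upsert query (mirroring the question's XQuery) for a batch of ids.
    static String upsertQuery(List<String> ids) {
        String seq = "('" + String.join("', '", ids) + "')";
        return "for $e in db:open('source_db')/root/element[@id = " + seq + "]\n"
             + "return (\n"
             + "  if (exists(db:open('target_db')/root/element[@id = $e/@id]))\n"
             + "  then replace node db:open('target_db')/root/element[@id = $e/@id] with $e\n"
             + "  else insert node $e into db:open('target_db')/root\n"
             + ")";
    }

    // Splits a list of ids into consecutive batches of the given size.
    static List<List<String>> batches(List<String> ids, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(ids.subList(i, Math.min(i + size, ids.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        for (List<String> batch : batches(List.of("1", "2", "3"), 2)) {
            System.out.println(upsertQuery(batch));
        }
    }
}
```

Each batch is then an independent updating query, so BaseX never has to hold the pending update list for all 100,000 elements at once.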

How to map DateRangeField correctly to a Bean in SolrJ?

I'm wondering how to map a DateRangeField in Solr's index to a field in Java.
I'm trying this in Java:
@Field("validity_period")
private List<Date> validity_period;
And in the schema:
<field name="validity_period" type="my_daterangefield" multiValued="true" indexed="true" stored="true"/>
<fieldType name="my_daterangefield" class="solr.DateRangeField" multiValued="true"/>
If I index a sample it looks like this in the index:
"validity_period":["2018-04-19T22:00:00Z", "2022-09-23T22:00:00Z"],
This doesn't look correct. Shouldn't it be stored in a format like [from TO to]?
Querying it doesn't work either. For example searching for all documents that are valid today.
Is List<Date> really the right data type in Java for a DateRangeField? I also tried a simple String formatted into the [from TO to] shape, but that didn't work either.
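Not an authoritative answer, but since DateRangeField takes its range as a single string, one option worth trying (an assumption, not verified against SolrJ's bean mapping) is to declare the bean field as a plain String and build the [from TO to] value yourself. A minimal sketch of the formatting:

```java
import java.time.Instant;

// Sketch: build the single-string [from TO to] value that DateRangeField
// expects, from two UTC instants.
public class DateRangeValue {

    // Formats two instants as a Solr date-range string.
    static String range(Instant from, Instant to) {
        return "[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        System.out.println(range(
            Instant.parse("2018-04-19T22:00:00Z"),
            Instant.parse("2022-09-23T22:00:00Z")));
    }
}
```

With a value in that shape indexed, a "valid today" search would be a range query against the field rather than two separate date comparisons.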

Why do my AppEngine queries throw DatastoreNeedIndexExceptions although the required indices are 'serving'?

I am running a multi-tenant java high-replication web application on Google AppEngine. The application successfully uses multi-property indices (configured within the datastore-indexes.xml file). Well, at least up until now...
Since today, at least one namespace throws DatastoreNeedIndexExceptions when executing a query. The curious thing is that the same query works in other namespaces.
Here is the index configuration from the datastore-indexes.xml and the index status from the admin panel:
<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="false">
<datastore-index kind="Index_Asc_Asc_Asc_Asc" ancestor="false" source="manual">
<property name="components" direction="asc"/>
<property name="component_0" direction="asc"/>
<property name="component_1" direction="asc"/>
<property name="component_2" direction="asc"/>
<property name="component_3" direction="asc"/>
</datastore-index>
</datastore-indexes>
The corresponding query looks like this:
SELECT __key__ FROM Index_Asc_Asc_Asc_Asc WHERE components = '12340987hja' AND component_0 = 'asdfeawsefad' AND component_1 = '4FnlnSYiJuo25mNU' AND component_3 = 'cvxyvsdfsa' AND component_2 >= 0
When I execute this query within my application or the admin panel's datastore view, App Engine throws a DatastoreNeedIndexException with the following recommendation. Again, the same query works in other namespaces:
The suggested index for this query is:
<datastore-index kind="Index_Asc_Asc_Asc_Asc" ancestor="false">
<property name="component_0" direction="asc" />
<property name="component_1" direction="asc" />
<property name="component_3" direction="asc" />
<property name="components" direction="asc" />
<property name="component_2" direction="asc" />
</datastore-index>
Investigations:
I have tried to set autoGenerate="true", but I get the same error, and no new indexes have been added.
I have tried to execute the query in newly created namespaces: No problems.
The error does not occur in the development server.
Is there something I am missing? Has anyone else experienced the same problem? Why is the same query working in other namespaces but not in that one?
Thanks a lot!
Tim is right. To help clarify the point, you need to understand how the datastore works.
Basically, all datastore reads need to be sequential in the index you are looking at. In other words, they need to be in adjacent rows. This is how the datastore gains speed and how it can be sharded across multiple machines. (There are some exceptions for equality matching, but just accept that smart people figured that one out for us for now.)
So looking at a set of data with a num column, and alpha column and an id column like the following:
id      Num  Alpha
------------------
1       1    A
2       1    Z
3       4    A
...     ...  ...   <-- lots of data
100004  2    Z
100005  1    C
So when the datastore processes a query like yours, it will look at the precomputed index and find the starting point of the matches. It will then read until the rows no longer match the query. It never does a join like you are used to in SQL. The closest thing is a zipper merge, which applies only to equality operators. ROWS MUST BE ADJACENT IN THE INDEX!
So index num asc, alpha asc looks like:
id      Num  Alpha
------------------
...     ...  ...   <-- negative numbers
1       1    A
100005  1    C
2       1    Z
100004  2    Z
3       4    A
...     ...  ...   <-- lots of data (assume all other num values were above 5)
and index alpha asc, num asc looks like:
id      Alpha  Num
------------------
1       A      1
3       A      4
...     ...    ...   <-- lots of data
100005  C      1
...     ...    ...   <-- lots of data
2       Z      1
100004  Z      2
...     ...    ...   <-- lots of data
This allows datastore to quickly zip through your data to get an answer very fast. It can then use the id to look up the rest of that row's data.
If, for example, you tried to look at all of the num=1 rows and wanted all alphas ordered sequentially, the datastore would have to read all of the num=1 rows into memory (which could be 100s of millions of rows) and then sort them by alpha. Here it's all precomputed and much faster, which allows for far more read throughput. It's probably overkill for your application, but the idea is that your app can scale to huge sizes this way.
Hope that made sense.
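The adjacency idea above can be illustrated with a toy in-memory "index" (plain Java, names invented for illustration): sorting rows by (num, alpha) puts all num=1 rows next to each other, already ordered by alpha, so a scan can start at the first match and stop at the first non-match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration of index adjacency: a sorted list stands in for the
// precomputed index, and a range scan reads only adjacent matching rows.
public class IndexScan {
    record Row(int id, int num, String alpha) {}

    // Reads adjacent rows only, as the datastore would: skip until the
    // first match, then stop at the first non-match.
    static List<Row> scanNumEquals(List<Row> index, int num) {
        List<Row> out = new ArrayList<>();
        for (Row r : index) {
            if (r.num() < num) continue;   // before the start point
            if (r.num() > num) break;      // rows stopped matching
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // The sample rows from the answer, sorted by (num asc, alpha asc).
        List<Row> index = List.of(
                new Row(1, 1, "A"), new Row(2, 1, "Z"), new Row(3, 4, "A"),
                new Row(100004, 2, "Z"), new Row(100005, 1, "C"))
            .stream()
            .sorted(Comparator.comparingInt(Row::num).thenComparing(Row::alpha))
            .collect(Collectors.toList());
        System.out.println(scanNumEquals(index, 1));
    }
}
```

The scan returns the num=1 rows already ordered by alpha, with no in-memory sort, which is exactly what the precomputed composite index buys you.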

How to create a composite Key Field in Apache Solr?

I have an Apache Solr 3.5 setup that has a SchemaXml like this:
<field name="appid" type="string" indexed="true" stored="true" required="true"/>
<field name="docid" type="string" indexed="true" stored="true" required="true"/>
What I would need is a field that concatenates them and uses that as the <uniqueKey>. There seems to be nothing built-in, short of creating a multi-valued id field and using <copyField>, but uniqueKey seems to require a single-valued field.
The only reason I need this is to allow clients to blindly fire <add> calls and have Solr figure out whether it's an addition or an update. Therefore, I don't care too much what the ID looks like.
I assume I would have to write my own Analyzer or Tokenizer? I'm just starting out learning Solr, so I'm not 100% sure what I'd actually need and would appreciate any hints towards what I need to implement.
I would personally give that burden to the users, since it's pretty easy for them to add a field to each document.
Otherwise, you would have to write a few lines of code. You could write your own UpdateRequestProcessorFactory which adds the new field automatically to every input document, based on the values of other existing fields. You can use a separator and keep it single-valued.
On your UpdateRequestProcessor you should override the processAdd method like this:
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String appid = (String) doc.getFieldValue("appid");
    String docid = (String) doc.getFieldValue("docid");
    doc.addField("uniqueid", appid + "-" + docid);
    // pass it up the chain
    super.processAdd(cmd);
}
Then you should add your UpdateProcessor to a customized updateRequestProcessorChain as the first processor in the chain, with RunUpdateProcessorFactory conventionally last (solrconfig.xml):
<updateRequestProcessorChain name="mychain" >
  <processor class="my.package.MyUpdateRequestProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Hope it works; I haven't tried it exactly like this. I already did something similar, but not with uniqueKey or required fields, which is the only problem you could run into. But I guess if you put the update processor at the beginning of the chain, it should work.
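The concatenation inside processAdd can also be illustrated standalone. One detail worth considering (my addition, not part of the original answer): escaping the separator inside the parts, so that appid "a-b" with docid "c" cannot produce the same key as appid "a" with docid "b-c".

```java
// Standalone illustration of the key concatenation done in processAdd,
// with a hypothetical escape for the separator character.
public class CompositeKey {

    // Escapes backslashes first, then the separator itself.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("-", "\\-");
    }

    // Builds the composite unique key from the two id fields.
    static String compositeKey(String appid, String docid) {
        return escape(appid) + "-" + escape(docid);
    }

    public static void main(String[] args) {
        System.out.println(compositeKey("app1", "doc42"));
    }
}
```

Without escaping, different (appid, docid) pairs can collide on the same concatenated key, which would silently merge distinct documents.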

Inconsistent Apache Solr query results

I'm new to Apache Solr and trying to run a query with search terms against a field called "normalizedContents" of type "text".
All of the search terms must exist in the field. The problem is, I'm getting inconsistent results.
For example, the Solr index has only one document whose normalizedContents field has the value "EDOUARD SERGE WILFRID EDOS0004 UNE MENTION COMPLEMENTAIRE"
I tried these queries in solr's web interface:
normalizedContents:(edouard AND une) returns the result
normalizedContents:(edouar* AND une) returns the result
normalizedContents:(EDOUAR* AND une) doesn't return anything
normalizedContents:(edouar AND une) doesn't return anything
normalizedContents:(edouar* AND un) returns the result (although there's no "un" word)
normalizedContents:(edouar* AND uned) returns the result (although there's no "uned" word)
Here's the declaration of normalizedContents in schema.xml:
<field name="normalizedContents" type="text" indexed="true" stored="true" multiValued="false"/>
So, wildcards and the AND operator do not follow the expected behavior. What am I doing wrong?
Thanks.
By default the text field type stems the content (solr.SnowballPorterFilterFactory); thus 'un' and 'uned' match 'une'. Then you might not have the solr.LowerCaseFilterFactory filter on both the query and index analyzers, which is why EDOUAR* does not match. And the fourth query doesn't match because 'edouard' is not stemmed to 'edouar'. If you want exact matches, you should copy the data into another field whose type has a more limited set of filters, e.g. only a solr.WhitespaceTokenizerFactory.
Posting the <fieldType name="text"> section from your schema might be helpful to understand everything.
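Since wildcard terms are not analyzed at query time in Solr 3.x, a common client-side workaround for the uppercase case is to lowercase the term before building the wildcard query; a minimal sketch (the method name is mine):

```java
import java.util.Locale;

// Workaround sketch: wildcard terms bypass query-time analysis in Solr 3.x,
// so lowercase the user's term on the client before building the query, to
// match the lowercased terms in the index.
public class WildcardQuery {

    // Builds a field-scoped prefix wildcard clause from a raw user term.
    static String wildcardClause(String field, String prefix) {
        return field + ":" + prefix.toLowerCase(Locale.ROOT) + "*";
    }

    public static void main(String[] args) {
        System.out.println(wildcardClause("normalizedContents", "EDOUAR"));
    }
}
```

This makes EDOUAR* and edouar* behave identically, which matches the question's second and third examples.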
