How to create a composite Key Field in Apache Solr?


I have an Apache Solr 3.5 setup with a schema.xml like this:
<field name="appid" type="string" indexed="true" stored="true" required="true"/>
<field name="docid" type="string" indexed="true" stored="true" required="true"/>
What I need is a field that concatenates them and is used as the <uniqueKey>. There seems to be nothing built in, short of creating a multi-valued id field and using <copyField>, but uniqueKey seems to require a single-valued field.
The only reason I need this is to allow clients to blindly fire <add> calls and have Solr figure out whether it's an addition or an update. Therefore, I don't care much what the ID looks like.
I assume I would have to write my own Analyzer or Tokenizer? I'm just starting out learning Solr, so I'm not 100% sure what I'd actually need and would appreciate any hints towards what I need to implement.

I would personally give that burden to the users, since it's pretty easy for them to add a field to each document.
Otherwise, you would have to write a few lines of code, I guess. You could write your own UpdateRequestProcessorFactory that automatically adds the new field to every input document based on the values of other existing fields. You can use a separator and keep the field single-valued.
On your UpdateRequestProcessor you should override the processAdd method like this:
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    String appid = (String) doc.getFieldValue("appid");
    String docid = (String) doc.getFieldValue("docid");
    // setField replaces any existing value, so "uniqueid" stays single-valued
    doc.setField("uniqueid", appid + "-" + docid);
    // pass it up the chain
    super.processAdd(cmd);
}
Then you should add your UpdateProcessor to your customized updateRequestProcessorChain as the first processor in the chain (solrconfig.xml):
<updateRequestProcessorChain name="mychain" >
<processor class="my.package.MyUpdateRequestProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
<processor class="solr.LogUpdateProcessorFactory" />
</updateRequestProcessorChain>
Hope it works; I haven't tried it. I have done something like this before, but not with uniqueKey or required fields, which is the only problem you might run into. But I guess if you put the update processor at the beginning of the chain, it should work.
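The key-building step inside processAdd() can be tried out without Solr. This is a minimal, library-free sketch, assuming a Map<String, Object> stands in for SolrInputDocument; the class name and the escaping scheme are illustrative additions, not part of the answer above. Escaping the separator keeps ("a-b", "c") and ("a", "b-c") from collapsing onto the same key, which matters once the value is used as the uniqueKey.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, library-free sketch of the composite-key step from processAdd().
 * A Map<String, Object> stands in for SolrInputDocument (assumption).
 */
public class CompositeKeySketch {

    private static final char SEP = '-';

    // Escape the escape character first, then the separator, so that
    // ("a-b", "c") and ("a", "b-c") cannot produce the same key.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace(String.valueOf(SEP), "\\" + SEP);
    }

    static String compositeKey(Map<String, Object> doc, String... fields) {
        StringBuilder key = new StringBuilder();
        for (String field : fields) {
            Object value = doc.get(field);
            if (value == null) {
                // Mirrors required="true": refuse to build a key from a missing part.
                throw new IllegalArgumentException("missing field: " + field);
            }
            if (key.length() > 0) {
                key.append(SEP);
            }
            key.append(escape(value.toString()));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("appid", "app-1");
        doc.put("docid", "42");
        doc.put("uniqueid", compositeKey(doc, "appid", "docid"));
        System.out.println(doc.get("uniqueid")); // prints app\-1-42
    }
}
```

If values can never contain the separator, plain concatenation as in processAdd() above is enough; the escaping only guards against that case.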

Related

How to map DateRangeField correctly to a Bean in SolrJ?

I'm wondering how to map a DateRangeField in Solr's index to a field in Java.
I'm trying this in Java:
@Field("validity_period")
private List<Date> validity_period;
And in the schema:
<field name="validity_period" type="my_daterangefield" multiValued="true" indexed="true" stored="true"/>
<fieldType name="my_daterangefield" class="solr.DateRangeField" multiValued="true"/>
If I index a sample it looks like this in the index:
"validity_period":["2018-04-19T22:00:00Z", "2022-09-23T22:00:00Z"],
This doesn't look like it's correct. Shouldn't it be stored in a format like [from TO to]?
Querying doesn't work either, for example searching for all documents that are valid today.
Is List<Date> really the right Java data type for a DateRangeField? I also tried a plain String formatted as [from TO to], but that didn't work either.
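A List<Date> with two entries appears to be indexed as two separate point-in-time values, which matches the index output shown above. A minimal sketch of building the single [from TO to] string the question describes from two instants, assuming Solr's DateRangeField range syntax; the class and method names are hypothetical, and since the asker reports that a String attempt failed, whether this alone fixes their SolrJ mapping is untested here.

```java
import java.time.Instant;

/**
 * Hypothetical helper: formats a validity period as one DateRangeField value
 * in the "[from TO to]" syntax, instead of two separate instants.
 */
public class DateRangeValue {

    static String toRange(Instant from, Instant to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("range end precedes start");
        }
        // Instant.toString() yields ISO-8601 in UTC, e.g. 2018-04-19T22:00:00Z
        return "[" + from + " TO " + to + "]";
    }

    public static void main(String[] args) {
        String value = toRange(Instant.parse("2018-04-19T22:00:00Z"),
                               Instant.parse("2022-09-23T22:00:00Z"));
        System.out.println(value); // prints [2018-04-19T22:00:00Z TO 2022-09-23T22:00:00Z]
    }
}
```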

How to configure a POJO so it indexes an integer value as text instead of an integer?

I am trying to load data into Apache Solr using code. Being relatively new to this, I wrote the following code to load the data, but when I look at the data in Solr, the integer values have been indexed as integers (which is the default). I want them to be indexed as text. How can I do that?
public class ARHts {
@Field("section")
private String section;
@Field("hts_no")
private String hsNo;
@Field("description")
private String description;
}
Here hts_no is getting indexed as an integer; I want it to be indexed as text.

You'll usually have to do that on the server side (might differ in some larger implementations, I'm not sure what the spring-boot-implementation allows) by defining the field hts_no as a string field (or as a text field if you want to do non-exact searches as well) before starting to index your content.
You can use the Schema API to define a field and its type:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
"name": "hts_no",
"type": "string",
"stored": true,
"indexed": true }
}' http://localhost:8983/solr/collectionname/schema
Alternatively, you can use the older schema.xml configuration (be aware that it may have been copied over to managed-schema):
<field name="hts_no" type="string" indexed="true" stored="true" />
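The same add-field call can also be made from Java before indexing. This is a minimal sketch using the JDK's java.net.http client (Java 11+), assuming the base URL and collection name placeholders from the curl example; the class and method names and the --send switch are hypothetical, and the JSON is built by hand, so field names are assumed not to contain quotes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch of the Schema API "add-field" call from the curl example. */
public class AddStringField {

    // Same body as the curl example; no JSON escaping, so names must be plain.
    static String addFieldJson(String name, String type) {
        return "{\"add-field\":{\"name\":\"" + name + "\",\"type\":\"" + type
                + "\",\"stored\":true,\"indexed\":true}}";
    }

    static HttpRequest addFieldRequest(String baseUrl, String collection, String name, String type) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + collection + "/schema"))
                .header("Content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(addFieldJson(name, type)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = addFieldRequest("http://localhost:8983/solr", "collectionname", "hts_no", "string");
        System.out.println(request.method() + " " + request.uri());
        // Only contact Solr when explicitly asked to, so the sketch runs offline.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

The bean can keep hsNo as a String; it is the schema field type that decides how Solr indexes the value.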

Convert External format to Internal format

I need a solution for this problem:
I'm a service provider, and there are several service clients I work with.
Each service client sends me requests in its own format, for instance:
service client 1 fields are --> f1 , f2 , f3
service client 2 fields are --> f2 , f3 , f4
service client 3 fields are --> f3 , f7 , f8
They may add or remove fields or change their current format; for example, "service client 1" combines:
f1+f2 ==> f12 and adds f5
or client 3:
decomposes f7 ---> f1,f2
I need an internal format for myself, for instance:
f1,f2,f3,f4,f5,f6,f7,f8,f9
This format should be configurable via an XML configuration file, so that when a change happens on the client side I can fix it by changing the XML without changing the source code.
How can I do that?

Briefly, you need an API to which you hand over your external message plus a “how-to” file, and which delivers the internal message. Let’s focus on the main duty of the API, which is message conversion. As you mentioned, it should be configurable by an XML config file. We need an element called “field” with at least one attribute, which I call “name”. I wrap a collection of these field elements in a parent element. Each field element designates a field in the target internal message. Within the field element I add a “function” element, which is responsible for gathering the desired external fields and applying a function to them. Here’s a sample XML config:
<fields>
<field name="aLong">
<function name="add">
<arg>
<function name="readExternalField">
<arg>
f1
</arg>
</function>
</arg>
<arg>
<function name="readExternalField">
<arg>
f2
</arg>
</function>
</arg>
</function>
</field>
<field name="aStr">
<function name="getFromArray" index="0">
<arg>
<function name="splitStr" character=" ">
<arg>
<function name="readExternalField">
<arg>
f3
</arg>
</function>
</arg>
</function>
</arg>
</function>
</field>
</fields>
Imagine we have an internal object with at least two fields, “aLong” and “aStr”, and an external object with at least three fields: “f1”, “f2” and “f3”. The important point is that I must only use functions whose return types are assignable to the target fields. The function “add” adds the values of fields “f1” and “f2”, and the result is assigned to the field “aLong”. The function “splitStr” splits the “f3” field and returns an array, from which the function “getFromArray” takes the first item as the result.
I prefer to use the JAXB API to unmarshal the XML file and parse it easily, so we need an XSD document, which can be generated from the XML file with online tools. I suggest using map-based objects to eliminate the need for reflection. If you develop REST services, the received JSON message can be converted to a map object. This way, your API has a method that receives a map-based object and returns one, so every field is a key in the map rather than a field in a class. The functions, however, can have parameters with specific types, so the main body of the API must cast the objects taken from the external map before passing them to the functions, and put each returned value into the internal message under the field name specified in the XML file.
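The map-based evaluation described above can be sketched in a few dozen lines. This sketch swaps in the JDK's built-in DOM parser for the JAXB/XSD approach the answer prefers (JAXB is no longer bundled with recent JDKs), and implements only the four functions in the sample config; the class name and the choice to sum "add" arguments as a long are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: evaluates a <fields> config (via DOM, not JAXB) over map-based messages. */
public class MessageConverter {

    // The answer's sample config, with every <arg> properly closed.
    static final String SAMPLE_CONFIG =
          "<fields>"
        + "<field name=\"aLong\"><function name=\"add\">"
        + "<arg><function name=\"readExternalField\"><arg> f1 </arg></function></arg>"
        + "<arg><function name=\"readExternalField\"><arg> f2 </arg></function></arg>"
        + "</function></field>"
        + "<field name=\"aStr\"><function name=\"getFromArray\" index=\"0\">"
        + "<arg><function name=\"splitStr\" character=\" \">"
        + "<arg><function name=\"readExternalField\"><arg> f3 </arg></function></arg>"
        + "</function></arg></function></field>"
        + "</fields>";

    private final Element root;

    MessageConverter(String xmlConfig) {
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlConfig.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid converter config", e);
        }
    }

    /** Builds the internal map: one entry per <field>, valued by its top-level <function>. */
    Map<String, Object> convert(Map<String, Object> external) {
        Map<String, Object> internal = new HashMap<>();
        for (Element field : children(root, "field")) {
            Element function = children(field, "function").get(0);
            internal.put(field.getAttribute("name"), eval(function, external));
        }
        return internal;
    }

    private Object eval(Element fn, Map<String, Object> external) {
        List<Element> args = children(fn, "arg");
        switch (fn.getAttribute("name")) {
            case "readExternalField": // leaf: the <arg> text names an external field
                return external.get(args.get(0).getTextContent().trim());
            case "add": // sums numeric arguments as a long, to fit the target field aLong
                long sum = 0;
                for (Element arg : args) {
                    sum += ((Number) nested(arg, external)).longValue();
                }
                return sum;
            case "splitStr": // splits on the literal character given in the attribute
                String text = (String) nested(args.get(0), external);
                return text.split(Pattern.quote(fn.getAttribute("character")));
            case "getFromArray":
                Object[] array = (Object[]) nested(args.get(0), external);
                return array[Integer.parseInt(fn.getAttribute("index"))];
            default:
                throw new IllegalArgumentException("unknown function: " + fn.getAttribute("name"));
        }
    }

    // Evaluates the single <function> wrapped by a non-leaf <arg>.
    private Object nested(Element arg, Map<String, Object> external) {
        return eval(children(arg, "function").get(0), external);
    }

    private static List<Element> children(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(tag)) {
                result.add((Element) node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> external = new HashMap<>();
        external.put("f1", 3L);
        external.put("f2", 4);
        external.put("f3", "hello world");
        System.out.println(new MessageConverter(SAMPLE_CONFIG).convert(external));
    }
}
```

When a client changes its format, only the config string (or file) changes; adding a new kind of transformation means adding one case to eval().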
I hope this brief answer lights the way to a satisfactory solution. Remember that writing an efficient API you can proudly share with your colleagues is a skill that only comes with practice.

Solr range not filtering

I am trying to use a range facet on price (f.price.facet.range.*), but it doesn't work: it always returns unfiltered results.
Indexing:
<field name="price" type="sdouble" indexed="true" stored="false" multiValued="true" />
solrDoc.addField("price", 22.99);
Creating the query:
query.addNumericRangeFacet("price", 0.0 ,100.0, 0.01);
Resulting query:
q=mobile+phone&
fl=productId+score&
payload=true&
payload.fl=full_text&
facet.range=price&
f.price.facet.range.start=0.0&
f.price.facet.range.end=100.0&
f.price.facet.range.gap=0.01&
facet=true&
/...

I found a solution:
query.addFilterQuery("price:[0 100]");

Faceting always operates on the filtered results (the 'q' and 'fq' params), unless you explicitly exclude a filter query, which you didn't. A range facet only counts documents per bucket; it does not restrict which documents are returned. Can you reproduce the problem with the exampledocs data in Solr, or with a setup that is easy for another person to verify?
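To make the distinction concrete, here is a hedged sketch that builds the request parameters by hand: the fq restricts the result set (what addFilterQuery does in SolrJ), while the facet.range parameters only add bucket counts. The class and method names are hypothetical, and the parameter values are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: fq filters documents; facet.range only adds per-bucket counts. */
public class PriceRangeParams {

    static String buildQuery(String start, String end, String gap) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "mobile phone");
        params.put("fq", "price:[" + start + " TO " + end + "]"); // this is what filters results
        params.put("facet", "true");
        params.put("facet.range", "price"); // this only computes bucket counts
        params.put("f.price.facet.range.start", start);
        params.put("f.price.facet.range.end", end);
        params.put("f.price.facet.range.gap", gap);
        StringJoiner query = new StringJoiner("&");
        params.forEach((name, value) ->
                query.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8)));
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("0", "100", "10"));
    }
}
```

Note also that the original gap of 0.01 over 0 to 100 asks Solr for (100 - 0) / 0.01 = 10,000 buckets, so a coarser gap is likely more practical.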

Inconsistent Apache Solr query results

I'm new to Apache Solr and trying to make a query using search terms against a field called "normalizedContents" of type "text".
All of the search terms must exist in the field. The problem is, I'm getting inconsistent results.
For example, the Solr index has only one document, whose normalizedContents field has the value "EDOUARD SERGE WILFRID EDOS0004 UNE MENTION COMPLEMENTAIRE".
I tried these queries in Solr's web interface:
normalizedContents:(edouard AND une) returns the result
normalizedContents:(edouar* AND une) returns the result
normalizedContents:(EDOUAR* AND une) doesn't return anything
normalizedContents:(edouar AND une) doesn't return anything
normalizedContents:(edouar* AND un) returns the result (although there's no "un" word)
normalizedContents:(edouar* AND uned) returns the result (although there's no "uned" word)
Here's the declaration of normalizedContents in schema.xml:
<field name="normalizedContents" type="text" indexed="true" stored="true" multiValued="false"/>
So, wildcards and the AND operator do not behave as expected. What am I doing wrong?
Thanks.

By default, the field type text does stemming on the content (solr.SnowballPorterFilterFactory), so 'un' and 'uned' match une. You might also not have the solr.LowerCaseFilterFactory filter on both the query and the index analyzer, which is why EDOUAR* does not match. The 4th query doesn't match because edouard is not stemmed to edouar. If you want exact matches, copy the data into another field whose type has a more limited set of filters, e.g. only a solr.WhitespaceTokenizerFactory.
Posting the <fieldType name="text"> section from your schema might be helpful to understand everything.
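The case-sensitivity part of this explanation can be illustrated with a toy model. This is not Lucene's analysis chain: it assumes the index side applies whitespace tokenizing plus lowercasing (no stemming), and that the wildcard prefix reaches the index unanalyzed, as the answer's explanation implies. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Toy model (not Lucene's analysis): index terms are lowercased at index time,
 * while a wildcard prefix is compared exactly as typed.
 */
public class WildcardCaseDemo {

    // Index side: whitespace tokenizer + lowercase filter (no stemming in this toy).
    static List<String> indexTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Query side: a prefix query matches if any indexed term starts with the raw prefix.
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("EDOUARD SERGE WILFRID EDOS0004 UNE MENTION COMPLEMENTAIRE");
        System.out.println(prefixMatches(terms, "edouar")); // true
        System.out.println(prefixMatches(terms, "EDOUAR")); // false: the prefix was not lowercased
    }
}
```

Lowercasing the search terms on the client before building wildcard queries is one way to sidestep this, under the same assumption about the analyzer.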
