Solr search is returning partial string matches - java

Using Solr 3.6.1, I have this field in my schema.xml:
<field name="names" type="text_general" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="names_*" type="text_general" indexed="true" stored="true"/>
The documentation in schema.xml states that "text_general" should:
tokenize with StandardTokenizer
remove stop words using the case-insensitive "stopwords.txt" (which is currently empty)
lowercase the string
At query time only, it also applies synonyms (the synonyms file is also empty at this time).
I have two documents indexed in Solr with this data for the field:
<!-- doc 1 -->
<str name="names_data">Name ABC Dev Loc</str>
<!-- doc 2 -->
<str name="names_data">Name ABC Dev Location</str>
When I execute the following query:
id:(doc1 OR doc2) AND names:Dev+Location)
Both documents are returned. I would have expected that only doc2 would have been returned based on my understanding of how Solr's StandardTokenizer works.
Why does "Dev+Location" match "Dev Loc" and "Dev Location"?

The type text_general is probably configured to use a stemmer, which is treating Loc as a variant of Location.
You could configure the type to not use a stemmer, or try searching for the whole string using names:"Dev Location"

This might be why.
In the query names:Dev+Location, only the Dev term is searched in the names field. The Location term has no field name qualifier, so it is searched against whatever <defaultSearchField> is set to in schema.xml.
So you could quote the value, as in names:"Dev Location", or qualify each term: names:Dev AND names:Location
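A minimal sketch of the two suggested queries, written in Python only to show the URL encoding. The endpoint in `SOLR_SELECT` and the helper `select_url` are assumptions for illustration, not part of the question. The sketch also shows that a raw `+` in a query string decodes to a space, which is why the parser sees two separate clauses.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical endpoint; adjust host, port and core to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"

def select_url(query: str) -> str:
    """Return a select URL with the q parameter URL-encoded."""
    return SOLR_SELECT + "?" + urlencode({"q": query})

# A raw '+' in a query string decodes to a space, so the parser receives
# two clauses: names:Dev and an unqualified Location term.
decoded = parse_qs("q=names:Dev+Location")["q"][0]

# Phrase query: both words must appear next to each other in names.
phrase_url = select_url('names:"Dev Location"')

# Alternative: qualify every term with the field name.
fielded_url = select_url("names:Dev AND names:Location")
```

Either URL keeps both terms tied to the names field instead of letting Location fall through to the default search field.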

Related

Solr: Using the Block Join Children Query Parser

I am currently evaluating the Block Join Children Query Parser as described here.
Therefore I have created the following collection:
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=nestedPerson&numShards=6"
Then I have inserted these two documents:
curl http://localhost:8983/solr/nestedPerson/update?commitWithin=3000 -d '<add>
<doc>
<field name="id">p1</field>
<field name="deceased">false</field>
<doc>
<field name="id">c1</field>
<field name="firstName">Bob</field>
</doc>
</doc>
<doc>
<field name="id">p2</field>
<field name="deceased">true</field>
<doc>
<field name="id">c2</field>
<field name="firstName">Max</field>
</doc>
</doc>
</add>'
Now I issue this query:
{!child of="id:p1"}firstName:Bob
Unfortunately this results in this error:
"msg": "Parent query yields document which is not matched by parents filter, docID=0",
How can the parent query (I guess that the part id:p1 is meant) yield a document that is not matched by the filter?
Take another look at the Solr Wiki page that you refer to. Note the following:
The syntax for this parser is: q={!child of=<allParents>}<someParents>. The parameter allParents is a filter that matches only parent documents
In your example, the query is {!child of="id:p1"}firstName:Bob. The field id is used in <allParents>, but id is present in both parent and child documents, so the filter does not match parent documents only.
You need to introduce a field that only parent documents have, such as <field name="content_type">parentDocument</field> from the wiki. Once all parent documents (and only parent documents) have this field, you could submit the query as:
q={!parent which="content_type:parentDocument"}firstName:Bob
This would match child documents for firstName:Bob and return their parents. In a similar fashion, use q={!child of=<allParents>}<someParents> to match parent documents and return their children.
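As a rough illustration of both directions, the Python below only assembles the query strings. It assumes every parent document has been reindexed with `content_type:parentDocument` and that no child carries that field. The helper names `children_of` and `parents_of` are made up for this sketch.

```python
from urllib.parse import urlencode

# Assumption: parents (and only parents) carry this marker field.
PARENT_FILTER = "content_type:parentDocument"

def children_of(some_parents: str) -> str:
    """Build a {!child} query: match parents, return their children."""
    return '{!child of="%s"}%s' % (PARENT_FILTER, some_parents)

def parents_of(some_children: str) -> str:
    """Build a {!parent} query: match children, return their parents."""
    return '{!parent which="%s"}%s' % (PARENT_FILTER, some_children)

# Children of parent p1 (the intent of the original query).
child_q = children_of("id:p1")

# Parents that have a child named Bob.
parent_q = parents_of("firstName:Bob")

# URL-encoded form, ready to append to a select URL.
encoded = urlencode({"q": child_q})
```

In both cases the filter inside `of` or `which` stays the same; only the query after the closing brace changes.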

How to properly outline an Apache Solr document?

What is the difference between a delta import and a full import in Apache Solr?
What naming conventions should be followed when writing deltaImportQuery and deltaQuery (ID, TXT_ID, etc.)? Are there any references or tutorials that explain in detail how deltaImportQuery and deltaQuery differ and relate to each other, what each one does, and how to write them?
How do I configure multiple entities in one document? Suppose there are three tables in the database, T1, T2 and T3: how should this be configured in schema.xml, given that only one <uniqueKey>somename</uniqueKey> is considered per schema.xml file?
How do I parse BLOB-type input from MySQL? Using convert(column_name using utf8) as alias_name solves this, but what is the right convention? Other methods are also available, such as using TikaEntityProcessor or writing custom BLOB transformers.
As with ORM, are there any concepts explaining how to denormalize and outline an Apache Solr document? Is there a showcase project covering all use cases and features?
How do I configure entities like this in data-config.xml?
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solrdemo" user="root" password="123" batchSize='1' />
<document name="user">
<entity name ="user1" query = "SELECT * FROM user1">
<field column='id' name='id1' />
</entity>
<entity name ="user2" query = "SELECT * FROM user2">
<field column='id' name='id2' />
</entity>
<entity name ="user3" query = "SELECT * FROM user3">
<field column='id' name='id3' />
</entity>
</document>
</dataConfig>
With the above configuration, which id should be configured as <uniqueKey></uniqueKey> in schema.xml?
The result of the above configuration is:
Indexing completed. Added/Updated: 2,866 documents. Deleted 0 documents. (Duration: 03s)
Indexing completes successfully, but 0 documents are added or updated. How can this be resolved?
Overall, are there any references for proper conventions and configurations when working with Apache Solr?

Solr dismax highlighting not respecting analyzer

In the schema of Solr 3.6.2 there are two field declarations, text and exact
<field name="text" type="text" indexed="true" stored="true" />
<field name="exact" type="string" indexed="true" stored="true" />
The former uses StandardTokenizer and the latter KeywordTokenizer.
Solr queries describing the problem:
?hl=true
&hl.fl=text,exact
&defType=edismax
&qf=text+exact <-------- here
&q=a-b
Highlight output for field exact:
<em>a</em>-<em>b</em>.
The problem is the summary for field exact is produced using the analyzer from text.
?hl=true
&hl.fl=text,exact
&defType=edismax
&qf=exact <-------- here
&q=a-b
Highlight output for field exact:
<em>a-b</em>.
Simply removing text from qf gives the correct analyzer. Why?
With debugQuery on, the parsed query is:
+DisjunctionMaxQuery(((exact:a-b) | ((text:a text:b)~2)))
After finding a match in exact, the Solr highlighter also seems to highlight a and b simply because those terms are present in the query (from the text clause). Setting hl.requireFieldMatch=true does seem to prevent that.
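A small sketch of the first request with hl.requireFieldMatch=true added, written in Python purely to assemble the parameters. All other values are taken from the queries above; how the string is sent to Solr is left open.

```python
from urllib.parse import urlencode

# Same parameters as the first query, plus hl.requireFieldMatch so that
# each field is highlighted only with query terms that target that field.
params = [
    ("q", "a-b"),
    ("defType", "edismax"),
    ("qf", "text exact"),
    ("hl", "true"),
    ("hl.fl", "text,exact"),
    ("hl.requireFieldMatch", "true"),
]

# URL-encoded query string; the space in qf becomes '+'.
query_string = urlencode(params)
```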

Solr query only returns Id's

I want to import data from a table and index it using Solr.
I am using the solr-tomcat admin panel.
But whenever I query, it returns only the ids and value.
I have also tried adding FIELDS to fl, but that does not help either.
Here is my data-config.xml file:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://127.0.0.1:3306/{DB_NAME}"
user="{DB_USER}"
password="{DB_PASSS}"
/>
<document>
<entity name="id" query="select s3_location,file_name from video">
<field column="s3_location" name="s3_location"/>
<field column="file_name" name="file_name"/>
</entity>
</document>
</dataConfig>
Is there any way to get the s3_location and file_name fields as well?
You need to specify the actual field names in the fl parameter, or use * to indicate all fields. Also note that the fields must be defined with stored="true" in your schema.xml file for them to be returned in query results.
fl=id,s3_location,file_name
fl=*
Are you sure you are importing the data at all? If you start with an empty index, do you get anything?
The reason I ask is that you are not mapping the id field explicitly. I believe the JDBC data source maps fields implicitly based on their names, but relying on that is risky when you are just starting.
Otherwise, like Paige said, make sure you have defined those fields in your schema and that they are actually stored.

How to get last indexed record in Solr?

I want to know how to get or search for the last indexed record in Apache Solr.
When an existing record is updated, it moves to the end of all the records, so I want to get that last indexed record.
Thanks.
You could add a 'timestamp' field to your Solr schema that puts the current date/time into the record when it is added.
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
Then sort in descending order by this field and the first record will be the latest one. A query like this should do it:
http://localhost:8080/solr/core-name/select?q=*%3A*&start=0&rows=1&sort=timestamp+desc
You can also sort the documents by index order using the following query.
http://localhost:8983/solr/select?q=*:*&sort=_docid_+asc
or
http://localhost:8983/solr/select?q=*:*&sort=_docid_+desc
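For completeness, a minimal Python sketch that builds either variant with the sort clause properly URL-encoded. The base URL and the helper `latest_url` are assumptions for illustration. The timestamp variant also assumes the field described above exists in the schema.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core to your setup.
BASE = "http://localhost:8983/solr/select"

def latest_url(sort_field: str = "timestamp") -> str:
    """URL that returns the single newest document by the given sort field."""
    params = {
        "q": "*:*",
        "start": 0,
        "rows": 1,
        "sort": f"{sort_field} desc",  # the space is encoded as '+'
    }
    return BASE + "?" + urlencode(params)

by_timestamp = latest_url()           # sort on the timestamp field
by_docid = latest_url("_docid_")      # sort on index order instead
```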
