Apache Solr search is not displaying indexed result - java

The data from MySQL is indexed correctly by the DataImportHandler, but searching through the Solr admin returns zero results. What could the problem be?
The right side shows that indexing completed, but nothing appears in the search results.
And when I search for "programing" it displays

I would suggest using Luke to examine your index contents and verify that you have indexed your data correctly. http://www.getopt.org/luke/
There's also a debugging tab in Solr for queries, which helps determine whether your query would actually match given the way you have set up your tokenizers.
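The two checks above can also be run against Solr's HTTP API directly. The sketch below only builds the request URLs; the host, port, and core name (`mycore`) are assumptions, so substitute your own. A `*:*` query matches every document, so if its `numFound` is 0 the import was probably not committed to the core you are searching. Adding `debugQuery=true` asks Solr to include the parsed query and score explanations in the response, which shows how the search term was analyzed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SolrDebugUrl {

    /**
     * Builds a /select URL for a core and a query string.
     * When debug is true, debugQuery=true is appended so the response
     * contains the parsed query and per-document score explanations.
     */
    static String build(String coreBaseUrl, String query, boolean debug) {
        // URLEncoder.encode(String, Charset) needs Java 10 or newer.
        String url = coreBaseUrl + "/select?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return debug ? url + "&debugQuery=true" : url;
    }

    public static void main(String[] args) {
        // Assumed host, port, and core name; adjust to your installation.
        String core = "http://localhost:8983/solr/mycore";
        System.out.println(build(core, "*:*", false));        // is anything indexed at all?
        System.out.println(build(core, "programing", true));  // how is the term parsed?
    }
}
```

Opening the printed URLs in a browser (or with any HTTP client) returns the same information the admin UI shows, which makes it easy to compare the two queries side by side.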

Related

Lucene Search Luke vs Hibernate Search different result

I am running the following Lucene query in Luke:
+(debtorNumber:10200000 originalDebtorNumber:10200000) +(serviceName:"skype for"^840.0 (serviceName:for* serviceId:for*) (serviceName:skype* serviceId:skype*))
It shows the expected results at the top, for example:
Skype for Business for Managers
Microsoft Skype for Business Conferencing (Plan2)
Telephone dial-in for Skype for Business Conferencing
and so on.
The same query executed with Hibernate Search shows different results.
For example, I get the following:
antivirus protection for your PC, notebook or server
central administration for thin clients
The "skype for" results only appear on the 3rd or 4th page.
The Java code is:
SearchManager searchManager = Search.getSearchManager(cache);
CacheQuery<MyType> query = searchManager.getQuery(booleanQuery, MyType.class);
List<MyType> pagedResult = query
    .maxResults(criteria.getPageSize())
    .firstResult(Math.toIntExact(criteria.getOffset()))
    .list();
The following line logs the query above, which is the one I used in Luke:
log.info("Lucene Search boolean query:" + booleanQuery);
Please advise.
There might be multiple reasons for the difference; let me try to compile a checklist.
Different index
The main difference I can think of is that Luke will always target a single index: the one you opened explicitly.
Hibernate Search will actually run the query on a composite view of all indexes containing MyType and its indexed subclasses (and any shards you might have). Often that's just one index, but do you possibly have multiple indexes open?
That will affect the results, and definitely the scores.
Different Lucene version
Verify that the Luke version you're using is built on the exact same version of Lucene as your application.
Check the scoring
You can use a Projection query to have Infinispan Query / Hibernate Search explain the scores of all results it produced; this can be very useful to understand what is going on.
See FullTextQuery.EXPLANATION and FullTextQuery.SCORE in section Projections, and Example 105.
IndexReader
You can also use the SearchManager to get the low-level IndexReader(s) and run the query directly, by-passing Infinispan and Hibernate Search code.
SearchIntegrator si = searchManager.unwrap(SearchIntegrator.class);
si.getIndexReaderAccessor(). ...
That might help narrow down which component is affecting your expected scoring.
The IndexReaderAccessor can open an index by type or by name. When opened by name it will open the single index; when opened by type it will apply the rules to satisfy polymorphic queries and might return an aggregate. It might be interesting to experiment with both to verify they return the same results.
...and check the basics
Make sure you're opening the same physical index :-)
In particular, recent versions of Infinispan might apply sharding transparently to improve data distribution in the cluster. This can be confusing when debugging scoring, especially when you're not aware of it.

When to do indexing in lucene

I have a REST service which works with data from a database (MongoDB). I want to add the Apache Lucene library to implement full-text search.
I have never used Lucene before, so I am trying to understand how it works by checking tutorials, but one thing is still unclear to me:
When should the DB data be indexed? Some of my data is added and removed often, some is updated rarely. How should I structure things so that search requests always run against up-to-date data?
Should I update the index on every data update, or is it done automatically so that indexing once is enough? If reindexing is needed, how often?
If you want live data to be searchable, then you should add, update and delete data in the Lucene index at the same time as you add, update and delete data in your database.
That is perfectly fine for indexing, but do not optimize your index on every operation.
You can optimize your index once a day, or as often as your usage requires. Optimizing the index helps make searches faster.
Refer to this tutorial to get started with a basic Lucene application.
You can try MongoDB's own feature for this (see the Mongo docs). It is probably not as flexible or as powerful as Lucene, but it comes for free.
You really asked the problematic question: "When to do indexing?" The answer depends heavily on your requirements. However, you can look at this post to see how it is technically done: offline, i.e. you will always be more or less behind in indexing.
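The "update the index together with the database" approach can be sketched without any dependencies. In the toy class below, a HashMap stands in for the database and a small term-to-ids map stands in for the Lucene index; the class and method names are made up for illustration. In a real application the index operations would be Lucene's IndexWriter.addDocument, updateDocument and deleteDocuments, keyed by a primary-key term.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SyncedStore {
    private final Map<String, String> database = new HashMap<>();    // stand-in for the real DB
    private final Map<String, Set<String>> index = new HashMap<>();  // term -> ids, stand-in for Lucene

    /** Insert or update: drop the old index entries, store the record, index the new text. */
    void save(String id, String text) {
        removeFromIndex(id);
        database.put(id, text);
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new HashSet<>()).add(id);
            }
        }
    }

    /** Delete from the index and the database in the same operation. */
    void delete(String id) {
        removeFromIndex(id);
        database.remove(id);
    }

    /** Returns the ids of all records containing the term, in sorted order. */
    Set<String> search(String term) {
        return new TreeSet<>(index.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }

    private void removeFromIndex(String id) {
        String old = database.get(id);
        if (old == null) {
            return; // nothing indexed for this id yet
        }
        for (String term : tokenize(old)) {
            Set<String> ids = index.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    index.remove(term);
                }
            }
        }
    }

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }
}
```

Because every write goes through save or delete, searches never see stale entries. The trade-off is that each write pays the indexing cost, which is why expensive maintenance such as optimizing is better done on a schedule.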

How to index a XML file with Lucene [duplicate]

I am new to Lucene. I want to index large XML files (15 GB) that contain plain text as well as attributes and many XML tags.
How do I parse and index a huge XML file using Lucene? Any samples or links would help me understand the process. Also, if I use Lucene, will I need a database? So far I have only seen and done indexing with databases.
Your index would be built just as you would have done using a database: iterate through all the data you want to index and write it to the index. Use the XmlReader class to parse your XML in a forward-only fashion. You will, just as with a database, need to index some kind of primary key so you know what each search result represents.
A database helps when it comes to looking up the indexed data by primary key. Reading the data for a primary key gets messy if you need to iterate a 15 GiB XML file on every request.
A database is not required, but it helps a lot. I would build this as an import tool that reads your XML, dumps it into your database, and then uses the "normal" database indexing code you've built before.
You might like to look at Michael Sokolov's Lux product, which combines Lucene and Saxon:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg84102.html
I haven't used it myself and can't claim to fully understand its capabilities.
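For readers on the JVM, the forward-only parsing described above can be done with the StAX API that ships with Java (javax.xml.stream). The sketch below is only an illustration: the element name `record` and the attribute `id` are assumptions about the file layout, and the callback is where each record would be written to the index or the database. Only one record is held in memory at a time, so the file size does not matter.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XmlRecordStreamer {

    /**
     * Streams the XML once, front to back, and calls sink with (id, text)
     * for every <record id="..."> element (hypothetical element and attribute names).
     */
    static void stream(Reader in, BiConsumer<String, String> sink) {
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String id = null;
            StringBuilder text = null; // non-null only while inside a <record>
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "record".equals(xml.getLocalName())) {
                    id = xml.getAttributeValue(null, "id"); // the primary key
                    text = new StringBuilder();
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && text != null) {
                    text.append(xml.getText()); // includes text of nested tags
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "record".equals(xml.getLocalName())) {
                    sink.accept(id, text.toString().trim());
                    text = null;
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<docs><record id=\"1\">Hello <b>world</b></record></docs>";
        // In real use, pass a FileReader and index each record inside the callback.
        stream(new StringReader(sample), (id, text) -> System.out.println(id + " -> " + text));
    }
}
```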

Querying a Lucene index file

I'm trying to query a Lucene index file through QueryParser. However, I would like to see the format of the index file before querying it. Is there a way to look up the structure of a Lucene index file, much like I can look up the structure of a regular SQL table?
The reason is that I haven't built this index file myself and would like to find my way around it before querying it.
You can use Luke, or programmatically IndexReader.getFieldNames().
Luke - Lucene Index Toolbox
Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their content in several ways.

How to use Lucene for getting tags for tagcloud?

I'm a Lucene newbie and I have it working to the point where I can create an index from a column of a database table storing tags, search this index for a particular tag, and display the results and their scores. I'm looking at this as an alternative to a MySQL fulltext index, which I've heard mixed comments about.
But what I want is to get the most popular tags in the index along with their counts, and then use this data to create a tagcloud.
Does anyone know if and how Lucene can be queried to get the most popular tags in an index and their counts?
Thanks
Mr Morgan.
Very detailed tutorial.
Basically, you get all the terms from the document, then get the term frequency.
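The counting step can be sketched in plain Java. The example below works on a list of tag strings rather than on a Lucene index, and the class and method names are made up for illustration. With Lucene, the count for each tag would instead come from the term's document frequency in the tag field (for example IndexReader.docFreq(Term)); the sorting and cut-off logic stays the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TagCloud {

    /** Returns the 'limit' most frequent tags with their counts, most popular first. */
    static List<Map.Entry<String, Integer>> topTags(List<String> tags, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tag : tags) {
            counts.merge(tag, 1, Integer::sum); // term frequency per tag
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically for a stable order.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(limit, entries.size())));
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("java", "lucene", "java", "solr", "lucene", "java");
        for (Map.Entry<String, Integer> e : topTags(tags, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The counts can then be mapped to font sizes when rendering the tagcloud.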
