Querying a Lucene index file - java

I'm trying to query a Lucene index file through QueryParser. However, I would like to see the format of the index file before querying it. Is there a way to look up the structure of a Lucene index file, much as I can look up the structure of a regular SQL table?
The reason is that I didn't build this index file myself and would like to find my way around it before querying it.

You can use Luke, or programmatically IndexReader.getFieldNames().

Luke - Lucene Index Toolbox
Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their content in several ways.

Related

How to index a XML file with Lucene [duplicate]

I am new to Lucene. I want to index large XML files (15 GB) that contain plain text as well as attributes and many XML tags. How do I parse and index such a huge XML file using Lucene? Any sample or links would help me understand the process.
Also, if I use Lucene, will I need a database? I have seen and done indexing with databases before.
You would build the index just as you would with a database: iterate through all the data you want to index and write it to the index. Use the XmlReader class to parse your XML in a forward-only fashion. Just as with a database, you will need to index some kind of primary key so you know what each search result represents.
A database helps when it comes to looking up the indexed data by primary key. Reading the data for a primary key gets messy if you have to iterate over a 15 GiB XML file on every request.
A database is not required, but it helps a lot. I would build this as an import tool that reads your XML and dumps it into your database, and then use the "normal" database indexing code you've built before.
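For a Java setup, the forward-only parsing described above could look roughly like the sketch below. It uses the JDK's StAX `XMLStreamReader`, which plays a similar role to a forward-only XML reader. The element name `record`, the attribute `id`, and the class `XmlRecordStreamer` are assumptions made for illustration; the real file's structure is not known here.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams <record id="..."> elements one at a time,
// so the whole file never has to fit in memory.
class XmlRecordStreamer {

    /** One parsed record: the primary key plus the text to index. */
    static final class XmlRecord {
        final String id;
        final String text;

        XmlRecord(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /** Reads the XML forward-only and passes each record to the handler. Returns the record count. */
    static int stream(Reader in, Consumer<XmlRecord> handler) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            int count = 0;
            String id = null;
            StringBuilder text = null; // non-null only while inside a <record>
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT && "record".equals(xml.getLocalName())) {
                    id = xml.getAttributeValue(null, "id");
                    text = new StringBuilder();
                } else if ((event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA)
                        && text != null) {
                    // Collects text from the record and any nested elements.
                    text.append(xml.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && "record".equals(xml.getLocalName())
                        && text != null) {
                    handler.accept(new XmlRecord(id, text.toString().trim()));
                    text = null;
                    count++;
                }
            }
            xml.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<root><record id=\"1\">first text</record><record id=\"2\">second text</record></root>";
        // In a real import, the handler would write each record to the index
        // (and, if used, to the database) instead of printing it.
        stream(new StringReader(sample), r -> System.out.println(r.id + ": " + r.text));
    }
}
```

For the 15 GB file, the `StringReader` would be replaced by a buffered file reader; the handler is where each record's primary key and text would be handed to the indexing code.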
You might like to look at Michael Sokolov's Lux product, which combines Lucene and Saxon:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg84102.html
I haven't used it myself and can't claim to fully understand its capabilities.

Any impact of changing the index in cloudant

I have a Cloudant database that is in use and already populated with documents. I'm using a Cloudant Java client to fetch data from it. I plan to change the indexes that are currently used. Basically, I plan to switch from using createIndex() to https://github.com/cloudant/java-cloudant#cloudant-search. I would also like to change the fields on which the documents are indexed.
Would changing the index impact the underlying data or cause any migration issues with existing data when I start to use the new Index?
It sounds like you want to change from using Cloudant Query to Cloudant Search. This should be straightforward and safe.
Adding a new index will not change or affect the existing data -- the main thing to be careful of is not deleting your old index before you've migrated your code. The easiest way to do this is by using a new design document for your new search indexes:
1. Create a new design document containing your search index and upload it to Cloudant (https://github.com/cloudant/java-cloudant#creating-a-search-index).
2. Migrate your app to use the new search index.
3. (Optionally) remove the design document containing the indexes that you no longer need. Cloudant will then clean up the index files that are no longer needed (https://github.com/cloudant/java-cloudant#comcloudantclientapidatabaseremovedoc-idrev-id).
I included links to the relevant parts of the Java API, but obviously you could do this through the dashboard.

can Lucene be used to search inside db?

Can we use Lucene to search text stored in DB?
I saw this article that shows how to use it for normal articles stored as files
http://javatechniques.com/blog/lucene-in-memory-text-search-example/
Can someone suggest?
Look at the question below from the Lucene FAQ. If you are using Hibernate, I recommend considering Hibernate Search.
How can I use Lucene to index a database?
You should use the Compass Framework. It's built upon Lucene and integrates nicely with several ORMs
Update: you should now use ElasticSearch instead (thanks Pangea)
Can we use Lucene to search text stored in DB?
Yes, you can. Lucene can index text that comes from different kinds of database tables (MySQL, etc.). To search text stored in a DB, Lucene first needs to index all the data you want to search.
But don't forget: Lucene is just an indexing library. To access it, that is, to search it or to run an import, you need a second piece of software that controls Lucene and the data inside it.
This could be Solr, for example: http://lucene.apache.org/solr/
You then no longer need a full-text index in the RDBMS for that.

Apache Solr search is not displaying indexed result

Although the data from MySQL is indexed properly by the DataImportHandler, searching through the Solr admin returns zero results. What could the problem be?
The right side shows that indexing completed, but nothing appears in the search results.
And when I search for "programing" it displays
I would suggest using Luke to examine your index contents and verify that you have indexed your data correctly. http://www.getopt.org/luke/
There's also a debugging tab in Solr for queries, which helps determine whether your query would actually match given the way you have set up your tokenizers.

How to use Lucene for getting tags for tagcloud?

I'm a Lucene newbie and I have it working where I can create an index from a column of a database table storing tags, and search this index for a particular tag and display the results and their scores. I'm looking at this as an alternative to a MySQL fulltext index which I've heard mixed comments about.
But what I want is to get the most popular tags in the index along with their counts, and then use this data to create a tag cloud.
Does anyone know if and how Lucene can be queried to get the most popular tags in an index and their counts at all?
Thanks
Mr Morgan.
very detailed tutorial
Basically, you get all the terms from the document, then get the term frequency for each.
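The counting step can be sketched in plain Java, independent of any Lucene API. This is only an illustration of "collect the terms, count them, keep the most frequent": with Lucene, the terms and their frequencies would be read from the index rather than from an in-memory list. The class `TagCloudCounter` and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: counts tag occurrences and returns the most frequent ones.
class TagCloudCounter {

    /** Returns up to 'limit' tags, most frequent first; ties are broken alphabetically. */
    static List<Map.Entry<String, Integer>> topTags(List<String> tags, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tag : tags) {
            // Normalise so that "Java" and "java" count as the same tag.
            String key = tag.trim().toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int size = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, size));
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("java", "lucene", "Java", "solr", "lucene", "java");
        for (Map.Entry<String, Integer> e : topTags(tags, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The counts can then be mapped to font sizes for the tag cloud, for example by scaling each count relative to the largest one.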
