Can Lucene be used to search inside a DB? - java

Can we use Lucene to search text stored in a DB?
I saw this article, which shows how to use it for ordinary articles stored as files:
http://javatechniques.com/blog/lucene-in-memory-text-search-example/
Can someone suggest an approach?

Look at the question below from the Lucene FAQ. If you are using Hibernate, I recommend that you consider Hibernate Search.
How can I use Lucene to index a database?

You should use the Compass Framework. It's built upon Lucene and integrates nicely with several ORMs.
Update: you should now use ElasticSearch instead (thanks Pangea)

Can we use Lucene to search text stored in DB?
Yes, you can. Data from many kinds of database tables (MySQL, etc.) can be loaded into Lucene. To search text stored in a DB, Lucene first needs to index all the data you want to search.
But don't forget: Lucene is just an index. To access it, that is, to search it or to run an import, you need a second piece of software that uses and controls the data inside Lucene.
This could be Solr, for example: http://lucene.apache.org/solr/
You then no longer need a full-text index in the RDBMS for this.
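To make the "index first, then search" idea concrete, here is a minimal sketch in plain Java. It is not Lucene's API: the class name RowIndex, its methods, and the sample rows are all hypothetical, and a real setup would read the rows over JDBC and hand them to Lucene (or Solr) instead of this toy map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical illustration, not Lucene):
// maps each lower-cased word to the primary keys of the rows containing it.
class RowIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Index one database row: 'id' is its primary key, 'text' the column to search.
    void add(long id, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Return the primary keys of all rows that contain the given word.
    Set<Long> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        RowIndex index = new RowIndex();
        // In a real application these rows would come from a JDBC ResultSet.
        index.add(1L, "Lucene is a search library");
        index.add(2L, "Solr is built on Lucene");
        System.out.println(index.search("lucene"));
        System.out.println(index.search("solr"));
    }
}
```

The search returns primary keys only, so the full rows are still fetched from the database afterwards.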

Related

How to index an XML file with Lucene [duplicate]

I am new to Lucene. I want to index large XML files (15 GB) that contain plain text as well as attributes and many XML tags.
How can I parse and index such a huge XML file using Lucene? Any samples or links would help me understand the process. Also, if I use Lucene, will I need a database? So far I have only seen and done indexing with databases.
Your indexing would be built just as you would have done with a database: iterate through all the data you want to index and write it to the index. Use the XmlReader class to parse your XML in a forward-only fashion. Just as with a database, you will need to index some kind of primary key so you know what each search result represents.
A database helps when it comes to looking up the indexed data by primary key. Reading the data for a primary key gets messy if you have to iterate over a 15 GiB XML file on every request.
A database is not required, but it helps a lot. I would build this as an import tool that reads your XML, dumps it into your database, and then uses the "normal" database indexing code you've built before.
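XmlReader is a .NET class. If you are on Java, the StAX XMLStreamReader that ships with the JDK is a comparable forward-only parser. The sketch below assumes a made-up layout in which each record is a doc element with an id attribute; those names, the class XmlRecordReader, and the callback are hypothetical. Each record is handed to the callback as soon as it is complete, so the whole file is never held in memory.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Forward-only XML reader sketch. Assumes records look like <doc id="...">text</doc>
// (a hypothetical layout; adapt the element and attribute names to your file).
class XmlRecordReader {

    // Streams through the XML once and passes (id, text) of every <doc> to 'sink'.
    static void readDocs(Reader xml, BiConsumer<String, String> sink) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            String currentId = null;
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && "doc".equals(reader.getLocalName())) {
                    currentId = reader.getAttributeValue(null, "id"); // the primary key
                    text.setLength(0);
                } else if ((event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA)
                        && currentId != null) {
                    text.append(reader.getText()); // includes text of nested tags
                } else if (event == XMLStreamConstants.END_ELEMENT && "doc".equals(reader.getLocalName())) {
                    if (currentId != null) {
                        sink.accept(currentId, text.toString().trim());
                    }
                    currentId = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<docs><doc id=\"1\">hello <b>world</b></doc><doc id=\"2\">second</doc></docs>";
        // Here the callback only prints; an indexer would write each record to the index instead.
        readDocs(new StringReader(sample), (id, body) -> System.out.println(id + " -> " + body));
    }
}
```

For a real 15 GB file you would pass a buffered file reader instead of the StringReader and write each record to the index (or the database) inside the callback.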
You might like to look at Michael Sokolov's Lux product, which combines Lucene and Saxon:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg84102.html
I haven't used it myself and can't claim to fully understand its capabilities.

Performing fuzzy matching

I need to match names from one table against names in another table.
The source table might contain names such as Tony, Bill, Rob.
The target table might contain names such as Anthony, William, Robert.
Basically, the source table may contain nicknames or short names.
Is there any fuzzy logic/AI tool available in Java/SQL to perform such matches?
I know it can be done using the SQL Server fuzzy logic package, but that package comes with SQL Server Enterprise Edition, and my client doesn't want to upgrade to it.
Is there any other alternative, preferably open source or free of cost?
Search frameworks like Apache Lucene provide fuzzy matching queries. Try Lucene's FuzzyQuery.
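FuzzyQuery matches terms that lie within a small edit distance of the query term (as far as I know, recent Lucene versions cap this at two edits). To get a feel for what that means for nicknames, here is a minimal sketch of plain Levenshtein distance in Java. It does not use Lucene; the class name EditDistance is hypothetical, and Lucene's own measure also counts transpositions, which this sketch does not.

```java
// Plain Levenshtein edit distance (insert, delete, substitute), for illustration only.
class EditDistance {

    // Number of single-character edits needed to turn 'a' into 'b'.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String[][] pairs = {{"tony", "anthony"}, {"bill", "william"}, {"rob", "robert"}, {"jon", "john"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " / " + p[1] + ": " + levenshtein(p[0], p[1]));
        }
    }
}
```

Pairs that are further apart than the permitted number of edits will not match on edit distance alone, so it is worth checking your real name pairs this way before relying on fuzzy queries.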

Querying a Lucene index file

I'm trying to query a Lucene index file through QueryParser. However, I would like to see the format of the index file before querying it. Is there a way to look up the structure of a Lucene index file, much like I can look up the structure of a regular SQL table?
The reason is that I haven't built this index file myself and would like to find my way around it before querying it.
You can use Luke or, programmatically, IndexReader.getFieldNames().
Luke - Lucene Index Toolbox
Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their content in several ways.

How to use Lucene for getting tags for tagcloud?

I'm a Lucene newbie and I have it working where I can create an index from a column of a database table storing tags, and search this index for a particular tag and display the results and their scores. I'm looking at this as an alternative to a MySQL fulltext index which I've heard mixed comments about.
But what I want is to get the most popular tags in the index along with their counts, and then use this data to create a tag cloud.
Does anyone know if and how Lucene can be queried to get the most popular tags in an index and their counts?
Thanks
Mr Morgan.
Very detailed tutorial.
Basically, you get all the terms from the documents, then get the term frequency of each.
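The counting step can be sketched in plain Java as follows. This is only an illustration of "collect the terms, then count them": the class TagCloud and the sample tags are hypothetical, and with a real index you would take the counts from Lucene itself (for example via IndexReader.docFreq(Term)) rather than recounting the raw tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Counts tag occurrences and returns the most frequent ones (hypothetical helper).
class TagCloud {

    // Returns up to 'limit' entries, highest count first; ties are ordered alphabetically.
    static List<Map.Entry<String, Integer>> topTags(List<String> tags, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tag : tags) {
            counts.merge(tag.trim().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        // In practice the tags would come from the database column that was indexed.
        List<String> tags = Arrays.asList("java", "lucene", "Java", "search", "lucene", "java");
        for (Map.Entry<String, Integer> e : topTags(tags, 2)) {
            System.out.println(e.getKey() + " x" + e.getValue());
        }
    }
}
```

The counts can then be mapped to font sizes when rendering the cloud.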

Caching solutions and Querying

Are there any in-memory/caching solutions for Java that allow some form of querying for specific attributes of objects in the cache?
I realize this is something that a full-blown database would be used for, but I want the speed/performance of a cache with the querying ability of a database.
JBoss Cache has search functionality. It's called JBossCacheSearchable. From the site:
This is the integration package between JBoss Cache and Hibernate Search.
The goal is to add search capabilities to JBoss Cache. We achieve this by using Hibernate Search to index user objects as they are added to the cache and modified. The cache is queried by passing in a valid Apache Lucene query which is then used to search through the indexes and retrieve matching objects from the cache.
Main JBoss Cache page: http://www.jboss.org/jbosscache/
JBossCacheSearch: http://www.jboss.org/community/docs/DOC-10286
Nowadays the answer should be Infinispan, the successor of JBoss Cache, which has much improved search technology.
Terracotta or GBeans or POJOCache
At first, HSQLDB came to mind, but that's an in-memory relational database rather than an object database. You might want to look at this list. There are a few object databases there, one of which might meet your needs.
Look at db4o, a rather lightweight Java object database. You can even query the data using regular Java code:
List<Student> students = database.query(new Predicate<Student>() {
    // Called for each stored Student; return true to include it in the result.
    public boolean match(Student student) {
        return student.getAge() < 20
            && student.getGrade().equals(gradeA);
    }
});
(From this article).
Another idea is to use Lucene and a RAMDirectory implementation of Directory to index what you put into your cache. That way, you can query using all the search engine query features which Lucene provides.
In your case, you will probably index the relevant properties of your objects as-is (without using an Analyzer) and query using a boolean equality operator.
Lucene is very lightweight, performant, and thread-safe, and its memory consumption is low.
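To show what "index the relevant properties as-is and query by equality" amounts to, here is a minimal sketch in plain Java. It is not Lucene and does not use RAMDirectory: the class AttributeCache, its methods, and the sample attributes are hypothetical. Each attribute value is stored unanalyzed, and a query is a boolean AND of exact matches.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Cache with an exact-match index over object attributes (hypothetical illustration).
class AttributeCache<V> {
    // attribute name -> attribute value -> cached objects having that value
    private final Map<String, Map<String, Set<V>>> index = new HashMap<>();

    // Store a value together with the attributes it should be findable by.
    void put(V value, Map<String, String> attributes) {
        attributes.forEach((name, attr) ->
            index.computeIfAbsent(name, k -> new HashMap<>())
                 .computeIfAbsent(attr, k -> new LinkedHashSet<>())
                 .add(value));
    }

    // Return the values that match ALL given attribute/value pairs exactly.
    Set<V> query(Map<String, String> criteria) {
        Set<V> result = null;
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            Map<String, Set<V>> byValue = index.getOrDefault(c.getKey(), Collections.emptyMap());
            Set<V> matches = byValue.getOrDefault(c.getValue(), Collections.emptySet());
            if (result == null) {
                result = new LinkedHashSet<>(matches);
            } else {
                result.retainAll(matches);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        AttributeCache<String> cache = new AttributeCache<>();
        Map<String, String> alice = new HashMap<>();
        alice.put("grade", "A");
        alice.put("city", "Oslo");
        Map<String, String> bob = new HashMap<>();
        bob.put("grade", "A");
        bob.put("city", "Rome");
        cache.put("alice", alice);
        cache.put("bob", bob);
        System.out.println(cache.query(Collections.singletonMap("grade", "A")));
        System.out.println(cache.query(alice));
    }
}
```

Going through Lucene instead of a hand-rolled map like this gives you range, prefix, and full-text queries on top of exact matches.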
You might want to check out this library:
http://casperdatasets.googlecode.com
This is a dataset technology. It supports tabular data (either from a database or constructed in code), and you can then construct queries and filters against the dataset (and sort it), all in memory. It's fast and easy to use. Most importantly, you can perform queries against any column or attribute of the dataset.
