Can lucene only sort and search for nothing?

I want to list the latest 10 rows, ordered by id DESC:
Sort sort = new Sort(new SortField[]{new SortField("id", SortField.INT, true)});
TopDocs topDocs = indexSearch.search(null, null, 10, sort); // no Query needed, only the sort
...
I got a 500 exception because the Query parameter is null.
What is the best way to implement this?
BTW: the id field is a NumericField, written using:
new NumericField("id", Integer.MAX_VALUE, Field.Store.YES, true)

You should use MatchAllDocsQuery for that: pass new MatchAllDocsQuery() as the first argument to search() instead of null.
A Lucene Query is a peculiar object: it is not only the specification of the query semantics, but also the implementation of the most efficient execution strategy for that particular query type. That is why there must be a dedicated Query even for this "no-op".

BTW: if you want to search the latest X rows, it is better to add a date field holding the time the document was added to the index than to rely on a counter (id in your case).
Think about what happens if you update an existing document or reach Integer.MAX_VALUE.
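To make the Integer.MAX_VALUE concern concrete, here is a minimal sketch using only the JDK. The class and method names are hypothetical and not part of Lucene; the point is only that a plain int counter wraps around silently, while a long timestamp taken at indexing time does not have that problem in practice.

```java
import java.time.Instant;

final class IdOverflowDemo {
    /** Next id with plain int arithmetic: wraps silently past Integer.MAX_VALUE. */
    static int nextId(int current) {
        return current + 1;
    }

    /** Next id with overflow detection: throws ArithmeticException instead of wrapping. */
    static int nextIdChecked(int current) {
        return Math.addExact(current, 1);
    }

    /** A value for a hypothetical "addedAt" field: epoch milliseconds as a long. */
    static long addedAtNow() {
        return Instant.now().toEpochMilli();
    }

    public static void main(String[] args) {
        // The counter wraps to the smallest int, so a descending sort would rank it last.
        System.out.println(nextId(Integer.MAX_VALUE)); // prints -2147483648
        System.out.println(addedAtNow());
    }
}
```

With a wrapped id, a descending sort on id would put the newest document at the very end, which is exactly the failure the answer warns about.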

Related

Hibernate criteria and comparator

I'm using the Hibernate template and its findByCriteria(criteria, offset, maxResults) method to get paginated results.
To get the results ordered, I set the OrderBy property on the criteria before calling findByCriteria. The problem is that I want to order this column not as a plain string but in an alphanumeric way, taking into account that it may contain numbers:
entity 2
entity 19
entity 22
not like this:
entity 19
entity 2
entity 22
To do this I'm using a comparator, which works fine with Collections.sort. But I need a way to bind it to the criteria so that the results are already ordered when findByCriteria returns. Is there a way to accomplish this?
Thanks!

I think what you need is to sort the elements in the criteria query itself. Basically, you let the SQL query (built internally by Hibernate when you specify your order criteria) sort the elements, instead of sorting them after they have been retrieved. You will end up with a faster, cleaner and more straightforward solution.
Remember that you fetch only a chunk of data each time, so in memory you cannot sort the whole set, only the chunk you already have plus the previous ones; what arrives in the next chunks cannot be foreseen.
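For reference, the kind of in-memory comparator the question describes could look like the sketch below. NaturalOrder and its methods are hypothetical names, and this is only one way to write such a comparator. It runs in the JVM, so it can order a page that has already been fetched but cannot be handed to the database as an ORDER BY, which is the limitation the answer points out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NaturalOrder {
    /** Compares character by character, but compares runs of ASCII digits by numeric value. */
    static final Comparator<String> COMPARATOR = NaturalOrder::compare;

    static int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int endA = digitRunEnd(a, i), endB = digitRunEnd(b, j);
                int cmp = compareDigitRuns(a.substring(i, endA), b.substring(j, endB));
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The string with characters left over sorts after its prefix.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(COMPARATOR);
        return copy;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static int digitRunEnd(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) end++;
        return end;
    }

    /** Longer digit runs are larger once leading zeros are dropped; equal lengths compare textually. */
    private static int compareDigitRuns(String x, String y) {
        x = stripLeadingZeros(x);
        y = stripLeadingZeros(y);
        if (x.length() != y.length()) return Integer.compare(x.length(), y.length());
        return x.compareTo(y);
    }

    private static String stripLeadingZeros(String s) {
        int k = 0;
        while (k < s.length() - 1 && s.charAt(k) == '0') k++;
        return s.substring(k);
    }
}
```

Calling NaturalOrder.sorted on the list "entity 19", "entity 2", "entity 22" yields "entity 2", "entity 19", "entity 22". Applied page by page, however, it only orders rows within each page, so the pages themselves still arrive in the database's order.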

Sort the values by Date in mongodb

I am new to MongoDB and I am trying to sort all my rows by date. I have records from mixed sources and I am trying to sort each source separately. For some records I did not set dateCreated when writing to the database; later I noticed this and added dateCreated to my records. Say I have a total of 4000 records: the first 1000 have no dateCreated, and the latest 3000 have that field. I am trying to get the most recently updated records using the dateCreated field. Here is my code:
db.person.find({"source":"Naukri"}&{dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
This code returns some results (from those 1000 records) in which I cannot see the dateCreated field at all. Moreover, if I change the sort to {dateCreated: -1}, I get results from some other source, not Naukri.
So I need help with these cases:
How do I sort by dateCreated to get the most recently updated records, filtered by source as well?
I am using the Java API to get the records from Mongo. I would be grateful if someone could show me how to run the same query from Java as well.
I hope my question is clear. Thanks in advance.

In the documentation, which you will read (and you will, won't you? Nod yes), you will find that the first argument to the find command you are using is what is called a query document. In this document you specify a list of fields and conditions, comma separated, which is the equivalent of an AND condition in a declarative syntax such as SQL.
The problem with your query is that it was not valid and did not match what you intended. The correct syntax would be as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}})
.sort({dateCreated: -1})
.limit(10)
So now this will filter by the value provided for "source" and by whether the "dateCreated" field exists, meaning the field is present in the document (even if its value is null).
I recommend looking at the links below. The first two are concerned with structuring MongoDB queries and with the find method and its arguments. All of this functionality carries over to every language implementation.
As for the Java API, there are different approaches depending on what you are comfortable with. The API provides a BasicDBObject class, which is more or less equivalent to the JSON document notation and works much like a hash map. For something closer to the shell methods and to the style of the dynamic languages, there is the QueryBuilder class, for which the last two links give examples and information. It allows chaining, which makes your query more readable.
There are many examples on Stack Overflow alone. I suggest you take a look.
http://docs.mongodb.org/manual/tutorial/query-documents/
http://docs.mongodb.org/manual/reference/method/db.collection.find/
How to do this MongoDB query using java?
http://api.mongodb.org/java/2.2/com/mongodb/QueryBuilder.html

Your query is not correct. Update it as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
In Java, you can do it as follows:
Mongo mongo = ...
DB db = mongo.getDB("yourDbName");
DBCollection coll = db.getCollection("person");
// Both conditions in one query document are combined with a logical AND
DBObject query = new BasicDBObject();
query.put("source", "Naukri");
query.put("dateCreated", new BasicDBObject("$exists", true));
// 1 sorts ascending (oldest first); use -1 to get the newest records first
DBCursor cur = coll.find(query).sort(new BasicDBObject("dateCreated", 1)).limit(10);
while (cur.hasNext()) {
    DBObject obj = cur.next();
    // Get data from the result object here
}

How to return max of identity column via Hibernate accessing Oracle DB?

Since it seems that HQL (createQuery) doesn't support scalar queries, I used a raw query via createSQLQuery:
session.beginTransaction();
Query query = session.createSQLQuery("select nvl(max(id), 0) + 1 from empty_table_for_now");
session.getTransaction().commit();
List listL = query.list();
Iterator iterator = listL.iterator();
while (iterator.hasNext()) {
LOG.info("Infinite loop. This is disgusting!");
}
The query itself doesn't fail, but it seems to be returning a list with an infinite number of results? This doesn't make sense... Any idea why I am getting this?
Better yet, is there a more elegant way of getting the intended scalar, without confusing whoever maintains my code with a "list" (that should always contain exactly one element)?

If you don't advance the iterator, the loop is definitely infinite (as long as there is at least one element).
Advance the iterator with Iterator.next():
while (iterator.hasNext()) {
    Object nextElement = iterator.next();
    LOG.info("Next element is: " + nextElement);
}
Maybe you had ResultSet.next() in mind when writing the code, but an iterator is different: hasNext() only checks for a next element and never moves forward. Take a look at the Javadocs.
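On the second part of the question (reading the single scalar without a loop), one option is a small helper that insists on exactly one row. The sketch below uses only the JDK: ScalarResult and singleLong are hypothetical names, and the rows argument stands in for whatever query.list() returned. It accepts any Number because, depending on the driver and dialect, a numeric result often arrives as a BigDecimal rather than a Long. Hibernate's Query also has a uniqueResult() method that serves the same purpose.

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.List;

final class ScalarResult {
    /** Returns the single numeric value in rows, or fails loudly if the shape is unexpected. */
    static long singleLong(List<?> rows) {
        if (rows.size() != 1) {
            throw new IllegalStateException("Expected exactly one row but got " + rows.size());
        }
        Object value = rows.get(0);
        if (!(value instanceof Number)) {
            throw new IllegalStateException("Expected a numeric value but got " + value);
        }
        return ((Number) value).longValue();
    }

    public static void main(String[] args) {
        // Stand-in for the result of query.list() on an empty table: nvl(max(id), 0) + 1 = 1
        List<Object> rows = Collections.<Object>singletonList(new BigDecimal("1"));
        System.out.println(singleLong(rows)); // prints 1
    }
}
```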

When using an Oracle DB, I recommend the use of sequences.
To get the sequence to work, define the column and the sequence name in the mapping file, and Hibernate will take care of fetching the next sequence value.
FYI:
SQL Server - use IDENTITY for auto-increment
MySQL/MariaDB - use AUTO_INCREMENT
PostgreSQL - use sequences or alternatively ..
PostgreSQL Autoincrement
Alert: finding the maximum value in a column and incrementing it is not an acceptable design, because two concurrent transactions can read the same maximum and generate the same id.

Identify existence of keywords in document from list

I want to create a tag list for a Lucene document based on a pre-determined list.
So, if we have a document with the text
Looking for a Java programmer with experience in Lucene
and we have the keyword list (about 1000 items)
java, php, lucene, c# [...]
I want to identify that the keywords Java and Lucene exist in the document.
Just doing a java OR php OR lucene will not work because then I will not know which keyword generated the hit.
Any suggestions on how to implement this in Lucene?

I assume that you have one or more indexed fields and that you want to build your tag list from the intersection of your keywords and the indexed terms of a document.
Your problem is very similar to highlighting, so the same ideas apply. You can either:
re-analyze the stored fields of your Lucene document, or
use term vectors for fast access to the terms of your documents.
Note that if you want to use term vectors, you need to enable them at indexing time (see the Field.TermVector.YES documentation and the Field constructor).
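Either option comes down to the same set operation: obtain the terms of one document and intersect them with the keyword list. Here is a rough sketch with plain JDK collections. KeywordTagger and its methods are hypothetical, and the lowercasing whitespace tokenizer is only a stand-in for a real Lucene Analyzer (or for the terms read from a term vector), so its token boundaries will differ from what Lucene produces.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class KeywordTagger {
    /** Lowercases the text and splits it on whitespace, trimming surrounding punctuation. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Keep a trailing '#' or '+' so that tokens such as "c#" and "c++" survive.
            String token = raw.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}#+]+$", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns the keywords (expected in lowercase) that occur as tokens in the text. */
    static Set<String> matchingKeywords(String text, Set<String> keywords) {
        Set<String> found = tokenize(text);
        found.retainAll(keywords);
        return found;
    }

    public static void main(String[] args) {
        Set<String> keywords = new LinkedHashSet<>(Arrays.asList("java", "php", "lucene", "c#"));
        String text = "Looking for a Java programmer with experience in Lucene";
        System.out.println(matchingKeywords(text, keywords)); // prints [java, lucene]
    }
}
```

Because the result is a set of the keywords that actually occurred, it answers the "which keyword generated the hit" question that a single OR query cannot.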

Yes, this works:
FullTextSession fts = Search.getFullTextSession(getSessionFactory().getCurrentSession());
// Look up the Lucene document id of the entity with the given id
Query q = fts.getSearchFactory().buildQueryBuilder()
    .forEntity(Offer.class).get()
    .keyword()
    .onField("id")
    .matching(myId)
    .createQuery();
Object[] dId = (Object[]) fts.createFullTextQuery(q, Offer.class)
    .setProjection(ProjectionConstants.DOCUMENT_ID)
    .uniqueResult();
if (dId != null) {
    // Read the term vector of the "description" field for that document
    IndexReader indexReader = fts.getSearchFactory().getIndexReaderAccessor().open(Offer.class);
    TermFreqVector freq = indexReader.getTermFreqVector((Integer) dId[0], "description");
}
You have to remember to index the field with TermVector.YES in the Hibernate Search annotation for that field.

Reverse search in Hibernate Search

I'm using Hibernate Search (which uses Lucene) to search some data I have indexed in a directory. It works fine, but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database, and each time a Data object is created I need to check which of these queries match it. I need this to alert users when a Data object matches a query they have created. So I need to index the single Data object that has just been created and see which queries in my list return this object as a result.
I've seen the Lucene MemoryIndex class, which creates an index in memory, so I could do something like this example for every query in the list (though iterating over a Java list of queries would not be very efficient):
//Iterating over my list<Query>
MemoryIndex index = new MemoryIndex();
//Add all fields
index.addField("myField", "myFieldData", analyzer);
...
QueryParser parser = new QueryParser("myField", analyzer);
float score = index.search(query);
if (score > 0.0f) {
System.out.println("it's a match");
} else {
System.out.println("no match found");
}
The problem here is that this Data class has several Hibernate Search annotations (@Field, @IndexedEmbedded, ...) which indicate how its fields should be indexed, so when I invoke the index() method on the FullTextEntityManager instance, it uses this information to index the object in the directory. Is there a similar way to index it in memory using this information?
Is there a more efficient way of doing this reverse search?

Just index the new object (if you use automatic indexing you don't have to do anything besides committing the current transaction), then retrieve the queries you want to run and run each of them in a boolean query that combines the stored query with the id of the new object. Something like this:
...
BooleanQuery query = new BooleanQuery();
query.add(storedQuery, BooleanClause.Occur.MUST);
// TermQuery takes a Term; this assumes id is the String form of the new object's id
query.add(new TermQuery(new Term(ProjectionConstants.ID, id)), BooleanClause.Occur.MUST);
...
If you get a result, you know the query matched.

Since MemoryIndex is a completely separate component that doesn't extend or implement Lucene's Directory or IndexReader, I don't think there's a way to plug it into the Hibernate Search annotations. I'm guessing that if you choose to use MemoryIndex, you'll need to write addField() calls that basically mirror what you're doing in the annotations.
How many queries are we talking about here? Depending on how many there are, you might be able to get away with just running the queries against the main index that Hibernate maintains, making sure to constrain the search to the document ID you just added. Or, for every document that's added, create a one-document in-memory index using RAMDirectory and run the queries against that.
