Sort the values by Date in mongodb - java

I am new to MongoDB and I am trying to sort all my rows by date. I have records from mixed sources and I am trying to sort them separately. For some records I didn't set dateCreated while writing to the db. Later I found this and added dateCreated to all my records in the db. Say I have a total of 4000 records: the first 1000 don't have dateCreated, and the latest 3000 have that column. I am trying to get the last updated record using the dateCreated column. Here is my code:
db.person.find({"source":"Naukri"}&{dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
This code returns some results (from those 1000 records) where I can't see the dateCreated column at all. Moreover, if I change the sort to {dateCreated: -1}, I get results from some other source, not Naukri.
So I need help with these cases:
How do I sort by dateCreated to get the latest updated record, filtered by source as well?
I am using the Java API to get the records from Mongo. I'd be grateful if someone could also show me how to run the same query from Java.
Hope my question is clear. Thanks in advance.

From the documentation you will (and you will, won't you - nod yes) read, you will find that the first argument to the find command you are using is what is called a query document. In this document you are specifying a list of fields and conditions, "comma" separated, which is the equivalent of an and condition in declarative syntax such as SQL.
The problem with your query is that it was not valid, so it did not apply the conditions you intended. The correct syntax would be as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}})
.sort({dateCreated: -1})
.limit(10)
So now this will filter by the value provided for "source" and keep only documents where the "dateCreated" field exists. Note that $exists: true only checks that the field is present; it also matches documents where the field's value is null.
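To make the semantics of the corrected query concrete, here is a minimal plain-Java sketch that emulates it over in-memory maps. It does not use the MongoDB driver at all; the class name LatestBySourceSketch, the methods latest and doc, and the sample fields are made up for illustration, and it assumes that dateCreated, when present, holds a non-null java.util.Date.

```java
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory emulation of:
//   find({source: <source>, dateCreated: {$exists: true}})
//     .sort({dateCreated: -1}).limit(n)
class LatestBySourceSketch {

    static List<Map<String, Object>> latest(List<Map<String, Object>> docs,
                                            String source, int n) {
        return docs.stream()
                // Comma-separated fields in a query document act as AND.
                .filter(d -> source.equals(d.get("source")))
                // Like $exists: true, this only checks that the key is present.
                .filter(d -> d.containsKey("dateCreated"))
                // Newest first, like sort({dateCreated: -1}).
                // Assumption: the stored value is a non-null Date.
                .sorted(Comparator.comparing(
                        (Map<String, Object> d) -> (Date) d.get("dateCreated")).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    // Builds a sample document; dateCreated is left out when millis is null.
    static Map<String, Object> doc(String name, String source, Long millis) {
        Map<String, Object> d = new HashMap<>();
        d.put("name", name);
        d.put("source", source);
        if (millis != null) {
            d.put("dateCreated", new Date(millis));
        }
        return d;
    }
}
```

Documents from other sources and documents without dateCreated are dropped before sorting, which is why the limit only ever sees matching records.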
I recommend looking at the links below; the first two cover structuring MongoDB queries and the find method and its arguments. All of this functionality translates to every language implementation.
As for the Java API, there are different approaches depending on which you are comfortable with. The API provides a BasicDBObject class, which is more or less equivalent to the JSON document notation and works rather like a hashmap. For something closer to the shell methods and to the style of dynamic languages, there is the QueryBuilder class, which the last two links give examples of and information on. It allows chaining to make your query more readable.
There are many examples on Stack Overflow alone. I suggest you take a look.
http://docs.mongodb.org/manual/tutorial/query-documents/
http://docs.mongodb.org/manual/reference/method/db.collection.find/
How to do this MongoDB query using java?
http://api.mongodb.org/java/2.2/com/mongodb/QueryBuilder.html

Your query is not correct. Update it as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
In Java, you can do it as follows :
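import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

Mongo mongo = ...
DB db = mongo.getDB("yourDbName");
DBCollection coll = db.getCollection("person");
// Filter: source is "Naukri" AND the dateCreated field exists
DBObject query = new BasicDBObject();
query.put("source", "Naukri");
query.put("dateCreated", new BasicDBObject("$exists", true));
// Ascending sort as in the shell query above; use -1 for newest first
DBCursor cur = coll.find(query).sort(new BasicDBObject("dateCreated", 1)).limit(10);
while (cur.hasNext()) {
    DBObject obj = cur.next();
    // Get data from the result object here
}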

Related

Parse sql query using antlr parsetree to mongo bson document in Java

I have a SQL like query example:
Select id,name from employee where age > 30 and department = 'IT' limit 200
The SQL query grammar is defined in an ANTLR4 grammar file. Is there any implementation that converts the parse tree of this query to a bson document?
The bson document will then be used to query a mongo db.
In one of my previous jobs I did something similar: got a query (not an sql, but pretty similar) and translated it to mongo query with antlr.
I don't have code to share; however, I can share my thoughts:
Mongo is not SQL compliant, so you can't just take a SQL grammar. What about JOINs and all the relational algebra? What about aggregations, which are pretty tricky in mongo with its aggregation framework? In the opposite direction, how do you write SQL that gets translated to an "exists" clause in mongo? There are many things like this, some small, some huge, but the bottom line is that you must be talking about some kind of subset of SQL, some DSL that is allowed to be used as a query language and looks "like" SQL because people are used to SQL.
With that in mind, you should create your own grammar and Antlr will generate a lexer/parser for you. You'll also get a syntax check of the query at runtime for free: Antlr won't be able to parse the query if it's not in a correct format, because some grammar rule will fail. This is another reason not to take SQL "as is".
So far so good, you've created your own listener / visitor. In my case I've opted for creating an object representation of the query with internal state and everything.
So the query
Select id,name
from employee
where age > 30
and department = 'IT'
limit 200
Was translated to objects of type:
class Query {
    private SelectClause select;
    private FromClause from;
    private WhereClause where;
    private Limit limit;
}
class SelectClause {
    private List<String> fields;
}
...
class WhereClause {
    Condition root;
}
interface Condition {
    ...
}
class AndCondition implements Condition { // the same for Not, Or
}
For this particular query it's something like:
Query q = new Query(
    new SelectClause(Arrays.asList("id", "name")),
    new FromClause("employee"),
    new WhereClause(new AndCondition(
        new SimpleLeafCondition("age", Operators.GT, 30),
        new SimpleLeafCondition("department", Operators.EQ, "IT"))),
    new Limit(200));
Then it's possible to make some optimizations in the query (like embedding where clauses if you need to or, for example, manipulating the "From" part if you're working in a multi-tenant environment and have different collections for different tenants).
After all that, you can use the "interpreter" design pattern: recursively walk the query objects and "translate" them to a valid mongo query.
I remember that this step took me something like 1 day to accomplish (it was 7 years ago, with mongo 2 I guess, but still), given the correct structure of objects representing the query, so it should not be that complicated. I'm bringing this up because it looks like it's your primary concern in the question.
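As a rough illustration of that interpreter step, here is a minimal sketch in which every condition node translates itself recursively. It is not the code from that project: all names (QueryInterpreterSketch, toMongo, the Operators values) are hypothetical, only AND and simple comparisons are covered, and the Mongo filter document is modelled with plain nested Maps rather than driver classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; a Mongo filter document is modelled as nested Maps.
class QueryInterpreterSketch {

    enum Operators { EQ, GT, LT }

    // Every node of the condition tree knows how to translate itself.
    interface Condition {
        Map<String, Object> toMongo();
    }

    static class SimpleLeafCondition implements Condition {
        private final String field;
        private final Operators op;
        private final Object value;

        SimpleLeafCondition(String field, Operators op, Object value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }

        @Override
        public Map<String, Object> toMongo() {
            Map<String, Object> doc = new LinkedHashMap<>();
            if (op == Operators.EQ) {
                doc.put(field, value);                  // {field: value}
            } else {
                String mongoOp = (op == Operators.GT) ? "$gt" : "$lt";
                Map<String, Object> comparison = new LinkedHashMap<>();
                comparison.put(mongoOp, value);
                doc.put(field, comparison);             // {field: {$gt: value}}
            }
            return doc;
        }
    }

    static class AndCondition implements Condition {
        private final List<Condition> children;

        AndCondition(Condition... children) {
            this.children = Arrays.asList(children);
        }

        @Override
        public Map<String, Object> toMongo() {
            List<Map<String, Object>> parts = new ArrayList<>();
            for (Condition child : children) {
                parts.add(child.toMongo());             // recursive step
            }
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("$and", parts);                     // {$and: [child, child, ...]}
            return doc;
        }
    }
}
```

Or and Not nodes would follow the same shape as AndCondition. With a real driver, the same recursion could produce the driver's own filter objects instead of Maps.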

Bulk read Couchbase documents

I want to asynchronously read a number of documents from a Couchbase bucket. This is my code:
JsonDocument student = bucketStudent.get(studentID);
The problem is for a large data file with a lot of studentIDs, it would take a long time to get all documents for these studentIDs because the get() method is called for each studentID. Is it possible to have a list of studentIDs as input and return an output of a list of students instead of getting a single document for each studentID?
If you are running a query node, you can use N1QL for this. Your query would look like this:
SELECT * FROM myBucket USE KEYS ["key1", "key2", "key3"]
In practice you would probably pass in the array of strings as a parameter, like this:
SELECT * FROM myBucket USE KEYS ?
You will need a primary index for your bucket, or queries like this won't work.
AFAIK couchbase SDK does not have a native function for a bulk get operation.
The node.js SDK has a getMulti method, but it's basically an iteration over an array and then get() is fired for each element.
I've found in my applications that the key-value approach is still faster than the SELECT * on a primary index but the N1QL query is remarkably close (on couchbase 5.x).
Just a quick tip: if you have A LOT of ids to fetch and you decide to go with the N1QL queries, try to split that list into smaller chunks. It speeds up the queries, and you can manage your errors better and avoid some nasty timeouts.
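The chunking tip can be sketched in plain Java as follows. The class and method names (KeyChunker, chunk) are made up for illustration, and a sensible chunk size is something you would have to find by measuring against your own cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a long list of document IDs into fixed-size batches.
class KeyChunker {

    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk would then be passed as the parameter of one USE KEYS query, so a failure or timeout only affects that batch.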
Retrieving multiple documents using the document IDs is not supported by default in the Couchbase Java SDK. To achieve that you'll need to use a N1QL query as below
SELECT S.* FROM Student S USE KEYS ["StudentID1", "StudentID2", "StudentID3"]
which would return an array of Documents with the given IDs. Construct the query with com.couchbase.client.java.query.N1qlQuery, and execute it with either of the options below.
If you're using Spring's CouchbaseTemplate, you can use the below
List<T> findByN1QL(com.couchbase.client.java.query.N1qlQuery n1ql,
Class<T> entityClass)
If you're using Couchbase's Java SDK, you can use the below
N1qlQueryResult query(N1qlQuery query)
Note that you'll need an index on your bucket to run N1QL queries.
It is possible now. The Java SDK provides the capability to do a multi-get. It comes in 2 flavours:
Async bulk get
N1QL query. Does not work with binary documents
The Couchbase documentation suggests using async bulk get, but that has an additional dependency on the reactive Java client. You can see the official documentation here.
There are several tutorials explaining the usage. link .
Here is how a sample get would look with Java SDK 3.2:
List<String> docsToFetch = Arrays.asList("airline_112", "airline_1191", "airline_1203");
Map<String, GetResult> successfulResults = new ConcurrentHashMap<>();
Map<String, Throwable> erroredResults = new ConcurrentHashMap<>();
Flux.fromIterable(docsToFetch)
    .flatMap(key -> reactiveCollection.get(key)
        .onErrorResume(e -> {
            erroredResults.put(key, e);
            return Mono.empty();
        })
        .doOnNext(getResult -> successfulResults.put(key, getResult)))
    .last().block();
source .
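For readers who want to see the same "fan out, collect successes and errors separately" idea without the reactive dependency, here is a rough JDK-only analogue using CompletableFuture. It is not Couchbase SDK code: BulkGetSketch, fetchAll and the fetchOne function are hypothetical stand-ins for a per-key get, and the sketch assumes fetchOne never returns null.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical JDK-only analogue of the reactive bulk get shown above.
class BulkGetSketch {

    // Pass thread-safe maps (e.g. ConcurrentHashMap); they are filled from worker threads.
    static <V> void fetchAll(List<String> keys,
                             Function<String, V> fetchOne,
                             Map<String, V> successes,
                             Map<String, Throwable> failures) {
        List<CompletableFuture<Void>> futures = keys.stream()
                .map(key -> CompletableFuture
                        .supplyAsync(() -> fetchOne.apply(key))   // one async get per key
                        .handle((value, error) -> {
                            if (error != null) {
                                // Unwrap the CompletionException wrapper if there is one.
                                Throwable cause = (error instanceof CompletionException
                                        && error.getCause() != null) ? error.getCause() : error;
                                failures.put(key, cause);
                            } else {
                                successes.put(key, value);
                            }
                            return (Void) null;
                        }))
                .collect(Collectors.toList());
        // Block until every per-key task has finished, successfully or not.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }
}
```

As in the reactive version, a failed key is recorded rather than aborting the whole batch.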

Using Apache Solr's boost query function with Spring in Java

I'm writing a Java application that is using Apache Solr to index and search through a list of articles. A requirement I am dealing with is that when a user searches for something, we are supplying a list of recommended related search terms, and the user has the option to include those extra terms in their search. The problem I'm having, however, is that we want the user's original search term to be prioritized, and results that match that should appear before results that only match related terms.
My research suggests that Solr's boost function is the solution for this, but I'm having some trouble getting it to work with Spring. The code all runs fine and I get my search results as expected, but the boost function doesn't seem to actually be re-ordering my searches at all. For example, I'm trying to do something like this:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").contains("A").boost((float) 2);
Criteria extraCriteria = Criteria.where("title").contains("B").boost((float) 1);
query.addCriteria(searchCriteria.or(extraCriteria));
In this example I would be searching for any document whose title contains "A" or "B", but I want to boost results that match "A" to the top of the list.
I've also tried using the Extended DisMax Query Parser with a different syntax to achieve the same result, with similar lack of success. To follow the same example pattern, I'm trying to use the expression criteria as follows:
Query query = new SimpleQuery();
Criteria searchCriteria = Criteria.where("title").expression("A^2.0 OR B^1.0");
query.setDefType("edismax");
query.addCriteria(searchCriteria);
Again I would expect this to return documents with titles matching "A" or "B" but boost results matching "A", and again it simply doesn't seem to actually affect the ordering of my results at all.
Okay, I figured out the problem here. Elsewhere in the code someone else had added this snippet:
query.setPageRequest(pageable);
This was done to support pagination of the search results, but the pageable object ALSO contained some sort orders, which look like they got added to the query as part of the .setPageRequest method. Something to look out for in the future: it looks like sorts override boosting when working with Spring Solr queries in this scenario.

Search DB entries for a match when table has eight columns

I have to work with a POJO "Order" that has 8 fields, and each of these fields is a column in the "order" table. The DB schema is denormalized (and worse, deemed final and unchangeable), so now I have to write a search module that can execute a search with any combination of the above 8 fields.
Are there any approaches on how to do this? Right now I get the input in a new POJO and go through eight IF statements looking for values that are not NULL. Each time I find such a value I add it to the WHERE condition in my SELECT statement.
Is this the best I can hope for? Is it arguably better to select on some minimum of criteria and then iterate over the received collection in memory, only keeping the entries that match the remaining criteria? I can provide pseudo code if that would be useful. Working on Java 1.7, JSF 2.2 and MySQL.
Each time I find such a value I add it to the WHERE condition in my SELECT statement.
This is a prime target for SQL injection attacks!
Would something like the following work with MySql?
SELECT *
FROM SomeTable
WHERE (@param1 IS NULL OR SomeTable.SomeColumn1 = @param1) AND
(@param2 IS NULL OR SomeTable.SomeColumn2 = @param2) AND
(@param3 IS NULL OR SomeTable.SomeColumn3 = @param3) AND
/* .... */
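If you prefer to keep building the WHERE clause in Java, the injection risk can be avoided by appending only ? placeholders and binding the values separately. The following is a minimal sketch under assumptions: the class name, the buildSelect method and the three column names are invented for illustration (the real table has eight columns), and column names are checked against a fixed list so that only values ever come from user input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical builder for a parameterized search over the "order" table.
class OrderSearchSqlSketch {

    // Assumed column names; fixed in code and never taken from user input.
    private static final Set<String> SEARCHABLE_COLUMNS =
            new HashSet<>(Arrays.asList("customer", "status", "city"));

    // Returns the SQL text and appends the values to bind, in order, to paramsOut.
    static String buildSelect(Map<String, Object> criteria, List<Object> paramsOut) {
        // `order` is a reserved word in MySQL, hence the backticks.
        StringBuilder sql = new StringBuilder("SELECT * FROM `order`");
        String separator = " WHERE ";
        for (Map.Entry<String, Object> entry : criteria.entrySet()) {
            if (entry.getValue() == null) {
                continue;                       // field not filled in: no condition
            }
            if (!SEARCHABLE_COLUMNS.contains(entry.getKey())) {
                throw new IllegalArgumentException("Unknown column: " + entry.getKey());
            }
            sql.append(separator).append(entry.getKey()).append(" = ?");
            paramsOut.add(entry.getValue());    // bound later, never concatenated
            separator = " AND ";
        }
        return sql.toString();
    }
}
```

The returned string would go to Connection.prepareStatement, and each collected value would be bound with PreparedStatement.setObject(i + 1, value) before executing.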

Reverse search in Hibernate Search

I'm using Hibernate Search (which uses Lucene) to search some data I have indexed in a directory. It works fine, but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database, and each time a Data object is created I need to check which of these queries match it. I need this to alert the user when a Data object matches a query he has created. So I need to index the single Data object that has just been created and see which queries in my list return this object as a result.
I've seen Lucene's MemoryIndex class, which creates an index in memory, so I could do something like this example for every query in a list (though iterating over a Java list of queries would not be very efficient):
//Iterating over my list<Query>
MemoryIndex index = new MemoryIndex();
//Add all fields
index.addField("myField", "myFieldData", analyzer);
...
QueryParser parser = new QueryParser("myField", analyzer);
float score = index.search(query);
if (score > 0.0f) {
System.out.println("it's a match");
} else {
System.out.println("no match found");
}
The problem here is that this Data class has several Hibernate Search annotations (@Field, @IndexedEmbedded, ...) which indicate how fields should be indexed, so when I invoke the index() method on the FullTextEntityManager instance it uses this information to index the object in the directory. Is there a similar way to index it in memory using this information?
Is there a more efficient way of doing this reverse search?
Just index the new object (if you use automatic indexing you don't have to do anything besides committing the current transaction), then retrieve the queries you want to run and run all of them in a boolean query, combining the stored query with the id of the new object. Something like this:
...
BooleanQuery query = new BooleanQuery();
query.add(storedQuery, BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term(ProjectionConstants.ID, id)), BooleanClause.Occur.MUST);
...
If you get a result you know the query matched.
Since MemoryIndex is a completely separate component that doesn't extend or implement Lucene's Directory or IndexReader, I don't think there's a way you can plug this into Hibernate Search annotations. I'm guessing that if you choose to use MemoryIndex, you'll need to write your own addField() calls, which basically mirror what you're doing in the annotations.
How many queries are we talking about here? Depending on how many there are you might be able to get away with just running the queries on the main index that Hibernate maintains, ensuring to constrain the search to the document ID you just added. Or for every document that's added, create a one-document in-memory index using RAMDirectory and run the queries through that.
