Bulk read Couchbase documents - java

I want to asynchronously read a number of documents from a Couchbase bucket. This is my code:
JsonDocument student = bucketStudent.get(studentID);
The problem is for a large data file with a lot of studentIDs, it would take a long time to get all documents for these studentIDs because the get() method is called for each studentID. Is it possible to have a list of studentIDs as input and return an output of a list of students instead of getting a single document for each studentID?

If you are running a query node, you can use N1QL for this. Your query would look like this:
SELECT * FROM myBucket USE KEYS ["key1", "key2", "key3"]
In practice you would probably pass in the array of strings as a parameter, like this:
SELECT * FROM myBucket USE KEYS ?
You will need a primary index for your bucket, or queries like this won't work.

AFAIK couchbase SDK does not have a native function for a bulk get operation.
The node.js SDK has a getMulti method, but it's basically an iteration over an array and then get() is fired for each element.
I've found in my applications that the key-value approach is still faster than the SELECT * on a primary index but the N1QL query is remarkably close (on couchbase 5.x).
Just a quick tip: if you have A LOT of ids to fetch and you decide to go with the N1QL queries, try to split that list in smaller chunks. It's speeds up the queries and you can actually manage your errors better and avoid getting some nasty timeouts.

Retrieving multiple documents using the document IDs is not supported by default in the Couchbase Java SDK. To achieve that you'll need to use a N1QL query as below
SELECT S.* FROM Student S USE KEYS ["StudentID1", "StudentID2", "StudentID3"]
which would return an array of Documents with the given IDs. Construct the query with com.couchbase.client.java.query.N1qlQuery, and use either of below to execute
If you're using Spring's CouchbaseTemplate, you can use the below
List<T> findByN1QL(com.couchbase.client.java.query.N1qlQuery n1ql,
Class<T> entityClass)
If you're using Couchbase's Java SDK, you can use the below
N1qlQueryResult query(N1qlQuery query)
Note that you'll need an index on your bucket to run N1QL queries.

It is possible now. The java sdk gives capability to do multi get. It is present in 2 flavours
Async bulk get
N1Q1 query. Does not work with binary documents
The Couchbase document suggests to use Async bulk get, But that has additional dependency on reactive java client. you can see official documentation here.
There are several tutorial explaining the usage. link .
Here is how a sample get would look like, with java sdk 3.2
List<String> docsToFetch = Arrays.asList("airline_112", "airline_1191", "airline_1203");
Map<String, GetResult> successfulResults = new ConcurrentHashMap<>();
Map<String, Throwable> erroredResults = new ConcurrentHashMap<>();
Flux.fromIterable(docsToFetch).flatMap(key -> reactiveCollection.get(key).onErrorResume(e -> {
erroredResults.put(key, e);
return Mono.empty();
}).doOnNext(getResult -> successfulResults.put(key, getResult))).last().block();
source .

Related

JOOQ multiple select count in one connection with PostgreSQL

I have a table SUBSCRIPTION, I want to run multiple selectCount written with JOOQ in one connection with different predicates to the database.
To do so, I have created a list of queries:
List<Query> countQueries = channels.stream().map(c ->
selectCount().from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.equal(c))
).collect(toList());
And finally, I have launched this list of queries using batch:
using(configuration).batch(countQueries).execute();
I have expected to have the results of the above queries in the return values of execute, but I get an array of integer filled with 0 values.
Is this the right way to run multiple selectCount using JOOQ?
What is the signification of the integer array returned by the execute method?
I have checked this link, in the JOOQ blog, talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but It's just about SQL queries, no JOOQ dialects.
Comments on your assumptions
I have expected to have the results of the above queries in the return values of execute, but I get an array of integer filled with 0 values.
The batch() API can only be used for DML queries (INSERT, UPDATE, DELETE), just like with native JDBC. I mean, you can run the queries as a batch, but you cannot fetch the results this way.
I have checked this link, in the JOOQ blog, talking about "How to Calculate Multiple Aggregate Functions in a Single Query", but It's just about SQL queries, no JOOQ dialects.
Plain SQL queries almost always translate quite literally to jOOQ, so you can apply the technique from that article also in your case. In fact, you should! Running so many queries is definitely not a good idea.
Translating that linked query to jOOQ
So, let's look at how to translate that plain SQL example from the link to your case:
Record record =
ctx.select(
channels.stream()
.map(c -> count().filterWhere(CHANNEL.CODE.equal(c)).as(c))
.collect(toList())
)
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // Not strictly necessary, but might speed up things
.fetch();
This will produce a single record containing all the count values.
As always, this is assuming the following static import
import static org.jooq.impl.DSL.*;
Using classic GROUP BY
Of course, you can also just use a classic GROUP BY in your particular case. This might even be a bit faster:
Result<?> result =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels)) // This time, you need to filter
.groupBy(CHANNEL.CODE)
.fetchOne();
This now produces a table with one count value per code. Alternatively, fetch this into a Map<String, Integer>:
Map<String, Integer> map =
ctx.select(CHANNEL.CODE, count())
.from(SUBSCRIPTION)
.innerJoin(SENDER).on(SENDER.ID.equal(SUBSCRIPTION.SENDER_ID))
.innerJoin(CHANNEL).on(CHANNEL.ID.equal(SUBSCRIPTION.CHANNEL_ID))
.where(SENDER.CODE.equal(senderCode))
.and(CHANNEL.CODE.in(channels))
.groupBy(CHANNEL.CODE)
.fetchMap(CHANNEL.CODE, count());

How to create optimized select with the Java API (with gremlin)?

I have a query in SQL that does a direct search, but when I try to implement it in the Java API (with Gremlin) it does a slow sequential search.
The query is:
SELECT FROM ? where outE().inV() contains ?
Any ideas on how to write it optimized in the Java API (with Gremlin)?
I am actually using Scala but Java should be similar, example:
graph
.getNoTx
.V()
.hasLabel("Class")
.outE("Edge")
.inV()
.has("Class", "Field", "Value")
.toList

HBase get multiple distinct rowids?

I am using HBase JAVA/Scala API and have a list of rowids List(1,2,5) and want to retrieve rows corresponding to those ids at the same time.
val getOne = new Get(Bytes.toBytes("1"))
Let's me access only 1 row and I don't want to do get(1) get(2) and get(5) sequentially (latency).
How would I do all at once and iterate through the return set later?
If API does not offer it what is the next best way?
So the answer is easy
list of gets
Result[] get(List gets) throws IOException
You just have to construct multiple "Get" requests and use the below method from the HTable.
Result[] get(List<Get> gets)
Link to the javadoc is here.

Sort the values by Date in mongodb

I am new to mongodb and I am trying to sort all my rows by date. I have records from mixed sources and I trying to sort it separately. I didn't update the dateCreated while writing into db for some records. Later I found and I added dateCreated to all my records in the db. Say I have total of 4000 records, first 1000 I don't have dateCreated. Latest 3000 has that column. Here I am trying to get the last Updated record using dateCreated column. Here is my code.
db.person.find({"source":"Naukri"}&{dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
This code retruns me some results (from that 1000 records) where I can't see that dateCreated column at all. Moreover if I change (-1) here {dateCreated: -1} I am getting results from some other source, but not Naukri.
So I need help this cases,
How do I sort by dateCreated to get the latest updated record and by sources also.
I am using Java API to get the records from Mongo. I'd be grateful if someone helps me to find how I will use the same query with java also.
Hope my question is clear. Thanks in advance.
From the documentation you will (and you will, won't you - nod yes) read, you will find that the first argument to the find command you are using is what is called a query document. In this document you are specifying a list of fields and conditions, "comma" separated, which is the equivalent of an and condition in declarative syntax such as SQL.
The problem with your query is it was not valid, and did not match anything. The correct syntax would be as follows:
db.person.find({"source":"Naukri", dateCreated:{$exists:true}})
.sort({dateCreated: -1})
.limit(10)
So now this will filter by the value provided for "source" and where the "dateCreated" field exists, meaning it is there and it contains something.
I recommend looking at the links below, the first of the two concerned with structuring mongoDB queries and the find method and it's arguments. All of the functionality translates to every language implementation.
As for the Java API and how to use, there are different methods depending on which you are comfortable with. The API provides a BasicDBObject class which is more or less equivalent to the JSON document notation, and is sort of a hashmap concept. For something a bit more along the lines of the shell methods and a helper to be a little more like some of the dynamic languages approach, there is the QueryBuilder class which the last two links give example and information on. These allow chaining to make your query more readable.
There are many examples on Stack Overflow alone. I suggest you take a look.
http://docs.mongodb.org/manual/tutorial/query-documents/
http://docs.mongodb.org/manual/reference/method/db.collection.find/
How to do this MongoDB query using java?
http://api.mongodb.org/java/2.2/com/mongodb/QueryBuilder.html
Your query is not correct.Update it as follows :
db.person.find({"source":"Naukri", dateCreated:{$exists:true}}).sort({dateCreated: 1}).limit(10)
In Java, you can do it as follows :
Mongo mongo = ...
DB db = mongo.getDB("yourDbName");
DBCollection coll = db.getCollection("person");
DBObject query = new BasicDBObject();
query.put("source", "Naukri");
query.put("dateCreated", new BasicDBObject($exists : true));
DBCursor cur = coll.find(query).sort(new BasicDBObject("dateCreated", 1)).limit(10);
while(cur.hasNext()) {
DBObject obj = cur.next();
// Get data from the result object here
}

Reverse search in Hibernate Search

I'm using Hibernate Search (which uses Lucene) for searching some Data I have indexed in a directory. It works fine but I need to do a reverse search. By reverse search I mean that I have a list of queries stored in my database I need to check which one of these queries match with a Data object each time Data Object is created. I need it to alert the user when a Data Object matches with a Query he has created. So I need to index this single Data Object which has just been created and see which queries of my list has this object as a result.
I've seen Lucene MemoryIndex Class to create an index in memory so I can do something like this example for every query in a list (though iterating in a Java list of queries would not be very efficient):
//Iterating over my list<Query>
MemoryIndex index = new MemoryIndex();
//Add all fields
index.addField("myField", "myFieldData", analyzer);
...
QueryParser parser = new QueryParser("myField", analyzer);
float score = index.search(query);
if (score > 0.0f) {
System.out.println("it's a match");
} else {
System.out.println("no match found");
}
The problem here is that this Data Class has several Hibernate Search Annotations #Field,#IndexedEmbedded,... which indicated how fields should be indexed, so when I invoke index() method on the FullTextEntityManager instance it uses this information to index the object in the directory. Is there a similar way to index it in memory using this information?
Is there a more efficient way of doing this reverse search?
Just index the new object (if you use automatic indexing you don't have to do anything besides committing the current transaction), then retrieve the queries you want to run and run all of them in a boolean query, combining the stored query with the id of the new object. Something like this:
...
BooleanQuery query = new BooleanQuery();
query.add(storedQuery, BooleanClause.Occur.MUST);
query.add(new TermQuery(ProjectionConstants.ID, id), BooleanClause.Occur.MUST);
...
If you get a result you know the query matched.
Since MemoryIndex is a completely separate component that doesn't extend or implement Lucene's Directory or IndexReader, I don't think there's a way you can plug this into Hibernate Search Annotations. I'm guessing that if you choose to use MemoryIndex, you'll need to write your addField() calls which basically mirrors what you're doing in the annotations.
How many queries are we talking about here? Depending on how many there are you might be able to get away with just running the queries on the main index that Hibernate maintains, ensuring to constrain the search to the document ID you just added. Or for every document that's added, create a one-document in-memory index using RAMDirectory and run the queries through that.

Categories