Couchbase 2.0 Java SDK 1.1 - Synchronous Add and Views

I am trying to create a JUnit test. Scenario:
setUp: I add two JSON documents to the database
Test: I fetch those documents using a view
tearDown: I remove both documents
My view:
function (doc, meta) {
  if (doc.type && doc.type == "UserConnection") {
    emit([doc.providerId, doc.providerUserId], doc.userId);
  }
}
This is how I add those documents to the database and make sure that "add" is synchronous:
public boolean add(String key, Object element) throws Exception {
    String json = gson.toJson(element);
    OperationFuture<Boolean> result = couchbaseClient.add(key, 0, json);
    return result.get();
}
JSON Documents that I'm adding are:
{"userId":"1","providerId":"test_pId","providerUserId":"test_pUId","type":"UserConnection"}
{"userId":"2","providerId":"test_pId","providerUserId":"test_pUId","type":"UserConnection"}
This is how I call the view:
View view = couchbaseClient.getView(DESIGN_DOCUMENT_NAME, VIEW_NAME);
Query query = new Query();
query.setKey(ComplexKey.of("test_pId", "test_pUId"));
ViewResponse viewResponse = couchbaseClient.query(view, query);
Problem:
The test fails due to an incorrect number of elements fetched from the view.
My observations:
Sometimes the tests pass
The number of elements fetched from the view is not consistent (anywhere from 0 to 2)
When I added those documents to the database beforehand instead of in setUp, the test passed every time
According to the documentation at http://www.couchbase.com/docs/couchbase-sdk-java-1.1/create-update-docs.html, I am adding those JSON documents synchronously by calling get() on the returned Future object.
My question:
Is there something wrong with how I am fetching data from a view right after that data was inserted into the DB? Is there a good practice for solving this problem? And can someone please explain to me what I did wrong?
Thanks,
Dariusz

In Couchbase 2.0, documents must be written to disk before they show up in a view. There are three ways you can do an operation with the Java SDK. The first is asynchronous, which means that you just send the data and check at a later time that it was received correctly. If you do an asynchronous operation and then immediately call .get(), as you did above, you have created a synchronous operation. When an operation returns success in these two cases you are only guaranteed that the item has been written into memory. Your test passed sometimes only because you were lucky enough that both items had been written to disk before you ran your query.
The third way to do an operation is with durability requirements, and this is the one you want for your tests. Durability requirements allow you to say that you want an item to be written to disk or replicated before success is returned to the client. Take a look at the following function:
https://github.com/couchbase/couchbase-java-client/blob/1.1.0/src/main/java/com/couchbase/client/CouchbaseClient.java#L1293
You will want to use this function and set the PersistTo parameter to MASTER.
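For illustration, here is a minimal sketch of what the setUp helper could look like with a durability requirement, assuming the linked method is the add() overload that takes a PersistTo argument (in the 1.x SDK the PersistTo enum lives in net.spy.memcached):
public boolean add(String key, Object element) throws Exception {
    String json = gson.toJson(element);
    // Blocks until the document has been persisted to disk on the master node,
    // not just accepted into memory, so a following view query can index it.
    OperationFuture<Boolean> result = couchbaseClient.add(key, 0, json, PersistTo.MASTER);
    return result.get();
}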

Related

Kafka persistent state store not working with combined use of Python and Java

Today I found a very strange thing with a Kafka state store. I googled a lot but didn't find the reason for this behavior.
Consider the state store below, written in Java:
private KeyValueStore<String, GenericRecord> userIdToUserRecord;
There are two processors using this state store:
topology.addStateStore(userIdToUserRecord, ALERT_PROCESSOR_NAME, USER_SETTING_PROCESSOR_NAME)
USER_SETTING_PROCESSOR_NAME puts the data into the state store:
userIdToUserRecord.put("user-12345", record);
ALERT_PROCESSOR_NAME gets the data from the state store:
userIdToUserRecord.get("user-12345");
Adding the source for the UserSettingProcessor:
String userSettingTopicName = "user-setting-topic";
topology.addSource(sourceName, userSettingTopicName)
    .addProcessor(processorName, UserSettingProcessor::new, sourceName);
Adding the source for the AlertEngineProcessor:
String alertTopicName = "alert-topic";
topology.addSource(sourceName, alertTopicName)
    .addProcessor(processorName, AlertEngineProcessor::new, sourceName);
Case 1: produce records using the Kafka producer in Java
First, produce a record to topic user-setting-topic using Java; it adds the user record to the state store.
Second, produce a record to topic alert-topic using Java; it reads the record from the state store with userIdToUserRecord.get("user-12345");
This works fine. I am using KafkaAvroProducer to produce records to both topics.
Case 2: produce the first record using Python
First, produce a record to topic user-setting-topic using Python; it adds the user record to the state store with userIdToUserRecord.put("user-100", record);
Second, produce a record to topic alert-topic using Java; it tries to read the record from the state store with userIdToUserRecord.get("user-100");
The strange part happens here: userIdToUserRecord.get("user-100") returns null.
I also checked the scenario like this:
I produce a record to user-setting-topic using Python; the UserSettingProcessor process method is triggered, and when I try in debug mode to get the user record from the state store with userIdToUserRecord.get("user-100"), it works fine, so inside UserSettingProcessor I am able to get the data from the state store.
Then I produce a record to alert-topic using Java and try userIdToUserRecord.get("user-100"); it returns null.
I don't understand this strange behavior. Can anyone explain it to me?
Python code:
value_schema = avro.load('user-setting.avsc')
value = {
    "user-id": "user-12345",
    "client_id": "5cfdd3db-b25a-4e21-a67d-462697096e20",
    "alert_type": "WORK_ORDER_VOLUME"
}
print("------------------------Kafka Producer------------------------------")
avroProducer = AvroProducer(
    {'bootstrap.servers': 'localhost:9092', 'schema.registry.url': 'http://localhost:8089'},
    default_value_schema=value_schema)
avroProducer.produce(topic="user-setting-topic", value=value)
print("------------------------Success Producer------------------------------")
avroProducer.flush()
Java Code:
Schema schema = new Schema.Parser().parse(schemaString);
GenericData.Record record = new GenericData.Record(schema);
record.put("alert_id","5cfdd3db-b25a-4e21-a67d-462697096e20");
record.put("alert_created_at",123449437L);
record.put("alert_type","WORK_ORDER_VOLUME");
record.put("client_id","5cfdd3db-b25a-4e21-a67d-462697096e20");
//record.put("property_key","property_key-"+i);
record.put("alert_data","{\"alert_trigger_info\":{\"jll_value\":1.4,\"jll_category\":\"internal\",\"name\":\"trade_Value\",\"current_value\":40,\"calculated_value\":40.1},\"work_order\":{\"locations\":{\"country_name\":\"value\",\"state_province\":\"value\",\"city\":\"value\"},\"property\":{\"name\":\"property name\"}}}");
return record;
The problem is that the Java producer and the Python producer (which is based on the C client, librdkafka) use different default hash functions for partitioning the data. You will need to provide a custom partitioner to one (or both) of them to make sure they use the same partitioning strategy; for example, librdkafka-based clients such as confluent-kafka-python can be configured with the partitioner setting murmur2_random to match the Java producer's default.
Unfortunately, the Kafka protocol does not specify what the default partitioning hash function should be, so clients can use whatever they want by default.

Java Ebean Play Framework: findOne() not working? How to return one object instead of findList()?

// The code below is not working.
// My query returns just one object, so I am trying to use a findOne() method.
Query<Topic> query = Ebean.find(Topic.class);
Topic topic = Topic.find.where().eq("columnName", "nameToMatch").findOne();
// The part below works if I use findList(), but I have to call get(0) to
// fetch the topic, which I don't think is good practice.
List<Topic> topicList = Ebean.find(Topic.class).where().eq("columnName", "nameToMatch").findList();
topicList.get(0);
Can anyone provide ideas on how to return just one object instead of a list?
I don't know if findOne() exists in Ebean, but when I need to retrieve only one object I use findUnique().
If you're sure the object you want to find is unique, you can get it via findUnique(): Topic.find.where().eq("columnName", "nameToMatch").findUnique();
Otherwise you can use findList() with setMaxRows(), because you don't want to load the whole result set into memory:
Topic.find.where().eq("columnName", "nameToMatch").setMaxRows(1).findList();
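As a small, hedged sketch (assuming the same Topic finder as in the question): findUnique() throws if more than one row matches, while setMaxRows(1) simply limits the result, so picking between the two looks like this:
// Exactly one (or zero) matching rows expected:
Topic unique = Topic.find.where().eq("columnName", "nameToMatch").findUnique();

// Possibly many matching rows, but only the first one is needed:
List<Topic> topics = Topic.find.where()
    .eq("columnName", "nameToMatch")
    .setMaxRows(1)
    .findList();
Topic topic = topics.isEmpty() ? null : topics.get(0);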

How to get results from multiple queries on multiple MongoDB collections via the eventBus in Vert.x 2?

I am working on providing an API, and I store data in MongoDB with one database per month and one collection per date.
So I have a database db_08_2015 and, inside it, 31 collections from date_01 to date_31.
To get the total money spent from date 1 to date 10, I have to send one request per collection, like the one below.
My question is: how can I send these requests one at a time and compute the sum before returning it to the client, like a synchronous request to Mongo?
For example, if date_01 = 10 and date_02 = 20 and so on, I want to sum them all before returning to the client.
vertx.eventBus().send("mongodb-persistor", json, new Handler<Message<JsonObject>>() {
    @Override
    public void handle(Message<JsonObject> message) {
        logger.info(message.body());
        JsonObject result = new JsonObject(message.body().encodePrettily());
        JsonArray r = result.getArray("results");
        if (r.isArray()) {
            if (r.size() > 0) {
                String out = r.get(0).toString();
                req.response().end(out);
            } else {
                req.response().end("{}");
            }
        } else {
            req.response().end(message.body().encodePrettily());
        }
    }
});
I think in your case you might be better off with a different approach to modeling your data.
In terms of analytics I would recommend the lambda architecture approach, quoted below:
All data entering the system is dispatched to both the batch layer and the speed layer for processing.
The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) to pre-compute
the batch views.
The serving layer indexes the batch views so that they can be queried in low-latency, ad-hoc way.
The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
Any incoming query can be answered by merging results from batch views and real-time views.
With the above in mind, why not have an aggregates collection that holds the data pre-aggregated in the format your queries require, while at the same time keeping a raw copy in the format you described.
That way you have a view over the data in the required query format, and a way to recreate the aggregated data if something goes wrong.
Reference for the quotes: Lambda Architecture
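For illustration, a minimal sketch of keeping such an aggregates document up to date through the same mongodb-persistor event bus address; the collection name monthly_totals, the document id and the amount variable are made up for the example:
JsonObject update = new JsonObject()
    .putString("action", "update")
    .putString("collection", "monthly_totals") // hypothetical aggregates collection
    .putObject("criteria", new JsonObject().putString("_id", "2015-08"))
    .putObject("objNew", new JsonObject()
        .putObject("$inc", new JsonObject().putNumber("total_money", amount)))
    .putBoolean("upsert", true)
    .putBoolean("multi", false);

vertx.eventBus().send("mongodb-persistor", update, new Handler<Message<JsonObject>>() {
    @Override
    public void handle(Message<JsonObject> reply) {
        // "status" is "ok" when the upsert succeeded
        logger.info(reply.body());
    }
});
Reading the total for the client then becomes a single findone on monthly_totals instead of one query per daily collection.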

Creating Couchbase views in code: First query returns 0 rows, index builds in background

I am creating a view in code using the Couchbase Java API 2.1.2 like this:
DefaultView view = DefaultView.create(viewName, jsMapFunctionAsString);
List<View> views = new ArrayList<>();
views.add(view);
DesignDocument doc = DesignDocument.create(name, views);
bucket.bucketManager().insertDesignDocument(doc);
When I call ViewResult result = bucket.query(ViewQuery.from(name, viewName)) directly after inserting the design document, result.success() always returns true, but both rows() and the regular iterator return 0 rows (there are definitely matching documents present; when I execute the view in the web interface, it returns the correct values).
The workaround I found after several hours is to call query twice, with enough waiting time in between, like this:
ViewResult result = bucket.query(ViewQuery.from(name, viewName));
Thread.sleep(10000);
result = bucket.query(ViewQuery.from(name, viewName));
The second call then returns the correct result.
It seems as if Couchbase has to build the index for the query first but returns immediately, even before the index has been built.
Waiting 10 seconds is of course not optimal; building the index may take even longer in the future.
So my question is: how can I make sure that I wait only until the index has been built?
Is this a bug in the API?
You can use the stale() method on the ViewQuery and set it to FALSE; this forces the server to finish indexing before returning results.
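A minimal sketch with the 2.1.x API, assuming the same name and viewName variables as above (Stale is in com.couchbase.client.java.view):
// Stale.FALSE: update the index first, then return the rows
ViewResult result = bucket.query(
    ViewQuery.from(name, viewName).stale(Stale.FALSE));
for (ViewRow row : result) {
    // rows now include the documents indexed since the design document was created
}
As in the first question above, the documents themselves must also have been persisted to disk before the view index can pick them up.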

How to get a myBatis (iBatis) result as an Iterable (for very large result sets)

When using myBatis, I need to fetch a very large result set from the DB and process it sequentially (for example, a CSV export).
I am afraid that if the return type is List, all the returned data will be held in memory and cause an OutOfMemoryError.
So I want to get the result as a ResultSet or as an Iterable<MyObject> using myBatis.
Please tell me about any solutions.
Starting from myBatis 3.4.1 you can return a Cursor, which is Iterable and can be used like this (provided the result is ordered; see the Cursor API Javadoc for details):
MyEntityMapper.java
@Select({
    "SELECT *",
    "FROM my_entity",
    "ORDER BY id"
})
Cursor<MyEntity> getEntities();
MapperClient.java
MyEntityMapper mapper = session.getMapper(MyEntityMapper.class);
try (Cursor<MyEntity> entities = mapper.getEntities()) {
    for (MyEntity entity : entities) {
        // process one entity
    }
}
You should use fetchSize. Based on the heap size and the data size per row, you can choose how many rows to fetch from the database at a time. Alternatively, since you are basically exporting data to CSV, you can use Spring Batch, which has a MyBatis paging item reader. The drawback of that item reader is that after each page a new request is fired to fetch the next page, which increases the load on your database. If you are not worried about the load, you can go ahead with the paging item reader; otherwise there is another, simpler item reader called JdbcCursorItemReader.
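If you are on a myBatis version older than 3.4.1, a common streaming alternative is a ResultHandler combined with a JDBC fetch size; a minimal sketch (the statement id, entity class and fetch size below are made up for the example):
try (SqlSession session = sqlSessionFactory.openSession()) {
    // The fetch size can be set per statement via @Options(fetchSize = 500)
    // or the fetchSize attribute in the XML mapper.
    session.select("com.example.MyEntityMapper.selectAll",
        new ResultHandler<MyEntity>() {
            @Override
            public void handleResult(ResultContext<? extends MyEntity> context) {
                MyEntity entity = context.getResultObject();
                // write one CSV line per row here instead of collecting a List
            }
        });
}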
