I am wondering what the advantages of using one client over the other are. Is it possible to use MemcachedClient to connect to a client-side Moxi instance?
Performance. Moxi dates from the memcached -> Membase days: it handles routing/distribution on the Couchbase cluster side, but then all your app servers hit individual Moxi nodes in the cluster. We have made a lot of optimizations and improvements over this design to increase performance.
One of those improvements is that with the Couchbase SDKs, the clients themselves hold a map of the cluster topology, so they go directly to the nodes responsible for the data rather than routing through Moxi. You will find that you get a performance boost and reduced latency by using the Couchbase SDKs!
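As a rough illustration of that direct routing, here is a minimal connection sketch with the 1.x Couchbase Java SDK; the bootstrap URI, bucket name, and key are placeholders:

```java
import com.couchbase.client.CouchbaseClient;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class DirectAccessSketch {
    public static void main(String[] args) throws Exception {
        // The client bootstraps from one node, downloads the cluster map,
        // and from then on talks directly to the node that owns each key.
        List<URI> bootstrap = Arrays.asList(URI.create("http://127.0.0.1:8091/pools"));
        CouchbaseClient client = new CouchbaseClient(bootstrap, "default", "");

        client.set("greeting", 0, "hello").get(); // goes straight to the owning node
        System.out.println(client.get("greeting"));

        client.shutdown();
    }
}
```

No Moxi hop is involved: each operation is dispatched to the node that owns the key's vBucket according to the client's copy of the cluster map.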
In addition to Scalabl3's answer, take a look at http://www.couchbase.com/memcached
I would like to understand whether I need, or whether it is considered good practice, to have a load balancer as part of an Elasticsearch deployment.
As far as I understand, both the high-level REST client and the transport client of Elasticsearch can manage load balancing between the nodes. So the client just needs a comma-separated endpoint list, and that's it.
Is there any point in also having a load balancer in the middle?
For which cases might it be useful?
What are the pros and cons of each method?
Normally an external load balancer is not very common in an ES cluster and is not required, as Elasticsearch already does load balancing and, by default, all the data nodes in an ES cluster act in the coordinating role. If you want to improve performance, you can have dedicated coordinating nodes as well.
If your goal is smart load balancing that improves performance, then as long as you are on ES 6.x or higher (it is turned on by default in 7.x) you get it out of the box, without any external configuration, through Adaptive Replica Selection.
Having another load balancer means extra configuration and another layer before your request reaches ES, so IMHO it doesn't make any sense to use one.
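To illustrate the built-in client-side balancing the question mentions, here is a minimal sketch with the high-level REST client; the node hostnames are placeholders:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class EsClientSketch {
    public static void main(String[] args) throws Exception {
        // The underlying low-level client round-robins requests across all
        // listed nodes and retries on another node if one is unreachable.
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("es-node-1", 9200, "http"),
                        new HttpHost("es-node-2", 9200, "http"),
                        new HttpHost("es-node-3", 9200, "http")))) {
            System.out.println("Cluster reachable: " + client.ping(RequestOptions.DEFAULT));
        }
    }
}
```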
The answer depends on your architecture and also your requirements. Do you need a load balancer for high availability? For performance/scalability reasons? Or both?
Elasticsearch, like many other distributed systems, comes with its own protocols and semantics to distribute load across multiple nodes and to manage failovers.
You can use these semantics to configure nodes in such a way that a node performs just the coordinator role -- effectively acting as a load balancer for heavy-duty operations like search requests or bulk index requests.
Elasticsearch also has its own built-in protocol for electing a new master node in case of failures -- again effectively performing the role of a load balancer.
In general, I would recommend using these native capabilities to achieve your goals instead of adding more complexity by introducing another technology in front of the cluster.
If you want a stable URL for your cluster, configure your DNS server to achieve that. A cloud-provider-managed cluster should already have such a feature; otherwise you can set it up with some effort.
Requirement: I have 4 servers: A, B, C, D. They all connect to a data provider, get the data, and persist it into MongoDB for N minutes, so that if the same request arrives at another server in the meantime, it takes the data from MongoDB instead of making a call to the data provider.
|A|
|B| |data provider|
|C|
|D|
But if the |data provider| responds slowly, there is a possibility that 2 different requests for the same resource arrive at A and B. I want one request to wait until the response to the first request is received. I am using a queue for this, which is fine for a single server, but now I need a distributed cache because there are multiple servers.
Implementation: After reading a few articles on the net, I learned that a distributed cache in Java can be implemented using Ehcache RMI replication. But I have a few doubts before going ahead with Ehcache. (Although there are more solutions, like JCS etc., I decided to pick Ehcache on the basis of other answers on StackOverflow.)
Doubts
What if one of the servers goes down? Does Ehcache handle this automatically?
Interesting situation, but I fail to see how an extra cache (of any kind) would help solve the problem. Ultimately your problem boils down to coordination between servers, and a cache has little to add there.
Instead, I would either use a queue shared between the four servers, where only one request per resource is allowed in at a time, or a shared Map where each server locks a resource name while retrieving it. Other servers can then wait on this lock and, once it is released, try to retrieve the resource from MongoDB.
I haven't tried using it, but a combination of Redis and Redisson looks like a good fit for such a task.
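To make the shared-lock idea concrete, here is a rough sketch using Redisson; the Redis address, the key naming, and the Mongo/provider helper methods are all made up for illustration:

```java
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class ResourceLoader {
    private final RedissonClient redisson;

    public ResourceLoader() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        this.redisson = Redisson.create(config);
    }

    public Object load(String resourceName) {
        // One distributed lock per resource: only one server calls the
        // slow data provider; the others block here, then read MongoDB.
        RLock lock = redisson.getLock("lock:" + resourceName);
        lock.lock();
        try {
            Object cached = findInMongo(resourceName);      // hypothetical helper
            if (cached != null) {
                return cached;
            }
            Object fresh = callDataProvider(resourceName);  // hypothetical helper
            saveToMongo(resourceName, fresh);               // hypothetical helper
            return fresh;
        } finally {
            lock.unlock();
        }
    }

    private Object findInMongo(String name) { return null; }
    private Object callDataProvider(String name) { return new Object(); }
    private void saveToMongo(String name, Object value) { }
}
```

Whichever server acquires the lock first does the expensive fetch; the rest find the result already sitting in MongoDB when the lock is released.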
Ehcache with RMI replication is NOT a distributed cache and will not help in your situation because there will not be any shared state on which to queue/isolate your accesses.
Distributed Ehcache - that is, backed by Terracotta - can help, as you could combine strong consistency with a CacheLoader to ensure that only one thread across the servers loads a given resource. But unless you are ready to stick with Terracotta 3.7.x, this is no longer an open-source option.
But as Martin said, this may not be the best answer for your use case, as I feel you already use MongoDB as your fast-access storage, which makes the cache redundant.
I am going to use Couchbase as my caching layer. My cached values are serialized objects and not (always) Document-type or JSON values. The Couchbase client (Couchbase-Java-Client-1.4.3) also uses the Spymemcached jars, which is the client for the memcached server.
My questions are,
Should I use only the Couchbase client to cache my objects in the Couchbase server?
What if I use the plain Spymemcached client to cache? If I do, what would be the pros and cons compared to using the Couchbase client? Can the Spymemcached client handle server nodes and node failures itself?
You should not use Spymemcached with Couchbase. You can, but I would recommend against it. First off, CouchbaseClient uses the Spymemcached jar because both Memcached and Couchbase use the memcached binary protocol for key-value operations. When we initially started writing a Java client for Couchbase, it didn't make sense to re-implement something that was already written, so we used Spymemcached because it performs very well. You can think of Spymemcached as doing all of the basic stuff, like sending messages, and CouchbaseClient as adding all of the functionality that makes it really easy to use Couchbase.
What if I use the plain Spymemcached client to cache?
Everything goes through the cache regardless of what client you use, so there is no point in using two different clients.
What would be the pros and cons compared to using the Couchbase client?
If you don't use CouchbaseClient, you will be using something a lot more primitive. Think about it this way: Couchbase has spent about 3 years adding features to CouchbaseClient that make it really easy to use Couchbase. CouchbaseClient will deal with topology changes and also adds operations that Spymemcached does not have.
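As a small illustration, the 1.4.x CouchbaseClient can store a plain Serializable object directly, since the default transcoder inherited from Spymemcached serializes it; the key, TTL, and class here are made up:

```java
import com.couchbase.client.CouchbaseClient;
import java.io.Serializable;
import java.net.URI;
import java.util.Arrays;

public class CacheSketch {
    // Any Serializable object can be cached; no JSON document required.
    static class SensorReading implements Serializable {
        final double value;
        SensorReading(double value) { this.value = value; }
    }

    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "default", "");

        client.set("reading:42", 3600, new SensorReading(21.5)).get(); // 1 hour TTL
        SensorReading r = (SensorReading) client.get("reading:42");
        System.out.println(r.value);

        client.shutdown();
    }
}
```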
Can the Spymemcached client handle server nodes and node failures itself?
No, but if you send traffic through Moxi then you will be able to deal with node failures.
I wouldn't even bother with Spymemcached if you're using Couchbase.
I'm looking for a full-duplex streaming solution with Java EE.
The situation: client applications (JavaFX) read data from a peripheral device. This data needs to be transferred in near real time to a server for processing, and the response must come back asynchronously, all while the client keeps sending new data for processing.
Communication with the server needs to have as little overhead as possible. The data coming in is basically sensor data, and after processing it is turned into what can be described as a set of commands.
What I've looked into:
A TCP/IP server (this is a non-Java EE approach). This would be the obvious solution: two connections opened in parallel from each client app, one for upstream data and one for downstream data.
Remote & stateless EJBs. This would mean that there's no streaming involved and that I pack sensor data into smaller windows (1-2 seconds' worth of sensor data) which I then send to the server for processing, getting the processing result as the response. While this approach is scalable, I am not sure how fast it will be, considering I have to make a request every 1-2 seconds. I still need to test this, but I have my doubts.
RMI. Is this any different from EJBs, technically?
Two servlets (up/down) with long polling. I've not done this before, so it's something to be tested.
For now I would like to test the performance of approach #2. The first solution will work for sure, but I'm not too fond of having a separate server (next to Tomcat, where I already have something running).
However, in the meantime, it would be worth knowing whether there are any other Java (EE or not) technologies that could easily solve this. If anyone has an idea, please share it.
This looks like a good place for using JMS. Instead of stateless EJBs, you will probably be using Message-Driven Beans.
This gives you an approach similar to your first solution, using two message queues instead of TCP/IP connections. JMS makes your communication fully asynchronous and is low-overhead in the sense that your clients can send messages as fast as they can, regardless of how fast your server can consume them. You also get delivery guarantees and other JMS goodness.
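A minimal sketch of such an MDB, assuming a container-managed JMS queue; the queue JNDI name and the binary payload format are assumptions:

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.BytesMessage;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

// Consumes sensor data from an upstream queue; processing results would be
// published to a second, downstream queue that the client listens on.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destinationLookup",
                              propertyValue = "jms/SensorDataQueue")
})
public class SensorDataMDB implements MessageListener {
    @Override
    public void onMessage(Message message) {
        try {
            BytesMessage bytes = (BytesMessage) message;
            byte[] payload = new byte[(int) bytes.getBodyLength()];
            bytes.readBytes(payload);
            // ... process the sensor data and send commands to a reply queue ...
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```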
Tomcat does not come with JMS, however. You might try TomEE or integrate your existing Tomcat with a JMS implementation like ActiveMQ.
There are numerous options you could try. The appropriate solution depends on the nature of your application, the communication protocol, the data transfer type, the control you have over the client and server, and firewall restrictions on the client-server route.
There's not much info on this in your question, but given what you have provided, you may like to look at Netty, as it is quite general-purpose and flexible and seems to fit your requirements. Netty also includes a duplex WebSocket implementation. Note that a Netty-based solution may be more complex to implement and require more background study than some other solutions (such as JMS).
Yet another possible solution is GraniteDS, which advertises a JavaFX client integration and multiple server integrations for full-duplex client/server communication, though I have not used it. GraniteDS uses Comet (your two-asynchronous-servlets-with-long-polling model) with the Action Message Format (AMF) for data, which you may be familiar with from Flex/Flash.
Have you looked at WebSockets as a solution? They keep a persistent connection, so the asynchronous responses will come back quickly.
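If you go that route, JSR 356 (the Java EE WebSocket API) keeps the endpoint small; a rough sketch, where the path and the processing step are placeholders:

```java
import java.nio.ByteBuffer;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

// One persistent, full-duplex connection per client: the server can push
// command responses back asynchronously on the same socket at any time.
@ServerEndpoint("/sensors")
public class SensorEndpoint {

    @OnMessage
    public void onSensorData(ByteBuffer data, Session session) {
        byte[] commands = process(data); // hypothetical processing step
        session.getAsyncRemote().sendBinary(ByteBuffer.wrap(commands));
    }

    private byte[] process(ByteBuffer data) {
        return new byte[0]; // placeholder: turn sensor data into commands
    }
}
```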
My DAL is implemented with Hibernate, and I want to use EHCache as its second-level cache, with its distributed capabilities (for scalability and HA).
Seeing as EHCache provides distributed caching only with Terracotta, my question is: what is the role of the Terracotta server instance? Does it also hold data? Does it only coordinate the distribution between the partitioned cache parts?
My confusion derives mainly from this explanation regarding the TSA, which says the server holds the data; but I think that maybe in my scenario the cache and the Terracotta server are sort of merged. Am I correct?
If the server does hold data then why shouldn't the bottleneck just move from the db to the Terracotta server?
Update:
Affe's answer covered the second part of my question, which was the important part, but in case someone comes by looking for the first part: the TC server has to hold all the data that the in-memory EHCache holds, so if you want a distributed cache (not a replicated one), the L2 (the TC server) must hold all the objects itself as well.
Thanks in advance,
Ittai
The idea is that it's still significantly faster to contact the Terracotta cluster via the Terracotta driver and do what's basically a Map lookup than to acquire a database connection and execute an SQL statement. Even if that does become the application's choke point, overall throughput would still be expected to be significantly higher than with a JDBC-connection-plus-SQL choke point. Open connections and open cursors are big resource hogs in the database; an open socket to the Terracotta cluster is not!
You can get Ehcache clustered without using Terracotta. They have documentation for doing it via RMI, JGroups, and JMS. We are using JMS, since we already have a significant JMS infrastructure to handle the communication. I don't know how well it will scale in the long term, but our current concern is just HA.
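For reference, pointing Hibernate at Ehcache as its second-level cache is mostly a matter of properties; a minimal sketch assuming Ehcache 2.x with the Hibernate integration on the classpath (the replication transport itself, whether RMI, JGroups, or JMS, is configured separately in ehcache.xml):

```java
import java.util.Properties;

public class HibernateCacheConfig {
    public static Properties secondLevelCacheProperties() {
        Properties props = new Properties();
        // Enable the L2 cache and back it with Ehcache.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        props.setProperty("hibernate.cache.use_query_cache", "true");
        props.setProperty("hibernate.cache.region.factory_class",
                "net.sf.ehcache.hibernate.EhCacheRegionFactory");
        // Point at an ehcache.xml that sets up the RMI/JGroups/JMS replication.
        props.setProperty("net.sf.ehcache.configurationResourceName", "/ehcache.xml");
        return props;
    }
}
```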