high availability singleton processor in Tomcat - java

I have a job-processing analytic service working against an RDBMS that, due to the need for complex caching and cache-update logic, needs to be a singleton in a high-availability cluster. Jobs arrive as JMS messages (via ActiveMQ). It is part of an application hosted in an HA Tomcat cluster with a web front end.
The problem is, the service itself needs to be able to recover within seconds if a node where it is running fails. Failure could mean system down or just a slow CPU - i.e. if node recovers after CPU delay, but the processing is handed over, it cannot continue.
From experience, what would be the most suitable solution here:
database-based locks and lock checking before each job starts (I could not easily come up with a bullet-proof solution here - any recommendations?)
some kind of Paxos algorithm? Do you know of any slim framework for that purpose as the algorithm itself takes time to get right and then QA?
anything else?
I don't mind if failure recovery is slow, but I want to minimize the overhead for each job.
Some additional background: job does not involve anything more than reading data from the database, massaging it with various algorithms (somewhat resembling finding shortest routes) and putting back optimal solutions for different actors to move on. Actors interact with real world and put back some feedback, based on which consequent steps are optimized by the same job processor.

Solution Using Hazelcast
The Hazelcast locking method proposed by Tomasz works. You need to read the documentation carefully, use time-leased locks, and make sure your singleton is monitored so that leases get renewed. One thing to keep in mind is that Hazelcast was written to work in large clusters; as such, its start-up time is relatively slow, 1 to 5 seconds even for two nodes. After that, operations are quite performant and obtaining the lock takes milliseconds. Normally none of this matters, but the failure/recovery cycle takes time and should be treated as an exceptional situation.
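To make the lease mechanics concrete, here is a minimal in-memory stand-in for the pattern: the active node holds a lock with an expiry, a watchdog renews it while the node is healthy, and a peer can take over only once the lease has lapsed. This is plain Java to show the control flow only; in Hazelcast the same shape would be backed by its distributed lock acquired with a lease time.

```java
// Stand-in for a time-leased distributed lock. In the real system the
// owner/expiry state lives in Hazelcast, not in a local object; this
// class only illustrates acquire/renew/take-over semantics.
public class LeaseLock {
    private String owner = null;
    private long expiresAt = 0;   // epoch millis; 0 = never held

    /** Acquire a fresh lease, renew our own, or take over an expired one. */
    public synchronized boolean tryAcquire(String nodeId, long now, long leaseMillis) {
        if (owner == null || now >= expiresAt || owner.equals(nodeId)) {
            owner = nodeId;                  // new lease, renewal, or take-over
            expiresAt = now + leaseMillis;   // each renewal pushes the expiry out
            return true;
        }
        return false;                        // someone else holds a live lease
    }

    /** The singleton must check this before (and ideally during) each job. */
    public synchronized boolean isHeldBy(String nodeId, long now) {
        return nodeId.equals(owner) && now < expiresAt;
    }
}
```

The monitoring thread simply calls tryAcquire again well before the lease expires; if a node stalls (the "slow CPU" case) its renewal is late, the lease lapses, and the standby node's next tryAcquire succeeds.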
There are limits to how bulletproof this solution is. If the cluster is split (network disruption between nodes) but each node is alive and has access to the database, there is no way of knowing deterministically how to proceed. Ultimately, you need to think about a contingency plan here. In real life this scenario is very unlikely for a typical failover HA setup.
At the end of the day, before resorting to a solution with distributed locking, think hard about making your process not-so-singleton. It might still be hard to run certain things in parallel, but it may not be so hard to ensure the cache is not stale, or to find other ways to prevent database corruption. In my case, there is a database transaction counter that works like an optimistic lock. The code reads it before making any decisions and increments it with an update-where in both the database and the cache, inside the transaction where the result is stored. On a discrepancy the cache is purged and the operation repeated. This makes two nodes working in parallel impossibly slow, but it prevents data corruption. By storing additional data alongside the transaction counter you may be able to optimize cache-refresh strategies and slowly move towards parallel processing.
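A sketch of that transaction-counter idea, with an AtomicLong standing in for the one-row counter table: the compareAndSet plays the role of the UPDATE ... WHERE counter = :seen executed in the same transaction that stores the result. Names here are made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Optimistic "transaction counter" pattern: read the counter before
// deciding anything, then commit only if nobody else bumped it meanwhile.
public class OptimisticProcessor {
    private final AtomicLong dbCounter = new AtomicLong(0); // stand-in for the DB row
    private long cachedCounter = 0;                         // counter value our cache was built from
    public int retries = 0;

    public void processJob(Runnable refreshCache) {
        while (true) {
            long seen = dbCounter.get();         // read counter before all decisions
            if (seen != cachedCounter) {         // another node changed the data: purge and rebuild
                refreshCache.run();
                cachedCounter = seen;
            }
            // ... compute the optimal solution from the (now fresh) cache ...
            if (dbCounter.compareAndSet(seen, seen + 1)) { // the "update-where" commit
                cachedCounter = seen + 1;        // cache and DB advance together
                return;
            }
            retries++;                           // lost the race: repeat the operation
        }
    }

    public long counter() { return dbCounter.get(); }
}
```

With one active node the compare-and-set always succeeds and the overhead is a single counter read per job; with two nodes running it degrades into constant purge-and-retry, which is exactly the "impossibly slow but never corrupt" behaviour described above.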
Conclusion.
This is how I would approach such a request next time:
Try making your singletons survive working in parallel on different nodes
Try again, maybe there is a way to orchestrate them
Check if it is possible to use HASingleton or similar technology to avoid boilerplate
Implement the Hazelcast solution as outlined above
It makes no sense to post the code here, as the most time-consuming part is testing and verifying all failure scenarios and contingency plans. There is almost no boilerplate; the code itself will always be solution-specific. It is possible to come up with a well-working PoC covering all the bases within a couple of days.

Related

Large Data transfer from one Service to another

In one of my interviews, I was asked how I would efficiently design a system that needs to transfer millions of records sitting in one DB to another service.
What would be the most efficient design that doesn't compromise scalability and throughput?
I would say it is more about checking the way of thinking than looking for a production-ready solution.
As a consultant I would start with "it depends". :)
First - more details are needed. How big is it really? How often does that operation happen? How critical is it? Can someone access the server?
In case of something really big- https://aws.amazon.com/snowmobile/
If it happens only once, and there is access to both servers - maybe the simple solution: someone can just copy the data?
But my assumption is that the question is about doing it in Java. Is TCP/TLS good enough in terms of security and data integrity? For me that is OK, but the question is meant to show a way of thinking... The next step is the actual processing. It will take time - how do we know we are in sync? We could use a relational DB; that is an older solution, but widely used and well tested. A transaction started with serializable isolation will do the job. Still, the question is how to restart the whole operation, and besides, long-lasting transactions are not great for the DB. So, if there is a possibility to use a queue, I would use it. It is a bit of extra complication and more resources, but worth it if the operation is crucial and/or happens often.
There are many factors involved; however, considering you are talking about moving data from a DB to a service, here is what I would do:
Design an async process or framework by introducing a queueing mechanism. This framework should be able to scale up and down based on usage. Introduce an integration layer between the application and the host system from which the data will be transferred - AWS SQS, Google Pub/Sub, or whatever else. Let your host system stream its data to it, then have your framework pull the data from there and move it to the service asynchronously. Scale your services based on the load, etc.
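A rough sketch of that pipeline, with a BlockingQueue standing in for SQS/Pub/Sub and a hypothetical sendBatch callback standing in for the call to the target service. Batching the drain keeps per-call overhead down; in the real system the queue is durable and the batch is retried on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Queue-decoupled transfer: the host system streams rows in, workers
// drain them in batches toward the target service. The queue here is
// in-memory; a real deployment would use SQS, Pub/Sub, or similar.
public class QueuedTransfer {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public void publish(String row) { queue.add(row); }   // host/producer side

    /** Consumer side: drain in batches so each service call carries many rows. */
    public int drain(int batchSize, java.util.function.Consumer<List<String>> sendBatch) {
        int sent = 0;
        List<String> batch = new ArrayList<>(batchSize);
        while (queue.drainTo(batch, batchSize) > 0) {
            sendBatch.accept(batch);   // would be the remote call, retried on failure
            sent += batch.size();
            batch.clear();
        }
        return sent;
    }
}
```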

Reactive Webflux for toggle server - is it beneficial?

We have a need to implement a simple toggle server (REST application) that would take the toggle name and return whether it is enabled or disabled. We are expecting a load of tens of thousands of requests per day.
Does Spring (reactive) WebFlux make sense here?
My understanding is that reactive REST APIs are useful when the HTTP thread may sit idle - waiting for some job to finish, unable to proceed until it receives a response, as with DB reads or REST calls to other services.
Our use case is just to return the toggle value (probably from some cache) that is being queried. Will the reactive rest service be useful in our case? Does it provide any advantages over simple spring boot application?
I'm coming from a background of "traditional" Spring/Spring MVC app development, and these days I'm also starting to learn Spring WebFlux. Based on the data provided in the question, here are my observations (disclaimer: since I'm a beginner in this area, take this answer with a grain of salt):
WebFlux is less straightforward to implement than a traditional application: the maintenance cost is higher, debugging is harder, etc.
WebFlux will shine if your operations are I/O bound. If you're going to read the data from in-memory cache - this is not an I/O bound operation. I also understand that the nature of "toggle" data is that it doesn't change that much, but gets accessed (read) frequently, hence keeping it in some memory cache indeed makes sense here, unless you build something huge that won't fit in memory at all, but this is a different story.
WebFlux + Netty will let you serve thousands of requests simultaneously. Tomcat, with its traditional thread-per-request model, still allows 200 threads plus 100 requests in the queue by default; if you exceed these values it will fail, whereas Netty will "survive". Based on the data presented in the question, I don't see you benefiting from Netty here.
Tens of thousands of requests per day is something any kind of server can handle easily - Tomcat, Jetty, whatever; you don't need "high-load" engineering here.
As mentioned in item 3, WebFlux is good at handling simultaneous requests, but you probably won't gain any performance improvement over the traditional approach; it's not about speed, it's about better resource utilization.
If you're going to read the data from the database and you do want to go with WebFlux, make sure you have reactive drivers for your database - when you run the flow, you should be "reactive" all the way; blocking on DB access doesn't make sense.
So, bottom line: if I were you, I would start with a regular server and consider moving to a reactive stack later (probably that "later" will never come, as long as the expectations specified in the question don't change).
Indeed, it aims to minimize thread idling and get more performance from fewer threads than the traditional multithreading approach, where a thread per request is used - or, in reality, a pool of worker threads to prevent too many threads from being created.
If you're only getting tens of thousands of requests per day, and your use case is as simple as that, it doesn't sound like you need to plan anything special for it. A regular webapp will perform just fine.
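To underline how little machinery the stated load needs, here is a toggle endpoint sketched on the JDK's built-in com.sun.net.httpserver (class name and URL layout are made up for the example); a real deployment would more likely be a plain Spring MVC controller in front of the same in-memory map.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.ConcurrentHashMap;

// Minimal toggle server: GET /toggle/{name} returns "true" or "false"
// straight from an in-memory cache. No reactive stack, no idle I/O waits.
public class ToggleServer {
    static final ConcurrentHashMap<String, Boolean> toggles = new ConcurrentHashMap<>();

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/toggle/", exchange -> {
            String name = exchange.getRequestURI().getPath().substring("/toggle/".length());
            byte[] body = String.valueOf(toggles.getOrDefault(name, false)).getBytes();
            exchange.sendResponseHeaders(200, body.length);  // unknown toggles read as false
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }
}
```

Since the handler never blocks on anything slower than a hash-map read, even the default thread pool is far more capacity than tens of thousands of requests per day require.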

Is running each SQL-query in a separate thread (during startup) bad practise?

My Java application will need to gather information from different MySQL-tables on startup. Without the database information, the application cannot be used. Hence, it takes up to a few seconds to start up (reducing this time with cache when possible).
Would it be bad practice to perform each of these SQL queries in a separate thread, allowing computers with multiple CPU cores to start the application even faster (no guarantees, I know)?
Are there any cons that I need to know about before implementing this "system"?
It's something you're going to have to try.
If you're bringing back relatively few rows from each table, it would probably take longer to establish a database connection in each of the threads (or from the JDBC connection pool) than to establish one connection and run the queries serially.
Fortunately it's not a lot of code, so you should be able to try it out pretty quickly.
No, certainly not. For example, a Java web server like Tomcat does this all the time, when multiple users access your web application concurrently.
Just make sure you properly manage your data integrity using transactions.
Executing the requests in parallel instead of serially may be a good idea.
Take care to use a DataSource, and each request must use its own connection to the database (never share a connection between different threads simultaneously).
Make sure your connection pool and your thread pool sizes are well matched.
Database sessions are relatively expensive objects. Parallelizing to a few threads is no problem, but don't create 1000 threads if you have 1000 tables.
Furthermore, multithreading comes with complexity and potentially huge maintenance costs (for example, unreproducible issues resulting from race conditions). So, do your measurements, and if you find out that the speed up is just a few percent, just put everything back.
There are more ways to avoid the latency you see. For example, you can send multiple queries in a single command batch, thus reducing the number of roundtrips between your code and the database.

ehcache monitor: installation/configuration

In the documentation http://ehcache.org/documentation/user-guide/monitor there is a phrase:
It is recommended that you install the Monitor on an Operations server separate to production.
Why is that so? What will happen if I install it on the production server?
And the second question, to which I did not find an answer there: does this monitor really not affect the performance of the application?
I'll try to explain what I think they mean.
First of all, I don't think the intention is that you not use the Monitor in production. Rather, I think they mean that the Monitor should be installed on a separate server in a production environment. There are at least three good reasons to do this.
The first is one of security. The clients that your production server is servicing shouldn't be able to reach the Monitor's services. By putting it on a separate server (perhaps behind a firewall) you prevent this.
The second is one of landscape simplicity. The Monitor can monitor several servers. By putting it on a separate server, you prevent one application server from being "special" - all the application servers are identical as far as this is concerned. Easier for scaling and maintenance of your landscape.
The third reason is one of performance. Calls to the Monitor won't impact the application server/s. This is as it should be.
As for the second part of your question: obviously, adding ehcache monitoring will affect performance to some extent. It is probably meant to incur only minimal overhead, but nothing is completely free. If you end up optimizing the caches, though, it will probably be worth it.
I found this paragraph detailing how often the Monitor samples:
Memory is estimated by sampling. The first 15 puts or updates are measured and then every 100th put or update
(this is from the statistics section of the Monitor page)

Zookeeper/Chubby -vs- MySql NDB

I have been reading the Paxos paper, the FLP theorem, etc. recently and evaluating Apache ZooKeeper for a project. I have also been going through Chubby (Google's distributed locking service) and the various literature on it that is available online. My fundamental use case for ZooKeeper is to implement replication and general coordination for a distributed system.
I was just wondering though, what is the specific advantage that Zookeeper or a Chubby like distributed locking system brings to the table. Basically I am just wondering why I can't just use a MySQL NDB Cluster. I keep hearing that MySQL has a lot of replication issues. I was hoping some with more experience on the subject might shed some light on it.
Thanks in advance..
A simplistic listing of my requirements :
I have a homogeneous distributed system.
I need some means of maintaining consistent state across all my nodes.
My system exposes a service, and interaction with clients will lead to some change in collective state of my system.
High availability is a goal, thus a node going down must not affect the service.
I expect the system to service at least a couple of thousand req/sec.
I expect the collective state of the system to be bounded in size (basically inserts/deletes will be transient... but in steady state, I expect lots of updates and reads).
It depends on the kind of data you are managing and the scale and fault tolerance you are going for.
I can answer from the ZooKeeper point of view. Before starting I should mention that ZooKeeper is not a Chubby clone. Specifically it does not do locks directly. It is also designed with different ordering and performance requirements in mind.
In ZooKeeper the entire copy of the system state is memory resident. Changes are replicated using an atomic broadcast protocol and synced to disk (using a change journal) by a majority of ZooKeeper servers before being processed. Because of this, ZooKeeper has deterministic performance and can tolerate failures as long as a majority of servers are up. Even after a big outage, such as a power failure, system state will be preserved as long as a majority of servers come back online. The information stored in ZooKeeper is usually considered the ground truth of the system, so such consistency and durability guarantees are very important.
The other things ZooKeeper gives you have to do with monitoring dynamic coordination state. Ephemeral nodes allow you to do easy failure detection and group membership. The ordering guarantees allow you to do leader election and client-side locking. Finally, watches allow you to monitor system state and quickly respond to changes.
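As an illustration of the leader-election recipe built on ephemeral sequential znodes (lowest sequence number leads; everyone else watches only its immediate predecessor to avoid a thundering herd), here is just the decision logic applied to a child-node listing - the actual ZooKeeper calls are omitted, and the znode names are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Decision step of the ZooKeeper leader-election recipe: each contender
// created an ephemeral sequential znode; sorting the children tells us
// whether we lead and, if not, which single predecessor to watch.
public class LeaderElection {
    /** @return null if ourNode is the leader, else the znode to set a watch on. */
    public static String predecessorToWatch(List<String> children, String ourNode) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);   // fixed-width sequence suffix: lexical order = creation order
        int idx = sorted.indexOf(ourNode);
        return idx <= 0 ? null : sorted.get(idx - 1);
    }
}
```

When the watched predecessor's ephemeral node disappears (its session died), the watcher re-reads the children and runs this logic again - it either becomes leader or watches the next node up the chain.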
So if you need to manage and respond to dynamic configuration, detect failures, elect leaders, etc. ZooKeeper is what you are looking for. If you need to store lots of data or you need a relational model for that data, MySQL is a much better option.
MySQL with Innodb provides a good general purpose solution, and will probably keep up with your performance requirements quite easily on not-too-expensive hardware. It can easily handle many thousands of updates per second on a dual quad-core box with decent disks. The built-in asynchronous replication will get you most of the way there for your availability requirements - but you might lose a few seconds' worth of data if the primary fails. Some of this lost data might be recoverable when the primary is repaired, or might be recoverable from your application logs: whether you can tolerate this is dependent on how your system works. A less lossy - but slower - alternative is to use MySQL Innodb with shared disk between Primary and Failover units: in this case, the Failover unit will take over the disk when the Primary fails with no loss of data -- as long as the Primary did not have some kind of disk catastrophe. If shared disk is not available, DRBD can be used to simulate this by synchronously copying disk blocks to the Failover unit as they are written: this might have an impact on performance.
Using Innodb and one of the replication solutions above will get your data copied to your Failover unit, which is a large part of the recovery problem solved, but extra glue is required to reconfigure your system to bring the Failover unit on-line. This is usually performed with a cluster system like RHCS or Pacemaker or Heartbeat (on Linux) or the MS Cluster stuff for Windows. These systems are toolkits, and you are left to get your hands dirty building them into a solution that will fit your environment. However, for all of these systems there is a brief outage period while the system notices that the Primary has failed, and reconfigures the system to use the Failover unit. This might be tens of seconds: trying to reduce this can make your failure detection system too sensitive, and you might find your system being failed over unnecessarily.
Moving up, MySQL NDB is intended to reduce the time to recovery, and to some extent help scale up your database for improved performance. However, MySQL NDB has a quite narrow range of applicability. The system maps a relational database on to a distributed hash table, and so for complex queries involving multiple joins across tables, there is quite a bit of traffic between the MySQL component and the storage components (the NDB nodes) making complex queries run slow. However, queries that fit well run very fast indeed. I have looked at this product a few times, but my existing databases have been too complicated to fit well and would require a lot of redesign to get good performance. However, if you are at the design stage of a new system, NDB would work well if you can bear its constraints in mind as you go. Also, you might find that you need quite a few machines to provide a good NDB solution: a couple of MySQL nodes plus 3 or more NDB nodes - although the MySQL and NDB nodes can co-exist if your performance needs are not too extreme.
Even MySQL NDB cannot cope with total site loss - fire at the data centre, admin error, etc. In this case, you usually need another replication stream running to a DR site. This will normally be done asynchronously, so that connectivity blips on the inter-site link do not stall your whole database. This is provided with NDB's geographic replication option (in the paid-for telco version), but I think MySQL 5.1 and above can provide it natively.
Unfortunately, I know little about Zookeeper and Chubby. Hopefully someone else can pick up these aspects.
