Postgresql Replication solutions and their performance

Postgresql Replication solutions and their performance - java

I am doing POC on Posgtresql replication. I am using latest version of postgresql i.e. 9.1. There are multiple replication solutions avaliable in the market (PGCluster, Pgpool-II, Slony-I). Postgresql also provide in-built replication solutions (Streaming replication, Warm Standby and hot standby). I am confused which solution is best for the financial application for which I am doing POC. The application will write around 160 million records with row size of 2.5 KB in database. My questions is for following scenarios which replication solution will be suitable:
If I would require replication for backup purpose only
If I would require to scale the reads
If I would require High Avaliability and Consistency
Also It will be very helpful if you can share the perfomance or experience with postgresql replication solutions.

The short answer is "whatever your problem is, there is a solution."
Let's look at just a few of he main ones.
Slony-I is a replication solution that allows you to scale reads across part or all of your database. This is designed so that you could take part of your database and replicate it into your DMZ for, say, customer reports. On the other hand, this flexibility begets complexity, and while Slony lets you replicate only part of your database, Slony lets you replicate only part of your database...... Also Slony's flexibility doesn't stop there. It allows you to replicate across different versions of Pgsql, therefore making sure you have zero downtime for read queries during major upgrades.
Postgres-XC is really the successor in spirit to PGCluster. It offers Teradata-style clustering for PostgreSQL. If you really need to scale reads and writes, this is the solution for you but again that adds complexity too.
The built-in replication solutions are simplest, allow you to scale for purposes of taking backups and taking writes. It ensures high availability and consistency but a major upgrade requires downtime of all nodes.
So the thing is you need to figure out exactly what you want and then look for help in selecting the right tool for the job. I would recommend asking on the pgsql-general email lists when you get to that point.

Related

Java Caching frameworks for maintaining huge data

Java Caching frameworks for storing huge data.
Context: We are developing a Restful service using Jersey 2.6 and will deploy it on WAS 8.5. This service need to serve more than 10 million requests per day.
We need to implement a cache to store more than 300k object (data will come from DB). And we need some way to update the cache on a daily basis.
Is this approach of caching 300k object and updating them on a daily basis is recommended?
Are there any Java framework which supports this kind of functionality?

Your question is too general to get a clear answer. You need to be describe what the problem you are trying to solve is.
Are you concerned about response times?
Are you trying to protect your DB from doing heavy lifting?
Are expecting to have to scale out and want to be sure that you can deal with future loads?
Additionally some more contextual information would be useful, especially:
How dynamic is your data compared to your requests?
What percentage of your data population will be requested on average per day? (How many of the 3 lakh objects will be enquired upon at least once per day? If you don't know, provide your best guess).
Your figures given as 3 lakh (300k) data points and 10M requests means that you are expecting to hit each object on average 33 times a day, which indicates that you are more concerned about back end DB load than your responses being right up to date.
In my experience there are a lot of fairly primitive solutions which will work much better than going for a heavyweight distributed systems such as Mongo, Cassandra or Coherence.
My first response would be: Keep it simple - 300k objects is not too much to store in an internal hash table which you flush once a day and populate on first request.
If you need to scale horizontally, I would suggest Memcache Spymemcached with a 1 day cache time, which populate when you don't find an existing entry.
I would NOT go for something like Cassandra or Mongo unless you have real compelling reasons to require a persistent store. Rationale: Purging can become really onerous, especially if your data is fast moving. For example: Cassandra does not really know how to delete, but instead "tombstones" deleted entries, which means that your data store will simply grow and grow until you create a strategy for purging.

Question is if caching must be distributed. Remember the caching is something you have seen. And posting this around for the chance it might be of use... well why.
Distributed Cache system: Redis, Cassandra in Memory. MongoDB in memory.
Local RocksDB (let you store byte[] -> byte[]) and SSDs makes a fine local cache layer. You might also add distributed layer on top of it. Usually better than something from the shelves. Should also be easy to implement.
10Million Requests per day isnt much. in 10hours tops you can server 1Mio / 60 / 60 => 3000 requests per second. Based on the afford you usually can go with an efficient frontend and efficient backend. We can do 40k pages per second and core and having 24 cores.. you know the math. Data in memory no chaching done...

For the caching provider I suggest Coherence, I am using Coherence at my company, and it is very robust and synchronized over multiple clusters.
For the other point about how to handle cache, it depends on the nature of your application, based on my experience with caching, I've decided to update the cache in the following scenarios:
1. Grid paging
2. Browsing
and decided to clear the cache and reload the data again:
Edit item
Add new item
Delete item
And I've decided so as maintaining the cache it an overkill headache that will be blown in your face when you handle some kind of statistics and nested hierarchies.
Hope this helped you.

Yes they are for example: Coherence, Hazelcast. All are distrubuted cashes.
http://java.dzone.com/articles/sneak-peek-jcache-api-jsr-107
In general you should cache what you are using, and cache should be always in sync not daily. You place in cache the recently used objects, and you get read/write through cache to your DB.

If you have money , best one is coherence (its reputation is proved by big financial companies )
Hazelcast is an other distributed cache memory you can use, it is one level lower than coherence based on preformance metrics.

Cou could try ehcache. It can be used as query cache or even hibernate second level cache.
You can configure how long entities should be stored in cache before they are invalidated.

If you already have WebSphere ND 8.5.5, you may take a look at WebSphere Extreme Scale, which is provided with that. It is distributed, partitioned caching solution that integrates with WebSphere. See WebSphere eXtreme Scale overview for more details.

See the new JCache standard (JSR 107 in the Java Community Process). This API is implemented by Coherence and other caching implementations (ehcache etc.), and also has a small reference implementation that you can use for basic use cases.
Yes, any of the Java caching frameworks should be able to help you. Coherence (note: I work with Coherence at Oracle) for example can definitely handle 3,00,000 items easily (I assume you are from India if you use lakh!), but I suggest only using Coherence if you are deploying this on more than one server.

High-performance storage for messages

I have been looking for high-performance file storage solution to be used for persisting soap messages in Java EE environment.
We are currently using a CLOB table on Oracle RMDBS, but it is very expensive to scale. While oracle works well for storing the related metadata, it doesn't perform too well with the message content. Insert on a table with a CLOB gives roughly 1000% worse performance than one without it (This was measured by comparing performance of VARCHAR2(4000)-insert to CLOB-insert when in row storage has been disabled for CLOB)
Persisting messages on file system is one option, but I have some serious doubts how an average file systems would perform storing millions of files per day. Considering we have to keep those files for several months, it just doesn't sound right.
I know there are several open source key-value databases (jackrabbit, mongodb to name few) that might be up for the task, but I just can't find time to evaluate them all. I would also like to hear about performance of open source RMDBS.
Considering that volume of transmitted messages is ever increasing, priority is on low latency and high performance. We do not require clustering or transactionality and (minor) data loss on system failure is acceptable.
Requirements:
Must be able to maintain rate of at least 100persisted messages/sec when message size is 8kilobytes
Must be able to store at least 100million messages
Must support deletion of persisted messages by age
Must support persisting while deletion is in progress
Must support retrieval of message by id
Help is appreciated

Here is nice comparison between MongoDB and SQL Server (I believe Oracle will have similar performance). You can see from charts that Mongo can handle 20 000 inserts per second. Mongo has also query language based on JSON which can do almost everything like regular SQL and it has Sharded Clusters and Replica sets which can handle all neccesary backups and failover (some basic info here).
Also, if you are interested in digging little bit deeper, 10 gen has an online course starting in two weeks awarded with a certificate.

You can try the following products:
HBase
MongoDB
Cassandra
Solr 4.0 (only)
These are the guys that I have any experience. There are a lot of other good products that can do what you want in the market.
Some observations: none of them have this "delete by age" feature out-of-the-box, as far as I know it. But it should be really simple to implement it. Easier in MogoDB I must assume.
If you will try Solr, you should stick with versions 4.X as these are the only ones with support to near realtime commits, and it will affect your "delete and insert" requirement.
All of them have great performance, but I did not run a benchmark with your requirement. If I were you I would make my own benchmarks.

Oracle11g has the data deduplication featured introduced. This feature will improve the performance of the oracle database with clob.

This is what I've discovered so far. I will try to update this answer after evaluating each product.
I started my experiments using MongoDB, which on paper looked like a viable option. Here's a summary of my findings:
Written in C++
Replication (replicaset) requires 3 nodes for high availability
One of the nodes is elected as a master - only the master can write
Scaling out is done by sharding (partitioning)
Each shard is essentially a replicaset - therefore sharded environment requires atleast 6 nodes for high availability
mongod instance consumes all available memory - virtualization should be used for resource partitioning (if you intend to run application server on same hardware)
Master re-election may take up to 1minute
Document collections (tables) use exclusive lock during write operation
Java API is exceptionally easy to use and includes a virtual filesystem called GridFS
Single node write performance on test system was ~20000 inserts/sec for 1kbyte document
Single node read performance was ~20000 read/sec for 1kbyte document
The fact that MongoDB would require 6 nodes on a two data center configuration, made me look further for more cost-efficient solutions.
Apache Cassandra:
Written in Java
Replication requires 3 nodes for high availability
Database survives network partitioning
Replication algorithm has been designed for multiple data centers
All nodes are writable
Scaling out can be done by adding more nodes (up to a certain limit)
Cassandra may require JVM garbage collection tuning
Java API is not the easiest to work with
Single node write performance was ~7000 inserts/sec for 1kbyte document
Single node read performance was ~7000 reads/sec for 1kbyte document
While Cassandra was slower in a single node configuration, write performance on a high availability configuration would match MongoDB's performance. The ability perform writes on every node (even during network partitioning) is very welcome addition for logging.
Couchbase:
Unfortunately I was unable to test Couchbase.
For now we'll keep using Oracle SecureFiles. Would we run out of resources on Oracle, both Cassandra and MongoDB seem like viable alternatives.

RealWorld HazelCast [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Does anyone have any real world experience with Hazelcast distributed data grid and execution product? How has it worked for you? It has an astonishingly simple API and functionality that seems almost to good to be true for such a simple to use tool. I have done some very simple apps and it seems to work as advertised so far. So here I am looking for the real world 'reality check'. Thank you.

We've been using it in production since version 1.8+, using mainly the distributed locking feature. It works great, we've found a couple of workarounds/bugs, but those were fixed relatively fast.
With 1.8M locks per day we found no problems so far.
I recommend start using version 1.9.4.4.

There are still some issues still with its development,
http://code.google.com/p/hazelcast/issues/list
Generally, you can choose to either let it use its own multicast algorithm or specify your own ip's. We've tried it in a LAN environment and it works pretty well. Performance wise it's not bad but the monitoring tool didn't work very well as it failed to update most of the time. If you can live with the current issues then by all mean go for it. I would use it with caution but it's a great working tool IMHO.
Update:
We've been using Hazelcast for a few months now and it's working very well. The settings are relatively easy to set up and with the new updates, are comprehensive enough to customize even small things like the number of threads allowed in read/write operations.

We are using Hazelcast (1.9.4.6 now) in production integrated with a complicated transactional service. It was added to alleviate immediate database throughput issues. We have discovered that we frequently have to stop it bringing down all transaction services for at least an hour. We are running clients in superclient mode because it is the only option that even remotely meets our performance requirements (about 4 times faster than native clients.) Unfortunately stopping a superclient node causes split brain issues and causes the grid to lose records, forcing a complete shutdown of services. We have been trying to make this product work for us for almost a full year now, and even paid to have 2 hazelcast reps flown in to help. They were unable to produce a solution, but were able to let us know that we were probably doing it wrong. In their opinion it should work better but it was pretty much a wasted trip.
At this point we are on the hook for over 6 figures per year in licensing fees and we are currently using about 5 times the resources to keep the grid alive and meet our performance needs than we would be using with a clustered and optimized database stack. This was absolutely the wrong decision for us.
This product is killing us off. Use with caution, sparingly, and only for simple services.

If my own company and projects count as real world, here's my experience. I wanted to get as close to eliminating external (disk) storage in favor of limitless and persistent "RAM". For starters that eliminates CRUD plumbing which sometimes makes up to 90% of the so-called "middle tier". There are other benefits. Since RAM is your "database" you don't need any complex caches or HTTP session replication (which in turn eliminates ugly sticky session technique).
I believe RAM is the future and Hazelcast has everything to be an in-memory database: queries, transactions, etc. So I wrote a mini-framework abstracting it: to load data from the persistent storage (I can plugin anything that can store BLOBs - the fastest turned out to be MySQL). It is too long to explain why I didn't like Hazelcast's built-in persistence support. It's rather generic and rudimentary. They should remove it. It is not rocket science to implement your own distributed and optimized write-behind and write-through. Took me a week.
Everything was fine until I started performance-testing. Queries are slow - after all of the optimizations I did: indexes, Portable serialization, explicit comparators, etc. A simple "greater than" query on an indexed field takes 30 seconds on the set of 60K of 1K records (map entries). I believe Hazelcast team did everything they could. As much as I hate to say it, Java collections are still slow compared to super-optimized C++ code normal databases use. There are some open-source Java projects that address that. However at this time query persistence is unacceptable. It should be instant on a single local instance. It is an in-memory technology after all.
I switched to Mongo for the database, however left Hazelcast for shared runtime data - namely sessions. Once they improve query performance I'll switch back.

If you have alternatives to hazelcast maybe look at these first. We have it in running production mode and it is still quite buggy, just check out the open issues.
However, the integration with Spring, Hibernate etc. is quite nice and the setup is really easy :)

We use Hazelcast in our e-commerce application to make sure that our inventory is consistent.
We use extensive use of distributed locking to make sure SKU Items of inventory are modified in atomic way because there are hundred of nodes in our web application cluster that operates concurrently on these items.
Also, we use distributed map for caching purpose which are shared across all the nodes. Since scaling node in Hazelcast is so simple and it utilises all its CPU core, it gives added advantage over redis or any other caching framework.

We are using Hazelcast from last 3 years in our e-commerce application to make sure availability (supply & demand) is consistent, atomic, available & scalable.
We are using IMap (distributed map) to cache the data and Entry Processor for read & write operations to do fast in-memory operations on IMap without you having to worry about locks.

Zookeeper/Chubby -vs- MySql NDB

I have been reading the Paxos paper, the FLP theorem etc. recently and evaluating Apache Zookeeper for a project. I have also been going thru Chubby (Google's distributed locking service) and the various literature on it that is available online. My fundamental usecase for Zookeeper is to implement replication and general coordination for a distributed system.
I was just wondering though, what is the specific advantage that Zookeeper or a Chubby like distributed locking system brings to the table. Basically I am just wondering why I can't just use a MySQL NDB Cluster. I keep hearing that MySQL has a lot of replication issues. I was hoping some with more experience on the subject might shed some light on it.
Thanks in advance..
A simplistic listing of my requirements :
I have a homogeneous distributed system.
I need some means of maintaining consistent state across all my nodes.
My system exposes a service, and interaction with clients will lead to some change in collective state of my system.
High availability is a goal, thus a node going down must not affect the service.
I expect the system to service atleast a couple of 1000 req/sec.
I expect the collective state of the system to be bounded in size (basically inserts/deletes will be transient... but in steady state, i expect lots of updates and reads)

It depends on the kind of data you are managing and the scale and fault tolerance you are going for.
I can answer from the ZooKeeper point of view. Before starting I should mention that ZooKeeper is not a Chubby clone. Specifically it does not do locks directly. It is also designed with different ordering and performance requirements in mind.
In ZooKeeper the entire copy of system state is memory resident. Changes are replicated using an atomic broadcast protocol and synced to disk (using a change journal) by a majority of ZooKeeper servers before being processed. Because of this ZooKeeper has deterministic performance that can tolerate failures as long as a majority of servers are up. Even with a big outage, such as a power failure, as long as a majority of servers come back on line, system state will be preserved. The information stored is ZooKeeper is usually considered the ground truth of the system so such consistency and durability guarantees are very important.
The other things that ZooKeeper gives you have to do with monitoring dynamic coordination state. Ephemeral nodes allow you do to easy failure detection and group membership. The ordering guarantees allow you to do leader election and client side locking. Finally, watches allow you to monitor system state and quickly respond to changes in system state.
So if you need to manage and respond to dynamic configuration, detect failures, elect leaders, etc. ZooKeeper is what you are looking for. If you need to store lots of data or you need a relational model for that data, MySQL is a much better option.

MySQL with Innodb provides a good general purpose solution, and will probably keep up with your performance requirements quite easily on not-too-expensive hardware. It can easily handle many thousands of updates per second on a dual quad-core box with decent disks. The built-in asynchronous replication will get you most of the way there for your availability requirements - but you might lose a few seconds' worth of data if the primary fails. Some of this lost data might be recoverable when the primary is repaired, or might be recoverable from your application logs: whether you can tolerate this is dependent on how your system works. A less lossy - but slower - alternative is to use MySQL Innodb with shared disk between Primary and Failover units: in this case, the Failover unit will take over the disk when the Primary fails with no loss of data -- as long as the Primary did not have some kind of disk catastrophe. If shared disk is not available, DRBD can be used to simulate this by synchronously copying disk blocks to the Failover unit as they are written: this might have an impact on performance.
Using Innodb and one of the replication solutions above will get your data copied to your Failover unit, which is a large part of the recovery problem solved, but extra glue is required to reconfigure your system to bring the Failover unit on-line. This is usually performed with a cluster system like RHCS or Pacemaker or Heartbeat (on Linux) or the MS Cluster stuff for Windows. These systems are toolkits, and you are left to get your hands dirty building them into a solution that will fit your environment. However, for all of these systems there is a brief outage period while the system notices that the Primary has failed, and reconfigures the system to use the Failover unit. This might be tens of seconds: trying to reduce this can make your failure detection system too sensitive, and you might find your system being failed over unnecessarily.
Moving up, MySQL NDB is intended to reduce the time to recovery, and to some extent help scale up your database for improved performance. However, MySQL NDB has a quite narrow range of applicability. The system maps a relational database on to a distributed hash table, and so for complex queries involving multiple joins across tables, there is quite a bit of traffic between the MySQL component and the storage components (the NDB nodes) making complex queries run slow. However, queries that fit well run very fast indeed. I have looked at this product a few times, but my existing databases have been too complicated to fit well and would require a lot of redesign to get good performance. However, if you are at the design stage of a new system, NDB would work well if you can bear its constraints in mind as you go. Also, you might find that you need quite a few machines to provide a good NDB solution: a couple of MySQL nodes plus 3 or more NDB nodes - although the MySQL and NDB nodes can co-exist if your performance needs are not too extreme.
Even MySQL NDB cannot cope with total site loss - fire at the data centre, admin error, etc. In this case, you usually need another replication stream running to a DR site. This will normally be done asynchronously so that connectivity blips on the inter-site link does not stall your whole database. This is provided with NDB's Geographic replication option (in the paid-for telco version), but I think MySQL 5.1 and above can provide this natively.
Unfortunately, I know little about Zookeeper and Chubby. Hopefully someone else can pick up these aspects.

Persistence strategy for low latency reads and writes

I am building an application that includes a feature to bulk tag millions of records, more or less interactively. The user interaction is very similar to Gmail where users can tag individual emails, or bulk tag large amounts of emails. I also need quick read access to these tag memberships as well, and where the read pattern is more or less random.
Right now we're using Mysql and inserting one row for every tag-document pair. Writing millions of rows to Mysql takes a while (high I/O), even with bulk insertions and heavy optimization. We need this to be an interactive process, not a batch process.
For the data that we're storing and reading, consistency and availability of the data are not as important as performance and scalability. So in the event of system failure while the writes are occurring, I can deal with some data loss. However, the data definitely needs to be persisted to secondary storage at some point.
So, to sum up, here are the requirements:
Low latency bulk writes of potentially tens of millions of records
Data needs to be persisted in some way
Low latency random reads
Durable writes not required
Eventual consistency is okay
Here are some solutions I've looked at:
Write behind caches (Terracotta, Gigaspaces, Coherence) where records are written to memory and drained to the database asynchronously. These scare me a little because they appear to add a certain amount of complexity to the app that I'd want to avoid.
Highly scalable key-value stores, like MongoDB, HBase, Tokyo Tyrant

If you have the budget to use Coherence for this, I highly recommend doing so. There is direct support for write-behind, eventual consistency behavior in Coherence and it is very survivable to both a database outage and Coherence cluster node outages (if you use >= 3 Coherence nodes on separate JVMs, preferably on separate hosts). I have implemented this for doing high-volume CRM for a Fortune 100 company's e-commerce site and it works fantastically.
One of the best aspects of this architecture is that you write your Java application code as if none of the write-behind behavior were taking place, and then plug in the Coherence topology and configuration that makes it happen. If you need to change the behavior or topology of Coherence later, no change in your application is required. I know there are probably a handful of reasonable ways to do this, but this behavior is directly supported in Coherence rather than having to invent or hand-roll a way of doing it.
To make a really fine point - your worry about adding application complexity is a good one. With Coherence, you simply write updates to the cache (or if you're using Hibernate it can be the L2 cache provider). Depending upon your Coherence configuration and topology, you have the option to deploy your application to use write-behind, distributed, caches. So, your application is no more complex (and, frankly unaware) due to the features of the cache.
Finally, I implemented the solution mentioned above from 2005-2007 when Coherence was made by Tangosol and they had the best possible support. I'm not sure how things are now under Oracle - hopefully still good.

I've worked on a large project that used asyncrhonous writes althoguh in that case it was just hand-written using background threads. You could also implement something like that by offloading the db write process to a JMS queue.
One thing that will certainly speed up db writes is to do them in batches. JDBC batch updates can be orders of magnitude faster than individual writes, and if you're doing them asynchronously you can just write them 500 at a time.

Depending on how your data is organized perhaps you would be able to use sharding,
if the read latency isn't low enough you can also try to add caching. Memcache is one popular solution.

Berkeley DB has a very high performance disk-based hash table that supports transactions, and integrates with a Java EE environment if you need that. If you're able to model the data as key/value pairs, this can be a very scalable solution.
http://www.oracle.com/technology/products/berkeley-db/je/index.html
(Note: oracle bought berkeley db about 5-10 years ago; the original product has been around for 15-20 years).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.