Improving Get operation performance - java

I am running some comparison tests (Ignite vs Cassandra) to check how to improve the performance of the 'get' operation. The data is fairly straightforward: a simple Employee object (10-odd fields), stored as a BinaryObject in the cache as
IgniteCache<String, BinaryObject> empCache;
The cache is configured with,
Write Sync Mode - FULL_SYNC, Atomicity - TRANSACTIONAL, Backup - 1 & Persistence - Enabled
Cluster config,
3 servers + 1 client node.
The client has multiple threads (configurable) making concurrent get calls.
For about 500k requests, I am getting a throughput of about 1500/sec, even though all of the data is off-heap and the cache hit percentage is 100%. Interestingly, with Cassandra I am getting similar performance, with key cache and a limited row cache.
I left the defaults for most of the data configuration. For this test I turned persistence off; ideally, for gets it shouldn't really matter. The performance is the same.
Data Regions Configured:
[19:35:58] ^-- default [initSize=256.0 MiB, maxSize=14.1 GiB, persistence=false]
Topology snapshot [ver=4, locNode=038f99b3, servers=3, clients=1, state=ACTIVE, CPUs=40, offheap=42.0GB, heap=63.0GB]
Frankly, I was expecting Ignite gets to be pretty fast, given all the data is in cache, at least going by this test: https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
I am planning to run one more test tomorrow with no persistence and a near cache (on-heap) configured, to see if it helps.
Let me know if you guys see any obvious configurations that should be set.
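For reference, the setup described above could be expressed roughly like this in code (a sketch from the Ignite API; `ignite` is an assumed Ignite instance, and this is a configuration fragment rather than tested code):

```java
// Sketch of the cache configuration described in the question.
CacheConfiguration<String, BinaryObject> cfg = new CacheConfiguration<>("empCache");
cfg.setCacheMode(CacheMode.PARTITIONED);
cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cfg.setBackups(1);

// Two knobs worth trying for a read-heavy workload: ATOMIC mode (if the
// gets don't actually need transactions) and a near cache on the client:
// cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
// ignite.createCache(cfg, new NearCacheConfiguration<String, BinaryObject>());

IgniteCache<String, BinaryObject> empCache =
    ignite.getOrCreateCache(cfg).withKeepBinary();
```

TRANSACTIONAL caches carry extra locking overhead on every operation, so ATOMIC mode is usually the first thing to try when gets dominate.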

Related

Data Change Notification performance impact

I built a simple application that monitors an Oracle DB on a single table. I tried to test the performance impact of an enabled subscription and was unpleasantly surprised that the degradation is about 2x when inserting about 10,000 records, each in a standalone transaction:
without subscription: 10k inserts ~ 30 sec
with subscription (ROWID granularity): 10k inserts ~ 60 sec
If I set:
OracleConnection.DCN_NOTIFY_ROWIDS, "false"
OracleConnection.DCN_QUERY_CHANGE_NOTIFICATION, "false"
then all the degradation vanishes, but I need the details of the updates.
I removed all extra processing from the client side, so this is all about subscription overhead.
I am wondering: is it this expensive by nature, or can I tune it somehow?
Database change notification has an overhead during commit; this can't be tuned. Note that this feature is designed for read-mostly tables that are worth caching on the client/mid-tier. One trick might be to unregister your app during batch inserts.
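The unregister-during-batch trick could look roughly like this (a sketch against the Oracle JDBC driver API; `conn` is an assumed open OracleConnection, `runBatchInserts` is a hypothetical helper, and listener/error handling is omitted):

```java
// Drop the DCN registration before a large batch insert and re-register
// afterwards, so the commits in between skip the per-commit DCN overhead.
Properties prop = new Properties();
prop.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true"); // keep row details

DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotification(prop);
// ... attach the change listener and register the watched table ...

conn.unregisterDatabaseChangeNotification(dcr); // before the 10k-insert batch
runBatchInserts(conn);                          // hypothetical batch method
dcr = conn.registerDatabaseChangeNotification(prop); // re-subscribe afterwards
// then re-sync the client cache once, instead of per-row notifications
```

The cost is that changes made while unregistered are missed, so the client cache has to be reloaded once after re-subscribing.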

How can I achieve better throughput for a large migration to DynamoDB?

I have been running a test for a large data migration to dynamo that we intend to do in our prod account this summer. I ran a test to batch write about 3.2 billion documents to our dynamo table, which has a hash and range keys and two partial indexes. Each document is small, less than 1k. While we succeeded in getting the items written in about 3 days, we were disappointed with the Dynamo performance we experienced and are looking for suggestions on how we might improve things.
In order to do this migration, we are using 2 EC2 instances (c4.8xlarges). Each runs up to 10 processes of our migration program; we've split the work among the processes by some internal parameters and know that some processes will run longer than others. Each process queries our RDS database for 100,000 records. We then split these into partitions of 25 each and use a thread pool of 10 threads to call the DynamoDB Java SDK's batchSave() method. Each call to batchSave() sends only 25 documents of less than 1k each, so we expect each to make only a single HTTP call out to AWS. This means that at any given time, we can have as many as 100 threads on a server each making calls to batchSave with 25 records. Our RDS instance handled the load of queries to it just fine during this time, and our 2 EC2 instances did as well. On the EC2 side, we did not max out our CPU, memory, network in, or network out. Our writes are not grouped by hash key, as we know that can slow down Dynamo writes. In general, in a group of 100,000 records, they are split across 88,000 different hash keys. I created the Dynamo table initially with 30,000 write throughput, and scaled it up to 40,000 write throughput at one point during the test, so our understanding is that there are at least 40 partitions on the Dynamo side to handle this.
We saw very variable response times in our calls to batchSave() throughout this period. For one span of 20 minutes while I was running 100 threads per EC2 instance, the average time was 0.636 seconds, but the median was only 0.374, so we've got a lot of calls taking more than a second. I'd expect to see much more consistency in the time it takes to make these calls from an EC2 instance to Dynamo. Our Dynamo table seems to have plenty of throughput configured, the EC2 instance is below 10% CPU, and the network in and out look healthy but are not close to being maxed out. The CloudWatch graphs in the console (which are fairly terrible...) didn't show any throttling of write requests.
After I took these sample times, some of our processes finished their work, so we were running fewer threads on our EC2 instances. When that happened, we saw dramatically improved response times in our calls to Dynamo, e.g. when we were running 40 threads instead of 100 on the EC2 instance, each making calls to batchSave, the response times improved more than 5x. However, we did NOT see improved write throughput, even with the better response times. It seems that no matter what we configured our write throughput to be, we never really saw the actual throughput exceed 15,000.
We'd like some advice on how best to achieve better performance on a Dynamo migration like this. Our production migration this summer will be time-sensitive, of course, and by then, we'll be looking to migrate about 4 billion records. Does anyone have any advice on how we can achieve an overall higher throughput rate? If we're willing to pay for 30,000 units of write throughput for our main index during the migration, how can we actually achieve performance close to that?
One component of BatchWrite latency is the Put request that takes the longest in the Batch. Considering that you have to loop over the List of DynamoDBMapper.FailedBatch until it is empty, you might not be making progress fast enough. Consider running multiple parallel DynamoDBMapper.save() calls instead of batchSave so that you can make progress independently for each item you write.
Again, CloudWatch metrics are 1-minute metrics, so you may have peaks of consumption and throttling that are masked by the 1-minute window. This is compounded by the fact that the SDK, by default, retries throttled calls 10 times before exposing the ProvisionedThroughputExceededException to the client, making it difficult to pinpoint when and where the actual throttling is happening. To improve your understanding, try reducing the number of SDK retries, request ConsumedCapacity=TOTAL, self-throttle your writes using Guava's RateLimiter as described in the rate-limited scan blog post, and log throttled primary keys to see if any patterns emerge.
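The self-throttling idea can be sketched in plain Java (Guava's RateLimiter is the production-ready version; this minimal pacer just illustrates keeping the write rate below the provisioned WCU so the SDK never has to retry throttled calls):

```java
import java.util.concurrent.TimeUnit;

// Minimal write pacer: callers block until their write units fit under
// the configured rate. One permit here stands for one WCU.
final class WriteLimiter {
    private final double permitsPerSecond;
    private long nextFreeNanos = System.nanoTime();

    WriteLimiter(double permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    // Blocks until the requested number of write units may be consumed.
    synchronized void acquire(int writeUnits) throws InterruptedException {
        long now = System.nanoTime();
        long wait = nextFreeNanos - now;
        // Schedule the next caller after the time these units "cost".
        nextFreeNanos = Math.max(now, nextFreeNanos)
                + (long) (writeUnits * 1_000_000_000L / permitsPerSecond);
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
    }
}
```

Each worker would call acquire(25) before a batchSave of 25 one-WCU items, with the rate set safely below the table's provisioned write capacity, so throttling surfaces in your own logs rather than inside the SDK's retry loop.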
Finally, the number of partitions of a table is driven not only by the amount of read and write capacity units you provision on your table, but also by the amount of data you store in it. Generally, a partition stores up to 10 GB of data and then splits. So, if you just write to your table without deleting old entries, the number of partitions in your table will grow without bound. This causes IOPS starvation: even if you provision 40,000 WCU, if you already have 80 partitions due to the amount of data, the 40k WCU will be distributed among the 80 partitions for an average of 500 WCU per partition. To control the amount of stale data in your table, you can have a rate-limited cleanup process that scans and removes old entries, or use rolling time-series tables (slides 84-95) and delete/migrate entire tables of data as they become less relevant. Rolling time-series tables are less expensive than a rate-limited cleanup, as you do not consume WCU with a DeleteTable operation, while you consume at least 1 WCU for each DeleteItem call.
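The per-partition dilution described above is simple arithmetic (illustrative numbers; DynamoDB's exact partitioning rules are internal to the service):

```java
// Per-partition dilution: provisioned WCU are split evenly across
// partitions, and partition count also grows with stored data volume.
class PartitionMath {
    static int wcuPerPartition(int provisionedWcu, int partitions) {
        return provisionedWcu / partitions;
    }

    public static void main(String[] args) {
        // 40k WCU over 40 throughput-driven partitions: 1000 WCU each.
        System.out.println(wcuPerPartition(40_000, 40)); // 1000
        // Same 40k WCU once data volume (~10 GB per partition) has pushed
        // the table to 80 partitions: only 500 WCU each.
        System.out.println(wcuPerPartition(40_000, 80)); // 500
    }
}
```

This is consistent with never seeing throughput above ~15,000 in aggregate: hot partitions cap out individually long before the table-level number is reached.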

Can Ehcache return the old cached value while the cache is refreshing?

I am using Spring framework and Java CXF services.
There are several clients accessing a fixed list of data fetched from a database. Since fetching from the database is an expensive call and the DB doesn't change frequently, I thought it would be good to cache the values.
DB refresh: 15 times a day, at indefinite intervals.
Cache refresh: every 15 minutes.
Cache loading by a call to the DB takes 15 seconds.
Now, while the cache is refreshing by making a call to the DB, the refresh takes 15 seconds. If clients want to access data within those 15 seconds, I am OK with serving the previous data in the cache. In this application, stale and outdated data can be tolerated rather than waiting 15 seconds (there is a delta function which brings the data changed after the time the cache was loaded, and it is very inexpensive). Is there a way in Ehcache to return the old data in the cache while the new data is being loaded at the regular 15-minute refresh interval?
A consistent copy of the cache on local disk provides many possibilities for business requirements, such as working with different datasets according to time-based needs or moving datasets around to different locations. It can range from a simple key-value persistence mechanism with fast read performance, to an operational store with in-memory speeds during operation for both reads and writes.
Ehcache has a RestartStore which provides fast restartability and options for cache persistence. The RestartStore implements an on-disk mirror of the in-memory cache. After any restart, data that was last in the cache will automatically load from disk into the RestartStore, and from there the data will be available to the cache.
Cache persistence on disk is configured by adding the <persistence> sub-element to a cache configuration. The sub-element includes two attributes: strategy and synchronousWrites.
<cache>
  <persistence strategy="localRestartable|localTempSwap|none|distributed" synchronousWrites="false|true"/>
</cache>
For more information, see the Ehcache documentation.
If you use a "cache aside" pattern, where your application code is responsible for reloading the cached data before it expires, you will observe exactly the behaviour you desire, as long as the cache holds the mapping.
If you use a "cache through" pattern with a cache loader, then there is no way to obtain such behaviour: while the mapping is being reloaded, the key is locked, blocking other threads trying to get it at the same time.
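The "cache aside" variant can be sketched in plain Java (an illustration of the pattern, not Ehcache API): readers never block on the reload, they just keep seeing the previous snapshot.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Cache-aside sketch: readers always get the last published snapshot,
// even while a slow (15-second) reload is in progress.
final class StaleTolerantCache<T> {
    private final AtomicReference<T> current = new AtomicReference<>();
    private final Supplier<T> loader;

    StaleTolerantCache(Supplier<T> loader) {
        this.loader = loader;
        current.set(loader.get());   // initial blocking load
    }

    T get() {                        // never blocks on a refresh
        return current.get();
    }

    void refresh() {                 // run on a 15-minute schedule
        current.set(loader.get());   // swap in the new snapshot atomically
    }
}
```

refresh() can be driven by a ScheduledExecutorService every 15 minutes; a get() arriving during the 15-second reload simply returns the old snapshot, which is exactly the stale-tolerant behaviour asked for.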

memcached and performance

I might be asking a very basic question, but I could not find a clear answer by googling, so I am putting it here.
Memcached caches information in a separate process. Thus, getting the cached information requires inter-process communication (which generally means serialization in Java). That means that, generally, to fetch a cached object, we need to get a serialized object and transport it over the network.
Both serialization and network communication are costly operations. If memcached needs both of these (generally speaking; there might be cases where network communication is not required), then how is memcached fast? Is replication not a better solution?
Or is this a tradeoff of distribution/platform independence/scalability vs performance?
You are right that looking something up in a shared cache (like memcached) is slower than looking it up in a local cache (which is what I think you mean by "replication").
However, the advantage of a shared cache is that it is shared, which means each user of the cache has access to more cache than if the memory was used for a local cache.
Consider an application with a 50 GB database, with ten app servers, each dedicating 1 GB of memory to caching. If you used local caches, then each machine would have 1 GB of cache, equal to 2% of the total database size. If you used a shared cache, then you have 10 GB of cache, equal to 20% of the total database size. Cache hits would be somewhat faster with the local caches, but the cache hit rate would be much higher with the shared cache. Since cache misses are astronomically more expensive than either kind of cache hit, slightly slower hits are a price worth paying to reduce the number of misses.
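The arithmetic in that example can be checked directly. The hit and miss latencies below are illustrative assumptions (local hit 1 µs, shared hit 100 µs, database miss 1000 µs), not measurements:

```java
// Ten 1 GB local caches vs one pooled 10 GB shared cache against a
// 50 GB database, assuming uniformly distributed accesses.
class CacheMath {
    static double expectedCostUs(double hitRate, double hitUs, double missUs) {
        return hitRate * hitUs + (1 - hitRate) * missUs;
    }

    public static void main(String[] args) {
        double localHitRate  = 1.0 / 50.0;       // 2% of the DB per machine
        double sharedHitRate = 10 * 1.0 / 50.0;  // 20% pooled
        double localCost  = expectedCostUs(localHitRate, 1, 1000);    // ~980 us
        double sharedCost = expectedCostUs(sharedHitRate, 100, 1000); // 820 us
        System.out.printf("local: %.0f%% hits, ~%.0f us/access%n",
                localHitRate * 100, localCost);
        System.out.printf("shared: %.0f%% hits, ~%.0f us/access%n",
                sharedHitRate * 100, sharedCost);
    }
}
```

Even with hits 100x slower, the shared cache wins on expected cost per access because misses dominate; change the assumed latencies or the access distribution and the balance can tilt back, as the next paragraph notes.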
Now, the exact tradeoff does depend on the exact ratio of the costs of a local hit, a shared hit, and a miss, and also on the distribution of accesses over the database. For example, if all the accesses were to a set of 'hot' records that were under 1 GB in size, then the local caches would give a 100% hit rate, and would be just as good as a shared cache. Less extreme distributions could still tilt the balance.
In practice, the optimum configuration will usually (IMHO!) be to have a small but very fast local cache for the hottest data, then a larger and slower cache for the long tail. You will probably recognise that as the shape of other cache hierarchies: consider the way that processors have small, fast L1 caches for each core, then slower L2/L3 caches shared between all the cores on a single die, then perhaps yet slower off-chip caches shared by all the dies in a system (do any current processors actually use off-chip caches?).
You are neglecting the cost of disk I/O in your consideration, which is generally going to be the slowest part of any process, and is the main driver, IMO, for using in-memory caching like memcached.
Memory caches use RAM over the network; replication uses both RAM and persistent disk to fetch data. Their purposes are very different.
If you're only thinking of using memcached to store easily obtainable data, such as a 1-1 mapping of table records, you're going to have a bad time.
On the other hand, if your data is the entire result set of a complex SQL query that may even overflow the SQL memory pool (and need to be temporarily written to disk to be fetched), you're going to see a big speed-up.
The previous example mentions needing to write data to disk for a read operation: yes, it happens if the result set is too big for memory (imagine a CROSS JOIN), which means that you both read and write to that drive (thrashing comes to mind).
In a highly optimized application written in C, for example, you may have a total processing time of 1 microsecond, and you may need to wait for networking and/or serialization/deserialization (marshalling/unmarshalling) for much longer than the app execution time itself. That's when you'll begin to feel the limitations of memory caching over the network.

JCS cache shutdown: guaranteed persistence to disk

I am using JCS for caching, and I am using the disk cache to temporarily store all the data. The problem is that when I use JCS, the keys are written to disk only if the cache is properly shut down.
I am using the disk usage pattern UPDATE, which tells JCS to immediately write data to the disk without keeping it in memory. But the problem is that we are not maintaining the key list of objects in the cache, so I use group cache access, get the keys from the cache, and then iterate through the keys to get the results.
So now I am caught in a situation where I have to shut down the cache properly, i.e. after all the data has been written to disk by the indexed disk cache. But there is a complication here: the indexed disk cache uses a background thread to write to disk, and that thread does not report its status.
So I am unable to guarantee to my front-end implementation that the indexed disk cache has written the data to disk. Is there a way to tackle this situation? Right now I just sleep some arbitrary time (say 10 seconds) before the cache is shut down, which is a very stupid way of doing it.
Edit: I am facing this issue with the memory cache as well, but there a sleep of one second is mostly enough for 500 MB of data. The disk cache case is a little different.
It could be because your objects are held in memory, waiting to be written to disk. If you need the objects written to disk immediately during execution, then you need to set MaxObjects to 0 in your cache config:
jcs.region.<yourRegion>.cacheattributes.MaxObjects=0
jcs.region.<yourRegion>.cacheattributes.DiskUsagePattern=UPDATE
I know you are already aware of UPDATE; adding it for reference again.
