We currently have a publisher/consumer service where the consumer writes the messages it receives to AWS S3. We are writing more than 100,000,000 objects per month.
However, we can group these messages according to some rules in order to save some money.
These rules can be something like:
If we have received 10 messages from User 1 -> group them and write to S3.
If we have received < 10 messages from User 1 and more than 5 seconds have elapsed since the last message, flush to S3.
If the "internal" queue, is bigger than N, start to flush
What we don't want is to eat up our memory. Because of that, I am looking for the best approach from a design patterns perspective, taking into consideration that we are talking about a highly loaded system, so we don't have infinite memory resources.
Thanks!
Well, based on your further explanation in the comments, there are related algorithms called Leaky bucket and Token bucket. Their primary purpose is slightly different, but you may consider using some modification of them - in particular, you may view the "droplets leaking out of the bucket" as a regular commit that flushes all the messages of a single user to S3 in one batch.
So the modification would be more or less like this (please read the description of the algorithms first):
You have a bucket per user (you may easily afford it, as you have only about 300 users)
Each bucket is filled with the messages coming from that user
You regularly let each bucket leak (flush all of its messages, or just a limited batch)
I guess that it somehow follows what your original requirement might have been.
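As a rough illustration of that modification, here is a minimal sketch of a per-user bucket with the three flush rules from the question. The thresholds, the String payload type and the flushToS3 call are placeholders, not a real S3 integration:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class UserBuckets {
    private static final int MAX_PER_USER = 10;                  // rule 1: 10 messages per user
    private static final Duration IDLE = Duration.ofSeconds(5);  // rule 2: 5 s since last message
    private static final int MAX_TOTAL = 100_000;                // rule 3: global cap N (assumed value)

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    private final AtomicInteger totalBuffered = new AtomicInteger();

    public void onMessage(String userId, String message) {
        Bucket bucket = buckets.computeIfAbsent(userId, id -> new Bucket());
        int total;
        synchronized (bucket) {
            bucket.messages.add(message);
            bucket.lastMessageAt = Instant.now();
            total = totalBuffered.incrementAndGet();
            if (bucket.messages.size() >= MAX_PER_USER) flush(userId, bucket); // rule 1
        }
        if (total >= MAX_TOTAL) flushAll(); // rule 3: protect memory with a global cap
    }

    /** The "leak": run this from a scheduled task, e.g. once per second. */
    public void leak() {
        buckets.forEach((userId, bucket) -> {
            synchronized (bucket) {
                if (!bucket.messages.isEmpty()
                        && Duration.between(bucket.lastMessageAt, Instant.now()).compareTo(IDLE) > 0) {
                    flush(userId, bucket); // rule 2: idle user, flush what we have
                }
            }
        });
    }

    private void flushAll() {
        buckets.forEach((userId, bucket) -> {
            synchronized (bucket) {
                flush(userId, bucket);
            }
        });
    }

    // Must be called while holding the bucket's monitor.
    private void flush(String userId, Bucket bucket) {
        if (bucket.messages.isEmpty()) return;
        List<String> batch = new ArrayList<>(bucket.messages);
        bucket.messages.clear();
        totalBuffered.addAndGet(-batch.size());
        flushToS3(userId, batch); // placeholder: write one grouped S3 object per batch
    }

    private void flushToS3(String userId, List<String> batch) { /* hypothetical S3 write */ }

    private static class Bucket {
        final List<String> messages = new ArrayList<>();
        Instant lastMessageAt = Instant.now();
    }
}
```

A scheduled task calling leak() every second or so plays the role of the regular "dripping" of the bucket.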
I'm using IgniteDataStreamer with allowOverwrite to load continuous data.
Question 1.
From javadoc:
Note that streamer will stream data concurrently by multiple internal threads, so the data may get to remote nodes in different order from which it was added to the streamer.
Reordering is not acceptable in my case. Will setting perNodeParallelOperations to 1 guarantee that the order of addData calls is kept? There are a number of caches being loaded simultaneously with IgniteDataStreamer, so the Ignite server node threads will all be utilized anyway.
Question 2.
My streaming application could hang for a couple of seconds due to a GC pause. I want to avoid pausing cache loading at those moments and keep the average cache writing speed high. Is it possible to configure IgniteDataStreamer to keep a (bounded) queue of incoming batches on the server node that would be consumed while the streaming (client) app hangs? See question 1 - the queue should be consumed sequentially. It's OK to utilize some heap for it.
Question 3.
perNodeBufferSize javadoc:
This setting controls the size of internal per-node buffer before buffered data is sent to remote node
According to the javadoc, data transfer is triggered by tryFlush / flush / autoFlush, so how does this correlate with the perNodeBufferSize limitation? Would flush be ignored if there are fewer than perNodeBufferSize messages (I hope not)?
I don't recommend trying to avoid reordering in DataStreamer, but if you absolutely need to do that, you will also need to set the data streamer pool size to 1 on the server nodes. If it's larger, the data is split into stripes and not sent sequentially.
DataStreamer is designed for throughput, not latency. So there's not much you can do here. Increasing perThreadBufferSize, perhaps?
Data transfer is automatically started when perThreadBufferSize is reached for any stripe.
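To make that concrete, here is a minimal sketch of the configuration discussed above. The cache name, buffer sizes and client startup are assumptions, and setDataStreamerThreadPoolSize is assumed to be the relevant server-side knob; only perNodeParallelOperations(1) and the pool size of 1 come from the answer:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class OrderedStreamingSketch {
    public static void main(String[] args) {
        // Server side: a single data streamer thread so batches are not split into stripes.
        IgniteConfiguration serverCfg = new IgniteConfiguration()
                .setDataStreamerThreadPoolSize(1);
        // (start the server nodes with serverCfg)

        Ignite client = Ignition.start(new IgniteConfiguration().setClientMode(true));
        try (IgniteDataStreamer<Long, String> streamer = client.dataStreamer("myCache")) {
            streamer.allowOverwrite(true);
            streamer.perNodeParallelOperations(1); // at most one batch in flight per node
            streamer.perThreadBufferSize(512);     // batch size before an automatic send

            for (long i = 0; i < 1_000; i++)
                streamer.addData(i, "value-" + i);

            // flush() sends whatever is buffered, even if the buffer is not full.
            streamer.flush();
        }
    }
}
```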
I have been running a test for a large data migration to Dynamo that we intend to do in our prod account this summer. I ran a test to batch write about 3.2 billion documents to our Dynamo table, which has hash and range keys and two partial indexes. Each document is small, less than 1k. While we succeeded in getting the items written in about 3 days, we were disappointed with the Dynamo performance we experienced and are looking for suggestions on how we might improve things.
In order to do this migration, we are using 2 EC2 instances (c4.8xlarges). Each runs up to 10 processes of our migration program; we've split the work among the processes by some internal parameters and know that some processes will run longer than others. Each process queries our RDS database for 100,000 records, splits them into partitions of 25 each, and uses a thread pool of 10 threads to call the DynamoDB Java SDK's batchSave() method. Each call to batchSave() sends only 25 documents of less than 1k each, so we expect each to make a single HTTP call out to AWS. This means that at any given time we can have as many as 100 threads on a server, each calling batchSave with 25 records. Our RDS instance handled the load of queries just fine during this time, and so did our 2 EC2 instances; we did not max out CPU, memory, network in, or network out.

Our writes are not grouped by hash key, as we know that can slow down Dynamo writes. In general, a group of 100,000 records is split across 88,000 different hash keys. I created the Dynamo table with 30,000 write throughput initially, and scaled it up to 40,000 write throughput at one point during the test, so our understanding is that there are at least 40 partitions on the Dynamo side to handle this.
We saw very variable response times in our calls to batchSave() throughout this period. For one 20-minute span while I was running 100 threads per EC2 instance, the average time was 0.636 seconds but the median was only 0.374, so we have a lot of calls taking more than a second. I'd expect to see much more consistency in the time it takes to make these calls from an EC2 instance to Dynamo. Our Dynamo table seems to have plenty of throughput configured, the EC2 instances are below 10% CPU, and the network in and out look healthy but are not close to being maxed out. The CloudWatch graphs in the console (which are fairly terrible...) didn't show any throttling of write requests.
After I took these sample times, some of our processes finished their work, so we were running fewer threads on our EC2 instances. When that happened, we saw dramatically improved response times in our calls to Dynamo; e.g. when we were running 40 threads instead of 100 on an EC2 instance, each calling batchSave, the response times improved more than 5x. However, we did NOT see improved write throughput even with the better response times. It seems that no matter what we configured our write throughput to be, we never really saw the actual throughput exceed 15,000.
We'd like some advice on how best to achieve better performance on a Dynamo migration like this. Our production migration this summer will be time-sensitive, of course, and by then, we'll be looking to migrate about 4 billion records. Does anyone have any advice on how we can achieve an overall higher throughput rate? If we're willing to pay for 30,000 units of write throughput for our main index during the migration, how can we actually achieve performance close to that?
One component of BatchWrite latency is the Put request that takes the longest in the Batch. Considering that you have to loop over the List of DynamoDBMapper.FailedBatch until it is empty, you might not be making progress fast enough. Consider running multiple parallel DynamoDBMapper.save() calls instead of batchSave so that you can make progress independently for each item you write.
Again, CloudWatch metrics are 1-minute metrics, so you may have peaks of consumption and throttling that are masked by the 1-minute window. This is compounded by the fact that the SDK, by default, retries throttled calls 10 times before exposing the ProvisionedThroughputExceededException to the client, making it difficult to pinpoint when and where the actual throttling is happening. To improve your understanding, try reducing the number of SDK retries, requesting ConsumedCapacity=TOTAL, self-throttling your writes using Guava RateLimiter as described in the rate-limited scan blog post, and logging throttled primary keys to see if any patterns emerge.
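As a hedged sketch (AWS SDK v1 for Java) of combining those two suggestions - independent mapper.save() calls instead of batchSave(), self-throttled with a Guava RateLimiter - where the item type, thread count and rate are assumptions:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.google.common.util.concurrent.RateLimiter;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RateLimitedWriter {
    private final DynamoDBMapper mapper;
    private final RateLimiter limiter;
    private final ExecutorService pool;

    public RateLimitedWriter(double writeCapacityPerSecond, int threads) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
        this.mapper = new DynamoDBMapper(client);
        // Self-throttle below the provisioned WCU so the SDK rarely has to retry.
        this.limiter = RateLimiter.create(writeCapacityPerSecond);
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Assumption: T is a class annotated with @DynamoDBTable / @DynamoDBHashKey etc.
    public <T> void writeAll(List<T> items) {
        for (T item : items) {
            pool.submit(() -> {
                limiter.acquire();   // each item is < 1k, so roughly 1 WCU per save
                mapper.save(item);   // independent saves: one slow item no longer stalls a whole batch
            });
        }
    }
}
```

Setting the limiter just below the provisioned WCU means the client backs off before DynamoDB has to throttle, which keeps retry noise out of the latency numbers.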
Finally, the number of partitions of a table is not only driven by the amount of read and write capacity units you provision on your table. It is also driven by the amount of data you store in your table. Generally, a partition stores up to 10GB of data and then will split. So, if you just write to your table without deleting old entries, the number of partitions in your table will grow without bound. This causes IOPS starvation - even if you provision 40000 WCU/s, if you already have 80 partitions due to the amount of data, the 40k WCU will be distributed among 80 partitions for an average of 500 WCU per partition. To control the amount of stale data in your table, you can have a rate-limited cleanup process that scans and removes old entries, or use rolling time-series tables (slides 84-95) and delete/migrate entire tables of data as they become less relevant. Rolling time-series tables is less expensive than rate-limited cleanup as you do not consume WCU with a DeleteTable operation, while you consume at least 1 WCU for each DeleteItem call.
I have a scenario in which:
A HUGE input file with a specific format, delimited with \n, has to be read; it has almost 20 million records.
Each record has to be read and processed by sending it to a server in a specific format.
=====================
I am thinking about how to design it:
- Read the file (NIO)
- The thread that reads the file can put chunks of records into a JMS queue.
- Create n threads representing the n servers (to which the data is to be sent). These n threads, running in parallel, can each pick up one chunk at a time and process it by sending requests to the server.
Can you suggest whether the above is fine, or whether you see any flaw(s) :). It would also be great if you could suggest a better way / better technologies to do this.
Thank you!
Updated: I wrote a program to read that file with 20M records. Using Apache Commons IO (file iterator) I read the file in chunks (10 lines at a time), and it read the file in 1.2 seconds. How good is this? Should I think about going to NIO? (When I added a log to print the chunks, it took almost 26 seconds!)
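For reference, what the update describes looks roughly like this with Commons IO's LineIterator; the file name and the process() call are placeholders:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class ChunkedRead {
    public static void main(String[] args) throws Exception {
        // Lazily iterates the file 10 lines at a time, never holding all 20M records in memory.
        LineIterator it = FileUtils.lineIterator(new File("records.txt"), "UTF-8");
        try {
            List<String> chunk = new ArrayList<>(10);
            while (it.hasNext()) {
                chunk.add(it.nextLine());
                if (chunk.size() == 10) {
                    process(chunk); // placeholder: hand the chunk to the sending side
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) process(chunk);
        } finally {
            it.close();
        }
    }

    static void process(List<String> chunk) { /* hypothetical */ }
}
```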
20 million records isn't actually that many, so first off I would try just processing it normally; you may find performance is fine.
After that you will need to measure things.
You need to read from the disk sequentially for good speed there, so that part must be single-threaded.
You don't want the disk read waiting for the networking or the networking waiting for the disk reads, so dropping the data read into a queue is a good idea. You probably will want a chunk size larger than one line, though, for optimum performance. Measure the performance at different chunk sizes to see.
You may find that network sending is faster than disk reading already. If so, then you are done; if not, then at that point you can spin up more threads reading from the queue and test with them.
So your tuning factors are:
chunk size
number of threads.
Make sure you measure performance over a decent sized amount of data for various combinations to find the one that works best for your circumstances.
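A minimal sketch of that single-reader / queue / multi-sender layout, with the two tuning factors pulled out as constants; the file name, chunk size, thread count and send() call are assumptions:

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkedPipeline {
    static final int CHUNK_SIZE = 1_000;                   // tuning factor 1: lines per chunk
    static final int SENDER_THREADS = 4;                   // tuning factor 2: number of sender threads
    static final List<String> POISON = new ArrayList<>();  // end-of-stream marker

    public static void main(String[] args) throws Exception {
        // Bounded queue: the reader blocks instead of eating memory when senders fall behind.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);

        ExecutorService senders = Executors.newFixedThreadPool(SENDER_THREADS);
        for (int i = 0; i < SENDER_THREADS; i++) {
            senders.submit(() -> {
                List<String> chunk;
                while ((chunk = queue.take()) != POISON) {
                    send(chunk); // placeholder: push the chunk to the downstream server
                }
                return null;
            });
        }

        // Single reader thread keeps disk access sequential.
        try (BufferedReader reader = Files.newBufferedReader(Path.of("records.txt"))) {
            List<String> chunk = new ArrayList<>(CHUNK_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == CHUNK_SIZE) {
                    queue.put(chunk);
                    chunk = new ArrayList<>(CHUNK_SIZE);
                }
            }
            if (!chunk.isEmpty()) queue.put(chunk);
        }
        for (int i = 0; i < SENDER_THREADS; i++) queue.put(POISON);
        senders.shutdown();
    }

    static void send(List<String> chunk) { /* hypothetical network call */ }
}
```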
I believe you could batch the records instead of sending them one at a time. You could avoid unnecessary network hops given the volume of data that needs to be processed by the server.
I am looking to get some ideas on how I can solve my failover problem in my Java service.
At a high level, my service receives 3 separate object streams of data from another service, performs some amalgamating logic and then writes to a datastore.
Each object in the stream will have a unique key. Data from the 3 streams can arrive simultaneously, there is no guaranteed ordering.
After the data arrives, it will be stored in some java.util.concurrent collection, such as a BlockingQueue or a ConcurrentHashMap.
The problem is that this service must support failover, and I am not sure how to resolve this if failover were to take place when data is stored in an in-memory object.
One simple idea I have is the following:
Write to a file/elsewhere when receiving an object, prior to adding it to the queue
When an object is finally processed and stored in the datastore, mark it as done in that file
When failover occurs, ensure the same file is copied across so we know which objects we still need to receive data for
Performance is a big factor in my service, and as IO is expensive this seems like a crude and quite simplistic approach.
Therefore, I am wondering if there are any libraries etc out there that can solve this problem easily?
I would use Java Chronicle partly because I wrote it but mostly because ...
it can write and read millions of entries per second to disk in a text or a binary format.
it can be shared between processes, e.g. for active-active clustering with sub-microsecond latency.
it doesn't require a system call or flush to push out the data.
the producer is not slowed by the consumer and can be GBs ahead of it (more than the total memory of the machine)
it can be used in a low-heap, GC-less and lock-less manner.
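A minimal sketch of the journaling idea using the current Chronicle Queue API; the directory name, entry format and replay logic are placeholders, and the exact class and method names may differ between Chronicle versions:

```java
import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class JournalSketch {
    public static void main(String[] args) {
        // Persisted, memory-mapped journal; entries survive a process crash.
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("journal-dir").build()) {
            ExcerptAppender appender = queue.acquireAppender();
            // Journal each incoming object before putting it in the in-memory collection.
            appender.writeText("{\"key\":\"abc\",\"stream\":1}");

            // On failover, the standby replays the journal to rebuild state
            // and work out which objects it still needs data for.
            ExcerptTailer tailer = queue.createTailer();
            String entry;
            while ((entry = tailer.readText()) != null) {
                // placeholder: rebuild in-memory state from 'entry'
            }
        }
    }
}
```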
I'm designing a system that will use jms and some messaging software (I'm leaning towards ActiveMQ) as middleware. There will be less than 100 agents, each pushing at most 5000 messages per day through the queue.
The payload per message will be around 100 bytes each. I expect roughly half (2500) of the messages to cluster around midnight and the other half will be somewhat evenly distributed during the day.
The figures given above are all on the higher end of what I expect. (Yeah, I'll probably eat that statement in a near future).
There is one type of message where the payload will be considerably larger, say in the range of 5-50mb.
These messages will only be sent a few times per day from each agent.
My questions are:
Will this cause me problems in any way or is it perfectly normal to send larger amounts of data through a message queue?
For example, will it reduce throughput (smaller messages queuing up) while dealing with the larger messages?
Or will the message queue choke on larger messages?
Or should I approach this in a different way, say sending the location of the data through jms, and let the end receiver pick up the data elsewhere?
(I was hoping not to have a special case due to coupling, security issues, and extra configuration).
I'm completely new to the practical details of jms, so just tell me if I need to provide more details.
Edited:
I accepted Andres' truly awesome answer. Keep posting advice and opinions; I will keep upvoting everything useful.
Larger messages will definitely have an impact, but the sizes you mention here (5-50MB) should be manageable by any decent JMS server.
However, consider the following. While processing a particular message, the entire message is read into memory. So if 100 agents each send a 50MB message to a different queue at around the same time, or at different times but the messages take long to dequeue, you could run into a situation where you are trying to put 5000MB worth of messages into memory. I have run into similar problems with 4MB messages with ActiveMQ in the past, however there were more messages being sent than the figures mentioned here. If the messages are all sent to the same (persistent) queue, this should not be a problem, as only the message being processed needs to be in memory.
So it depends on your setup. If the theoretical upper limit of 5000MB is manageable for you (and keep the 32-bit JVM limit of 2000MB in mind) then go ahead; however, this approach clearly does not scale very well, so I wouldn't suggest it. If everything is sent to one persistent queue it would probably be fine, but I would recommend putting a prototype under load first to make sure. Processing might be slow, but not necessarily slower than if the data were fetched by some other mechanism. Either way, I would definitely recommend sending the smaller messages to separate destinations where they can be processed in parallel with the larger messages.
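As a rough sketch of that last recommendation with plain JMS and ActiveMQ, routing by payload size to separate queues; the broker URL, queue names and the 1 MB threshold are assumptions:

```java
import javax.jms.BytesMessage;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SizeRoutedProducer {
    // Assumed threshold: anything above it goes to a dedicated "large" queue so the
    // big payloads never sit in front of the small, frequent messages.
    private static final int LARGE_THRESHOLD_BYTES = 1024 * 1024;

    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer smallProducer = session.createProducer(session.createQueue("agent.small"));
            MessageProducer largeProducer = session.createProducer(session.createQueue("agent.large"));

            byte[] payload = loadPayload(); // placeholder for the agent's data
            BytesMessage message = session.createBytesMessage();
            message.writeBytes(payload);

            if (payload.length > LARGE_THRESHOLD_BYTES) {
                // persistent, never-expiring delivery for the rare big messages
                largeProducer.send(message, DeliveryMode.PERSISTENT, Message.DEFAULT_PRIORITY, 0);
            } else {
                smallProducer.send(message);
            }
        } finally {
            connection.close();
        }
    }

    private static byte[] loadPayload() { return new byte[0]; }
}
```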
We are running a similar scenario with a higher number of messages. We did it similarly to Andres' proposal, using different queues for the large number of smaller messages (which are still ~3-5MB in our scenario) and the few big messages that are around 50-150 MB.
In addition to the memory problems already cited, we also encountered general performance issues on the message broker when processing a huge number of persistent large messages. This is caused by the need to persist these messages to the filesystem somehow; we ran into bottlenecks on this side.
Of course the message size has an impact on the throughput (in msgs/sec): the larger the messages, the lower the throughput.