I am looking to get some ideas on how I can solve my failover problem in my Java service.
At a high level, my service receives 3 separate object streams of data from another service, performs some amalgamating logic and then writes to a datastore.
Each object in the stream will have a unique key. Data from the 3 streams can arrive simultaneously, there is no guaranteed ordering.
After the data arrives, it will be stored in some java.util.concurrent collection, such as a BlockingQueue or a ConcurrentHashMap.
The problem is that this service must support failover, and I am not sure how to resolve this if failover were to take place when data is stored in an in-memory object.
One simple idea I have is the following:
Write each object to a file (or some other durable store) when it is received, and prior to adding it to the queue
When an object is finally processed and stored in the datastore, mark it as done in that file
When failover occurs, ensure that same file is copied across so we know which objects we still need to receive data for
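A minimal sketch of that journaling idea using only the JDK (the class name and the line-based record format are my own invention, not an existing library):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashSet;
import java.util.Set;

// Append one line per event; replaying the file after failover recovers
// the keys that were received but never made it to the datastore.
class RecoveryJournal {
    private final Path file;

    RecoveryJournal(Path file) { this.file = file; }

    // Call before adding the object to the in-memory queue.
    void received(String key) throws IOException { append("RECV " + key); }

    // Call after the object has been written to the datastore.
    void stored(String key) throws IOException { append("DONE " + key); }

    private void append(String line) throws IOException {
        Files.write(file, (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // On failover: keys received but not yet stored.
    Set<String> pending() throws IOException {
        Set<String> open = new LinkedHashSet<>();
        if (!Files.exists(file)) return open;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.startsWith("RECV ")) open.add(line.substring(5));
            else if (line.startsWith("DONE ")) open.remove(line.substring(5));
        }
        return open;
    }
}
```

A real implementation would batch or memory-map the appends rather than opening the file per write, which is exactly the sort of thing the libraries below do for you.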
Performance is a big factor in my service, and since IO is expensive this seems like a crude, simplistic approach.
Therefore, I am wondering if there are any libraries out there that can solve this problem more cleanly?
I would use Java Chronicle partly because I wrote it but mostly because ...
it can write and read millions of entries per second to disk in a text or a binary format.
it can be shared between processes, e.g. for active-active clustering with sub-microsecond latency.
it doesn't require a system call or flush to push out the data.
the producer is not slowed by the consumer and can run GBs ahead of it (more than the total memory of the machine).
it can be used in a low-heap, GC-less and lock-less manner.
Related
I'm using IgniteDataStreamer with allowOverwrite to load continuous data.
Question 1.
From javadoc:
Note that streamer will stream data concurrently by multiple internal threads, so the data may get to remote nodes in different order from which it was added to the streamer.
Reordering is not acceptable in my case. Will setting perNodeParallelOperations to 1 guarantee that the order of addData calls is kept? There are a number of caches being loaded simultaneously with IgniteDataStreamer, so the Ignite server node threads will all be utilized anyway.
Question 2.
My streaming application could hang for a couple of seconds due to GC pauses. I want to avoid pausing cache loading at those moments and keep a high average cache writing speed. Is it possible to configure IgniteDataStreamer to keep a (bounded) queue of incoming batches on the server node, which would be consumed while the streaming (client) app hangs? See question 1: the queue should be consumed sequentially. It's OK to utilize some heap for it.
Question 3.
perNodeBufferSize javadoc:
This setting controls the size of internal per-node buffer before buffered data is sent to remote node
According to the javadoc, data transfer is triggered by tryFlush / flush / autoFlush, so how does it correlate with the perNodeBufferSize limitation? Would a flush be ignored if there are fewer than perNodeBufferSize messages buffered (I hope not)?
I don't recommend trying to avoid reordering in DataStreamer, but if you absolutely need to do that, you will also need to set the data streamer pool size to 1 on server nodes. If it's larger, the data is split into stripes and not sent sequentially.
DataStreamer is designed for throughput, not latency. So there's not much you can do here. Increasing perThreadBufferSize, perhaps?
Data transfer is automatically started when perThreadBufferSize is reached for any stripe.
I am new to Java. I'm a C++ programmer and have been studying Java for 2 months now.
Sorry for my poor English.
My question is whether the Akka actor model needs a memory pool or object pool. I think that if I send messages from one actor to another, I have to allocate some heap memory (e.g. new String, new BigInteger, and so on), and as time goes on the garbage collector will kick in (I'm not sure exactly when it starts), which will make my application slow.
So I searched for a way to build a memory pool and failed (Java does not support memory pools). I could build an object pool, but in other projects I did not find anybody using an object pool with actors (nor on the Akka homepage).
Are there any documents about this topic on the Akka homepage? Please tell me the link or the solution to my question.
Thanks.
If, as is likely, you are using Akka across multiple computers, messages are serialized on the wire and sent to the other instance. This means that a local memory pool alone won't suffice.
While it's technically possible to write a custom JSerializer implementation (see the docs) that stores local messages in a memory pool after deserializing them, that feels like overkill for most applications (and it is easy to get wrong, actually worsening performance through lookup times in the map).
Yes, when the GC kicks in, the app will lag a bit under heavy loads. But in 95% of the scenarios, especially under a performant framework like Akka, GC will not be your bottleneck: IO will.
I'm not saying you shouldn't do it. I'm saying that before you take on the task, given its non-triviality, you should measure the impact of GC on your app at runtime with tools like Kamon or other Akka-specialized monitoring solutions, and only go for it once you are sure it's worth it.
Using an ArrayBlockingQueue to hold a pool of your objects should help.
Here is some example code.
To create a pool and insert an instance of a pooled object into it:
BlockingQueue<YOURCLASS> queue = new ArrayBlockingQueue<YOURCLASS>(256); // adjust 256 to your desired count; an ArrayBlockingQueue's capacity cannot be changed once it is initialized
queue.put(YOUROBJ); // this should be in the code that instantiates the pool
And later, where you need it (in your actor that receives a message):
YOURCLASS instanceName = queue.take();
You might have to write some code around this to create and manage the pool.
But this is the gist of it.
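A minimal wrapper that manages the pool might look like this (the class and method names are illustrative, not from Akka; it falls back to allocating when the pool is empty so callers never block):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Pre-fills a bounded queue with instances; hands them out and takes them back.
class ObjectPool<T> {
    private final BlockingQueue<T> pool;
    private final Supplier<T> factory;

    ObjectPool(int size, Supplier<T> factory) {
        this.pool = new ArrayBlockingQueue<>(size);
        this.factory = factory;
        for (int i = 0; i < size; i++) pool.offer(factory.get());
    }

    T borrow() {
        T obj = pool.poll();                      // non-blocking take
        return obj != null ? obj : factory.get(); // pool exhausted: allocate
    }

    void release(T obj) {
        pool.offer(obj); // silently dropped if the pool is already full
    }
}
```

Remember that pooled objects must be reset between uses, and that sharing mutable pooled objects between actors reintroduces exactly the concurrency hazards the actor model is meant to avoid.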
One can use object pooling to minimise the long tail of latency (at the cost of the median in a multithreaded environment). Consider using appropriate queues, e.g. from JCTools, Disruptor, or Agrona. Don't forget the rules of engagement for state exchange via mutable state shared between multiple threads in pooled objects - https://youtu.be/nhYIEqt-jvY (the best content I was able to find).
Again, don't expect to improve throughput using such slightly dangerous techniques. You will lose L1-L3 cache efficiency and will pollute the pipeline with barriers.
Bit of tangent (to get sense of low latency technology):
One may consider a GC implementation with lower latency if you want to stick with Akka, or use a custom reactive model where the object pool is used by a single thread, or where memory is copied over, e.g. the Disruptor's approach.
Another alternative is using memory regions (the way the Erlang VM works). It creates garbage, but in a form that is easy for the GC to handle!
If you go for very low-latency IO where latency is the biggest enemy: forget legacy TCP (use RDMA over InfiniBand); prefer switchless fabrics over switches; avoid accessing disk through OS calls and the file system (use RDMA); avoid interrupts shared by the same core; pin threads (spinning for input rather than sleeping) to real CPU cores (not virtual/hyperthreads); avoid inter-NUMA communication; deliver to multiple consumers via hardware multicast (or better, an optical switch) rather than one message at a time; and don't forget to turn on the Epsilon GC for the JVM ;)
DynamoDB allows only 25 requests per batch. Is there any way to increase this in Java, as I have to process thousands of records per second? Is there any solution better than dividing them into batches and processing them?
The 25 items per BatchWriteItem is a hard DynamoDB limit, as documented here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
There is nothing preventing you from doing multiple BatchWrites in parallel. The thing that is going to gate how much you can write is the write-provisioned-throughput on the table.
BatchWrites in DynamoDB were introduced to reduce the number of round trips required to perform multiple write operations for languages that do not provide opportunities for parallel threads to perform the work, such as PHP.
While you will still get better performance because of the reduced round trips by using the batch API, there is still the possibility that individual writes can fail and your code will need to look for those. A robust way to perform massively parallel writes using Java would be to use the ExecutorService class. This provides a simple mechanism to use multiple threads to perform the inserts. However, just as individual items within a batch can fail, you will want to track the Future objects to ensure the writes are performed successfully.
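A sketch of that approach, with the actual DynamoDB call replaced by a stub (ParallelBatchWriter and writeBatch are illustrative names, not SDK API; a real writeBatch would call BatchWriteItem and retry its UnprocessedItems):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Split the items into 25-item batches and submit each batch to a thread pool,
// then join the Futures so per-batch failures are surfaced.
class ParallelBatchWriter {
    static final int BATCH_SIZE = 25; // hard per-request DynamoDB limit

    static <T> List<List<T>> chunk(List<T> items) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += BATCH_SIZE)
            batches.add(items.subList(i, Math.min(i + BATCH_SIZE, items.size())));
        return batches;
    }

    static <T> int writeAll(List<T> items, ExecutorService pool) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<T> batch : chunk(items))
            futures.add(pool.submit(() -> writeBatch(batch)));
        int written = 0;
        for (Future<Integer> f : futures)
            written += f.get(); // get() rethrows any per-batch failure
        return written;
    }

    // Stub standing in for the real BatchWriteItem call.
    static <T> int writeBatch(List<T> batch) { return batch.size(); }
}
```

The thread pool size then becomes the knob you tune against the table's provisioned write throughput.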
Another way to improve throughput is to run your code on EC2. If you are calling DynamoDB from your laptop or a datacenter outside of AWS the round trip time will take longer and the requests will be slightly slower.
The bottom line is to use standard Java multi-threading techniques to get the performance you want. However, past a certain point you may need to fan out and use additional hardware to drive even higher write OPS.
Whenever you've got a large stream of real-time data that needs to end up in AWS, Kinesis Streams are probably the way to go. Particularly with AWS Kinesis Firehose, you can pipe your data to S3 at massive scale with no administrative overhead. You can then use DataPipeline to move it to Dynamo.
I am currently involved in a project to move our search algorithm, which is based on the Dijkstra/A* algorithm, to a production system. At a high level, the algorithm receives a request and searches until the optimal solution is found, which usually takes a few seconds. The problem is that the prototype version of our algorithm relies on the JDK priority queue (which is basically a binary heap), which consumes a large amount of memory during the search. So one of the big problems is how to handle the scalability of the system if we want to put the algorithm into production, handling multiple requests concurrently. We are trying to figure out the best option to do that, and the ideas flying through our minds are:
The most trivial approach is to create a new instance of the algorithm each time a request is received, but that does not look like an efficient way to solve the problem (we would require a large amount of RAM for each instance).
Use some kind of persistent and efficient store/database to move part of the elements of the queue there when the queue grows too big. This can alleviate the memory problem, but new problems arise, like keeping the order between the elements in the in-memory queue and the elements in the store.
Delegate the task of handling the queue to a big framework, like Hazelcast. Each instance of the algorithm can use a distributed queue with hazelcast. The problem is that Hazelcast does not have any kind of sorted queues, so we have to explicitly handle the order of the queue from outside the queue, which is a big performance issue.
We are also considering the idea of using ActiveMQ, although the framework is not designed for this sort of problem. The priority queue of ActiveMQ manages only 9 different priorities, which is not enough for our problem, as we sort the elements in the queue based on a float value (effectively infinitely many priorities).
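To make the second idea concrete, here is a toy sketch of a priority queue that spills its worst half to a secondary store when the in-memory heap grows past a cap (everything here is my own illustration; the List stands in for a disk-backed store, and spillMin is what preserves the global order between memory and spill):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Bounded in-memory min-heap of search costs; overflow goes to "spill".
// Invariant: spillMin is the smallest cost held in spill, so any in-memory
// head smaller than spillMin is the global minimum.
class SpillingPriorityQueue {
    private final PriorityQueue<Double> memory = new PriorityQueue<>();
    private final List<Double> spill = new ArrayList<>(); // disk-store stand-in
    private final int cap; // should be >= 2
    private double spillMin = Double.POSITIVE_INFINITY;

    SpillingPriorityQueue(int cap) { this.cap = cap; }

    void offer(double cost) {
        memory.offer(cost);
        if (memory.size() > cap) spillWorstHalf();
    }

    private void spillWorstHalf() {
        List<Double> all = new ArrayList<>(memory);
        Collections.sort(all);
        memory.clear();
        memory.addAll(all.subList(0, cap / 2)); // keep the best half hot
        List<Double> rest = all.subList(cap / 2, all.size());
        spill.addAll(rest);
        spillMin = Math.min(spillMin, rest.get(0));
    }

    // Returns the globally smallest cost, reloading from spill when needed.
    Double poll() {
        if (!spill.isEmpty() && (memory.isEmpty() || spillMin < memory.peek()))
            reload();
        return memory.poll();
    }

    private void reload() {
        spill.addAll(memory); // merge, re-sort, keep the best half in memory
        Collections.sort(spill);
        memory.clear();
        int keep = Math.min(Math.max(1, cap / 2), spill.size());
        memory.addAll(spill.subList(0, keep));
        List<Double> rest = new ArrayList<>(spill.subList(keep, spill.size()));
        spill.clear();
        spill.addAll(rest);
        spillMin = spill.isEmpty() ? Double.POSITIVE_INFINITY : spill.get(0);
    }
}
```

A real implementation would replace the full re-sorts with sorted runs on disk (essentially external merge sort), but the spillMin bookkeeping is the part that keeps correctness when mixing the two tiers.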
We are completely lost in this architecture design problem. Any advice is welcome.
I'm evaluating Terracotta to help me scale up an application which is currently RAM-bound. It is a collaborative filter and stores about 2 kilobytes of data per user. I want to use Amazon's EC2, which means I'm limited to 14GB of RAM, giving me an effective per-server upper bound of around 7 million users. I need to be able to scale beyond this.
Based on my reading so far, I gather that Terracotta can have a clustered heap larger than the available RAM on each server. Would it be viable to have an effective clustered heap of 30GB or more, where each of the servers only has 14GB?
The per-user data (the bulk of which are arrays of floats) changes very frequently, potentially hundreds of thousands of times per minute. It isn't necessary for every single one of these changes to be synchronized to other nodes in the cluster the moment they occur. Is it possible to only synchronize some object fields periodically?
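To illustrate the kind of periodic synchronization I mean, here is a sketch using plain JDK classes (the names are mine, and the Consumer stands in for whatever actually pushes to the cluster; this is not Terracotta's API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Mutations only mark a user dirty; a scheduled task ships the dirty subset,
// so hundreds of updates to one user between flushes cost one transfer.
class PeriodicSyncer {
    private final Set<String> dirty = ConcurrentHashMap.newKeySet();
    private final Map<String, float[]> local = new ConcurrentHashMap<>();
    private final Consumer<Map<String, float[]>> replicate; // cluster-push stand-in

    PeriodicSyncer(Consumer<Map<String, float[]>> replicate) { this.replicate = replicate; }

    void update(String userId, float[] vector) {
        local.put(userId, vector);
        dirty.add(userId); // no network traffic on the hot path
    }

    void flushNow() {
        Map<String, float[]> batch = new HashMap<>();
        for (String id : dirty) {   // weakly consistent iteration, safe to remove
            dirty.remove(id);
            batch.put(id, local.get(id));
        }
        if (!batch.isEmpty()) replicate.accept(batch);
    }

    ScheduledFuture<?> start(ScheduledExecutorService ses, long periodMs) {
        return ses.scheduleAtFixedRate(this::flushNow, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }
}
```

The question is essentially whether Terracotta can be configured to behave like this, or whether every field change is shipped eagerly.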
I'd say the answer is a qualified yes for this. Terracotta does allow you to work with clustered heaps larger than the size of a single JVM although that's not the most common use case.
You still need to keep in mind a) the working set size and b) the amount of data traffic. For a), there is some set of data that must be in memory to perform the work at any given time and if that working set size > heap size, performance will obviously suffer. For b), each piece of data added/updated in the clustered heap must be sent to the server. Terracotta is best when you are changing fine-grained fields in pojo graphs. Working with big arrays does not take the best advantage of the Terracotta capabilities (which is not to say that people don't use it that way sometimes).
If you are creating a lot of garbage, then the Terracotta memory managers and the distributed garbage collector have to be able to keep up with it. It's hard to say without trying whether your data volumes exceed the available bandwidth there.
Your application will benefit enormously if you run multiple servers and data is partitioned by server or has some amount of locality of reference. In that case, you only need the data for one server's partition in heap and the rest does not need to be faulted into memory. It will of course be faulted if necessary for failover/availability if other servers go down. What this means is that in the case of partitioned data, you are not broadcasting to all nodes, only sending transactions to the server.
From a numbers point of view, it is possible to index 30GB of data, so that's not close to any hard limit.