How to integrate LMAX within a real financial application

How to integrate LMAX within a real financial application - java

I am also thinking of integrating the disruptor pattern in our application. I am a bit unsure about a few things before I start using the disruptor
I have 3 producers, mainly a FIX thread which de-serialises the requests. Another thread which continously modifies order price as the market moves. Also we have one more thread which is responsible for de-serialising the requests sent from a GUI application. All three threads currently write to a Blocking Queue (hence we see a lot of contention on the queue)
The disruptor talks about a Single writer principle and from what I have read that approach scales the best. Is there any way we could make the above three threads obey the single writer principle?
Also in a typical request/response application, specially in our case we have contention on an in memory cache, as we need to lock the cache when we update the cache with the response, whilst a request might be happening for the same order. How do we handle this through the disruptor, i.e. how do I tie up a response to a particular request? Can I eliminate the lock on the cache if yes how?
Any suggestions/pointers would be highly appreciated. We are currently using Java 1.6

I'm new to distruptor and am trying to understand as much usecases as possible. I have tried to answer your questions.
Yes, Disruptor can be used to sequence calls from multiple
producers. I understand that all 3 threads try to update the state
of a shared object. And a single consumer which takes necessary action on the shared object. Internally you can have the single consumer delegate calls to the appropriate single threaded handler based on responsibility. The
The Disruptor exactly does this. It sequences the calls such that
the state is accessed only by a thread at a time. If there's a specific order in which the event handlers are to be invoked, set up the memory barrier. The latest version of Disruptor has a DSL that lets you setup the order easily.
The Cache can be abstracted and accessed through the Disruptor. At a time, only a
Reader or a Writer would get access to the cache, since all calls to
the cache are sequential.

Related

Best practices with Akka in Scala and third-party Java libraries

I need to use memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was a question here about dealing with Java Futures in Scala here How do I wrap a java.util.concurrent.Future in an Akka Future?. However in my case I have two options:
Using synchronous API and wrapping blocking code in future and mark blocking:
Future {
blocking {
cache.get(key) //synchronous blocking call
}
}
Using asynchronous Java API and do polling every n ms on Java Future to check if the future completed (like described in one of the answers above in the linked question above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't blocking { } block prevent from blocking the whole pool?

I always go with the first option. But i am doing it in a slightly different way. I don't use the blocking feature. (Actually i have not thought about it yet.) Instead i am providing a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// i create a separate ec for each blocking client/resource/api i use
Future {
cache.get(key) //synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls will use a dedicated execution context (= Threadpool). So it is separated from your main execution context responsible for non blocking stuff.
This approach is also explained in a online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope i spelled it correctly), who is a well known author for Scala books.
Update: I had a discussion with Nilanjan on twitter. He explained what the difference between the approach with blocking and a custom ExecutionContext is. The blocking feature just creates a special ExecutionContext. It provides a naive approach to the question how many threads you will need. It spawns a new thread every time, when all the other existing threads in the pool are busy. So it is actually an uncontrolled ExecutionContext. It could create lots of threads and lead to problems like an out of memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.

The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the “blocking
problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either
dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded
number of tasks of this nature will exhaust your memory or thread
limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the
hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they
occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.

Common practices to avoid timeouts / starvation in Java?

I have a web-service that write files to disk and other stuff to database. The entire operation takes 1-2 seconds for each write.
The service can, bur that is unlikely, be called from several clients at the same time. Let´s assume that 20 clients call the webservice at the same time, the write operations must be synchronized. In that case, some clients can get a time out exception because they have to wait to many seconds.
Are there any good practices to solve these kind of situations? As it is now, the methods are synchronized (and that can cause the starvation/timeouts).
Should I let all threads get into the write method by removing the synchronized keyword and put their task into a task queue to avoid a timeout? Is that the correct way to get arount this?

Removing the synchronized and putting it into a task queue by itself will not help you (because that's effectively what the synchronized is doing for you). However if you respond to the web request as soon as you put it on the queue, then you will reduce your response fime. But at the cost of some reliability as the user will get a confirmation that the work is done and the work will not really have been done (the system could crash before the work is done).

Francis Upton's practice is indeed an accepted practice.
Another one, is making more fine grained synchronization. Instead of synchronizing all read/write methods of a class, you can synchronize access of the exact invariants that should be synchronized.
And yet even better, is to get rid of synchronization altogether. This is possible using the java.util.concurrent package. This package introduce new collections that use Non-Blocking Algorithms (implemented in java using Compare-Ans-Swap atomic instructions). These collections, such as ConcurrentHashMap, enable much better throughput when scaling.
You can read more about it in this article.

In this type of implementation (slow service under increasing load) you want to make as much as possible async, including the timeout processing (if server-based) and the required I/O. Don't hold up your client response threads waiting for either of these time-consuming operations, to preserve the server's responsiveness to new requests, but instead fire off the required operations (maybe to a dynamic thread pool) and let callbacks process the results, whether timeout, complete I/O, or errors.
Send the appropriate response depending on what happens first, but be prepared to roll back I/O if you send an error/timeout message and then a completed I/O arrives (due to a race condition between I/O and timer). This implies transactional semantics are required in the server.
This is an area that get increasingly complex as your load grows but good design early on should allow you to scale as load grows. Ideally the client servicing threads should not block at all.

Continuous parsing and processing of text

I have a class that's a listener to a log server. The listener gets notified whenever a log/text is spewed out. I store this text in an arraylist.
I need to process this text (remove duplicate words, store it in a trie, compare it against some patterns etc).
My question is should i be doing this as an when the listener is notified? Or should i be creating a separate thread that handles the processing.
What is the best way to handle this situation?

Sounds like you're trying to solve the Producer Consumer Problem, in which case - Yes, you should be looking at threads.
If, however, you only need to do very basic operations that take less than milliseconds per entry - don't overly complicate things. If you use a TreeSet in conjunction with an ArrayList - it will automatically take care of keeping duplicates out. Simple atomic operations such as validating the log entry aren't such a big deal that they need a seperate thread, unless new text is coming in at such a rapid rate that you need to need a thread to busy itself full time with processing new notifications.

The process that are not related to UI i always run that type of process in separate thread so it will not hang your app screen. So as my point of view you need to go with separate thread.

Such a situation can be solved using Queues. The simplest solution would be to have an unbounded blocking queue (a LinkedTransferQueue is tailored for such a case) and a limited size pool of worker threads.
You would add()/offer() the log entry from the listener's thread and take() for processing with worker threads. take() will block a thread if no log entries are available for processing.
P. S. A LinkedTransferQueue is designed for concurrent usage, no external synchronization is necessary: it's based on weak iterators, just like the Concurrent DS family.

Multiple SingleThreadExecutors for a given application...a good idea?

This question is about the fallouts of using SingleThreadExecutor (JDK 1.6). Related questions have been asked and answered in this forum before, but I believe the situation I am facing, is a bit different.
Various components of the application (let's call the components C1, C2, C3 etc.) generate (outbound) messages, mostly in response to messages (inbound) that they receive from other components. These outbound messages are kept in queues which are usually ArrayBlockingQueue instances - fairly standard practice perhaps. However, the outbound messages must be processed in the order they are added. I guess use of a SingleThreadExector is the obvious answer here. We end up having a 1:1 situation - one SingleThreadExecutor for one queue (which is dedicated to messages emanating from one component).
Now, the number of components (C1,C2,C3...) is unknown at a given moment. They will come into existence depending on the need of the users (and will be eventually disposed of too). We are talking about 200-300 such components at the peak load. Following the 1:1 design principle stated above, we are going to arrange for 200 SingleThreadExecutors. This is the source of my query here.
I am uncomfortable with the thought of having to create so many SingleThreadExecutors. I would rather try and use a pool of SingleThreadExecutors, if that makes sense and is plausible (any ready-made, seen-before classes/patterns?). I have read many posts on recommended use of SingleThreadExecutor here, but what about a pool of the same?
What do learned women and men here think? I would like to be directed, corrected or simply, admonished :-).

If your requirement is that the messages be processed in the order that they're posted, then you want one and only one SingleThreadExecutor. If you have multiple executors, then messages will be processed out-of-order across the set of executors.
If messages need only be processed in the order that they're received for a single producer, then it makes sense to have one executor per producer. If you try pooling executors, then you're going to have to put a lot of work into ensuring affinity between producer and executor.
Since you indicate that your producers will have defined lifetimes, one thing that you have to ensure is that you properly shut down your executors when they're done.

Messaging and batch jobs is something that has been solved time and time again. I suggest not attempting to solve it again. Instead, look into Quartz, which maintains thread pools, persisting tasks in a database etc. Or, maybe even better look into JMS/ActiveMQ. But, at the very least look into Quartz, if you have not already. Oh, and Spring makes working with Quartz so much easier...

I don't see any problem there. Essentially you have independent queues and each has to be drained sequentially, one thread for each is a natural design. Anything else you can come up with are essentially the same. As an example, when Java NIO first came out, frameworks were written trying to take advantage of it and get away from the thread-per-request model. In the end some authors admitted that to provide a good programming model they are just reimplementing threading all over again.

It's impossible to say whether 300 or even 3000 threads will cause any issues without knowing more about your application. I strongly recommend that you should profile your application before adding more complexity
The first thing that you should check is that number of concurrently running threads should not be much higher than number of cores available to run those threads. The more active threads you have, the more time is wasted managing those threads (context switch is expensive) and the less work gets done.
The easiest way to limit number of running threads is to use semaphore. Acquire semaphore before starting work and release it after the work is done.
Unfortunately limiting number of running threads may not be enough. While it may help, overhead may still be to great, if time spent per context switch is major part of total cost of one unit of work. In this scenario, often the most efficient way is to have fixed number of queues. You get queue from global pool of queues when component initializes using algorithm such as round-robin for queue selection.
If you are in one of those unfortunate cases where most obvious solutions do not work, I would start with something relatively simple: one thread pool, one concurrent queue, lock, list of queues and temporary queue for each thread in pool.
Posting work to queue is simple: add payload and identity of producer.
Processing is relatively straightforward as well. First you get get next item from queue. Then you acquire the lock. While you have lock in place, you check if any of other threads is running task for same producer. If not, you register thread by adding a temporary queue to list of queues. Otherwise you add task to existing temporary queue. Finally you release the lock. Now you either run the task or poll for next and start over depending on whether current thread was registered to run tasks. After running the task, you get lock again and see, if there is more work to be done in temporary queue. If not, remove queue from list. Otherwise get next task. Finally you release the lock. Again, you choose whether to run the task or to start over.

Is there a use case for creating threads without synchronization and locks?

Since thread execution happens in a pool, and is not guaranteed to queue in any particular order, then why would you ever create threads without the protection of synchronization and locks? In order to protect data attached to an object's state (what I understand to be the primary purpose of using threads), locking appears to be the only choice. Eventually you'll end up with race conditions and "corrupted" data if you don't synchronize. So if you're not interested in protecting that data, then why use threads at all?

If there's no shared mutable data, there's no need for synchronization or locks.

Delegation, just as one example. Consider a webserver that gets connect requests. It can delegate to a worker thread a particular request. The main thread can pass all the data it wants to the worker thread, as long as that data is immutable, and not have to worry at all about concurrent data access.
(For that matter, both main thread and worker thread can send all the immutable data to each other they want, it just requires a messaging queue of some sort, so the queue may need synchronization but not the data itself. But you don't need a message queue to get data to a worker thread, just construct the data before the thread starts, and as long as the data is immutable at that point, you don't need any synchronization or locks or concurrency management of any sort, other than the ability to run a thread.)

Synchronization and locks protect shared state from conflicting concurrent updates. If there is no shared state to protect, you can run multiple threads without locking and synchronization. This might be the case in a web server with multiple independent worker threads serving incoming requests. Another way to avoid synchronization and locking is to have your threads only operate on immutable shared state: if a thread can't alter any data that another thread is operating on, concurrent unsynchronized access is fine.
Or you might be using an Actor-based system to handle concurrency. Actors communicate by message passing only, there is no shared state for them to worry about. So here you can have many threads running many Actors without locks. Erlang uses this approach, and there is a Scala Actors library that allows you to program this way on the JVM. In addition there are Actors-based libraries for Java.

In order to protect data attached to
an object's state (what I understand
to be the primary purpose of using
threads), locking appears to be the
only choice. ... So if
you're not interested in protecting
that data, then why use threads at
all?
The highlighted bit of your question is incorrect, and since it is the root cause of your "doubts" about threads, it needs to be addressed explicitly.
In fact, the primary purpose for using threads is to allow tasks to proceed in parallel, where possible. On a multiprocessor the parallelism will (all things being equal) speedup your computations. But there are other benefits that apply on a uniprocessor as well. The most obvious one is that threads allow an application to do work while waiting for some IO operation to complete.
Threads don't actually protect object state in any meaningful way. The protection you are attributing to threads comes from:
declaring members with the right access,
hiding state behind getters / setters,
correct use of synchronization,
use of the Java security framework, and/or
sending requests to other servers / services.
You can do all of these independently of threading.

java.util.concurrent.atomic provides for some minimal operations that can be performed in a lock-free and yet thread-safe way. If you can arrange your concurrency entirely around such classes and operations, your performance can be vastly enhanced (as you avoid all the overhead connected with locking). Granted, it's unusual to be working on such a simplifiable problem (more often some locking will be needed), but, if and when you do find yourself in such a situation, well, then, that's exactly the use case you're asking about!-)

There are other kinds of protection for shared data. Maybe you have atomic sections, monitors, software transactional memory, or lock-free data structures. All these ideas support parallel execution without explicit locking. You can Google any of these terms and learn something interesting. If your primary interest is Java, look up Tim Harris's work.

Threads allow multiple parallel units of work to progress concurrently. The synchronisation is simply to protect shard resources from unsafe access if not needed you don't use it.
Processing on threads becomes delayed when accessing certain resources such as IO and it may be desirable to keep the CPU processing other units of work while others are delayed.
As in the example in the other answer listening to services requests may well be a unit of work that is kept independent of responding to a request as the latter my block due to resource contention - say access disk or IO.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.