Query on RxJava observeOn scheduler thread - java

I have to write to a file based on incoming requests. As multiple requests may come in simultaneously, I don't want multiple threads trying to overwrite the file content at the same time, which may lead to losing some data.
Hence, I tried collecting all the requests' data using an instance variable of type PublishSubject. I subscribe to publishSubject during init, and this subscription remains for the whole life cycle of the application. I'm also observing the same instance on a separate thread (provided by the Vert.x event loop), which invokes the method responsible for writing the file.
private PublishSubject<FileData> publishSubject = PublishSubject.create();
private void init() {
publishSubject.observeOn(RxHelper.blockingScheduler(vertx)).subscribe(fileData -> writeData(fileData));
}
Later during request handling, I call onNext as below:
private void handleRequest() {
    // do some task
    publishSubject.onNext(fileData);
}
I understand that when I call onNext, the data is queued up to be written to the file by the thread assigned by the observeOn operator. However, what I'm trying to understand is:
1.) whether this thread gets blocked in the WAITING state for only this task, or
2.) whether it will also be used for other activities when no file writing is happening.
I don't want this approach to leave one thread from the Vert.x event loop wasted in a waiting state. Also, please suggest a better approach, if one is available.
Thanks in advance.

Actually, RxJava will do this for you: by definition, onNext() emissions happen serially:
Observables must issue notifications to observers serially (not in parallel). They may issue these notifications from different threads, but there must be a formal happens-before relationship between the notifications. (Observable Contract)
So as long as you run blocking calls inside onNext() at the subscriber (and do not fork the work to a different thread manually), you will be fine and no parallel writes will happen. Keep in mind that the contract is an obligation on the emitter: if handleRequest() can call publishSubject.onNext() from several threads at once, serialize the subject first with toSerialized().
Actually, your worries should come from the opposite direction: backpressure.
You should choose a backpressure strategy here, because if requests arrive faster than you can process them (write to the file), you might overflow the buffer and get into trouble. Consider using Flowable and choosing your backpressure strategy according to your needs.
Regarding your questions, that depends on the Scheduler. You're using RxHelper.blockingScheduler(vertx), which seems like your custom code, so I can't tell; if the scheduler uses a shared thread in work-queue fashion, then the thread will not stay idle.
Anyhow, Rx will not determine this for you; it is the scheduler's responsibility to assign the work to some thread according to its own logic.
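For comparison, the same single-writer idea can be sketched with the JDK alone. This is only a minimal illustration under stated assumptions: SerialFileWriter is a hypothetical name, not part of RxJava or Vert.x, and it uses its own dedicated worker thread rather than a Vert.x thread. Writes submitted from any request thread are queued and executed one at a time in submission order; while the queue is empty, the worker simply waits for the next task.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not an RxJava or Vert.x class.
final class SerialFileWriter implements AutoCloseable {
    private final Path target;
    // One worker thread: queued writes run one at a time, in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    SerialFileWriter(Path target) {
        this.target = target;
    }

    // Safe to call from any request thread; returns without waiting for the write.
    void write(String line) {
        writer.execute(() -> {
            try {
                byte[] bytes = (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                Files.write(target, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                e.printStackTrace(); // replace with real error handling
            }
        });
    }

    // Stops accepting new writes and waits (up to 10 s) for queued ones to finish.
    @Override
    public void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The executor's internal queue is unbounded here, so the backpressure concern raised above still applies: if requests outpace the disk, the queue grows.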

Related

Nodejs performance event loop

There are articles claiming superior nodejs performance due to its single threaded event loop. I'm not asking for opinions, I'm asking for a mechanics explanation.
A thread starts to process a request, computes a little, and finds out that it needs to read from a database. This gets done asynchronously. No delay involved and the thread can continue... but what should it do without the data?
A1 Answer "don't know yet"?
A2 Grab another request?
A1 makes little sense to me. I can imagine a client issuing other requests in the meantime (like loading multiple resources on first site access), but in general, no.
A2 When it grabs another request, then it loses the whole context. This context gets saved in the promise which will get fulfilled when the data arrive, but which thread does process this promise?
B1 The same thread later
B2 A different thread.
In case B1 you may be lucky and some relevant data may still be in the thread's cache, but given that a DB request takes a few milliseconds, the gain is IMHO low.
Isn't case B2 practically equivalent to a context switch?
A: Node.js will not respond to any request unless you write code that actively sends a response. It doesn't matter whether that code runs synchronously or asynchronously.
The client (or even the server's networking stack) cannot know or care whether asynchrony happened in the meantime.
B: There is only one Node.js thread, period.
When a response arrives for an asynchronous operation kicked off in Node.js code, an event is raised in the Node.js event loop thread, and the appropriate callback/handler is called.
Node.js is based on the libuv C library:
Threads are used internally to fake the asynchronous nature of all the system calls. libuv also uses threads to allow you, the application, to perform a task asynchronously that is actually blocking, by spawning a thread and collecting the result when it is done.
A thread starts to process a request, computes a little, and finds out that it needs to read from a database. This gets done asynchronously. No delay involved and the thread can continue... but what should it do without the data?
Pass a callback to a DB module's method, and return from the current function, which was itself invoked as an event listener. The event loop will then continue to the next event in the queue.
Context is accessible inside the callback through the function's closure.
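The mechanics described above can be imitated in a few lines of Java. This is an analogy only, not how Node.js or libuv is implemented: MiniEventLoop, readAsync and demo are hypothetical names. One thread takes events from a queue and runs them; the "database read" happens on a helper thread, and its callback is posted back to the queue, so handlers and callbacks all run on the loop thread. The request's context (client) reaches the callback through the lambda's closure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical single-threaded event loop, for illustration only.
final class MiniEventLoop {
    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    // Helper threads that perform the blocking work behind the "async" API.
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    void post(Runnable event) {
        events.add(event);
    }

    // Starts a fake database read and returns immediately.
    // The callback is posted back to the loop, so it runs on the loop thread.
    void readAsync(String key, Consumer<String> callback) {
        workers.execute(() -> {
            String value = "value-of-" + key; // stands in for a slow query
            post(() -> callback.accept(value));
        });
    }

    // Runs exactly eventCount events, one at a time, on the calling thread.
    void run(int eventCount) {
        try {
            for (int i = 0; i < eventCount; i++) {
                events.take().run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            workers.shutdown();
        }
    }

    static List<String> demo() {
        MiniEventLoop loop = new MiniEventLoop();
        List<String> log = new ArrayList<>(); // only touched on the loop thread
        for (String client : new String[] {"a", "b"}) {
            // "Request handler": start the read, then return to the loop.
            loop.post(() -> loop.readAsync(client,
                    // client is still available here through the closure.
                    value -> log.add(client + " got " + value)));
        }
        loop.run(4); // 2 request events + 2 callback events
        return log;
    }
}
```

Calling MiniEventLoop.demo() returns one log entry per client; the order of the two entries depends on which fake read finishes first.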

Using Java thread pool, how to process some messages serially and others in parallel depending on message characteristic?

This is more of a Java concurrency design question. I'm working on an application that needs to process many messages for many different clients. If two messages have different client names, then they can be processed in parallel. However, if they have the same client name, then they need to be processed serially, in order.
What’s the best way to implement this?
My current implementation is pretty simple: I wrote a wrapper class called OrderedExecutorPool. It has a list of single-threaded executors. In its submit method, it does the following to figure out which executor to submit the task to:
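// floorMod never yields a negative index, unlike Math.abs(hash) % n when hash is Integer.MIN_VALUE
int executorNum = Math.floorMod(clientName.hashCode(), numExecutors);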
executorList.get(executorNum).submit(task);
This ensures that all messages for the same client go to the same executor, while still supporting processing messages for different clients in parallel.
There are a couple of problems with this design:
1.) If most client names have the same hash code, then only a few executors are doing work
2.) If one client has MANY messages, a single executor may not keep up
Is there an elegant solution to this problem that can fix the shortcomings above?
Edit
clientName is just a String. I'm just invoking the String.hashCode() method on it.
There is no JDK built-in solution that I know of. I've implemented a custom executor solution to this at my current job using this basic logic:
1.) keep an internal map of client name to work queue (each client has its own queue)
2.) when work comes in for a client, add it to that client's queue
3.) if this is the first job on the queue, create a Runnable for this client name/queue and push it into the "real" executor (a standard JDK thread pool)
4.) the Runnable impl just consumes tasks from a single client queue until it is empty and then exits
This simple implementation is the "greedy" approach (a client will keep working until its queue is empty). If you have more clients than underlying threads, you may want a more "fair" approach, where a client executes some number of tasks and then re-queues itself in the underlying executor (thus allowing other clients to get some work done).
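A minimal sketch of the "greedy" variant described above might look like the following. It is not the answerer's actual code: PerClientExecutor and its method names are hypothetical, and a single lock around the map is assumed to be good enough. A client's queue exists in the map only while a drainer Runnable for that client is scheduled or running, which is what keeps each client's tasks serial.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical sketch: serial per client, parallel across clients.
final class PerClientExecutor {
    private final Executor pool; // the "real" executor, e.g. a fixed thread pool
    // A queue is present here only while a drainer for that client is active.
    private final Map<String, Queue<Runnable>> queues = new HashMap<>();

    PerClientExecutor(Executor pool) {
        this.pool = pool;
    }

    void submit(String clientName, Runnable task) {
        boolean startDrainer = false;
        synchronized (queues) {
            Queue<Runnable> queue = queues.get(clientName);
            if (queue == null) {
                // First pending job for this client: create its queue and a drainer.
                queue = new ArrayDeque<>();
                queues.put(clientName, queue);
                startDrainer = true;
            }
            queue.add(task);
        }
        if (startDrainer) {
            pool.execute(() -> drain(clientName));
        }
    }

    // Greedy drainer: runs this client's tasks in order until the queue is empty.
    private void drain(String clientName) {
        while (true) {
            Runnable next;
            synchronized (queues) {
                next = queues.get(clientName).poll();
                if (next == null) {
                    queues.remove(clientName); // the next submit starts a new drainer
                    return;
                }
            }
            try {
                next.run();
            } catch (RuntimeException e) {
                e.printStackTrace(); // one failing task must not stop the drainer
            }
        }
    }
}
```

For the "fair" variant, the drainer could stop after a fixed number of tasks and call pool.execute(() -> drain(clientName)) again instead of looping until the queue is empty.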

java threads for each instance or object instances

I have object instances of a custom class, and each instance processes messages (via methods) coming through independently for each instance. No instances "talk" to other instances.
My question is, is putting each object in its own thread necessary since each object processes independently real-time messages (logs etc...) coming through anyhow?
Thanks for any responses.
My question is, is putting each object in its own thread necessary
since each object processes independently real-time messages (logs
etc...) coming through anyhow?
You need to process each message acquired by each object in a new, separate thread. This will lead to fast processing of the incoming messages for your object. And since there is no interaction between the objects, no thread synchronization is needed, which is good for your application. Or, better, use a pool of threads. Have a look at ThreadPoolExecutor.
It is not necessary for each object to have its own thread; however, you may gain improved performance by having more than one message-processing thread. The ideal number of threads is not necessarily (or even likely to be) the same as the number of processing objects.
Typically, in a situation like you describe the approach would be to use a task / message processing queue where each object you have adds tasks to the queue, and then multiple threads process items from the queue in order. The number of threads used here is configurable so that the application can be optimized for the platform it is running on.
An easy way to achieve this design is to simply use an ExecutorService as your task queue (in which case your messages themselves must implement Runnable):
// For 2 threads, adjust as appropriate.
ExecutorService executor = Executors.newFixedThreadPool(2);
And then to add a Runnable message:
// Add a message to the queue for concurrent / asynchronous processing
executor.submit(message);
Note that the executor itself should be shared across all of your message handling objects, so that each object is adding messages to the same queue (assuming you have many message handling objects). It is also possible to have a queue per message handling object, but that decision would depend on the number of handling objects and any requirements surrounding how messages are processed.

How to integrate LMAX within a real financial application

I am also thinking of integrating the Disruptor pattern into our application. I am a bit unsure about a few things before I start using the Disruptor.
I have 3 producers: a FIX thread which de-serialises the requests, another thread which continuously modifies the order price as the market moves, and one more thread which is responsible for de-serialising the requests sent from a GUI application. All three threads currently write to a blocking queue (hence we see a lot of contention on the queue).
The disruptor talks about a Single writer principle and from what I have read that approach scales the best. Is there any way we could make the above three threads obey the single writer principle?
Also, in a typical request/response application, and especially in our case, we have contention on an in-memory cache: we need to lock the cache when we update it with the response, while a request might be in progress for the same order. How do we handle this through the Disruptor, i.e. how do I tie a response to a particular request? Can I eliminate the lock on the cache, and if so, how?
Any suggestions/pointers would be highly appreciated. We are currently using Java 1.6
I'm new to the Disruptor and am trying to understand as many use cases as possible. I have tried to answer your questions.
Yes, the Disruptor can be used to sequence calls from multiple producers. I understand that all 3 threads try to update the state of a shared object, and that a single consumer takes the necessary action on that object. Internally, the single consumer can delegate calls to the appropriate single-threaded handler based on responsibility.
The Disruptor does exactly this: it sequences the calls so that the state is accessed by only one thread at a time. If the event handlers must be invoked in a specific order, set up the memory barrier. The latest version of the Disruptor has a DSL that lets you set up the order easily.
The cache can be abstracted and accessed through the Disruptor. Only one reader or writer gets access to the cache at a time, since all calls to the cache are sequential.

What design pattern to use for a threaded queue

I have a very complex system (100+ threads) which need to send email without blocking. My solution to the problem was to implement a class called EmailQueueSender which is started at the beginning of execution and has a ScheduledExecutorService which looks at an internal queue every 500ms and if size()>0 it empties it.
While this is going on there's a synchronized static method called addEmailToQueue(String[]) which accepts an email containing body,subject..etc as an array. The system does work, and my other threads can move on after adding their email to queue without blocking or even worrying if the email was successfully sent...it just seems to be a little messy...or hackish...Every programmer gets this feeling in their stomach when they know they're doing something wrong or there's a better way. That said, can someone slap me on the wrist and suggest a more efficient way to accomplish this?
Thanks!
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
this class alone will probably handle most of the stuff you need.
just put the sending code in a runnable and add it with the execute method.
the getQueue method will allow you to retrieve the current list of waiting items so you can save it when restarting the sender service without losing emails
If you are using Java 6, then you can make heavy use of the primitives in the java.util.concurrent package.
Having a separate thread that handles the real sending is completely normal. Instead of polling a queue, I would rather use a BlockingQueue as you can use a blocking take() instead of busy-waiting.
If you are interested in whether the e-mail was successfully sent, your append method could return a Future so that you can pass the return value on once you have sent the message.
Instead of having an array of Strings, I would recommend creating an (almost trivial) Java class to hold the values. Object creation is cheap these days.
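Put together, the suggestions in this answer (a blocking take() instead of polling, a Future returned from the append method, and a small value class instead of a String array) could be sketched roughly as follows. The class and method names are assumptions made for illustration, and send is only a placeholder for the real mail-sending code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Small value class replacing the String[] (field names are assumptions).
final class Email {
    final String to;
    final String subject;
    final String body;

    Email(String to, String subject, String body) {
        this.to = to;
        this.subject = subject;
        this.body = body;
    }
}

// Hypothetical sketch of a queue-backed sender with one worker thread.
final class EmailQueueSender {
    private final BlockingQueue<FutureTask<Boolean>> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::drain, "email-sender");

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    // Non-blocking for callers; the Future completes once the mail was handled.
    Future<Boolean> addEmailToQueue(Email email) {
        FutureTask<Boolean> task = new FutureTask<Boolean>(() -> send(email));
        queue.add(task);
        return task;
    }

    void stop() {
        worker.interrupt();
    }

    private void drain() {
        try {
            while (true) {
                queue.take().run(); // blocks while the queue is empty; no polling
            }
        } catch (InterruptedException e) {
            // interrupted by stop(): let the worker thread end
        }
    }

    // Placeholder for the real sending code; returns whether sending succeeded.
    private boolean send(Email email) {
        return email.to.contains("@");
    }
}
```

Callers that do not care about the outcome can ignore the returned Future; callers that do can call get() on it later.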
I'm not sure if this would work for your application, but it sounds like it would. A ThreadPoolExecutor (an ExecutorService implementation) can take a BlockingQueue as an argument, and you can simply add new tasks (Runnables) to it. When you are done, you simply terminate the ThreadPoolExecutor.
private BlockingQueue<Runnable> queue;
...
ThreadPoolExecutor pool = new ThreadPoolExecutor(10, 10, 1000L,
        TimeUnit.MILLISECONDS, this.queue);
You can keep a count of all the tasks submitted to the pool. When you think you are done (the queue is empty, perhaps?), simply compare this count to the number of completed tasks:
if (issuedThreads == pool.getCompletedTaskCount()) {
pool.shutdown();
}
If the two match, you are done. Another way to terminate the pool is to wait a second in a loop:
try {
    // Keep waiting, one second at a time, until the pool has terminated.
    while (!this.pool.awaitTermination(1000, TimeUnit.MILLISECONDS));
} catch (InterruptedException e) {
    // log exception...
}
There might be a full blown mail package out there already, but I would probably start with Spring's support for email and job scheduling. Fire a new job for each email to be sent, and let the timing of the executor send the jobs and worry about how many need to be done. No queuing involved.
Underneath the framework, Spring uses JavaMail for the email part, and lets you choose between ThreadPoolExecutor (as mentioned by @Lorenzo) or Quartz. Quartz is better in my opinion, because you can even set it up so that it fires your jobs at fixed points in time like cron jobs (e.g. at midnight). The advantage of using Spring is that it greatly simplifies working with these packages, so that your job is even easier.
There are many packages and tools that will help with this, but the generic name for cases like this, extensively studied in computer science, is the producer-consumer problem. There are various well-known solutions for it, which could be considered 'design patterns'.
