Java RMI and advanced multithreading - java

I am implementing something like a database where data manipulation statements (inserts, updates and deletes) get evaluated. Some statements can execute concurrently and others cannot (I compute that). I like the ease of use and convenience of RMI, however I need to have a deeper understanding of the RMI service implementation w.r.t multithreading. For example,
Can the multithreading be controlled in any way?
Is a thread created for each remote call (on server side) or are thread pools used?
More generally, using RMI, how can I ensure that some rmi calls wait for other calls to terminate?
Is there another non-RMI approach, with the same convenience and efficiency that would work better for this?
If I want multithreading, should I just create threads myself in the server-side code? My concern is that if the RMI service already creates multiple threads, then I would be adding unnecessary ones.
If, for example, a thread is created on each call, then I can use Java's Thread.join method to order the statement execution. On the other hand, if thread pools are used, then join won't work (since the threads don't terminate).

Overview
There seem to be several questions in this post, so I will walk through each one in some detail.
Question 1 - Can the multi-threading be controlled in any way?
Yes! How you implement multithreading is entirely up to you. RMI only handles the communication between separate JVMs, with enough abstraction that the objects feel as if they live in one JVM. It is a communication layer and does not constrain how your own code uses threads.
Question 2 - Is a thread created for each remote call (on the server side) or are thread-pools used?
See the RMI specification. The short answer is that a separate thread per call is not guaranteed:
A method dispatched by the RMI runtime to a remote object implementation may or may not execute in a separate thread. The RMI runtime makes no guarantees with respect to mapping remote object invocations to threads. Since remote method invocation on the same remote object may execute concurrently, a remote object implementation needs to make sure its implementation is thread-safe.
Whether RMI uses thread pools depends on the implementation, but as a developer using RMI this should not concern you, since it is encapsulated in the RMI connection layer.
Question 3 - Using RMI, how can I ensure that some RMI calls wait for other calls to terminate?
This is a rather vague question, but I think what you're asking is how to block properly when synchronizing in RMI. That comes down to the design of your application. Take the scenario where you access the database and must synchronize that access. When a client invokes the operation through RMI, it calls the remote server's method, which holds all the synchronization, and so it waits for a lock if it must. The client therefore waits its turn to access the DB via the server. So, in your scenario, the synchronization of DB access should live on the server side.
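As a minimal sketch of that idea, the remote object below keeps all locking on the server side. The interface name, the method, and the exclusive flag are hypothetical; the flag stands in for whatever conflict analysis the application already performs. Statements that may overlap take a shared lock, and statements that must run alone take the exclusive one.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical remote interface; the names are illustrative only.
interface StatementService extends Remote {
    int execute(String statement, boolean exclusive) throws RemoteException;
}

// All coordination lives in the server-side implementation. Whatever thread
// the RMI runtime uses to dispatch a call simply blocks on the lock.
class StatementServiceImpl implements StatementService {
    // Fair mode, so a waiting exclusive statement is not starved by shared ones.
    private final ReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final AtomicInteger executed = new AtomicInteger();

    @Override
    public int execute(String statement, boolean exclusive) {
        // Exclusive statements wait for all others; shared ones may overlap.
        Lock l = exclusive ? lock.writeLock() : lock.readLock();
        l.lock();
        try {
            // Placeholder for evaluating the statement against the data store.
            return executed.incrementAndGet();
        } finally {
            l.unlock();
        }
    }
}
```

The client needs no locking code at all: its remote call returns only after the server-side method has acquired the lock, done its work, and released it.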
Question 4 - Is there another non-RMI approach, with the same convenience and efficiency that would work better for this?
Absolutely. Below is a brief list of communication mechanisms that could be used.
1) RESTful
2) RMI
3) Sockets
4) gRPC
My recommendation is to use REST, as it is the most straightforward and has plenty of implementations and documentation available online. Efficiency seems to be a high concern for you, but your operations only manipulate a DB in a standard manner, so I believe a RESTful implementation would provide more than enough efficiency.
Think of it like this: you have N clients, a load balancer, and M servers. There is no persistent connection between clients and servers, which reduces complexity and computation. As the number of clients grows, the load balancer creates more server instances and allocates the load appropriately. Note that the requests between clients and servers are quite small, as each carries only a payload and a request type. The servers receive the requests and compute the operations as normal and in parallel. Optimization can be done on the server side via thread pools or frameworks such as Spring.

What you are asking for is a way to coordinate the execution of tasks according to their dependencies. The fact that your tasks make RMI calls is insignificant. We can imagine purely computational tasks that do not access remote machines and still depend on each other, for example by passing values computed in one task as parameters to other tasks.
The coordination of dependent tasks is the central problem of asynchronous programming. The JDK's support for asynchronous programming is not complete, but it is sufficient for your problem. You need two things: CompletableFuture and an Executor. Note that an RMI call blocks the thread it runs on, so using an Executor with a limited number of threads can lead to a specific kind of deadlock, called "thread starvation", where the computation cannot move on because all available threads are blocked. So use an executor with an unlimited number of threads; the simplest is one that creates a new thread for each task:
Executor newThreadExecutor = (Runnable r)->new Thread(r).start();
Then, for each RMI call (or any other task), declare a task method. If the task does not depend on other tasks, the method should have no parameters. If the task depends on the result(s) produced by other task(s), declare one or two parameters (CompletableFuture does not directly support a greater number of parameters). Suppose we have:
String m0a() {return "ok";} // no args
Integer m0b() {return 1;} // no args
Double m2(String arg, Integer arg2) {return arg2/2.0;} // 2 args
Suppose we want to compute the following result:
String r0a = m0a();
Integer r0b = m0b();
Double r2 = m2(r0a, r0b);
but asynchronously, so that the calls to m0a and m0b execute in parallel, and the call to m2 starts as soon as both m0a and m0b have finished.
Then wrap each task method in an instance of CompletableFuture. Depending on the signature of the task method, different methods of CompletableFuture are used:
CompletableFuture<String> t0a = CompletableFuture.supplyAsync(this::m0a, newThreadExecutor);
CompletableFuture<Integer> t0b = CompletableFuture.supplyAsync(this::m0b, newThreadExecutor);
CompletableFuture<Double> t2 = t0a.thenCombineAsync(t0b, this::m2, newThreadExecutor);
The tasks start executing right after declaration; no call to a special start method is required.
To get the final result from the last task t2, the get() method of the Future interface can be used:
Double res = t2.get();
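For reference, the fragments above can be assembled into one runnable class, roughly as follows. The class name and the ordered() method are additions for illustration: ordered() uses thenRunAsync to make one statement wait for another, which is the "some calls wait for other calls" case from the question. The task bodies are stand-ins for real RMI calls, and join() is used in place of get() only to avoid checked exceptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

class DependentTasks {
    // One new thread per task, so blocked calls cannot starve a bounded pool.
    static final Executor NEW_THREAD_EXECUTOR = r -> new Thread(r).start();

    static String m0a() { return "ok"; }                               // no args
    static Integer m0b() { return 1; }                                 // no args
    static Double m2(String arg, Integer arg2) { return arg2 / 2.0; }  // 2 args

    // m0a and m0b run in parallel; m2 starts once both have finished.
    static Double compute() {
        CompletableFuture<String> t0a =
                CompletableFuture.supplyAsync(DependentTasks::m0a, NEW_THREAD_EXECUTOR);
        CompletableFuture<Integer> t0b =
                CompletableFuture.supplyAsync(DependentTasks::m0b, NEW_THREAD_EXECUTOR);
        CompletableFuture<Double> t2 =
                t0a.thenCombineAsync(t0b, DependentTasks::m2, NEW_THREAD_EXECUTOR);
        return t2.join(); // like get(), but throws unchecked exceptions
    }

    // Two conflicting statements: the second starts only after the first ends.
    static List<String> ordered() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> first =
                CompletableFuture.runAsync(() -> log.add("insert"), NEW_THREAD_EXECUTOR);
        CompletableFuture<Void> second =
                first.thenRunAsync(() -> log.add("update"), NEW_THREAD_EXECUTOR);
        second.join();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 0.5
        System.out.println(ordered()); // prints [insert, update]
    }
}
```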

Related

Async service call delegation in a Threadpool

I have a use case where I want to make a POST request from my service (Service A) to another Java service (Service B). I don't care about the response, but I do care about not changing the existing latencies of the Service A functionality that makes the call.
Basically, I have a set of checks in Service A that run in parallel using a thread pool. An ExecutorService instance is used to spawn the threads. Each check has a CompletableFuture associated with it. These CompletableFuture objects are submitted to the ExecutorService, which then provides threads for execution. The thread pool has a limited size.
For one of the checks that I perform in parallel, I also want to perform a delegation, handled by a new class, that tries to execute a POST request to Service B. If I do the delegation on the same thread that performs that particular check, then because of the limited size of the thread pool, the overall performance of the parallel checks will be affected, ultimately affecting my service.
Is there a way to restructure this, or to make the calls to the new service efficiently? I'm aware that Kotlin has something called coroutines. I'm not sure whether Java has something similar that could be useful.

Right to implement synchronized blocks for Netty server?

I have been building a game server using Netty that may have thousands of concurrent connections. I know that on the server side those connections may share several worker threads rather than a single one, so it is not safe to let them freely access shared data, e.g. to find, remove or add objects in common lists and maps. I am considering adding synchronized blocks to all code that accesses shared data. (For heavy tasks such as querying the database I plan to use ExecutorService / Threads, so synchronization won't be a big problem for those tasks.)
I am still unsure whether this is a good / common solution, or whether there are better ways (than using synchronized blocks) to do this for a Netty server.
Can someone give me some advice, please? Many thanks in advance.
Synchronized blocks (or their equivalent, ReentrantLocks) are the only reliable way to access shared data. In an asynchronous environment like Netty, however, the code inside a synchronized block must not call wait(), since that takes the current thread out of service. Usually the synchronized block belongs to an intermediate object in producer-consumer communication, and the consumer calls wait() when there is no data from the producer. To avoid blocking, the consumer, when data is not ready, places itself (or another object of type Runnable) into the intermediate object, and the producer, when sending data, submits that Runnable to a thread pool (instead of calling notify()).
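The handoff described above might look roughly like the sketch below. AsyncMailbox is a hypothetical class written for this illustration, not part of Netty or of any library. The synchronized sections only touch the two queues and never wait, and callbacks always run on the supplied Executor, outside the lock.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical intermediate object between a producer and a consumer.
// Items must be non-null, because null is used to signal "nothing queued".
class AsyncMailbox<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final Queue<Consumer<T>> waiters = new ArrayDeque<>();
    private final Executor executor;

    AsyncMailbox(Executor executor) {
        this.executor = executor;
    }

    // Consumer side: never calls wait(). If no data is ready, the callback
    // is parked inside the mailbox and the calling thread returns at once.
    void take(Consumer<T> callback) {
        final T item;
        synchronized (this) {
            item = items.poll();
            if (item == null) {
                waiters.add(callback);
                return;
            }
        }
        executor.execute(() -> callback.accept(item));
    }

    // Producer side: instead of notify(), hand a parked callback to the
    // executor. If nobody is waiting, the item is stored for a later take().
    void put(T item) {
        final Consumer<T> waiter;
        synchronized (this) {
            waiter = waiters.poll();
            if (waiter == null) {
                items.add(item);
                return;
            }
        }
        executor.execute(() -> waiter.accept(item));
    }
}
```

In a Netty handler, take() would be called from the event loop with a callback that writes the response, so the event-loop thread is free again as soon as take() returns.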
Synchronized blocks are low-level facilities and should not be mixed with business logic. Instead, a communication framework with an adequate interface should be developed. An example of such a framework is my df4j library.

Concurrent access to a Remote Object Java RMI

I am currently studying how Java RMI works but I do not understand a certain aspect.
In a non-distributed multithreaded environment, if methods on the same object are called simultaneously from different threads, each of them is executed on the respective thread's stack (accessing shared data is not part of my question).
In a distributed system since a client process calls methods on the stub and the actual call is executed on the stack of the process that created the remote object how are simultaneous calls to a method handled? In other words what happens at the lets say server thread when there are two (or more) requests to execute the same method on that thread?
I thought of this question as I want to compare this to what I am used to - the executions being on different stacks.
how are simultaneous calls to a method handled?
It isn't specified. It is carefully stated in the RMI specification: "The RMI runtime makes no guarantees with respect to mapping remote object invocations to threads."
The occult meaning of this is that you can't assume the server is single-threaded.
In other words what happens at the lets say server thread when there are two (or more) requests to execute the same method on that thread?
There can't be two or more requests to execute the method on the same thread. The question doesn't make sense. You've posited a unique 'lets say server thread' that doesn't actually exist.
There can however be two or more requests to execute the method arising from two or more concurrent clients, or two or more concurrent threads in a single client, or both, and because of the wording of the RMI Specification you can't assume a single-threaded despatching model at the server.
In the Oracle/Sun implementation it is indeed multi-threaded, ditto the IBM implementation. I'm not aware of any RMI implementation that isn't multi-threaded, and any such implementation would be basically useless.
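To make the consequence concrete, here is a small sketch of a remote object written to tolerate concurrent dispatch. The Counter interface and class names are hypothetical. The implementation keeps its state in thread-safe structures and records the names of the threads that dispatched calls, so running main shows which threads a given RMI runtime actually used. The output is deliberately not stated here, since the specification leaves the thread mapping open.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical remote interface used only for this illustration.
interface Counter extends Remote {
    int increment() throws RemoteException;
}

// Thread-safe because any number of runtime threads may call it at once.
class CounterImpl implements Counter {
    private final AtomicInteger value = new AtomicInteger();
    private final Set<String> dispatchThreads = ConcurrentHashMap.newKeySet();

    @Override
    public int increment() {
        dispatchThreads.add(Thread.currentThread().getName());
        return value.incrementAndGet();
    }

    Set<String> dispatchThreads() {
        return dispatchThreads;
    }
}

class CounterDemo {
    // Calls the (possibly remote) counter n times from the current thread.
    static void callManyTimes(Counter counter, int n) {
        try {
            for (int i = 0; i < n; i++) {
                counter.increment();
            }
        } catch (RemoteException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        CounterImpl impl = new CounterImpl();
        // Export on an anonymous port; calls on the stub go through RMI.
        Counter stub = (Counter) UnicastRemoteObject.exportObject(impl, 0);
        Thread a = new Thread(() -> callManyTimes(stub, 1000));
        Thread b = new Thread(() -> callManyTimes(stub, 1000));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("server-side threads seen: " + impl.dispatchThreads());
        UnicastRemoteObject.unexportObject(impl, true);
    }
}
```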

Best practices with Akka in Scala and third-party Java libraries

I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was a question here about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
1) Using the synchronous API, wrapping the blocking code in a Future and marking it as blocking:
Future {
  blocking {
    cache.get(key) // synchronous blocking call
  }
}
2) Using the asynchronous Java API and polling the Java Future every n ms to check whether it has completed (as described in one of the answers to the linked question).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent blocking the whole pool?
I always go with the first option, but I do it in a slightly different way. I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. It looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// I create a separate ec for each blocking client/resource/api I use
Future {
  cache.get(key) // synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls use a dedicated execution context (= thread pool), separated from your main execution context, which is responsible for non-blocking work.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is presented by Nilanjan Raychaudhuri (I hope I spelled that correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the approach with blocking and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext. It takes a naive approach to the question of how many threads you will need: it spawns a new thread whenever all the other threads in the pool are busy. So it is actually an uncontrolled ExecutionContext. It could create lots of threads and lead to problems such as an out-of-memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking in case this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
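For readers calling the same client from plain Java rather than Scala, the dedicated-pool idea can be sketched with CompletableFuture. The class name, the pool size of 16 and the thread name are placeholders chosen for this illustration, not recommendations.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class BlockingCalls {
    // Dedicated, bounded pool for one blocking resource. The size is a
    // placeholder and would need tuning for the actual workload.
    private static final ExecutorService BLOCKING_POOL =
            Executors.newFixedThreadPool(16, r -> {
                Thread t = new Thread(r, "blocking-cache");
                t.setDaemon(true); // do not keep the JVM alive for idle workers
                return t;
            });

    // Runs the blocking call on the dedicated pool, never on the caller's pool.
    static <T> CompletableFuture<T> callBlocking(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, BLOCKING_POOL);
    }
}
```

A call such as BlockingCalls.callBlocking(() -> cache.get(key)) would then occupy one of the 16 dedicated threads while it blocks, leaving the main pool untouched.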
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottlenecks or run out of memory or threads when the application runs under increased load.
The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:
1) Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
2) Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
3) Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
4) Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware.

Java RMI: Implement a time out in the client code

I have a Java RMI system. The situation is typical: the client invokes a method of the server.
The client has an internal timer, so if the server doesn't finish in due time (the time is specified in the client), then the client must do something else.
So, the client must wait for the server to finish its job for a specific time and in case the server didn't finish do something else (it doesn't matter what). How can I do this?
I don't care about connection timeouts and so on; assume that the server and client are connected through RMI and everything is fine, only that the server's job can be computationally intensive and can take some time.
thanks a lot!
Make the RMI call on another thread. Have the originating thread wait a certain length of time for a response from the RMI-calling thread.
Alternatively, have the server RMI thread delegate the task to a worker thread. Return to the caller if the worker thread doesn't respond sufficiently quickly.
In general, when you want operations to timeout in Java, you are talking about one or two synchronous/asynchronous conversion layers. I've never done this with RMI, but I imagine you do something similar. Perhaps asking the participants of this discussion: ( Asynchronous Java RMI ) will be useful. Based on the points made in ( Spring Async RMI Call ), I would say you need to do the following:
Call the RMI service using separate thread(s); consider using executor service.
Expose this with a wrapper that calls through to the executor service and blocks for a finite amount of time for results; consider using Futures.
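The two steps above can be sketched as a small wrapper. TimedCall and callWithTimeout are hypothetical names, and the remote call is passed in as a Callable so the sketch does not depend on any particular remote interface. Note that cancelling the Future only interrupts the local calling thread. It may not abort a call that is blocked in socket I/O, and it does not stop the work on the server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedCall {
    // Daemon threads, so a stuck remote call cannot keep the client JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "rmi-caller");
        t.setDaemon(true);
        return t;
    });

    // Runs the call on another thread and waits at most the given time.
    // Returns the fallback value if the call does not finish in time.
    static <T> T callWithTimeout(Callable<T> remoteCall, long timeout,
                                 TimeUnit unit, T fallback) {
        Future<T> future = POOL.submit(remoteCall);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // give up locally; the server may keep working
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("remote call failed", e.getCause());
        }
    }
}
```

With a stub named server, for example, a client could write TimedCall.callWithTimeout(() -> server.compute(input), 5, TimeUnit.SECONDS, null) and do something else when null comes back; server, compute and input are placeholders here.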
You need to set the undocumented property called (I think) sun.rmi.transport.tcp.responseTimeout at each client JVM. Value in milliseconds.
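A short sketch of setting that property from code follows; the helper class and method names are made up for illustration. The same effect can be had with -Dsun.rmi.transport.tcp.responseTimeout=5000 on the client's command line. Setting it early, before the first remote call, is the safe choice, on the assumption that the runtime reads the value only once.

```java
class RmiClientConfig {
    // Call this at client start-up, before any RMI stub is used.
    static void applyResponseTimeout(long millis) {
        System.setProperty("sun.rmi.transport.tcp.responseTimeout",
                Long.toString(millis));
    }
}
```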
