sun.rmi.transport.tcp.TCPTransport uses 100% CPU - java

I'm developing a communication library based on NIO's non-blocking SocketChannels, so I can use select to keep my thread's CPU usage low (and get faster reaction times to other events).
SocketChannels are created externally to my thread and added to the list it handles; I mark them as non-blocking and register them with a Selector for READ operations (and WRITE when needed, but that doesn't happen in this scenario).
I have a little Swing application for tests, running locally, that can act as either client or server: the client connects to the server and they can send each other messages. Pretty simple, and it works fine, except that CPU usage tops 100% (50% for each JVM) as soon as the connection is established between client and server.
Running jvisualvm shows me that sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run() uses 98% of the application's time, counting only 3 method calls!
A forced stack trace shows it's blocking on a read operation on a FilterInputStream, on a Socket.
I'm a little puzzled, as I don't use RMI (though I can understand that NIO and RMI may share the "transport" code). I have seen a few similar questions, but each was specifically about RMI, which I'm not using. The answers I've seen say that this ConnectionHandler.run() method is responsible for marshalling/unmarshalling, but here I get 100% CPU without any network traffic. I can only infer an active wait on the sockets, but that sounds odd, especially with non-blocking SocketChannels...
Any idea would be greatly appreciated!

I tracked the CPU use down to select(int timeout), which returns 0 immediately, regardless of the timeout value. My understanding of this method was that it would block until a selected operation pops up or the timeout is reached (as the Javadoc says).
However, I found this other StackOverflow post showing the same problem: the OP_CONNECT interest has to be cancelled once the connection is established.
Many thanks to @Alexander, and to @EJP for clarification about the OP_WRITE/OP_CONNECT similarities.
Regarding the RMI part, it was probably due to Eclipse run configurations.
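In code, the fix amounts to switching the interest set once finishConnect() succeeds. A minimal, self-contained sketch (the local server socket exists only so the example runs on its own; class and variable names are illustrative, not from the original code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ConnectFix {
    public static void main(String[] args) throws IOException {
        // Local listener on an ephemeral port, just to make the demo runnable.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        Selector selector = Selector.open();
        SocketChannel client = SocketChannel.open();
        client.configureBlocking(false);
        client.connect(server.getLocalAddress());
        client.register(selector, SelectionKey.OP_CONNECT);

        boolean connected = false;
        while (!connected) {
            selector.select(1000);
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isConnectable() && client.finishConnect()) {
                    // The fix: replace OP_CONNECT with OP_READ once the
                    // connection completes. Leaving OP_CONNECT registered
                    // makes select() return 0 immediately and spin the CPU.
                    key.interestOps(SelectionKey.OP_READ);
                    connected = true;
                }
            }
            selector.selectedKeys().clear();
        }
        System.out.println("connected, interestOps=READ");
        client.close();
        server.close();
        selector.close();
    }
}
```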

Related

Tomcat server not responding

I run a web server on Tomcat 7 with Java 8. The server performs a lot of IO operations (mostly DB and HTTP calls), each transaction consumes a generous amount of memory, and it serves around 100 concurrent users at a given time.
After some time, around 10,000 requests in (though nothing special about that number), the server starts to hang, either not responding or responding with empty 500 responses.
I see some errors in the logs which I'm currently trying to solve, but what bugs me is that I can't figure out what ultimately causes this: the catalina log file does not show a heap space exception, and I took some memory dumps and it seems like there's always room to grow and garbage to collect, so I decided it is not a memory problem. Then I took thread dumps, and I always see dozens of threads in WAITING, TIMED_WAITING, PARKING, etc. From what I read, it seems like these threads are available to handle incoming work.
It's worth mentioning that all the work is done asynchronously, with no blocking operations, and it seems like all the thread pools are available. Even more, if I stop the traffic to the server and let it rest for some time, the issue still doesn't go away. So I figured it's also not a thread problem.
So... my question is:
Could it still be a memory issue? Could it be a thread/CPU issue? Could it be anything else?
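When plain thread dumps are inconclusive, a programmatic dump with lock and deadlock information can help show what the WAITING threads are actually parked on. A small diagnostic sketch using the standard ThreadMXBean API (it could be run from a debug endpoint inside the JVM under suspicion; here it just dumps its own threads):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDump {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump all threads, including monitor and synchronizer info,
        // so WAITING/TIMED_WAITING threads show what they wait on.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.printf("%s %s%n", info.getThreadName(), info.getThreadState());
        }
        // Deadlock detection: returns null when no deadlocks exist.
        long[] deadlocked = mx.findDeadlockedThreads();
        System.out.println("deadlocked: " + (deadlocked == null ? "none" : deadlocked.length));
    }
}
```

If the counts of RUNNABLE versus parked threads stay healthy while requests still fail, the bottleneck is likely outside the thread pools (connection pools, file descriptors, or the connector's accept queue).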

JMX RMI - Memory Leak - ArrayNotificationBuffer getting bigger over time

I'm writing an application that should run for many hours (10-100), which I'm monitoring using JMX.
However, after some time, I noticed two things:
com.sun.jmx.remote.internal.ArrayNotificationBuffer#1 gets bigger: after 20 hours it's about 10 MB, whereas when I started it was smaller than 1 MB.
More threads like RMI TCP Accept-0 (or any other number) and RMI-TCP-Connection(44)-[IP] are created over time.
I think it has something to do with different connections to the application, but currently I'm only connected once, yet some connections still seem to be open.
How can that be? How can I fix this?
I was poking around in the source code comments for ArrayNotificationBuffer and it has a decent amount of JMX trace logging, so you might want to enable JMX tracing to get a better idea of what's going on.
You may find that this known bug is affecting you. The bug report indicates the issue is observed on long-lasting connections. There are a couple of work-arounds mentioned there, although a simpler one not mentioned, if practical for you, is to periodically disconnect and reconnect. The good news is that there appears to be a patch for this in Java 7, although I'm not sure whether it has reached a released build yet.
I would also make sure that if you are registering JMX notification listeners, that they are continually and promptly handling notifications. Failure to do so might also be causing this symptom.
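The disconnect/reconnect workaround can be sketched as follows. To keep the example self-contained and runnable, it starts an in-process RMI connector server instead of contacting a remote host; in a real monitoring client you would connect to your application's JMXServiceURL and run each cycle for an hour or so rather than in a tight loop:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class ReconnectingClient {
    public static void main(String[] args) throws Exception {
        // In-process connector server, only so the demo needs no rmiregistry.
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://"), null, mbs);
        server.start();

        // Workaround: periodically tear the connection down and reopen it,
        // so the server-side ArrayNotificationBuffer can be reclaimed.
        for (int cycle = 0; cycle < 3; cycle++) {
            JMXConnector connector = JMXConnectorFactory.connect(server.getAddress());
            try {
                int mbeans = connector.getMBeanServerConnection().getMBeanCount();
                System.out.println("cycle " + cycle + ": " + (mbeans > 0));
            } finally {
                connector.close();
            }
        }
        server.stop();
    }
}
```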

Java RMI random latency in a HPC application

I'm using Java and RMI to execute 100k Monte Carlo simulations on a cluster of hundreds of cores.
The approach I'm using is to have a client app that invokes RMI processes and divides the simulations among the available (RMI) processes on the grid.
Once the simulations have run, I have to re-aggregate the results.
The only limit I have is that all this has to happen in less than 500ms.
The process is actually in place, BUT randomly, from time to time, one of the RMI calls takes 200ms longer to execute.
I've added loads of logs and timings all over the place, and I've already ruled out the following as possible causes:
1) Simulations taking extra time
2) Data transfer (it's constant; the slowdown only shows up sometimes, and only on a subset of the RMI calls)
3) Transferring results back (I can clearly time how long it takes from the last RMI call's return to the end of the process)
The only thing I cannot measure is whether any of the RMI calls is taking extra time to be initialized (and honestly that's the only thing I can think of). The reason is that, unfortunately, the clocks are not synchronized :(
Is that possible that the RMI remote process got passivated/detached/collected even if I keep a (Remote) reference to it from the client?
Hope the question is clear enough (I'm pretty much sure it isn't).
Thanks a mil and do not hesitate to make more questions if it is not clear enough.
Regards,
Giovanni
Is that possible that the RMI remote process got passivated/detached/collected even if I keep a (Remote) reference to it from the client?
Unlikely, but possible. The RMI remote process should not be collected (as the RMI FAQ indicates for VM exit conditions). It could, however, be paged to disk if the OS desired.
Do you have a way to rule out GC calls (other than writing a monitor with JVM TI)?
Also, is your code structured in such a way that you send off all calls from your aggregator asynchronously, have the replies append to a list, and aggregate the results when your critical time is up, even if some processors haven't returned results? I'm assuming that each processor is an independent, random event and that it's safe to ignore some results. If not, disregard.
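The fan-out-with-deadline structure suggested above can be sketched with an ExecutorService. The simulate() method below is a local stand-in for the remote RMI call (all names here are hypothetical, not from the question); invokeAll with a timeout cancels stragglers so the aggregator always returns within its budget:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DeadlineAggregator {
    // Stand-in for a remote call; the real task would invoke an RMI stub.
    static double simulate(int seed) {
        return seed * 0.5;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final int seed = i;
            tasks.add(() -> simulate(seed));
        }
        // invokeAll cancels whatever is still running when the 500ms budget expires.
        List<Future<Double>> futures = pool.invokeAll(tasks, 500, TimeUnit.MILLISECONDS);
        double sum = 0;
        int completed = 0;
        for (Future<Double> f : futures) {
            if (!f.isCancelled()) {
                try {
                    sum += f.get();
                    completed++;
                } catch (ExecutionException ignored) { }
            }
        }
        System.out.println("completed " + completed + " of " + tasks.size());
        pool.shutdown();
    }
}
```

This only helps if, as the answer assumes, each simulation is an independent random event and dropping a few results is statistically acceptable.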
I finally got to the bottom of the issue. After ensuring that the stub wasn't getting deallocated and that the GC wasn't being triggered behind the scenes, I used Wireshark to check whether there was any network issue.
What I found was that, randomly, one of the packets got lost, and on our network TCP needed 120ms (41 retransmissions) to correctly retransmit the data.
When we switched to JDK 7, SDP, and InfiniBand, we didn't experience the issue anymore.
So basically the answer to my question was... PACKET LOSS!
Thanks to everyone who replied; it helped me focus on the right path!
Gio

Java RMI specification on thread usage: "..may or may not execute in a separate thread"

I might have a problem with my application. There is a client running multiple threads which may make rather time-consuming calls to the server over Java RMI. Of course, a time-consuming call from one client thread should not block everyone else.
I tested it, and it works on my machine. I created two threads on the client and a dummy call on the server. On startup, both threads call the dummy method, which just does a huge number of sysouts. It can be seen that these calls are handled in parallel, without blocking.
I was very satisfied, until a colleague pointed out that the RMI spec does not necessarily guarantee that behavior.
And indeed, a text on the homepage of Lancaster University states that:
“A method dispatched by the RMI runtime to a remote object
implementation (a server) may or may not execute in a separate thread.
Calls originating from different clients Virtual Machines will execute
in different threads. From the same client machine it is not
guaranteed that each method will run in a separate thread” [1]
What can I do about that? Is it possible that it just won't work in practice?
In theory, yes, you may have to worry about this. In reality, all mainstream RMI implementations multi-thread all incoming calls, so unless you are running against some obscure JVM, you don't have anything to worry about.
What that wording means is that you can't assume the calls will all execute in the same thread, so you are responsible for any required synchronization.
Based on my testing on a Mac laptop, every client request received in parallel seems to be executed on a separate thread (I tried up to a thousand threads without any issues; I don't know if there is an upper bound, though. My guess is that the maximum number of threads is limited only by memory).
These threads then hang around for some time (a minute or two) in case they can service more clients. If they are unused for a while, they get GC'ed.
Note that I used Thread.sleep() on the server to hold up every request, hence none of the threads could finish the task and move on to another request.
The point is that, if required, the JVM can even allocate a separate thread for each client request. If work is done and threads are free, it could reuse existing threads without creating new ones.
I don't see a situation where any client request would be stuck waiting due to RMI constraints. No matter how many threads on the server are "busy" processing existing requests, new client requests will be received.
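Since the spec allows calls from the same client to be dispatched concurrently, any shared state in the remote object must be made thread-safe, as the second answer notes. A minimal sketch (the Counter interface and class names are illustrative; the main method is a local smoke test of the thread-safety, with no registry involved):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.atomic.AtomicLong;

// Remote interface; in a real deployment the implementation would be
// exported via UnicastRemoteObject and bound in an RMI registry.
interface Counter extends Remote {
    long increment() throws RemoteException;
}

public class CounterImpl implements Counter {
    // AtomicLong rather than a plain long: increments from concurrently
    // dispatched RMI calls must not race with each other.
    private final AtomicLong count = new AtomicLong();

    @Override
    public long increment() {
        return count.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        CounterImpl impl = new CounterImpl();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) impl.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(impl.count.get());
    }
}
```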

How can I limit the performance of sandboxed Java code?

I'm working on a multi-user Java webapp, where clients can use the webapp API to do potentially naughty things by passing code that will execute on our server in a sandbox.
For example, it is possible for a client to write a tight while(true) loop that impacts the performance of other clients.
Can you guys think of ways to limit the damage caused by these sorts of behaviors to other clients' performance?
We are using Glassfish for our application server.
The halting problem shows that there is no way for a computer to reliably identify code that will not terminate.
The only way to do this reliably is to execute the code in a separate JVM, which you then ask the operating system to shut down when it times out. A JVM that doesn't time out can process more tasks, so you can simply reuse it.
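A sketch of that approach with ProcessBuilder: launch the untrusted code in a child JVM and kill the process if it overruns its budget. Here the child just runs `java -version` as a stand-in for the real sandbox entry point (the `UntrustedMain` name in the comment is hypothetical):

```java
import java.util.concurrent.TimeUnit;

public class SandboxRunner {
    public static void main(String[] args) throws Exception {
        // Launch a child JVM; in the real setup this would be something like
        // java -cp sandbox.jar UntrustedMain, with a restrictive policy.
        ProcessBuilder pb = new ProcessBuilder(
                System.getProperty("java.home") + "/bin/java",
                "-version");
        Process child = pb.inheritIO().start();

        boolean finished = child.waitFor(2, TimeUnit.SECONDS);
        if (!finished) {
            // OS-level kill: works even against tight loops and System.exit
            // tricks that in-process thread control cannot handle.
            child.destroyForcibly();
        }
        System.out.println("finished in time: " + finished);
    }
}
```

The key advantage over in-process approaches is that the OS enforces the kill, so neither infinite loops nor a System.exit in client code can take the main server down.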
One more idea would be bytecode instrumentation: before you load the code sent by your client, manipulate it so that it adds a short sleep in every loop and on every method call (or method entry).
This prevents clients from clogging a whole CPU until they are done. Of course, they still block a Thread object (which takes some memory), and the slowdown hits every client, not only the malicious ones. Maybe make the first few iterations free, then scale the waiting time up with each one (and reset it when the thread has to wait for other reasons).
Modern-day app servers use thread pooling for better performance. The problem is that one bad apple can spoil the bunch. What you need is an app server with one thread, or maybe one process, per request. Of course there are going to be trade-offs, but the OS will then make sure that processing time gets allocated evenly.
NOTE: After researching a little more, what you need is an engine that will create another process per request. Otherwise a user can cripple your servlet engine by writing servlets with infinite loops and then posting multiple requests. Or he could simply call System.exit in his code and bring everybody down.
You could use a parent thread to launch each request in a separate thread, as suggested already, and then monitor the CPU time used by those threads via the ThreadMXBean class. The parent thread could then kill any threads that are misbehaving. This assumes, of course, that you can establish some reasonable criterion for how much CPU time a thread should or should not be using. Maybe the rule could be that a certain initial amount of time, plus a certain additional amount per second of wall-clock time, is OK?
I would make these client request threads have lower priority than the thread responsible for monitoring them.
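The ThreadMXBean idea can be sketched as follows. The busy-looping worker stands in for a misbehaving client-request thread, and the 50 ms budget is an arbitrary illustrative threshold; a real watchdog would interrupt (not Thread.stop()) threads that exceed their budget:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuWatchdog {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("cpu time not supported");
            return;
        }
        // Hypothetical worker standing in for a client-request thread.
        Thread worker = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) x++; // busy loop
        });
        worker.start();
        Thread.sleep(200); // let it burn some CPU

        long cpuNanos = mx.getThreadCpuTime(worker.getId());
        boolean overBudget = cpuNanos > 50_000_000L; // 50 ms budget
        worker.interrupt(); // a real watchdog would act here on overBudget
        worker.join();
        System.out.println("over budget: " + overBudget);
    }
}
```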
