Simulating OS processes in java / preemptive thread stop / classloader + thread process simulation

I have a thread pool (executor) that I would like to monitor for excessive resource usage (time, since CPU and memory seem to be considerably harder to track). I would like to 'kill' threads that run for too long, like killing an OS process. The workers spend most of their time calculating, but significant time is also spent waiting for I/O, mostly on the database...
I have been reading up on stopping threads in Java and how Thread.stop is deprecated for resource cleanup reasons (not properly releasing locks, closing sockets and files, and so on). The recommended way is for the worker thread to check periodically whether it should stop and then exit. This obviously expects client threads to be written in a certain way and not to be blocked waiting on some external I/O. There are also ThreadDeath and InterruptedException, which might be able to do the job, but they can be circumvented in improperly or maliciously written worker threads, and I also got the impression (though I have not tested it yet) that InterruptedException might not work properly in some (or even all) cases when the worker thread is waiting for I/O.
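To make the "check periodically and exit" approach concrete, here is a minimal sketch of a time limit enforced from outside the worker. The names TimeLimitedRunner, runWithTimeout and cooperativeLoop are made up for illustration; the sketch assumes the worker polls its interrupt flag, which is exactly the cooperation this approach depends on.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeLimitedRunner {

    /** Runs the task; returns true if it finished within the limit, false if it was cancelled or failed. */
    static boolean runWithTimeout(Runnable task, long limitMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> future = pool.submit(task);
        try {
            future.get(limitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true);                 // sets the worker's interrupt flag
            return false;
        } catch (ExecutionException e) {
            return false;                        // the task itself threw
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    /** A cooperative worker (hypothetical): it polls the interrupt flag between units of work. */
    static Runnable cooperativeLoop() {
        return () -> {
            long acc = 0;
            while (!Thread.currentThread().isInterrupted()) {
                acc += System.nanoTime() % 7;    // stand-in for real computation
            }
        };
    }
}
```

Calling runWithTimeout(cooperativeLoop(), 100) should return false after roughly 100 ms. A worker that never checks the flag, or that is blocked in a classic java.io stream read, would not react to the interrupt, which is the limitation described above.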
Another way to mitigate it would be to use multiple OS processes to isolate parts of the system, but it brings some unwanted increases in resource consumption.
That led me to that old story about isolates and/or MVM from more than five years ago, but nothing seems to have happened on that front, maybe in java 8 or 9...
So, actually, all this has made me wonder whether some poor man's simulation of processes could be achieved by using threads that each have their own classloader. Could that be used to simulate processes if each thread (or group) were loaded in its own classloader? I am not sure how much of an increase in resource consumption that would bring (as there would not be much code sharing and the code is not tiny). At least process copy-on-write semantics enable code sharing..
Any recommendations/ideas?
EDIT:
I am asking out of general interest and some disappointment that no solution for this exists in the JVM to date (I mean, shared application servers are not really possible; application domains, or something like that, in .NET seem to address exactly this kind of problem). I understand that killing a process does not guarantee reverting all system state to some initial condition, but at least all resources like handles, memory and CPU are released. I was thinking of using classloaders since they might help with releasing locks held by the thread, which is one of the reasons that Thread.stop is deprecated. In my current situation the only other thing that should be released (that I can think of currently) is the database connection, which could be handled separately/externally (by a watchdog thread) if needed..
Though, really, in my case Thread.stop might actually be workable, I just dislike using deprecated methods..
Also, I am considering this as a safety net for misbehaving processes. Ideally they should behave nicely, and they are to a quite high degree under my control.
So, to clarify, I am asking how do for example java people on the server side handle runaway threads? I suspect by using many machines in the cluster to offset the problem and restarting misbehaving ones - when the application is stateless at least..

The difference between a thread and a process is that threads implicitly share memory and resources like sockets and files (which makes thread-local memory a workaround). Processes implicitly have private memory and resources.
Killing the thread is not the problem. The problem is that a poorly behaved thread, or even a reasonably behaved one, can leave resources in an inconsistent state. Using a class loader will not help you track this, or solve the problem for you. For processes it's easier to track which resources they are using, as most of the resources are isolated. Even processes can leave locks, temporary files and shared IPC resources in an incorrect state if killed.
The real solution is to write code which behaves properly so it can be managed; working around and trying to handle every possible kind of poorly behaving code is next to impossible. If you have a bad third party library you have to use, you can try killing it and cleaning up after it, and you can come up with an OK solution, but you can't expect it to be a clean one.
EDIT: Here is a simple program which will deadlock between two processes or machines because it has a bug in it. The way to stop deadlocks is to fix the code.
public static void main(String... args) throws IOException {
switch(args.length) {
case 1: {
// server
ServerSocket ss = new ServerSocket(Integer.parseInt(args[0]));
Socket s = ss.accept();
ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
// will deadlock before it gets here
break;
}
case 2: {
Socket s = new Socket(args[0], Integer.parseInt(args[1]));
ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
// will deadlock before it gets here
break;
}
default:
System.err.println("Must provide either a port as server or hostname port as client");
}
}
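For this particular bug, here is a sketch of what "fixing the code" could look like. It assumes the deadlock comes from both sides constructing their ObjectInputStream first: that constructor blocks until it has read the stream header, which the peer's ObjectOutputStream only sends once it has been created and flushed. The class name StreamOrderDemo, the loopback setup and the echoed message are made up for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class StreamOrderDemo {

    /** Runs a server and a client on loopback; both open the output stream first. */
    static String exchange() {
        try (ServerSocket ss = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> {
                try (Socket s = ss.accept()) {
                    ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
                    oos.flush();   // push the stream header to the peer right away
                    ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
                    oos.writeObject("echo:" + ois.readObject());
                    oos.flush();
                } catch (IOException | ClassNotFoundException e) {
                    e.printStackTrace();
                }
            });
            server.start();
            try (Socket s = new Socket(ss.getInetAddress(), ss.getLocalPort())) {
                ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
                oos.flush();       // same order on the client side
                ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
                oos.writeObject("hello");
                oos.flush();
                return (String) ois.readObject();
            } finally {
                server.join();
            }
        } catch (IOException | ClassNotFoundException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this ordering each side has already sent its header before it starts waiting for the other's, so neither constructor blocks indefinitely.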

Related

Passing a Java socket from thread A to B

In a server, there is a thread A listening for incoming connections, typically looping forever. When a connection is accepted, thread A creates a task (say, class Callable in Java) and submits it to an Executor.
All this really means is that A lost the reference to the socket, and that now there’s a thread B (created by the Executor) that manages the socket. If B experiences any exception, it would close the socket, and there is no risk that the socket, as an operating system resource, will not be reclaimed.
This is all fine if thread B starts. But what if the executor was shut down before B had a chance to get scheduled?
Does anyone think this is an issue? If the reference to the socket is lost due to this, would the garbage collector close the socket?

Yes, it sounds like an issue.
The OS will probably eventually free up the socket (at least if it's TCP, as far as I can tell) but it will probably take a relatively long time.
I don't think the garbage collector plays a role in this case. At least not for threads, which after having been started will usually keep running even if there is no reference to them in the code (this is true at least for non-daemon threads). Sockets may behave in a similar manner.
If you cannot guarantee the connection is going to be processed (by starting the handling Thread instance as soon as it is established) then you should keep a reference to the socket and make sure you close all of them as soon as possible, which probably means right after Executor.shutdown() or similar method has been called.
Please note that, depending on how you ask the Executor to shut down, tasks which have already been submitted but haven't yet started will either still be run or be discarded. So be sure to make your code behave accordingly.
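As a rough illustration of that difference: ExecutorService.shutdown() still runs tasks that are already queued, while shutdownNow() returns the queued tasks that never started, so their resources can be closed by hand. The sketch below uses a hypothetical ResourceTask class and plain Closeable objects in place of real sockets, and it relies on execute() (not submit()) so that the returned Runnables are the original task objects.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainOnShutdown {

    /** Hypothetical handler that owns a resource (a Socket in the question). */
    static final class ResourceTask implements Runnable {
        final Closeable resource;
        final CountDownLatch gate;

        ResourceTask(Closeable resource, CountDownLatch gate) {
            this.resource = resource;
            this.gate = gate;
        }

        @Override
        public void run() {
            try {
                gate.await();                       // stand-in for the real handling
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                closeQuietly(resource);             // a started task cleans up after itself
            }
        }
    }

    static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            // log and carry on with the remaining resources
        }
    }

    /** Returns {tasks that never started, resources closed in total}. */
    static int[] demo() {
        AtomicInteger closed = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(1);
        for (int i = 0; i < 3; i++) {
            pool.execute(new ResourceTask(() -> closed.incrementAndGet(), gate));
        }
        // One task occupies the single worker; the other two are still queued.
        List<Runnable> neverStarted = pool.shutdownNow();
        for (Runnable r : neverStarted) {
            if (r instanceof ResourceTask) {
                closeQuietly(((ResourceTask) r).resource);
            }
        }
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { neverStarted.size(), closed.get() };
    }
}
```

In this setup demo() should report two never-started tasks and three closed resources: one closed by the interrupted handler, two closed by the shutdown code.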
Also if you have limited resources (available threads) to process incoming socket connections and don't want them to grow too much, consider closing them immediately after having been accepted so they don't pile up in the unprocessed wait queue, if this is feasible in your project. The client can then retry connecting at a later time. If you still need to consume connections as soon as they come in, consider a non-blocking I/O approach, which will tend to scale better (and up to a point).

If the reference to the socket is lost due to this, would the garbage collector close the socket?
Probably. But the garbage collector may not run until literally the end of next week: You can't rely on the GC running, pretty much ever, just because 'hey, java has a garbage collector'. It does, and it won't kick in until needed. It may simply never be needed.
Depending on the GC to close resources is a fine way to get your VM killed by the OS for using up too many system resources.
The real question is: What is the causal process that results in shutting down the executor?
If there is some sort of 'cancel all open connections' button, and you implemented that as a one-liner: queue.shutdown(), then, no - that is not a good idea: You'll now be leaning on the GC to clean up those sockets which is bad.
I assume your callables look like:
Socket socket = ....; // obtained from queue
Callable<Void> socketHandler = () -> {
try {
// all actual handling code is here.
} finally {
socket.close();
}
return null;
};
then yeah that is a problem: If the callable is never even started, that finally block won't run. (If you don't have finally you have an even bigger problem - that socket won't get cleaned up if an exception occurs during the handling of it!).
One way out is to have a list of sockets, abstract away the queue itself, and have that abstraction have a shutdown method which both shuts down the queue and closes every socket, guarding every step (both the queue shutdown as well as all the socket.close commands) with a try/catch block to ensure that a single exception in one of these steps won't just stop the shutdown process on the spot.
Note that a bunch of handlers are likely to still be chugging away, so closing the socket 'out from under them' like this will cause exceptions in the handlers. If you don't want that, shut down the queue, then await termination (guarded with try/catch stuff), and then close all the sockets.
You can close a closed socket, that is a noop, no need to check first and no need to worry about the impact of closing a ton of already-closed sockets.
But do worry about keeping an obj ref to an infinitely growing list of sockets. Once a socket is completely done with, get rid of it - also from this curated list of 'stuff you need to close if the queue is terminated'.
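A minimal sketch of such an abstraction follows. TrackedPool and its methods are hypothetical names, Closeable stands in for Socket, and the sketch chooses shutdownNow() (interrupting running handlers) followed by an await, which is only one of the orderings discussed above.

```java
import java.io.Closeable;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class TrackedPool {
    private final ExecutorService pool;
    private final Set<Closeable> open = ConcurrentHashMap.newKeySet();

    TrackedPool(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Registers the resource, then queues its handler; a handler that runs always closes it. */
    void submit(Closeable resource, Runnable handler) {
        open.add(resource);
        try {
            pool.execute(() -> {
                try {
                    handler.run();
                } finally {
                    closeAndForget(resource);
                }
            });
        } catch (RejectedExecutionException e) {
            closeAndForget(resource);   // pool already shut down
        }
    }

    private void closeAndForget(Closeable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // log it, but keep going so one failure does not stop the cleanup
        } finally {
            open.remove(resource);      // do not let the registry grow forever
        }
    }

    /** Stops the pool, waits for running handlers, then closes whatever is still registered. */
    int shutdown(long waitMillis) {
        pool.shutdownNow();
        try {
            pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int leftovers = 0;
        for (Closeable c : open) {
            closeAndForget(c);
            leftovers++;
        }
        return leftovers;
    }

    /** Three blocked handlers on one thread: one is interrupted, two never start. */
    static int demo() {
        TrackedPool tracked = new TrackedPool(1);
        AtomicInteger closes = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 3; i++) {
            tracked.submit(new Closeable() {
                @Override
                public void close() {
                    closes.incrementAndGet();
                }
            }, blocked);
        }
        tracked.shutdown(5000);
        return closes.get();
    }
}
```

The demo should end with all three resources closed even though only one handler ever ran.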
Of course, if the only process that leads to early queue termination is that you want to shut down the VM, don't worry about it. The sockets go away with the VM. In fact, there is no need to shut down the queue. If you intend to end the VM, just.. end it, immediately: System.exit(0) is what you want. There is no such thing as 'but.. I should ask all the things to shut down nicely!'. That IS how you ask. Systems that need to clean up resources are mostly badly designed (design them so that they don't need cleanup on VM shutdown. All the resources work that way, for example), and if you must, register a shutdown hook.

How many threads is good for a datagram receiver?

I created a send-receive datagram system for a game I have created in java (and LWJGL).
However, these datagrams often got dropped. That was because the server was waiting for various IO operations and other processing to be finished in the main loop, while new datagrams were being sent to it (which it was obviously not listening for).
To combat this, I have kept my main thread with the while true loop that catches datagrams, but instead of doing the processing in the main thread, I branch out into different threads.
Like this:
ArrayList<RecieveThread> threads = new ArrayList<RecieveThread>();
public void run(){
while (true){
//System.out.println("Waiting!");
byte[] data = new byte[1024];
DatagramPacket packet = new DatagramPacket(data, data.length);
try {
socket.receive(packet);
} catch (IOException e) {
e.printStackTrace();
}
//System.out.println("Recieved!");
String str = new String(packet.getData());
str = str.trim();
if (threads.size() < 50){
RecieveThread thr = new RecieveThread();
thr.packet = packet;
thr.str = str;
threads.add(thr);
thr.start();
}else{
boolean taskProcessed = false;
for (RecieveThread thr : threads){
if (!thr.nextTask){
thr.packet = packet;
thr.str = str;
thr.nextTask = true;
taskProcessed = true;
break;
}
}
if (!taskProcessed){
System.out.println("[Warning] All threads full! Defaulting to main thread!");
process(str, packet);
}
}
}
}
That creates a new thread for every incoming datagram until there are 50 threads, at which point it hands the datagram to one of the existing threads that is waiting for its next task. If all threads are busy, it falls back to processing in the main thread.
So my question is this: How many threads is a good amount? I don't want to overload anybody's system (The same code will also be run on players' clients), but I also don't want to increase system packet loss.
Also, is different threads even a good idea? Does anybody have a better way of doing this?
Edit: Here is my RecieveThread class (class is 777 lines long):
String str;
DatagramPacket packet;
boolean nextTask = true;
public void run(){
while (true){
////System.out.println("CLIENT: " + str);
//BeforeGame
while (!nextTask){
//Nothing
}
<Insert processing code here that you neither know about, nor care to know about, nor is relevant to the issue. Still, I pastebinned it below>
}
}
Full receiving code

First and foremost, any system that uses datagrams (e.g. UDP) for communication has to be able to cope with dropped requests. They will happen. The best you can do is reduce the typical drop rate to something that is acceptable. But you also need to recognize that if your application can't cope with lost datagrams, then it should not be using datagrams. Use regular (TCP) sockets instead.
Now to the question of how many threads to use. The answer is "it depends".
On the one hand, if you don't have enough threads, there could be unused hardware capacity (cores) that could be used at peak times ... but isn't.
If you have too many threads running (or runnable) at a time, they will be competing for resources at various levels:
competition for CPU
competition for memory bandwidth
contention on locks and shared memory.
All of these things (and associated 2nd order effects) can reduce throughput ... relative to the optimal ... if you have too many threads.
If your request processing involves talking to databases or servers on other machines, then you need enough threads to allow something else to happen while waiting for responses.
As a rule of thumb, if your requests are independent (minimal contention on shared data) and exclusively in-memory (no databases or external service requests) then one worker thread per core is a good place to start. But you need to be prepared to tune (and maybe re-tune) this.
Finally, there is the problem of dealing with overload. On the one hand, if the overload situation is transient, then queuing is a reasonable strategy ... provided that the queue doesn't get too deep. On the other hand, if you anticipate overload to be common, then the best strategy is to drop requests early.
However, there is a secondary problem. A dropped request will probably entail the client noticing that it hasn't gotten a reply in a given time, and then resending the request. And that can lead to worse problems; i.e. the client resending a request before the server has actually dropped it ... which can lead to the same request being processed multiple times, and a catastrophic drop in effective throughput.
Note that the same thing can happen if you have too many threads and they get bogged down due to resource contention.
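Putting the points above together, here is a minimal sketch of one receiving thread that hands datagrams to a small, bounded worker pool and drops early when the pool is saturated. DatagramDispatcher, process and the capacities are made-up names and values; the right thread count and queue depth would need tuning as described above.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class DatagramDispatcher {
    private final ThreadPoolExecutor workers;

    DatagramDispatcher(int threads, int queueCapacity) {
        workers = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false (drop early) when every worker is busy and the queue is full. */
    boolean dispatch(Runnable work) {
        try {
            workers.execute(work);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** The socket thread only receives and dispatches; it never does the processing itself. */
    void receiveLoop(DatagramSocket socket) throws IOException {
        while (!socket.isClosed()) {
            byte[] data = new byte[1024];                 // fresh buffer per packet
            DatagramPacket packet = new DatagramPacket(data, data.length);
            socket.receive(packet);
            String text = new String(packet.getData(), packet.getOffset(),
                    packet.getLength(), StandardCharsets.UTF_8);
            if (!dispatch(() -> process(text))) {
                System.err.println("[Warning] overloaded, dropping datagram");
            }
        }
    }

    void process(String text) {
        // hypothetical: the game's packet handling goes here
    }

    void shutdown() {
        workers.shutdownNow();
    }

    /** One worker, queue of one: the third task is rejected while the first is still running. */
    static String demo() {
        DatagramDispatcher dispatcher = new DatagramDispatcher(1, 1);
        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean first = dispatcher.dispatch(blocked);    // taken by the single worker
        boolean second = dispatcher.dispatch(blocked);   // waits in the queue
        boolean third = dispatcher.dispatch(blocked);    // no room left
        gate.countDown();
        dispatcher.shutdown();
        return first + "," + second + "," + third;
    }
}
```

A reasonable starting point might be new DatagramDispatcher(Runtime.getRuntime().availableProcessors(), 64), adjusted after measuring.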

Probably just one thread, assuming you have one DatagramSocket. You could always spawn a processData thread from your thread that reads the UDPSocket. Like people said in the comments it's up to you, but usually one is good.
Edit:
Also look into mutexes if you do this.

It seems you are facing the same kind of question as choosing between NginX and Apache. Have you ever read about NginX and the C10k problem? If not, read about it here. There is no "correct" answer to questions like this one. As the others have pointed out, it comes down to the needs of your application's environment. Remember that there are many web frameworks: every one of them solves the same problem, serving HTML documents, but each goes about the task in a different way.

High Performance in Multitasking inside Tomcat

In my web application running tomcat 6, an object (not Servlet) is scheduled to execute for reading files from a defined folder. After reading a file, the file content is saved into a database.
In order to get higher performance, multitasking is needed. My original approach is to create a new thread for each file that is found, so that the tasks for the files run in parallel in the background. For example, if three files are found, three threads are created.
However, although the Tomcat configuration sets maxThreads to more than 200 and 32GB of memory has been assigned, only 7-8 threads are running simultaneously at any time. What's wrong? Or is multithreading not a best practice for multitasking? Please help.
Addition (14 Mar 2014)
Thanks for your advice. So my question can be more specific:
1. Can ThreadPoolExecutor improve the performance?
2. Can NIO improve the performance?
Here is the original code:
String[] listFiles = folder.list();
for(int i=0; i<listFiles.length; i++) {
synchronized(globalHashMap) {
MyTask myTask = new MyTask(listFiles[i]);
globalHashMap.put(listFiles[i], myTask );
myTask.start();
}
}
class MyTask {
String myFile;
Thread myThread;
public MyTask(String file) {
myFile = file;
}
public void start() {
myThread = new Thread(new Runnable() {
public void run() {
do {
readCnt = bufferedInputStream.read(bytesArray, 1024, 1);
...
} while(not end);
postProcessFunction();
synchronized(globalHashMap) {
globalHashMap.remove(myFile);
globalHashMap.notifyAll();
}
}
});
myThread.start();
}
}

The maxThreads setting in Tomcat does not mean the max. # of threads the JVM can have. Tomcat has no control over that. It specifies the max. # of worker threads Tomcat itself will create to service incoming HTTP requests. Your Java code can still create any threads it needs.
As for why you only get 7 - 8 threads, I'd have to see the code to know for sure. How many files are in this directory?
I am not sure what analysis you've done, but I often hear "multi-threading" as the canned solution for making something faster and that is a very dangerous way to tackle things. Threading is meant to tackle a very specific set of problems. It should be a last resort. Especially in a web application. Web containers use multiple class-loaders to deploy and undeploy and redeploy applications on the fly. Threads create a maintenance nightmare and often prevent proper class loader cleanup.
I have actually seen occasions where multi-threading masks a problem. When I first joined my current company, an effort was underway to multi-thread the process which deploys SQL scripts against our databases to apply bug fixes. The complaint was the process was too slow, so the solution of course was to do multiple DB's in parallel via multi-threading. I recently discovered that the script execution process runs a SQL statement (for GRANT) at the end of every script against every database that takes 2 minutes. This statement is rarely ever needed. If this process was properly profiled to begin with, my recommendation would have been to remove the unneeded code, which would have dropped the process from 2-3 hours to < 10 minutes. Now we are stuck maintaining a mess of thread management code.
So, now my question to you is, have you profiled your code? As @wallenborn pointed out, disk I/O may be the bottleneck. There could also be optimizations in your code that could be made.

The MaxThreads parameter in Tomcat only controls how many threads are used for serving web requests. There is no limit (besides available memory) on how many additional threads your web application can create. There must be something wrong with the code.
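On the ThreadPoolExecutor part of the question, a bounded pool at least makes the degree of parallelism explicit instead of one thread per file. The following is only a sketch: FolderLoader, loadAll and the temp-file demo are made-up names, and whether it is any faster depends on where the bottleneck is (disk I/O, the database, or the code itself).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FolderLoader {

    /** Reads every file on a fixed-size pool; returns file name -> number of bytes read. */
    static Map<String, Long> loadAll(List<Path> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Long>> pending = new LinkedHashMap<>();
            for (Path file : files) {
                // Stand-in for "read the file and save it to the database".
                Callable<Long> job = () -> (long) Files.readAllBytes(file).length;
                pending.put(file.getFileName().toString(), pool.submit(job));
            }
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Long>> entry : pending.entrySet()) {
                sizes.put(entry.getKey(), entry.getValue().get());   // wait for each result
            }
            return sizes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Writes three small temp files (1, 2 and 3 bytes) and loads them with two threads. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("folder-loader");
            List<Path> files = new ArrayList<>();
            for (int i = 1; i <= 3; i++) {
                files.add(Files.write(dir.resolve("file" + i + ".dat"), new byte[i]));
            }
            long total = 0;
            for (long size : loadAll(files, 2).values()) {
                total += size;
            }
            return total;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Inside a container, the pool would normally be created once and shut down when the application is undeployed, rather than per call as in this sketch.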

Creating new threads inside an application that runs on an application server is not a good idea. It is bad practice. People usually say never to do this because you can run out of threads for processing HTTP requests.
The best way to solve your problem is to use JMS. Your background task will send a message to the JMS broker for each file found on the disk. The JMS broker can process messages on multiple threads very efficiently and will control all the multithreading for you.

Sporadic problems in running a multi-threaded Java project in Win7

I am working on a project that is both memory and computationally intensive. A significant portion of the execution utilizes multi-threading by a FixedThreadPool. In short; I have 1 thread for fetching data from several remote locations (using URL connections) and populating a BlockingQueue with objects to be analyzed and n threads that pick these objects and run the analysis. edit: see code below
Now this setup works like a charm on my Linux machine running OpenSUSE 11.3, but a colleague who is testing it on a very similar machine running Win7 is getting custom notifications of timeouts on the queue polling (see code below), lots of them actually. I have been trying to monitor the processor use on her machine, and it appears that the software does not get any more than 15% of the CPUs, while on my machine the processor usage hits the roof, just as I intended.
My question is, then, can this be a sign of "starvation" of the queue? Could it be so that the producer thread is not getting enough cpu time? If so how do I go about giving one particular thread in the pool higher priority?
UPDATE:
I have been trying to pinpoint the problem, with no joy... I did however gain some new insights.
Profiling the execution of the code with JVisualVM demonstrates a very peculiar behavior. The methods are called in short bursts of CPU-time with several seconds of no progress in between. This to me means that somehow the OS is hitting the brakes on the process.
Disabling the anti-virus and back-up daemons does not have any significant effect on the matter
Changing the priority of java.exe (the only instance) through task manager (advised here) does not change anything either. (That being said, I could not give "realtime" priority to java, and had to be content with "high" prio)
Profiling the network usage shows good flow of data in and out, so I am guessing that is not the bottleneck (while it is a considerable part of the execution time of the process, but that I know already and is pretty much the same percentage as what I get on my Linux machine).
Any ideas as to how the Win7 OS might be limiting the CPU time for my project? If it's not the OS, what could be the limiting factor? I would like to stress yet again that the machine is NOT running any other computation-intensive software at the same time and there is almost no load on the CPUs other than my software. This is driving me crazy...
EDIT: relevant code
public ConcurrencyService(Dataset d, QueryService qserv, Set<MyObject> s){
timeout = 3;
this.qs = qserv;
this.bq = qs.getQueue();
this.ds = d;
this.analyzedObjects = s;
this.drc = DebugRoutineContainer.getInstance();
this.started = false;
int nbrOfProcs = Runtime.getRuntime().availableProcessors();
poolSize = nbrOfProcs;
pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);
drc.setScoreLogStream(new PrintStream(qs.getScoreLogFile()));
}
public void serve() throws InterruptedException {
try {
this.ds.initDataset();
this.started = true;
pool.execute(new QueryingAction(qs));
for(;;){
MyObject p = bq.poll(timeout, TimeUnit.MINUTES);
if(p != null){
if (p.getId().equals("0"))
break;
pool.submit(new AnalysisAction(ds, p, analyzedObjects, qs.getKnownAssocs()));
}else
drc.log("Timed out while waiting for an object...");
}
} catch (Exception ex) {
ex.printStackTrace();
String exit_msg = "Unexpected error in core analysis, terminating execution!";
}finally{
drc.log("--DEBUG: Termination criteria found, shutdown initiated..");
drc.getMemoryInfo(true); // dump meminfo to log
pool.shutdown();
int mins = 2;
int nCores = poolSize;
long totalTasks = pool.getTaskCount(),
compTasks = pool.getCompletedTaskCount(),
tasksRemaining = totalTasks - compTasks,
timeout = mins * tasksRemaining / nCores;
drc.log("--DEBUG: Shutdown commenced, thread pool will terminate once all objects are processed, " +
"or will timeout in : " + timeout + " minutes... \n" + compTasks + " of " + (totalTasks -1) +
" objects have been analyzed so far, " + "mean process time is: " +
drc.getMeanProcTimeAsString() + " milliseconds.");
pool.awaitTermination(timeout, TimeUnit.MINUTES);
}
}
The class QueryingAction is a simple Runnable that calls the data acquisition method in the designated QueryService object which then populates a BlockingQueue. The AnalysisAction class does all the number-crunching for a single instance of MyObject.

I suspect the producer thread is not getting/loading the source data fast enough. This might not be a lack of CPU but an IO related issue. (not sure why you have time outs on your BlockingQueue)
It might be worth having a thread which periodically logs things like the number of tasks added and the length of the queue (e.g. every 5-15 seconds)
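A minimal sketch of such a logging thread, assuming access to the BlockingQueue and the ThreadPoolExecutor from the question. QueueMonitor, sample and start are made-up names, and System.out stands in for whatever logger the project uses.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class QueueMonitor {

    /** One log line: current queue length plus the pool's submitted and completed task counts. */
    static String sample(BlockingQueue<?> queue, ThreadPoolExecutor pool) {
        return "queued=" + queue.size()
                + " submitted=" + pool.getTaskCount()
                + " completed=" + pool.getCompletedTaskCount();
    }

    /** Logs a sample every periodSeconds; call shutdownNow() on the result to stop it. */
    static ScheduledExecutorService start(BlockingQueue<?> queue, ThreadPoolExecutor pool,
                                          long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
                () -> System.out.println(sample(queue, pool)),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

If the queue length stays near zero while the consumers sit idle, the producer is the bottleneck; if it keeps growing, the consumers are.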

So, if I correctly understand your problem, you have one thread to fetch data, and several threads to analyse the fetched data. Your problem is that the threads are not correctly synchronized to run together and take full advantage of the processor.
You have a typical producer-consumer problem with a single producer and several consumers.
I advise you to remake your code a bit to have, instead, several independent consumer threads that are always waiting for resources to be available and only then running. This way you guarantee the maximum processor use.
Consumer thread:
while (!terminate)
{
synchronized (Producer.getLockObject())
{
try
{
//sleep (no processing at all)
Producer.getLockObject().wait();
}
catch (Exceptions..)
}
MyObject p = Producer.getObjectFromQueue(); //this function should be synchronized
//Analyse fetched data, and submit it to somewhere...
}
Producer thread:
while (!terminate)
{
MyObject newData = fetchData(); //fetch data from remote location
addDataToQueueu(newData); //this should also be synchronized
synchronized (getLockObject())
{
//wake up one thread to deal with the data
getLockObject().notify();
}
}
You see that this way, your threads are always performing useful work or sleeping.
This is just draft code to exemplify.
See more explanation here: http://www.javamex.com/tutorials/wait_notify_how_to.shtml
and here: http://www.java-samples.com/showtutorial.php?tutorialid=306
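One detail worth spelling out for anyone turning the draft into real code: a consumer should wait inside a loop that re-checks the condition, otherwise a notify() that arrives before the consumer calls wait() is lost, and spurious wakeups are not handled. Below is a minimal self-contained sketch of that guarded-wait pattern; GuardedQueue and the negative "stop" value are made up for illustration, and java.util.concurrent.BlockingQueue already provides the same behaviour ready-made.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

final class GuardedQueue<T> {
    private final Queue<T> items = new ArrayDeque<>();

    synchronized void put(T item) {
        items.add(item);
        notify();                      // wake one waiting consumer, if any
    }

    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {      // re-check after every wakeup
            wait();
        }
        return items.remove();
    }

    /** Two consumers sum the values 1..100; a negative value tells a consumer to stop. */
    static int demo() {
        GuardedQueue<Integer> queue = new GuardedQueue<>();
        AtomicInteger sum = new AtomicInteger();
        Runnable consumer = () -> {
            try {
                for (int v; (v = queue.take()) >= 0; ) {
                    sum.addAndGet(v);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(consumer);
        Thread b = new Thread(consumer);
        a.start();
        b.start();
        for (int i = 1; i <= 100; i++) {
            queue.put(i);
        }
        queue.put(-1);                 // one stop marker per consumer
        queue.put(-1);
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum.get();
    }
}
```

Because the emptiness check and the wait happen under the same lock as put(), a consumer that arrives late simply finds the item and never waits.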

Priority won't help, since the problem is not an issue of deciding who gets precious resources -- resource usage isn't maxed. The only way the producer thread would not be getting enough CPU time is if it wasn't ready-to-run.
How many cores does the machine have? It's possible that the producer thread is running full speed and there still just isn't enough CPU to go around. It's also possible the producer is I/O bound.

You can try to separate the producer thread from the pool (i.e. create a distinct Thread and give the pool one thread fewer than its current capacity) and then set its priority to maximum via setPriority. See what happens, although priority rarely accounts for such a difference in performance.

When you say URL connection, do you mean local or remote? It could be that network speed is slowing your producer down

So after weeks of fiddling, wrestling in code and other types of suffering I think I had a breakthrough, "a moment of clarity" if you will...
I managed to show that the program can exhibit the same slow behavior on my Linux machine and can indeed run full throttle on the problematic Win-7 machine. The crux of the problem appears to be some sort of corruption of the system/cache files that are used to store the results of previous queries and, overall, speed up the analysis. You got to love the irony: in this case they appeared to be the reason for EXTREMELY slow analysis. In retrospect, I should have known (a la Occam's razor)...
I am still not sure how the corruption occurs, but at least it's probably not related to the different OS. Using the system files from my machine increases the output on the Win7 host, but only up to about 40%. Profiling the process more has also revealed that, oddly enough, there is significantly more GC activity on Win7, which apparently took lots of CPU time from number crunching. Giving -Xmx2g takes care of the excessive garbage collection, and the CPU usage for the process shoots up to 95-96%, and threads run smoothly.
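For anyone who wants to check for the same symptom without a profiler, the JVM exposes cumulative GC time through the standard management beans. This is only a small sketch (GcStats and totalGcMillis are made-up names); logging the value periodically and comparing it with wall-clock time gives a rough idea of how much time goes to collection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {

    /** Approximate accumulated GC time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 if this collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```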
Now that my original question is answered, I have to say that overall Java responsiveness is definitely better in the Linux environment: even without allocating more heap memory, I can easily multi-task while I am running an extensive analysis in the background. Things are not as smooth in Win-7, e.g. resizing the GUI is significantly slower once the analysis takes off at full speed.
Thanks for all the replies, I am sorry for the partially misleading problem description. I merely shared what I found out while debugging to the best of my abilities. Anyways, I believe the bounty goes to Peter Lawrey, since he early on pointed to an I/O issue and it was his suggestion about a logger thread which eventually led me to the answer.

I would think it was some OS specific issue because that is the core difference between the two units. More specifically, something is slowing down the data arriving through the remote connection.
Find some traffic analysis tool such as Wireshark and/or Networx and try to discover if there is anything throttling the Win PC. Perhaps it is going through a proxy that has some kind of rate cap configured.

Sorry not really an answer but did not fit inside comment and still it is worth the read I think:
Well, I am not a Java person,
but I recently had the same problem with C++ projects for machine control over USB.
On XP or W2K everything runs perfectly for months of 24/7 operation on any machine with 2 or more cores.
On W7 with a strong enough machine everything works, but sometimes (about once every few hours) it freezes for a few seconds for no obvious reason.
On W7 with a relatively weak machine (2 core 1.66GHz T2300E notebook) the threads freeze for some time and then run again, which under/overflows the USB/WIN/App FIFOs and collapses the communication ...
It appears that nothing is blocked, but the W7 scheduler occasionally just does not give CPU to the right threads.
I thought that the USB driver (JUNGO) communication was freezing, but that is not true: I measured it and it is OK even during a freeze.
The freeze lasted about 6-15 seconds, roughly once per minute.
After adding some safety sleeps to the thread loops, the freeze shortened to about 0.5 sec,
but it is still there.
Even when the App does not under/overflow its FIFOs, the Windows USB driver side does (a few times per minute, for a few ms).
Changing the exe/thread priority and class does not affect performance on W7 (on XP and W2K it works as it should).
As you can see it seems we have most likely the same problem. In my case:
it is not I/O related (when I replace the USB thread with a simulation of the device, it behaves similarly)
adding Sleep to time-critical code helps a lot
the error is also present with a low thread count [2 fast (17ms) + 1 slow (250ms) + App code = 4]
my CPU consumption on the slow W7 machine is also not 100% but about 95%, which is OK because I have sleeps everywhere
my Apps use about 40-100MB of memory but are CPU computation demanding ...
but not that much; they could run safely on much slower machines,
but because of the USB driver connection and multiple device support they need at least 2 cores
My next step is to add some kind of execution time logging/analysis to see what is happening in more detail,
and also to rewrite the send/receive threads a little to see if it helps.
When I learn something new or useful I will add it.

Java NIO Threading issue with SocketChannel.write()

Sometimes, while sending a large amount of data via SocketChannel.write(), the underlying TCP buffer gets filled up, and I have to continually re-try the write() until the data is all sent.
So, I might have something like this:
public void send(ByteBuffer bb, SocketChannel sc) throws IOException, InterruptedException {
sc.write(bb);
while (bb.remaining()>0){
Thread.sleep(10);
sc.write(bb);
}
}
The problem is that the occasional issue with a large ByteBuffer and an overflowing underlying TCP buffer means that this call to send() will block for an unexpected amount of time. In my project, there are hundreds of clients connected simultaneously, and one delay caused by one socket connection can bring the whole system to a crawl until this one delay with one SocketChannel is resolved. When a delay occurs, it can cause a chain reaction of slowing down in other areas of the project, and having low latency is important.
I need a solution that will take care of this TCP buffer overflow issue transparently and without causing everything to block when multiple calls to SocketChannel.write() are needed. I have considered putting send() into a separate class extending Thread so it runs as its own thread and does not block the calling code. However, I am concerned about the overhead necessary in creating a thread for EACH socket connection I am maintaining, especially when 99% of the time, SocketChannel.write() succeeds on the first try, meaning there's no need for the thread to be there. (In other words, putting send() in a separate thread is really only needed if the while() loop is used -- only in cases where there is a buffer issue, perhaps 1% of the time) If there is a buffer issue only 1% of the time, I don't need the overhead of a thread for the other 99% of calls to send().
I hope that makes sense... I could really use some suggestions. Thanks!

Prior to Java NIO, you had to use one Thread per socket to get good performance. This is a problem for all socket based applications, not just Java. Support for non-blocking IO was added to all operating systems to overcome this. The Java NIO implementation is based on Selectors.
See The definitive Java NIO book and this On Java article to get started. Note however, that this is a complex topic and it still brings some multithreading issues into your code. Google "non blocking NIO" for more information.

The more I read about Java NIO, the more it gives me the willies. Anyway, I think this article answers your problem...
http://weblogs.java.net/blog/2006/05/30/tricks-and-tips-nio-part-i-why-you-must-handle-opwrite
It sounds like this guy has a more elegant solution than the sleep loop.
Also I'm fast coming to the conclusion that using Java NIO by itself is too dangerous. Where I can, I think I'll probably use Apache MINA which provides a nice abstraction above Java NIO and its little 'surprises'.

You don't need the sleep() as the write will either return immediately or block.
You could have an executor which you pass the write to if it doesn't write the first time.
Another option is to have a small pool of threads to perform the writes.
However, the best option for you may be to use a Selector (as has been suggested) so you know when a socket is ready to perform another write.
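
A minimal sketch of the "small pool of threads" option, assuming blocking channels. WriterPool, send() and demo() are hypothetical names introduced for illustration. Each channel is always mapped to the same single-threaded lane, which is an added assumption so that two buffers for one connection are never interleaved.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: a few writer threads shared by all connections. */
class WriterPool {
    private final ExecutorService[] lanes;

    WriterPool(int threads) {
        lanes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Queues the write and returns immediately; the caller never blocks. */
    Future<?> send(WritableByteChannel ch, ByteBuffer bb) {
        // Same channel -> same lane, so writes to one channel stay in order.
        int lane = Math.floorMod(System.identityHashCode(ch), lanes.length);
        return lanes[lane].submit(() -> {
            while (bb.hasRemaining()) {
                ch.write(bb);   // may block, but only this writer thread
            }
            return null;
        });
    }

    void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
    }

    /** Demo against an in-memory channel; returns the number of bytes written. */
    static int demo() {
        WriterPool pool = new WriterPool(2);
        ByteArrayOutputStream sinkBytes = new ByteArrayOutputStream();
        WritableByteChannel ch = Channels.newChannel(sinkBytes);
        try {
            Future<?> first = pool.send(ch, ByteBuffer.wrap(new byte[] {1, 2, 3}));
            Future<?> second = pool.send(ch, ByteBuffer.wrap(new byte[] {4, 5}));
            first.get();
            second.get();
            return sinkBytes.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller must not touch a ByteBuffer after handing it to send(). A slow connection still ties up one lane, so connections that share that lane wait behind it; the rest of the system keeps running.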

For hundreds of connections, you probably don't need to bother with NIO. Good old fashioned blocking sockets and threads will do you.
With NIO, you can register interest in OP_WRITE for the selection key, and you will get notified when there is room to write more data.

There are a few things you need to do, assuming you already have a loop using Selector.select() to determine which sockets are ready for I/O:
Set the socket channel to non-blocking after you've created it: sc.configureBlocking(false);
Write (possibly parts of) the buffer and check if there's anything left. The buffer itself keeps track of the current position and how much is left.
Something like
sc.write(bb);
if (bb.remaining() == 0) {
    // we're done with this buffer; remove it from the select set if there's nothing else to send
} else {
    // do other stuff / return to the select loop
}
Get rid of your while loop that sleeps.
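
The steps above can be put together roughly as follows. This is a sketch under assumptions, not a complete server: OpWriteSketch, writeSome() and demo() are hypothetical names, and the demo uses a java.nio.channels.Pipe instead of a SocketChannel so that it runs without a network peer. The key handling (set OP_WRITE while data is left, clear it once the buffer is drained) is the same for a socket channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;

class OpWriteSketch {

    /** Writes what the channel accepts right now; returns true once bb is drained. */
    static boolean writeSome(SelectionKey key, ByteBuffer bb) throws IOException {
        WritableByteChannel ch = (WritableByteChannel) key.channel();
        ch.write(bb);   // non-blocking: may write anything from 0 to all remaining bytes
        if (bb.hasRemaining()) {
            // Ask the selector to tell us when there is room again.
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
            return false;
        }
        // Done: stop listening for OP_WRITE, otherwise select() returns constantly.
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        return true;
    }

    /** Pushes size bytes through a pipe; returns how many bytes the reader received. */
    static long demo(int size) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            Pipe.SinkChannel sink = pipe.sink();
            sink.configureBlocking(false);
            SelectionKey key = sink.register(selector, 0);

            long[] received = {0};
            Thread reader = new Thread(() -> {
                ByteBuffer in = ByteBuffer.allocate(8192);
                try {
                    int n;
                    while ((n = pipe.source().read(in)) >= 0) {
                        received[0] += n;
                        in.clear();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            reader.start();

            ByteBuffer bb = ByteBuffer.allocate(size);
            boolean done = writeSome(key, bb);
            while (!done) {
                selector.select();               // wait until the channel is writable
                selector.selectedKeys().clear();
                done = writeSome(key, bb);
            }
            sink.close();                        // reader sees end-of-stream
            reader.join();
            pipe.source().close();
            return received[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a real server the select loop would serve many keys, and each key would carry its pending buffer (for example via key.attach()) rather than using one local variable as here.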

I am facing some of the same issues right now:
- If you have a small number of connections, but with large transfers, I would just create a thread pool and let the writes block in the writer threads.
- If you have a lot of connections, then you could use full Java NIO: register OP_WRITE on your accept()ed sockets and then wait for the selector to report them as writable.
The O'Reilly Java NIO book has all this.
Also:
http://www.exampledepot.com/egs/java.nio/NbServer.html?l=rel
Some research online has led me to believe NIO is overkill unless you have a lot of incoming connections. Otherwise, if it's just a few large transfers, then just use a write thread. It will probably respond more quickly. A number of people have issues with NIO not responding as quickly as they want. Since your write thread blocks on its own, it won't hurt the rest of your code.
