High Performance in Multitasking inside Tomcat - java

In my web application running on Tomcat 6, an object (not a Servlet) is scheduled to read files from a defined folder. After a file is read, its content is saved into a database.
To get better performance, the work needs to be multitasked. My original approach was to start a new thread for each file found, so the per-file tasks run in parallel in the background. For example, if three files are found, three threads are created.
However, although the Tomcat configuration sets maxThreads to more than 200 and 32 GB of memory has been assigned, only 7-8 threads ever run simultaneously. What's wrong? Or is multithreading not a good approach to multitasking here? Please help.
Addition (14 Mar 2014)
Thanks for your advice. So my question can be more specific:
1. Can ThreadPoolExecutor improve the performance?
2. Can NIO improve the performance?
Here is the original code:
String[] listFiles = folder.list();
for (int i = 0; i < listFiles.length; i++) {
    synchronized (globalHashMap) {
        MyTask myTask = new MyTask(listFiles[i]);
        globalHashMap.put(listFiles[i], myTask);
        myTask.start();
    }
}
class MyTask {
    String myFile;
    Thread myThread;

    public MyTask(String file) {
        myFile = file;
    }

    public void start() {
        myThread = new Thread(new Runnable() {
            public void run() {
                do {
                    readCnt = bufferedInputStream.read(bytesArray, 1024, 1);
                    ...
                } while (not end);
                postProcessFunction();
                synchronized (globalHashMap) {
                    globalHashMap.remove(myFile);
                    globalHashMap.notifyAll();
                }
            }
        });
        myThread.start();
    }
}

The maxThreads setting in Tomcat does not mean the max. # of threads the JVM can have. Tomcat has no control over that. It specifies the max. # of worker threads Tomcat itself will create to service incoming HTTP requests. Your Java code can still create any threads it needs.
As for why you only get 7 - 8 threads, I'd have to see the code to know for sure. How many files are in this directory?
I am not sure what analysis you've done, but I often hear "multi-threading" offered as the canned solution for making something faster, and that is a very dangerous way to tackle things. Threading is meant to tackle a very specific set of problems. It should be a last resort, especially in a web application. Web containers use multiple class loaders to deploy, undeploy and redeploy applications on the fly. Threads you create yourself become a maintenance nightmare and often prevent proper class loader cleanup.
I have actually seen occasions where multi-threading masks a problem. When I first joined my current company, an effort was underway to multi-thread the process which deploys SQL scripts against our databases to apply bug fixes. The complaint was that the process was too slow, so the solution of course was to do multiple DBs in parallel via multi-threading. I recently discovered that the script execution process runs a SQL statement (a GRANT) at the end of every script, against every database, that takes 2 minutes and is rarely ever needed. If this process had been properly profiled to begin with, my recommendation would have been to remove the unneeded statement, which would have dropped the process from 2-3 hours to < 10 minutes. Now we are stuck maintaining a mess of thread management code.
So, now my question to you is: have you profiled your code? As @wallenborn pointed out, disk I/O may be the bottleneck. There could also be optimizations in your code that could be made.

The maxThreads parameter in Tomcat only controls how many threads are used for serving web requests. There is no limit (besides available memory) on how many additional threads your web application can create. There must be something wrong with the code.

Creating new threads inside an application that runs on an application server is generally considered bad practice, mainly because you can end up starving the container of threads for processing HTTP requests.
A better way to solve your problem is to use JMS. Your background task sends a message to a JMS broker for each file found on disk; the broker dispatches the messages to concurrent consumers very efficiently and handles all of the multithreading for you.
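A minimal sketch of that idea (not your code; the JNDI names and the FileDispatcher class are made up for the example, and a MessageListener or message-driven bean on the consumer side would do the actual read-and-save work concurrently):
import java.io.File;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class FileDispatcher {
    // Hypothetical sketch: send one JMS message per file found in the folder.
    public void dispatch(File folder) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
        Queue queue = (Queue) ctx.lookup("jms/FileQueue");                              // assumed JNDI name
        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            for (String fileName : folder.list()) {
                // The message carries only the file name; the consumers read the file
                // and save its content to the database in parallel.
                producer.send(session.createTextMessage(fileName));
            }
        } finally {
            conn.close();
        }
    }
}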

Related

Using a disk backed queue

Under certain conditions, one of our servers, running legacy code in a WildFly application server, suffers thread starvation and needs to be restarted.
After an arduous investigation, I stumbled upon this silly code:
private void addToQueue(Item e) {
    if (!_queue.offer(e, 200L, TimeUnit.MILLISECONDS)) {
        ThreadService.getInstance().schedule("retry process", () -> {
            addToQueue(e);
            return null;
        }, 5L, TimeUnit.SECONDS);
    }
}
ThreadService is the WildFly implementation of the Java SE ExecutorService, which provides a limited number of threads (16).
Sometimes, during reconnections between services, we receive a huge number of items to process (~100k) in a short time, and as you can see this spams the ThreadService with scheduled tasks.
An obvious solution would be to increase the queue, which currently has a capacity of 20k. However, I am afraid this would just lead to other problems. Obviously this spamming of scheduled tasks needs to be eliminated.
Since processing these items is a non-critical task, I was thinking of using a disk-backed queue, so the processing can be done in a separate process at a slow pace.
Searching a bit, I have seen this project: Tape by Square
I would like to know your opinion about this solution, which reminds me a bit of the pipes in Linux I used years ago. What do you think?

Does multi-threading improve performance? Java scenario [duplicate]

This question already has answers here:
Does multi-threading improve performance? How?
(2 answers)
Closed 8 years ago.
I have a List<Object> objectsToProcess. Let's say it contains 1,000,000 items. For all items in the list, you then process each one like this:
for (Object object : objectsToProcess) {
    // go to database, retrieve data
    // process
    // save data
}
My question is: would multi-threading improve performance? I would have thought that multiple threads are allocated by the processor by default anyway?
In the described scenario, given that process is a time-consuming task, and given that the CPU has more than one core, multi-threading will indeed improve the performance.
The processor does not allocate the threads. The processor provides the resources (more than one execution unit / execution context, i.e. virtual CPUs) that threads can run on. Programs need to create multiple threads themselves in order to utilize multiple CPU cores at the same time.
The two major reasons for multi-threading are:
Making use of multiple CPU cores which would otherwise be unused or at least not contribute to reducing the time it takes to solve a given problem - if the problem can be divided into subproblems which can be processed independently of each other (parallelization possible).
Making the program act and react on multiple things at the same time (i.e. Event Thread vs. Swing Worker).
There are programming languages and execution environments in which threads are created automatically in order to process problems that can be parallelized. Java is not (yet) one of them, but since Java 8 it has been moving in that direction, and Java 9 may bring even more.
Usually you do not want significantly more threads than the CPU provides cores, for the simple reason that thread switching and thread synchronization are overhead that slows things down.
The package java.util.concurrent provides many classes that help with typical problems of multithreading. What you want is an ExecutorService to which you assign the tasks that should be run and completed in parallel. The class Executors provides factory methods for creating popular types of ExecutorService. If your problem just needs to be solved in parallel, you might want to go for Executors.newCachedThreadPool(). If your problem is urgent, you might want to go for Executors.newWorkStealingPool().
Your code thus could look like this:
final ExecutorService service = Executors.newWorkStealingPool();
for (final Object object : objectsToProcess) {
    service.submit(() -> {
        // go to database, retrieve data
        // process
        // save data
    });
}
Please note that the sequence in which the objects would be processed is no longer guaranteed if you go for this approach of multithreading.
If your objectsToProcess are something which can provide a parallel stream, you could also do this:
objectsToProcess.parallelStream().forEach(object -> {
    // go to database, retrieve data
    // process
    // save data
});
This will leave the decisions about how to handle the threads to the VM, which often will be better than implementing the multi-threading ourselves.
Further reading:
http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#executing_streams_in_parallel
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
Depends on where the time is spent.
If you have a load of calculations to do then allocating work to more threads can help, as you say each thread may execute on a separate CPU. In such a situation there is no value in having more threads than CPUs. As Corbin says you have to figure out how to split the work across the threads and have responsibility for starting the threads, waiting for completion and aggregating the results.
If, as in your case, you are waiting for a database, then there can be additional value in using threads. A database can serve several requests in parallel (the database server itself is multi-threaded), so instead of coding
for (Object object : objectsToProcess) {
    // go to database, retrieve data
    // process
    // save data
}
where you wait for each response before issuing the next request, you want to have several worker threads, each performing:
Go to database retrieve data.
process
save data
Then you get better throughput. The trick, though, is not to have too many worker threads. There are several reasons for that:
Each thread uses some resources: it has its own stack and its own connection to the database. You would not want 10,000 such threads.
Each request uses resources on the server, each connection uses memory, and the database server will only serve so many requests in parallel. There is no benefit in submitting thousands of simultaneous requests if it can only serve tens of them in parallel. Also, if the database is shared, you probably don't want to saturate it with your requests; you need to be a "good citizen".
Net: you will almost certainly get a benefit from having a number of worker threads. The number of threads that helps will be determined by factors such as the number of CPUs you have and the ratio between the amount of processing you do and the response time from the DB. You can only really determine that by experiment, so make the number of threads configurable and investigate. Start with, say, 5, then 10. Keep an eye on the load on the DB as you increase the number of threads.
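A minimal sketch of that advice (java.util.concurrent; handle() is a placeholder for the retrieve/process/save work, and the thread count comes from a system property so it can be tuned per environment):
int workerCount = Integer.getInteger("app.workerThreads", 5); // e.g. -Dapp.workerThreads=10
ExecutorService workers = Executors.newFixedThreadPool(workerCount);
for (final Object item : objectsToProcess) {
    workers.submit(() -> handle(item)); // handle(): go to database, retrieve data, process, save data
}
workers.shutdown(); // stop accepting new tasks; already-submitted tasks still run to completion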

How can I implement multithreading in Java to process 2 million text files?

I have to process around 2 million text files and generate their triples.
Suppose I have a txt file xyz.txt (one of the 2 million input files); it is processed as below:
start(xyz.txt)---->module1(xyz.tpd)------>module2(xyz.adv)-------->module3(xyz.tpl)
Suggest a logic or concept so that I can process these faster and in an optimized way on an x64, 4 GB Windows system.
module1 (working): parses the txt file using a .bat file in which the parser is invoked; it runs as a separate system thread, and after 15 seconds it starts parsing another txt file, and so on...
module2 (working): accepts a .tpd file as input and generates a .adv file.
module3 (working): accepts a .adv file as input and generates .tpl (triples).
Should I start threads from the txt files, or at some other point?
I am afraid that the CPU will get stuck in context switching.
Does anyone have a better logic, so that I can try it?
Use a ThreadPoolExecutor. Tune its parameters, like the number of active threads and others, to suit your environment and system.
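For example, a sketch only (classes from java.util.concurrent; the sizes and queue capacity are illustrative, not tuned for your system):
int cores = Runtime.getRuntime().availableProcessors();
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        cores,                  // core pool size
        cores * 2,              // max pool size; extra threads only start once the queue is full
        60L, TimeUnit.SECONDS,  // idle time before threads above the core size are reclaimed
        new LinkedBlockingQueue<Runnable>(10000),    // bounded work queue
        new ThreadPoolExecutor.CallerRunsPolicy());  // back-pressure: the caller runs the task when saturated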
Most importantly, you have to write the program, profile it, and see where the bottleneck is. It is more than probable that the disk I/O operations will be the bottleneck and no amount of multithreading will solve your problems.
In that case using two(three? four?) separate hard drives may yield more speed gain than the best multithreaded solution.
Furthermore, the general rule is that you should optimize your application only when you have working code and you really know what to optimize. Profile, profile, profile.
Taking the future multithreaded optimizations into account when writing is OK; the architecture should be flexible enough to allow for future optimizations.
You have not told us much about your hardware environment, but the basic solution would be to use a fixed-size ExecutorService, where the size would, at first, be the number of your execution units:
private static final int NR_CPUS = Runtime.getRuntime().availableProcessors();
// Then:
final ExecutorService executor = Executors.newFixedThreadPool(NR_CPUS);
Then, for each file, you can create a Runnable to process it, and submit it to the thread pool using its .execute() method.
Note that .execute() is asynchronous; if the submitted runnable cannot be run right now, it will be queued.
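Submitting the per-file work could then look roughly like this (a sketch; inputDirectory and processFile() are placeholders for your folder and your module1 -> module2 -> module3 pipeline):
for (final File file : inputDirectory.listFiles()) {
    executor.execute(new Runnable() {
        @Override
        public void run() {
            processFile(file); // placeholder: parse .txt, then generate .tpd, .adv and .tpl
        }
    });
}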
This sounds like a typical batch application needed for data integration. I do not intend to throw hyperlinks at you without completely understanding your needs, but you probably need a solution that works in a single VM now and that you may want to extend to multiple VMs/machines over time, and we are probably not dealing with PBs of data to start with. Try Spring Batch: not only will it solve the problem in the given context, you will also learn to structure your thoughts (think vocabulary!) to solve similar problems.
As a starting point, I would create one IO thread and a pool of CPU threads. The IO thread reads in text files and offers them to a BlockingQueue, while the CPU threads take the files from the BlockingQueue and process them. Then profile the application to see how many CPU threads you need to keep pace with the IO thread (you can also determine this dynamically, e.g. start with one CPU thread and start another when the size of the BlockingQueue exceeds a threshold, probably something along the lines of 20 files). It's possible that you'll find you only need one CPU thread to keep pace with the IO thread, in which case your program is IO-bound and you'll need to e.g. place the text files next to each other on disk (so that you can use sequential reads on all but the first file) or put them on separate disks in order to speed up the application. One idea is to zip the files together and read them in with a ZipInputStream; this will reduce the number of disk seeks when reading the files and will also reduce the amount of data you need to read.
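A rough sketch of that split (assuming each file fits in memory; inputDir and process() are placeholders, and an empty byte array serves as an end-of-input sentinel):
final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<byte[]>(20);
final byte[] POISON = new byte[0]; // sentinel marking the end of the input

// Single IO thread: reads files sequentially and hands their contents to the queue.
Thread ioThread = new Thread(new Runnable() {
    public void run() {
        try {
            for (File f : inputDir.listFiles()) {
                queue.put(Files.readAllBytes(f.toPath()));
            }
            queue.put(POISON);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
});
ioThread.start();

// CPU workers: start with one and add more only while the queue stays full.
int cpuThreads = 1;
ExecutorService workers = Executors.newFixedThreadPool(cpuThreads);
for (int i = 0; i < cpuThreads; i++) {
    workers.submit(new Runnable() {
        public void run() {
            try {
                for (byte[] content = queue.take(); content != POISON; content = queue.take()) {
                    process(content); // placeholder for the per-file analysis
                }
                queue.put(POISON); // pass the sentinel on so other workers also stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    });
}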

Simulating OS processes in java / preemptive thread stop / classloader + thread process simulation

I have a thread pool (executor) which I would like to monitor for excessive resource usage (time, since CPU and memory seem to be considerably harder to track). I would like to 'kill' threads that are running too long, like killing an OS process. The workers spend most of their time calculating, but a significant amount of time is also spent waiting for I/O, mostly the database...
I have been reading up on stopping threads in Java and how it is deprecated for resource-cleanup reasons (not properly releasing locks, closing sockets and files, and so on). The recommended way is to have the worker thread periodically check whether it should stop and then exit. This obviously expects that worker threads are written in a certain way and that they are not blocked waiting on some external I/O. There are also ThreadDeath and InterruptedException, which might be able to do the job, but they can be circumvented in improperly or maliciously written worker threads, and I also have the impression (though no testing yet) that InterruptedException might not work properly in some (or even all) cases when the worker thread is waiting on I/O.
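For reference, the 'periodically check' pattern I mean is roughly the following (doUnitOfWork() and cleanUp() are placeholders):
public void run() {
    while (!Thread.currentThread().isInterrupted()) {
        doUnitOfWork();   // one bounded chunk of the calculation
        // blocking calls in here must either throw InterruptedException
        // or use timeouts, otherwise the flag may never be observed
    }
    cleanUp();            // release locks, close files/sockets/connections before exiting
}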
Another way to mitigate it would be to use multiple OS processes to isolate parts of the system, but it brings some unwanted increases in resource consumption.
That led me to that old story about Isolates and/or the MVM from more than five years ago, but nothing seems to have happened on that front; maybe in Java 8 or 9...
So, actually, this all has made me wonder whether some poor man's simulation of processes could be achieved by using threads that each have their own classloader? Could that be used to simulate processes if each thread (or group) were loaded in its own classloader? I am not sure how much of an increase in resource consumption that would bring (as there would not be much code sharing, and the code is not tiny). At least process copy-on-write semantics enable code sharing...
Any recommendations/ideas?
EDIT:
I am asking out of general interest and a kind of disappointment that no solutions for this exist in the JVM to date (I mean shared application servers are not really possible - application domains, or something like that, in .NET seem to address exactly this kind of problem). I understand that killing a process does not guarantee reverting all system state to some initial condition, but at least all resources like handles, memory and CPU are released. I was thinking of using classloaders since they might help with releasing locks held by the thread, which is one of the reasons Thread.stop is deprecated. In my current situation the only other thing that should be released (that I can think of currently) is the database connection, which could be handled separately/externally (by a watchdog thread) if needed.
Though, really, in my case Thread.stop might actually be workable; I just dislike using deprecated methods...
Also, I am considering this as a safety net for misbehaving processes. Ideally they should behave nicely, and they are to a quite high degree under my control.
So, to clarify, I am asking how, for example, Java people on the server side handle runaway threads? I suspect by using many machines in the cluster to offset the problem and restarting misbehaving ones - at least when the application is stateless...
The difference between a thread and a process is that threads implicitly share memory and resources like sockets and files (making thread-local memory a workaround), while processes implicitly have private memory and resources.
Killing the thread is not the problem. The problem is that a poorly behaving thread, or even a reasonably behaving one, can leave resources in an inconsistent state. Using a class loader will not help you track this or solve the problem for you. For processes it is easier to track what resources they are using, as most of their resources are isolated. Even processes can leave locks, temporary files and shared IPC resources in an incorrect state when killed.
The real solution is to write code which behaves properly so it can be managed; working around and trying to handle every possible piece of poorly behaving code is next to impossible. If you have a bad third-party library you have to use, you can try killing it and cleaning up after it, and you may come up with an OK solution, but you can't expect it to be a clean one.
EDIT: Here is a simple program which will deadlock between two processes or machines because it has a bug in it. The way to stop deadlocks is to fix the code.
public static void main(String... args) throws IOException {
    switch (args.length) {
        case 1: {
            // server
            ServerSocket ss = new ServerSocket(Integer.parseInt(args[0]));
            Socket s = ss.accept();
            // Bug: the ObjectInputStream constructor blocks until it reads the stream
            // header written by the peer's ObjectOutputStream. Both sides construct
            // their ObjectInputStream first, so both block forever.
            ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
            ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
            // will deadlock before it gets here
            break;
        }
        case 2: {
            // client
            Socket s = new Socket(args[0], Integer.parseInt(args[1]));
            ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
            ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
            // will deadlock before it gets here
            break;
        }
        default:
            System.err.println("Must provide either a port as server or hostname port as client");
    }
}

Sporadic problems in running a multi-threaded Java project in Win7

I am working on a project that is both memory and computationally intensive. A significant portion of the execution utilizes multi-threading by a FixedThreadPool. In short; I have 1 thread for fetching data from several remote locations (using URL connections) and populating a BlockingQueue with objects to be analyzed and n threads that pick these objects and run the analysis. edit: see code below
Now this setup works like a charm on my Linux machine running OpenSUSE 11.3, but a colleague testing it on a very similar machine running Win7 is getting custom notifications of timeouts on the queue polling (see code below), lots of them actually. I have been trying to monitor the processor usage on her machine, and it appears that the software never gets more than 15% of the CPUs, while on my machine the processor usage hits the roof, just as I intended.
My question, then, is: can this be a sign of "starvation" of the queue? Could it be that the producer thread is not getting enough CPU time? If so, how do I go about giving one particular thread in the pool a higher priority?
UPDATE:
I have been trying to pinpoint the problem, with no joy... I did however gain some new insights.
Profiling the execution of the code with JVisualVM demonstrates a very peculiar behavior. The methods are called in short bursts of CPU-time with several seconds of no progress in between. This to me means that somehow the OS is hitting the brakes on the process.
Disabling the anti-virus and back-up daemons does not have any significant effect on the matter.
Changing the priority of java.exe (the only instance) through Task Manager (as advised here) does not change anything either. (That being said, I could not give "realtime" priority to java, and had to be content with "high" priority.)
Profiling the network usage shows a good flow of data in and out, so I am guessing that is not the bottleneck (it is a considerable part of the execution time of the process, but that I know already, and it is pretty much the same percentage as what I get on my Linux machine).
Any ideas as to how the Win7 OS might be limiting the CPU time for my project? If it's not the OS, what could be the limiting factor? I would like to stress yet again that the machine is NOT running anything else computation-intensive at the same time, and there is almost no load on the CPUs other than my software. This is driving me crazy...
EDIT: relevant code
public ConcurrencyService(Dataset d, QueryService qserv, Set<MyObject> s) {
    timeout = 3;
    this.qs = qserv;
    this.bq = qs.getQueue();
    this.ds = d;
    this.analyzedObjects = s;
    this.drc = DebugRoutineContainer.getInstance();
    this.started = false;
    int nbrOfProcs = Runtime.getRuntime().availableProcessors();
    poolSize = nbrOfProcs;
    pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);
    drc.setScoreLogStream(new PrintStream(qs.getScoreLogFile()));
}

public void serve() throws InterruptedException {
    try {
        this.ds.initDataset();
        this.started = true;
        pool.execute(new QueryingAction(qs));
        for (;;) {
            MyObject p = bq.poll(timeout, TimeUnit.MINUTES);
            if (p != null) {
                if (p.getId().equals("0"))
                    break;
                pool.submit(new AnalysisAction(ds, p, analyzedObjects, qs.getKnownAssocs()));
            } else
                drc.log("Timed out while waiting for an object...");
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        String exit_msg = "Unexpected error in core analysis, terminating execution!";
    } finally {
        drc.log("--DEBUG: Termination criteria found, shutdown initiated..");
        drc.getMemoryInfo(true); // dump meminfo to log
        pool.shutdown();
        int mins = 2;
        int nCores = poolSize;
        long totalTasks = pool.getTaskCount(),
             compTasks = pool.getCompletedTaskCount(),
             tasksRemaining = totalTasks - compTasks,
             timeout = mins * tasksRemaining / nCores;
        drc.log("--DEBUG: Shutdown commenced, thread pool will terminate once all objects are processed, " +
                "or will timeout in : " + timeout + " minutes... \n" + compTasks + " of " + (totalTasks - 1) +
                " objects have been analyzed so far, " + "mean process time is: " +
                drc.getMeanProcTimeAsString() + " milliseconds.");
        pool.awaitTermination(timeout, TimeUnit.MINUTES);
    }
}
The class QueryingAction is a simple Runnable that calls the data acquisition method in the designated QueryService object which then populates a BlockingQueue. The AnalysisAction class does all the number-crunching for a single instance of MyObject.
I suspect the producer thread is not getting/loading the source data fast enough. This might not be a lack of CPU but an I/O-related issue. (I am not sure why you have timeouts on your BlockingQueue.)
It might be worth having a thread which periodically logs things like the number of tasks added and the length of the queue (e.g. every 5-15 seconds).
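For instance (a sketch only; bq and drc come from your code, and tasksSubmitted would be an AtomicLong incremented by the producer):
ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
monitor.scheduleAtFixedRate(new Runnable() {
    public void run() {
        drc.log("queue length=" + bq.size() + ", tasks submitted=" + tasksSubmitted.get());
    }
}, 10, 10, TimeUnit.SECONDS); // log every 10 seconds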
So, if I correctly understand your problem, you have one thread to fetch data, and several threads to analyse the fetched data. Your problem is that the threads are not correctly synchronized to run together and take full advantage of the processor.
You have a typical producer-consumer problem with a single producer and several consumers.
I advise you to rework your code a bit to have, instead, several independent consumer threads that are always waiting for resources to become available and only then run. This way you guarantee maximum processor use.
Consumer thread:
while (!terminate)
{
    synchronized (Producer.getLockObject())
    {
        try
        {
            // sleep (no processing at all)
            Producer.getLockObject().wait();
        }
        catch (InterruptedException e)
        {
            // handle interruption
        }
    }
    MyObject p = Producer.getObjectFromQueue(); // this function should be synchronized
    // analyse fetched data, and submit it to somewhere...
}
Producer thread:
while (!terminate)
{
    MyObject newData = fetchData();  // fetch data from remote location
    addDataToQueue(newData);         // this should also be synchronized
    synchronized (getLockObject())
    {
        // wake up one thread to deal with the data
        getLockObject().notify();
    }
}
You see that this way, your threads are always performing useful work or sleeping.
This is just draft code to exemplify.
See more explanation here: http://www.javamex.com/tutorials/wait_notify_how_to.shtml
and here: http://www.java-samples.com/showtutorial.php?tutorialid=306
Priority won't help, since the problem is not an issue of deciding who gets precious resources -- resource usage isn't maxed out. The only way the producer thread would not be getting enough CPU time is if it wasn't ready to run.
How many cores does the machine have? It's possible that the producer thread is running full speed and there still just isn't enough CPU to go around. It's also possible the producer is I/O bound.
You can try to separate the producer thread from the pool (i.e. create a distinct Thread and reduce the pool size by one) and then set its priority to maximum via setPriority. See what happens, although priority rarely accounts for such a difference in performance.
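Roughly like this (a sketch reusing the names from your code):
// Run the producer on its own thread at maximum priority instead of inside the pool,
// and give the analysis pool one thread fewer so the core count still adds up.
pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(Math.max(1, poolSize - 1));
Thread producer = new Thread(new QueryingAction(qs), "producer");
producer.setPriority(Thread.MAX_PRIORITY);
producer.start();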
When you say URL connection, do you mean local or remote? It could be that network speed is slowing your producer down
So after weeks of fiddling, wrestling with code and other types of suffering, I think I had a breakthrough, "a moment of clarity" if you will...
I managed to show that the program can exhibit the same slow behavior on my Linux machine, and can indeed run at full throttle on the problematic Win7 machine. The crux of the problem appears to be some sort of corruption of the system/cache files that are used to store the results of previous queries and, overall, speed up the analysis. You have got to love the irony: in this case they appeared to be the reason for EXTREMELY slow analysis. In retrospect, I should have known (a la Occam's razor)...
I am still not sure how the corruption occurs, but at least it is probably not related to the different OS. Using the system files from my machine increases the output on the Win7 host by only about 40%, however. Profiling the process further also revealed that, oddly enough, there is significantly more GC activity on Win7, which apparently took lots of CPU time away from the number crunching. Giving -Xmx2g takes care of the excessive garbage collection, the CPU usage for the process shoots up to 95-96%, and the threads run smoothly.
Now that my original question is answered, I have to say that overall Java responsiveness is definitely better in the Linux environment; even without allocating more heap memory, I can easily multi-task while running an extensive analysis in the background. Things are not as smooth in Win7, e.g. resizing the GUI is significantly slower once the analysis takes off at full speed.
Thanks for all the replies, and I am sorry for the partially misleading problem description. I merely shared what I found out while debugging, to the best of my abilities. Anyway, I believe the bounty goes to Peter Lawrey, since he pointed early on to an I/O issue, and it was his suggestion about a logger thread which eventually led me to the answer.
I would think it is some OS-specific issue, because that is the core difference between the two machines. More specifically, something is slowing down the data arriving through the remote connection.
Find some traffic analysis tool such as Wireshark and/or Networx and try to discover if there is anything throttling the Win PC. Perhaps it is going through a proxy that has some kind of rate cap configured.
Sorry, this is not really an answer, but it did not fit inside a comment and I still think it is worth the read:
I am not a Java person,
but I recently had the same problem with C++ projects for machine control through USB.
On XP or W2K everything runs perfectly for months of 24/7 operation on any machine with 2 or more cores.
On W7 and a strong enough machine everything is OK, but sometimes (roughly once every few hours) it freezes for a few seconds without an obvious reason.
On W7 and a relatively weak machine (2-core 1.66 GHz T2300E notebook) the threads freeze for some time and then run again, which under/overflows the USB/WIN/App FIFOs and collapses communication...
It appears that nothing is blocked; the W7 scheduler just occasionally does not give CPU time to the right threads.
I thought the USB driver (JUNGO) communication was freezing, but that is not true; I measured it and it is OK even during a freeze.
The freeze was about 6-15 seconds, roughly once per minute.
After adding some safety sleeps to the thread loops, the freeze shortened to about 0.5 s,
but it is still there.
Even when the app does not under/overflow the FIFOs, the Windows USB driver side does (a few times per minute, for a few ms).
Changing the exe/thread priority and class does not affect performance on W7 (on XP and W2K it works as it should).
As you can see, it seems we most likely have the same problem. In my case:
it is not I/O related (when I replace the USB thread with a simulation of the device, it behaves similarly);
adding Sleep to time-critical code helps a lot;
the error is also present with a low number of threads [2 fast (17 ms) + 1 slow (250 ms) + app code = 4];
my CPU consumption on the slow W7 machine is also not 100% but about 95%, which is OK because I have sleeps everywhere;
my apps use about 40-100 MB of memory but are demanding in terms of CPU computation...
though not so much that they could not run safely on much slower machines;
but because of the USB driver connection and multiple-device support they need at least 2 cores.
My next step is to add some kind of execution-time logging/analysis to see what is happening in more detail,
and also a small rewrite of the send/receive threads to see if it helps.
When I learn something new/useful I will add it.
