Android 3g downloads and multiple threads - java

If you have 10 tasks, and each one takes 5 seconds to download, so 50 seconds total, would you be able to accomplish the task quicker by running 10 parallel threads and downloading all 10 at once?
I think it would be slower, but I don't really know.
Would this hold true on Android? Would running 10 parallel downloads be faster than running them in a single priority queue?
Edit: if it matters, I'm seeing an average of 0.20 Mbps - 0.50 Mbps in the locations where the downloads would likely take place.

It depends where the bottleneck is.
If you're downloading from 10 different servers and they're all serving up the data relatively slowly, it's possible you have enough bandwidth on the device to download from all 10 (or at least some number greater than one) at the same time, in which case multiple threads will be faster.
However, if a single download from one server is already enough to use up all of your bandwidth, then 10 at the same time isn't going to be any faster. And the extra overhead and contention for resources will likely make it slower.
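To make the comparison concrete, here is a minimal sketch (plain Java, not Android-specific; on Android the downloads would have to run off the main thread anyway) that downloads a list of URLs with a fixed-size thread pool. Set the pool size to 1 for the sequential case and to the number of URLs for the fully parallel case, then compare wall-clock times on the actual connection. The URLs are placeholders.

    import java.io.InputStream;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelDownloadSketch {
        // Placeholder URLs - substitute the real download targets.
        static final List<String> URLS = Arrays.asList(
                "https://example.com/file1.zip",
                "https://example.com/file2.zip");

        public static void main(String[] args) throws Exception {
            // Try a pool size of 1 (sequential) vs. URLS.size() (fully parallel)
            // and compare the elapsed times on the real 3G connection.
            ExecutorService pool = Executors.newFixedThreadPool(URLS.size());
            long start = System.nanoTime();
            List<Future<Long>> results = new ArrayList<>();
            for (String u : URLS) {
                results.add(pool.submit(() -> download(u)));
            }
            long totalBytes = 0;
            for (Future<Long> f : results) {
                totalBytes += f.get();          // wait for every download to finish
            }
            pool.shutdown();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(totalBytes + " bytes in " + elapsedMs + " ms");
        }

        // Streams the body and returns the byte count; errors simply propagate here.
        static long download(String url) throws Exception {
            try (InputStream in = new URL(url).openStream()) {
                byte[] buf = new byte[8192];
                long total = 0;
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
                return total;
            }
        }
    }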

Related

Test Plan is not giving me accurate results

So I am wondering if someone can help me, please. I am trying to load test a Java REST application with thousands of requests per minute, but something is amiss and I can't quite figure out what's happening.
I am executing the load test from my laptop (via terminal) and it's hitting a number of servers in AWS.
I am using the infinite loop function to fire in a lot of requests repeatedly. The 3 thread groups have the same config.
The problem I am having is that the CPU is rising very high and the numbers do not match what I see in production in the same environment with regards to CPU etc.; the JMeter load test seems to make the CPU on my test environment work harder.
Question 1 - Is my load test too fast for the server?
Question 2 - Is there a way to space out the load test so that I can say 10k rpm exactly?
Question 1 - Is my load test too fast for the server? - We don't know. If you see high CPU usage with only 60 threads and the application responds slowly because of that high CPU usage, it looks like a bottleneck. Another question is the size of the machine: the number of processors and their frequency. You need to find out which function is consuming the CPU cycles, using a profiler tool, and look for a way to optimize that function.
Question 2 - Is there a way to space out the load test so that I can say 10k rpm exactly? - There is; check out the Constant Throughput Timer, but be aware of the next 2 facts:
The Constant Throughput Timer can only pause the threads to limit JMeter's request execution speed to the specified number of requests per minute. So you need to make sure you create a sufficient number of threads in your Thread Group(s) to produce the desired load; 60 might not be enough.
The application needs to be able to respond fast enough: 10,000 requests per minute is approximately 166 requests per second. With 60 threads, each thread needs to execute about 2.7 requests per second, which means the response time needs to be 370 ms or less (the arithmetic is sketched below).
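For reference, a quick sketch of that arithmetic (the answer rounds 2.78 requests per second per thread down to 2.7, which is why it lands on 370 ms rather than ~360 ms):

    public class ThroughputMath {
        public static void main(String[] args) {
            int targetPerMinute = 10_000;                        // desired load
            int threads = 60;                                    // threads in the Thread Group
            double requestsPerSecond = targetPerMinute / 60.0;   // ~166.7 requests/s overall
            double perThread = requestsPerSecond / threads;      // ~2.78 requests/s per thread
            double maxResponseMs = 1000.0 / perThread;           // ~360 ms budget per request
            System.out.printf("Each thread must finish a request in ~%.0f ms%n", maxResponseMs);
        }
    }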
There are different aspects to consider before we go for 10k requests.
Configure the test for one user (thread) and execute it. Check that we get a proper response for every request.
Incrementally increase the number of threads: 1 user, 5 users, 10 users, 20 users, 50 users, etc.
Try different duration scenarios, like 10 mins, 20 mins, 30 mins, 1 hour, etc.
Collect metrics like error %, response time, number of requests, etc.
You can check for probable breakpoints like:
CPU utilisation getting high (100%) on the machine from which you are executing the tests. In this case, you can set up a few machines in a master-slave configuration.
Error % getting high. The server may not be able to respond; it might even have crashed.
Response time getting high. The server may be getting busy due to the load.
Also, make sure you have reliable connectivity and bandwidth. Just imagine you want to generate a huge load but the connection you have is a few kbps; your tests will fail because of this.

How to estimate the authentication time

I designed an authentication protocol using Java. The average execution time for a single authentication on my desktop computer is 2.87 ms. My computer has the following specification: Windows 10 with a 1.99 GHz Intel Core i7 and 8 GB of RAM.
If a number of users, say 10 users, perform the authentication simultaneously, what is the total computational time? Can I just say (2.87*10)?
A typical Core i7 CPU has between 4 and 8 cores, and can execute 8 to 16 threads in parallel (# cores times hyper-threading).
Assuming
you have an i7-8650U (1.9GHz, 4 cores, hyper-threaded),
your Java server is multithreaded (which should be the case if you use any popular implementation like Tomcat, Jetty, or alike) and
there are no other CPU-intensive workloads running
You can say your server can handle 8 authentication requests simultaneously: 4 at full speed + 4 more at ~30% speed because of hyper-threading, or 2.87ms*1.3=3.73ms for every 8 users.
10 users will thus take around ~ 3.73ms+2.87ms = 6.6ms.
What's important when measuring Java performance, however, is to measure steady state under load, in order to take garbage collection overhead into account. When measuring a single request, you may often miss the garbage collection step entirely.
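As a rough illustration of measuring steady state rather than a single call, here is a sketch that runs many authentications concurrently from a fixed pool of simulated users and reports the average time per call. The authenticate() method below is only a CPU-burning placeholder for the real protocol; for serious numbers a benchmarking harness such as JMH is the better tool.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AuthLoadSketch {
        // Placeholder: burns a little CPU to stand in for one ~2.87 ms authentication.
        static long authenticate() {
            long x = 0;
            for (int i = 0; i < 2_000_000; i++) x += i * 31L;
            return x;   // returned so the JIT cannot discard the loop
        }

        public static void main(String[] args) throws InterruptedException {
            int users = 10;          // concurrent users to simulate
            int iterations = 5_000;  // enough repetitions to reach a steady state (GC included)
            ExecutorService pool = Executors.newFixedThreadPool(users);
            CountDownLatch done = new CountDownLatch(users);
            long start = System.nanoTime();
            for (int u = 0; u < users; u++) {
                pool.submit(() -> {
                    long sink = 0;
                    for (int i = 0; i < iterations; i++) sink += authenticate();
                    if (sink == 42) System.out.println("unlikely");  // keep the work observable
                    done.countDown();
                });
            }
            done.await();
            pool.shutdown();
            double totalMs = (System.nanoTime() - start) / 1_000_000.0;
            System.out.printf("avg time per authentication under load: %.3f ms%n",
                    totalMs / ((double) users * iterations));
        }
    }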
You can't say that the time will be multiplied by the number of users.
First of all, because that would mean your authentication mechanism runs on a single thread, which would be terrible.
Also, the time measured on your computer will certainly differ from a production or staging environment: you will have to add network hops, although on the other hand those servers would be doing only this, so you cannot compare that time to the one on your personal computer.
If you want to test this kind of thing, use performance/load testing tools. I can recommend JMeter, an open source tool from the Apache foundation, or Gatling, another open source tool.
They are designed for exactly this use case. With them you could call your authentication API with, for example, 100 users over 10 seconds, and the reports produced at the end would give you your answer.

How to prevent physical memory consumption when running parallel Java processes

I have a big list (up to 500,000) of functions.
My task is to generate a graph for each function (this can be done independently of the other functions) and dump the output to a file (it can be several files).
The process of generating graphs can be time consuming.
I also have a server with 40 physical cores and 128 GB of RAM.
I have tried to implement parallel processing using Java Threads/ExecutorPool, but it does not seem to use all of the processor's resources.
On some inputs the program takes up to 25 hours to run, and only 10-15 cores are working according to htop.
So the second thing I've tried is to create 40 distinct processes (using Runtime.exec) and split the list among them.
This method uses all of the processor's resources (100% load on all 40 cores) and speeds things up by a factor of about 5 over the previous example (it takes only 5 hours, which is reasonable for my task).
But the problem with this method is that each Java process runs separately and consumes memory independently of the others. In some scenarios all 128 GB of RAM is consumed after 5 minutes of parallel work. One solution I am using now is to call System.gc() in each process if Runtime.totalMemory > 2GB. This slows down overall performance a bit (8 hours on the previous input) but keeps memory usage within reasonable boundaries.
But this configuration works only for my server. If you run it on a server with 40 cores and 64 GB of RAM, you need to tune the Runtime.totalMemory > 2GB condition.
So the question is: what is the best way to avoid such aggressive memory consumption?
Is it normal practice to run separate processes to do parallel jobs?
Is there any other parallel method in Java (maybe fork/join?) that uses 100% of the processor's physical resources?
You don't need to call System.gc() explicitly! The JVM will do it automatically when needed, and almost always does it better. You should, however, set the max heap size (-Xmx) to a number that works well.
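If you stay with the separate-process approach, that -Xmx cap can simply be passed to each child JVM when it is launched, so no process can grow past its share of RAM. A sketch, assuming a hypothetical worker.jar that accepts a chunk index; the names and values are placeholders to tune per machine:

    import java.util.ArrayList;
    import java.util.List;

    public class WorkerLauncherSketch {
        public static void main(String[] args) throws Exception {
            int workers = 40;                 // one per physical core, as in the question
            String heapCap = "-Xmx2g";        // hard upper bound per worker JVM
            List<Process> processes = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                // "worker.jar" and "--chunk" are placeholders for the real job splitter.
                ProcessBuilder pb = new ProcessBuilder(
                        "java", heapCap, "-jar", "worker.jar", "--chunk", String.valueOf(i));
                pb.inheritIO();               // let worker output appear in this console
                processes.add(pb.start());
            }
            for (Process p : processes) {
                p.waitFor();                  // wait for all workers to finish
            }
        }
    }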
If your program won't scale further, you have some kind of congestion. You can either analyse your program and your Java and system settings to figure out why, or run it as multiple processes. If each process is multi-threaded, then you may get better performance using 5-10 processes instead of 40.
Note that you may get higher performance with more than one thread per core. Fiddle around with 1-8 threads per core and see if throughput increases.
From your description it sounds like you have 500,000 completely independent items of work and that each work item doesn't really need a lot of memory. If that is true, then memory consumption isn't really an issue. As long as each process has enough memory so it doesn't have to gc very often then gc isn't going to affect the total execution time by much. Just make sure you don't have any dangling references to objects you no longer need.
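Here is a minimal sketch of the single-process alternative: one JVM, a pool sized as cores times a tunable threads-per-core factor, and every function submitted as an independent task. loadFunctions() and generateGraph() are placeholders for the real code.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class GraphJobSketch {
        public static void main(String[] args) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();
            int threadsPerCore = 2;                    // try values between 1 and 8 and compare
            ExecutorService pool = Executors.newFixedThreadPool(cores * threadsPerCore);

            List<String> functions = loadFunctions();  // placeholder for the 500,000 inputs
            for (final String f : functions) {
                pool.submit(() -> generateGraph(f));   // each item is fully independent
            }
            pool.shutdown();
            pool.awaitTermination(7, TimeUnit.DAYS);   // wait for everything to finish
        }

        static List<String> loadFunctions() { return Collections.emptyList(); }  // placeholder
        static void generateGraph(String f) { /* placeholder for the real graph generation */ }
    }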
One of the problems here: it is still very hard to understand how many threads, cores, ... are actually available.
My personal suggestion: there are several articles in the Java Specialists newsletter which do a very deep dive into this subject.
For example this one: http://www.javaspecialists.eu/archive/Issue135.html
or a more recent one, on "the number of available processors": http://www.javaspecialists.eu/archive/Issue220.html

Java - how to determine the optimum number of threads for a specific type of processing, on different types of servers

I have a java program which goes to some websites, converts the website's HTML into XML, then runs some xquery commands on the XML, finally stores the result into csv, which is then uploaded into Cloud file storage (like Amazon S3).
Now, I want to split the work across multiple threads so that it is done faster, but how do I determine the number of threads that is optimal for my work?
I want to determine the number of threads that I should allow, for the different types of Amazon EC2 instances... Is there a library or framework that can help me with this?
Or, do I have to manually run the code on an Amazon EC2 instance, and keep changing the number of threads, and measure the time taken?
Specifically, I want to keep a balance between the total time taken to process everything and the number of threads that are allowed to run simultaneously... And if I could clearly see this correlation for different servers with different CPU/RAM capacities, that would be great. Any advice/guidance would be appreciated.
The type of work you describe is almost certainly I/O bound -- most of the time is spent waiting for data to be downloaded or uploaded. If so, your goal is simply to make full use of upload / download bandwidth.
If it is, the optimal number of threads will be more than the number of physical cores on the machine (which would be the right starting point for a CPU-bound process).
It's hard to say from this info what the optimum number of threads will be as it depends on how much you're downloading and how fast the link is. Try doubling the number of threads until performance starts to suffer.
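A sketch of that "double the threads until it stops helping" experiment; processItem() below just sleeps to stand in for the real download/transform/upload pipeline:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizeExperiment {
        // Placeholder for one unit of work: fetch a page, transform it, upload the CSV.
        static void processItem(int id) throws InterruptedException {
            Thread.sleep(50);   // stand-in for I/O wait; replace with the real pipeline
        }

        public static void main(String[] args) throws InterruptedException {
            int items = 200;
            for (int threads = 1; threads <= 64; threads *= 2) {
                ExecutorService pool = Executors.newFixedThreadPool(threads);
                CountDownLatch done = new CountDownLatch(items);
                long start = System.nanoTime();
                for (int i = 0; i < items; i++) {
                    final int id = i;
                    pool.submit(() -> {
                        try { processItem(id); } catch (InterruptedException ignored) { }
                        finally { done.countDown(); }
                    });
                }
                done.await();
                pool.shutdown();
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println(threads + " threads -> " + ms + " ms for " + items + " items");
            }
        }
    }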
I think you should profile your app with a single thread using JHAT, MAT, etc., and then decide how many threads to run based on the machine config. That will give you a general idea of how expensive each thread is. You can then run a load test (like 10,000 items queued up against 10 threads) to validate the limits you came up with, and tune accordingly.
To find the number of logical cores available you can use:
int processors = Runtime.getRuntime().availableProcessors();
and create a ThreadPool with that many threads. See also:
Finding Number of Cores in Java
Java: How to scale threads according to cpu cores?

High CPU, possibly due to context switching?

One of our servers is experiencing a very high CPU load with our application. We've looked at various stats and are having issues finding the source of the problem.
One of the current theories is that there are too many threads involved and that we should try to reduce the number of concurrently executing threads. There's just one main thread pool, with 3000 threads, and a WorkManager working with it (this is Java EE - Glassfish). At any given moment, there are about 620 separate network IO operations that need to be conducted in parallel (use of java.NIO is not an option either). Moreover, there are roughly 100 operations that have no IO involved and are also executed in parallel.
This structure is not efficient and we want to see if it is actually causing damage, or is simply bad practice. Reason being that any change is quite expensive in this system (in terms of man hours) so we need some proof of an issue.
So now we're wondering if context switching of threads is the cause, given there are far more threads than the required concurrent operations. Looking at the logs, we see that on average there are 14 different threads executed in a given second. If we take into account the existence of two CPUs (see below), then it is 7 threads per CPU. This doesn't sound like too much, but we wanted to verify this.
So - can we rule out context switching or too-many-threads as the problem?
General Details:
Java 1.5 (yes, it's old), running on CentOS 5, 64-bit, Linux kernel 2.6.18-128.el5
There is only one single Java process on the machine, nothing else.
Two CPUs, under VMware.
8GB RAM
We don't have the option of running a profiler on the machine.
We don't have the option of upgrading the Java, nor the OS.
UPDATE
As advised below, we've conducted captures of load average (using uptime) and CPU (using vmstat 1 120) on our test server with various loads. We've waited 15 minutes between each load change and its measurements to ensure that the system stabilized around the new load and that the load average numbers are updated:
50% of the production server's workload: http://pastebin.com/GE2kGLkk
34% of the production server's workload: http://pastebin.com/V2PWq8CG
25% of the production server's workload: http://pastebin.com/0pxxK0Fu
CPU usage appears to be reduced as the load reduces, but not on a very drastic level (change from 50% to 25% is not really a 50% reduction in CPU usage). Load average seems uncorrelated with the amount of workload.
There's also a question: given our test server is also a VM, could its CPU measurements be impacted by other VMs running on the same host (making the above measurements useless)?
UPDATE 2
Attaching the snapshot of the threads in three parts (pastebin limitations)
Part 1: http://pastebin.com/DvNzkB5z
Part 2: http://pastebin.com/72sC00rc
Part 3: http://pastebin.com/YTG9hgF5
Seems to me the problem is the 100 CPU-bound threads more than anything else. The 3000-thread pool is basically a red herring, as idle threads don't consume much of anything. The I/O threads are likely sleeping "most" of the time, since I/O is measured on a geologic time scale in terms of computer operations.
You don't mention what the 100 CPU threads are doing, or how long they last, but if you want to slow down a computer, dedicating 100 threads of "run until time slice says stop" will most certainly do it. Because you have 100 "always ready to run", the machine will context switch as fast as the scheduler allows. There will be pretty much zero idle time. Context switching will have impact because you're doing it so often. Since the CPU threads are (likely) consuming most of the CPU time, your I/O "bound" threads are going to be waiting in the run queue longer than they're waiting for I/O. So, even more processes are waiting (the I/O processes just bail out more often as they hit an I/O barrier quickly which idles the process out for the next one).
No doubt there are tweaks here and there to improve efficiency, but 100 CPU threads are 100 CPU threads. Not much you can do there.
I think your constraints are unreasonable. Basically what you are saying is:
1. I can't change anything.
2. I can't measure anything.
Can you please speculate as to what my problem might be?
The real answer to this is that you need to hook a proper profiler to the application and you need to correlate what you see with CPU usage, Disk/Network I/O, and memory.
Remember the 80/20 rule of performance tuning. 80% will come from tuning your application. You might just have too much load for one VM instance, and it could be time to consider solutions for scaling horizontally or vertically by giving more resources to the machine. It could also be that any one of the 3 billion JVM settings is not in line with your application's execution specifics.
I assume the 3000-thread pool came from the famous "more threads = more concurrency = more performance" theory. The real answer is that a tuning change isn't worth anything unless you measure throughput and response time before and after the change and compare the results.
If you can't profile, I'd recommend taking a thread dump or two and seeing what your threads are doing. Your app doesn't have to stop to do it:
http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/threads.html
http://java.net/projects/tda/
http://java.sys-con.com/node/1611555
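Besides jstack (or sending the process a kill -3, which prints a thread dump to stdout), a dump can also be taken from inside the application with Thread.getAllStackTraces(), available since Java 1.5, which fits the constraints above. A minimal sketch:

    import java.util.Map;

    public class ThreadDumpSketch {
        // Prints a stack trace for every live thread without stopping the application.
        public static void dump() {
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
                Thread t = e.getKey();
                System.out.println(t.getName() + " (" + t.getState() + ")");
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }

        public static void main(String[] args) {
            dump();
        }
    }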
So - can we rule out context switching or too-many-threads as the problem?
I think your concerns over thrashing are warranted. A thread pool with 3000 threads (700+ concurrent operations) on a 2-CPU VMware instance certainly seems like a problem that may be causing context-switching overload and performance problems. Limiting the number of threads could give you a performance boost, although determining the right number is going to be difficult and will probably take a lot of trial and error.
we need some proof of an issue.
I'm not sure the best way to answer but here are some ideas:
Watch the load average of the VM OS and the JVM. If you are seeing high load values (20+) then this is an indicator that there are too many things in the run queues.
Is there no way to simulate the load in a test environment so you can play with the thread pool numbers? If you run simulated load in a test environment with pool size of X and then run with X/2, you should be able to determine optimal values.
Can you compare high load times of day with lower load times of day? Can you graph number of responses to latency during these times to see if you can see a tipping point in terms of thrashing?
If you can simulate load then make sure you aren't just testing under the "drink from the fire hose" methodology. You need simulated load that you can dial up and down. Start at 10% and slowly increase the simulated load while watching throughput and latency. You should be able to see the tipping points by watching for throughput flattening or otherwise deflecting.
Usually, context switching in threads is very cheap computationally, but when it involves this many threads... you just can't know. You say upgrading to Java 1.6 EE is out of the question, but what about some hardware upgrades ? It would probably provide a quick fix and shouldn't be that expensive...
e.g. run a profiler on a similar machine.
try a newer version of Java 6 or 7. (It may not make a difference, in which case don't bother upgrading production)
try Centos 6.x
try not using VMware.
try reducing the number of threads. You only have 8 cores.
You may find that all or none of the above options make a difference, but you won't know until you have a system you can test on with a known/repeatable workload.
