Java Guava Ratelimiter used by Swing Worker, slow execution - java

I have an app which uses SwingWorkers in combination with Guava's RateLimiter. I have set the RateLimiter to 18 permits per second, but my data gets loaded way slower than this. Am I doing something wrong here, or do I misunderstand the basics (coming from Node.js)?
I thought the loop that makes the API calls would execute 18 times per second using this code. But it seems to be way slower, as if it always waits for one iteration to finish completely before executing the next. From my understanding, with the RateLimiter it should be 18 per second, or at least no more than that.
@Override
protected Boolean doInBackground() throws Exception {
    tickers = rc.getAll24HrPriceStatistics();
    progressBar.setMaximum(tickers.size());
    for (int i = 0; i < tickers.size(); i++) {
        Market market = new Market(screener, tickers.get(i).getSymbol(), timeframe);
        rateLimiter.acquire();
        market.initializeCandles(); // this executes a 3rd party API call
        table.addMarket(market);
        publish(i);
    }
    return true;
}
EDIT: When I leave out the rate limiter, the speed of execution does not change, so my guess is that these tasks are not executed in parallel but one after the other. This makes me question how to make the SwingWorker execute them concurrently while still respecting the rate limiter's permits per second.
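The loop above runs on a single worker thread, so each iteration blocks until initializeCandles() returns; a rate limiter only caps the rate and never adds parallelism. As a minimal sketch (not the original code), the blocking calls could be handed to a thread pool while the submitting loop is paced. Everything here is an assumption for illustration: the class PacedSubmitter, the method slowCall (a stand-in for the API call), the pool size of 8, and the hand-rolled pacing, which uses only the JDK instead of Guava. With Guava on the classpath, rateLimiter.acquire() would take the place of the pacing lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PacedSubmitter {
    /** Hypothetical stand-in for market.initializeCandles(): a slow, blocking call. */
    static int slowCall(int id) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id;
    }

    /** Starts at most permitsPerSecond calls per second and lets them overlap in a pool. */
    static List<Integer> runAll(int count, int permitsPerSecond) {
        ExecutorService pool = Executors.newFixedThreadPool(8); // assumed pool size
        long intervalNanos = TimeUnit.SECONDS.toNanos(1) / permitsPerSecond;
        List<Future<Integer>> futures = new ArrayList<>();
        List<Integer> results = new ArrayList<>();
        try {
            long next = System.nanoTime();
            for (int i = 0; i < count; i++) {
                // Pacing: wait until the next start slot (Guava users: rateLimiter.acquire()).
                long wait = next - System.nanoTime();
                if (wait > 0) {
                    TimeUnit.NANOSECONDS.sleep(wait);
                }
                next = Math.max(next, System.nanoTime()) + intervalNanos;
                final int id = i;
                Callable<Integer> task = () -> slowCall(id);
                futures.add(pool.submit(task)); // returns immediately; the call runs in the pool
            }
            for (Future<Integer> future : futures) {
                results.add(future.get()); // collect results in submission order
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

In a SwingWorker, a loop like the one in runAll would live inside doInBackground(), so the event dispatch thread stays free while the pool threads wait on the network.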

Related

How to compute approximated values concurrently in Java?

This is probably a question with many possible answers, but I'm asking for the best design, rather than "how can this be done at all".
Let's assume we are implementing a program with a UI that computes Pi. I can hit a "Start" button to start the computation and a "Stop" button to abort the computation, giving me a message box with the highest precision value of Pi computed so far.
I guess the straight forward approach would be starting a Runnable in a new Thread. The runnable computes Pi, and stores the current value in a shared variable, both threads have access to. "Stop" would abort the Thread, and display the shared variable.
I have a feeling this could be implemented more elegantly, though, but I'm not sure how. Maybe using a CompletableFuture?
I'd rather solve this without adding any new libraries to my project, but if you know a library that supports this particularly well, please leave it in the comments.
Obviously, computing Pi will never finish. It would be great though, if the solution also supports e.g. computing the best move in a game of chess. Which will finish, given enough time, but usually has to be aborted, returning the best move so far.
Referring to your examples of approximately computing Pi or the best move in chess, your approximation algorithm has to be iterative in nature, like random sampling for Pi and MCMC for chess. This makes me think of two approaches.
1. Using a threadsafe flag
You can use AtomicBoolean, which is a threadsafe boolean variable. You need to pass it to your Runnable and make the Runnable check its state while computing the approximation. At the same time, the button listener that stops the computation is able to set the variable.
2. Computing small chunks
The iterative nature of the algorithm makes it possible to split the computation and aggregate the results later. E.g. if you compute 1000 iterations, you can split them into chunks of 200 iterations, compute these 5 chunks, and aggregate the result.
I would suggest using an ExecutorCompletionService and a TimerTask. The idea is to compute a small number of iterations, which takes only a short amount of time, and repeatedly "refill" the Executor with new Runnables using the TimerTask. Let's say computing 5 Runnables takes 1 second; your TimerTask would then put 5 Runnables into the Executor every second. When you hit the stop button, you stop spawning, wait for the pending tasks to finish, collect their results, and have a result.
Of course you also need a variable which tells the TimerTask to stop after you call the shutdown method of the executor behind the completion service, but this one does not have to be threadsafe. The additional benefit of this approach is that your computation is concurrent and that you can fully utilize any CPU simply by spawning more Runnables. Doing this concurrently allows you to compute more in less time and obtain better approximations.
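The first approach (the threadsafe flag) could look roughly like the following. This is only a sketch under assumptions: the class StoppablePi, the choice of the Leibniz series, and the helper runFor are illustrative names that do not appear in the answer. The worker checks an AtomicBoolean on every iteration and publishes its latest approximation through a volatile field, so the "Stop" handler can read it at any time.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StoppablePi {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile double latest = 0.0; // most recent approximation, readable from any thread

    /** Leibniz series: pi = 4 * (1 - 1/3 + 1/5 - ...). Loops until stop() is called. */
    public void compute() {
        double sum = 0.0;
        double sign = 1.0;
        for (long k = 0; !stopRequested.get(); k++) {
            sum += sign / (2 * k + 1);
            sign = -sign;
            if ((k & 0xFFFF) == 0) {
                latest = 4 * sum; // publish only occasionally to keep the loop cheap
            }
        }
        latest = 4 * sum; // final value once the flag has been seen
    }

    public void stop() {
        stopRequested.set(true);
    }

    public double latest() {
        return latest;
    }

    /** Hypothetical driver: compute for the given time, then stop and return the result. */
    static double runFor(long millis) {
        StoppablePi pi = new StoppablePi();
        Thread worker = new Thread(pi::compute);
        worker.start();
        try {
            Thread.sleep(millis);
            pi.stop();
            worker.join();
        } catch (InterruptedException e) {
            pi.stop();
            Thread.currentThread().interrupt();
        }
        return pi.latest();
    }
}
```

In a UI, the "Start" button would start the worker thread and the "Stop" button would call stop() and then show latest().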
Your problem is how to implement a stoppable task that still delivers a result. Approximating values is a good example but can be ignored for the solution.
A FutureTask for example wouldn't work because the contract of those is that they decide themselves when they are done and they can only either have a result or be cancelled.
A shared (e.g. volatile) variable sounds reasonable but has its drawbacks. When it is updated regularly in a tight loop you might observe worse performance than with a local variable, and reading the state of a shared object is only safe when the object is e.g. immutable, or when one can otherwise guarantee that reading and writing happen in the correct order.
You can also build something with a result-delivery BlockingQueue where the computing thread puts the current result (or even regular updates to the result) once interruption is requested.
But the best solution is probably a (shared) CompletableFuture. Sort of a single result-item queue but it has nicer semantics for reporting exceptions.
Example:
CompletableFuture<Integer> sharedFuture = new CompletableFuture<>();
Thread computing = new Thread(() -> {
    int value = 1;
    try {
        while (!Thread.currentThread().isInterrupted() &&
                !sharedFuture.isDone()) { // check could be omitted
            value = value * 32 + 7;
        }
        sharedFuture.complete(value);
    } catch (Throwable t) {
        sharedFuture.completeExceptionally(t);
    }
});
computing.start();
try {
    Thread.sleep((long) (5000 * Math.random()));
} catch (InterruptedException ignored) {
}
computing.interrupt();
System.out.println(sharedFuture.get());
http://ideone.com/8bpEGV
It's not really important how you execute that task. Instead of the Thread above you can also use an ExecutorService and then cancel the Future instead of interrupting the thread.
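The ExecutorService variant mentioned above might be sketched as follows. The class and method names (CancelWithResult, computeUntilCancelled) are made up for illustration, and the busy loop merely stands in for a real computation. Future.cancel(true) interrupts the worker thread, and the task then hands over its result through the shared CompletableFuture. The sketch assumes the task has already started when cancel is called; a task cancelled before it starts never runs, which is why the final get uses a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancelWithResult {
    /** Runs a counting loop for roughly runMillis, cancels it, and returns the count reached. */
    static long computeUntilCancelled(long runMillis) {
        CompletableFuture<Long> shared = new CompletableFuture<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> handle = executor.submit(() -> {
                long iterations = 0;
                while (!Thread.currentThread().isInterrupted()) {
                    iterations++; // stand-in for one step of the approximation
                }
                shared.complete(iterations); // deliver the best result so far
            });
            TimeUnit.MILLISECONDS.sleep(runMillis);
            handle.cancel(true); // true = interrupt the thread running the task
            return shared.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that the cancelled Future itself can no longer return a value, which is why the result travels through the separate CompletableFuture.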

Java threads without affecting performance

Long story short; I've written a program that contains an infinite loop, in which a function is run continuously, and must run as quickly as is possible.
However, whilst this function completes in a microsecond time scale, I need to spawn another thread that will take considerably longer to run, but it must not affect the previous thread.
Hopefully this example will help explain things:
while (updateGUI == true) { // So, forever until terminated
    final String tableContents = parser.readTable(location, header);
    if (tableContents.length() == 0) { // No table there, nothing to do
    } else {
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
                // updateTable updates a JTable
                updateTable(tableContents, TableModel);
                TableColumnModel tcm = guiTable.getColumnModel();
            }
        });
    }
    ***New thread is needed here!
}
So what I need is for the readTable function to run an infinite number of times, however I then need to start a second thread that will also run an infinite number of times, however it will take milliseconds/seconds to complete, as it has to perform some file I/O and can take a bit of time to complete.
I've played around with extending the Thread class, and with using Executors.newCachedThreadPool to try spawning a new thread. However, anything I do causes the readTable function to slow down, and results in the table not being updated correctly, as it cannot read the data fast enough.
Chances are I need to redesign the way this loop runs, or possibly just start two new threads and put the infinite looping within them instead.
The reason for it being designed this way is that once the updateTable function runs, it returns a string that is used to update a JTable, which (as far as I know) must be done on Swing's event dispatch thread, as that is where the GUI's table was created.
If anyone has any suggestions I'd greatly appreciate them.
Thanks
As you are updating a JTable, SwingWorker will be convenient. In this case, one worker can coexist with another, as suggested here.
You have to be very careful to avoid overloading your machine. Your long-running task needs to be made independent of your thread which must be fast. You also need to put a cap on how many of these are running at once. I would put a cap of one to start with.
Also, your screen can only update so fast, and you can only see the screen updating so fast. I would limit the number of updates per second to 20 to start with.
BTW Setting the priority only helps if your machine is overloaded. Your goal should be to ensure it is not overloaded in the first place and then the priority shouldn't matter.
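One way to cap the update rate at about 20 per second is to let the fast loop overwrite a "latest value" slot and have a periodic task pick it up every 50 ms. The sketch below is an assumption-laden illustration, not the asker's code: ThrottledUpdates and countUpdates are invented names, and a ScheduledExecutorService plays the role that a Swing timer or SwingUtilities.invokeLater call would play in the real GUI, so that the example runs without a display.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class ThrottledUpdates {
    /** Produces values as fast as possible for runMillis; returns how many were applied. */
    static int countUpdates(long runMillis) {
        AtomicReference<String> latest = new AtomicReference<>();
        AtomicInteger applied = new AtomicInteger();
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();

        // Every 50 ms (20 times per second) take the newest value, if there is one.
        ticker.scheduleAtFixedRate(() -> {
            String contents = latest.getAndSet(null);
            if (contents != null) {
                // In a Swing app this is where the table update would be handed to the EDT.
                applied.incrementAndGet();
            }
        }, 50, 50, TimeUnit.MILLISECONDS);

        // The fast loop never waits for the GUI; it only overwrites the latest value.
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runMillis);
        long n = 0;
        while (System.nanoTime() < end) {
            latest.set("table contents #" + n++);
        }
        ticker.shutdownNow();
        return applied.get();
    }
}
```

Intermediate values are dropped on purpose: the table only ever needs the most recent contents, so the fast thread is never slowed down by the display.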
It's very hard to guess what's going on here, but you said "results in the table not being updated correctly, as it cannot read the data fast enough". If you really mean the correctness of the code is affected by the timing not being fast enough, then your code is not thread safe and you need to use proper synchronization.
Correctness must not depend on timing, as timing of thread execution is not deterministic on standard JVMs.
Also, do not fiddle with thread priorities. Unless you are a concurrency guru trying to do something very unusual, you don't need to do this and it may make things confusing and/or break.
So if you want your "infinite" looping thread to have max priority, why are you setting the priority to MAX for the EDT instead of your "most precious one"?
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
//updateTable updates a JTable
updateTable(tableContents, TableModel);
TableColumnModel tcm = guiTable.getColumnModel();
In this piece of code the current thread will be the EDT, or one spawned by the EDT. Why not move that line to before entering the while loop?

Why Thread.sleep is bad to use

Apologies for this repeated question but I haven't found any satisfactory answers yet. Most of the question had their own specific use case:
Java - alternative to thread.sleep
Is there any better or alternative way to skip/avoid using Thread.sleep(1000) in Java?
My question is for the very generic use case. Wait for a condition to complete. Do some operation. Check for a condition. If the condition is not true, wait for some time and again do the same operation.
For example, consider a method that creates a DynamoDB table by calling its CreateTable API. A DynamoDB table takes some time to become active, so the method would call the DescribeTable API to poll for status at regular intervals for up to some time limit (let's say 5 mins; deviation due to thread scheduling is acceptable). It returns true if the table becomes active within 5 mins, else throws an exception.
Here is pseudo code:
public void createDynamoDBTable(String name) {
    // call create table API to initiate table creation
    // wait for table to become active
    long endTime = System.currentTimeMillis() + MAX_WAIT_TIME_FOR_TABLE_CREATE;
    while (System.currentTimeMillis() < endTime) {
        boolean status = // call DescribeTable API to get status;
        if (status) {
            // status is now true, return
            return;
        } else {
            try {
                Thread.sleep(10 * 1000);
            } catch (InterruptedException e) {
            }
        }
    }
    throw new RuntimeException("Table still not created");
}
I understand that Thread.sleep blocks the current thread, thereby consuming resources. But in a fairly mid-size application, is one thread a big concern?
I read somewhere that one should use a ScheduledThreadPoolExecutor and do the status polling there. But again, we would have to initialize this pool with at least 1 thread, on which the Runnable that does the polling would run.
Any suggestions on why using Thread.sleep is said to be such a bad idea, and what are the alternative options for achieving the same as above?
http://msmvps.com/blogs/peterritchie/archive/2007/04/26/thread-sleep-is-a-sign-of-a-poorly-designed-program.aspx
It's fine to use Thread.sleep in that situation. The reason people discourage Thread.sleep is that it's frequently used in a misguided attempt to fix a race condition, or used where notification-based synchronization is a much better choice, etc.
In this case, AFAIK you don't have an option but to poll, because the API doesn't provide you with notifications. I can also see it's an infrequent operation, because presumably you are not going to create thousands of tables.
Therefore, I find it fine to use Thread.sleep here. As you said, spawning a separate thread when you are going to block the current thread anyways seems to complicate things without merit.
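A sleep-based poll of this kind can be wrapped in a small helper so that the deadline and the interrupt handling live in one place. The following is a minimal sketch; Poller and pollUntil are hypothetical names, and the BooleanSupplier stands in for whatever status check (such as a DescribeTable call) the caller supplies. Unlike the pseudo code in the question, it restores the interrupt flag and gives up when interrupted.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class Poller {
    /**
     * Checks the condition every intervalMillis until it holds or timeoutMillis elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false; // timed out
            }
            try {
                // Never sleep past the deadline.
                long pause = Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(intervalMillis));
                TimeUnit.NANOSECONDS.sleep(pause);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
    }
}
```

A caller would then write something like pollUntil(() -> tableIsActive(name), 5 * 60 * 1000, 10 * 1000), where tableIsActive is the caller's own (here hypothetical) status check, and throw if the result is false.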
Yes, one should try to avoid using Thread.sleep(x), but it shouldn't be totally forgotten:
Why it should be avoided:
It doesn't release any locks the thread holds.
It doesn't guarantee that execution will resume right after the sleeping time (so it may keep waiting longer, in rare cases much longer).
If we mistakenly put a foreground processing thread to sleep, then we won't be able to close that application until x milliseconds have passed.
We now have the java.util.concurrent package with ready-made tools for specific problems (a bit like design patterns, though of course not exactly), so why use Thread.sleep(x) at all?
Where to use Thread.sleep(x):
For providing delays in background threads.
And a few others.

Balancing multiple queues

I suspect this is really easy but I’m unsure if there’s a naïve way of doing it in Java. Here’s my problem, I have two scripts for processing data and both have the same inputs/outputs except one is written for the single CPU and the other is for GPUs. The work comes from a queue server and I’m trying to write a program that sends the data to either the CPU or GPU script depending on which one is free.
I do not understand how to do this.
I know that with an ExecutorService I can specify how many threads I want to keep running, but I'm not sure how to balance between two different ones. I have 2 GPUs and 8 CPU cores on the system and thought I could have an ExecutorService keep 2 GPU and 8 CPU processes running, but I'm unsure how to balance between them since the GPU tasks will be done a lot quicker than the CPU tasks.
Any suggestions on how to approach this? Should I create two queues and keep polling them to see which one is less busy? Or is there a way to just put all the work units (all the same) into one queue and have the GPU or CPU processes take from the same queue as they become free?
UPDATE: just to clarify, the CPU/GPU programs are outside the scope of the program I'm making; they are simply scripts that I call via two different methods. I guess the simplified version of what I'm asking is whether two methods can take work from the same queue?
Can two methods take work from the same queue?
Yes, but you should use a BlockingQueue to save yourself some synchronization heartache.
Basically, one option would be to have a producer which places tasks into the queue via BlockingQueue.offer. Then design your CPU/GPU threads to call BlockingQueue.take and perform work on whatever they receive.
For example:
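main (...) {
    BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < CPUs; i++) {
        new CPUThread(queue).start();
    }
    for (int i = 0; i < GPUs; i++) {
        new GPUThread(queue).start();
    }
    for (/*all data*/) {
        queue.offer(task);
    }
}
class CPUThread extends Thread {
    private final BlockingQueue<Task> queue;
    CPUThread(BlockingQueue<Task> queue) {
        this.queue = queue;
    }
    @Override
    public void run() {
        try {
            while (/*some condition*/) {
                Task task = queue.take(); // blocks until a task is available
                // do task work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }
}
//etc...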
Obviously there is more than one way to do it; usually the simplest is the best. I would suggest thread pools: one with 8 threads for CPU tasks and a second with 2 threads for GPU tasks, matching your 8 CPU cores and 2 GPUs. Your work unit manager can submit work to the pool that has idle threads at the moment (I would recommend synchronizing that block of code). The standard Java ThreadPoolExecutor has a getActiveCount() method you can use for this, see
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#getActiveCount().
Use Runnables like this:
class CPUGPURunnable implements Runnable {
    @Override
    public void run() {
        if (Thread.currentThread() instanceof CPUGPUThread) {
            CPUGPUThread t = (CPUGPUThread) Thread.currentThread();
            if (t.isGPU())
                runGPU();
            else
                runCPU();
        }
    }
}
CPUGPUThread is a Thread subclass that knows, via a flag, whether it runs in CPU or GPU mode. Have a ThreadFactory for the ThreadPoolExecutor that creates either a CPU or a GPU thread. Set up a ThreadPoolExecutor with two workers. Make sure the ThreadFactory creates a CPU and then a GPU thread instance.
I suppose you have two objects that represent the two GPUs, with methods like boolean isFree() and void execute(Runnable). Then you should start 8 threads which, in a loop, take the next job from the queue and put it on a free GPU if there is one, otherwise execute the job themselves.

Understanding Threads + Asynchronous

So I have a program that I made that needs to send a lot (like 10,000+) of GET requests to a URL and I need it to be as fast as possible. When I first created the program I just put the connections into a for loop but it was really slow because it would have to wait for each connection to complete before continuing. I wanted to make it faster so I tried using threads and it made it somewhat faster but I am still not satisfied.
I'm guessing the correct way to go about this and making it really fast is using an asynchronous connection and connecting to all of the URLs. Is this the right approach?
Also, I have been trying to understand threads and how they work but I can't seem to get it. The computer I am on has an Intel Core i7-3610QM quad-core processor. According to Intel's website for the specifications for this processor, it has 8 threads. Does this mean I can create 8 threads in a Java application and they will all run concurrently? Any more than 8 and there will be no speed increase?
What exactly does the number represent next to "Threads" in the task manager under the "Performance" tab? Currently, my task manager is showing "Threads" as over 1,000. Why is it this number and how can it even go past 8 if that's all my processor supports?
I also noticed that when I tried my program with 500 threads as a test, the number in the task manager increased by 500 but it had the same speed as if I set it to use 8 threads instead. So if the number is increasing according to the number of threads I am using in my Java application, then why is the speed the same?
Also, I have tried doing a small test with threads in Java but the output doesn't make sense to me.
Here is my Test class:
import java.text.SimpleDateFormat;
import java.util.Date;

public class Test {
    private static int numThreads = 3;
    private static int numLoops = 100000;
    private static SimpleDateFormat dateFormat = new SimpleDateFormat("[hh:mm:ss] ");

    public static void main(String[] args) throws Exception {
        for (int i=1; i<=numThreads; i++) {
            final int threadNum = i;
            new Thread(new Runnable() {
                public void run() {
                    System.out.println(dateFormat.format(new Date()) + "Start of thread: " + threadNum);
                    for (int i=0; i<numLoops; i++)
                        for (int j=0; j<numLoops; j++);
                    System.out.println(dateFormat.format(new Date()) + "End of thread: " + threadNum);
                }
            }).start();
            Thread.sleep(2000);
        }
    }
}
This produces an output such as:
[09:48:51] Start of thread: 1
[09:48:53] Start of thread: 2
[09:48:55] Start of thread: 3
[09:48:55] End of thread: 3
[09:48:56] End of thread: 1
[09:48:58] End of thread: 2
Why does the third thread start and end right away while the first and second take 5 seconds each? If I add more than 3 threads, the same thing happens for all threads above 2.
Sorry if this was a long read, I had a lot of questions.
Thanks in advance.
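Your processor has 4 physical cores, and Hyper-Threading lets each core run 2 hardware threads, so the operating system sees 8 logical cores. This does in fact mean that only 8 things can be running at any given moment. That doesn't mean that you are limited to only 8 threads, however.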
When a thread is synchronously opening a connection to a URL it will often sleep while it waits for the remote server to get back to it. While that thread is sleeping other threads can be doing work. If you have 500 threads and all 500 are sleeping then you aren't using any of the cores of your CPU.
On the flip side, if you have 500 threads and all 500 threads want to do something, then they can't all run at once. To handle this scenario there is a special tool. Processors (or more likely the operating system, or some combination of the two) have a scheduler which determines which threads get to be actively running on the processor at any given time. Many different rules, and sometimes random activity, control how these schedulers work. This may explain why in the above example thread 3 always seems to finish first. Perhaps the scheduler is preferring thread 3 because it was the most recent thread to be scheduled by the main thread; the behavior can sometimes be impossible to predict.
Now to answer your question regarding performance. If opening a connection never involved a sleep, then it wouldn't matter whether you were handling things synchronously or asynchronously: you would not be able to get any performance gain above 8 threads. In reality, a lot of the time involved in opening a connection is spent sleeping. The difference between asynchronous and synchronous is how that time spent sleeping is handled. Theoretically you should be able to get nearly equal performance from the two.
With a multi-threaded model you simply create more threads than there are cores. When the threads hit a sleep they let the other threads do work. This can sometimes be easier to handle because you don't have to write any scheduling or interaction between the threads.
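The effect is easy to observe with tasks that mostly wait. The sketch below is an illustration under assumptions, not a benchmark: BlockingPoolDemo and elapsedMillis are invented names, and Thread.sleep(50) stands in for time spent waiting on a remote server. Since the tasks hardly use the CPU, a pool with far more threads than cores finishes much sooner than a small one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingPoolDemo {
    /** Runs `tasks` simulated requests on `threads` threads and returns the elapsed milliseconds. */
    static long elapsedMillis(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> work = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            work.add(() -> {
                Thread.sleep(50); // stands in for waiting on a remote server
                return null;
            });
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(work); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Comparing elapsedMillis(40, 2) with elapsedMillis(40, 40) should show the second call finishing in a fraction of the time, even on a machine with far fewer than 40 cores.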
With an asynchronous model you only create a single thread per core. If that thread needs to sleep then it doesn't sleep but actually has to have code to handle switching to the next connection. For example, assume there are three steps in opening a connection (A,B,C):
while (!connectionsList.isEmpty()) {
    // An explicit Iterator allows finished connections to be removed while looping
    for (Iterator<Connection> it = connectionsList.iterator(); it.hasNext(); ) {
        Connection connection = it.next();
        if (connection.getState() == READY_FOR_A) {
            connection.stepA();
            // this method should return immediately and the connection
            // should go into the waiting state for some time before going
            // into the READY_FOR_B state
        }
        if (connection.getState() == READY_FOR_B) {
            connection.stepB();
            // same immediate return behavior as above
        }
        if (connection.getState() == READY_FOR_C) {
            connection.stepC();
            // same immediate return behavior as above
        }
        if (connection.getState() == WAITING) {
            // Do nothing, skip over
        }
        if (connection.getState() == FINISHED) {
            it.remove();
        }
    }
}
Notice that at no point does the thread sleep so there is no point in having more threads than you have cores. Ultimately, whether to go with a synchronous approach or an asynchronous approach is a matter of personal preference. Only at absolute extremes will there be performance differences between the two and you will need to spend a long time profiling to get to the point where that is the bottleneck in your application.
It sounds like you're creating a lot of threads and not getting any performance gain. There could be a number of reasons for this.
It's possible that your establishing a connection isn't actually sleeping in which case I wouldn't expect to see a performance gain past 8 threads. I don't think this is likely.
It's possible that all of the threads are using some common shared resource. In this case the other threads can't work because the sleeping thread has the shared resource. Is there any object that all of the threads share? Does this object have any synchronized methods?
It's possible that you have your own synchronization. This can create the issue mentioned above.
It's possible that each thread has to do some kind of setup/allocation work that is defeating the benefit you are gaining by using multiple threads.
If I were you I would use a tool like JVisualVM to profile your application when running with some smallish number of threads (20). JVisualVM has a nice colored thread graph which will show when threads are running, blocking, or sleeping. This will help you understand the thread/core relationship as you should see that the number of running threads is less than the number of cores you have. In addition if you see a lot of blocked threads then that can help lead you to your bottleneck (if you see a lot of blocked threads use JVisualVM to create a thread dump at that point in time and see what the threads are blocked on).
Some concepts:
You can have many threads in the system, but only some of them (max 8 in your case) will be "scheduled" on the CPU at any point of time. So, you cannot get more performance than 8 threads running in parallel. In fact the performance will probably go down as you increase the number of threads, because of the work involved in creating, destroying and managing threads.
Threads can be in different states : http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Thread.State.html
Out of those states, the RUNNABLE threads stand to get a slice of CPU time. Operating System decides assignment of CPU time to threads. In a regular system with 1000's of threads, it can be completely unpredictable when a certain thread will get CPU time and how long it will be on CPU.
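The thread states listed in that documentation can be observed directly with Thread.getState(). The following sketch uses invented names (ThreadStates, observe) and records the state of one thread before it starts, while it sleeps, and after it has finished. A sleeping thread is not RUNNABLE, which is why it does not compete for CPU time.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadStates {
    /** Records the state of a thread before start, while it sleeps, and after it ends. */
    static List<Thread.State> observe() {
        List<Thread.State> seen = new ArrayList<>();
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                // woken early by the interrupt below; just finish
            }
        });
        seen.add(sleeper.getState()); // not started yet
        sleeper.start();
        try {
            Thread.sleep(200); // give the thread time to fall asleep
            seen.add(sleeper.getState()); // sleeping with a timeout
            sleeper.interrupt();
            sleeper.join();
            seen.add(sleeper.getState()); // finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

On a normally loaded machine the three recorded states are NEW, TIMED_WAITING and TERMINATED.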
About the problem you are solving:
You seem to have figured out the correct solution - making parallel asynchronous network requests. However, practically speaking starting 10000+ threads and that many network connections, at the same time, may be a strain on the system resources and it may just not work. This post has many suggestions for asynchronous I/O using Java. (Tip: Don't just look at the accepted answer)
This solution is more specific to the general problem of trying to make 10k requests as fast as possible. I would suggest that you abandon the Java HTTP libraries and use Apache's HttpClient instead. They have several suggestions for maximizing performance which may be useful. I have heard the Apache HttpClient library is just faster in general as well, lighter weight and less overhead.
