I have an application that makes HTTP requests to a site, then retrieves the responses and inspects them; if they contain specific keywords, it writes both the HTTP request and the response to an XML file. The application uses a spider to map out all the URLs of a site and then sends the requests (each URL in the sitemap is fed to a separate thread that sends the request). Because of this, I can't tell when all the requests have been sent. Once all the requests are done, I want to convert the XML file to some other format. To find out when the requests have ended, I use the following strategy:
I store the time of the latest request in a variable (whenever a new request is sent at a time later than the time in the variable, the variable is updated). I also start a thread to monitor this time, and if the difference between the current time and the time in the variable is more than 1 minute, I know that the sending of requests has ceased. I use the following code for this purpose:
class monitorReq implements Runnable {
    Thread t;
    monitorReq() {
        t = new Thread(this);
        t.start();
    }
    public void run() {
        while (new Date().getTime() - last_request.getTime() < 60000) {
            try {
                Thread.sleep(30000); // sleep for 30 secs before checking again
            } catch (InterruptedException e) { // Thread.sleep throws InterruptedException, not IOException
                e.printStackTrace();
            }
        }
        System.out.println("Last request happened 1 min ago at : " + last_request.toString());
        // call method for conversion of file
    }
}
Is this approach correct, or is there a better way to implement the same thing?
Your current approach is not reliable. You will run into race conditions if one thread is updating the time while another thread is reading it. It will also be difficult to process the requests in multiple threads. And you are assuming that every task finishes within 60 seconds.
The following are better approaches.
If you know the number of requests you are going to make beforehand, you can use a CountDownLatch:
main() {
    int noOfRequests = ..;
    final CountDownLatch doneSignal = new CountDownLatch(noOfRequests);
    // spawn threads or use an executor service to perform the downloads
    for (int i = 0; i < noOfRequests; i++) {
        new Thread(new Runnable() {
            public void run() {
                // perform the download
                doneSignal.countDown();
            }
        }).start();
    }
    doneSignal.await(); // this will block until all threads are done
}
If you don't know the number of requests beforehand, you can use an ExecutorService to perform the downloads / processing using a thread pool:
main() {
    ExecutorService executor = Executors.newCachedThreadPool();
    while (moreRequests) {
        executor.execute(new Runnable() {
            public void run() {
                // perform processing
            }
        });
    }
    // finished submitting all requests for processing; wait for completion
    executor.shutdown(); // stops accepting new tasks; already submitted tasks still run
    executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
}
General notes:
Class names in Java should start with a capital letter (MonitorReq rather than monitorReq).
There seems to be no synchronization between your threads; access to last_request should probably be synchronized.
Using System.currentTimeMillis() would save you some object-creation overhead.
Swallowing an exception like this is not good practice.
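The notes on synchronization and System.currentTimeMillis() could be combined roughly as follows. This is only a sketch: the class LastRequestTracker and its methods are hypothetical names, and it assumes each request thread calls markRequest() when it sends a request.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not from the question): keeps the time of the latest request
// in an AtomicLong so that request threads and the monitor thread can share it safely.
class LastRequestTracker {
    private final AtomicLong lastRequestMillis = new AtomicLong(System.currentTimeMillis());

    // Called by a request thread each time it sends a request.
    void markRequest() {
        markRequest(System.currentTimeMillis());
    }

    // Only ever moves the stored time forward, even when several threads race.
    void markRequest(long timestampMillis) {
        lastRequestMillis.accumulateAndGet(timestampMillis, Math::max);
    }

    // True when no request has been recorded for at least idleMillis, as seen at nowMillis.
    boolean isIdle(long nowMillis, long idleMillis) {
        return nowMillis - lastRequestMillis.get() >= idleMillis;
    }
}
```

The monitor thread would then loop on something like `tracker.isIdle(System.currentTimeMillis(), 60000)` instead of comparing Date objects.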
Answer:
Your way of doing it is acceptable. There is not much busy waiting and the idea is as simple as it gets. Which is good.
I would consider changing the wait time to a lower value; there is so little data that even running this loop every second will not take much processing power, and it will certainly improve the reaction time of your app.
Related
My goal is to run multiple objects concurrently without creating a new Thread for each one, because of scalability issues. One use case would be handling keep-alive Socket connections.
while (true) {
    final Socket socket = serverSocket.accept();
    new Thread(new SessionHandler(socket)).start();
    // this will become a problem when there are 1000 threads.
    // I am looking for an alternative that mimics the `start()` of Thread without creating a new Thread for each SessionHandler object.
}
For brevity, I will use a printer analogy.
What I've tried:
CompletableFuture: after checking, I found that it uses ForkJoinPool, which is itself a thread pool.
What I think would work:
Actor model. Honestly, the concept is new to me today and I am still figuring out how to run an Object method without blocking the main thread.
main/java/SlowPrinter.java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlowPrinter {
    private static final Logger logger = LoggerFactory.getLogger(SlowPrinter.class);

    void print(String message) {
        try {
            Thread.sleep(100); // simulate slow work
        } catch (InterruptedException ignored) {
        }
        logger.debug(message);
    }
}
main/java/NeverEndingPrinter.java
public class NeverEndingPrinter implements Runnable {
    private final SlowPrinter printer;

    public NeverEndingPrinter(SlowPrinter printer) {
        this.printer = printer;
    }

    @Override
    public void run() {
        while (true) {
            printer.print(Thread.currentThread().getName());
        }
    }
}
test/java/NeverEndingPrinterTest.java
@Test
void withThread() {
    SlowPrinter slowPrinter = new SlowPrinter();
    NeverEndingPrinter neverEndingPrinter = new NeverEndingPrinter(slowPrinter);
    Thread thread1 = new Thread(neverEndingPrinter);
    Thread thread2 = new Thread(neverEndingPrinter);
    thread1.start();
    thread2.start();
    try {
        Thread.sleep(1000);
    } catch (InterruptedException ignored) {
    }
}
Currently, creating a new Thread is the only solution I know of. However, this becomes an issue when there are thousands of threads.
The solution that many developers in the past have come up with is the ThreadPool. It avoids the overhead of creating many threads by reusing the same limited set of threads.
It however requires that you split up your work in small parts and you have to link the small parts step by step to execute a flow of work that you would otherwise do in a single method on a separate thread. So that's what has resulted in the CompletableFuture.
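As a rough illustration of linking small steps on a shared pool, here is a sketch that uses CompletableFuture with a fixed pool of four threads. The pool size, the two steps, and the class name PipelineSketch are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PipelineSketch {
    // Runs `count` two-step flows on only 4 threads and returns the results in submission order.
    static List<String> runFlows(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> flows = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                flows.add(CompletableFuture
                        .supplyAsync(() -> "message-" + id, pool)       // step 1: produce a value
                        .thenApplyAsync(s -> s.toUpperCase(), pool));   // step 2: transform it
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> flow : flows) {
                results.add(flow.join()); // wait for this flow to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runFlows(1000).size());
    }
}
```

A thousand flows complete here without ever having more than four worker threads alive.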
The Actor model is a fancier modelling technique for assigning the separate steps in a flow, but the steps will again be executed on a limited number of threads, usually just 1 or 2 per actor.
For a very nice theoretical explanation of what problems are solved this way, see https://en.wikipedia.org/wiki/Staged_event-driven_architecture
If I look back at your original question, your problem is that you want to receive keep-alive messages from multiple sources, and don't want to use a separate thread for each source.
If you use blocking IO like while (socket.getInputStream().read() != -1) {}, you will always need a thread per connection, because that implementation puts the thread to sleep while it waits for data, so the thread cannot do anything else in the meantime.
Instead, you really should look into NIO. You would only need 1 selector and 1 thread where you continuously check the selector for incoming messages from any source (without blocking the thread), and use something like a HashMap to keep track of which source is still sending messages.
See also Java socket server without using threads
The NIO API is very low-level, BTW, so using a framework like Netty might be easier to get started.
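To make the selector idea more concrete, here is a minimal sketch of a single-threaded server that tracks when each connection last sent data. The class name SelectorServerSketch, the lastSeen map, and the loopback binding are assumptions for illustration; a real server would also need message framing, timeouts, and better error handling.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-threaded server: one selector watches every connection.
class SelectorServerSketch implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    final int port;
    // Time (millis) at which each live connection last sent data.
    final Map<SocketChannel, Long> lastSeen = new ConcurrentHashMap<>();

    SelectorServerSketch(int requestedPort) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        // Port 0 lets the OS pick any free port.
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), requestedPort));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        port = ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select(); // waits until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        handleRead((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed or fatal I/O error: leave the loop
        }
    }

    private void handleRead(SocketChannel client, ByteBuffer buffer) throws IOException {
        buffer.clear();
        int read;
        try {
            read = client.read(buffer);
        } catch (IOException e) {
            read = -1; // treat a broken connection like a closed one
        }
        if (read == -1) {
            lastSeen.remove(client);
            client.close();
        } else {
            lastSeen.put(client, System.currentTimeMillis());
        }
    }

    void stop() throws IOException {
        selector.close();
        server.close();
    }
}
```

Running `new Thread(new SelectorServerSketch(0)).start()` serves all connections from that one thread; the number of open sockets no longer dictates the number of threads.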
You're looking for a ScheduledExecutorService.
Create an initial ScheduledExecutorService with a fixed appropriate number of threads, e.g. Executors.newScheduledThreadPool(5) for 5 threads, and then you can schedule a recurring task with e.g. service.scheduleAtFixedRate(task, initialDelay, delayPeriod, timeUnit).
Of course, this will use threads internally, but it doesn't have the problem of thousands of threads that you're concerned about.
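A small sketch of that idea, assuming each keep-alive "session" can be reduced to a short recurring task. The class KeepAliveScheduler and the counter standing in for a real ping are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class KeepAliveScheduler {
    // Schedules `sessions` recurring tasks on a pool of 5 threads and returns the pool.
    static ScheduledExecutorService start(int sessions, AtomicInteger pings, long periodMillis) {
        ScheduledExecutorService service = Executors.newScheduledThreadPool(5);
        for (int i = 0; i < sessions; i++) {
            // Each session is a short recurring task, not a dedicated thread.
            service.scheduleAtFixedRate(pings::incrementAndGet, 0, periodMillis, TimeUnit.MILLISECONDS);
        }
        return service;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger pings = new AtomicInteger();
        ScheduledExecutorService service = start(1000, pings, 100);
        Thread.sleep(500);
        service.shutdownNow();
        System.out.println("pings sent: " + pings.get());
    }
}
```

This only works well when each task is short; a task that blocks for a long time ties up one of the five threads.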
Sorry for the long edit.
I am trying to download 100k URLs, and I started downloading with an executor service as below:
ExecutorService executorService = Executors.newFixedThreadPool(100);
for (int i = 0; i < list.size(); i++) {
    try {
        Callable callable = new Callable() {
            public List<String> call() throws Exception {
                // http connection
            }
        };
        Future future = executorService.submit(callable);
But the above method downloads the data only one URL at a time.
So I tried creating daemon threads (as shown below), and this method created multiple download connections (as expected):
for (int i = 0; i < 10; i++) {
    Thread t = new Thread("loadtest " + i);
    t.setDaemon(true);
    t.start();
}
while (true) {
    boolean flag = true;
    Set<Thread> threads = Thread.getAllStackTraces().keySet();
    for (Thread t : threads) {
        if (t.isDaemon() && t.getName().startsWith("loadtest")) {
            flag = false;
            break;
        }
    }
    if (flag)
        break;
    Thread.sleep(5000);
}
Can the same method be used for load testing on servers?
Any other suggestions on how load testing can be done would also be of great help.
Thanks in advance!
I will hazard a guess that your ExecutorService is not working because you are calling get() on the Future instances it returns inside your loop. This mistake will indeed cause your processing to be serialized as if you had only one thread, because another task isn't submitted until the first completes.
If you really need to use Callable, don't get() the result until you are ready to block for some indefinite time as the tasks complete—and they can't complete if they haven't been submitted yet. For downloading URLs, it would be better to use Runnable, where the main thread submits URLs and then forgets about the task; the task can independently complete processing of its URL.
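If Callable is kept, the submit-first, collect-later pattern described above might look like this sketch. The fakeDownload method is a stand-in for the real HTTP call, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitThenCollect {
    // Stand-in for the real download; the sleep simulates network latency.
    static String fakeDownload(String url) throws InterruptedException {
        Thread.sleep(100);
        return "body of " + url;
    }

    static List<String> downloadAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: submit everything without blocking on any result.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fakeDownload(url);
                futures.add(pool.submit(task));
            }
            // Phase 2: only now block on the results, in submission order.
            List<String> bodies = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    bodies.add(future.get());
                } catch (ExecutionException e) {
                    bodies.add("failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return bodies;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling get() inside the first loop instead would make each submission wait for the previous download, which is the serialized behaviour described in the question.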
If you produce new tasks quickly, there's a chance you could queue up so many that you run out of memory. In that case, you can use a bounded queue and set up an appropriate rejection handler using ThreadPoolExecutor directly.
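One possible configuration, shown as a sketch: a ThreadPoolExecutor with an ArrayBlockingQueue and the built-in CallerRunsPolicy, so that a full queue slows the submitting thread down instead of growing without bound. The pool size of 4 and queue capacity of 10 are arbitrary values chosen for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolSketch {
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),      // at most queueCapacity waiting tasks
                new ThreadPoolExecutor.CallerRunsPolicy());   // when full, the submitter runs the task itself
    }

    // Submits taskCount trivial tasks and returns how many actually ran.
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = newBoundedPool(4, 10);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

With CallerRunsPolicy no task is dropped; the trade-off is that the producer thread is busy running a task whenever the queue is full.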
A daemon thread is a thread that does not prevent the JVM from exiting once all other (non-daemon) threads have finished.
If you want your main thread to wait until the worker threads have finished, I suggest not using daemon threads, as they are not meant for that use case. You can use Thread#join to make the main thread wait.
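List<Thread> threads = new ArrayList<Thread>();
for (int i = 0; i < 10; i++) {
    Thread t = new Thread("loadtest " + i); // normal (non-daemon) thread
    t.start();
    threads.add(t);
}
// join only after all threads are started; joining inside the first loop would run them one at a time
for (Thread t : threads) {
    t.join(); // main or parent thread will wait until the child thread has finished
}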
I believe that in your use case you should use a normal thread instead of a daemon thread.
Load testing is not only about "hammering" your server with requests; a well-behaved load test needs to represent a real user using a real browser, with all the associated stuff like:
headers
cookies
cache
handling embedded resources (images, scripts, styles, fonts)
So I would recommend using a specialized load testing tool that is capable of representing a real user as closely as possible and automatically takes care of the aforementioned points. Load testing tools also normally allow you to set rendezvous points and provide a lot of metrics and charts, so you will be able to see connect time, network latency, and throughput, correlate increasing load with increasing response time or number of errors, etc.
Inside a Spring web application I have a scheduled task that is called every five minutes.
@Scheduled(fixedDelay = 300000)
public void importDataTask()
{
    importData(); // db calls, file manipulations, etc.
}
Usually the task runs smoothly for days, but sometimes the example method importData() does not terminate, so importDataTask() is not called again and everything is blocked until I restart the application.
The question is: is there a feasible method to be sure that a method will not be indefinitely blocked (maybe waiting for a resource, or something else)?
The question is: is there a feasible method to be sure that a method
will not be indefinitely blocked (maybe waiting for a resource, or
something else)?
If the scheduling cannot be planned at a precise regular interval, you should perhaps not rely on a fixed delay alone but on two conditions: the delay has elapsed and the last execution is done.
You could schedule a task that checks whether both conditions are met and, if so, runs the important processing. Otherwise, it waits for the next scheduled run.
This way, the scheduler is never blocked. You may have to wait for some time if the task exceeds the fixed delay. If that is a problem because the fixed delay is often exceeded, you should probably not use a fixed delay, or you should increase it noticeably so that overruns become less common.
Here is an example (written without an editor, so sorry for any mistakes):
// must be set to false when an import starts and back to true when it ends
private volatile boolean isLastImportDataTaskFinished;

@Scheduled(fixedDelay = 300000)
public void importDataTaskManager() {
    if (isLastImportDataTaskFinished()) {
        new Thread(new ImportantDataProcessing()).start();
    } else {
        // log the problem if you want
    }
}

private boolean isLastImportDataTaskFinished() {
    // to retrieve this information, you can do as you want: use a variable
    // in this class or data in a database, a file...
    // here is a simple implementation
    return isLastImportDataTaskFinished;
}
Runnable class :
public class ImportantDataProcessing implements Runnable {
    public void run() {
        importData(); // db calls, file manipulations, etc.
    }
}
Comment:
But if I run it as a thread how can I kill it if I find it's exceeding
the time limit since I don't have any reference to it (in the idea of
using a second task to determine the stuck state)?
You can use an ExecutorService (there is a question about it here: How to timeout a thread).
Here is a very simple example:
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<?> future = executor.submit(new ImportantDataProcessing());
try {
    future.get(100, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
} catch (ExecutionException e) {
    e.printStackTrace();
} catch (TimeoutException e) {
    // the timeout is the case to handle here, but the other exceptions should be handled too :)
    e.printStackTrace();
}
executor.shutdown();
If ImportantDataProcessing needs to return useful information, you can use a task that returns a value (a Callable) instead of a Runnable, so that the Future is typed with the result.
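A sketch of that variant, assuming the import can be expressed as a Callable. The helper runWithTimeout and the fallback value are hypothetical, and cancel(true) only helps if the task actually reacts to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedImport {
    // Runs the task and returns its result, or the fallback if it exceeds the timeout.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread running the task
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

For example, `TimedImport.runWithTimeout(() -> importData(), 100_000, null)` would give the scheduled method a result or a fallback after 100 seconds, provided importData() returns a value.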
Firstly, sure. There are many feasible ways to be notified when the process is blocked, such as a log entry, message, or email triggered from your code.
Secondly, it depends on whether you want it to block or not. If blocking is not your intention, a new thread or a timeout may be your choice.
I am using the Java ExecutorService framework to submit callable tasks for execution.
These tasks communicate with a web service and a web service timeout of 5 mins is applied.
However, I've seen that in some cases the timeout is ignored and the thread 'hangs' on an API call; hence, I want to cancel all tasks that take longer than, say, 5 mins.
Currently, I have a list of futures, and I iterate through them and call future.get until all tasks are complete. I've seen that the overloaded future.get method takes a timeout and throws a TimeoutException when the task doesn't complete in that window. So I thought of an approach where I call future.get() with a timeout and, in case of a TimeoutException, call future.cancel(true) to make sure that the task is interrupted.
My main questions
1. Is the get with a timeout the best way to solve this issue?
2. Is there a possibility that I'm waiting with the get call on a task that hasn't yet been placed on the thread pool (isn't an active worker)? In that case I may be terminating a thread that, once it starts, may actually complete within the required time limit.
Any suggestions would be deeply appreciated.
Is the get with a timeout the best way to solve this issue?
This will not suffice. For instance, if your task is not designed to respond to interruption, it will keep on running or just stay blocked.
Is there a possibility that I'm waiting with the get call on a task that hasn't yet been placed on the thread pool (isn't an active worker)? In that case I may be terminating a thread that, once it starts, may actually complete within the required time limit.
Yes. You might end up cancelling a task that was never scheduled to run if your thread pool is not configured properly.
The following code snippet is one way to make your task responsive to interruption when it contains non-interruptible blocking. It also does not cancel tasks that have not yet been scheduled to run. The idea is to override the interrupt method and stop running tasks by, say, closing sockets, database connections, etc. This code is not perfect: you need to adapt it to your requirements, handle exceptions, etc.
class LongRunningTask extends Thread {
    private Socket socket;
    private volatile AtomicBoolean atomicBoolean;

    public LongRunningTask() {
        atomicBoolean = new AtomicBoolean(false);
    }

    @Override
    public void interrupt() {
        try {
            // clean up any resources, close connections etc.
            socket.close();
        } catch (Throwable e) {
        } finally {
            atomicBoolean.compareAndSet(true, false);
            // set the interrupt status of the executing thread
            super.interrupt();
        }
    }

    public boolean isRunning() {
        return atomicBoolean.get();
    }

    @Override
    public void run() {
        atomicBoolean.compareAndSet(false, true);
        // any long running task that might hang, for instance:
        try {
            socket = new Socket("0.0.0.0", 5000);
            socket.getInputStream().read();
        } catch (UnknownHostException e) {
        } catch (IOException e) {
        } finally {
        }
    }
}
// your task caller thread
// map of futures and tasks
Map<Future, LongRunningTask> map = new HashMap<Future, LongRunningTask>();
ArrayList<Future> list = new ArrayList<Future>();
int noOfSubmittedTasks = 0;
for (int i = 0; i < 6; i++) {
    LongRunningTask task = new LongRunningTask();
    Future f = execService.submit(task);
    map.put(f, task);
    list.add(f);
    noOfSubmittedTasks++;
}
while (noOfSubmittedTasks > 0) {
    for (int i = 0; i < list.size(); i++) {
        Future f = list.get(i);
        LongRunningTask task = map.get(f);
        if (task.isRunning()) {
            /*
             * This ensures that you process only those tasks which have started running
             */
            try {
                f.get(5, TimeUnit.MINUTES);
                noOfSubmittedTasks--;
            } catch (InterruptedException e) {
            } catch (ExecutionException e) {
            } catch (TimeoutException e) {
                // f.cancel(true) interrupts only the pool's worker thread, so call the
                // overridden interrupt() directly to close the socket and unblock the read
                task.interrupt();
                f.cancel(true);
                noOfSubmittedTasks--;
            }
        }
    }
}
execService.shutdown();
Is the get with a timeout the best way to solve this issue?
Yes, it is perfectly fine to call get(timeout) on a Future object. If the task that the future points to has already executed, it returns immediately. If the task is yet to be executed or is being executed, it waits until the timeout, which is good practice.
Is there a possibility that I'm waiting with the get call on a task
that hasn't yet been placed on the thread pool (isn't an active worker)?
You get a Future object only when you place a task on the thread pool, so it is not possible to call get() on a task without placing it on the thread pool. Yes, there is a possibility that the task has not yet been taken by a free worker.
The approach that you are talking about is OK. But most importantly, before setting a threshold on the timeout you need to know the right values of thread pool size and timeout for your environment. Do a stress test, which will reveal whether the number of worker threads you configured for the thread pool is adequate. This may even let you reduce the timeout value. So this test is the most important part, I feel.
A timeout on get is perfectly fine, but you should also cancel the task if it throws TimeoutException. If you do the above test properly and set your thread pool size and timeout to ideal values, then you may not even need to cancel tasks externally (but you can keep this as a backup). And yes, in cancelling a task you may sometimes end up cancelling one that has not yet been picked up by the Executor.
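A related option, shown here only as a sketch, is ExecutorService.invokeAll with a timeout, which cancels every task that has not completed when the deadline passes. Note the assumptions: the timeout covers the whole batch rather than each task, and the pool is sized to the number of tasks so that none of them waits in the queue. The class name BatchTimeoutSketch is hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BatchTimeoutSketch {
    // Returns how many tasks were cancelled because the batch deadline passed (-1 if interrupted).
    static int runBatch(List<Callable<String>> tasks, long timeoutMillis) {
        // One thread per task, so every task starts immediately and none is cancelled unstarted.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<Future<String>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            int cancelled = 0;
            for (Future<String> future : futures) {
                if (future.isCancelled()) {
                    cancelled++;
                }
            }
            return cancelled;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

As with cancel(true), a cancelled task stops only if it responds to interruption.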
You can of course cancel a Task by using
task.cancel(true)
It is perfectly legal. But this will interrupt the thread if it is "RUNNING".
If the thread is waiting to acquire an intrinsic lock then the "interruption" request has no effect other than setting the thread's interrupted status. In this case you cannot do anything to stop it. For the interruption to happen, the thread should come out from the "blocked" state by acquiring the lock it was waiting for (which may take more than 5 mins). This is a limitation of using "intrinsic locking".
However you can use explicit lock classes to solve this problem. You can use "lockInterruptibly" method of the "Lock" interface to achieve this. "lockInterruptibly" will allow the thread to try to acquire a lock while remaining responsive to the interruption. Here is a small example to achieve that:
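private final Lock lock = new ReentrantLock(); // shared by all threads that touch the state

public void workWithExplicitLock() throws InterruptedException {
    lock.lockInterruptibly(); // throws InterruptedException if interrupted while waiting
    try {
        // work with shared object state
    } finally {
        lock.unlock();
    }
}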
What is the best way to handle RejectedExecutionException while using a ThreadPoolExecutor in Java?
I want to ensure that the task submitted should not be overlooked and should surely get executed. As of now there are no hard real time requirements to get the task done.
One of the things I thought could be done was waiting in a loop till I know that there is space in the runnable queue, and then go on and add it to the queue.
Would be glad if people can share their experiences.
Adding the possible solution I thought of:
while (executor.getQueue().remainingCapacity() <= 0) {
    // keep looping
    Thread.sleep(100);
}
// if the loop exits, it indicates that we have space in the queue, hence
// go ahead and add to the queue
executor.execute(new ThreadInstance(params));
I would change the behaviour of your queue. e.g.
public class MyBlockingQueue<E> extends ArrayBlockingQueue<E> {
    private final long timeoutMS;

    public MyBlockingQueue(int capacity, long timeoutMS) {
        super(capacity);
        this.timeoutMS = timeoutMS;
    }

    @Override
    public boolean offer(E e) {
        try {
            return super.offer(e, timeoutMS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e1) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
This will wait for the queue to drain before giving up.
If you have constrained your thread pool to only allow a certain number of concurrent threads (generally a good thing), then the application needs to somehow push-back on the calling code, so when you receive a RejectedExecutionException from the ThreadPoolExecutor you need to indicate this to the caller and the caller will need to handle the retry.
An analogous situation is a web server under heavy load. A client connects, the web server should return a 503 - Service Unavailable (generally a temporary condition) and the client decides what to do about it.
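On the caller side, handling the push-back could look like the following sketch. The helper submitWithRetry, the attempt limit, and the linear backoff are assumptions for illustration; it relies on the executor's default AbortPolicy, which throws RejectedExecutionException when the queue is full.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;

class RetryingSubmitter {
    // Tries to submit the task; on rejection waits and retries, up to maxAttempts times.
    // Returns false if the task was still rejected, so the caller can decide what to do.
    static boolean submitWithRetry(ThreadPoolExecutor pool, Runnable task,
                                   int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                pool.execute(task);
                return true;
            } catch (RejectedExecutionException e) {
                if (pool.isShutdown()) {
                    return false; // retrying cannot help once the pool is shut down
                }
                try {
                    Thread.sleep(backoffMillis * attempt); // wait a little longer each time
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

The boolean result plays the role of the 503 in the web server analogy: the submitter reports the overload, and the calling code chooses between retrying later, queuing elsewhere, or failing.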