I'm reading files from SQS in an unbounded stream. As I read each file, I want to submit it to a second queue for processing. I can process several files simultaneously, so I put these into threads and want to block further reads from the queue when all threads are in use.
To that end I used this:
ExecutorService executorService =
new ThreadPoolExecutor(
maxThreads, // core thread pool size
maxThreads, // maximum thread pool size
1, // keep-alive time for idle threads above the core size (unused here, since core == max)
TimeUnit.MINUTES,
new ArrayBlockingQueue<Runnable>(maxThreads, true),
new ThreadPoolExecutor.CallerRunsPolicy());
Where maxThreads = 2.
Files are read in blocks of ten and processed as such:
for (Message msg : resp.getMessages()) {
Gson g = new Gson();
MessageBody messageBody = g.fromJson(msg.getBody(), MessageBody.class);
MessageRecords messageRecords = g.fromJson(messageBody.getMessage(), MessageRecords.class);
List<MessageRecords.Record> records = messageRecords.getRecords();
executorService.submit(new Runnable() {
@Override
public void run() {
// ... do some work based on file type
}
});
}
Watching the thread count, I'm seeing it climb steadily until the system runs out of memory and the job dies with an "unable to create native thread" exception. After this, the VM (AWS) doesn't accept SSH logins until it is stopped and restarted.
It seems like there must be a step where a given thread is released / cleaned up, but I'm not seeing where it should happen.
What am I doing wrong?
Edit:
yes, run() does finish and exit
nothing else interacts with these threads. the run() method gets a file, looks at the type and calls a fn() based on the type. Function parses the file and returns. Then run() is finished.
The issue is my use of ThreadPoolExecutor, specifically where it appeared in the application. I was creating a new thread pool for each pass through a block of SQS messages and not shutting it down afterwards. Moving the creation outside the loop and reusing the same pool fixes the problem.
So - a big hairy UFU.
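For anyone landing here with the same symptom, here is a minimal sketch of the fix described above: the pool is built once, outside the receive loop, and shut down when the consumer is done. The class name, the `handle` placeholder and the nested lists standing in for SQS batches are assumptions made for illustration, not the original application code.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedPoolSketch {
    private static final int MAX_THREADS = 2;

    /** Processes every batch on ONE pool and returns how many messages were handled. */
    static int processBatches(List<List<String>> batches) {
        // Created once, outside the receive loop, and reused for every batch.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS,
                1, TimeUnit.MINUTES,
                new ArrayBlockingQueue<>(MAX_THREADS, true),
                new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, the reader runs the task itself
        AtomicInteger processed = new AtomicInteger();
        try {
            for (List<String> batch : batches) {          // stands in for one SQS receive call
                for (String body : batch) {
                    pool.execute(() -> handle(body, processed));
                }
            }
        } finally {
            pool.shutdown();                               // lets queued work finish, then frees the threads
            try {
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed.get();
    }

    // Placeholder for "do some work based on file type" (hypothetical).
    private static void handle(String body, AtomicInteger processed) {
        processed.incrementAndGet();
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(List.of("a", "b", "c"), List.of("d", "e"));
        System.out.println(processBatches(batches)); // prints 5
    }
}
```

With CallerRunsPolicy the reading thread executes a task itself whenever both workers and the queue are full, which is what throttles further reads from the queue.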
My goal is to run multiple objects concurrently without creating a new Thread for each, due to scalability issues. One use case would be running keep-alive Socket connections.
while (true) {
final Socket socket = serverSocket.accept();
new Thread(new SessionHandler(socket)).start(); // start() returns void, so there is nothing to assign
// this will become a problem when there are 1000 threads.
// I am looking for an alternative that mimics the `start()` of Thread without creating a new Thread for each SessionHandler object.
}
For brevity, I will use a Printer analogy.
What I've tried:
Use CompletableFuture. After checking, I found it uses ForkJoinPool, which is a thread pool.
What I think would work:
Actor model. Honestly, the concept is new to me today and I am still figuring out how to run an Object method without blocking the main thread.
main/java/SlowPrinter.java
public class SlowPrinter {
private static final Logger logger = LoggerFactory.getLogger(SlowPrinter.class);
void print(String message) {
try {
Thread.sleep(100);
} catch (InterruptedException ignored) {
}
logger.debug(message);
}
}
main/java/NeverEndingPrinter.java
public class NeverEndingPrinter implements Runnable {
private final SlowPrinter printer;
public NeverEndingPrinter(SlowPrinter printer) {
this.printer = printer;
}
@Override
public void run() {
while (true) {
printer.print(Thread.currentThread().getName());
}
}
}
test/java/NeverEndingPrinterTest.java
@Test
void withThread() {
SlowPrinter slowPrinter = new SlowPrinter();
NeverEndingPrinter neverEndingPrinter = new NeverEndingPrinter(slowPrinter);
Thread thread1 = new Thread(neverEndingPrinter);
Thread thread2 = new Thread(neverEndingPrinter);
thread1.start();
thread2.start();
try {
Thread.sleep(1000);
} catch (InterruptedException ignored) {
}
}
Currently, creating a new Thread is the only solution I know of. However, this becomes an issue when there are 1000 threads.
The solution that many developers in the past have come up with is the ThreadPool. It avoids the overhead of creating many threads by reusing the same limited set of threads.
It however requires that you split up your work in small parts and you have to link the small parts step by step to execute a flow of work that you would otherwise do in a single method on a separate thread. So that's what has resulted in the CompletableFuture.
The Actor model is a fancier modelling technique for assigning the separate steps in a flow, but they will again be executed on a limited number of threads, usually just 1 or 2 per actor.
For a very nice theoretical explanation of what problems are solved this way, see https://en.wikipedia.org/wiki/Staged_event-driven_architecture
If I look back at your original question, your problem is that you want to receive keep-alive messages from multiple sources, and don't want to use a separate thread for each source.
If you use blocking IO like while (socket.getInputStream().read() != -1) {}, you will always need a thread per connection, because that implementation blocks the thread while waiting for data, so the thread cannot do anything else in the meantime.
Instead, you really should look into NIO. You would only need 1 selector and 1 thread where you continuously check the selector for incoming messages from any source (without blocking the thread), and use something like a HashMap to keep track of which source is still sending messages.
See also Java socket server without using threads
The NIO API is very low-level, BTW, so using a framework like Netty might be easier to get started.
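To make the selector idea concrete, here is a minimal sketch of a single-threaded echo server built on java.nio. The class name, the echo behaviour and the `echoOnce` demo helper are assumptions made for illustration; a real keep-alive server would track per-connection state and handle errors per connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorEchoSketch implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEchoSketch() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    void stop() throws IOException { selector.close(); server.close(); }

    @Override
    public void run() {                                  // the single thread serving every connection
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select();                       // waits until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isValid() && key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isValid() && key.isReadable()) {
                        echo(key, buffer);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // closed by stop() or an I/O error: end the loop (a real server would recover per connection)
        }
    }

    private static void echo(SelectionKey key, ByteBuffer buffer) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        buffer.clear();
        if (client.read(buffer) == -1) {                 // peer closed the connection
            client.close();                              // closing the channel also cancels its key
            return;
        }
        buffer.flip();
        while (buffer.hasRemaining()) {
            client.write(buffer);                        // fine for tiny messages; a real server would use OP_WRITE
        }
    }

    /** Demo helper: starts the server, sends one message over two connections, returns the first echo. */
    static String echoOnce(String message) {
        try {
            SelectorEchoSketch srv = new SelectorEchoSketch();
            Thread loop = new Thread(srv, "selector-loop");
            loop.setDaemon(true);
            loop.start();
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            try (Socket a = new Socket(InetAddress.getLoopbackAddress(), srv.port());
                 Socket b = new Socket(InetAddress.getLoopbackAddress(), srv.port())) {
                a.getOutputStream().write(out);
                b.getOutputStream().write(out);
                byte[] echoedA = a.getInputStream().readNBytes(out.length);
                b.getInputStream().readNBytes(out.length);
                return new String(echoedA, StandardCharsets.UTF_8);
            } finally {
                srv.stop();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("ping"));
    }
}
```

Both connections are served by the one "selector-loop" thread. The thread only waits inside select(), never on an individual socket, which is why the number of connections no longer dictates the number of threads.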
You're looking for a ScheduledExecutorService.
Create an initial ScheduledExecutorService with a fixed appropriate number of threads, e.g. Executors.newScheduledThreadPool(5) for 5 threads, and then you can schedule a recurring task with e.g. service.scheduleAtFixedRate(task, initialDelay, delayPeriod, timeUnit).
Of course, this will use threads internally, but it doesn't have the problem of thousands of threads that you're concerned about.
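A minimal sketch of that suggestion, assuming the keep-alive work can be expressed as a short recurring task. The class name, the 50 ms period and the thread-name bookkeeping are illustration-only assumptions; the bookkeeping exists just to show how few threads end up doing the work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepAliveSchedulerSketch {
    /** Schedules one recurring task per session and returns how many distinct threads ran them. */
    static int distinctThreadsUsed(int sessions, int poolSize) {
        ScheduledExecutorService service = Executors.newScheduledThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch runs = new CountDownLatch(sessions);
        try {
            for (int i = 0; i < sessions; i++) {
                service.scheduleAtFixedRate(() -> {
                    threadNames.add(Thread.currentThread().getName()); // placeholder for "send keep-alive"
                    runs.countDown();
                }, 0, 50, TimeUnit.MILLISECONDS);
            }
            runs.await(10, TimeUnit.SECONDS);         // wait until `sessions` task runs have happened in total
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            service.shutdownNow();                    // cancels the recurring tasks and stops the workers
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctThreadsUsed(1000, 5) + " threads served 1000 recurring tasks");
    }
}
```

This only works if each task returns quickly. A task that blocks or loops forever, like NeverEndingPrinter.run(), would pin one of the pool's threads and starve the others.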
sorry for the long edit,
I am trying to download 100k URLs, and I started downloading using an executor service as below:
ExecutorService executorService = Executors.newFixedThreadPool(100);
for (int i = 0; i < list.size(); i++) {
try {
Callable callable = new Callable() {
public List<String> call() throws Exception {
//http connection
}
};
Future future = executorService.submit(callable);
But the above method downloads the data only one URL at a time.
So I tried creating daemon threads (as shown below), and this method created multiple download connections (as expected).
for(int i=0; i<10; i++) {
Thread t = new Thread("loadtest " + i);
t.setDaemon(true);
t.start();
}
while(true) {
boolean flag = true;
Set<Thread> threads = Thread.getAllStackTraces().keySet();
for(Thread t : threads) {
if(t.isDaemon() && t.getName().startsWith("loadtest")) {
flag = false;
break;
}
}
if(flag)
break;
Thread.sleep(5000);
}
Can the same method be used for load testing on servers?
Any other suggestions on how load testing can be done would also be of great help.
Thanks in advance !
I will hazard a guess that your ExecutorService is not working because you are calling get() on the Future instances it returns inside your loop. This mistake will indeed cause your processing to be serialized as if you had only one thread, because another task isn't submitted until the first completes.
If you really need to use Callable, don't get() the result until you are ready to block for some indefinite time as the tasks complete—and they can't complete if they haven't been submitted yet. For downloading URLs, it would be better to use Runnable, where the main thread submits URLs and then forgets about the task; the task can independently complete processing of its URL.
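A minimal sketch of the difference, assuming the results are needed at all: every task is submitted first and get() is only called once they are all queued. The `download` method is a hypothetical stand-in for the HTTP call, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitThenGetSketch {
    /** Downloads all URLs concurrently and returns the results in submission order. */
    static List<String> fetchAll(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> download(url);
                futures.add(pool.submit(task));          // submit everything first, no get() here
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());               // block only after all tasks are queued
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for the HTTP connection code.
    static String download(String url) throws InterruptedException {
        Thread.sleep(50);
        return "body of " + url;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("u1", "u2", "u3"), 3));
    }
}
```

Calling future.get() inside the first loop instead would bring back the one-at-a-time behaviour described in the question.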
If you produce new tasks quickly, there's a chance you could queue up so many that you run out of memory. In that case, you can use a bounded queue and set up an appropriate rejection handler using ThreadPoolExecutor directly.
A daemon thread is a thread that does not prevent the JVM from exiting once all other threads have finished.
If you want the main thread to wait until those threads have finished, I suggest not using daemon threads, as they are not meant for that use case. You can use Thread#join to make the main thread wait.
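Thread[] threads = new Thread[10];
for(int i=0; i<threads.length; i++) {
threads[i] = new Thread("loadtest " + i); // non-daemon by default
threads[i].start();
}
// join only after every thread has been started; joining inside the first loop would run them one at a time
for(Thread t : threads) {
t.join(); // main or parent thread will wait until the child thread finishes
}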
I believe in your use case you should use a normal thread instead of a daemon thread.
Load testing is not only about "hammering" your server with requests. A well-behaved load test needs to represent real users using real browsers, with all the associated stuff like:
headers
cookies
cache
handling embedded resources (images, scripts, styles, fonts)
So I would recommend using a specialized load testing tool that is capable of representing real users as closely as possible and automatically takes care of the aforementioned points. Load testing tools also normally allow you to set rendezvous points and provide a lot of metrics and charts, so you will be able to see connect time, network latency, and throughput, correlate increasing load with increasing response time or number of errors, etc.
I get this error in the log file while the threads are running. I don't know where the error occurs, since the threads don't stop and process data with no issues; my only problem is that this error appears multiple times in the log file:
java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@419a9977
rejected from
java.util.concurrent.ScheduledThreadPoolExecutor@2522cdb9[Terminated,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
2123929]
I did some research and found that this happens when the executor is shut down somewhere, but as far as I can tell that does not happen in my code at all.
Without seeing the code, we can't tell you much more about the problem. The exception clearly states that the executor has been terminated and its active thread count is zero. It seems that you are still trying to submit work to the executor after shutting it down. Are you adding more tasks after calling executor.shutdown()?
As per the docs, new tasks submitted in method execute(Runnable) will be rejected when the Executor has been shut down, and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated. In either case, the execute method invokes the RejectedExecutionHandler.rejectedExecution(Runnable, ThreadPoolExecutor) method of its RejectedExecutionHandler.
Look at the doc here: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html
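The behaviour is easy to reproduce in isolation. The sketch below is not the asker's code; the class and method names are made up for illustration. It schedules one task, shuts the executor down, and then tries to schedule another.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdownSketch {
    /** Returns true if scheduling after shutdown() was rejected. */
    static boolean rejectedAfterShutdown() {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        executor.schedule(() -> { }, 10, TimeUnit.MILLISECONDS);  // accepted: executor is still running
        executor.shutdown();                                      // no new tasks are accepted from here on
        try {
            executor.schedule(() -> { }, 10, TimeUnit.MILLISECONDS);
            return false;
        } catch (RejectedExecutionException e) {
            return true;                                          // default handler (AbortPolicy) throws
        }
    }

    /** One possible guard: only schedule while the executor is still accepting work. */
    static boolean scheduleIfRunning(ScheduledExecutorService executor, Runnable task) {
        if (executor.isShutdown()) {
            return false;
        }
        executor.schedule(task, 10, TimeUnit.MILLISECONDS);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(rejectedAfterShutdown()); // prints true
    }
}
```

The isShutdown() guard is still racy if another thread can call shutdown() in between, so it hides the symptom rather than fixing the lifecycle. The cleaner fix is to shut the executor down only once, when nothing will schedule on it again.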
Old question, but I had the same issue and @lambad's comment saved my day. I had this piece of code:
ttlExecutorService.schedule(new Runnable() {
public void run() {
...
...
...
}
}, 1, TimeUnit.MINUTES);
ttlExecutorService.shutdown();
I removed the shutdown call and the exception was no longer thrown.
I have an application that makes HTTP requests to a site, then retrieves the responses, inspects them and, if they contain specific keywords, writes both the HTTP request and response to an XML file. The application uses a spider to map out all the URLs of a site and then sends the requests (each URL in the sitemap is fed to a separate thread that sends the request). This way I can't know when all the requests have been sent. Once all requests are done, I want to convert the XML file to some other format. So, to find out when the requests have ended, I use the following strategy:
I store the time of the latest request in a variable (when a new request is sent at a time later than the time in the variable, the variable is updated). I also start a thread to monitor this time, and if the difference between the current time and the time in the variable is more than 1 minute, I know that the sending of requests has ceased. I use the following code for this purpose:
class monitorReq implements Runnable{
Thread t;
monitorReq(){
t=new Thread(this);
t.start();
}
public void run(){
while((new Date().getTime()-last_request.getTime()<60000)){
try{
Thread.sleep(30000);//Sleep for 30 secs before checking again
}
catch(InterruptedException e){ // Thread.sleep throws InterruptedException, not IOException
e.printStackTrace();
}
}
System.out.println("Last request happened 1 min ago at : "+last_request.toString());
//call method for conversion of file
}
}
Is this approach correct? Or is there a better way in which I can implement the same thing.
Your current approach is not reliable. You will run into race conditions if one thread is updating the time while another thread is reading it. It will also be difficult to process requests in multiple threads. And you are assuming that every task finishes within 60 seconds.
The following are better approaches.
If you know the number of requests you are going to make beforehand, you can use a CountDownLatch:
main() {
int noOfRequests = ..;
final CountDownLatch doneSignal = new CountDownLatch(noOfRequests);
// spawn threads or use an executor service to perform the downloads
for(int i = 0;i<noOfRequests;i++) {
new Thread(new Runnable() {
public void run() {
// perform the download
doneSignal.countDown();
}
}).start();
}
doneSignal.await(); // This will block till all threads are done.
}
If you don't know the number of requests beforehand, you can use an ExecutorService to perform the downloads / processing using a thread pool:
main() {
ExecutorService executor = Executors.newCachedThreadPool();
while(moreRequests) {
executor.execute(new Runnable() {
public void run() {
// perform processing
}
});
}
// finished submitting all requests for processing. Wait for completion
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
}
General notes:
Class names in Java should start with a capital letter
there seems to be no synchronization between your threads; access to last_request should probably be synchronized
Using System.currentTimeMillis() would save you some objects' creation overhead
swallowing an exception like this is not a good practice
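A minimal sketch of how the synchronization and System.currentTimeMillis() notes could be combined, assuming the shared last_request field is replaced by an AtomicLong. The class and method names here are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LastRequestClock {
    // Milliseconds since the epoch of the most recent request; safe to share between threads.
    private final AtomicLong lastRequestMillis = new AtomicLong(System.currentTimeMillis());

    /** Called by every request thread; keeps the latest timestamp even under contention. */
    public void markRequest() {
        lastRequestMillis.accumulateAndGet(System.currentTimeMillis(), Math::max);
    }

    /** True if no request has been marked for at least the given number of milliseconds. */
    public boolean idleFor(long millis) {
        return System.currentTimeMillis() - lastRequestMillis.get() >= millis;
    }

    public static void main(String[] args) {
        LastRequestClock clock = new LastRequestClock();
        clock.markRequest();
        System.out.println(clock.idleFor(60_000)); // prints false: a request was just marked
    }
}
```

The monitor thread would then loop on !clock.idleFor(60000) instead of comparing Date objects, with no explicit locking needed.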
Answer:
Your way of doing it is acceptable. There is not much busy waiting and the idea is as simple as it gets. Which is good.
I would consider changing the wait time to a lower value; there is so little data that even running this loop every second will not take much processing power, and it will certainly improve the reaction time of your app.
I am using the Java TailerListener to monitor my log files. Whenever a log file is updated, it prints the log message. When monitoring one or two log files, it works fine. But when I try to monitor more files (say 10), no messages are displayed in the console even though the log files are being updated. My code is given below.
ScheduledThreadPoolExecutor logMonitorThreadPoolExec;
if (listOfFiles[i].isFile())
{
files = listOfFiles[i].getName();
File pcounter_log = new File(files);
Tailer logMessages = new Tailer(pcounter_log, new FileListener(files,element.getLogPattern()),
5000, true);
logMonitorThreadPoolExec.scheduleWithFixedDelay(logMessages, 5, 20,
TimeUnit.SECONDS);
}
public class FileListener extends TailerListenerAdapter {
private final String fileName;
public FileListener(String fileName, ArrayList<String> pattern) {
this.fileName = fileName;
}
public void handle(String line) {
System.out.println(fileName+"<---->"+line);
}
}
Can you please help me handle this?
I think that the problem is that you are using the Tailer the wrong way.
You are trying to run the Tailer on the thread pool of an executor service. But a Tailer has the property that it won't exit from its run() method until something externally calls Tailer.stop(). And in your code, that's not going to happen.
Worse still, you are using a ScheduledThreadPoolExecutor, and telling it to start a new Tailer thread every 20 seconds!
So what will happen is that the first N Tailer runs scheduled will each grab one of the executor service's threads ... and hang onto it forever. When all threads are in use, the Executor will wait for one of the threads to finish ... and that won't happen.
The solution is to run each Tailer instance in its own dedicated Thread. You shouldn't try to use a Thread from a finite thread pool because you'll exhaust the pool. And you shouldn't try to use an executor service, for basically the same reason.
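As a rough sketch of the dedicated-thread approach: the LoopingTask class below is a hypothetical stand-in for a Tailer (its run() loops until stop() is called), used so that the example is self-contained and does not depend on Commons IO. In the real code you would pass each Tailer instance to new Thread(...) in the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class DedicatedThreadsSketch {
    /** Hypothetical stand-in for a Tailer: run() loops until stop() is called. */
    static class LoopingTask implements Runnable {
        private volatile boolean running = true;
        private final CountDownLatch started;

        LoopingTask(CountDownLatch started) { this.started = started; }

        @Override
        public void run() {
            started.countDown();                       // signal that this task got a thread
            while (running) {
                try {
                    Thread.sleep(10);                  // placeholder for "check the file for new lines"
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        void stop() { running = false; }
    }

    /** Starts one dedicated thread per file and returns how many tasks actually started running. */
    static int startAndStop(int files) {
        CountDownLatch started = new CountDownLatch(files);
        List<LoopingTask> tasks = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < files; i++) {
            LoopingTask task = new LoopingTask(started);
            Thread thread = new Thread(task, "tailer-" + i);
            thread.setDaemon(true);                    // do not keep the JVM alive just for tailing
            thread.start();
            tasks.add(task);
            threads.add(thread);
        }
        int startedCount = 0;
        try {
            started.await(5, TimeUnit.SECONDS);
            startedCount = files - (int) started.getCount();
            for (LoopingTask task : tasks) task.stop();
            for (Thread thread : threads) thread.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return startedCount;
    }

    public static void main(String[] args) {
        System.out.println(startAndStop(10) + " of 10 tasks are running");
    }
}
```

Because every task has its own thread, none of them waits for another to finish. On a fixed pool smaller than the number of files, the extra tasks would never start.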
If using dedicated threads doesn't work, I'm out of ideas. You'll need to take a look at the Tailer code yourself and/or run your application under a debugger so that you can see what the Tailer threads are actually doing.