I have hundreds of files to process. I do each file one at a time and it takes 30 minutes.
I'm thinking I can do this processing in 10 simultaneous threads, 10 files at a time, and I might be able to do it in 3 minutes instead of 30.
My question is, what is the "correct" way to manage my 10 threads, so that when one is done a new one is created, up to a maximum of 10?
This is what I have so far ... is this the "correct" way to do it?
public class ThreadTest1 {
public static int idCounter = 0;
public class MyThread extends Thread {
private int id;
public MyThread() {
this.id = idCounter++;
}
public void run() {
// this run method represents the long-running file processing
System.out.println("I'm thread '"+this.id+"' and I'm going to sleep for 5 seconds!");
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("I'm thread '"+this.id+"' and I'm done sleeping!");
}
}
public void go() {
int MAX_NUM_THREADS = 10;
List<MyThread> threads = new ArrayList<MyThread>();
// this for loop represents the 200 files that need to be processed
for (int i=0; i<200; i++) {
// if we've reached the max num of threads ...
while (threads.size() == MAX_NUM_THREADS) {
// loop through the threads until we find a dead one and remove it
for (MyThread t : threads) {
if (!t.isAlive()) {
threads.remove(t);
break;
}
}
}
// add new thread
MyThread t = new MyThread();
threads.add(t);
t.start();
}
}
public static void main(String[] args) {
new ThreadTest1().go();
}
}
You can use an ExecutorService to manage your threads.
Alternatively, you can add a while loop to each thread's run method so that it processes files repeatedly instead of exiting after one.
You can also read about BlockingQueue. I think it fits well for distributing new files (tasks) among the threads.
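As a rough sketch of the ExecutorService suggestion (not the only way to do it): a fixed pool of 10 threads takes over the bookkeeping that the question does by hand. The class name FilePoolSketch, the processFile placeholder and the made-up file names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FilePoolSketch {
    static final int MAX_NUM_THREADS = 10;

    // Placeholder for the real, long-running file processing (hypothetical).
    static void processFile(String fileName, AtomicInteger done) {
        done.incrementAndGet();
    }

    // Submits one task per file and returns how many files were processed.
    static int processAll(List<String> fileNames) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_NUM_THREADS);
        AtomicInteger done = new AtomicInteger();
        for (String fileName : fileNames) {
            // At most MAX_NUM_THREADS tasks run at once; the rest wait in the pool's queue.
            pool.submit(() -> processFile(fileName, done));
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        try {
            // Block until every task has finished or the timeout expires.
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        List<String> fileNames = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            fileNames.add("file-" + i + ".txt"); // made-up names
        }
        System.out.println("Processed " + processAll(fileNames) + " files");
    }
}
```

The pool reuses its 10 threads, so there is no need to poll isAlive() or to create a new thread per file.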
I would suggest using Camel's File component if you are open to it. The component will handle all the issues with concurrency to ensure that multiple threads don't try to process the same file. The biggest challenge with making your code multi-threaded is making sure the threads don't interact. Let a framework take care of this for you.
Example:
from("file://incoming?maxMessagesPerPoll=1&idempotent=true&moveFailed=failed&move=processed&readLock=none")
.threads(10).process()
I am trying to learn multi-threads, and parallel execution in Java. I wrote example code like this:
public class MemoryManagement1 {
public static int counter1 = 0;
public static int counter2 = 0;
public static final Object lock1= new Object();
public static final Object lock2= new Object();
public static void increment1() {
synchronized(lock1) {
counter1 ++;
}
}
public static void increment2() {
synchronized(lock2) {
counter2 ++;
}
}
public static void processes() {
Thread thread1 = new Thread(new Runnable() {
@Override
public void run() {
for (int i = 0; i < 4; i++) {
increment1();
}
}
});
Thread thread2 = new Thread(new Runnable() {
@Override
public void run() {
for (int i = 0; i < 4; i++) {
increment2();
}
}
});
thread1.start();
thread2.start();
try {
thread1.join();
thread2.join();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Counter value is :" + counter1);
System.out.println("Counter value is :" + counter2);
}
public static void main(String[] args) {
processes();
}
}
The code is running properly, but how can I know that the code is running according to time-slicing or whether it is running with parallel execution. I have a CPU with 4 cores. As I understand it, the program should be run with parallel execution, but I am not sure.
The code is running properly, but how can I know that the code is
running according to time-slicing or whether it is running with
parallel execution.
A complete answer to this question would have to cover several factors, but I will be concise and focus mainly on the two most relevant points (IMO) to this question. For simplicity, let us assume that whenever possible each thread (created by the application) will be assigned to a different core.
First, it depends on the number of cores of the hardware that the application is being executed on, and how many threads (created by the application) are running simultaneously. For instance, if the hardware only has a single core or if the application creates more threads than the number of cores available, then some of those threads will inevitably not be executing truly in parallel (i.e., will be mapped to the same core).
Second, it depends on whether the threads synchronize with each other while doing their work. In your code, two threads are created, each synchronizing on a different object, and since your machine has 4 cores, in theory the two threads run in parallel with each other.
It gets more complex than that because you can have parts of your code that are executed in parallel, and other parts that are executed sequentially by the threads involved. For instance, if the increment1 and increment2 methods were synchronizing on the same object, then those methods would not be executed in parallel.
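For the first point, one quick check is to ask the JVM how many cores it can see and compare that with the number of threads you start. This is only a sketch: the class and method names are mine, and the operating system's scheduler still decides where each thread actually runs, so the result is an upper bound rather than a guarantee.

```java
class CoreCountSketch {
    // True when the JVM reports at least as many cores as threads,
    // i.e. every thread could in principle get a core of its own.
    static boolean canAllRunInParallel(int threadCount) {
        return threadCount <= Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores visible to the JVM: " + cores);
        System.out.println("2 threads can all run in parallel: " + canAllRunInParallel(2));
    }
}
```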
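Your program is indeed running in parallel. In this particular example, however, you don't need the locks: each counter is modified by only one thread, and join() guarantees that the main thread sees the final values, so the code would run perfectly well without them.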
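As a rough illustration of the point about locks, here is the question's example with the synchronized blocks removed. The class name NoLockCounters and the runOnce helper are mine. It is only safe because each counter is written by exactly one thread and read after join().

```java
class NoLockCounters {
    static int counter1 = 0;
    static int counter2 = 0;

    // Runs the two increment loops on separate threads and returns both totals.
    static int[] runOnce() {
        counter1 = 0;
        counter2 = 0;
        Thread thread1 = new Thread(() -> {
            for (int i = 0; i < 4; i++) {
                counter1++; // only thread1 ever touches counter1
            }
        });
        Thread thread2 = new Thread(() -> {
            for (int i = 0; i < 4; i++) {
                counter2++; // only thread2 ever touches counter2
            }
        });
        thread1.start();
        thread2.start();
        try {
            // join() makes the threads' writes visible to the caller.
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { counter1, counter2 };
    }

    public static void main(String[] args) {
        int[] totals = runOnce();
        System.out.println("Counter value is :" + totals[0]);
        System.out.println("Counter value is :" + totals[1]);
    }
}
```

If both threads incremented the same counter, the locks (or an AtomicInteger) would be needed again.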
So I'm writing code that will parse through multiple text files in a folder, gather information on them, and deposit that information in two static List variables. The order in which the information is deposited does not matter since I will end up sorting it anyway. But for some reason, increasing the number of threads does not impact the speed. Here's my run method and the portion of my main method that utilizes multithreading.
public void run() {
parseFiles();
}
public static void main(String[] args) {
while (filesLeft != 0) {
Thread t = new Thread(new fileParser());
t.start();
try {
t.join();
}
catch (InterruptedException e) {
System.out.println("error.");
}
}
If extra information is required, I basically have a static instance variable as an array of the files I need to go through, as well as a constant being the number of threads (which is manually changed for testing purposes). If I were to have, say, 4 threads and 8 files, each call to parseFiles goes through the next 2 files of the array, the indices being monitored by a static instance variable. If I had, say, 4 threads and 9 files, the first thread parses 3 files, the following parse 2, with a statement something along the lines of filesToParse = Math.ceil(filesLeft / threadsLeft), the latter two variables within the ceiling function being static as well.
Is there an error in my code, or should I simply test with larger text files containing more words to see a speedup from added threads? (Currently I have 5 text files, each with 20+ paragraphs, and I get around 60-70 ms.)
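I wrote a little piece of code that might be useful. Note that it starts all the threads before joining any of them; calling join() right after start(), as in the question, makes each thread finish before the next one starts.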
public static void main(String[] args) {
long startTime = System.nanoTime();
final List<Runnable> tasks = generateTasks(NUM_TASKS);
List<Thread> threadPool = new LinkedList<>();
for(int i = 0; i < NUM_THREADS; i++) {
Thread thread = new Thread(() -> {
Runnable task = null;
while ((task = getTask(tasks)) != null) {
task.run();
}
});
threadPool.add(thread);
thread.start();
}
for(Thread thread: threadPool) {
try {
thread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
long runTimeMs = (System.nanoTime() - startTime) / 1000000;
System.out.println(String.format("Ran %d tasks with %d threads in %d ms", NUM_TASKS, NUM_THREADS, runTimeMs));
}
private static Runnable getTask(List<Runnable> tasks) {
synchronized (tasks) {
return tasks.isEmpty() ? null : tasks.remove(0);
}
}
My program has an arraylist of websites which I do I/O with image processing, scrape data from sites and update/insert into database. Right now it is slow because all of the I/O being done. I would like to speed this up by allowing my program to run with threads. Nothing is ever removed from the list and every website in the list is separate from each other so to me it seems okay to have instances looping through the list at the same time to speed this up.
Let's say my list is 10 websites, right now of course it's looping through position 0 through 9 until my program is done processing for all websites.
And let's say I want to have 3 threads looping through this list of 10 websites at once doing all the I/O and database updates in their own separate space at the same time but using the same list.
website.get(0) // thread1
website.get(1) // thread2
website.get(2) // thread3
Then, say thread2 reaches the end of the loop first; it comes back and works on the next position
website.get(3) // thread2
Then thread3 completes and gets the next position
website.get(4) // thread3
and then thread1 finally completes and works on the next position
website.get(5) // thread1
etc until it's done. Is this easy to set up? Is there somewhere I can find a good example of it being done? I've looked online to try to find somewhere else talking about my scenario but I haven't found it.
In my app, I use ExecutorService like this, and it works well:
Main code:
ExecutorService pool = Executors.newFixedThreadPool(3); //number of concurrent threads
for (String name : website) { //Your ArrayList
pool.submit(new DownloadTask(name, toPath));
}
pool.shutdown();
pool.awaitTermination(5, TimeUnit.SECONDS); //Wait for all the threads to finish, adjust as needed.
The actual class where you do the work:
private static class DownloadTask implements Runnable {
private String name;
private final String toPath;
public DownloadTask(String name, String toPath) {
this.name = name;
this.toPath = toPath;
}
@Override
public void run() {
//Do your parsing / downloading / etc. here.
}
}
Some cautions:
If you are using a database, you have to ensure that you don't have two threads writing to that database at the same time.
See here for more info.
As mentioned in the other comments and answers, you just need a fixed-size thread pool executor (say 3 threads, as in your example) whose threads iterate over the same list without picking up duplicate websites.
So apart from the thread pool executor, you also need to work out the next index correctly in each thread, so that no thread picks up an element that another thread already took and no element is missed.
Hence I think you can use a BlockingQueue instead of the list, which eliminates the index calculation and guarantees that each element is handed to exactly one thread.
public class WebsitesHandler {
public static void main(String[] args) {
// Fill this queue with the websites before starting the workers; it is empty here.
BlockingQueue<Object> websites = new LinkedBlockingQueue<>();
ExecutorService executorService = Executors.newFixedThreadPool(3);
Worker[] workers = new Worker[3];
for (int i = 0; i < workers.length; i++) {
workers[i] = new Worker(websites);
}
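try {
executorService.invokeAll(Arrays.asList(workers));
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
// without shutdown() the pool's idle threads keep the JVM alive
executorService.shutdown();
}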
}
private static class Worker implements Callable<String> {
private BlockingQueue<Object> websites;
public Worker(BlockingQueue<Object> websites) {
this.websites = websites;
}
public String call() {
try {
Object website;
while ((website = websites.poll(1, TimeUnit.SECONDS)) != null) {
// execute the task
}
} catch (InterruptedException e) {
e.printStackTrace();
}
return "done";
}
}
}
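If you would rather keep the list, the index bookkeeping mentioned above can be sketched with a shared AtomicInteger: each worker claims the next index atomically, so no element is taken twice or skipped. The class name SharedIndexSketch and the visitAll helper are assumptions of mine, and the "work" here just records the element.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedIndexSketch {
    // Visits every element of the list exactly once using threadCount workers
    // and returns the visited elements (in no particular order).
    static List<String> visitAll(List<String> websites, int threadCount) {
        AtomicInteger next = new AtomicInteger(0);
        List<String> visited = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            pool.submit(() -> {
                int i;
                // getAndIncrement() hands each index to exactly one worker.
                while ((i = next.getAndIncrement()) < websites.size()) {
                    visited.add(websites.get(i)); // the real I/O would go here
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> websites = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            websites.add("site-" + i); // made-up entries
        }
        System.out.println("Visited " + visitAll(websites, 3).size() + " websites");
    }
}
```

Reading the list from several threads is fine here because, as in the question, nothing is added to or removed from it while the workers run.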
I think you should update to the latest version of Java, i.e. Java 8,
and study the Streams API. That may well solve your problem.
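A minimal sketch of what the Streams suggestion could look like, assuming the per-website work can be written as a function from one website to one result. The class name and the process stand-in are mine. Keep in mind that parallel streams run on the shared common ForkJoinPool, whose size follows the number of cores, so for heavy blocking I/O a dedicated executor may be the better fit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelStreamSketch {
    // Stand-in for the real scraping and database work on one website (hypothetical).
    static String process(String website) {
        return website.toUpperCase();
    }

    // Processes the websites in parallel; collect() keeps the list's original order.
    static List<String> processAll(List<String> websites) {
        return websites.parallelStream()
                .map(ParallelStreamSketch::process)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> websites = Arrays.asList("a.example", "b.example", "c.example");
        System.out.println(processAll(websites));
    }
}
```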
I want to start max 40 http requests each second and after 1 second, I want it to run another 40 from its own queue(like threadpooltaskexecutor's blocking queue). I am looking for an executor or thread pool implementation for this requirement.
Any recommendations?
Thx
Ali
EDIT: A fixed rate is not efficient, for obvious reasons. Because the queued items start one by one, the ones at the back of the queue will have only just started while ones that started earlier may already have finished.
Extra EDIT: The requirement is to start only 40 requests per second, not to have at most 40 active. There may be 80 active during a later second, but within any 1 second only 40 new connections should be created.
One way to do this is to use a different architecture, which makes the process that much easier.
1) Create a class that extends Thread.
2) It takes as a parameter the list of HTTP requests that you want to make.
3) Make the run() method loop over the entire list (size 40).
4) Wait one second before starting the next thread.
Here is a sample:
import java.util.ArrayList;

// Req is your own request type; it is not defined here.
class MyClass extends Thread {
    private final ArrayList<Req> theList;

    public MyClass(ArrayList<Req> theList) {
        this.theList = theList;
    }

    @Override
    public void run() {
        // Here, you simply want to loop over the entire list (max 40)
        for (Req r : theList) {
            r.sendRequest();
        }
    }

    public static void main(String[] args) {
        while (true) {
            // nextBatch() is hypothetical: insert your own code that returns your new list
            MyClass t = new MyClass(nextBatch());
            t.start();
            try {
                // wait one second before starting the next batch
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }
}
And there you have it
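A variation on the same idea, sketched with the standard executors instead of a hand-made thread per batch: a scheduler ticks once per second and hands at most 40 pending requests to a worker pool, regardless of how many earlier requests are still running. RateLimitedStarter, startBatch and the printed "request" stand-ins are my own names, not an existing API, and the 100 fake requests are there only to make the sketch runnable.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RateLimitedStarter {
    static final int MAX_PER_SECOND = 40;

    // Moves at most max pending requests to the worker pool; returns how many were started.
    static int startBatch(Queue<Runnable> pending, ExecutorService workers, int max) {
        int started = 0;
        Runnable request;
        while (started < max && (request = pending.poll()) != null) {
            workers.execute(request);
            started++;
        }
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 100; i++) {
            int id = i;
            pending.add(() -> System.out.println("request " + id)); // stand-in for an HTTP call
        }
        // Cached pool: no cap on active requests, only on how many start per tick.
        ExecutorService workers = Executors.newCachedThreadPool();
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        // Once per second, start the next batch of at most MAX_PER_SECOND requests.
        ticker.scheduleAtFixedRate(
                () -> startBatch(pending, workers, MAX_PER_SECOND), 0, 1, TimeUnit.SECONDS);
        Thread.sleep(3500); // enough ticks to drain the 100 fake requests
        ticker.shutdown();
        workers.shutdown();
    }
}
```

Because the workers come from a cached pool, 80 or more requests may be active at once, which matches the "Extra EDIT" in the question; only the number of new starts per second is capped.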
First, define a class that implements Callable, which will do the work of each task:
class MyClass implements Callable<String>
{
/**
* Consider this as a new Thread.
*/
@Override
public String call()
{
//Treatment...
return "OK"; //Return whatever the thread's result you want to be then you can access it and do the desired treatment.
}
}
The next step is to create an ExecutorService (in my example, a thread pool) and submit some tasks to it.
int nbThreadToStart = 40;
ExecutorService executor = Executors.newFixedThreadPool(/* Your thread pool limit */);
List<Future<String>> allTasks = new ArrayList<Future<String>>(/* Optional initial capacity of the list; it does not limit the thread pool */);
for(int batch = 0; batch < 10; batch++)//Number of iterations you want
{
for(int i = 0; i < nbThreadToStart; i++)
{
try
{
allTasks.add(executor.submit(new MyClass()));
}
catch(Exception e)
{
e.printStackTrace();
}
}
try
{
Thread.sleep(1000);
}
catch(Exception e)
{
e.printStackTrace();
}
}
executor.shutdown();
//You can then access all your thread(Tasks) and see if they terminated and even add a timeout :
try
{
for(Future<String> task : allTasks)
task.get(60, TimeUnit.SECONDS);//Timeout of 60 seconds. The get will return what you returned from the call method.
}
catch (TimeoutException te)
{
...
}
catch(InterruptedException ie)
{
...
}
catch(ExecutionException ee)
{
...
}
I'm not sure what you really want, but I think you should handle multi-threading with a thread pool, especially if you're planning on receiving a lot of requests, to avoid undesired memory leaks and similar problems.
If my example is not clear enough, note that ExecutorService, Future, etc. offer many other methods that are very useful when dealing with threads.
Check this out :
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
That's it for my recommendations.
I'm having a bit of trouble with threads in Java. Basically I'm creating an array of threads and starting them. The point of the program is to simulate a race, total the time for each competitor (i.e. each thread) and pick the winner.
The competitor moves one space, waits ( i.e. thread sleeps for a random period of time between 5 and 6 seconds ) and then continues. The threads don't complete in the order that they started as expected.
Now for the problem. I can get the total time it takes for a thread to complete; what I want is to store all the times from the threads into a single array and be able to calculate the fastest time.
To do this should I place the array in the main.class file? Would I be right in assuming so because if it was placed in the Thread class it wouldn't work. Or should I create a third class?
I'm a little confused :/
It's fine to declare it in the method where you start the threads, with a few notes:
Each thread should know its index in the array. Perhaps you should pass this in the constructor.
Then you have three options for filling the array:
1) the array should be final, so that it can be used within anonymous classes
2) the array can be passed to each thread
3) the threads should notify a listener when they're done, which in turn will update the array
Consider using the Java 1.5 Executors framework for submitting Runnables, rather than working directly with threads.
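The Executors suggestion could be sketched like this, with one Callable per competitor returning its own time, so no shared array has to be written from inside the threads. RaceSketch, runRace and indexOfFastest are names I made up, and the random sleep is a shortened stand-in for the real race logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RaceSketch {
    // Runs every competitor on its own pool thread and returns the times,
    // indexed in the same order as the competitors list.
    static long[] runRace(List<Callable<Long>> competitors) {
        ExecutorService pool = Executors.newFixedThreadPool(competitors.size());
        long[] times = new long[competitors.size()];
        try {
            // invokeAll() returns only after every competitor has finished.
            List<Future<Long>> futures = pool.invokeAll(competitors);
            for (int i = 0; i < times.length; i++) {
                times[i] = futures.get(i).get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException("a competitor failed", e);
        } finally {
            pool.shutdown();
        }
        return times;
    }

    // Index of the smallest time, i.e. the winner.
    static int indexOfFastest(long[] times) {
        int best = 0;
        for (int i = 1; i < times.length; i++) {
            if (times[i] < times[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Callable<Long>> competitors = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            competitors.add(() -> {
                long start = System.nanoTime();
                Thread.sleep((long) (Math.random() * 100)); // stand-in for the race
                return (System.nanoTime() - start) / 1_000_000;
            });
        }
        long[] times = runRace(competitors);
        System.out.println("Winner: competitor " + indexOfFastest(times));
    }
}
```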
EDIT: The solution below assumes you need the times only after all competitors have finished the race.
You can use a structure that looks like the one below (inside your main class). Typically you will want to add a lot of your own stuff; this is the main outline.
Note that concurrency is not an issue at all here because you get the value from the MyRunnable instance once its thread has finished running.
Note that using a separate thread for each competitor is probably not really necessary with a modified approach, but that would be a different issue.
public static void main(String[] args) {
MyRunnable[] runnables = new MyRunnable[NUM_THREADS];
Thread[] threads = new Thread[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++) {
runnables[i] = new MyRunnable();
threads[i] = new Thread(runnables[i]);
}
// start threads
for (Thread thread : threads) {
thread.start();
}
// wait for threads
for (Thread thread : threads) {
try {
thread.join();
} catch (InterruptedException e) {
// ignored
}
}
// get the times you calculated for each thread
for (int i = 0; i < NUM_THREADS; i++) {
int timeSpent = runnables[i].getTimeSpent();
// do something with the time spent
}
}
static class MyRunnable implements Runnable {
private int timeSpent;
public MyRunnable(...) {
// initialize
}
public void run() {
// whatever the thread should do
// finally set the time
timeSpent = ...;
}
public int getTimeSpent() {
return timeSpent;
}
}