My program has an arraylist of websites which I do I/O with image processing, scrape data from sites and update/insert into database. Right now it is slow because all of the I/O being done. I would like to speed this up by allowing my program to run with threads. Nothing is ever removed from the list and every website in the list is separate from each other so to me it seems okay to have instances looping through the list at the same time to speed this up.
Let's say my list is 10 websites, right now of course it's looping through position 0 through 9 until my program is done processing for all websites.
And let's say I want to have 3 threads looping through this list of 10 websites at once doing all the I/O and database updates in their own separate space at the same time but using the same list.
website.get(0) // thread1
website.get(1) // thread2
website.get(2) // thread3
Then say if thread2 reaches the end of the loop it first it comes back and works on the next position
website.get(3) // thread2
Then thread3 completes and gets the next position
website.get(4) // thread3
and then thread1 finally completes and works on the next position
website.get(5) // thread1
etc until it's done. Is this easy to set up? Is there somewhere I can find a good example of it being done? I've looked online to try to find somewhere else talking about my scenario but I haven't found it.
In my app, I use ExecutorService like this, and it works well:
Main code:
ExecutorService pool = Executors.newFixedThreadPool(3); //number of concurrent threads
for (String name : website) { //Your ArrayList
pool.submit(new DownloadTask(name, toPath));
}
pool.shutdown();
pool.awaitTermination(5, TimeUnit.SECONDS); //Wait for all the threads to finish, adjust as needed.
The actual class where you do the work:
private static class DownloadTask implements Runnable {
private String name;
private final String toPath;
public DownloadTask(String name, String toPath) {
this.name = name;
this.toPath = toPath;
}
#Override
public void run() {
//Do your parsing / downloading / etc. here.
}
}
Some cautions:
If you are using a database, you have to ensure that you don't have two threads writing to that database at the same time.
See here for more info.
As mentioned in other comments/answer you just need a thread pool executor with fixed size (say 3 as per your example) which runs 3 threads which iterate over the same list without picking up duplicate websites.
So apart from thread pool executor, you probably need to also need to correctly work out the next index in each thread to pick the element from that list in such a way that thread does not pick up same element from list and also not miss any element.
Hence i think you can use BlockingQueue instead of list which eliminates the index calculation part and guarantees that the element is correctly picked from the collection.
public class WebsitesHandler {
public static void main(String[] args) {
BlockingQueue<Object> websites = new LinkedBlockingQueue<>();
ExecutorService executorService = Executors.newFixedThreadPool(3);
Worker[] workers = new Worker[3];
for (int i = 0; i < workers.length; i++) {
workers[i] = new Worker(websites);
}
try {
executorService.invokeAll(Arrays.asList(workers));
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static class Worker implements Callable {
private BlockingQueue<Object> websites;
public Worker(BlockingQueue<Object> websites) {
this.websites = websites;
}
public String call() {
try {
Object website;
while ((website = websites.poll(1, TimeUnit.SECONDS)) != null) {
// execute the task
}
} catch (InterruptedException e) {
e.printStackTrace();
}
return "done";
}
}
}
I think you need to update yourself with latest version of java i.e Java8
And study about Streams API,That will definitely solve your problem
Related
I have a static Array, arr, whose elements are getting squarred and stored back again using the 'squarring' method. I want two threads to simultaneously modify the array. Each thread works on half of the array.
public class SimplerMultiExample {
private static int[] arr = new int[10];
public static void squarring(int start, int end)
{
for(int i=start; i<end; i++)
{
arr[i]*=arr[i];
System.out.println("here "+Thread.currentThread().getName());
}
}
private static Runnable doubleRun = new Runnable() {
#Override
public void run() {
if(Integer.parseInt(Thread.currentThread().getName())==1)
squarring(0,arr.length/2); //Thread named "1" is operaing on
//the 1st half of the array.
else
squarring(arr.length/2,arr.length);
}
};
public static void main(String[] args){
Thread doubleOne = new Thread(doubleRun);
doubleOne.setName("1");
Thread doubleTwo = new Thread(doubleRun);
doubleTwo.setName("2");
doubleOne.start();
doubleTwo.start();
}
}
The sysout in the 'squarring' method tells me that the threads are going into the method serially, that is, one of threads finishes before the other one accesses it. As a result, one of the threads finishes early whereas the other ones takes considerably longer to complete. I have tested the code with 1 million elements. Please advice on what I can do to ensure that the threads operate in parallel.
P.S - I am using a dual core system.
You don't have to program this from scratch:
Arrays.parallelSetAll(arr, x -> x * x);
parallelSetAll creates a parallel stream which does all the work for you:
Set all elements of the specified array, in parallel, using the provided generator function to compute each element.
If you'd like to know how to control the number of threads used for parallel processing, checkout this question.
I recommend you add the following code to the end of your main method to ensure they each finish their work:
try {
doubleOne.join(); //Waits for this thread to die.
doubleTwo.join(); //Waits for this thread to die.
} catch (InterruptedException e) {
e.printStackTrace();
}
You are creating two threads that could be executed in parallel provided that the scheduler chooses to interleave them. However, the scheduler is not guaranteed to interleave the execution. Your code works in parallel on my system (...sometimes.. but other times, the threads execute in series).
Since you are not waiting for the threads to complete (using the join method), it is less likely that you would observe the interleaving (since you only observe part of the program's execution).
I have hundreds of files to process. I do each file one at a time and it takes 30 minutes.
I'm thinking I can do this processing in 10 simultaneous threads, 10 files at a time, and I might be able to do it in 3 minutes instead of 30.
My question is, what is the "correct" way to manage my 10 threads? And when one is done, create a new one to a max number of 10.
This is what I have so far ... is this the "correct" way to do it?
public class ThreadTest1 {
public static int idCounter = 0;
public class MyThread extends Thread {
private int id;
public MyThread() {
this.id = idCounter++;
}
public void run() {
// this run method represents the long-running file processing
System.out.println("I'm thread '"+this.id+"' and I'm going to sleep for 5 seconds!");
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("I'm thread '"+this.id+"' and I'm done sleeping!");
}
}
public void go() {
int MAX_NUM_THREADS = 10;
List<MyThread> threads = new ArrayList<MyThread>();
// this for loop represents the 200 files that need to be processed
for (int i=0; i<200; i++) {
// if we've reached the max num of threads ...
while (threads.size() == MAX_NUM_THREADS) {
// loop through the threads until we find a dead one and remove it
for (MyThread t : threads) {
if (!t.isAlive()) {
threads.remove(t);
break;
}
}
}
// add new thread
MyThread t = new MyThread();
threads.add(t);
t.start();
}
}
public static void main(String[] args) {
new ThreadTest1().go();
}
}
You can use ExecutorService to manage you threads.
And you can add while loop to thread run method to execute file processing task repeatedly.
Also you can read about BlockingQueue usage. I think it will fit perfectly to allocate new files (tasks) between threads.
I would suggest using Camel's File component if you are open to it. The component will handle all the issues with concurrency to ensure that multiple threads don't try to process the same file. The biggest challenge with making your code multi-threaded is making sure the threads don't interact. Let a framework take care of this for you.
Example:
from("file://incoming?maxMessagesPerPoll=1&idempotent=true&moveFailed=failed&move=processed&readLock=none")
.threads(10).process()
I want to start max 40 http requests each second and after 1 second, I want it to run another 40 from its own queue(like threadpooltaskexecutor's blocking queue). I am looking for an executor or thread pool implementation for this requirement.
Any recommendations?
Thx
Ali
EDIT: Fix rate is not efficient for the obvious reasons. As the queue items start one by one, the ones on the back of the queue will be just started but ones that has been started for a while may be finished.
Extra EDIT: The problem is to call only 40 request in a second, not have max 40 active. It can be 80 at other second but in 1 second there should only 40 newly created connections.
One way to do this is to use another architecture, it will make the process that much easiser.
1) Create a Thread class that implements the runnable.
2) It takes as parameters a list<>() of http requests that you want to make
3) Make the run() function loop the entire list (size 40)
4) Let the thread live for one second.
Here is a sample example:
class MyClass extends Thread
private ArrayList<...> theList;
public MyClass(ArrayList<..> theList){
this.theList = theList;
}
public void run(){
//Here, you simply want to loop for the entier list (max 40)
for(Req r: theList){
r.sendRequest()
)
}
public statc void main(String args[]){
//Create an instance of your thread:
MyClass t = new MyClass(ReqList<..>());
//Now that you have your thread, simply do the following:
while(true){
t = new MyClass( (insert your new list));
t.start();
try{
Thread.sleep(1000);
}catch(Exception e){
}
)
}
}
And there you have it
First define a class that implements Callable which will do your thread's treatment :
class MyClass implements Callable<String>
{
/**
* Consider this as a new Thread.
*/
#Override
public String call()
{
//Treatment...
return "OK"; //Return whatever the thread's result you want to be then you can access it and do the desired treatment.
}
}
Next step is to create an ExecutorService in my example, a Thread pool and throw in some tasks.
int nbThreadToStart = 40;
ExecutorService executor = Executors.newFixedThreadPool(/* Your thread pool limit */);
List<Future<String>> allTasks = new ArrayList<Future<String>>(/* Specify a number here if you want to limit your thread pool */);
for(int i = 0; i < 10; i++)//Number of iteration you want
{
for(int i = 0; i < nbThreadToStart; i++)
{
try
{
allTasks.add(executor.submit(new MyClass()));
}
catch(Exception e)
{
e.printStackTrace();
}
}
try
{
Thread.sleep(1000);
}
catch(Exception e)
{
e.printStackTrace();
}
}
executor.shutdown();
//You can then access all your thread(Tasks) and see if they terminated and even add a timeout :
try
{
for(Future<String> task : allTasks)
task.get(60, TimeUnit.SECONDS);//Timeout of 1 seconds. The get will return what you specified in the call method.
}
catch (TimeOutException te)
{
...
}
catch(InterruptedException ie)
{
...
}
catch(ExecutionException ee)
{
...
}
I'm not sure what you really want, but I think you should handle multi-threading with a thread pool specially if you're planning on receiving a lot of requests to avoid any undesired memory leak etc.
If my example is not clear enough, note that there is many other methods offered by ExexutorService,Future etc. that are very usefull when dealing with Thread.
Check this out :
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
That's it for my recommandations.
I'm having a-bit of trouble with threads in java. Basically Im creating an array of threads and starting them. the point of the program is to simulate a race, total the time for each competitor ( i.e. each thread ) and pick the winner.
The competitor moves one space, waits ( i.e. thread sleeps for a random period of time between 5 and 6 seconds ) and then continues. The threads don't complete in the order that they started as expected.
Now for the problem. I can get the total time it takes for a thread to complete; what I want is to store all the times from the threads into a single array and be able to calculate the fastest time.
To do this should I place the array in the main.class file? Would I be right in assuming so because if it was placed in the Thread class it wouldn't work. Or should I create a third class?
I'm alittle confused :/
It's fine to declare it in the method where you invoke the threads, with a few notes:
each thread should know its index in the array. Perhaps you should pass this in constructor
then you have three options for filling the array
the array should be final, so that it can be used within anonymous classes
the array can be passed to each thread
the threads should notify a listener when they're done, which in turn will increment an array.
consider using Java 1.5 Executors framework for submitting Runnables, rather than working directly with threads.
EDIT: The solution below assumes you need the times only after all competitors have finished the race.
You can use a structure that looks like below, (inside your main class). Typically you want to add a lot of you own stuff; this is the main outline.
Note that concurrency is not an issue at all here because you get the value from the MyRunnable instance once its thread has finished running.
Note that using a separate thread for each competitor is probably not really necessary with a modified approach, but that would be a different issue.
public static void main(String[] args) {
MyRunnable[] runnables = new MyRunnable[NUM_THREADS];
Thread[] threads = new Thread[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++) {
runnables[i] = new MyRunnable();
threads[i] = new Thread(runnables[i]);
}
// start threads
for (Thread thread : threads) {
thread.start();
}
// wait for threads
for (Thread thread : threads) {
try {
thread.join();
} catch (InterruptedException e) {
// ignored
}
}
// get the times you calculated for each thread
for (int i = 0; i < NUM_THREADS; i++) {
int timeSpent = runnables[i].getTimeSpent();
// do something with the time spent
}
}
static class MyRunnable implements Runnable {
private int timeSpent;
public MyRunnable(...) {
// initialize
}
public void run() {
// whatever the thread should do
// finally set the time
timeSpent = ...;
}
public int getTimeSpent() {
return timeSpent;
}
}
I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Station> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { //1000 of the same type of pages that need to parse
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itenList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
#Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
#Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
#Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing in the final list "List itemList" contains 980 elements onto, instead of 1000. But in the console there is all of 1000 elements (items). That is, some threads for some reason did not call in the handleText method the addItem method.
I already tried to change the type of itemList to ArrayList, CopyOnWriteArrayList, Vector. Makes the method addItem synchronized, changed its call on the synchronized block. All this only changes the number of elements a little, but the final thousand can not be obtained.
I also tried to parse a smaller number of pages (ten). As the result the list is empty, but in the console all 10.
If I remove multi-threading, then everything works fine, but, of course, slowly. That's not good.
If decrease the number of concurrent threads, the number of items in the list is close to the desired 1000, if increase - a little distanced from 1000. That is, I think, there is a struggle for the ability to record to the list. But then why are synchronization not working?
What's the problem?
After your parse() call returns, all of your 1000 Threads have been started, but it is not guaranteed that they are finished. In fact, they aren't that's the problem you see. I would heavily recommend not write this by yourself but use the tools provided for this kind of job by the SDK.
The documentation Thread Pools and the ThreadPoolExecutor are e.g. a good starting point. Again, don't implement this yourself if you are not absolutely sure you have too, because writing such multi-threading code is pure pain.
Your code should look something like this:
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
f.get();
}
There is no problem with the code, it is working as you have coded. the problem is with the last iteration. rest all iterations will work properly, but during the last iteration which is from 980 to 1000, the threads are created, but the main process, does not waits for the other thread to complete, and then return the list. therefore you will be getting some odd number between 980 to 1000, if you are working with 20 threads at a time.
Now you can try adding Thread.wait(50), before returning the list, in that case your main thread will wait, some time, and may be by the time, other threads might finish the processing.
or you can use some syncronization API from java. Instead of Thread.wait(), use CountDownLatch, this will help you to wait for the threads to complete the processing, and then you can create new threads.