How can multithreading help increase performance in this situation? - java

I have a piece of code like this:
while(){
x = jdbc_readOperation();
y = getTokens(x);
jdbc_insertOperation(y);
}
public List<String> getTokens(String divText){
List<String> tokenList = new ArrayList<String>();
Matcher subMatcher = Pattern.compile("\\[[^\\]]*]").matcher(divText);
while (subMatcher.find()) {
String token = subMatcher.group();
tokenList.add(token);
}
return tokenList;
}
What I know is that multithreading can save time when a thread is blocked on I/O or the network. In this synchronous loop, every step has to wait for the previous step to finish. What I want is to maximize CPU utilization in getTokens().
My first thought was to put getTokens() in the run method of a class and create multiple threads. But I suspect that will not work, since multiple threads do not seem to bring any performance benefit for purely computational work.
Would multithreading help increase performance in this case? If so, how can I do that?

It depends on how fast jdbc_readOperation() produces data compared with how fast getTokens(x) processes it. Knowing that will help you figure out whether multithreading is going to help you.
You could try something like this (just for you to get the idea):
int workToBeDoneQueueSize = 1000;
int workDoneQueueSize = 1000;
BlockingQueue<String> workToBeDone = new LinkedBlockingQueue<>(workToBeDoneQueueSize);
// getTokens() returns a List<String>, so the results queue must hold lists
BlockingQueue<List<String>> workDone = new LinkedBlockingQueue<>(workDoneQueueSize);
new Thread(() -> {
try {
while (true) {
workToBeDone.put(jdbc_readOperation());
}
} catch (InterruptedException e) {
e.printStackTrace();
// handle InterruptedException here
}
}).start();
int numOfWorkerThreads = 5; // just an example
for (int i = 0; i < numOfWorkerThreads; i++) {
new Thread(() -> {
try {
while (true) {
workDone.put(getTokens(workToBeDone.take()));
}
} catch (InterruptedException e) {
e.printStackTrace();
// handle InterruptedException here
}
}).start();
}
new Thread(() -> {
// you could improve this by making a batch operation
try {
while (true) {
jdbc_insertOperation(workDone.take());
}
} catch (InterruptedException e) {
e.printStackTrace();
// handle InterruptedException here
}
}).start();
Or you could learn how to use the ThreadPoolExecutor. (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html)
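As a rough illustration of the executor route, here is a minimal sketch that tokenizes a batch of rows on a fixed-size pool. The class name TokenPool and the helper tokenizeAll are made up for this example, and it assumes the rows have already been read from the database; it is not a drop-in replacement for the loop in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPool {
    // Compiled once; a Pattern is immutable and safe to share between threads.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*]");

    static List<String> getTokens(String divText) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(divText);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Hypothetical helper: tokenizes every row on a fixed-size pool and
    // returns the results in the same order as the input rows.
    static List<List<String>> tokenizeAll(List<String> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String row : rows) {
                futures.add(pool.submit(() -> getTokens(row)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get()); // blocks until that row has been tokenized
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("tokenizing failed", e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```

Whether this is faster than the single-threaded loop depends on how expensive getTokens() is relative to the JDBC calls, so it is worth measuring before committing to it.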

To speed up getTokens(), you can split the input String divText with the String.substring() method. Split it into as many substrings as there are threads running getTokens(); each thread then works on its own substring of divText. Take care that a cut never falls inside a bracketed token, or that token will be missed.
Avoid creating more threads than the CPU can run at once, since context switches add overhead.
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
An alternative is to split the input String of getTokens with the String.split method http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29 e.g. if the text is made up of words separated by spaces or other symbols. Specific parts of the resulting String array can then be passed to different threads.
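A minimal sketch of the substring idea, under the assumption that tokens look like the ones matched by the pattern in the question. Every cut is placed right after a ']', so no token can straddle two chunks. The names ChunkedTokens, splitAfterClosingBracket and getTokensParallel are invented for this example, and a parallel stream is used here instead of hand-made threads to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ChunkedTokens {
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*]");

    static List<String> getTokens(String divText) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(divText);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Cuts the text into about `parts` chunks. Each cut is made right after
    // a ']' so that a bracketed token is never split across two chunks.
    static List<String> splitAfterClosingBracket(String text, int parts) {
        List<String> chunks = new ArrayList<>();
        int target = Math.max(1, text.length() / Math.max(1, parts));
        int start = 0;
        while (start < text.length()) {
            int from = Math.min(text.length() - 1, start + target - 1);
            int cut = text.indexOf(']', from);
            int end = (cut < 0) ? text.length() : cut + 1;
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }

    // Tokenizes the chunks in parallel; the ordered stream keeps the tokens
    // in the same order as a sequential scan would produce.
    static List<String> getTokensParallel(String text, int parts) {
        return splitAfterClosingBracket(text, parts)
                .parallelStream()
                .flatMap(chunk -> getTokens(chunk).stream())
                .collect(Collectors.toList());
    }
}
```

For short strings the splitting and thread hand-off will likely cost more than the regex scan itself, so this only has a chance of paying off for very large inputs.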

Related

How to multithread with threads generated in a loop?

So I'm writing code that will parse multiple text files in a folder, gather information on them, and deposit that information in two static List variables. The order in which the information is deposited does not matter, since I will end up sorting it anyway. But for some reason, increasing the number of threads does not affect the speed. Here's my run method and the portion of my main method that uses multithreading.
public void run() {
parseFiles();
}
public static void main(String[] args) {
while (filesLeft != 0) {
Thread t = new Thread(new fileParser());
t.start();
try {
t.join();
}
catch (InterruptedException e) {
System.out.println("error.");
}
}
If extra information is required, I basically have a static instance variable as an array of the files I need to go through, as well as a constant being the number of threads (which is manually changed for testing purposes). If I were to have, say, 4 threads and 8 files, each call to parseFiles goes through the next 2 files of the array, the indices being monitored by a static instance variable. If I had, say, 4 threads and 9 files, the first thread parses 3 files, the following parse 2, with a statement something along the lines of filesToParse = Math.ceil(filesLeft / threadsLeft), the latter two variables within the ceiling function being static as well.
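Is there an error in my code, or should I simply test with larger text files to see the run time drop as threads are added? (Currently I have 5 text files, each with 20+ paragraphs, and I get around 60-70 ms.)
Calling t.join() right after t.start() makes main wait for each thread before starting the next one, so the files are parsed one after another no matter how many threads you create. I wrote a little piece of code that might be useful; it starts all the threads first and only joins them afterwards: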
public static void main(String[] args) {
long startTime = System.nanoTime();
final List<Runnable> tasks = generateTasks(NUM_TASKS);
List<Thread> threadPool = new LinkedList<>();
for(int i = 0; i < NUM_THREADS; i++) {
Thread thread = new Thread(() -> {
Runnable task = null;
while ((task = getTask(tasks)) != null) {
task.run();
}
});
threadPool.add(thread);
thread.start();
}
for(Thread thread: threadPool) {
try {
thread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
long runTimeMs = (System.nanoTime() - startTime) / 1000000;
System.out.println(String.format("Ran %d tasks with %d threads in %d ms", NUM_TASKS, NUM_THREADS, runTimeMs));
}
private static Runnable getTask(List<Runnable> tasks) {
synchronized (tasks) {
return tasks.isEmpty() ? null : tasks.remove(0);
}
}

Stream generate at fixed rate

I'm using Stream.generate to get data from Instagram. As Instagram limits calls per hour, I want generate to run no more often than once every 2 seconds.
I chose this title because I moved from ScheduledExecutorService.scheduleAtFixedRate and that is what I was searching for. I do realise that stream intermediate operations are lazy and cannot be called on a schedule. If you have a better idea for the title, let me know.
So again, I want at least a 2-second delay between generations.
My attempt, which does not take into account the time consumed by the operations after generate (which might take longer than 2 s):
Stream.generate(() -> {
List<MediaFeedData> feedDataList = null;
while (feedDataList == null) {
try {
Thread.sleep(2000);
feedDataList = newData();
} catch (InstagramException e) {
notifyError(e.getMessage());
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
return feedDataList;
})
A solution would be to decouple the generator from the Stream, for example by using a BlockingQueue:
final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(100);
ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);
scheduler.scheduleAtFixedRate(() -> {
// Generate new data every 2s, regardless of their processing rate
ThreadLocalRandom random = ThreadLocalRandom.current();
queue.offer(random.nextInt(10));
}, 0, 2, TimeUnit.SECONDS);
Stream.generate(() -> {
try {
// Accept new data if ready, or wait for some more to be generated
return queue.take();
} catch (InterruptedException e) {}
return -1;
}).forEach(System.out::println);
If the data processing takes more than 2s, new data will be enqueued and wait to be consumed. If it takes less than 2s, the take method in the generator will wait for new data to be produced by the scheduler.
This way, you are guaranteed to make less than N calls per hour to Instagram !
As far as I understand, your question is about solving two problems:
1) waiting at a fixed rate rather than with a fixed delay
2) creating a stream for an unknown number of items that allows processing until some point in time (i.e. is not infinite)
You can solve the first with deadline-based waiting and the second by implementing a Spliterator:
Stream<List<MediaFeedData>> stream = StreamSupport.stream(
new Spliterators.AbstractSpliterator<List<MediaFeedData>>(Long.MAX_VALUE, 0) {
long lastTime=System.currentTimeMillis();
@Override
public boolean tryAdvance(Consumer<? super List<MediaFeedData>> action) {
if(quitCondition()) return false;
List<MediaFeedData> feedDataList = null;
while (feedDataList == null) {
lastTime+=TimeUnit.SECONDS.toMillis(2);
while(System.currentTimeMillis()<lastTime)
LockSupport.parkUntil(lastTime);
try {
feedDataList=newData();
} catch (InstagramException e) {
notifyError(e.getMessage());
if(QUIT_ON_EXCEPTION) return false;
}
}
action.accept(feedDataList);
return true;
}
}, false);
Make a Timer and a semaphore. The timer raises the semaphore every 2 seconds, and in the stream you wait on every call for the semaphore.
This keeps the waits to the specified minimum (2 s), and - funnily - would even work with .parallel().
// a field cannot be both final and volatile; final is enough here
private final Semaphore tickingSemaphore = new Semaphore(1, true);
In its own thread:
Stream.generate(() -> {
tickingSemaphore.acquire(); // throws InterruptedException, which must be handled inside the lambda
...
});
In the timer:
tickingSemaphore.release();
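To make the timer-plus-semaphore idea concrete, here is a small self-contained sketch. It uses a ScheduledExecutorService as the timer, and the class name TickingStream and the helper pull are made up for the example. Note that it limits the rate at which calls start (at most one per period on average); if the consumer falls behind, two consecutive calls can still be closer together than one period.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TickingStream {
    // Hypothetical helper: pulls `count` items from `source`, allowing one
    // pull to start per timer tick.
    static <T> List<T> pull(Supplier<T> source, int count, long periodMillis) {
        Semaphore tick = new Semaphore(0);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            // Keep at most one permit available so that ticks do not pile up
            // into a burst while the consumer is busy.
            if (tick.availablePermits() == 0) {
                tick.release();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return Stream.generate(() -> {
                tick.acquireUninterruptibly(); // wait for the next tick
                return source.get();
            }).limit(count).collect(Collectors.toList());
        } finally {
            timer.shutdownNow(); // stop the timer thread
        }
    }
}
```

In the Instagram case, source would be the call that fetches new data and periodMillis would be 2000.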

how to maintain a list of threads?

I have hundreds of files to process. I do each file one at a time and it takes 30 minutes.
I'm thinking I can do this processing in 10 simultaneous threads, 10 files at a time, and I might be able to do it in 3 minutes instead of 30.
My question is, what is the "correct" way to manage my 10 threads? And when one is done, create a new one to a max number of 10.
This is what I have so far ... is this the "correct" way to do it?
public class ThreadTest1 {
public static int idCounter = 0;
public class MyThread extends Thread {
private int id;
public MyThread() {
this.id = idCounter++;
}
public void run() {
// this run method represents the long-running file processing
System.out.println("I'm thread '"+this.id+"' and I'm going to sleep for 5 seconds!");
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("I'm thread '"+this.id+"' and I'm done sleeping!");
}
}
public void go() {
int MAX_NUM_THREADS = 10;
List<MyThread> threads = new ArrayList<MyThread>();
// this for loop represents the 200 files that need to be processed
for (int i=0; i<200; i++) {
// if we've reached the max num of threads ...
while (threads.size() == MAX_NUM_THREADS) {
// loop through the threads until we find a dead one and remove it
for (MyThread t : threads) {
if (!t.isAlive()) {
threads.remove(t);
break;
}
}
}
// add new thread
MyThread t = new MyThread();
threads.add(t);
t.start();
}
}
public static void main(String[] args) {
new ThreadTest1().go();
}
}
You can use an ExecutorService to manage your threads.
Alternatively, you can add a while loop to the thread's run method so that each thread processes file tasks repeatedly.
You can also read about BlockingQueue. I think it is a perfect fit for distributing new files (tasks) among the threads.
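A minimal sketch of the ExecutorService approach, assuming each file can be processed independently. The class name FilePool, the helper processAll and the Consumer that stands in for the real file processing are placeholders invented for this example. A fixed thread pool already keeps an internal queue, so there is no need to track live threads by hand.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class FilePool {
    // Hypothetical helper: runs `processor` once per file name on at most
    // `maxThreads` threads. Returns true if everything finished in time.
    static boolean processAll(List<String> files, int maxThreads,
                              Consumer<String> processor, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (String file : files) {
            // Tasks beyond maxThreads wait in the pool's queue until a thread is free.
            pool.execute(() -> processor.accept(file));
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow(); // give up on the remaining tasks
            return false;
        }
    }
}
```

With 200 files and maxThreads set to 10, at most 10 files are processed at any moment and a new one starts as soon as a thread becomes free.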
I would suggest using Camel's File component if you are open to it. The component will handle all the issues with concurrency to ensure that multiple threads don't try to process the same file. The biggest challenge with making your code multi-threaded is making sure the threads don't interact. Let a framework take care of this for you.
Example:
from("file://incoming?maxMessagesPerPoll=1&idempotent=true&moveFailed=failed&move=processed&readLock=none")
.threads(10).process()

run threads according to the time limit

I want to start at most 40 HTTP requests each second, and after 1 second start another 40 from a queue (like ThreadPoolTaskExecutor's blocking queue). I am looking for an executor or thread pool implementation for this requirement.
Any recommendations?
Thx
Ali
EDIT: A fixed rate is not efficient, for obvious reasons. As the queue items start one by one, the ones at the back of the queue will have only just started while the ones that started earlier may already have finished.
Extra EDIT: The requirement is to start only 40 requests per second, not to have at most 40 active. There may be 80 active in a later second, but within any 1 second only 40 new connections should be created.
One way to do this is to use a different architecture, which makes the process much easier.
1) Create a class that extends Thread (or implements Runnable).
2) It takes as a parameter a list of the HTTP requests that you want to make.
3) Make the run() method loop over the entire list (size 40).
4) Start one such thread, wait one second, then start the next.
Here is an example:
import java.util.List;
class MyClass extends Thread {
// Req is the placeholder request type used in the loop below
private final List<Req> theList;
public MyClass(List<Req> theList) {
this.theList = theList;
}
@Override
public void run() {
// Here, you simply loop over the entire list (max 40)
for (Req r : theList) {
r.sendRequest();
}
}
public static void main(String[] args) {
// Start a new thread with the next batch of requests once per second
while (true) {
MyClass t = new MyClass(/* insert your new list */);
t.start();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return; // stop starting new batches when interrupted
}
}
}
}
And there you have it
First define a class that implements Callable, which will do the work of each task:
class MyClass implements Callable<String>
{
/**
* Consider this as a new Thread.
*/
@Override
public String call()
{
//Treatment...
return "OK"; //Return whatever the thread's result you want to be then you can access it and do the desired treatment.
}
}
The next step is to create an ExecutorService (in my example, a thread pool) and submit some tasks.
int nbThreadToStart = 40;
ExecutorService executor = Executors.newFixedThreadPool(/* Your thread pool limit */);
List<Future<String>> allTasks = new ArrayList<Future<String>>(/* Optional initial capacity; this does not limit the thread pool */);
for(int i = 0; i < 10; i++)//Number of iterations you want
{
for(int j = 0; j < nbThreadToStart; j++)//Separate counter: reusing i here would not compile
{
try
{
allTasks.add(executor.submit(new MyClass()));
}
catch(Exception e)
{
e.printStackTrace();
}
}
try
{
Thread.sleep(1000);
}
catch(Exception e)
{
e.printStackTrace();
}
}
executor.shutdown();
//You can then access all your thread(Tasks) and see if they terminated and even add a timeout :
try
{
for(Future<String> task : allTasks)
task.get(60, TimeUnit.SECONDS);//Timeout of 60 seconds. The get will return what you specified in the call method.
}
catch (TimeoutException te)
{
...
}
catch(InterruptedException ie)
{
...
}
catch(ExecutionException ee)
{
...
}
I'm not sure what you really want, but I think you should handle multithreading with a thread pool, especially if you expect to receive a lot of requests, to avoid undesired memory leaks and similar problems.
If my example is not clear enough, note that ExecutorService, Future, etc. offer many other methods that are very useful when dealing with threads.
Check these out:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
That's it for my recommendations.
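For the requirement in the Extra EDIT (limit how many requests are started per second, not how many are active), one possible sketch is a scheduled task that moves up to N queued requests per tick onto an unbounded worker pool. The class RateLimitedStarter and its methods are invented for this example; requests are modelled as plain Runnables, and the worker pool is assumed to be allowed to grow as large as the backlog of slow requests requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RateLimitedStarter {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    // Cached pool: no cap on active requests, only their start rate is limited.
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final ScheduledExecutorService ticker =
            Executors.newSingleThreadScheduledExecutor();

    RateLimitedStarter(int maxStartsPerPeriod, long periodMillis) {
        ticker.scheduleAtFixedRate(() -> {
            // Each tick starts at most maxStartsPerPeriod queued requests.
            List<Runnable> batch = new ArrayList<>();
            pending.drainTo(batch, maxStartsPerPeriod);
            batch.forEach(workers::execute);
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Queues a request; it starts on one of the following ticks.
    void submit(Runnable request) {
        pending.add(request);
    }

    // Stops the ticker; requests that are already running finish normally,
    // requests still waiting in the queue are never started.
    void shutdown() {
        ticker.shutdown();
        workers.shutdown();
    }
}
```

For the numbers in the question this would be created as new RateLimitedStarter(40, 1000).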

How to make the main thread wait for the other threads to complete in ThreadPoolExecutor

I am using the ThreadPoolExecutor to implement threading in my Java Application.
I have an XML document that I need to parse, handing each node of it to a thread for processing. My implementation is like this:
parse_tp is a threadpool object created & ParseQuotesXML is the class with the run method.
try {
List children = root.getChildren();
Iterator iter = children.iterator();
//Parsing the XML
while(iter.hasNext()) {
Element child = (Element) iter.next();
ParseQuotesXML quote = new ParseQuotesXML(child, this);
parse_tp.execute(quote);
}
System.out.println("Print it after all the threads have completed");
} catch(Exception ex) {
ex.printStackTrace();
}
finally {
System.out.println("Print it in the end.");
if(!parse_tp.isShutdown()) {
if(parse_tp.getActiveCount() == 0 && parse_tp.getQueue().size() == 0 ) {
parse_tp.shutdown();
} else {
try {
parse_tp.awaitTermination(30, TimeUnit.SECONDS);
} catch (InterruptedException ex) {
log.info("Exception while terminating the threadpool "+ex.getMessage());
ex.printStackTrace();
}
}
}
parse_tp.shutdown();
}
The problem is that the two print statements are executed before the other threads exit. I want the main thread to wait for all the other threads to complete.
With plain Threads I can do this with join(), but I have not found a way to achieve the same with ThreadPoolExecutor. I would also like to ask whether the code in the finally block that closes the thread pool is correct.
Thanks,
Amit
A CountDownLatch is designed for this very purpose. Examples may be found here and here. When the number of threads is not known in advance, consider a Phaser, new in Java 1.7, or an UpDownLatch.
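A minimal sketch of the CountDownLatch approach, assuming the number of tasks is known before they are submitted. The class name LatchDemo and the helper runAndWait are made up for the example; in the question, the tasks would be the ParseQuotesXML instances and the pool would be parse_tp.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

class LatchDemo {
    // Hypothetical helper: runs every task on the pool and blocks the caller
    // until all of them have finished. Returns false if the wait was interrupted.
    static boolean runAndWait(ExecutorService pool, List<Runnable> tasks) {
        CountDownLatch done = new CountDownLatch(tasks.size());
        for (Runnable task : tasks) {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    done.countDown(); // count down even if the task throws
                }
            });
        }
        try {
            done.await(); // returns once the count has reached zero
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Anything placed after the call to runAndWait, such as the "after all the threads have completed" print statement, runs only once every task is done.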
To answer your second question, I think you are doing a reasonable job trying to clean up your thread pool.
With respect to your first question, I think the method that you want to use is submit rather than execute. Rather than try to explain it all in text, here's an edited fragment from a unit test that I wrote that makes many tasks, has each of them do a fragment of the total work and then meets back at the starting point to add the results:
final AtomicInteger messagesReceived = new AtomicInteger(0);
// ThreadedListenerAdapter is the class that I'm testing
// It's not germane to the question other than as a target for a thread pool.
final ThreadedListenerAdapter<Integer> adapter =
new ThreadedListenerAdapter<Integer>(listener);
int taskCount = 10;
List<FutureTask<Integer>> taskList = new ArrayList<FutureTask<Integer>>();
for (int whichTask = 0; whichTask < taskCount; whichTask++) {
FutureTask<Integer> futureTask =
new FutureTask<Integer>(new Callable<Integer>() {
@Override
public Integer call() throws Exception {
// Does useful work that affects messagesSent
return messagesSent;
}
});
taskList.add(futureTask);
}
for (FutureTask<Integer> task : taskList) {
LocalExecutorService.getExecutorService().submit(task);
}
for (FutureTask<Integer> task : taskList) {
int result = 0;
try {
result = task.get();
} catch (InterruptedException ex) {
Thread.currentThread().interrupt();
} catch (ExecutionException ex) {
throw new RuntimeException("ExecutionException in task " + task, ex);
}
assertEquals(maxMessages, result);
}
int messagesSent = taskCount * maxMessages;
assertEquals(messagesSent, messagesReceived.intValue());
I think this fragment is similar to what you're trying to do. The key components were the submit and get methods.
First of all, you can use the ThreadPoolExecutor.submit() method, which returns a Future instance. After you have submitted all your work items, iterate through those futures and call Future.get() on each of them.
Alternatively, you can prepare your work items as Callables (Executors.callable() wraps a Runnable) and submit them all at once using ThreadPoolExecutor.invokeAll(), which waits until all work items have completed. You can then get the results or exceptions by calling the same Future.get() method.
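The invokeAll() variant could look roughly like this. The class InvokeAllDemo and the squaring job are stand-ins invented for the example; the real work would be the XML node parsing from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllDemo {
    // Hypothetical helper: squares each input on a pool and returns the
    // results in input order.
    static List<Integer> squares(List<Integer> inputs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (Integer n : inputs) {
                jobs.add(() -> n * n);
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every job has completed and returns the
            // futures in the same order as the jobs.
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed", e.getCause());
        } finally {
            pool.shutdown(); // the pool is no longer needed
        }
    }
}
```

Because invokeAll() only returns when all jobs are finished, the code after it plays the role that join() plays with plain threads.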
