I need to create a multithreaded Java application for recursive directory search wherein I need to search for all files/folders based on the search-string.
Example:
Search string: 'hello'
Search directory: 'C:\'
The expectation is that I recursively search all files & folders in C:\ whose name matches 'hello'.
My idea is to spawn a thread per directory to get better performance.
The challenge is that we have a timeout: all matching files/folders must be shown within the timeout interval. If the timeout expires before the search completes, we need to show whatever results are available so far. I am pretty confused about how to handle this timeout - can you please help?
Cheers,
Jay
I think you can spawn a child thread for the search - say 'ThreadA' - from the main thread, and have the main thread do a timed join on ThreadA. Let ThreadA spawn the other threads and join on them. Have all the child threads push their results into a ConcurrentLinkedQueue; when the main thread returns from the timed join (whether or not ThreadA has finished), it can print whatever values are in the queue.
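A minimal sketch of that idea, searching a small temp directory instead of C:\ (the class and method names are mine, not from the original post): one thread per subdirectory, results in a ConcurrentLinkedQueue, and a timed join so the main thread prints whatever is available when the timeout expires.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class TimedSearch {
    // All worker threads publish matches here; safe for concurrent writers.
    static final ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();

    // Recursively searches dir, spawning one thread per subdirectory.
    static void search(File dir, String needle) throws InterruptedException {
        File[] entries = dir.listFiles();
        if (entries == null) return;
        List<Thread> children = new ArrayList<>();
        for (File f : entries) {
            if (f.getName().contains(needle)) results.add(f.getPath());
            if (f.isDirectory()) {
                Thread t = new Thread(() -> {
                    try { search(f, needle); } catch (InterruptedException ignored) { }
                });
                t.start();
                children.add(t);
            }
        }
        for (Thread t : children) t.join();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny directory tree to search (stands in for C:\).
        File root = Files.createTempDirectory("searchdemo").toFile();
        new File(root, "sub").mkdir();
        new File(root, "sub/hello.txt").createNewFile();
        new File(root, "other.txt").createNewFile();

        Thread threadA = new Thread(() -> {
            try { search(root, "hello"); } catch (InterruptedException ignored) { }
        });
        threadA.start();
        threadA.join(5000); // timed join: give the search at most 5 seconds
        // Timeout or not, print whatever has been queued so far.
        for (String path : results) System.out.println("MATCH " + path);
    }
}
```

Note that `join(5000)` returns either when ThreadA finishes or when 5 seconds elapse, whichever comes first, which is exactly the "show partial results on timeout" behavior asked for.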
Hello!
I would like to create a child process which keeps running after the end of the main process in Java (I made a simple diagram of the execution I'd like). Is that possible?
You can't start a child process from a process that is no longer running, so the child will have to be started beforehand. And in that case, how will the child know the parent has terminated?
Your design is upside down. The "main work" should be done in a child, and the parent should sit there waiting for the child to terminate before it does whatever it needs to do. That's an easy implementation and is a common design pattern. It is, for example, what any Unix shell does to run an external program.
(I assume that when you say "process" that is what you mean - i.e., something in an entirely separate address space)
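A sketch of that parent-waits-for-child pattern using ProcessBuilder, assuming a Unix-like system with `sh` available; the `echo` command is a stand-in for the real "main work":

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ParentWaits {
    public static void main(String[] args) throws Exception {
        // The "main work" runs in the child; the parent just waits for it
        // to terminate, then does whatever follow-up it needs to do.
        Process child = new ProcessBuilder("sh", "-c", "echo child done").start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) System.out.println(line);
        }
        int exit = child.waitFor(); // block until the child terminates
        System.out.println("parent resumes, child exit=" + exit);
    }
}
```

This is the same shape as a Unix shell running an external program: fork the work out, `waitFor()` it, then continue.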
Finally I've found a simple answer. I don't know why I was hesitating so much.
I just start a thread from my main function, and according to my log, the thread continues running after the main processing has finished.
I'm guessing it is a non-daemon thread, since the child is not killed when main returns.
I'm trying to design a process which will spawn multiple sub-processes (instances), where the outcome of the sub-processes will decide the main process flow.
I agree there can be multiple ways to design this, but I wanted to check with the experts in this forum.
My question is: what is the usual model in this case?
Do we create two individual processes:
1. one to spawn the sub-processes and then wrap them up, and
2. one to keep track of the sub-processes and inform the main process (1)?
Please help me with your suggestions.
Thanks & regards,
BPMN developer.
For this use case, BPMN has multi-instance Activities - see the BPMN Specification, page 432 (page 462 in the PDF).
You can create a sub process of the type multi-instance Activity and define - via the isSequential attribute - whether the sub processes should be spawned sequentially or in parallel.
Via the completionCondition Expression, you can define a check that is executed every time an instance completes and that cancels all other instances if it returns true.
Status: solved
I had to make a pastebin as I had to point out line numbers.
Note: I'm not using ExecutorService or thread pools - I just want to understand what is wrong with starting and using threads this way. If I use 1 thread, the app works perfectly!
related links:
http://www.postgresql.org/docs/9.1/static/transaction-iso.html
http://www.postgresql.org/docs/current/static/explicit-locking.html
main app, http://pastebin.com/i9rVyari
logs, http://pastebin.com/2c4pU1K8 , http://pastebin.com/2S3301gD
I am starting many threads (10) in a for loop, each instantiating a Runnable class, but it seems I am getting the same result from the db in every thread (I get some string from the db, then change it), even though each thread changed it. I'm using JDBC for PostgreSQL. What might the usual issues be?
See line 252 and line 223.
The link is marked as processed (true) in the db. Other threads of the crawler class also do this. So when line 252 gets a link, it should be one with processed = false, but I see all the threads take the same link.
When one of the threads has crawled the link, it marks it processed = true. The others then should not crawl (get) it, since it is marked processed = true.
getNonProcessedLinkFromDB() returns a non-processed link.
public String getNonProcessedLink(){ line 645
public boolean markLinkAsProcesed(String link){ line 705
getNonProcessedLinkFromDB() looks for processed = false links and returns one of them (LIMIT 1).
Each thread has a starting interval gap of 20 secs.
Within one thread, crawling a link takes an estimated 1 or 2 seconds of processing time.
Line 98 keeps threads from grabbing the same url.
If you look at the result: one thread set it to true, yet other threads still access it, way later.
All the threads are separate, yet they still race, even though the db marks the link true the moment the first thread processes it.
This is a case of not asking a concise question. There is lots of code in there, and you have no idea what is going on. You need to break it down so that you can understand where it is going wrong, then show us that bit.
Some things of potential conflict.
You are opening a database connection for almost every operation. The normal flow of an application is to open a few connections, do some processing, then close them.
Are you handling database commits? I don't remember what the default setting is for a Postgres database; you'll have to look into it.
There are 3 states a single url can be in: unprocessed, being processed, processed. I don't think you are handling the 'being processed' state at all. Because being processed takes time and may fail, you have to account for those situations.
I did not read the logs because they are useless to me.
-edit for comment-
Databases generally have transactions. Modifications you make in one transaction are not seen in other transactions until they are committed. Transactions can be rolled back. You'll need to look into fetching the row you just updated and seeing if the value has really changed. Do this in another transaction or on another connection.
The gap of 20 seconds looks like it is only when the process is started. Imagine a situation where Thread1 processes URL1 and Thread2 processes URL2. They both finish at about the same time. They both look for the next unprocessed URL (say URL3). They would both start processing this Url because they don't know another thread has started it. You need one process handing out the Url, possibly a queue is what you'd want to look at.
Logging might be improved if you knew which threads were working on which URLs. You also need a smaller sample size so that you can get your head around what is going on.
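The "one process handing out the Url" idea can be sketched with an in-memory BlockingQueue standing in for the database (the URLs and thread names here are made up): each URL is handed out exactly once, so two crawlers can never claim the same link.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class UrlDispenser {
    public static void main(String[] args) throws InterruptedException {
        // One central queue dispenses each URL at most once; poll() is
        // atomic, so no two workers can ever receive the same element.
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        pending.add("http://example.com/a");
        pending.add("http://example.com/b");
        pending.add("http://example.com/c");

        Runnable worker = () -> {
            String url;
            while ((url = pending.poll()) != null) {
                // ... crawl the url here ...
                System.out.println(Thread.currentThread().getName() + " -> " + url);
            }
        };
        Thread t1 = new Thread(worker, "crawler-1");
        Thread t2 = new Thread(worker, "crawler-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```

The design point: the claim step (taking a URL) and the work step are separated, and the claim step is atomic, so the "both threads pick URL3" race described above cannot happen.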
The comments and responses from the helpers in this post were also correct. Adding this at the start of the crawl() method body:
synchronized (Crawler.class) {
    url = getNonProcessedLinkFromDB();
    new BasicDAO().markLinkAsProcesed(url);
}
and at the bottom of crawl() method body (when it has done processing):
crawl(nonProcessedLinkFromDB);
actually solved the issue.
It was the gap between marking a link processed = true and fetching a new one that let other threads get the same link while the current thread was working on it.
The synchronized block helped further.
Thanks to helper. "Fuber" on IRC channels. Quakenet servers #java and Freenode servers ##javaee
and ALL who supported me!
I have many threads performing different operations on objects, and when roughly 50% of the task is finished I want to serialize everything (perhaps because I want to shut down my machine).
When I come back, I want to start from the point where I left off.
How can we achieve this?
This is like saving the state of a game's objects while playing.
Normally we save the state of the object and retrieve it back. But here we are also storing the process's count/state.
For example:
I have a thread which is creating a salary Excel sheet for 50 thousand employees.
Another thread is creating appraisal letters for the same 50 thousand employees.
Another thread is writing "Happy New Year" e-mails to 50 thousand employees.
so imagine multiple operations.
Now I want to shut down when about 50% of the task is finished - say salary Excel sheets have been written for 25-30 thousand employees, appraisal letters are done for 25-30 thousand, and so on.
When I come back the next day, I want to start the process from where I left off.
This is like a resume.
I'm not sure if this might help, but you can achieve this if the threads communicate via in-memory queues.
To serialize the whole application, what you need to do is disable consumption of the queues; when all the threads are idle you'll reach a "safe-point" where you can serialize the whole state. You'll need to keep track of all the threads you spawn, to know whether they are idle.
You might be able to do this with another technology (maybe a java agent?) that freezes the JVM and allows you to dump the whole state, but I don't know if this exists.
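A minimal sketch of the safe-point idea with in-memory queues, where the pause flag and the 1 ms of "processing" per item are stand-ins for the real work: once consumption is disabled and the workers have joined, whatever remains in the queue is exactly the state to serialize.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class SafePoint {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> work = new LinkedBlockingQueue<>();
        for (int i = 0; i < 100; i++) work.add(i);

        // Pause switch: once set, workers stop taking new items.
        AtomicBoolean paused = new AtomicBoolean(false);

        Runnable worker = () -> {
            while (!paused.get()) {
                Integer item = work.poll();
                if (item == null) break; // queue drained
                try { Thread.sleep(1); } catch (InterruptedException e) { return; } // simulate work
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start(); t2.start();

        Thread.sleep(10);
        paused.set(true);       // disable consumption of the queue
        t1.join(); t2.join();   // both workers idle: safe-point reached
        // Whatever is still in 'work' is the state to persist for the resume.
        System.out.println("remaining=" + work.size());
    }
}
```

An item that was mid-flight when the flag was set is finished first, so no item is ever half-done at the safe-point; only unstarted work remains in the queue.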
Well, it's not much different from saving the state of an object.
Just maintain separate queues for the different kinds of inputs, and on every launch (first launch or relaunch) check those queues; if they are not empty, resume your stopped process by starting a new process with the remaining data.
Say, for example, an app is sending messages, and you quit the app with 10 messages remaining. Have a global queue which the app's senderMethod checks on every launch. In this case it will have 10 messages in the pending queue, so it will continue sending the remaining messages.
Edit:
Basically, for all resumable processes, say pr1, pr2 ... prN, maintain queues of inputs, say q1, q2 ... qN. Each queue should remove processed elements so it contains only pending inputs. As soon as you suspend the system, store these queues, and on relaunch restore them. Have a common routine, say resumeOperation, which calls all the resumable processes (pr1, pr2 ... prN). It will trigger execution of the methods with non-empty queues, which in turn replicates resuming behavior.
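The queue-persistence idea can be sketched with plain java.io serialization; the message strings and the temp-file location here are made up for the example:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayDeque;
import java.util.Queue;

public class ResumableSender {
    // Save the pending queue when the app suspends...
    static void save(Queue<String> q, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(new ArrayDeque<>(q)); // ArrayDeque is Serializable
        }
    }

    // ...and restore it on the next launch.
    @SuppressWarnings("unchecked")
    static Queue<String> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (Queue<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Queue<String> pending = new ArrayDeque<>();
        for (int i = 1; i <= 10; i++) pending.add("msg" + i);
        pending.poll(); pending.poll(); // two messages sent, then we quit

        File state = File.createTempFile("queue", ".ser");
        save(pending, state);           // suspend: persist the remainder

        Queue<String> resumed = load(state); // relaunch: pick up where we left off
        System.out.println("pending=" + resumed.size()); // prints pending=8
    }
}
```

Only the pending inputs are stored, not thread state, which is what makes this simpler than trying to freeze and thaw the JVM itself.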
Java provides the java.io.Serializable interface to indicate serialization support in classes.
You don't provide much information about the task, so it's difficult to give an answer.
One way to think about a task is as a general algorithm which can be split into several steps. Each of these steps is in turn a task itself, so you should see a pattern here.
By cutting each algorithm into small pieces until you cannot divide it further, you get a pretty good idea of where your task can be interrupted and recovered later.
The result of a task can be:
a success: the task returns a value of the expected type
a failure: somehow, something didn't go right during the computation
an interrupted computation: the work wasn't finished, but it may be resumed later, and the return value is the state of the task
(Note that the latter case could be considered a subcase of a failure; it's up to you to organize your protocol as you see fit.)
Depending on how you generate the interruption event (will it be a message passed from the main thread to the worker threads? Will it be an exception?), that event will have to bubble up the task tree and trigger each task to evaluate whether its work can be resumed or not, and then provide a serialized version of itself to the larger task containing it.
I don't think serialization is the correct approach to this problem. What you want is persistent queues, which you remove an item from when you've processed it. Every time you start the program you just start processing the queue from the beginning. There are numerous ways of implementing a persistent queue, but a database comes to mind given the scale of your operations.
I have a scenario in which I have to register a number of users and then run as many parallel threads as there are registered users, executing the same set of actions in parallel. For this I have a jmx with a few actions that should happen only once (in a setup thread group with a thread count of one) and another thread group that runs with, say, 5 threads - the number of previously registered users - and executes some operations using those users.
Now I want execute this whole scenario in parallel using 5 threads.
How do I come about doing this?
I used the Include Controller, but the thread groups are not executed as expected: I don't get 25 iterations for the actions that run in the 5-thread group in the included jmx.
I'm not precisely sure what you're doing, and I know little of jmx, but here's a couple of ideas. One (or both) might be relevant.
The first one is that your threads might be sharing an instance field. If they have a common counter, for instance, you will do something 5 times rather than 25 times. Make sure your common variables (instance and class fields) are properly synchronized. Use local variables whenever possible. You must use them when their value applies to each thread rather than all threads.
The second is that you might be displaying results - or even stopping the program - before all threads have done their work. It's worst on single-core machines, but threads can and do run in any order imaginable, and in a few orders that are not. They can run one at a time, with the one started last running first. One can stop in the middle and let all the others run to completion, then start up again. A bunch can run simultaneously (on different cores or swapping rapidly) while others do nothing.
I'd suggest putting in a bunch of logging/output statements (System.out.println is good enough) and seeing for yourself what's happening. It'll take you a while to make sense of your output, but once you do, you'll be able to start bringing things under control.