ForkJoinFramework only uses two workers - java

I have an application which crawls around six thousand URLs. To speed this up I created a RecursiveTask which consumes a ConcurrentLinkedQueue of all URLs to crawl. It splits off up to 50 URLs; if the queue is then empty it crawls them directly, otherwise it first creates a new instance of itself and forks it, then crawls its subset of 50, and finally joins the forked task.
Now comes my problem: until each thread has worked off its first 50 URLs, all four work quickly and at the same time. After that, two of them stop working and wait in join, and only the other two keep working, creating new forks and crawling pages.
To visualize this I count how many URLs each thread crawls and show the numbers in a JavaFX GUI.
What am I doing wrong so that the fork/join framework only uses two of my four allowed threads? What can I do to change it?
Here is my compute method of the task:
LOG.debug(
Thread.currentThread().getId() + " Starting new Task with "
+ urlsToCrawl.size() + " left."
);
final ConcurrentLinkedQueue<D> urlsToCrawlSubset = new ConcurrentLinkedQueue<>();
for (int i = 0; i < urlsToCrawl.size() && i < config.getMaximumUrlsPerTask(); i++)
{
urlsToCrawlSubset.offer(urlsToCrawl.poll());
}
LOG.debug(
Thread.currentThread().getId() + " Created a Subset with "
+ urlsToCrawlSubset.size() + "."
);
LOG.debug(
Thread.currentThread().getId()
+ " Now the Urls to crawl only left " + urlsToCrawl.size() + "."
);
if (urlsToCrawl.isEmpty())
{
LOG.debug(Thread.currentThread().getId() + " Crawling the subset.");
crawlPage(urlsToCrawlSubset);
}
else
{
LOG.debug(
Thread.currentThread().getId()
+ " Creating a new Task and crawling the subset."
);
final AbstractUrlTask<T, D> otherTask = createNewOwnInstance();
otherTask.fork();
crawlPage(urlsToCrawlSubset);
taskResults.addAll(otherTask.join());
}
return taskResults;
And here is a snapshot of my diagram:
P.S. If I allow up to 80 threads, it will use all of them until each has crawled 50 URLs, and then it uses only two.
And if you're interested, here is the complete source code: https://github.com/mediathekview/MServer/tree/feature/cleanup

I fixed it. My error was that I split off a small portion, worked on it, and then waited, instead of splitting the work in half and then calling myself again on each half, and so on.
In other words, before I split and worked directly, but the correct way is to keep splitting until everything is split and only then start working.
Here is how my code looks now:
@Override
protected Set<T> compute()
{
if (urlsToCrawl.size() <= config.getMaximumUrlsPerTask())
{
crawlPage(urlsToCrawl);
}
else
{
final AbstractUrlTask<T, D> rightTask = createNewOwnInstance(createSubSet(urlsToCrawl));
final AbstractUrlTask<T, D> leftTask = createNewOwnInstance(urlsToCrawl);
leftTask.fork();
taskResults.addAll(rightTask.compute());
taskResults.addAll(leftTask.join());
}
return taskResults;
}
private ConcurrentLinkedQueue<D> createSubSet(final ConcurrentLinkedQueue<D> aBaseQueue)
{
final int halfSize = aBaseQueue.size() / 2;
final ConcurrentLinkedQueue<D> urlsToCrawlSubset = new ConcurrentLinkedQueue<>();
for (int i = 0; i < halfSize; i++)
{
urlsToCrawlSubset.offer(aBaseQueue.poll());
}
return urlsToCrawlSubset;
}
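The split-until-small pattern described above can be sketched on its own, outside the crawler. This is only a minimal illustration under assumptions: `HalvingTask`, the threshold of 50, the `crawlAll` helper and the "crawl" that merely measures the URL length are all hypothetical stand-ins, not the project's real classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical stand-in for the crawler task: "crawling" a URL just measures its length.
class HalvingTask extends RecursiveTask<List<Integer>> {
    private static final int THRESHOLD = 50; // assumed maximum number of URLs per leaf task

    private final List<String> urls;

    HalvingTask(List<String> urls) {
        this.urls = urls;
    }

    @Override
    protected List<Integer> compute() {
        if (urls.size() <= THRESHOLD) {
            // Small enough: do the actual work in this task.
            List<Integer> results = new ArrayList<>();
            for (String url : urls) {
                results.add(url.length());
            }
            return results;
        }
        // Too large: split in half, fork one half and compute the other in this thread.
        int middle = urls.size() / 2;
        HalvingTask left = new HalvingTask(urls.subList(0, middle));
        HalvingTask right = new HalvingTask(urls.subList(middle, urls.size()));
        left.fork();
        List<Integer> results = new ArrayList<>(right.compute());
        results.addAll(left.join());
        return results;
    }

    // Runs the task in a pool with the given parallelism and shuts the pool down afterwards.
    static List<Integer> crawlAll(List<String> urls, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new HalvingTask(urls));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 6000; i++) {
            urls.add("http://example.invalid/page/" + i);
        }
        System.out.println(crawlAll(urls, 4).size());
    }
}
```

Because every task above the threshold immediately produces two subtasks, idle workers always find something to steal, which is likely why the halving version keeps all threads busy.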


Parallel stream doesn't look like working in parallel, completely

1. Set's parallelStream doesn't use enough threads.
Java 8's parallelStream doesn't run fully in parallel.
On my computer, a Set's parallelStream does not use enough threads when the number of tasks is smaller than the number of processors.
public class ParallelStreamSplitTest {
@Test
public void setStreamParallelTest() {
System.out.printf("Total processor count : %d \n", Runtime.getRuntime().availableProcessors());
long start = System.currentTimeMillis();
IntStream.range(1, 8).boxed().collect(Collectors.toCollection(HashSet::new)).parallelStream().forEach((index) -> {
System.out.println("Starting " + Thread.currentThread().getName() + ", index=" + index + ", " + new Date());
try {
Thread.sleep(1000);
} catch (Exception e) {
}
});
long end = System.currentTimeMillis();
System.out.println(Thread.currentThread().getName() + "'s elapsed time : " + (end - start));
}
@Test
public void intStreamParallelTest() {
System.out.printf("Total processor count : %d \n", Runtime.getRuntime().availableProcessors());
long start = System.currentTimeMillis();
IntStream.range(1, 8).parallel().forEach(index -> {
System.out.println("Starting " + Thread.currentThread().getName() + ", index=" + index + ", " + new Date());
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
}
});
long end = System.currentTimeMillis();
System.out.println(Thread.currentThread().getName() + "'s elapsed time : " + (end - start));
}
}
In my code, setStreamParallelTest takes 4 seconds whereas intStreamParallelTest takes 1 second.
I expected setStreamParallelTest to finish in 1 second as well.
Is it a bug?
2. Is it okay to use a parallel stream to call another API in a web application? If it is wrong, why?
My web application needs to call another API server in parallel, so I use a parallel stream to call the API.
Sets.newHashSet(api1, api2, api3, api4).parallelStream().forEach(api -> callApiSync(api))
I think all requests handled by my server share one fork-join pool, so it looks dangerous when one of the API responses is slow.
Is that correct?
The contract for parallelStream says:
Returns a possibly parallel Stream with this collection as its source. It is allowable for this method to return a sequential stream.
If you want to invoke several tasks in parallel, use an ExecutorService.
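As a rough sketch of the ExecutorService suggestion: the blocking calls are submitted to a dedicated pool instead of the shared fork-join pool. The class `ParallelApiCalls`, the `callAll` helper and the body of `callApiSync` are hypothetical placeholders; a real web application would more likely create one pool at startup and reuse it across requests rather than building one per call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelApiCalls {
    // Hypothetical placeholder for a blocking remote call.
    static String callApiSync(String api) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response from " + api;
    }

    // Calls every API on its own thread and returns the responses in input order.
    static List<String> callAll(List<String> apis) {
        // Assumption for this sketch: one thread per call, pool created per invocation.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, apis.size()));
        try {
            List<Callable<String>> calls = new ArrayList<>();
            for (String api : apis) {
                calls.add(() -> callApiSync(api));
            }
            List<String> responses = new ArrayList<>();
            // invokeAll() blocks until all calls are done; futures keep the task order.
            for (Future<String> future : pool.invokeAll(calls)) {
                responses.add(future.get());
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(callAll(Arrays.asList("api1", "api2", "api3", "api4")));
    }
}
```

With a pool of its own, a slow API only ties up threads of that pool and does not starve other parallel streams in the same JVM.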

NPE in a do/while loop due to EOF...catching the EOF earlier to avoid the NPE [duplicate]

This question already has answers here:
What is a NullPointerException, and how do I fix it?
(12 answers)
Closed 5 years ago.
I have written this program to compare 2 files. They are 500 MB to 2.8 GB in size and are created every 6 hours. I have 2 files from 2 sources (NMD and XMP). They are broken up into lines of text that have fields separated by the pipe (|) character. Each line is a single record and may be up to 65,000 characters long. The data is about TV shows and movies, showing times and descriptive content. I have determined that any particular show or movie has a minimum of 3 pieces of data that will uniquely identify that show or movie, i.e. CallSign, ProgramId and StartLong. The two sources for this data are systems called NMD and XMP, hence that acronym added to various variables. So my goal is to compare a file created by NMD and one created by XMP and confirm that everything that NMD produces is also produced by XMP and that the data in each matched record is the same.
What I am trying to accomplish here is this:
1. Read the NMD file record by record for the 3 unique data fields.
2. Read the XMP file record by record and look for a match for the current record in the NMD file.
3. The NMD file should iterate one record at a time. Each NMD record should then be searched for in the entire XMP file, record by record.
4. Write a log entry in one of 2 files indicating success or failure and what that data was.
I have an NPE issue when I reach the end of the testdataXMP.txt file. I assume the same thing will happen for testdataNMD.txt. I'm trying to break out of the loop right after the readLine, since epgsRecordNMD or epgsRecordXMP will be null if the reader has just reached the end of the file. The original NPE was for trying to do a string split on null data at the end of the file. Now I'm getting an NPE here according to the debugger.
if (epgsRecordXMP.equals(null)) {
break;
}
Am I doing this wrong? If I'm really at the end of the file, the readLine ought to return null right?
I did it this way too, but to my limited experience they feel like they are effectively the same thing. It too threw an NPE.
if (epgsRecordXMP.equals(null)) break;
Here's the code...
public static void main(String[] args) throws java.io.IOException {
String epgsRecordNMD = null;
String epgsRecordXMP = null;
BufferedWriter logSuccessWriter = null;
BufferedWriter logFailureWriter = null;
BufferedReader readXMP = null;
BufferedReader readNMD = null;
int successCount = 0;
readNMD = new BufferedReader(new FileReader("d:testdataNMD.txt"));
readXMP = new BufferedReader(new FileReader("d:testdataXMP.txt"));
do {
epgsRecordNMD = readNMD.readLine();
if (epgsRecordNMD.equals(null)) {
break;
}
String[] epgsSplitNMD = epgsRecordNMD.split("\\|");
String epgsCallSignNMD = epgsSplitNMD[0];
String epgsProgramIdNMD = epgsSplitNMD[2];
String epgsStartLongNMD = epgsSplitNMD[9];
System.out.println("epgsCallsignNMD: " + epgsCallSignNMD + " epgsProgramIdNMD: " + epgsProgramIdNMD + " epgsStartLongNMD: " + epgsStartLongNMD );
do {
epgsRecordXMP = readXMP.readLine();
if (epgsRecordXMP.equals(null)) {
break;
}
String[] epgsSplitXMP = epgsRecordXMP.split("\\|");
String epgsCallSignXMP = epgsSplitXMP[0];
String epgsProgramIdXMP = epgsSplitXMP[2];
String epgsStartLongXMP = epgsSplitXMP[9];
System.out.println("epgsCallsignXMP: " + epgsCallSignXMP + " epgsProgramIdXMP: " + epgsProgramIdXMP + " epgsStartLongXMP: " + epgsStartLongXMP);
if (epgsCallSignXMP.equals(epgsCallSignNMD) && epgsProgramIdXMP.equals(epgsProgramIdNMD) && epgsStartLongXMP.equals(epgsStartLongNMD)) {
logSuccessWriter = new BufferedWriter (new FileWriter("d:success.log", true));
logSuccessWriter.write("NMD match found in XMP " + "epgsCallsignNMD: " + epgsCallSignNMD + " epgsProgramIdNMD: " + epgsProgramIdNMD + " epgsStartLongNMD: " + epgsStartLongNMD);
logSuccessWriter.write("\n");
successCount++;
logSuccessWriter.write("Successful matches: " + successCount);
logSuccessWriter.write("\n");
logSuccessWriter.close();
System.out.println ("Match found");
System.out.println ("Successful matches: " + successCount);
}
} while (epgsRecordXMP != null);
readXMP.close();
if (successCount == 0) {
logFailureWriter = new BufferedWriter (new FileWriter("d:failure.log", true));
logFailureWriter.write("NMD match not found in XMP" + "epgsCallsignNMD: " + epgsCallSignNMD + " epgsProgramIdNMD: " + epgsProgramIdNMD + " epgsStartLongNMD: " + epgsStartLongNMD);
logFailureWriter.write("\n");
logFailureWriter.close();
System.out.println ("Match NOT found");
}
} while (epgsRecordNMD != null);
readNMD.close();
}
}
You should not do this:
if (epgsRecordXMP.equals(null)) {
break;
}
If you want to know whether epgsRecordXMP is null, the if should look like this:
if (epgsRecordXMP == null) {
break;
}
To sum up: your app throws the NPE when it tries to call the equals method on epgsRecordXMP while that reference is null.
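A common way to avoid the problem entirely is to fold the null check into the loop condition. The following is a minimal sketch, not the asker's program: the class `ReadUntilEof`, the `firstFields` helper and the sample records are made up for illustration, and the input comes from a StringReader instead of a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadUntilEof {
    // Returns the first pipe-separated field of every line in the reader.
    static List<String> firstFields(BufferedReader reader) {
        List<String> fields = new ArrayList<>();
        try {
            String line;
            // readLine() returns null at end of stream, so compare with != null
            // before calling any method on the line.
            while ((line = reader.readLine()) != null) {
                fields.add(line.split("\\|")[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample records standing in for the NMD/XMP files.
        BufferedReader reader = new BufferedReader(new StringReader("KABC|x|P1\nKXYZ|y|P2\n"));
        System.out.println(firstFields(reader)); // prints [KABC, KXYZ]
    }
}
```

The loop body never sees a null line, so neither split nor equals can throw an NPE at end of file.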

Pause execution of a loop in main method till all Threads finish Java 1.5

I am reading multiple arguments from the command line using Java 1.5. The arguments are names of flat files. I loop through the arguments in the main method and call a method which in turn creates a bunch of threads to process the file. I need to pause the loop until all threads processing the first argument complete, and then move on to create threads for the second argument. How can I queue the arguments or pause the loop execution in my main method until all threads processing the current argument complete?
Use thread pools and an Executor. Take a look at the java.util.concurrent package.
//create the pool once and reuse it for every argument
ExecutorService pool = Executors.newCachedThreadPool();
for(String argument:args){
//you said you want multiple threads to work on a single argument.
//create callables instead and use a ThreadPool
List<Callable<YourResult>> lstCallables = createCallablesFor(argument);
//invokeAll() blocks until every callable for this argument has completed
List<Future<YourResult>> futures = pool.invokeAll(lstCallables);
for(Future<YourResult> future:futures){
//get() returns whatever your callable returned,
//or throws an ExecutionException if the callable failed.
future.get();
}
// at this point, all the threads working on the current argument are finished
// and the next loop iteration works on the next argument
}
//release the pool's threads once all arguments are processed
pool.shutdown();
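If plain threads are preferred over an executor, the same pause can be sketched with Thread.join(), which is available in Java 1.5. Everything here is illustrative: `JoinPerArgument`, `processAll` and the "work" of appending a string to a list are hypothetical, and the code avoids lambdas so that it stays within Java 1.5 syntax.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JoinPerArgument {
    // Starts several worker threads per argument and waits for all of them
    // before moving on to the next argument. Returns a log of what happened.
    static List<String> processAll(String[] args, int threadsPerArgument) {
        final List<String> events = Collections.synchronizedList(new ArrayList<String>());
        for (final String argument : args) {
            List<Thread> workers = new ArrayList<Thread>();
            for (int i = 0; i < threadsPerArgument; i++) {
                Thread worker = new Thread(new Runnable() {
                    public void run() {
                        // Hypothetical work: a real job would process part of the file.
                        events.add("processed part of " + argument);
                    }
                });
                workers.add(worker);
                worker.start();
            }
            // join() blocks the main loop until each worker for this argument is done.
            for (Thread worker : workers) {
                try {
                    worker.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            events.add("finished " + argument);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(processAll(new String[] {"a.txt", "b.txt"}, 3));
    }
}
```

In the returned log, every entry for the first file appears before any entry for the second one, which is the ordering the question asks for.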
I wonder if you are looking for something like cyclic barriers.
You need to start the thread job for one argument inside the loop, so that once one job is finished the next loop iteration starts the thread job for the next argument. The actual work goes into the thread job, wherever you defined it.
Example (this is just a snippet):
for (int i = 0; i < count; i++) {
t[i] = new RunDemo();
String[] serverList = srv[i].split(",");
String logName = filename + "_" + serverList[0] + "_log";
String sql = "INSERT INTO .....(any query)";
t[i].setStr("sqlplus -L " + username[i] + "/" + password[i] + "@"
+ serverList[1] + ":" + serverList[2] + "/" + serverList[3]
+ " @" + filename1);
t[i].setLogName(logName);
t[i].setDirectory(dir);
try{
conn.UpdateQuery(sql);
log.info("Inserted into the table data with query " + sql);
}
catch (Exception e){
log.info("The data can't be inserted into table with " + e.getMessage() + " sql query " + sql);
}
new Thread(t[i]).start();
}
Here, in every loop iteration, a new thread with a different serverList is created and started.
The job definition is given below:
public void run() {
JShell jshell = new JShell();
try {
log.info("Command is: " + this.str + " log name: " + this.LogName + " in directory: " + this.directory);
jshell.executeCommand(this.str, this.LogName, this.directory);
log.info("Executed command successfully");
} catch (Exception e1) {
log.info("Error at executing command with error stack: ");
e1.printStackTrace();
}
DBConnection conn1 = new DBConnection();
String sql = "UPDATE patcheventlog SET ENDTIME=SYSDATE WHERE LOGFILE='" + this.directory + this.LogName + "'";
try {
//conn1.callConnection("192.168.8.81", "d2he");
conn1.callConnection(ip, sid);
conn1.UpdateQuery(sql);
conn1.disposeConnection();
} catch (SQLException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
System.out.print(this.LogName);
}
So this is how you work with threads inside the loop. You don't need to pause your loop.
Hope that helps.

Java - Updating static variables

I have two classes in Java that need to run at the same time: a Crawler class (which basically implements a web crawler and keeps printing out URLs as it encounters them) and an Indexer class, which, as of now, is supposed to simply print the URLs crawled.
For this, my Indexer class has a Queue:
public static Queue<String> urls = new LinkedList();
And in the visit() function of my Crawler class, I have the following:
Indexer.urls.add( url ) // where url is a String
The Crawler is working totally fine, since it prints out all the URLs that it has encountered, but for some reason these URLs do not get added to the Queue in my Indexer class. Any idea why this may be the case?
The visit() method from Crawler.java is as follows:
public void visit(Page page) {
int docid = page.getWebURL().getDocid();
String url = page.getWebURL().getURL();
String domain = page.getWebURL().getDomain();
String path = page.getWebURL().getPath();
String subDomain = page.getWebURL().getSubDomain();
String parentUrl = page.getWebURL().getParentUrl();
System.out.println("Docid: " + docid);
System.out.println("URL: " + url);
System.out.println("Domain: '" + domain + "'");
System.out.println("Sub-domain: '" + subDomain + "'");
System.out.println("Path: '" + path + "'");
System.out.println("Parent page: " + parentUrl);
Indexer.urls.add( url );
System.out.println("=============");
}
Code from my Indexer class :
public static Queue<String> urls = new LinkedList();
public static void main( String[] args )
{
while( urls.isEmpty() )
{
//System.out.println("Empty send queue");
Thread.sleep(sleepTime);
}
System.out.println( urls.poll() );
}
Okay, so I solved my problem by doing as suggested by BigMike. I implemented the Runnable interface in my two classes, and then ran those 2 classes as threads within the main function of a new third class.
Thanks everyone for all your help ! :)
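The fix described above (both classes running as threads inside one program, so that they really share the queue) might look roughly like this. It is a sketch under assumptions: the names `CrawlerIndexerDemo`, `run` and the `DONE` marker are hypothetical, and a BlockingQueue replaces the static LinkedList plus sleep loop, since LinkedList is not safe to share between threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CrawlerIndexerDemo {
    private static final String DONE = "__DONE__"; // hypothetical end-of-input marker

    // Runs a "crawler" and an "indexer" thread in the same JVM and returns
    // the URLs the indexer received, in the order it received them.
    static List<String> run(List<String> crawledUrls) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> indexed = new ArrayList<>();

        Thread crawler = new Thread(() -> {
            for (String url : crawledUrls) {
                queue.add(url); // stands in for Indexer.urls.add(url)
            }
            queue.add(DONE);
        });
        Thread indexer = new Thread(() -> {
            try {
                // take() blocks until an element is available, so no sleep loop is needed.
                for (String url = queue.take(); !url.equals(DONE); url = queue.take()) {
                    indexed.add(url);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        crawler.start();
        indexer.start();
        try {
            crawler.join();
            indexer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return indexed;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("http://a.example", "http://b.example")));
    }
}
```

Static fields are only shared within one JVM, so if Crawler and Indexer were each started through their own main method as separate programs, each process would have had its own empty queue.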

Threads in Java, states? Also what is the right way to use them?

I'm up for my exam presentation the day after tomorrow, so I need to get a few things straight before it, which I hope you guys can help me with.
First, I do know that there are 4 states of threads (i.e. Running, Ready, Blocked, Terminated), however I'm not quite sure how it works in Java. In my code I use Thread.sleep(3000) to do some waiting in the program. Does this make the thread Blocked or Ready?
It has also come to my attention that I might not have used threads the right way. Let me show you some code:
public class BattleHandler implements Runnable {
private Player player;
private Monster enemyMonster;
private Dungeon dungeon;
private JTextArea log;
private GameScreen gScreen;
public void run() {
try {
runBattle();
}
catch(Exception e) { System.out.println(e);}
}
public BattleHandler(Player AttackingPlayer, JTextArea log, GameScreen gScreen) {
this.player = AttackingPlayer;
this.log = log;
this.gScreen = gScreen;
}
public void setDungeon(Dungeon dungeon) {
this.dungeon = dungeon;
}
public Dungeon getDungeon() {
return dungeon;
}
public Monster getEnemyMonster() {
return enemyMonster;
}
public void setMonster() {
// First check if dungeon have been init, if not we can't generate the mob
if(dungeon != null) {
enemyMonster = new Monster();
// Generate monster stats
enemyMonster.generateStats(dungeon);
}else {
System.out.println("Dungeon was not initialized");
}
}
public void runBattle() throws InterruptedException {
// Start battle, and run until a contester is dead.
while(player.getHealth() > 0 && enemyMonster.getHealth() > 0) {
int playerStrikeDmg = player.strike();
if(enemyMonster.blockDefend()) {
log.setText( log.getText() + "\n" + player.getName() +" tried to strike " + enemyMonster.getName()+ ", but " + enemyMonster.getName() + " Blocked.");
}else if(enemyMonster.dodgeDefend()) {
log.setText( log.getText() + "\n" + player.getName() +" tried to strike " + enemyMonster.getName()+ ", but " + enemyMonster.getName() + " Dodged.");
}else {
enemyMonster.defend(playerStrikeDmg);
log.setText( log.getText() + "\n" + player.getName() +" strikes " + enemyMonster.getName()+ " for: " + playerStrikeDmg + " left: "+ enemyMonster.getHealth());
}
if(enemyMonster.getHealth() < 1) break;
Thread.sleep(3000);
// Monster Turn
int monsterDmg = enemyMonster.strike();
if(player.blockDefend()) {
log.setText( log.getText() + "\n" + enemyMonster.getName() +" tried to strike " + player.getName()+ ", but " + player.getName()+ " Blocked.");
}else if(player.dodgeDefend()) {
log.setText( log.getText() + "\n" + enemyMonster.getName() +" tried to strike " + player.getName()+ ", but " + player.getName()+ " Dodged.");
}else {
player.defend(monsterDmg);
log.setText( log.getText() + "\n" + enemyMonster.getName() +" strikes " + player.getName()+ " for: " + monsterDmg + " left: "+ player.getHealth());
}
gScreen.updateBot();
Thread.sleep(3000);
}
When I coded this I thought it was cool, but I have seen some people make a class just for controlling the thread itself. I have simply made the class that uses sleep Runnable (that part is not shown in the code, but it's a big class).
It would be good to get this straight, so I can point it out before they ask me about it, you know, take away their ammunition. :D
Hope you guys can help me :).
Thanks
Threads have more than 4 states. Also, I recommend reading Lesson: Concurrency for more information regarding threads.
Note that if you're looking to execute a task at a set interval, I highly recommend using the Executors framework.
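Blocked: a sleeping thread will not run at all until the timeout expires. Ready means 'runnable now, but there is no processor available to run it; it will run as soon as a processor becomes available'. In Java's own terminology (Thread.State), a sleeping thread is reported as TIMED_WAITING.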
As all the other guys state, there are more than those, here's a simple listing:
Running - Guess what, it's running
Waiting - It waits for another thread to complete its calculation (that's the wait() method in Java). Basically such a thread can also be run by the scheduler, like the "ready" state threads.
Ready - Means that the Thread is ready for execution, once the OS-Scheduler turns to this Thread, it will execute it
Blocked - Means that there is another operation blocking this thread's execution, such as IO.
Terminated - Guess what, it's done and will be removed by the OS-Scheduler.
For a complete listing, look at the famous Wikipedia ;)
http://en.wikipedia.org/wiki/Process_state
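Java itself exposes six states through the Thread.State enum: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING and TERMINATED. The following small sketch observes which of them a sleeping thread reports; `SleepStateDemo` and `stateWhileSleeping` are made-up names, and the polling loop is only a simple way to wait until the thread has actually entered sleep.

```java
class SleepStateDemo {
    // Starts a thread that sleeps, waits until it has left the NEW/RUNNABLE
    // states, and returns the state it reports while sleeping.
    static Thread.State stateWhileSleeping() {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Woken up early by the interrupt below; nothing else to do.
            }
        });
        sleeper.start();

        Thread.State state = sleeper.getState();
        while (state == Thread.State.NEW || state == Thread.State.RUNNABLE) {
            Thread.yield();
            state = sleeper.getState();
        }

        sleeper.interrupt(); // end the sleep so the thread can terminate
        return state;
    }

    public static void main(String[] args) {
        System.out.println(stateWhileSleeping());
    }
}
```

Note that Java has no separate Ready and Running states: both are covered by RUNNABLE, and the operating system scheduler decides which runnable threads are actually on a processor.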
