How to execute the URL of each individual machine every x minutes? - java

I am working on a project in which I have three datacenters - DC1, DC2 and DC3.
In DC1 I have two machines (machineA and machineB), in DC2 I have two machines (machineC and machineD) and in DC3 I have two machines again (machineE and machineF).
Each machine in each datacenter has a URL of the form below, which returns a string as the response -
http://machineName:8080/textbeat
For DC1-
http://machineA:8080/textbeat
http://machineB:8080/textbeat
For DC2-
http://machineC:8080/textbeat
http://machineD:8080/textbeat
For DC3-
http://machineE:8080/textbeat
http://machineF:8080/textbeat
Here is the response string I generally see after hitting the URL for any particular machine -
state: READY server_uptime: 12462125 data_syncs: 29
Problem Statement:-
Now I need to iterate over all the machines in each datacenter, execute each machine's URL and extract data_syncs from the response. This has to be done every 1 minute.
If machineA's data_syncs is zero continuously for a period of 5 minutes, then I would like to print DC1 and machineA. The same applies to machineB and to the other datacenters.
The logic I was thinking of -
Ping each individual machine in each datacenter and extract the data_syncs value; if it is zero, increment that machine's counter by one.
Try again after one minute; if the value is still zero, increment the same counter again by one.
If the counter reaches 5 (as it is 5 minutes) and the value was zero continuously, then I would add this machine and its datacenter name to my map.
But suppose during three continuous tries it was zero and on the fourth try it became non-zero; then the counter gets reset to zero for that machine and the process starts again for it.
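To make the per-machine counting and reset rule concrete, here is a minimal sketch of just that step; the class name ZeroStreakTracker and the map are my own illustration, not part of the real code:
import java.util.HashMap;
import java.util.Map;

// Keeps one consecutive-zero counter per host; only the counting/reset rule, nothing else.
class ZeroStreakTracker {
    private final Map<String, Integer> zeroCounts = new HashMap<String, Integer>();

    // Returns true when the host has reported data_syncs == 0 five times in a row.
    boolean record(String host, String dataSyncs) {
        Integer current = zeroCounts.get(host);
        if (current == null) {
            current = 0;
        }
        current = "0".equals(dataSyncs) ? current + 1 : 0; // any non-zero reading resets the streak
        zeroCounts.put(host, current);
        return current >= 5;
    }
}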
Below is my map, in which I am putting the datacenter and its machines if they have met the above condition -
final Map<String, List<String>> holder = new LinkedHashMap<String, List<String>>();
Here the key is the datacenter name and the value is the list of machines in that datacenter which have met the condition.
Below is the code I came up with to solve the above problem, but it doesn't work the way it is supposed to: the counter is shared across all the machines, which is not what I want.
public class MachineTest {

    private static int counter = 0;
    private final static ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

    public static void main(String[] args) {
        final ScheduledFuture<?> taskUtility = scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    generalUtility();
                } catch (Exception ex) {
                    // log an exception
                }
            }
        }, 0, 1L, TimeUnit.MINUTES);
    }

    protected static void generalUtility() {
        try {
            final Map<String, List<String>> holder = new LinkedHashMap<String, List<String>>();
            List<String> datacenters = Arrays.asList("DC1", "DC2", "DC3");
            for (String datacenter : datacenters) {
                LinkedList<String> machines = new LinkedList<String>();
                List<String> childrenInEachDatacenter = getMachinesInEachDatacenter(datacenter);
                for (String hosts : childrenInEachDatacenter) {
                    String host_name = hosts;
                    String url = "http://" + host_name + ":8080/textbeat";
                    MachineMetrics metrics = GeneralUtilities.getMetricsOfMachine(host_name, url); // execute the url and populate the MachineMetrics object
                    if (metrics.getDataSyncs().equalsIgnoreCase("0")) {
                        counter++;
                        if (counter == 5) {
                            machines.add(hosts);
                        }
                    }
                }
                if (!machines.isEmpty()) {
                    holder.put(datacenter, machines);
                }
            }
            if (!holder.isEmpty()) {
                // log the datacenter and its machine as our criteria is met
                System.out.println(holder);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Below method will return list of machines given the name of datacenter
    private static List<String> getMachinesInEachDatacenter(String datacenter) {
        // this will return list of machines for a given datacenter
    }
}
And here is my MachineMetrics class -
public class MachineMetrics {
    private String machineName;
    private String dataSyncs;
    // getters and setters
}
Is it possible to do this using ScheduledExecutorService, given that this is not a one-time process? It has to be done repeatedly.
Basically, for each machine, if data_syncs is 0 continuously for a period of 5 minutes, then I need to log that datacenter and its machine.

public class Machine {

    private String dataCenter;
    private String machineName;
    private String hostname;
    private int zeroCount = 0;

    // getters and setters, except for zeroCount
    // constructor with dataCenter, machineName and hostname as args

    public boolean isEligibleForLogging(String dataSyncs) {
        if (dataSyncs.equals("0")) {
            zeroCount++;
        } else {
            zeroCount = 0;        // any non-zero reading resets the streak
        }
        if (zeroCount >= 5) {     // zero for 5 consecutive checks, i.e. 5 minutes
            zeroCount = 0;
            return true;
        }
        return false;
    }
}
static List<Machine> machines = new ArrayList<Machine>();

static {
    Machine machine1 = new Machine("DC1", "name1", "hostname1");
    machines.add(machine1);
    // repeat the above two lines per each machine.
}
protected static void generalUtility() {
    try {
        for (Machine machine : machines) {
            String host_name = machine.getHostname();
            String url = "http://" + host_name + ":8080/textbeat";
            String dataSyncs = fetchDataSyncs(url); // placeholder: execute the URL and extract data_syncs
            if (machine.isEligibleForLogging(dataSyncs)) {
                System.out.println(machine.getMachineName() + " " + machine.getDataCenter() + " " + dataSyncs);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
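To answer the ScheduledExecutorService question: yes, the same scheduler from the original code can drive this loop. Below is a minimal sketch of the wiring, reusing the Machine class from above; fetchDataSyncs is a hypothetical helper standing in for the real HTTP call and response parsing:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MachineMonitor {

    private static final List<Machine> machines = new ArrayList<Machine>();
    private static final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    public static void main(String[] args) {
        machines.add(new Machine("DC1", "machineA", "machineA"));
        machines.add(new Machine("DC1", "machineB", "machineB"));
        // ... one entry per machine in each datacenter

        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                for (Machine machine : machines) {
                    String url = "http://" + machine.getHostname() + ":8080/textbeat";
                    String dataSyncs = fetchDataSyncs(url);
                    if (machine.isEligibleForLogging(dataSyncs)) {
                        // zero for 5 consecutive minutes: log datacenter and machine
                        System.out.println(machine.getDataCenter() + " " + machine.getMachineName());
                    }
                }
            }
        }, 0, 1, TimeUnit.MINUTES);
    }

    // Hypothetical helper: execute the URL and pull data_syncs out of the response string.
    private static String fetchDataSyncs(String url) {
        return "0";
    }
}
Because the scheduler runs the task on a single thread once a minute, the zeroCount inside each Machine is only ever touched by that thread, which avoids the shared-counter problem from the original code.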

Related

How to use parallel processing in the most efficient and elegant way in Java

I have different sources of data which I want to request in parallel (since each of these requests is an HTTP call and may be pretty time consuming). But I'm only going to use one response from these requests, so I prioritize them: if the first response is invalid I'm going to check the second one, and if that is also invalid I want to use the third, etc.
But I want to stop processing and return the result as soon as I receive the first correct response.
To simulate the problem I created the following code, where I'm trying to use Java parallel streams. The problem is that I receive the final result only after processing all requests.
public class ParallelExecution {

    private static Supplier<Optional<Integer>> testMethod(String strInt) {
        return () -> {
            Optional<Integer> result = Optional.empty();
            try {
                result = Optional.of(Integer.valueOf(strInt));
                System.out.printf("converted string %s to int %d\n",
                        strInt, result.orElse(null));
            } catch (NumberFormatException ex) {
                System.out.printf("CANNOT CONVERT %s to int\n", strInt);
            }
            try {
                int randomValue = result.orElse(10000);
                TimeUnit.MILLISECONDS.sleep(randomValue);
                System.out.printf("converted string %s to int %d in %d milliseconds\n",
                        strInt, result.orElse(null), randomValue);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return result;
        };
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        System.out.println("Starting program: " + start.toString());

        List<Supplier<Optional<Integer>>> listOfFunctions = new ArrayList<>();
        for (String arg : args) {
            listOfFunctions.add(testMethod(arg));
        }

        Integer value = listOfFunctions.parallelStream()
                .map(function -> function.get())
                .filter(optValue -> optValue.isPresent())
                .map(val -> {
                    System.out.println("************** VAL: " + val);
                    return val;
                })
                .findFirst().orElse(null).get();

        Instant end = Instant.now();
        Long diff = end.toEpochMilli() - start.toEpochMilli();
        System.out.println("final value:" + value + ", worked during " + diff + "ms");
    }
}
So when I execute the program using the following command:
$java ParallelExecution dfafj 34 1341 4656 dfad 245df 5767
I want to get the result "34" as soon as possible (after around 34 milliseconds), but in fact I'm waiting for more than 10 seconds.
Could you help to find the most efficient solution for this problem?
ExecutorService#invokeAny looks like a good option.
List<Callable<Optional<Integer>>> tasks = listOfFunctions
        .stream()
        .<Callable<Optional<Integer>>>map(f -> f::get)
        .collect(Collectors.toList());

ExecutorService service = Executors.newCachedThreadPool();
Optional<Integer> value = service.invokeAny(tasks);
service.shutdown();
I converted your List<Supplier<Optional<Integer>>> into a List<Callable<Optional<Integer>>> to be able to pass it to invokeAny; you could also build Callables from the start. Then I created an ExecutorService and submitted the tasks.
invokeAny returns the result of the first task that completes successfully; the remaining tasks are then interrupted.
You also may want to look into CompletionService.
List<Callable<Optional<Integer>>> tasks = Arrays
        .stream(args)
        .<Callable<Optional<Integer>>>map(arg -> () -> testMethod(arg).get())
        .collect(Collectors.toList());

final ExecutorService underlyingService = Executors.newCachedThreadPool();
final ExecutorCompletionService<Optional<Integer>> service =
        new ExecutorCompletionService<>(underlyingService);
tasks.forEach(service::submit);

Optional<Integer> value = service.take().get();
underlyingService.shutdownNow();
You can use a queue to put your results in:
private static void testMethod(String strInt, BlockingQueue<Integer> queue) {
    // your code, but instead of returning anything:
    result.ifPresent(queue::add);
}
and then call it with
for (String s : args) {
    CompletableFuture.runAsync(() -> testMethod(s, queue));
}
Integer result = queue.take();
Note that this will only handle the first result, as in your sample.
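Put together end to end, the queue variant might look roughly like this; it is only a sketch, with the sleep/HTTP work omitted, and it blocks forever if no input is valid:
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class FirstResultViaQueue {

    private static void testMethod(String strInt, BlockingQueue<Integer> queue) {
        Optional<Integer> result = Optional.empty();
        try {
            result = Optional.of(Integer.valueOf(strInt));
        } catch (NumberFormatException ignored) {
            // invalid input contributes nothing to the queue
        }
        result.ifPresent(queue::add);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (String s : args) {
            CompletableFuture.runAsync(() -> testMethod(s, queue));
        }
        Integer result = queue.take(); // blocks until the first valid result arrives
        System.out.println("first valid result: " + result);
    }
}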
I have tried it using CompletableFutures and the anyOf method. It will return when any one of the futures is completed. Now, the key to stopping the other tasks is to provide your own executor service to the CompletableFutures and shut it down when required.
public static void main(String[] args) {
    Instant start = Instant.now();
    System.out.println("Starting program: " + start.toString());

    CompletableFuture<Optional<Integer>>[] completableFutures = new CompletableFuture[args.length];
    ExecutorService es = Executors.newFixedThreadPool(args.length, r -> {
        Thread t = new Thread(r);
        t.setDaemon(false);
        return t;
    });

    for (int i = 0; i < args.length; i++) {
        completableFutures[i] = CompletableFuture.supplyAsync(testMethod(args[i]), es);
    }

    CompletableFuture.anyOf(completableFutures)
            .thenAccept(res -> {
                System.out.println("Result - " + res + ", Time Taken : "
                        + (Instant.now().toEpochMilli() - start.toEpochMilli()));
                es.shutdownNow();
            });
}
PS: It will throw InterruptedExceptions that you can catch and ignore (without printing the stack trace). Also, your thread pool size should ideally be the same as the length of the args array.

Multiple Threads in a thread pool writing data in same List

I have multiple threads running in my thread pool. Each thread reads a huge file and returns the data from this file in a List.
Code looks like :
class Writer {
    List<Integer> finalListWhereDataWillBeWritten = new ArrayList<Integer>();

    void submitAll() {
        for (Query q : allQueries) { // all the read queries to read files
            threadPool.submit(new GetDataFromFile(fileName, filePath));
        }
    } // all the read queries have been submitted
}
Now I know that the following section of code has to occur somewhere in my code, but I don't know where to place it. If I place it just after submit() inside the for loop it won't add anything, because each file is very huge and may not have finished processing yet.
synchronized (finalListWhereDataWillBeWritten) {
    // process the data obtained from a single file and add it to the target list
    finalListWhereDataWillBeWritten.addAll(dataFromSingleThread);
}
So can anyone please tell me where to place this chunk of code, and what else I need to make sure of so that critical-section problems do not occur?
class GetDataFromFile implements Runnable<List<Integer>> {
    private String fileName;
    private String filePath;

    public List<Integer> run() {
        // code for streaming the file fileName
        return dataObtainedFromThisFile;
    }
}
And do I need to use the wait() / notifyAll() methods in my code, given that I'm only reading data from files in parallel threads and placing it in a shared List?
Instead of reinventing the wheel you should simply implement Callable<List<Integer>> and submit it to the JDK's standard Executor Service. Then, as the futures complete, you collect the results into the list.
final ExecutorService threadPool =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<List<Integer>>> futures = new ArrayList<>();

for (Query q : allQueries) {
    futures.add(threadPool.submit(new GetDataFromFile(fileName, filePath)));
}
for (Future<List<Integer>> f : futures) {
    finalListWhereDataWillBeWritten.addAll(f.get());
}
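For completeness, GetDataFromFile would then implement Callable rather than Runnable. A minimal sketch follows; the line-by-line integer parsing is just an assumption about the file format:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class GetDataFromFile implements Callable<List<Integer>> {
    private final String fileName;
    private final String filePath;

    GetDataFromFile(String fileName, String filePath) {
        this.fileName = fileName;
        this.filePath = filePath;
    }

    @Override
    public List<Integer> call() throws IOException {
        List<Integer> data = new ArrayList<Integer>();
        // Read the file and collect one integer per line (replace with the real parsing logic).
        for (String line : Files.readAllLines(Paths.get(filePath, fileName), StandardCharsets.UTF_8)) {
            data.add(Integer.parseInt(line.trim()));
        }
        return data;
    }
}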
And this is all assuming you are below Java 8. With Java 8 you would of course use a parallel stream:
final List<Integer> finalListWhereDataWillBeWritten =
        allQueries.parallelStream()
                  .flatMap(q -> getDataFromFile(q.fileName, q.filePath).stream())
                  .collect(toList());
UPDATE: Please consider the answer provided by Marko, which is far better.
If you want to ensure that your threads all complete before you work on your list, do the following:
import java.util.List;
import java.util.Vector;

public class ThreadWork {
    public static void main(String[] args) {
        int count = 5;
        Thread[] threads = new ListThread[count];
        List<String> masterList = new Vector<String>();

        for (int index = 0; index < count; index++) {
            threads[index] = new ListThread(masterList, "Thread " + (index + 1));
            threads[index].start();
        }

        while (isOperationRunning(threads)) {
            // do nothing
        }

        System.out.println("Done!! Print Your List ...");
        for (String item : masterList) {
            System.out.println("[" + item + "]");
        }
    }

    private static boolean isOperationRunning(Thread[] threads) {
        boolean running = false;
        for (Thread thread : threads) {
            if (thread.isAlive()) {
                running = true;
                break;
            }
        }
        return running;
    }
}

class ListThread extends Thread {
    private static String items[] = { "A", "B", "C", "D" };
    private List<String> list;
    private String name;

    public ListThread(List<String> masterList, String threadName) {
        list = masterList;
        name = threadName;
    }

    public void run() {
        for (int i = 0; i < items.length; ++i) {
            randomWait();
            String data = "Thread [" + name + "][" + items[i] + "]";
            System.out.println(data);
            list.add(data);
        }
    }

    private void randomWait() {
        try {
            Thread.sleep((long) (3000 * Math.random()));
        } catch (InterruptedException x) {
        }
    }
}
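As an alternative to the busy-wait while(isOperationRunning(threads)) loop above (my own suggestion, not part of the original answer), the main method could simply join() each thread, which blocks without burning CPU:
// Drop-in replacement for the while(isOperationRunning(threads)) loop in main above.
for (Thread thread : threads) {
    try {
        thread.join(); // wait for this worker to finish
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status and stop waiting
        break;
    }
}
System.out.println("Done!! Print Your List ...");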

How to measure the time Spark needs to run an action on partitioned RDD?

I wrote a small Spark application which should measure the time that Spark needs to run an action on a partitioned RDD (combineByKey function to sum a value).
My problem is that the first iteration seems to work correctly (calculated duration ~25 ms), but the following ones take much less time (~5 ms). It seems to me that Spark persists the data without being asked to. Can I avoid that programmatically?
I have to know the duration Spark needs to calculate a new RDD (without any caching/persisting from earlier iterations) - I think the duration should always be about 20-25 ms.
To force recalculation I moved the SparkContext creation into the for loop, but this didn't change anything...
Thanks for your advice!
Here is my code, which seems to be persisting data somewhere:
public static void main(String[] args) {
    switchOffLogging();
    // now
    try {
        // Setup: Read out parameters & initialize SparkContext
        String path = args[0];
        SparkConf conf = new SparkConf(true);
        JavaSparkContext sc;

        // Create output file & writer
        System.out.println("\npar.\tCount\tinput.p\tcons.p\tTime");

        // The RDDs used for the benchmark
        JavaRDD<String> input = null;
        JavaPairRDD<Integer, String> pairRDD = null;
        JavaPairRDD<Integer, String> partitionedRDD = null;
        JavaPairRDD<Integer, Float> consumptionRDD = null;

        // Do the tasks iteratively (10 times the same benchmark for testing)
        for (int i = 0; i < 10; i++) {
            boolean partitioning = true;
            int partitionsCount = 8;

            sc = new JavaSparkContext(conf);
            setS3credentials(sc, path);

            input = sc.textFile(path);
            pairRDD = mapToPair(input);
            partitionedRDD = partition(pairRDD, partitioning, partitionsCount);

            // Measure the duration
            long duration = System.currentTimeMillis();
            // Do the relevant function
            consumptionRDD = partitionedRDD.combineByKey(createCombiner, mergeValue, mergeCombiners);
            duration = System.currentTimeMillis() - duration;

            // Some action to invoke the calculation
            System.out.println(consumptionRDD.collect().size());

            // Print the results
            System.out.println("\n" + partitioning + "\t" + partitionsCount + "\t" + input.partitions().size() + "\t" + consumptionRDD.partitions().size() + "\t" + duration + " ms");

            input = null;
            pairRDD = null;
            partitionedRDD = null;
            consumptionRDD = null;
            sc.close();
            sc.stop();
        }
    } catch (Exception e) {
        e.printStackTrace();
        System.out.println(e.getMessage());
    }
}
Some helper functions (should not be the problem):
private static void switchOffLogging() {
    Logger.getLogger("org").setLevel(Level.OFF);
    Logger.getLogger("akka").setLevel(Level.OFF);
}

private static void setS3credentials(JavaSparkContext sc, String path) {
    if (path.startsWith("s3n://")) {
        Configuration hadoopConf = sc.hadoopConfiguration();
        hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
        hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
        hadoopConf.set("fs.s3n.awsAccessKeyId", "mycredentials");
        hadoopConf.set("fs.s3n.awsSecretAccessKey", "mycredentials");
    }
}

// Initial element
private static Function<String, Float> createCombiner = new Function<String, Float>() {
    public Float call(String dataSet) throws Exception {
        String[] data = dataSet.split(",");
        float value = Float.valueOf(data[2]);
        return value;
    }
};

// Merging function for a new dataset
private static Function2<Float, String, Float> mergeValue = new Function2<Float, String, Float>() {
    public Float call(Float sumYet, String dataSet) throws Exception {
        String[] data = dataSet.split(",");
        float value = Float.valueOf(data[2]);
        sumYet += value;
        return sumYet;
    }
};

// Function to sum the consumption
private static Function2<Float, Float, Float> mergeCombiners = new Function2<Float, Float, Float>() {
    public Float call(Float a, Float b) throws Exception {
        a += b;
        return a;
    }
};

private static JavaPairRDD<Integer, String> partition(JavaPairRDD<Integer, String> pairRDD, boolean partitioning, int partitionsCount) {
    if (partitioning) {
        return pairRDD.partitionBy(new HashPartitioner(partitionsCount));
    } else {
        return pairRDD;
    }
}

private static JavaPairRDD<Integer, String> mapToPair(JavaRDD<String> input) {
    return input.mapToPair(new PairFunction<String, Integer, String>() {
        public Tuple2<Integer, String> call(String debsDataSet) throws Exception {
            String[] data = debsDataSet.split(",");
            int houseId = Integer.valueOf(data[6]);
            return new Tuple2<Integer, String>(houseId, debsDataSet);
        }
    });
}
And finally the output of the Spark console:
part. Count input.p cons.p Time
true 8 6 8 20 ms
true 8 6 8 23 ms
true 8 6 8 7 ms // Too low!!!
true 8 6 8 21 ms
true 8 6 8 13 ms
true 8 6 8 6 ms // Too low!!!
true 8 6 8 5 ms // Too low!!!
true 8 6 8 6 ms // Too low!!!
true 8 6 8 4 ms // Too low!!!
true 8 6 8 7 ms // Too low!!!
I found a solution for myself now: I wrote a separate class which calls the spark-submit command in a new process. This can be done in a loop, so every benchmark run starts in its own process and the SparkContext is separate per run. So garbage collection is done and everything works fine!
String submitCommand = "/root/spark/bin/spark-submit " + submitParams
        + " --class partitioning.PartitionExample /root/partitioning.jar " + javaFlags;
Process p = Runtime.getRuntime().exec(submitCommand);

// Read the child's output before waiting for it to exit, so its output buffer cannot fill up
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
System.out.println(p.waitFor());
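A slightly more robust variant of the same idea (my illustration, not part of the original answer) uses ProcessBuilder and inherits the parent's console, so there is no output pipe to drain at all; the jar path, class name and flags below are just placeholders:
import java.io.IOException;

public class SubmitRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "/root/spark/bin/spark-submit",
                "--class", "partitioning.PartitionExample",
                "/root/partitioning.jar");
        pb.inheritIO();             // child stdout/stderr go straight to this console
        Process p = pb.start();
        int exitCode = p.waitFor(); // safe: there is no pipe for us to drain
        System.out.println("spark-submit exited with " + exitCode);
    }
}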
If the shuffle output is small enough, the Spark shuffle files will be written to the OS buffer cache, since fsync is not explicitly called. This means that, as long as there is room, your data will remain in memory.
If a cold performance test is truly necessary, you can try something like flushing the OS disk cache, but that is going to slow things down between each test. Could you just spin the context up and down? That might solve your need.

How to avoid using dynamic variables / a billion if statements in Java?

So, since dynamic variable names aren't a thing in Java, and if statements would be horribly unwieldy, I was looking for help converting this code block into a more concise one.
I looked into HashMaps, and they just didn't seem quite right; it's highly likely I was misunderstanding them, though.
public String m1 = "Name1";
public String m1ip = "192.1.1.1";
public String m2 = "Name2";
public String m2ip = "192.1.1.1";
public String req;
public String reqip;
... snip some code...
if (requestedMachine == 1) {
    req = m1;  reqip = m1ip;
} else if (requestedMachine == 2) {
    req = m2;  reqip = m2ip;
} else if (requestedMachine == 3) {
    req = m3;  reqip = m3ip;
} else if (requestedMachine == 4) {
    req = m4;  reqip = m4ip;
} else if (requestedMachine == 5) {
    req = m5;  reqip = m5ip;
}
requestedMachine is going to be an integer, that defines which values should be assigned to req & reqip.
Thanks in advance.
Define a Machine class, containing a name and an ip field. Create an array of Machine. Access the machine located at the index requestedMachine (or requestedMachine - 1 if the number starts at 1):
Machine[] machines = new Machine[] {
    new Machine("Name1", "192.1.1.1"),
    new Machine("Name2", "192.1.1.1"),
    ...
};
...
Machine machine = machines[requestedMachine - 1];
First, create a Machine class:
class Machine {
String name;
String ip;
//Constructor, getters, setters etc omitted
}
Initialize an array of Machines:
Machine[] machines = ... //initialize them with values
Get the machine corresponding to requestedMachine:
Machine myMachine = machines[requestedMachine];
This is a great candidate for an enum:
/**
   <P>{@code java EnumDeltaXmpl}</P>
 **/
public class EnumDeltaXmpl {
    public static final void main(String[] ingo_red) {
        test(MachineAction.ONE);
        test(MachineAction.TWO);
        test(MachineAction.THREE);
        test(MachineAction.FOUR);
    }

    private static final void test(MachineAction m_a) {
        System.out.println("MachineAction." + m_a + ": name=" + m_a.sName + ", ip=" + m_a.sIP + "");
    }
}

enum MachineAction {
    ONE("Name1", "192.1.1.1"),
    TWO("Name2", "292.2.2.2"),
    THREE("Name3", "392.3.3.3"),
    FOUR("Name4", "492.4.4.4"),
    FIVE("Name5", "592.5.5.5");

    public final String sName;
    public final String sIP;

    private MachineAction(String s_name, String s_ip) {
        sName = s_name;
        sIP = s_ip;
    }
}
Output:
[C:\java_code\]java EnumDeltaXmpl
MachineAction.ONE: name=Name1, ip=192.1.1.1
MachineAction.TWO: name=Name2, ip=292.2.2.2
MachineAction.THREE: name=Name3, ip=392.3.3.3
MachineAction.FOUR: name=Name4, ip=492.4.4.4
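To bridge the original requestedMachine integer to this enum, one option (assuming the constants are declared in the same order as the numbers, starting at 1) is values():
MachineAction requested = MachineAction.values()[requestedMachine - 1];
String req = requested.sName;
String reqip = requested.sIP;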
The best choice you have is to build an array of machines with name, IP, etc.; then you only need to find the requested machine in the array.
public class Machine {
    private String name, ip;

    public Machine(String name, String ip) {
        this.name = name;
        // You could validate the ip here
        this.ip = ip;
    }
}

public class Machines {
    private Machine[] machines;
    private int numberOfMachines;

    public Machines(int numberOfMachines) {
        // define the number of machines and the length of the array
        this.numberOfMachines = numberOfMachines;
        this.machines = new Machine[numberOfMachines];
    }
}
And in main():
int numberOfMachines = 5;
Machine[] machines = new Machine[numberOfMachines];
machines[0] = new Machine("Name1", "192.1.1.1");
// ... one entry per machine
int number = 5;
if (number < 0 || number >= numberOfMachines) {
    System.out.println("There is no machine with that number");
} else {
    System.out.println("That is the chosen machine: " + machines[number]);
}
If your id values are not necessarily integers or if they are not a continuous sequence from 0 forward, you could also use a HashMap. Something like
HashMap<Integer, Machine> machines = new HashMap<>();
machines.put(1, machine1);
machines.put(7, machine7);
...
to get the desired value
Machine machine7 = machines.get(7);
You can replace the key with a String or whatever you like if needed. Your id values also do not need to run 0, 1, 2, 3, 4, 5, ... as they would if you were using an array. An example with String keys is shown below.
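For instance, keyed by machine name (reusing the Machine class from the answers above):
HashMap<String, Machine> machinesByName = new HashMap<>();
machinesByName.put("Name1", new Machine("Name1", "192.1.1.1"));
machinesByName.put("Name2", new Machine("Name2", "192.1.1.1"));
...
Machine requested = machinesByName.get("Name2");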

Start multiple threads at the same time

I have a list of different URLs (about 10) from which I need content. I have made a program that gets the content of one URL, but I am unable to do it for multiple URLs.
I've studied lots of tutorials on threads in Java but I'm unable to find an answer.
In my case, the URLs are like www.example1.com, www.example2.com, www.example3.com, www.example4.com.
I want to create a thread for each URL and run them all at the same time.
public class HtmlParser {

    public static int searchedPageCount = 0,
            skippedPageCount = 0,
            productCount = 0;

    public static void main(String[] args) {
        List<String> URLs = new LinkedList<String>();
        long t1 = System.currentTimeMillis();
        URLs.add("www.example.com");

        int i = 0;
        for (ListIterator iterator = URLs.listIterator(); i < URLs.size();) {
            i++;
            System.out.println("While loop");
            List<String> nextLevelURLs = processURL(URLs.get(iterator.nextIndex()));
            for (String URL : nextLevelURLs) {
                if (!URLs.contains(URL)) {
                    System.out.println(URL);
                    iterator.add(new String(URL));
                }
            }
            System.out.println(URLs.size());
        }

        System.out.println("Total products found: " + productCount);
        System.out.println("Total searched page: " + searchedPageCount);
        System.out.println("Total skipped page: " + skippedPageCount);

        long t2 = System.currentTimeMillis();
        System.out.println("Total time taken: " + (t2 - t1) / 60000);
    }

    public static List<String> processURL(String URL) {
        List<String> nextLevelURLs = new ArrayList<String>();
        try {
            searchedPageCount++;
            // System.out.println("Current URL: " + URL);
            Elements products = Jsoup.connect(URL).timeout(60000).get()
                    .select("div.product");
            for (Element product : products) {
                System.out.println(product.select(" a > h2").text());
                System.out.println(product.select(" a > h3").text());
                System.out.println(product.select(".product > a").attr("href"));
                System.out.println(product.select(".image a > img").attr("src"));
                System.out.println(product.select(".price").text());
                System.out.println();
                productCount++;
            }
            // System.out.println("Total products found until now: " + productCount);
            Elements links = Jsoup.connect(URL).timeout(60000).get()
                    .select("a[href]");
            for (Element link : links) {
                URL = link.attr("href");
                if (URL.startsWith("http://www.example.com/")) {
                    // System.out.println("URLs added.");
                    nextLevelURLs.add(URL);
                } else {
                    skippedPageCount++;
                    // System.out.println("URL skipped: " + URL);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return nextLevelURLs;
    }
}
Unfortunately, there is no way to start two threads at exactly the same time.
Let me explain better: first of all, in the sequence thread1.start(); thread2.start(); the call for thread1 is executed first and the one for thread2 after it. This only means that thread1 is scheduled before thread2, not that it actually begins running first. The two calls each take a fraction of a second, so the fact that they are in sequence cannot be observed by a human.
Moreover, Java threads are scheduled, i.e. assigned to be eventually executed. Even if you have a multi-core CPU, you cannot be sure that 1) the threads run in parallel (other system processes may interfere) and 2) both threads start immediately after start() is called.
But you can run multiple threads in this way:
new Thread(thread1).start();
new Thread(thread2).start();
Basically, create a class that implements Runnable and put the code that deals with one URL in it. In your main class, for each URL, construct an instance with the information it needs (e.g. the URL) and then start it.
There are plenty of sites that teach how to do multi-threaded Java.
First of all, the code you pasted looks bad because it is written as one straight procedural process. You need to turn it into OO form and then extend Thread (or implement Runnable), for example:
public class URLProcessor extends Thread {
    private String url;

    public URLProcessor(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        // your business logic to parse the site with "this.url" here
    }
}
And then use the main method to launch multiple ones:
public static void main(String[] args) {
    List<String> allmyurls = null; // get multiple urls from somewhere
    for (String url : allmyurls) {
        URLProcessor p = new URLProcessor(url);
        p.start();
    }
}
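If the number of URLs grows, a thread pool is an alternative to starting one raw Thread per URL. This is just a sketch of that idea (not from the answers above); processURL stands in for the asker's existing Jsoup-based method:
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFetcher {

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList(
                "http://www.example1.com", "http://www.example2.com",
                "http://www.example3.com", "http://www.example4.com");

        ExecutorService pool = Executors.newFixedThreadPool(4); // at most 4 URLs fetched concurrently
        for (final String url : urls) {
            pool.submit(new Runnable() {
                public void run() {
                    processURL(url); // the asker's existing Jsoup-based method would go here
                }
            });
        }
        pool.shutdown();                            // accept no new tasks, let submitted ones finish
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    // Placeholder for the real parsing logic.
    private static void processURL(String url) {
        System.out.println("fetched " + url);
    }
}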
