How to prevent hadoop stream from closing? - java

I built a basic web parser that uses Hadoop to hand off URLs to multiple threads. This works pretty well until I reach the end of my input file: Hadoop declares itself done while there are still threads running. This results in the error org.apache.hadoop.fs.FSError: java.io.IOException: Stream Closed. Is there any way to keep the stream open long enough for the threads to finish up? (I can predict, with reasonable accuracy, the maximum amount of time a thread will spend on a single URL.)
Here's how I execute the threads:
public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, Text> {
    private Text word = new Text();
    private URLPile pile = new URLPile();
    private MSLiteThread[] Threads = new MSLiteThread[16];
    private boolean once = true;

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter) {
        String url = value.toString();
        StringTokenizer urls = new StringTokenizer(url);
        Config.LoggerProvider = LoggerProvider.DISABLED;
        System.out.println("In Mapper");
        if (once) {
            for (MSLiteThread thread : Threads) {
                System.out.println("created thread");
                thread = new MSLiteThread(pile);
                thread.start();
            }
            once = false;
        }
        while (urls.hasMoreTokens()) {
            try {
                word.set(urls.nextToken());
                String currenturl = word.toString();
                pile.addUrl(currenturl, output);
            } catch (Exception e) {
                e.printStackTrace();
                continue;
            }
        }
    }
The threads themselves get the URLs like this:
public void run() {
    try {
        sleep(3000);
        while (!done()) {
            try {
                System.out.println("in thread");
                MSLiteURL tempURL = pile.getNextURL();
                String currenturl = tempURL.getURL();
                urlParser.parse(currenturl);
                urlText.set("");
                titleText.set(currenturl + urlParser.export());
                System.out.println(urlText.toString() + titleText.toString());
                tempURL.getOutput().collect(urlText, titleText);
                pile.doneParsing();
                sleep(30);
            } catch (Exception e) {
                pile.doneParsing();
                e.printStackTrace();
                continue;
            }
        }
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    System.out.println("Thread done");
}
And the relevant methods in URLPile are:
public synchronized void addUrl(String url, OutputCollector<Text, Text> output) throws InterruptedException {
    while (queue.size() > 16) {
        System.out.println("queue full");
        wait();
    }
    finishedParcing--;
    queue.add(new MSLiteURL(output, url));
    notifyAll();
}

private Queue<MSLiteURL> queue = new LinkedList<MSLiteURL>();
private int sent = 0;
private int finishedParcing = 0;

public synchronized MSLiteURL getNextURL() throws InterruptedException {
    notifyAll();
    sent++;
    //System.out.println(queue.peek());
    return queue.remove();
}

As I can infer from the comments, you can probably do this inside map() to keep things easy.
I saw that you do the following to pre-create some idle threads.
You can move this code:
if (once) {
    for (MSLiteThread thread : Threads) {
        System.out.println("created thread");
        thread = new MSLiteThread(pile);
        thread.start();
    }
    once = false;
}
into configure(), like this:
public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void configure(JobConf job) {
        for (MSLiteThread thread : Threads) {
            System.out.println("created thread");
            thread = new MSLiteThread(pile);
            thread.start();
        }
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter) {
    }
}
This way the threads get initialized only once, so you don't need the 'once' condition check anymore.
Moreover, you don't need to create idle threads as above.
I don't know how much of a performance gain you'll get from creating 16 idle threads like that.
Anyway, here is a solution (it may not be perfect, though).
You can use something like a CountDownLatch to process your URLs in batches of N and block until they are done. The reason is that if you hand each incoming URL record off to a thread, the next URL will be fetched immediately, and when you hand off the last URL the same way, the map() function can return while there are still threads processing items in the queue. You'll inevitably get the exception you mentioned.
Here is an example of how you could block using a CountDownLatch.
public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter) {
        String url = value.toString();
        StringTokenizer urls = new StringTokenizer(url);
        Config.LoggerProvider = LoggerProvider.DISABLED;
        // set the CountDownLatch to urls.countTokens() so we can block on that many threads
        final CountDownLatch latch = new CountDownLatch(urls.countTokens());
        while (urls.hasMoreTokens()) {
            try {
                word.set(urls.nextToken());
                String currenturl = word.toString();
                // create a thread for the current URL and fire it here
                Thread thread = new Thread(new URLProcessingThread(currenturl, latch));
                thread.start();
            } catch (Exception e) {
                e.printStackTrace();
                continue;
            }
        }
        try {
            latch.await(); // wait for all the URL threads to complete execution
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // sleep here for some time if you wish
    }
}
Finally, in URLProcessingThread, decrement the latch counter as soon as a URL has been processed:
public class URLProcessingThread implements Runnable {
    CountDownLatch latch;
    String url;

    public URLProcessingThread(String url, CountDownLatch latch) {
        this.latch = latch;
        this.url = url;
    }

    @Override
    public void run() {
        // process the url here
        // after everything finishes, decrement the latch
        latch.countDown(); // reduce the count of the CountDownLatch by 1
    }
}
Probable problems seen in your code:
At pile.addUrl(currenturl, output): when you add a new URL, all 16 threads see the update (I'm not entirely sure), because the same pile object is passed to all 16 threads. There is a chance that your URLs get re-processed, or that you get some other side effects (I'm not entirely sure about that either).
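One more thing I would double-check in the URLPile code shown in the question (my own reading, not something covered above): getNextURL() never waits when the queue is empty, so a consumer thread can call queue.remove() on an empty queue and get a NoSuchElementException. A hedged sketch of a guarded version, assuming the same queue field:

public synchronized MSLiteURL getNextURL() throws InterruptedException {
    // Block until a URL is actually available instead of removing from an empty queue.
    while (queue.isEmpty()) {
        wait();
    }
    sent++;
    MSLiteURL next = queue.remove();
    notifyAll(); // wake up addUrl() if it is blocked because the queue was full
    return next;
}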
Another suggestion:
Additionally, you may want to increase the map task timeout using
mapred.task.timeout
(default = 600000 ms = 10 minutes)
Description: the number of milliseconds before a task is terminated if it neither reads an input, writes an output, nor updates its status string.
You can add/override this property in mapred-site.xml
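If you prefer to set it per job, here is a minimal sketch using the old mapred API that this question already uses (the JobConf variable and the 30-minute value are illustrative, not from the original answer):

JobConf conf = new JobConf(Map.class);
// Raise the task timeout to 30 minutes; the value is in milliseconds.
conf.set("mapred.task.timeout", "1800000");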

Related

Synchronizing threads to send and receive data from arduino

I'm creating a method to send and receive data from an Arduino, and I then call this method in a thread. I use a ThreadPoolExecutor to execute this thread 3 times, because I have 3 JTextFields and each one needs to show the received data. When I execute them, only one is shown. I tried different methods to synchronize these threads, but with no result. Thank you for your help.
This is the method:
synchronized void sendReceive(String txt, JTextField txt_) {
    PrintWriter output = new PrintWriter(OpenPort.getPort().getOutputStream());
    cr = txt;
    output.print(cr);
    output.flush();
    OpenPort.getPort().addDataListener(new SerialPortDataListener() {
        @Override
        public int getListeningEvents() { return SerialPort.LISTENING_EVENT_DATA_AVAILABLE; }

        @Override
        public void serialEvent(SerialPortEvent event) {
            Scanner data = new Scanner(OpenPort.getPort().getInputStream());
            while (data.hasNextLine()) {
                String msg = null;
                Object item = null;
                try { msg = data.nextLine(); } catch (Exception e) {}
                if (msg != null) {
                    txt_.setText(msg);
                    list.add(msg);
                    System.out.println(list);
                    System.out.println(item);
                }
                msg = null;
            }
        }
    });
}
The thread in which I call it:
class ThreadDemo extends Thread {
    String cr = null;
    JTextField txt_;
    String txt;
    private Thread t;

    ThreadDemo(String txt, JTextField txt_) {
        this.txt = txt;
        this.txt_ = txt_;
    }

    @Override
    public void run() {
        sendReceive(txt, txt_);
        try { Thread.sleep(1000000); } catch (InterruptedException ie) {}
    }
}
and then I use the ThreadPoolExecutor to execute the threads in the constructor of my first class:
//ip
executor.execute(new ThreadDemo("ee_get:100",SCAN.ipAddress_txt));
//port
executor.execute(new ThreadDemo("ee_get:160",SCAN.port_txt));
//intervale
executor.execute(new ThreadDemo("ee_get:60",SCAN.intervale_txt));
executor.shutdown();
Please, any help would be appreciated; I have spent a lot of time trying to resolve this problem, with no result.
Normally I must send 3 commands to the Arduino and receive data 3 times, and each piece of received data must be shown in its own JTextField. However, I only see received data in one JTextField.

OutOfMemoryError - No trace in the console

I call the testMethod below, after putting it into a Callable (along with a few other Callable tasks), from an ExecutorService. I suspect that map.put() suffers an OutOfMemoryError, as I'm trying to put some 20 million entries.
But I'm not able to see the error trace in the console; the thread just stands still. I tried to catch the Error (I know we shouldn't, but I did it for debugging), but the error is not caught: execution goes directly into finally, stops, and the thread stands still.
private HashMap<String, Integer> testMethod(String file) {
    try {
        in = new FileInputStream(new File(file));
        br = new BufferedReader(new InputStreamReader(in), 102400);
        for (String line; (line = br.readLine()) != null;) {
            map.put(line.substring(1, 17),
                    Integer.parseInt(line.substring(18, 20)));
        }
        System.out.println("Loop End"); // Not executed
    } catch (Error e) {
        e.printStackTrace(); // Not executed
    } finally {
        System.out.println(map.size()); // Executed
        br.close();
        in.close();
    }
    return map;
}
What could be the mistake I'm making?
EDIT: This is how I execute the Thread.
Callable<Void> callable1 = new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        testMethod(inputFile);
        return null;
    }
};
Callable<Void> callable2 = new Callable<Void>() {
    @Override
    public Void call() throws Exception {
        testMethod1();
        return null;
    }
};
List<Callable<Void>> taskList = new ArrayList<Callable<Void>>();
taskList.add(callable1);
taskList.add(callable2);
// create a pool executor with 3 threads
ExecutorService executor = Executors.newFixedThreadPool(3);
List<Future<Void>> future = executor.invokeAll(taskList);
//executor.invokeAll(taskList);
latch.await();
future.get(0); future.get(1); // Added this as per SubOptimal's comment
But this future.get() didn't show the OOME in the console.
You should not throw away the future after submitting the Callable.
Future future = pool.submit(callable);
future.get(); // this would show you the OOME
An example, based on the information from the requester, to demonstrate:
public static void main(String[] args) throws InterruptedException, ExecutionException {
    Callable<Void> callableOOME = new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            System.out.println("callableOOME");
            HashMap<String, Integer> map = new HashMap<>();
            // some code to force an OOME
            try {
                for (int i = 0; i < 10_000_000; i++) {
                    map.put(Integer.toString(i), i);
                }
            } catch (Error e) {
                e.printStackTrace();
            } finally {
                System.out.println("callableOOME: map size " + map.size());
            }
            return null;
        }
    };

    Callable<Void> callableNormal = new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            System.out.println("callableNormal");
            // some code to have a short "processing time"
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException ex) {
                System.err.println(ex.getMessage());
            }
            return null;
        }
    };

    List<Callable<Void>> taskList = new ArrayList<>();
    taskList.add(callableOOME);
    taskList.add(callableNormal);
    ExecutorService executor = Executors.newFixedThreadPool(3);
    List<Future<Void>> future = executor.invokeAll(taskList);
    System.out.println("get future 0: ");
    future.get(0).get();
    System.out.println("get future 1: ");
    future.get(1).get();
}
Try catching Throwable, as it could be an Exception like IOException or NullPointerException; Throwable captures everything except System.exit().
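A minimal sketch of that, wrapping the call from the question (testMethod and inputFile are the names used in the question):

try {
    testMethod(inputFile);
} catch (Throwable t) {
    // Catches Exceptions as well as Errors such as OutOfMemoryError.
    t.printStackTrace();
}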
Another possibility is that the thread doesn't die; instead it becomes slower and slower because it is almost, but never completely, out of memory. You should be able to see this with a stack dump, or by using jvisualvm while it is running.
BTW, unless all your strings are exactly 16 characters long, you might like to call trim() on them to remove any padding. This could make them shorter and use less memory.
I assume you are using a recent version of Java 7 or 8. If you are using Java 6 or older, it will use more memory, because .substring() there doesn't create a new underlying char[] (a CPU saving that, in this case, wastes memory).
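If you are stuck on an older Java where substring() shares the parent's char[], one hedged workaround (adapted from the loop in the question) is to copy the key explicitly so the full line can be garbage collected:

// On Java 6 and older, copying the substring into a new String drops the
// reference to the original line's large char[].
String key = new String(line.substring(1, 17).trim());
map.put(key, Integer.parseInt(line.substring(18, 20).trim()));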

Text is not getting printed once the Threads are done [duplicate]

This question already has answers here:
How to wait for all threads to finish, using ExecutorService?
(27 answers)
Closed 8 years ago.
Please have a look at the following code.
public class BigFileWholeProcessor {
    private static final int NUMBER_OF_THREADS = 2;

    public void processFile(String fileName) {
        BlockingQueue<String> fileContent = new LinkedBlockingQueue<String>();
        BigFileReader bigFileReader = new BigFileReader(fileName, fileContent);
        BigFileProcessor bigFileProcessor = new BigFileProcessor(fileContent);
        ExecutorService es = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
        es.execute(bigFileReader);
        es.execute(bigFileProcessor);
        es.shutdown();
        if (es.isTerminated()) {
            System.out.println("Completed Work");
        }
    }
}

public class BigFileReader implements Runnable {
    private final String fileName;
    int a = 0;
    public static final String SENTINEL = "SENTINEL";
    private final BlockingQueue<String> linesRead;

    public BigFileReader(String fileName, BlockingQueue<String> linesRead) {
        this.fileName = fileName;
        this.linesRead = linesRead;
    }

    @Override
    public void run() {
        try {
            //since it is a sample, I avoid the manage of how many lines you have read
            //and that stuff, but it should not be complicated to accomplish
            BufferedReader br = new BufferedReader(new FileReader(new File("E:/Amazon HashFile/Hash.txt")));
            String str = "";
            while ((str = br.readLine()) != null) {
                linesRead.put(str);
                System.out.println(a);
                a++;
            }
            linesRead.put(SENTINEL);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.out.println("Completed");
    }
}

public class BigFileProcessor implements Runnable {
    private final BlockingQueue<String> linesToProcess;

    public BigFileProcessor(BlockingQueue<String> linesToProcess) {
        this.linesToProcess = linesToProcess;
    }

    @Override
    public void run() {
        String line = "";
        try {
            while ((line = linesToProcess.take()) != null) {
                //do what you want/need to process this line...
                if (line == BigFileReader.SENTINEL) {
                    break;
                }
                String[] pieces = line.split("(...)/g");
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
I want to print the text "Completed Work" in BigFileWholeProcessor once all the thread work is done, but it is not getting printed. Why is this? How can I tell that all the threads are done, so that I know when to print?
shutdown() only signals the ExecutorService to shut down; you need
awaitTermination(long timeout, TimeUnit unit)
before printing the message.
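A minimal sketch of that, using the es and message from the question (the 10-minute timeout is an arbitrary choice):

es.shutdown(); // stop accepting new tasks; already-submitted tasks keep running
try {
    if (es.awaitTermination(10, TimeUnit.MINUTES)) {
        System.out.println("Completed Work");
    } else {
        System.out.println("Timed out before the workers finished");
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}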
Use the submit() method instead of execute(). The get() method can be used if you want to wait for the task to finish at any point in time. Read the documentation on the Future object for further details.
ExecutorService es = Executors.newFixedThreadPool(2);
Future<?> f = es.submit(new Thread(new TestRun()));
f.get(); // Wait for result... (i.e similar to `join()` in this case)
es.shutdown(); // Shutdown ExecutorService
System.out.println("Done.");
I have defined a TestRun class implementing Runnable, not shown here. The Future object makes more sense in other scenarios.

Java, Multi Threading, ExecutorService

Following are some parts of my code, which uses threading. The purpose is to retrieve all the records from the database (approx. 500,000) and send them alert email messages. The problem I am facing is that the emailRecords variable becomes very heavy and sending the email messages takes too much time. How can I make it faster with multi-threading, so that the 500,000 records are processed in parallel? I tried to use ExecutorService but got confused implementing it: I got mixed up between the methods checkName(), getRecords() and sendAlert(), all three of which are involved. So, where should I use the ExecutorService?
Please suggest how to proceed with the following code and which parts need editing. Thanks in advance!
public class sampledaemon implements Runnable {
    private static List<String[]> emailRecords = new ArrayList<String[]>();

    public static void main(String[] args) {
        if (args.length != 1) {
            return;
        }
        countryName = args[0];
        try {
            Thread t = null;
            sampledaemon daemon = new sampledaemon();
            t = new Thread(daemon);
            t.start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void run() {
        Thread thisThread = Thread.currentThread();
        try {
            while (true) {
                checkName(countryName);
                Thread.sleep(TimeUnit.SECONDS.toMillis(10));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void checkName(String countryName) throws Exception {
        Country country = CountryPojo.getDetails(countryName);
        if (country != null) {
            getRecords(country, connection);
        }
    }

    private void getRecords(Country country, Connection con) {
        String users[] = null;
        while (rs.next()) {
            users = new String[2];
            users[0] = rs.getString("userid");
            users[1] = rs.getString("emailAddress");
            emailRecords.add(users);
            if (emailRecords.size() > 0) {
                sendAlert(date, con);
            }
        }
    }

    void sendAlert(String date, Connection con) {
        for (int k = 0; k < emailRecords.size(); k++) {
            //check the emailRecords and send email
        }
    }
}
From what I can tell, you most likely want single-threaded data retrieval and multi-threaded e-mail sending. Roughly, you cycle through your result set and build a list of records. When that list hits a certain size, you make a copy, hand that copy off to be processed by a thread, and clear the original list. At the end of the result set, check whether you have unprocessed records left in the list, and send those to the pool as well.
Finally, wait for the threadpool to finish processing all records.
Something along these lines:
protected void processRecords(String countryName) {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 10, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(5), new ThreadPoolExecutor.CallerRunsPolicy());
    List<String[]> emaillist = new ArrayList<String[]>(1000);
    ResultSet rs = ....
    try {
        while (rs.next()) {
            String[] user = new String[2];
            user[0] = rs.getString("userid");
            user[1] = rs.getString("emailAddress");
            emaillist.add(user);
            if (emaillist.size() == 1000) {
                final List<String[]> elist = new ArrayList<String[]>(emaillist);
                executor.execute(new Runnable() {
                    public void run() {
                        sendMail(elist);
                    }
                });
                emaillist.clear();
            }
        }
    } finally {
        DbUtils.close(rs);
    }
    if (!emaillist.isEmpty()) {
        final List<String[]> elist = emaillist;
        executor.execute(new Runnable() {
            public void run() {
                sendMail(elist);
            }
        });
        emaillist.clear();
    }
    // wait for all the e-mails to finish.
    while (!executor.isTerminated()) {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.DAYS);
    }
}
The advantage of using a FixedThreadPool is that you don't have to go through the expensive process of creating threads again and again; it is done once at the beginning. See below:
ExecutorService executor = Executors.newFixedThreadPool(100);
ArrayList<String> arList = ...; // your email addresses from the DB go in here
for (String s : arList) {
    executor.execute(new EmailAlert(s));
}

public class EmailAlert implements Runnable {
    String addr;

    public EmailAlert(String eAddr) {
        this.addr = eAddr;
    }

    public void run() {
        // Do the process of sending the email here..
    }
}
Creating a second thread to do all of the work in, instead of doing the same work in the main thread, isn't going to help you avoid the problem of filling up the emailRecords list with 500,000 records before processing any of them.
It sounds like your goal is to be able to read from the database and send email in parallel. Instead of worrying about the code, first think of an algorithm for the work you want to accomplish. Something like this:
In one thread, query for the records from the database, and for each result, add one job to an ExecutorService
That job sends email to one person/address/record (see the sketch after this list).
or alternatively
Read records from the database in batches of N (50, 100, 1000, etc)
Submit each batch to the executorService
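A hedged sketch of the first variant; sendAlertTo() is a hypothetical helper standing in for the body of your sendAlert() loop, and the column names come from the question's code:

ExecutorService pool = Executors.newFixedThreadPool(10);
while (rs.next()) {
    final String userId = rs.getString("userid");
    final String email = rs.getString("emailAddress");
    // one submitted job per record; each job sends one email
    pool.execute(new Runnable() {
        public void run() {
            sendAlertTo(userId, email);
        }
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS); // wait for all the emails to go out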

Telling a ThreadPoolExecutor when it should go ahead or not

I have to send a set of files to several computers through a certain port. Each time the method that sends the files is called, the destination data (address and port) is calculated. So I use a loop that creates a thread for each method call, and I surround the method call with a try-catch for BindException to handle the case where the program tries to use a port that is already in use (different destination addresses may receive the message through the same port): the thread waits a few seconds and then retries, and keeps retrying until the exception is no longer thrown (that is, until the shipment is performed successfully).
I didn't know why (although I could guess when I first saw it), but NetBeans warned me that sleeping a Thread object inside a loop is not the best choice. I then googled for further information and found a link to another Stack Overflow post, which looked very interesting (I had never heard of the ThreadPoolExecutor class). I've been reading both that post and the API docs trying to improve my program, but I'm still not sure how I'm supposed to apply that to my program. Could anybody give me a helping hand with this, please?
EDIT: The important code:
for (Iterator<String> it = ConnectionsPanel.list.getSelectedValuesList().iterator(); it.hasNext();) {
    final String x = it.next();
    new Thread() {
        @Override
        public void run() {
            ConnectionsPanel.singleAddVideos(x);
        }
    }.start();
}

private static void singleAddVideos(String connName) {
    String newVideosInfo = "";
    for (Iterator<Video> it = ConnectionsPanel.videosToSend.iterator(); it.hasNext();) {
        newVideosInfo = newVideosInfo.concat(it.next().toString());
    }
    try {
        MassiveDesktopClient.sendMessage("hi", connName);
        if (MassiveDesktopClient.receiveMessage(connName).matches("hello")) {
            MassiveDesktopClient.sendMessage(newVideosInfo, connName);
        }
    } catch (BindException ex) {
        MassiveDesktopClient.println("Attempted to use a port which is already being used. Waiting and retrying...", new Exception().getStackTrace()[0].getLineNumber());
        try {
            Thread.sleep(MassiveDesktopClient.PORT_BUSY_DELAY_SECONDS * 1000);
        } catch (InterruptedException ex1) {
            JOptionPane.showMessageDialog(null, ex1.toString(), "Error", JOptionPane.ERROR_MESSAGE);
        }
        ConnectionsPanel.singleAddVideos(connName);
        return;
    }
    for (Iterator<Video> it = ConnectionsPanel.videosToSend.iterator(); it.hasNext();) {
        try {
            MassiveDesktopClient.sendFile(it.next().getAttribute("name"), connName);
        } catch (BindException ex) {
            MassiveDesktopClient.println("Attempted to use a port which is already being used. Waiting and retrying...", new Exception().getStackTrace()[0].getLineNumber());
            try {
                Thread.sleep(MassiveDesktopClient.PORT_BUSY_DELAY_SECONDS * 1000);
            } catch (InterruptedException ex1) {
                JOptionPane.showMessageDialog(null, ex1.toString(), "Error", JOptionPane.ERROR_MESSAGE);
            }
            ConnectionsPanel.singleAddVideos(connName);
            return;
        }
    }
}
Your question is not very clear - I understand that you want to rerun your task until it succeeds (no BindException). To do that, you could:
try to run your code without catching the exception
capture the exception from the future
reschedule the task a bit later if it fails
Simplified code would be as below; add error messages and refine as needed:
public static void main(String[] args) throws Exception {
    ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(corePoolSize);
    final String x = "video";
    Callable<Void> yourTask = new Callable<Void>() {
        @Override
        public Void call() throws BindException {
            ConnectionsPanel.singleAddVideos(x);
            return null;
        }
    };

    Future<Void> f = scheduler.submit(yourTask);
    boolean added = false; // it will retry until success
    // you might use an int instead, to retry
    // n times only and avoid the risk of an infinite loop
    while (!added) {
        try {
            f.get();
            added = true; // added set to true if no exception caught
        } catch (ExecutionException e) {
            if (e.getCause() instanceof BindException) {
                // reschedule in 3 seconds and wait on the new future
                f = scheduler.schedule(yourTask, 3, TimeUnit.SECONDS);
            } else {
                // another exception was thrown => handle it
            }
        }
    }
}

public static class ConnectionsPanel {
    private static void singleAddVideos(String connName) throws BindException {
        String newVideosInfo = "";
        for (Iterator<Video> it = ConnectionsPanel.videosToSend.iterator(); it.hasNext();) {
            newVideosInfo = newVideosInfo.concat(it.next().toString());
        }

        MassiveDesktopClient.sendMessage("hi", connName);
        if (MassiveDesktopClient.receiveMessage(connName).matches("hello")) {
            MassiveDesktopClient.sendMessage(newVideosInfo, connName);
        }

        for (Iterator<Video> it = ConnectionsPanel.videosToSend.iterator(); it.hasNext();) {
            MassiveDesktopClient.sendFile(it.next().getAttribute("name"), connName);
        }
    }
}
