We have a Spark Streaming program which pulls messages from Kafka and processes each individual message using foreachPartition.
If there is a specific error in the processing function, we would like to throw the exception back and halt the program. That does not seem to be happening. Below is the code we are trying to execute.
JavaInputDStream<KafkaDTO> stream = KafkaUtils.createDirectStream( ...);
stream.foreachRDD(new Function<JavaRDD<KafkaDTO>, Void>() {
public Void call(JavaRDD<KafkaDTO> rdd) throws PropertiesLoadException, Exception {
rdd.foreachPartition(new VoidFunction<Iterator<KafkaDTO>>() {
@Override
public void call(Iterator<KafkaDTO> itr) throws PropertiesLoadException, Exception {
while (itr.hasNext()) {
KafkaDTO dto = itr.next();
try{
//process the message here.
} catch (PropertiesLoadException e) {
// throw Exception if property file is not found
throw new PropertiesLoadException(" PropertiesLoadException: "+e.getMessage());
} catch (Exception e) {
throw new Exception(" Exception : "+e.getMessage());
}
}
}
});
return null;
}
});
In the above code, even if we throw a PropertiesLoadException, the program doesn't halt and streaming continues. The maximum number of retries we set in the Spark configuration is only 4, yet the streaming program continues even after 4 failures. How should the exception be thrown to stop the program?
I am not sure if this is the best approach, but we surrounded the main batch processing with a try/catch, and when I get an exception I just close the context. In addition, you need to make sure that graceful stop is turned off (false).
Example code:
try {
process(dataframe);
} catch (Exception e) {
logger.error("Failed on write - will stop spark context immediately!!" + e.getMessage());
closeContext(jssc);
if (e instanceof InterruptedException) {
Thread.currentThread().interrupt();
}
throw e;
}
And the close function:
private void closeContext(JavaStreamingContext jssc) {
logger.warn("stopping the context");
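// stop(stopSparkContext, stopGracefully): false as the first argument stops only the streaming context and keeps the SparkContext running
jssc.stop(false, jssc.sparkContext().getConf().getBoolean("spark.streaming.stopGracefullyOnShutdown", false));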
logger.error("Context was stopped");
}
In the configuration:
spark.streaming.stopGracefullyOnShutdown false
Applied to your code, I think it should look something like this:
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, streamBatch);
JavaInputDStream<KafkaDTO> stream = KafkaUtils.createDirectStream( jssc, ...);
stream.foreachRDD(new Function<JavaRDD<KafkaDTO>, Void>() {
public Void call(JavaRDD<KafkaDTO> rdd) throws PropertiesLoadException, Exception {
try {
rdd.foreachPartition(new VoidFunction<Iterator<KafkaDTO>>() {
@Override
public void call(Iterator<KafkaDTO> itr) throws PropertiesLoadException, Exception {
while (itr.hasNext()) {
KafkaDTO dto = itr.next();
try {
//process the message here.
} catch (PropertiesLoadException e) {
// throw Exception if property file is not found
throw new PropertiesLoadException(" PropertiesLoadException: " + e.getMessage());
} catch (Exception e) {
throw new Exception(" Exception : " + e.getMessage());
}
}
}
});
} catch (Exception e){
logger.error("Failed on write - will stop spark context immediately!!" + e.getMessage());
closeContext(jssc);
if (e instanceof InterruptedException) {
Thread.currentThread().interrupt();
}
throw e;
}
return null;
}
});
Please also note that my stream runs on Spark 2.1 in standalone client mode (not YARN/Mesos). I also implement graceful stop myself using ZooKeeper (ZK).
I have a JSON -> XML converter application. It takes a list of events and converts them to XML one by one. Before the conversion, the header of the final XML is created using the start method; the converted events are then added to xmlEventWriter one by one; and finally, after all conversions, the closing tags are added to the XML using the end method.
I am facing an issue when closing the tags and run into this error:
javax.xml.stream.XMLStreamException: No open start element, when trying to write end element
As far as I understand, everything is correct, but I am still facing the issue and don't know why.
Following is the class that creates the header, body and closing tags of the XML:
public class EventXMLStreamCollector implements EventsCollector<OutputStream> {
private final OutputStream stream;
private final XMLEventWriter xmlEventWriter;
private final XMLEventFactory events;
public EventXMLStreamCollector(OutputStream stream) {
this.stream = stream;
try {
xmlEventWriter = XMLOutputFactory.newInstance().createXMLEventWriter(stream);
events = XMLEventFactory.newInstance();
} catch (XMLStreamException e) {
throw new EventFormatConversionException("Error occurred during the creation of XMLEventWriter : " + e);
}
}
public void collect(Object event) {
System.out.println("COLLECT START");
try {
XMLEventReader xer = new EventReaderDelegate(XMLInputFactory.newInstance().createXMLEventReader(new StringReader(event.toString()))) {
@Override
public boolean hasNext() {
if (!super.hasNext())
return false;
try {
return !super.peek().isEndDocument();
} catch (XMLStreamException ignored) {
return true;
}
}
};
if (xer.peek().isStartDocument()) {
xer.nextEvent();
xmlEventWriter.add(xer);
}
} catch (XMLStreamException e) {
throw new EventFormatConversionException("Error occurred during the addition of events to XMLEventWriter: " + e);
}
System.out.println("COLLECT END");
}
@Override
public OutputStream get() {
return stream;
}
@Override
public void start(Map<String, String> context) {
System.out.println("START START");
try {
xmlEventWriter.add(events.createStartDocument());
xmlEventWriter.add(events.createStartElement(new QName("doc:Document"), null, null));
xmlEventWriter.add(events.createNamespace("doc", "urn:one"));
xmlEventWriter.add(events.createNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance"));
xmlEventWriter.add(events.createNamespace("cbvmda", "urn:two"));
for (Map.Entry<String, String> stringStringEntry : context.entrySet()) {
xmlEventWriter.add(events.createAttribute(stringStringEntry.getKey(), stringStringEntry.getValue()));
}
xmlEventWriter.add(events.createStartElement(new QName("Body"), null, null));
xmlEventWriter.add(events.createStartElement(new QName("EventList"), null, null));
} catch (XMLStreamException e) {
throw new EventFormatConversionException("Error occurred during the creation of final XML file header information " + e);
}
System.out.println("START END");
}
@Override
public void end() {
System.out.println("END START");
try {
System.out.println(xmlEventWriter.toString());
xmlEventWriter.add(events.createEndElement(new QName("EventList"), null));
xmlEventWriter.add(events.createEndElement(new QName("Body"), null));
xmlEventWriter.add(events.createEndElement(new QName("doc:Document"), null));
xmlEventWriter.add(events.createEndDocument());
xmlEventWriter.close();
} catch (XMLStreamException e) {
throw new EventFormatConversionException("Error occurred during the closing xmlEventWriter:" + e);
}
System.out.println("END END");
}
@Override
public void collectSingleEvent(Object event) {
try {
XMLEventReader xer = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(event.toString()));
if (xer.peek().isStartDocument()) {
xer.nextEvent();
}
xmlEventWriter.add(xer);
} catch (XMLStreamException e) {
System.out.println("ADDED : " + e.getMessage());
throw new EventFormatConversionException("Error occurred during the addition of events to XMLEventWriter: " + e);
}
}
}
I am getting the error for this line:
xmlEventWriter.add(events.createEndElement(new QName("Body"), null));
I am not sure why I am getting this error. I open the Body tag and then try to close it. I am sure the flow is correct: I call start, collect, and finally end. Following is the output I am getting:
START START
START END
COLLECT START
COLLECT END
END START
I never get END END because of the error when closing the Body tag. Can someone please help me understand this issue and suggest a workaround?
I'm currently writing a client-server application where data is transferred to the server for processing. To build the connection I use ServerSocket and Socket, and to send the data I use OutputStream + ObjectOutputStream on the client side and InputStream + ObjectInputStream on the server side. The connection currently runs on localhost.
The object I try to transfer is a serializable class that only contains String parameters.
The problem I'm facing is that readObject() immediately throws an EOFException as soon as the client's OutputStreams are initialized (which at the same time leads to the initialization of the server's InputStreams), instead of waiting for input from the client.
I send the data from the client using this code:
public void send(DataSet dataSet) {
if (!clientStreamsEstablished) {
initiateClientStreams();
}
try {
out.writeObject(dataSet);
} catch (IOException e) {
e.printStackTrace();
}
}
This method is only called when I hit the "Submit" button in the UI, so it is not executed at application start.
On the server, the data is currently read using the following method (I have already tried a ton of other approaches, with and without a while() loop, etc.):
private void waitForInput(ObjectInputStream in, InputStream listeningPort) {
boolean dataReceived = false;
DataSet dataSet = null;
System.out.println("waiting ...");
while (!dataReceived) {
try {
Object temp = in.readObject(); // <-- EOFException is thrown here
boolean test = false;
if (temp instanceof DataSet) {
dataSet = (DataSet) temp;
}
} catch (ClassNotFoundException | IOException e) {
e.printStackTrace();
break;
}
}
System.out.println("Test 2: " + dataSet.toString());
if (dataReceived) {
waitForInput(in, listeningPort);
}
}
As soon as the client thread on the server reaches this line (see code-comment above) I get this stacktrace:
java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2626)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1321)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at com.labdashboardserver.ClientThread.waitForInput(ClientThread.java:53)
at com.labdashboardserver.ClientThread.run(ClientThread.java:43)
Exception in thread "Thread-2" java.lang.NullPointerException
at com.labdashboardserver.ClientThread.waitForInput(ClientThread.java:65)
at com.labdashboardserver.ClientThread.run(ClientThread.java:43)
The reason for the second part of the stack trace, the NullPointerException, is apparent: because of the EOFException, dataSet is never initialized.
However, from my point of view, readObject() should block and wait for the client to send data before reading it, rather than throwing an EOFException. I feel like I have read through half of Stack Overflow and other forums searching for an answer, but the articles I found only discuss reading files, or short-lived streams that are closed right afterwards.
Edit
I initialize the connection before calling the UI in my main method:
public static void main(String[] args) {
connector = new LabConnector();
connector.run();
if (connector.getConnectionEstablished()) {
EventQueue.invokeLater(new Runnable() {
public void run() {
try {
frame = new LabUI();
frame.setVisible(true);
} catch (Exception e) {
JOptionPane.showMessageDialog(null,
"Error Message:\n" + e.getMessage() + "\nProgram shutting down!", "Critical Error", 0);
}
}
});
}
}
In the LabConnector class, I initialize the streams and the connection like this:
public void run() {
try {
establishConnection();
} catch (UnknownHostException e) {
retryConnection(e);
} catch (IOException e) {
retryConnection(e);
}
if (connectionEstablished) {
initiateClientStreams();
}
}
private void establishConnection() throws UnknownHostException, IOException {
client = new Socket(HOST_IP_ADRESS, HOST_PORT);
JOptionPane.showMessageDialog(null, "Connected to Server!");
connectionEstablished = true;
}
private void initiateClientStreams() {
try {
sendingPort = client.getOutputStream();
out = new ObjectOutputStream(sendingPort);
clientStreamsEstablished = true;
} catch (IOException e) {
e.printStackTrace();
}
}
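For reference, a minimal self-contained sketch of the expected behaviour: on an open socket, readObject() blocks until an object arrives, and EOFException is thrown only once the peer has closed its end of the stream (end of stream reached). The class names ReadObjectBlocks and DataSet (a stand-in for the question's class) and the 500 ms delay are illustrative assumptions; only java.io and java.net are used.

```java
import java.io.EOFException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ReadObjectBlocks {

    // Hypothetical stand-in for the question's DataSet: a serializable class with a String field.
    public static class DataSet implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        DataSet(String name) {
            this.name = name;
        }
    }

    // Runs one client/server exchange on the loopback interface and reports what the server saw.
    public static String runDemo() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            int port = server.getLocalPort();
            Thread client = new Thread(() -> {
                try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
                     ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                    out.flush();       // send the stream header so the server's ObjectInputStream can be built
                    Thread.sleep(500); // the server's readObject() is blocked during this pause
                    out.writeObject(new DataSet("first"));
                    out.flush();
                    // leaving this block closes the stream and the socket: only then does the server see EOF
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();

            StringBuilder seen = new StringBuilder();
            try (Socket connection = server.accept();
                 ObjectInputStream in = new ObjectInputStream(connection.getInputStream())) {
                DataSet first = (DataSet) in.readObject(); // blocks until the object arrives
                seen.append("received ").append(first.name);
                try {
                    in.readObject();                       // blocks until the client closes its side
                } catch (EOFException e) {
                    seen.append(", then EOF");             // end of stream: the peer closed the connection
                }
            }
            client.join();
            return seen.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo());
    }
}
```

Since readObject() only reports EOF when the stream has ended, an immediate EOFException in the real application suggests checking whether the client socket or its output stream is being closed (or the client exiting) rather than suspecting readObject() itself.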
My problem is that I have custom wait methods such as:
public void waitForLoading(WebElement loadingElement, WebElement errorElement) {
long timeOut = Long.parseLong(PropertyReader.getInstance().getProperty("DEFAULT_TIME_OUT"));
try {
WebDriverWait wait = new WebDriverWait(DriverFactory.getInstance().getDriver(), timeOut);
wait.until(ExpectedConditions.invisibilityOfElementLocated(By.id(loadingElement.toString())));
if (errorElement.isDisplayed()) {
throw new TestException();
}
} catch (TimeoutException e) {
System.out.println("Timed out after default time out");
} catch (TestException e) {
System.out.println("Unexpected error occurred, environment error");
e.printStackTrace();
}
}
I need some generic custom wait methods. When I run a search, several cases need to be handled: if an error message appears, fail the test; otherwise, wait for the loading content to disappear and then check the search result.
How can I extend this code so that it also continuously checks whether an error_message element appears, and throws an exception in that case? That way I could handle the timeout exception and the error message independently.
This script fails because of the if: when errorElement does not appear on the page, a NoSuchElementException is thrown.
You can catch different exceptions as you see fit. In your case, catch TimeoutException to handle timeouts, and catch a different type of exception to handle the error message:
public void waitForLoading() {
long timeOut = Long.parseLong(...);
try {
WebDriverWait wait = new WebDriverWait(...);
wait.until(ExpectedConditions.invisibilityOfElementLocated(...));
if (<error-message-appears>) {
throw new CustomErrorMessageAppearedException();
}
} catch (TimeoutException e) {
System.out.println("Timed out after...");
} catch (CustomErrorMessageAppearedException e) {
// handle error message
}
}
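The error check can also run continuously while waiting, instead of once after the wait. Below is a minimal plain-Java sketch, assuming the two checks are passed in as BooleanSupplier callbacks; the class name, both exception types and the polling interval are hypothetical and are not Selenium API. Each poll checks the error condition first, then the loading condition, and a separate exception type is thrown for each outcome so they can be caught independently.

```java
import java.util.function.BooleanSupplier;

public class LoadOrErrorWait {

    // Hypothetical exception: the error message showed up before loading finished.
    public static class ErrorMessageAppearedException extends RuntimeException {
        public ErrorMessageAppearedException(String message) { super(message); }
    }

    // Hypothetical exception: neither condition became true in time.
    public static class WaitTimeoutException extends RuntimeException {
        public WaitTimeoutException(String message) { super(message); }
    }

    /**
     * Polls until loading has finished or an error is shown. The error check runs first on
     * every poll, so an error that appears during loading is reported as an error, not a timeout.
     */
    public static void waitForLoadingOrError(BooleanSupplier loadingGone, BooleanSupplier errorShown,
                                             long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (errorShown.getAsBoolean()) {
                throw new ErrorMessageAppearedException("error message appeared while waiting for loading");
            }
            if (loadingGone.getAsBoolean()) {
                return; // loading finished without an error: go on and check the results
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new WaitTimeoutException("timed out after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated page: the loading indicator disappears after ~200 ms and no error appears.
        waitForLoadingOrError(() -> System.currentTimeMillis() - start > 200, () -> false, 2000, 50);
        System.out.println("loaded, now check the search results");
        try {
            waitForLoadingOrError(() -> false, () -> true, 2000, 50);
        } catch (ErrorMessageAppearedException e) {
            System.out.println("fail the test: " + e.getMessage());
        }
    }
}
```

With Selenium, the callbacks could be built from driver.findElements(...), which returns an empty list instead of throwing NoSuchElementException when nothing matches, for example `() -> driver.findElements(By.id("loading")).isEmpty()` and `() -> !driver.findElements(By.id("error_message")).isEmpty()` (the ids are placeholders).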
The easiest approach I see is:
public void waitForLoading() {
long timeOut = Long.parseLong(PropertyReader.getInstance().getProperty("DEFAULT_TIME_OUT"));
try {
WebDriverWait wait = new WebDriverWait(DriverFactory.getInstance().getDriver(), timeOut);
if (!wait.until(ExpectedConditions.invisibilityOfElementLocated(By.id("wait_element"))))
{
throw new NoSuchElementException("wait_element did not disappear");
}
} catch (TimeoutException e) {
System.out.println("Timed out after " + timeOut + " seconds waiting for loading the results.");
}
}
How do I test with Mockito that the following code performs the logging statement when an exception is thrown?
public void cleanUp() {
for (Map.Entry<String, Connection> connection : dbConnectionMap.entrySet()) {
try {
if (connection.getValue() != null) {
connection.getValue().close();
}
}catch (Exception e) {
LOGGER.log(Level.WARNING, "Exception when closing database connection: ", e);
}
}
reset();
}
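A minimal sketch of the test idea, written with only the JDK so it runs without extra dependencies: a java.lang.reflect.Proxy stands in for the Mockito stub (with Mockito the equivalent is `Connection conn = mock(Connection.class); doThrow(new SQLException("boom")).when(conn).close();`), and a custom java.util.logging.Handler attached to LOGGER captures the record so the test can assert on it. ConnectionRegistry is a hypothetical class wrapping the cleanUp() method from the question, and the sketch assumes LOGGER is a java.util.logging.Logger, as the Level.WARNING call suggests.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class CleanUpLoggingCheck {

    // Hypothetical holder for the cleanUp() method from the question.
    static class ConnectionRegistry {
        static final Logger LOGGER = Logger.getLogger(ConnectionRegistry.class.getName());
        final Map<String, Connection> dbConnectionMap = new HashMap<>();

        public void cleanUp() {
            for (Map.Entry<String, Connection> connection : dbConnectionMap.entrySet()) {
                try {
                    if (connection.getValue() != null) {
                        connection.getValue().close();
                    }
                } catch (Exception e) {
                    LOGGER.log(Level.WARNING, "Exception when closing database connection: ", e);
                }
            }
            reset();
        }

        void reset() {
            dbConnectionMap.clear();
        }
    }

    // Stand-in for a Mockito stub: close() throws, every other method returns null
    // (only close() is called in this sketch).
    static Connection failingConnection() {
        return (Connection) Proxy.newProxyInstance(
                CleanUpLoggingCheck.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("close")) {
                        throw new SQLException("simulated close failure");
                    }
                    return null;
                });
    }

    // Runs cleanUp() with one failing connection and returns the log records it produced.
    public static List<LogRecord> runCleanUpAndCaptureLogs() {
        List<LogRecord> records = new ArrayList<>();
        Handler capture = new Handler() {
            @Override public void publish(LogRecord record) { records.add(record); }
            @Override public void flush() { }
            @Override public void close() { }
        };
        ConnectionRegistry.LOGGER.addHandler(capture);
        try {
            ConnectionRegistry registry = new ConnectionRegistry();
            registry.dbConnectionMap.put("primary", failingConnection());
            registry.cleanUp();
        } finally {
            ConnectionRegistry.LOGGER.removeHandler(capture);
        }
        return records;
    }

    public static void main(String[] args) {
        for (LogRecord r : runCleanUpAndCaptureLogs()) {
            System.out.println(r.getLevel() + ": " + r.getMessage() + r.getThrown().getMessage());
        }
    }
}
```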
This question is about java.lang.Process and its handling of stdin, stdout and stderr.
We have a class in our project that is an extension of org.apache.commons.io.IOUtils. In it we have a fairly new method for closing the std streams of a Process object. Is it appropriate, or not?
/**
* Method closes all underlying streams from the given Process object.
* If Exit-Code is not equal to 0 then Process will be destroyed after
* closing the streams.
*
* It is guaranteed that everything possible is done to release resources
* even when Throwables are thrown in between.
*
* In case of occurrences of multiple Throwables, the first Throwable that occurred
* will be thrown as Error, RuntimeException or (masked) IOException.
*
* The method is null-safe.
*/
public static void close(@Nullable Process process) throws IOException {
if(process == null) {
return;
}
Throwable t = null;
try {
close(process.getOutputStream());
}
catch(Throwable e) {
t = e;
}
try{
close(process.getInputStream());
}
catch(Throwable e) {
t = (t == null) ? e : t;
}
try{
close(process.getErrorStream());
}
catch (Throwable e) {
t = (t == null) ? e : t;
}
try{
try {
if(process.waitFor() != 0){
process.destroy();
}
}
catch(InterruptedException e) {
t = (t == null) ? e : t;
process.destroy();
}
}
catch (Throwable e) {
t = (t == null) ? e : t;
}
if(t != null) {
if(t instanceof Error) {
throw (Error) t;
}
if(t instanceof RuntimeException) {
throw (RuntimeException) t;
}
throw t instanceof IOException ? (IOException) t : new IOException(t);
}
}
public static void closeQuietly(@Nullable Logger log, @Nullable Process process) {
try {
close(process);
}
catch (Exception e) {
//log if Logger provided, otherwise discard
logError(log, "Error while closing the Process object (including underlying streams)!", e);
}
}
public static void close(@Nullable Closeable closeable) throws IOException {
if(closeable != null) {
closeable.close();
}
}
Methods like these are mainly used in finally blocks.
What I really want to know is whether this implementation is safe, considering things like: does a Process object always return the same stdin, stdout and stderr streams during its lifetime? Or might I miss closing streams previously returned by the process's getInputStream(), getOutputStream() and getErrorStream() methods?
There is a related question on StackOverflow.com: java: closing subprocess std streams?
Edit
As pointed out by me and others here:
InputStreams have to be fully consumed. Otherwise the subprocess may not terminate, because there is outstanding data in its output streams.
All three std streams have to be closed, regardless of whether they were used before.
When the subprocess terminates normally, everything should be fine. If not, it has to be terminated forcibly.
When the subprocess returns an exit code, we do not need to destroy() it: it has terminated (not necessarily normally with exit code 0, but it did terminate).
We need to monitor waitFor() and interrupt it when a timeout is exceeded, to give the process a chance to terminate normally but kill it when it hangs.
Unanswered parts:
What are the pros and cons of consuming the InputStreams in parallel? Or must they be consumed in a particular order?
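On the unanswered part: draining stdout and stderr in parallel, with one reader per stream, avoids the classic deadlock in which the child blocks writing to a full stderr pipe while the parent is blocked reading stdout; with a dedicated reader per stream the order no longer matters. A minimal JDK-only sketch (Java 8+, for Process.waitFor(long, TimeUnit) and destroyForcibly()); the names ProcessDrain, drain and closeAndWait, and returning -1 for a killed process, are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

public class ProcessDrain {

    // Starts a daemon thread that reads the stream to EOF and throws the bytes away.
    static Thread drain(InputStream in, String name) {
        Thread t = new Thread(() -> {
            byte[] buffer = new byte[8192];
            try (InputStream s = in) {
                while (s.read(buffer) != -1) {
                    // discard
                }
            } catch (IOException ignored) {
                // stream closed or process gone: nothing left to drain
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    /**
     * Closes the child's stdin, drains stdout and stderr concurrently and waits up to
     * timeoutMillis. Returns the exit code, or -1 if the process had to be killed.
     */
    public static int closeAndWait(Process process, long timeoutMillis) throws InterruptedException {
        try {
            process.getOutputStream().close(); // the child sees EOF on its stdin
        } catch (IOException ignored) {
            // the child may already have exited
        }
        // One reader per stream, so neither pipe can fill up and block the child.
        Thread out = drain(process.getInputStream(), "stdout-drain");
        Thread err = drain(process.getErrorStream(), "stderr-drain");
        boolean exited = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!exited) {
            process.destroyForcibly(); // hung: kill it
            process.waitFor();         // reap it; its pipes close, so the readers reach EOF
        }
        // Bounded joins, in case another process still holds a pipe open.
        out.join(timeoutMillis);
        err.join(timeoutMillis);
        return exited ? process.exitValue() : -1;
    }

    public static void main(String[] args) throws Exception {
        String java = System.getProperty("java.home") + "/bin/java";
        Process p = new ProcessBuilder(java, "-version").start(); // writes to stderr
        System.out.println("exit code: " + closeAndWait(p, 10_000));
    }
}
```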
An attempt at simplifying your code:
public static void close(@Nullable Process process) throws IOException
{
if(process == null) { return; }
try
{
close(process.getOutputStream());
close(process.getInputStream());
close(process.getErrorStream());
if(process.waitFor() != 0)
{
process.destroy();
}
}
catch(InterruptedException e)
{
Thread.currentThread().interrupt(); // restore the interrupt flag for callers
process.destroy();
}
catch (RuntimeException e)
{
throw new IOException(e);
}
}
By catching Throwable, I assume you wish to catch all unchecked exceptions, i.e. subclasses of either RuntimeException or Error. However, an Error should never be caught, so I have replaced Throwable with RuntimeException.
(It is still not a good idea to catch all RuntimeExceptions.)
As the question you linked to states, it is better to read and discard the output and error streams. If you are using Apache Commons IO, something like:
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getInputStream(), new NullOutputStream());} catch (IOException ignored) {}}}).start();
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getErrorStream(), new NullOutputStream());} catch (IOException ignored) {}}}).start();
You want to read and discard stdout and stderr in separate threads to avoid problems such as the process blocking when it writes enough output to stderr or stdout to fill the pipe buffer.
If you are worried about having too many threads, see this question.
I don't think you need to do anything more than ignore IOExceptions when copying stdout and stderr to NullOutputStream: if there is an IOException reading from the process's stdout/stderr, it is probably because the process itself is dead, and writing to NullOutputStream will never throw an exception.
You don't need to check the return status of waitFor().
Do you want to wait for the process to complete? If so, you can do:
boolean interrupted = false;
while(true) {
try
{
process.waitFor();
break;
} catch(InterruptedException e) {
//keep waiting, but remember the interrupt so it can be restored afterwards
interrupted = true;
}
}
if(interrupted) {
Thread.currentThread().interrupt();
}
Looking at the link you provided, you do need to close the streams when the process is complete, but destroy() will do that for you.
So in the end, the method becomes:
public void close(final Process process) {
if(process == null) return;
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getInputStream(), new NullOutputStream());} catch (IOException ignored) {}}}).start();
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getErrorStream(), new NullOutputStream());} catch (IOException ignored) {}}}).start();
boolean interrupted = false;
while(true) {
try
{
process.waitFor();
//this will close stdin, stdout and stderr for the process
process.destroy();
break;
} catch(InterruptedException e) {
//keep waiting, but remember the interrupt so it can be restored afterwards
interrupted = true;
}
}
if(interrupted) {
Thread.currentThread().interrupt();
}
}
Just to let you know what I have currently in our codebase:
public static void close(@Nullable Process process) throws IOException {
if (process == null) {
return;
}
Throwable t = null;
try {
flushQuietly(process.getOutputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
close(process.getOutputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
skipAllQuietly(null, TIMEOUT, process.getInputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
close(process.getInputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
skipAllQuietly(null, TIMEOUT, process.getErrorStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
close(process.getErrorStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
try {
Thread monitor = ThreadMonitor.start(TIMEOUT);
process.waitFor();
ThreadMonitor.stop(monitor);
}
catch (InterruptedException e) {
t = mostImportantThrowable(t, e);
process.destroy();
}
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
if (t != null) {
if (t instanceof Error) {
throw (Error) t;
}
if (t instanceof RuntimeException) {
throw (RuntimeException) t;
}
throw t instanceof IOException ? (IOException) t : new IOException(t);
}
}
skipAllQuietly(...) consumes InputStreams completely. Internally it uses an implementation similar to org.apache.commons.io.ThreadMonitor to interrupt consumption if a given timeout is exceeded.
mostImportantThrowable(...) decides which Throwable should be returned: Errors take precedence over everything else, and a Throwable that occurred earlier takes precedence over one that occurred later. Nothing very important happens here, since these Throwables will most probably be discarded later anyway. We want to keep working at this point, and we can only throw one, so we have to decide which one to throw at the end, if any.
The close(...) methods are null-safe implementations that close resources but throw an exception when something goes wrong.
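The precedence rule described for mostImportantThrowable(...) can be sketched as follows. This is a hypothetical reimplementation written from the description above (an Error outranks any non-Error; otherwise the first Throwable wins), not the actual method from the codebase.

```java
public class ThrowablePriority {

    /**
     * Returns the Throwable to keep: an Error outranks any non-Error,
     * otherwise the one that occurred first is kept.
     */
    public static Throwable mostImportantThrowable(Throwable current, Throwable next) {
        if (current == null) {
            return next;    // nothing recorded yet
        }
        if (next == null) {
            return current;
        }
        if (next instanceof Error && !(current instanceof Error)) {
            return next;    // an Error beats an earlier non-Error
        }
        return current;     // otherwise the first occurrence wins
    }

    public static void main(String[] args) {
        Throwable t = null;
        t = mostImportantThrowable(t, new java.io.IOException("close failed"));
        t = mostImportantThrowable(t, new IllegalStateException("later"));
        t = mostImportantThrowable(t, new OutOfMemoryError("simulated"));
        t = mostImportantThrowable(t, new StackOverflowError("later error"));
        System.out.println(t); // java.lang.OutOfMemoryError: simulated
    }
}
```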