Google Dataflow: set a DefaultUncaughtExceptionHandler - java

Is it possible to set a DefaultUncaughtExceptionHandler for every worker and thread in Dataflow?
Something like Thread.setDefaultUncaughtExceptionHandler.

No, I don't believe so. The Dataflow model for expressing user code operates at a higher level of abstraction than individual workers and threads. Can you expand a bit more on why you'd like to do this globally?
The Dataflow service already retries work that fails with an uncaught exception in your user code. Elements are processed in groups called bundles -- if any element in a bundle causes an exception to be thrown, the entire bundle is retried. In batch mode, failing bundles are retried until a single bundle has failed 4 times, at which point the job is failed. In streaming mode, failing bundles are retried indefinitely, although you can use the update feature to deploy code that better handles the issue.
All of these exceptions will appear both in the job logs in the Dataflow UI and in your Cloud Logging worker logs. See more info here.
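If the underlying goal is to keep a single bad element from repeatedly failing its bundle, the usual alternative is to catch exceptions per element inside the DoFn itself. A minimal sketch, assuming the Dataflow Java SDK (the parsing logic and class name are purely illustrative):

```java
import com.google.cloud.dataflow.sdk.transforms.DoFn;

// Hypothetical example: parse strings to integers, dropping bad records
// instead of letting the exception fail (and retry) the whole bundle.
public class SafeParseFn extends DoFn<String, Integer> {
  @Override
  public void processElement(ProcessContext c) {
    try {
      c.output(Integer.parseInt(c.element()));
    } catch (NumberFormatException e) {
      // Log and drop the element rather than throwing; a real pipeline
      // might instead emit it to a side output for later inspection.
    }
  }
}
```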

Related

parallel updates to oracle table from spring mvc application

We have an application running in 2 containers/pods. This application reads requests from ActiveMQ, and to process each request it needs to update 10 tables. Hundreds of these async requests sit in ActiveMQ and need to be processed at a rate of hundreds per second, and each request tries to update the same 10 tables in one Oracle database. Because of these updates to 10 tables within the same fraction of a second, some of the requests frequently fail with this error: "Error updating database. Cause: java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource"
Is there a better way to handle this kind of scenario, such as a better architecture?
Is there a better way to handle it using the Spring framework?
Your premise is wrong; deadlocks are not caused simply by a large amount of activity on many tables. They are a special kind of lock that occurs when different sessions try to lock the same rows in a different order. When that happens, the only solution is to kill one of the sessions to release the lock.
The first step in investigating deadlocks is to look in the alert log and find the trace file generated by a deadlock. It will show the statements and objects involved. If you're lucky, the mistake is caused by a missing foreign key index or a bitmap index on a transactional table, which can be resolved with a simple DDL change.
If you're unlucky, the application is changing tables in different orders. In that case, you'll need to change the application to always process changes in the same order. I don't think I can offer any specific advice for that; it depends entirely on your application.
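As a sketch of what "always process changes in the same order" can look like in Java/JDBC (the RowUpdate type and SQL are hypothetical, just to show the idea): if every session sorts its pending updates the same way before touching the database, the circular waits behind ORA-00060 cannot form.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.Comparator;
import java.util.List;

// Hypothetical work item: one row to update, identified by table + key.
class RowUpdate {
  final String table;   // assumed to come from a fixed, trusted set
  final long id;
  final String payload;
  RowUpdate(String table, long id, String payload) {
    this.table = table;
    this.id = id;
    this.payload = payload;
  }
}

class OrderedUpdater {
  // Sorting first gives every session the same row-lock acquisition order.
  void applyUpdates(Connection conn, List<RowUpdate> updates) throws Exception {
    updates.sort(Comparator.<RowUpdate, String>comparing(u -> u.table)
                           .thenComparingLong(u -> u.id));
    conn.setAutoCommit(false);
    for (RowUpdate u : updates) {
      try (PreparedStatement ps = conn.prepareStatement(
          "UPDATE " + u.table + " SET payload = ? WHERE id = ?")) {
        ps.setString(1, u.payload);
        ps.setLong(2, u.id);
        ps.executeUpdate();
      }
    }
    conn.commit();
  }
}
```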

Shutting down spring batch jobs gracefully in tomcat

We run several spring batch jobs within tomcat in the same web application that serves up our UI. Lately we have been adding many more jobs and we are noticing that when we patch our app, several jobs may get stuck in a STARTING or STARTED status. Many of those jobs ensure that another job is not running before they start up, so this means after we patch the server, some of our jobs are broken until we manually run SQL to update the statuses of the jobs to ABANDONED or STOPPED.
I have read here that JobScope and StepScope jobs don't play nicely with shutting down.
That article suggests not using JobScope or StepScope but I can't help but think that this is a solved problem where people must be doing something when the application exits to prevent this problem.
Are there some best practices for handling this scenario? What are you doing in your applications?
We are using spring-batch version 3.0.3.RELEASE
I will give you an idea of how to solve this scenario; it is not necessarily a spring-batch-specific solution.
Every time I need to add jobs to an application, I do the following:
Create a table to control the jobs (queue, priority, status, etc.)
Create a JobController class to manage all jobs
Each job is tracked by a status: R (running), F (finished), Q (queued); you can add more as you need them, such as aborted or cancelled (the jobs maintain these statuses themselves)
The JobController must be loaded only once; you can define it as a Spring bean for this
Add a boolean attribute to JobController that records whether the jobs have already been checked when it is instantiated; set it to false
Check whether there are jobs with status R, which means they were running when the server last stopped; update every such job to Q and increase its priority so it gets executed first after the server restarts. This check sits inside an if on that boolean attribute; after the check, set the attribute to true
That way, the first time the JobController is called after a server crash, any unfinished jobs are reset to a status from which they can be executed again, and the check happens only once, since it is guarded by that boolean attribute.
One thing to be careful about is your jobs' priority: if you manage it badly, you may run into a starvation problem.
You can easily adapt this solution to spring-batch.
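A rough sketch of that startup check in plain Java (JobController, JobRecord, and JobRepositoryDao are hypothetical names for the pieces described above, not Spring Batch API):

```java
import java.util.List;

// Hypothetical access to the job control table described above.
interface JobRepositoryDao {
  List<JobRecord> findByStatus(String status);
  void update(JobRecord job);
}

class JobRecord {
  String status;
  int priority;
}

public class JobController {
  private final JobRepositoryDao dao;
  private boolean recoveryChecked = false; // the boolean attribute

  public JobController(JobRepositoryDao dao) {
    this.dao = dao;
  }

  // Call before dispatching any job; the recovery pass runs exactly once.
  public synchronized void ensureRecovered() {
    if (recoveryChecked) {
      return;
    }
    // Jobs still in R were interrupted by the last shutdown or crash:
    // requeue them with a boosted priority so they run first.
    for (JobRecord job : dao.findByStatus("R")) {
      job.status = "Q";
      job.priority = job.priority + 1;
      dao.update(job);
    }
    recoveryChecked = true;
  }
}
```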
Hope it helps.

Logging aspect for exceptions with JMS retry mechanism

In our project we use ActiveMQ (JmsTemplate) to publish many events from one webapp to another.
We use a logging aspect (Spring AOP) as well; mainly we log errors and entering/exiting methods.
Now, sometimes we hit race conditions in the flow of the system: an entity is created in one webapp and an event is fired to notify another webapp, but handling it there requires a different event to have been handled first. When that happens, the handling fails (for example, an id is missing) and the message is immediately retried (JMS redelivery); on the second attempt it usually works (never more than 3 retries are required).
So basically, we have exceptions as part of our day-to-day flow, but our log files are huge and messy because of the exceptions thrown by these scenarios. Any idea how we can avoid logging the first few retries' exceptions and only log on a later one? Or maybe there is another approach you can recommend?
Thanks.
You can use JMSXDeliveryCount property of Message to get the redelivery count. See http://activemq.apache.org/activemq-message-properties.html
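For example, a listener along these lines would stay quiet on the first couple of deliveries and only log once the retries exceed what your flow normally needs (QuietRetryListener and the threshold are illustrative; whether the rethrow causes redelivery depends on your session/transaction configuration):

```java
import javax.jms.Message;
import javax.jms.MessageListener;

public class QuietRetryListener implements MessageListener {
  // The question says 3 retries are always enough, so log from the 3rd on.
  private static final int LOG_THRESHOLD = 3;

  @Override
  public void onMessage(Message message) {
    try {
      handle(message);
    } catch (Exception e) {
      if (deliveryCount(message) >= LOG_THRESHOLD) {
        // Only now is the failure worth a full stack trace in the log.
        // logger.error("Still failing after several deliveries", e);
      }
      // Rethrow so the container rolls back and the broker redelivers.
      throw new RuntimeException(e);
    }
  }

  private int deliveryCount(Message message) {
    try {
      return message.getIntProperty("JMSXDeliveryCount");
    } catch (Exception ignored) {
      return 1; // property missing: treat as the first delivery
    }
  }

  private void handle(Message message) throws Exception {
    // ... the actual event handling ...
  }
}
```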

Two instances despite using concurrent requests and low traffic

My Apache Wicket web application uses JDO for its data persistence in GAE/J.
On application start-up, the home page enqueues a task before it is shown (with zero delay to its default ETA). This task causes the construction of a new Wicket web page, in order to construct the JVM's singleton Persistence Manager Factory (PMF) instance for use by the application during its lifetime.
I have set the application to use concurrent requests by adding
<threadsafe>true</threadsafe>
to the application's appengine-web.xml file.
Despite this, after a single request to visit the application's home page, I get two application instances: one created by the home page visit request, and the other created by the execution of the enqueued task (about 6 to 7 seconds later).
I could try to solve this problem by delaying the execution of the enqueued task (by around 10 seconds, perhaps?), but why should I need to try this when I have enabled concurrent requests? Should the first GAE/J application instance not be able to handle two requests close together without causing a second instance to be brought forth? I presume that I am doing something wrong, but what is it?
I have searched Stack Overflow's set of tags ([google-app-engine] [java]), and the deprecated "Google App Engine for Java" group too, but have found nothing relevant to my question.
I would appreciate any pointers.
If you want the task to use an existing instance, you can set the X-AppEngine-FailFast header, which according to the GAE docs:
This header instructs the Scheduler to immediately fail the request if an existing instance is not available. The Task Queue will retry and back-off until an existing instance becomes available to service the request
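For example, with the Task Queue Java API the header can be attached when the task is enqueued (the handler URL below is illustrative):

```java
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class PmfInitEnqueuer {
  public void enqueuePmfTask() {
    Queue queue = QueueFactory.getDefaultQueue();
    queue.add(TaskOptions.Builder
        .withUrl("/tasks/init-pmf") // hypothetical task handler URL
        // Fail fast if no existing instance can take the request, so the
        // task is retried against the already-running instance instead
        // of spinning up a second one.
        .header("X-AppEngine-FailFast", "true"));
  }
}
```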
It's worth checking out the Managing Your App's Resource Usage document for performance and tuning techniques.

Multithreading in Java

I'm working with core Java and IBM WebSphere MQ 6.0. We have a standalone module, say DBComponent, that hits the database and fetches a result set based on a runtime query. The query is passed to the application via MQ messaging. We have a trigger configured for the queue which invokes the DBComponent whenever a message is available. The DBComponent consumes the message, constructs the query, and returns the result set to another queue. Throughout this process we use log4j to log statements to a file for auditing.
The database connection is pooled using Apache pool. I am trying to check whether the log messages are logged correctly using a sample program. The program places the input message on the queue and then checks for the expected entries in the log file. The trigger's processing is expected to complete before I check for the message in the log file, but every time, my log check executes first, so it fails.
Even introducing a Thread.sleep(time) doesn't solve it. How can I make my method wait until the trigger operation completes?
Any suggestion will be helpful.
I suggest you go and read up on the concurrency primitives that Java offers you. http://tutorials.jenkov.com/java-concurrency/index.html seems to cover the bases, the Thread Signalling chapter in particular.
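As one concrete example of that signalling, if the test and the triggered processing run in the same JVM, a CountDownLatch lets the test block until the work reports completion instead of guessing with sleep() (the worker body below is a stand-in for the real MQ-triggered processing):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchExample {
  public static void main(String[] args) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);

    // Stand-in for the MQ-triggered DBComponent processing.
    new Thread(() -> {
      try {
        Thread.sleep(500); // simulate the DB work and logging
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      done.countDown(); // signal: processing finished
    }).start();

    // The test thread waits (with a timeout) instead of sleeping blindly.
    if (done.await(10, TimeUnit.SECONDS)) {
      System.out.println("Processing finished; safe to check the log file.");
    } else {
      System.out.println("Timed out waiting for processing to complete.");
    }
  }
}
```

Note that if the trigger runs in a separate process, a latch cannot cross that boundary; polling the log file (or a reply queue) with a timeout is the closest equivalent.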
I would recommend against relying on log4j (or any logging functionality) even in a simple test program.
Have your test run as you would expect it to, putting debugging/tracing statements in the log as you see fit (be liberal about it; log4j is very fast!). Then, when it's done, check the log yourself.
Writing log parsing will only complicate your goals.
Write your test, view the result, view the logs. If you want automated testing, consider setting up a functional test; you can set up tests for free using Selenium (http://seleniumhq.org/). There's no need to write your own functional-testing/parsing code when there are easy-to-configure, easy-to-use, easy-to-customize frameworks out there! :-)
