Oozie Spark2 Java Action: How to Shut Down

I am migrating my existing pipelines from Spark 1.6.0 to Spark 2.1.0 (CDH 5.15.1).
The Oozie version I am using (4.1.0) does not support a Spark2 action, so we are running the Spark2 jobs through a Java action.
Jobs execute successfully through the Java action, but one problem I am facing is that whenever the Oozie workflow is killed, the Spark application is not killed, especially when running in cluster mode.
I understand that the Java action launches the Spark driver in a separate container, which is a separate JVM process.
I just want to understand whether there is a way to handle this scenario.

I'm pretty sure this is what happens with all Oozie actions that run as MapReduce jobs. I've experienced the same issue with Hive2 actions.
From O'Reilly's "Apache Oozie":
If any execution path of a workflow reaches a kill node, Oozie will terminate the workflow immediately, failing all running actions ... and setting the completion status of the workflow to KILLED. It is worth noting that Oozie will not explicitly kill the currently running MapReduce jobs on the Hadoop cluster that corresponds to those actions.
And about the Java action:
This action runs as a single mapper job....
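The original answers stop there, but for illustration: if the Spark application submitted from the Java action is tagged at submit time (e.g. by passing --conf spark.yarn.tags=<your-tag> to spark-submit), a kill path in the workflow could run a small helper like the untested sketch below, which uses Hadoop's YarnClient to kill any live application carrying that tag. The tag convention and class name are my own placeholders, not something Oozie provides out of the box.

```java
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

/**
 * Illustrative helper: kill every live YARN application that carries a given tag.
 * The tag must have been attached when the Spark job was submitted, e.g. via
 * --conf spark.yarn.tags=<tag> (a convention of this sketch, not Oozie's).
 */
public class KillTaggedApps {
    public static void main(String[] args) throws Exception {
        String tag = args[0]; // e.g. "oozie-" + the workflow id, as set at submit time

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration()); // picks up yarn-site.xml from the classpath
        yarnClient.start();
        try {
            // Only look at applications that are still alive.
            EnumSet<YarnApplicationState> liveStates =
                    EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
            for (ApplicationReport report : yarnClient.getApplications(liveStates)) {
                if (report.getApplicationTags().contains(tag)) {
                    System.out.println("Killing " + report.getApplicationId());
                    yarnClient.killApplication(report.getApplicationId());
                }
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```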

Related

Cron job from application or external server

I am working on an application deployed in PCF, and we are using a caching mechanism. We want to implement a batch job that purges data from the cache region.
I would like some suggestions on the following:
Should I include this batch job in the same application, so that it uses the same server to run the batch jobs?
Or should I create a new server for running these batch jobs?
I mainly want to understand how the performance of the current application would be impacted if we run the batch job from the same server, and the advantages and disadvantages of each option.
TIA

Flink cluster on EKS

I am new to Flink and Kubernetes. I am planning to create a Flink streaming job that streams data from a filesystem to Kafka.
I have the Flink job jar, which works fine (tested locally). Now I am trying to host this job in Kubernetes, and would like to use EKS in AWS.
I have read through the official Flink documentation on how to set up a Flink cluster:
https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/kubernetes.html
I tried to set it up locally using minikube, brought up a session cluster, and submitted the job, which works fine.
My questions:
1) Out of the two options, job cluster and session cluster: since this is a streaming job that should keep monitoring the filesystem and stream any new files to the destination, can I use a job cluster in this case? As per the documentation, a job cluster executes the job and terminates once it is completed; if the job monitors a folder, does it ever complete?
2) I have a Maven project that builds the Flink jar. What is the ideal way to spin up a session/job cluster with this jar in production, and what is the normal CI/CD process? Should I build a session cluster initially and submit jobs to it whenever needed, or spin up a job cluster with the built jar?
First off, the link that you provided is for Flink 1.5. If you are starting fresh, I'd recommend using Flink 1.9 or the upcoming 1.10.
For your questions:
1) A job with a file monitor never terminates. It cannot know when no more files will arrive, so you have to cancel it manually. A job cluster is fine for that (a rough sketch of such a job follows at the end of this answer).
2) There is no clear answer to that, and it's also not Flink-specific. Everyone has a different solution with different drawbacks.
I'd aim for a semi-automatic approach, where everything is automated but you need to explicitly press a deploy button (and not just git push). Often, these CI/CD pipelines deploy to a test cluster first and run a smoke test before allowing a deploy to production.
If you are starting completely fresh, you could check out AWS CodeDeploy. However, I have had good experiences with GitLab and its AWS runner.
The normal process would be something like:
build
integration/e2e tests on build machine (dockerized)
deploy on test cluster/preprod cluster
run smoke tests
deploy on prod
I have also seen processes that go to prod quickly and invest the time in better monitoring and fast rollback instead of a preprod cluster and smoke tests. That's usually viable for business-uncritical processes, depending on how expensive reprocessing is.
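To make point 1) concrete, here is a rough, untested sketch of such a never-terminating job, using Flink's continuous file monitoring and the universal Kafka connector; the directory, broker address, and topic name are placeholders, not taken from the question.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

/**
 * Illustrative job: watch a directory forever and forward every new line to Kafka.
 * Because the source runs in PROCESS_CONTINUOUSLY mode, the job never reaches FINISHED
 * on its own; it has to be cancelled explicitly.
 */
public class FileToKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String inputDir = "s3://my-bucket/incoming"; // placeholder
        TextInputFormat format = new TextInputFormat(new Path(inputDir));

        // Re-scan the directory every 10 seconds for new files.
        DataStream<String> lines = env.readFile(
                format, inputDir, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092"); // placeholder

        lines.addSink(new FlinkKafkaProducer<>(
                "my-topic", new SimpleStringSchema(), kafkaProps));

        env.execute("file-to-kafka");
    }
}
```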

Java API for Jenkins

I am writing a Java application where I need to track the status of a Jenkins build and execute a few actions on build success and failure.
I am quite new to Jenkins. Is there a Java API available to track the status of the build?
Is it possible to trigger the Java application on successful completion of the build, or when it fails?
Your suggestions are welcome.
Thanks,
Santhosh
There is the Jenkins REST API, which could suit your needs; a rough example of polling it from Java is sketched below.
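As a rough illustration of the REST API route (not part of the original answer), a Java client can poll a job's lastBuild JSON and branch on its result field. The Jenkins URL and job name are placeholders, authentication is omitted, and a real application should use a proper JSON parser.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Illustrative poller: fetch the JSON description of a job's last build and
 * inspect its "result" field (SUCCESS, FAILURE, or null while still running).
 */
public class JenkinsBuildWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: /job/<job-name>/lastBuild/api/json returns the last build as JSON.
        URL url = new URL("http://jenkins.example.com/job/my-job/lastBuild/api/json");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }

        // Crude string check; use Jackson/Gson in real code.
        if (body.indexOf("\"result\":\"SUCCESS\"") >= 0) {
            System.out.println("Build succeeded - run the success actions");
        } else if (body.indexOf("\"result\":\"FAILURE\"") >= 0) {
            System.out.println("Build failed - run the failure actions");
        } else {
            System.out.println("Build still running or result unknown");
        }
    }
}
```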
Alternatively, there are literally hundreds of plugins for Jenkins, so it is likely that you could run your whole process from within Jenkins using:
Build Pipeline
This plugin provides a Build Pipeline View of upstream and downstream connected jobs that typically form a build pipeline. In addition, it offers the ability to define manual triggers for jobs that require intervention prior to execution, e.g. an approval process outside of Jenkins.
Multijob
Gives the option to define complex and hierarchical job structures in Jenkins.

Google App Engine Cron Job

I have created a cron.xml file and a servlet that describes the job.
Now when I compile and log in as an admin, the local development dashboard doesn't show the Cron Jobs link.
The local development server does not have the Cron Jobs link, nor does it execute cron jobs. The actual App Engine will show cron jobs and will execute them.
You can manually execute cron jobs on the local server by visiting their URLs, e.g. http://localhost:8888/FindReservedBooksTask.
BTW, the cron.xml file should be in the war/WEB-INF directory.
The dev appserver doesn't automatically run your cron jobs. You can use your local desktop's cron or scheduled tasks interface to hit the URLs of your jobs with curl or a similar tool.
Here is the link to the GAE doc on this.
Also, check the security constraints for your cron URLs in web.xml and make sure they don't block the cron requests; if you don't have any constraints, you should restrict the cron URLs to admin accounts.
This website walks you through how to use cron jobs inside Google App Engine.
Also, Google App Engine provides a service called the Cron Service that helps us do two fundamental things:
It allows your application to schedule these tasks.
It executes these tasks based on their schedule.
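For illustration only (not from the answers above), a minimal cron.xml under war/WEB-INF could look like the following; the schedule and description are placeholders built around the example URL mentioned earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- war/WEB-INF/cron.xml (illustrative) -->
<cronentries>
  <cron>
    <url>/FindReservedBooksTask</url>
    <description>Placeholder description for the reserved-books task</description>
    <schedule>every 24 hours</schedule>
  </cron>
</cronentries>
```

And a web.xml excerpt that restricts the same URL to admin accounts, along the lines of the advice above (App Engine's cron service can still reach admin-only URLs):

```xml
<!-- war/WEB-INF/web.xml (excerpt, illustrative) -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>cron</web-resource-name>
    <url-pattern>/FindReservedBooksTask</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>
```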

Execute hadoop jar through Spring MVC using process command

I am new to Java and currently working on a project where a Hadoop job needs to be triggered from a Spring MVC application. The manager asked me to use a "process", which I have no clue about. I have written a shell script to trigger the job, but the client wants it triggered directly from the Spring MVC app so that the log can be written to the local file system.
Can anyone help me with how to trigger a Hadoop jar (more specifically, a YARN command with different arguments) on the edge node through a Java process?
You can try using ProcessBuilder.
http://docs.oracle.com/javase/7/docs/api/java/lang/ProcessBuilder.html
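A minimal, untested sketch of that approach for running a YARN command on the edge node and writing its output to a local log file; the jar path, main class, arguments, and log location are placeholders.

```java
import java.io.File;
import java.io.IOException;

/**
 * Illustrative launcher: run "yarn jar ..." as an external process and append
 * its combined stdout/stderr to a local log file.
 */
public class HadoopJobLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "yarn", "jar", "/opt/jobs/my-job.jar", // placeholder jar
                "com.example.MyDriver",                // placeholder main class
                "-inputDir", "/data/in",
                "-outputDir", "/data/out");

        // Merge stderr into stdout and append everything to a local log file
        // (the directory must already exist).
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.appendTo(new File("/var/log/myapp/hadoop-job.log")));

        Process process = pb.start();
        int exitCode = process.waitFor(); // blocks until the YARN client process exits
        System.out.println("yarn exited with code " + exitCode);
    }
}
```

In a Spring MVC application you would normally run this off the request thread (for example in an @Async method or a task executor), since waitFor() blocks until the command finishes.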
