I have a MapReduce job packaged as a jar that should run daily. I also need to launch this jar from a remote Java application. How can I schedule it? In other words, I just want to run the job daily from my remote Java application.
I read about Oozie, but I don't think it is a good fit here.
Take a look at Quartz. It lets you schedule jobs from a standalone Java program or from inside a web or application container (like JBoss or Apache Tomcat). It integrates well with Spring, and with Spring Batch in particular.
Quartz can be configured outside of the Java code, in XML, and its cron expression syntax closely follows crontab (Quartz adds a leading seconds field). I found it very handy.
Some examples can be found here and here.
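If pulling in Quartz is more than a single daily job needs, the JDK's own `ScheduledExecutorService` can cover the same ground. The following is a minimal sketch of that alternative, not the Quartz API; the class name, the 02:00 run time, and the placeholder job are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a task once a day at a fixed local time.
class DailyJobScheduler {

    // Time to wait from 'now' until the next occurrence of 'runAt'.
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has passed, use tomorrow's
        }
        return Duration.between(now, next);
    }

    // Schedules 'job' every 24 hours, starting at the next 'runAt'.
    static ScheduledExecutorService scheduleDaily(LocalTime runAt, Runnable job) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNext(LocalDateTime.now(), runAt).toMillis();
        executor.scheduleAtFixedRate(job, initialDelayMillis,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) {
        // Only prints the computed delay; call scheduleDaily(...) to actually start.
        Duration delay = delayUntilNext(LocalDateTime.now(), LocalTime.of(2, 0));
        System.out.println("Next 02:00 run is in " + delay);
    }
}
```

Calling `scheduleDaily(LocalTime.of(2, 0), job)` keeps the JVM alive between runs, because the executor thread is non-daemon. A fixed 24-hour period ignores daylight-saving shifts, which is one reason a cron-style scheduler may still be the better choice.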
I am not clear about your requirement. You can use an SSH command execution library in your program.
SSH library for Java
If your program itself runs in a Linux environment, you can set up a crontab entry for periodic execution.
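One way to read the SSH suggestion, sketched here with the JDK's `ProcessBuilder` and the system `ssh` client rather than an SSH library: the remote Java application opens an SSH session to a cluster node and runs `hadoop jar` there. The user, host, and paths are hypothetical, and the sketch assumes key-based SSH login is already set up.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: runs a MapReduce jar on a remote node over ssh.
class RemoteJobLauncher {

    // Builds: ssh user@host hadoop jar <jarPath> <jobArgs...>
    static List<String> buildCommand(String user, String host, String jarPath, List<String> jobArgs) {
        List<String> command = new ArrayList<>();
        command.add("ssh");
        command.add(user + "@" + host);
        command.add("hadoop");
        command.add("jar");
        command.add(jarPath);
        command.addAll(jobArgs);
        return command;
    }

    // Runs the command, forwards its output to this process, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // All values below are placeholders.
        List<String> command = buildCommand("hadoop", "edge-node.example.com",
                "/opt/jobs/daily-job.jar", Arrays.asList("/input", "/output"));
        System.out.println(String.join(" ", command));
        // Call run(command) where ssh and the cluster are reachable.
    }
}
```

Because `ssh` hands the remaining arguments to the remote shell as one string, job arguments containing spaces would need extra quoting.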
If your Java program is what triggers the jar, then you should schedule the Java program hourly rather than the jar. If the two are separate, you can schedule the jar in an Oozie workflow, with the Java code execution as the first step of the workflow and the jar execution as the second.
In Oozie, you can also pass parameters from one step to the next. Hope this helps.
-Dipika Harwani
Currently our project is on MR and we use Oozie to orchestrate our MR Jobs. Now we are moving to Spark, and would like to know the recommended ways to schedule/trigger Spark Jobs on the CDH cluster. Note that CDH Oozie does not support Spark2 Jobs. So please give an alternative for this.
Last time I looked, Hue had a Spark option in the Workflow editor. If Cloudera didn't support that, I'm not sure why it'd be there...
CDH Oozie does support plain shell scripts, though; you just need to be sure that every NodeManager has the spark-submit command available locally.
If that doesn't work, it also supports Java actions for running a JAR, so you could write your Spark jobs so that each starts from a main method that loads up any configuration from there.
As soon as you submit the spark job from the shell, like:
spark-submit <script_path> <arguments_list>
it gets submitted to the CDH cluster. You will immediately be able to see the Spark job and its progress in Hue. This is how we trigger our Spark jobs.
Further, to orchestrate a series of jobs, you can wrap them in a shell script, or use a cron job to trigger them on a schedule.
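The wrapper for a series of jobs could equally be a small Java driver. The sketch below runs `spark-submit` commands one after another and stops at the first failure; the script paths are hypothetical, and it assumes `spark-submit` is on the `PATH` of the machine running it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical driver: submits Spark jobs in order, stopping on the first failure.
class SparkJobChain {

    // Builds: spark-submit <scriptPath> <arguments...>
    static List<String> sparkSubmit(String scriptPath, String... arguments) {
        List<String> command = new ArrayList<>();
        command.add("spark-submit");
        command.add(scriptPath);
        command.addAll(Arrays.asList(arguments));
        return command;
    }

    // Runs each command through 'runner'; returns the first non-zero exit code, or 0.
    static int runAll(List<List<String>> commands, ToIntFunction<List<String>> runner) {
        for (List<String> command : commands) {
            int exitCode = runner.applyAsInt(command);
            if (exitCode != 0) {
                return exitCode; // later jobs are skipped
            }
        }
        return 0;
    }

    // Real runner: starts the process and waits for it to finish.
    static int execute(List<String> command) {
        try {
            return new ProcessBuilder(command).inheritIO().start().waitFor();
        } catch (IOException e) {
            return -1; // could not start the process; treated as a failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        List<List<String>> jobs = Arrays.asList(
                sparkSubmit("/jobs/ingest.py", "2024-01-01"),
                sparkSubmit("/jobs/aggregate.py"));
        // Dry run: print each command instead of executing it.
        runAll(jobs, command -> {
            System.out.println(String.join(" ", command));
            return 0;
        });
    }
}
```

Passing `SparkJobChain::execute` as the runner submits the jobs for real. Keeping the runner as a parameter makes the ordering logic easy to check without a cluster.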
I have an executable jar and I was trying to create a Windows Service using sc.exe. I used the following command to create the service:
sc create "TestService" binPath= "C:\Program Files\Java\jdk1.6.0_03\jre\bin\java.exe -jar C:\abc\MainClass.jar"
The service got created but when I was trying to start the service I got the below error:
Error 1053: The service did not respond to the start or control request in a timely fashion.
Later I tried the Java Service Wrapper (Community Edition). The service starts and runs for a while, but it stops every time. The wrapper log says something like:
Advice:
The Wrapper consists of a native component as well as a set of classes
which run within the JVM that it launches. The Java component of the
Wrapper must be initialized promptly after the JVM is launched or the
Wrapper will timeout, as just happened. Most likely the main class
specified in the Wrapper configuration file is not correctly initializing
the Wrapper classes:
com.MainClass
While it is possible to do so manually, the Wrapper ships with helper
classes to make this initialization processes automatic.
Please review the integration section of the Wrapper's documentation
for the various methods which can be employed to launch an application
within the Wrapper
Could anyone please tell me how I can run a jar as a Windows Service without using external software? I can't use any third-party app in the client's production environment.
If that is not possible, what other configuration do I need in Java Service Wrapper to make the service start?
I tried to find information about this on Stack Overflow but did not find anything. If anyone knows of an existing answer there, please feel free to link it in a comment.
I have used this approach before in a production environment, so I can assure you it is safe to use.
The jar file is wrapped in an exe, which is then registered as a Windows service. If you have a Maven project, this is also really easy to accomplish.
https://dzone.com/articles/installing-a-java-application-as-a-windows-service
Edit: you said you can't use external software. With this approach, everything that is needed is packed into the exe file, including a JRE. I hope that is allowed by your client's policy.
I want to develop 'tasks' in Java that run periodically according to a defined schedule.
How do I run this on my Linux server? If it is a jar file, is it enough to create the jar, run it from a shell script, and schedule that script with cron?
I was planning to use the Spring Framework. Do I really need it, given that I can schedule my Java program with cron?
How do I approach this?
You can build the app using Spring Boot and run it as a daemon:
https://docs.spring.io/spring-boot/docs/current/reference/html/deployment-install.html
And then use Quartz to schedule tasks.
You can use either a cron job or a scheduler library (Quartz etc.) to run your Java task. I think a cron job is the more convenient way to run a jar file: you simply schedule the jar in crontab.
Check out Quartz. It's an awesome scheduling library that you can include in any Java application.
Once the scheduler is started, it runs jobs at the intervals defined by a cron expression, say
( ***** )
So I'm developing a DropWizard application, and all of the tutorials point towards compiling and running java -jar to start the web server. However, while I'm doing local development this is a pretty slow workflow. Having used Jetty before, I know it can auto-reload and run in daemon mode.
We're using Gradle and I found this, which works to start Jetty. The first problem I encountered is this:
Directory '/src/main/webapp' specified for property 'webAppSourceDirectory' does not exist.
I found a way around this by adding
jettyRun.webAppSourceDirectory = file("src/main/java")
to the build.gradle file but of course this just lists files in that directory. Is there a directory I can point jetty to for this to work?
Or is there another way I can get DropWizard to auto reload resources and recompile?
Also Is there a way to get DropWizard to run in the background?
Dropwizard doesn't run on Jetty. It manages Jetty, as well as other tools. So manipulating jetty is not a solution for what you want to accomplish.
Or is there another way I can get DropWizard to auto reload resources
and recompile?
No AFAIK.
Also Is there a way to get DropWizard to run in the background?
Also no AFAIK. You should be able to fix that with some bash tricks.
Or maybe this might be of some help, but I don't think it will recompile and reload resources.
Dropwizard is a fairly lightweight application. In my development environment it takes about 3-5 seconds to build and start a Dropwizard service; that is using IntelliJ, not Gradle (or Maven).
I am trying to create an integration test, which requires a running PostgreSQL server. Is it possible to start the server in maven build and stop it when tests are completed (in a separate process, I think)? Assuming the PostgreSQL server is not installed on the machine.
You are trying to push maven far beyond the intended envelope, so you'll be in for a fair amount of hurt before it will work.
Luckily postgresql can be downloaded as a zip archive.
As already mentioned above maven can use ant tasks to extend its reach. Ant has a large set of tasks to unzip files, and run commands. The sequence would be as follows :
unzip postgresql-xxx.zip in a well known directory --> INSTALL_DIR
create a data directory --> DATA_DIR
INSTALL_DIR/bin/initdb -D DATA_DIR
INSTALL_DIR/bin/postgres -D DATA_DIR
INSTALL_DIR/bin/createdb -E UNICODE test
This should give you a running server with a test database.
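The sequence above can be sketched in Java as a list of commands built from the two directories. The paths here are hypothetical, and the sketch uses the binary names shipped in PostgreSQL distributions (`initdb`, `postgres`, `createdb`), which is an assumption about what the steps refer to. Nothing is executed; the sketch only prints what would be run.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical command builder for a throwaway PostgreSQL test server.
class PostgresTestServerCommands {

    // Resolves a binary inside <installDir>/bin.
    private static String bin(Path installDir, String name) {
        return installDir.resolve("bin").resolve(name).toString();
    }

    // Initializes a new database cluster in dataDir.
    static List<String> initDb(Path installDir, Path dataDir) {
        return Arrays.asList(bin(installDir, "initdb"), "-D", dataDir.toString());
    }

    // Starts the server on the data directory (the process stays in the foreground).
    static List<String> startServer(Path installDir, Path dataDir) {
        return Arrays.asList(bin(installDir, "postgres"), "-D", dataDir.toString());
    }

    // Creates a database with the given encoding.
    static List<String> createDb(Path installDir, String encoding, String dbName) {
        return Arrays.asList(bin(installDir, "createdb"), "-E", encoding, dbName);
    }

    public static void main(String[] args) {
        Path installDir = Paths.get("target", "postgresql"); // placeholder for INSTALL_DIR
        Path dataDir = Paths.get("target", "pgdata");        // placeholder for DATA_DIR
        List<List<String>> steps = Arrays.asList(
                initDb(installDir, dataDir),
                startServer(installDir, dataDir),
                createDb(installDir, "UNICODE", "test"));
        for (List<String> step : steps) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

Since `postgres` stays in the foreground, a real build would have to start it as a background process and stop it once the tests finish.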
Further issues: creating a user, and security (you likely want to connect via TCP/IP, but this is disabled by default if I recall correctly; enabling it requires editing a config file before starting the database)
...
Good Luck.
I started writing a plugin for this purpose:
https://github.com/adrianboimvaser/postgresql-maven-plugin
It's in a very early stage and lacks documentation, but mostly works.
I already released version 0.1 to Maven Central.
I'm also releasing PostgreSQL binary distributions for all platforms as maven artifacts.
You can find the usage pattern in the plugin's integration tests.
Cheers!
Not to my knowledge. However, you could run a remote command that starts the server.
I think the usual scenario is to have a running integration test db, and not to shut it down or restart it between builds.
But if you really want to, you could set up your continuous integration server to start and stop the db.
You sound like you are trying to build a full continuous integration environment. You should probably look into using a full CI tool such as Cruise Control or Bamboo.
How I've done it before is to set up a dedicated CI db that is accessible from the CI server, and then have a series of bash/python/whatever scripts run as an After Successful Build step, which can then run whatever extra integration tasks you like. Pair that with something like Liquibase and you could wipe out the CI db and make sure it is on the latest schema for every build.
Just to bring some fresh perspective into this matter:
You could also start the PostgreSQL database as a Docker container.
The plugin ecosystem for Docker still seems to be in flux, so you may need to decide for yourself which one fits. Here are a few links to speed up your search:
https://github.com/fabric8io/docker-maven-plugin
http://heidloff.net/article/23.09.2015102508NHEBVR.htm
https://dzone.com/articles/build-images-and-run-docker-containers-in-maven
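For comparison, starting the database by hand with the `docker` CLI from Java might look like the sketch below, which builds plain commands for `ProcessBuilder` rather than using any of the plugins linked above. The container name, password, and port mapping are hypothetical; it assumes Docker is installed and uses the official `postgres` image, which needs `POSTGRES_PASSWORD` to be set.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical command builder for a disposable PostgreSQL container.
class DockerPostgres {

    // Builds a detached, auto-removed container that exposes PostgreSQL on hostPort.
    static List<String> startCommand(String containerName, String password, int hostPort) {
        return Arrays.asList("docker", "run", "-d", "--rm",
                "--name", containerName,
                "-e", "POSTGRES_PASSWORD=" + password,
                "-p", hostPort + ":5432",
                "postgres");
    }

    // Stops the container; with --rm it is deleted as soon as it stops.
    static List<String> stopCommand(String containerName) {
        return Arrays.asList("docker", "stop", containerName);
    }

    public static void main(String[] args) {
        // Prints the commands; hand them to new ProcessBuilder(command) to run them.
        System.out.println(String.join(" ", startCommand("it-postgres", "secret", 5432)));
        System.out.println(String.join(" ", stopCommand("it-postgres")));
    }
}
```

The start command would run before the integration tests and the stop command after them, for example from test setup and teardown hooks.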