How to Run Cron Jobs in Kotlin Ktor? - java

Is there a way to run Cron Jobs with Ktor? My end objective is to host a Cron Job written with Kotlin for the Coinverse app's backend service to populate data.
I'm currently hosting multiple Java .jar apps written in Kotlin on AppEngine. I'm looking to refactor these apps into Ktor apps on AppEngine with a Cron Job for scheduled tasks, as the .jar apps have more issues with dependencies.
I'm looking for Ktor's equivalent to Cloud Functions' built-in implementation for Cron Jobs with JavaScript.
functions.pubsub.schedule
Back-up option: If Ktor does not have this feature and I want to keep the code in Kotlin, Google has an alpha guide, Using Kotlin with Google Cloud Functions. It appears Kotlin plus Cloud Functions' built-in Cron implementation could be used with this approach.

Sergey Mashkov from the JetBrains team suggests in the kotlinlang Slack group launching a Kotlin coroutine in the Application scope with an infinite loop and a delay.
Then, the Ktor app can be deployed to AppEngine.
import io.ktor.application.Application
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

fun Application.main() {
    // Application is a CoroutineScope, so this background job is tied to the app's lifecycle.
    launch {
        while (true) {
            delay(600_000L) // Wait 10 minutes between runs.
            // Populate data here.
        }
    }
}

In my experience, this will not work: the app will stop after roughly 20 minutes.
The only solution I've found is to define a regular cron.yaml alongside a Ktor app, and it works without complaint. The Ktor app exposes a GET endpoint, and the cron job calls it on schedule.
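For illustration, here is a minimal sketch of that approach, assuming Ktor 1.x; the /populate-data route name is a placeholder, and the cron.yaml entry it pairs with is shown in the comment:

    // cron.yaml deployed next to the service would contain an entry along these lines:
    //   cron:
    //   - description: "populate data"
    //     url: /populate-data
    //     schedule: every 10 minutes
    import io.ktor.application.Application
    import io.ktor.application.call
    import io.ktor.response.respondText
    import io.ktor.routing.get
    import io.ktor.routing.routing

    fun Application.main() {
        routing {
            // App Engine's cron service issues a GET to this path on the configured schedule.
            get("/populate-data") {
                // Populate data here.
                call.respondText("OK")
            }
        }
    }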

Related

Google Cloud scheduler Java job containerized with Selenium

I've got a Java code to perform some interactions with web pages and used Selenium for it.
Now I'd like to get this code executed every hour, and I thought it was a great occasion to discover the cloud world.
I've created an account on Google Cloud.
Because my app needs a driver to use Selenium (the gecko driver for Firefox), I'll have to create a Docker image with everything it needs inside it.
In Google Cloud services, there is "Cloud Scheduler", which allows me to run code whenever I want.
But here are my questions :
What kind of target should I configure (HTTP, Pub/Sub, HTTP App Engine)?
Because I'm not using Google Cloud Functions, my container will always be up, which doesn't seem like a great idea for pricing reasons. I would have liked my container to be up only for the duration of the execution.
Also, I was thinking of using the Quarkus framework to wrap my application, since I've heard it was made for the cloud and is very quick to start. Is that the best option for me?
I'd be very glad if someone could help me see this a little more clearly. I'm not a total beginner (I've worked as a Java / JavaScript developer for 5 years now and have dockerized some applications), but the cloud is a big piece and it's not easy to know where to start.
So you:
are using docker images
run your workload occasionally
aren't willing to use Cloud Function
==> Cloud Run is your best bet. Here is the Google Cloud Run quickstart: https://cloud.google.com/run/docs/quickstarts/prebuilt-deploy
Keep in mind that your containerised application needs to listen for HTTP requests, so take a look at the Cloud Run container runtime contract.
Finally, you can indeed trigger Cloud Run from Cloud Scheduler; here is detailed documentation on how to do it: https://cloud.google.com/run/docs/triggering/using-scheduler
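As an illustration of that container contract, here is a minimal Kotlin/Ktor sketch of an entry point that listens on the port Cloud Run injects via the PORT environment variable; the /run-selenium path is a placeholder for the Selenium workload, and Ktor is just one convenient JVM choice rather than something the answer prescribes:

    import io.ktor.application.call
    import io.ktor.response.respondText
    import io.ktor.routing.get
    import io.ktor.routing.routing
    import io.ktor.server.engine.embeddedServer
    import io.ktor.server.netty.Netty

    fun main() {
        // Cloud Run tells the container which port to listen on through $PORT.
        val port = System.getenv("PORT")?.toIntOrNull() ?: 8080

        embeddedServer(Netty, port = port) {
            routing {
                // Cloud Scheduler's HTTP target would point at this path.
                get("/run-selenium") {
                    // Kick off the Selenium interactions here, then report completion.
                    call.respondText("done")
                }
            }
        }.start(wait = true)
    }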
As #MBHAPhoenix says, Cloud Run is your best option. You can then trigger the job from Cloud Scheduler. We have this exact scenario currently running for one of our projects but our container is Python. We wrote an article about it here
You should note that to trigger your Cloud Run job from Cloud Scheduler, you'll have to 'secure it'. This means you won't be able to just type the URL in a web browser. A service account will be responsible for running the Cloud Run job, and you'll then need to grant your Cloud Scheduler job access to this service account so it can invoke the Cloud Run job. I've been meaning to put up a post about the exact steps for doing this (will try to get it done this weekend).
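For reference, here is a hedged sketch of wiring that up programmatically with the google-cloud-scheduler client library (assuming that dependency is available; the project, region, Cloud Run URL, and service-account email below are placeholders, not values from the answer):

    import com.google.cloud.scheduler.v1.CloudSchedulerClient
    import com.google.cloud.scheduler.v1.HttpMethod
    import com.google.cloud.scheduler.v1.HttpTarget
    import com.google.cloud.scheduler.v1.Job
    import com.google.cloud.scheduler.v1.LocationName
    import com.google.cloud.scheduler.v1.OidcToken

    fun createSchedulerJob() {
        CloudSchedulerClient.create().use { client ->
            val job = Job.newBuilder()
                .setSchedule("0 * * * *") // Every hour, in unix-cron syntax.
                .setTimeZone("Etc/UTC")
                .setHttpTarget(
                    HttpTarget.newBuilder()
                        .setUri("https://my-service-abc123-uc.a.run.app/run-selenium") // placeholder Cloud Run URL
                        .setHttpMethod(HttpMethod.POST)
                        // The OIDC token lets the (placeholder) service account invoke the secured service.
                        .setOidcToken(
                            OidcToken.newBuilder()
                                .setServiceAccountEmail("scheduler-invoker@my-project.iam.gserviceaccount.com")
                        )
                )
                .build()
            client.createJob(LocationName.of("my-project", "us-central1"), job)
        }
    }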
In terms of cost, we have this snippet from our article
...Cloud Run only runs when it receives an HTTP request. It plays dead and comes alive to execute your code when an HTTP request comes in. When it is done executing the request, it goes 'dead' again till the next request comes in. This means you're not paying for time spent idling i.e. when it is not doing anything.....

How to Authenticate CloudTasksClient in App Engine

I am migrating my existing Java application to Google Cloud App Engine. The application creates threads on a periodic basis to perform certain background tasks. As App Engine does not support threads, I have to use "Tasks".
I could not find any sample code in which an application (running in App Engine) creates a task and sends it to a task handler (also running in App Engine).
The sample code available on the internet uses client code (the task creator) running on a local machine and authenticating via a key JSON file whose path is set in an environment variable. In my case I want the task creator and the task handler to both run on App Engine.
My question is: where can I find sample code that programmatically authenticates and creates tasks? Basically, CloudTasksClient needs to be authenticated programmatically.
I am giving some documentation links that can help with creating and handling the tasks using App Engine.
Please find an example of creating a task with authentication. Additionally, this Stack Overflow answer may help.
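As a minimal, hedged sketch of what that can look like in Kotlin with the google-cloud-tasks client: on App Engine, CloudTasksClient.create() picks up the runtime's default service account automatically, so no key file is needed (the project, location, queue name, and handler path below are placeholders):

    import com.google.cloud.tasks.v2.AppEngineHttpRequest
    import com.google.cloud.tasks.v2.CloudTasksClient
    import com.google.cloud.tasks.v2.HttpMethod
    import com.google.cloud.tasks.v2.QueueName
    import com.google.cloud.tasks.v2.Task
    import com.google.protobuf.ByteString

    fun enqueueBackgroundTask() {
        // On App Engine, Application Default Credentials are used automatically;
        // no key file or environment variable is required.
        CloudTasksClient.create().use { client ->
            val queue = QueueName.of("my-project", "us-central1", "background-queue")
            val task = Task.newBuilder()
                .setAppEngineHttpRequest(
                    AppEngineHttpRequest.newBuilder()
                        .setRelativeUri("/tasks/handle") // The handler route in the same App Engine app.
                        .setHttpMethod(HttpMethod.POST)
                        .setBody(ByteString.copyFromUtf8("payload"))
                )
                .build()
            client.createTask(queue, task)
        }
    }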

Recurrent actions via OpenShift

I have an application (Spring Boot + Hibernate + Postgres) which executes an ETL process. The application is deployed in OpenShift and is scaled to n > 1, so it always has more than one replica. But if every replica launched its own ETL against the same database, the data would not be consistent.
Therefore, I think the process should be launched via something external.
I picture the solution as an API method, say doEtl(), that a Kubernetes (OpenShift) scheduler or some other Kubernetes (OpenShift) tool could call. However, I can't figure out how to google it. I tried searching for 'kubernetes custom schedule', but the results only explain how schedulers work or how to write a custom scheduler for auto-scaling.
Can someone advise me whether this is generally possible and, if so, what it is called or how to google it?
You might be looking for the CronJob object, which can be used to regularly execute a certain action.
For OpenShift, you can find more information in the documentation: https://docs.openshift.com/container-platform/4.3/nodes/jobs/nodes-nodes-jobs.html
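One common shape for this, sketched here under assumptions that are not in the answer above: the CronJob runs a small container (for example one that issues a single HTTP call) against one endpoint on the service, so only one replica executes the ETL per trigger. A hypothetical Spring Boot controller in Kotlin exposing such an endpoint might look like:

    import org.springframework.http.ResponseEntity
    import org.springframework.web.bind.annotation.PostMapping
    import org.springframework.web.bind.annotation.RestController

    @RestController
    class EtlController {

        // A CronJob-driven HTTP call hits this endpoint; the Kubernetes Service routes it
        // to exactly one replica, so only that replica runs the ETL for this trigger.
        @PostMapping("/internal/etl/run")
        fun doEtl(): ResponseEntity<String> {
            // Run the ETL process here (ideally still guarded by a DB-level lock for safety).
            return ResponseEntity.ok("ETL started")
        }
    }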

GAE Datastore Backup using Java and Cron

I want to back up (and later restore) data from the GAE Datastore using the export facilities that went live this year. I want to use cron and Java. I have found this post, which points at this page, but it's just for Python.
Originally I wanted to do this automatically every day using the Google Cloud Platform console, but I can't find a way of doing it. Now I am resorting to incorporating it into Java and a cron job. I need restore instructions as well as backup instructions.
I'm not interested in using the datastore admin backup as this will no longer be available next year.
According to the docs, the way to do it is, indeed, through Cron for GAE and having a GAE module call the API to export.
The point is not the code itself, but understanding why this is so.
Currently, the easiest way to schedule tasks in GCP is through Cron jobs in GAE, but those can only call GAE modules. Following the docs that you pointed out, the Cron configuration will be quite similar to the one described there.
Regarding the handler itself, you only need to call the Datastore Admin API authenticated with an account with the proper permissions.
Since the Cloud Client Library does not have admin capabilities for Datastore, you'll have to either construct the call manually, or use the Datastore API Client Library.
Notice that, for GCP APIs, there are usually two client libraries available: the Cloud Client Library and the API Client Library. The first one is hand-crafted while the second one is auto-generated from the discovery document of each API.
If one specific functionality is not available through the Cloud Client Library (the recommended way of interacting with GCP APIs), you can always check the API Client Library for that same functionality.
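To make the "construct the call manually" route concrete, here is a hedged Kotlin sketch of a cron-triggered handler body that calls the Datastore Admin API's projects.export method using Application Default Credentials; the google-auth-library dependency, project ID, and bucket name are assumptions for illustration, not part of the answer:

    import com.google.auth.oauth2.GoogleCredentials
    import java.net.HttpURLConnection
    import java.net.URL

    fun exportDatastore(projectId: String, bucket: String) {
        // Obtain an access token from the App Engine default service account.
        val credentials = GoogleCredentials.getApplicationDefault()
            .createScoped(listOf("https://www.googleapis.com/auth/datastore"))
        credentials.refreshIfExpired()
        val token = credentials.accessToken.tokenValue

        // POST https://datastore.googleapis.com/v1/projects/{projectId}:export
        val url = URL("https://datastore.googleapis.com/v1/projects/$projectId:export")
        val connection = (url.openConnection() as HttpURLConnection).apply {
            requestMethod = "POST"
            doOutput = true
            setRequestProperty("Authorization", "Bearer $token")
            setRequestProperty("Content-Type", "application/json")
        }
        connection.outputStream.use { out ->
            out.write("""{"outputUrlPrefix": "gs://$bucket"}""".toByteArray())
        }
        println("Export request returned HTTP ${connection.responseCode}")
    }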

Deploy Apache Spark application from another application in Java, best practice

I am a new user of Spark. I have a web service that allows a user to request that the server perform a complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications.
However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to initiate the corresponding Spark application? Spark's documentation seems to suggest using "spark-submit", but I would rather not pipe the command out to a terminal to perform this action. I saw an alternative, Spark-JobServer, which provides a RESTful interface to do exactly this, but my Spark applications are written in either Java or R, which seems not to interface well with Spark-JobServer.
Is there another best-practice to kickoff a spark application from a web server (in Java), and wait for a status result whether the job succeeded or failed?
Any ideas of what other people are doing to accomplish this would be very helpful! Thanks!
I've had a similar requirement. Here's what I did:
To submit apps, I use the hidden Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
Using this same API you can query the status of a driver, or you can kill your job later.
There's also another hidden UI JSON API: http://[master-node]:[master-ui-port]/json/ which exposes all information available on the master UI in JSON format.
Using the Submission API I submit a driver, and using the Master UI API I wait until my driver and app state are RUNNING.
The web server can also act as the Spark driver. So it would have a SparkContext instance and contain the code for working with RDDs.
The advantage of this is that the Spark executors are long-lived. You save time by not having to start/stop them all the time. You can cache RDDs between operations.
A disadvantage is that since the executors are running all the time, they take up memory that other processes in the cluster could otherwise use. Another is that you cannot have more than one instance of the web server, since you cannot have more than one SparkContext in the same Spark application.
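A hedged sketch of that pattern, with the master URL, app name, and JDBC details as placeholders: the web server creates one long-lived SparkSession at startup and reuses it across requests.

    import org.apache.spark.sql.SparkSession

    // Created once at web-server startup and shared by all request handlers.
    object SharedSpark {
        val session: SparkSession = SparkSession.builder()
            .master("spark://spark-master:7077") // placeholder master URL
            .appName("web-server-driver")
            .getOrCreate()
    }

    fun handleAnalysisRequest(tableName: String): Long {
        // Executors stay up between requests, so repeated jobs avoid startup cost
        // and can reuse cached data.
        val df = SharedSpark.session.read()
            .format("jdbc")
            .option("url", "jdbc:postgresql://db-host/analytics") // placeholder JDBC URL
            .option("dbtable", tableName)
            .load()
            .cache()
        return df.count()
    }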
We are using Spark Job-Server and it works fine with Java as well: just build a jar of the Java code and wrap it with Scala to work with Spark Job-Server.
