Google Cloud Scheduler Java job containerized with Selenium

I've got some Java code that performs interactions with web pages, using Selenium.
Now I'd like to get this code executed every hour, and I thought it would be a great occasion to discover the cloud world.
I've created an account on Google Cloud.
Because my app needs a driver to use Selenium (geckodriver for Firefox), I'll have to create a Docker image that sets up everything it needs.
Among the Google Cloud services, there is "Cloud Scheduler", which allows me to run code on a schedule.
But here are my questions:
What kind of target should I configure (HTTP, Pub/Sub, HTTP App Engine)?
Because I'm not using Google Cloud Functions, my container will always be up, which doesn't seem like a great idea for pricing reasons. I would like to have my container up only for the duration of each execution.
Also, I was thinking of using the Quarkus framework to wrap my application, since I've heard it was made for the cloud and is very quick to start. Is that the best option for me?
I'll be very glad if someone can help me see this a little more clearly. I'm not a total beginner: I've worked as a Java/JavaScript developer for 5 years now and have dockerized some applications, but the cloud is a big topic and it's not easy to know where to start.

So you:
are using docker images
run your workload occasionally
aren't willing to use Cloud Functions
==> Cloud Run is your best bet. Here is the Google Cloud Run quickstart: https://cloud.google.com/run/docs/quickstarts/prebuilt-deploy
Keep in mind that your containerised application needs to be listening for HTTP requests, so take a look at the Cloud Run container runtime contract.
Finally, you can indeed trigger Cloud Run from Cloud Scheduler; here is detailed documentation on how to do it: https://cloud.google.com/run/docs/triggering/using-scheduler
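For reference, the runtime contract essentially boils down to serving HTTP on the port passed in the PORT environment variable. A minimal sketch using only the JDK's built-in HTTP server (the Selenium work itself is left as a placeholder comment):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class Main {
    public static void main(String[] args) throws Exception {
        // Cloud Run injects the port to listen on via the PORT env var.
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            // Placeholder: run the Selenium interactions here, then respond.
            byte[] body = "done".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```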

As @MBHAPhoenix says, Cloud Run is your best option. You can then trigger the job from Cloud Scheduler. We have this exact scenario currently running for one of our projects, but our container is Python. We wrote an article about it here.
You should note that to trigger your Cloud Run job from Cloud Scheduler, you'll have to 'secure it'. This means you won't be able to just type the URL into a web browser. A service account will be responsible for running the Cloud Run job, and you'll then need to grant your Cloud Scheduler service access to this service account so it can invoke the Cloud Run job. I've been meaning to put up a post about the exact steps for doing this (will try to get it done this weekend).
In terms of cost, we have this snippet from our article
...Cloud Run only runs when it receives an HTTP request. It plays dead and comes alive to execute your code when an HTTP request comes in. When it is done executing the request, it goes 'dead' again till the next request comes in. This means you're not paying for time spent idling, i.e. when it is not doing anything...

Related

Azure AKS: programmatically watch AKS creation status using Java

I am writing test cases for an IaC project. The use case is as follows:
Given I do not have an AKS cluster
And I call an api to create a cluster
Then I have a cluster up and running
In the above scenario, for the 3rd step I would like to implement two things:
First, wait for the AKS resource to be created, and
once the resource is created, wait for its provisioning state to be successful.
I searched the Azure documentation to see if there are any out-of-the-box watchers that Azure provides as part of its azure-sdk-for-java, but did not find any. Could somebody share your knowledge or experience in solving such a problem?
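For what it's worth, in the absence of a ready-made watcher, plain polling works. A rough sketch assuming the azure-resourcemanager fluent API; the resource group and cluster names are placeholders, and "Succeeded" reflects the usual ARM provisioning state:

```java
import com.azure.core.management.AzureEnvironment;
import com.azure.core.management.exception.ManagementException;
import com.azure.core.management.profile.AzureProfile;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.resourcemanager.AzureResourceManager;
import com.azure.resourcemanager.containerservice.models.KubernetesCluster;

public class AksProvisioningWatcher {
    public static void main(String[] args) throws InterruptedException {
        AzureResourceManager azure = AzureResourceManager
                .authenticate(new DefaultAzureCredentialBuilder().build(),
                        new AzureProfile(AzureEnvironment.AZURE))
                .withDefaultSubscription();

        String resourceGroup = "my-rg";  // placeholder
        String clusterName = "my-aks";   // placeholder

        while (true) {
            try {
                KubernetesCluster cluster = azure.kubernetesClusters()
                        .getByResourceGroup(resourceGroup, clusterName);
                // Step 2: the resource exists; wait for provisioning to finish.
                if ("Succeeded".equalsIgnoreCase(cluster.provisioningState())) {
                    System.out.println("Cluster is up and running");
                    return;
                }
            } catch (ManagementException e) {
                // Step 1: resource not created yet (404); keep waiting.
            }
            Thread.sleep(30_000); // back off between checks
        }
    }
}
```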

Manage running Java apps remotely

We have several standalone Java applications (in the form of jar files) running on multiple servers. These applications mainly read and stream data between systems. We use mainly Java 8 in our development. I was put in charge recently; my main function is to manage and maintain these apps.
Currently, I check these apps manually by accessing the servers, checking whether each app is running, and sometimes running database queries to see if the app has started pulling data. My problem is that in many cases some of these apps fail and shut down, due to data issues or edge cases, without anyone noticing. We need some monitoring and application recovery in place.
We don't have Docker infrastructure in place. We plan to adopt Docker in the future, but for now it is not an option.
After research, the following are options I thought of or solutions I tried:
Have the apps create a socket client that sends a heartbeat to a monitoring app (which would need to be developed); see the sketch after this list. I am keeping this as my last option.
I tried to use Eclipse Vert.x to wrap the apps into Verticles, then create a web view that can show me status and other info. After several tries, the apps fail to parse the data correctly (which might be due to my lack of understanding of the Vert.x library).
Have a third party solution that does this, but I have no idea what solutions are out there. I am open for suggestions.
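For option 1, a minimal sketch of the heartbeat side (the host, port, interval, and protocol are arbitrary choices; the monitoring app on the other end would still need to be developed):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.Socket;

// Each app runs this on a daemon thread and periodically reports its name
// to a central monitoring app; missing heartbeats signal a failure.
public class HeartbeatClient implements Runnable {
    private final String host;   // monitoring app host (assumption)
    private final int port;      // monitoring app port (assumption)
    private final String appName;

    public HeartbeatClient(String host, int port, String appName) {
        this.host = host;
        this.port = port;
        this.appName = appName;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try (Socket socket = new Socket(host, port);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
                out.println(appName + " alive " + System.currentTimeMillis());
            } catch (IOException e) {
                // Monitor unreachable; the monitor already treats silence as failure.
            }
            try {
                Thread.sleep(30_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Usage would be a single line at startup, e.g. `new Thread(new HeartbeatClient("monitor-host", 9999, "data-streamer-1")).start();`.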
My requirements are:
Proper monitoring of the apps running and their status.
In case of failure, the app should start again while notifying the admin/developer.
I am willing to develop a solution or implement a third-party one. I need your guidance on this.
Thank you.
You could use spring-boot-actuator (see health). It comes with a built-in endpoint that performs some health checks (depending on your spring-boot project), but you can create your own as well.
Then, by making an HTTP request to http://{host}:{port}/{context}/actuator/health (replace with your values), you can see the status of those health checks and also use the response status code to monitor your application.
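For example, a hypothetical custom indicator that reports DOWN when the app has stopped pulling data, so a plain HTTP check on /actuator/health catches silently stalled apps (DataPullTracker is a stand-in for whatever records your last successful pull):

```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Stand-in for the component that records the last successful data pull.
interface DataPullTracker {
    long lastPullTimestampMillis();
}

@Component
class DataPullHealthIndicator implements HealthIndicator {

    private static final long MAX_AGE_MILLIS = 5 * 60 * 1000; // arbitrary threshold

    private final DataPullTracker tracker;

    DataPullHealthIndicator(DataPullTracker tracker) {
        this.tracker = tracker;
    }

    @Override
    public Health health() {
        long age = System.currentTimeMillis() - tracker.lastPullTimestampMillis();
        Health.Builder builder = age > MAX_AGE_MILLIS ? Health.down() : Health.up();
        return builder.withDetail("lastPullAgeMillis", age).build();
    }
}
```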
Have you heard of Java Service Wrappers? They don't offer full management functionality, but they will monitor for JVM crashes and out-of-memory conditions and restart your application. Alerting should also be possible.
There is a small comparison table here: https://yajsw.sourceforge.io/#mozTocId284533
So some basic monitoring and management is included already. If you need more, I suggest using JMX (https://www.oracle.com/java/technologies/javase/javamanagement.html) or Prometheus (https://prometheus.io/ and https://github.com/prometheus/client_java)
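As a sketch of the Prometheus route, using the classic simpleclient artifacts (the port and metric name are arbitrary choices):

```java
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.HTTPServer;
import io.prometheus.client.hotspot.DefaultExports;

public class AppMetrics {
    // Updated after every successful data pull; a Prometheus alert can fire
    // when this timestamp stops advancing.
    static final Gauge lastSuccessfulPull = Gauge.build()
            .name("app_last_successful_pull_seconds")
            .help("Unix time of the last successful data pull.")
            .register();

    public static void main(String[] args) throws Exception {
        DefaultExports.initialize();                      // JVM metrics: memory, GC, threads
        HTTPServer scrapeEndpoint = new HTTPServer(9400); // scrape endpoint on :9400
        // ... application logic; call lastSuccessfulPull.setToCurrentTime()
        // after each successful pull.
    }
}
```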

Recurrent actions via OpenShift

I have an application (Spring Boot + Hibernate + Postgres) which executes an ETL process. The application is deployed in OpenShift and scaled to n > 1, so it always has more than one replica. But if every replica launched its own ETL run against the same database, the data would not be consistent.
Therefore, I think the process should be launched via something external.
I see a solution to my task in exposing an API method, say doEtl(), that can be called by a Kubernetes (OpenShift) scheduler or some other Kubernetes (OpenShift) tool. However, I can't figure out how to google it. I tried searching for 'kubernetes custom schedule', but the results only explain how scheduling works or how to write a custom scheduler for auto-scaling.
Can someone advise me whether this is possible at all and, if so, what it is called so I can google it?
You might be looking for the CronJob object, which can be used to regularly execute a certain action.
For OpenShift, you can find more information in the documentation: https://docs.openshift.com/container-platform/4.3/nodes/jobs/nodes-nodes-jobs.html
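The CronJob can then call a single HTTP endpoint through the application's service, so that exactly one replica picks up each scheduled run. A hypothetical Spring Boot sketch (the doEtl name comes from the question; EtlService is a stand-in for the existing ETL logic):

```java
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

// Stand-in for the real ETL logic already present in the application.
interface EtlService {
    void run();
}

// The CronJob (e.g. a small curl container) POSTs to this endpoint via the
// Kubernetes service, so the request is routed to exactly one replica per run.
@RestController
class EtlController {

    private final EtlService etlService;

    EtlController(EtlService etlService) {
        this.etlService = etlService;
    }

    @PostMapping("/internal/doEtl")
    public String doEtl() {
        etlService.run();
        return "ETL started";
    }
}
```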

Deploy Apache Spark application from another application in Java, best practice

I am a new user of Spark. I have a web service that allows a user to request that the server perform a complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications.
However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to initiate the corresponding Spark application? Spark's documentation seems to suggest using spark-submit, but I would rather not pipe the command out to a terminal to perform this action. I saw an alternative, Spark-JobServer, which provides a RESTful interface to do exactly this, but my Spark applications are written in either Java or R, which does not seem to interface well with Spark-JobServer.
Is there another best practice to kick off a Spark application from a web server (in Java) and wait for a status result of whether the job succeeded or failed?
Any ideas of what other people are doing to accomplish this would be very helpful! Thanks!
I've had a similar requirement. Here's what I did:
To submit apps, I use the hidden Spark REST submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
Using this same API you can query the status of a driver or kill your job later.
There's also another hidden UI JSON API: http://[master-node]:[master-ui-port]/json/ which exposes all the information available on the master UI in JSON format.
Using the submission API I submit a driver, and using the master UI API I wait until my driver and app state are RUNNING.
The web server can also act as the Spark driver. So it would have a SparkContext instance and contain the code for working with RDDs.
The advantage of this is that the Spark executors are long-lived. You save time by not having to start/stop them all the time. You can cache RDDs between operations.
A disadvantage is that since the executors are running all the time, they take up memory that other processes in the cluster could possibly use. Another one is that you cannot have more than one instance of the web server, since you cannot have more than one SparkContext to the same Spark application.
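A minimal sketch of that setup: the web server holds one long-lived context (the master URL and app name are placeholders):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// One long-lived context for the whole web server; Spark allows only one
// active SparkContext per JVM, hence the single-instance holder.
public final class SparkHolder {
    private static final JavaSparkContext CONTEXT = new JavaSparkContext(
            new SparkConf()
                    .setMaster("spark://master-host:7077") // placeholder master URL
                    .setAppName("analysis-web-server"));

    public static JavaSparkContext get() {
        return CONTEXT;
    }

    private SparkHolder() {
    }
}
```

Request handlers then call `SparkHolder.get()` to run their jobs, and RDDs cached on that context stay available between requests.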
We are using Spark Job-Server and it is working fine with Java as well: just build a jar of the Java code and wrap it in Scala to work with Spark Job-Server.

Scaling Spring on Heroku

I was wondering if someone could point me to a good tutorial or blog post on writing a Spring application that can all be run in a single process for integration testing locally, but when deployed will deploy different subsystems into different processes/dynos on Heroku.
For example, I have services for user management, job processing, etc., all in my web application. I want to run it as a single web application locally. But when I deploy to Heroku, I want to deploy just the stateless web front end to two dynos and then have worker dynos on which I can select different services to run. I may decide to group two of these services into one process or decide that each should run in its own process. Obviously, when the services run in their own processes, they will need to transparently use some kind of transport like REST, RabbitMQ, Akka, or some such.
Any pointers on where to start looking to learn how to do this? Or am I thinking about this incorrectly, and you'd like to suggest a different approach? I need to figure out how to set up the application and also how to configure Maven and IntelliJ to achieve this.
Thanks.
I can't point you to a prefabricated article or post, but I can share the direction I started down to solve a similar problem. Essentially, the proposed approach was similar to yours - put specific services with potentially long-running logic in worker dynos and pass messages via Jesque (Java port of Resque) on a RedisToGo instance (Heroku add-on). I never got the separate web vs. worker Spring contexts fully ironed out (moved on to other priorities) but the gist of it was 1) web tier app context would be configured to post messages and 2) worker app context configured to consume.
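For illustration, the web-tier side of that idea might look like this with Jesque (the queue name and the "ReportJob" action class are hypothetical; the Redis settings would come from the RedisToGo add-on in practice):

```java
import net.greghaines.jesque.Config;
import net.greghaines.jesque.ConfigBuilder;
import net.greghaines.jesque.Job;
import net.greghaines.jesque.client.Client;
import net.greghaines.jesque.client.ClientImpl;

public class WebTierEnqueue {
    public static void main(String[] args) {
        // Redis connection settings; in production these would be parsed
        // from the RedisToGo URL instead of hard-coded defaults.
        Config config = new ConfigBuilder()
                .withHost("localhost")
                .withPort(6379)
                .build();
        Client client = new ClientImpl(config);
        try {
            // "ReportJob" is a hypothetical action class that a worker dyno
            // registers and executes when it picks the job off the queue.
            client.enqueue("long-running", new Job("ReportJob", "some-argument"));
        } finally {
            client.end();
        }
    }
}
```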
That said, I used foreman locally to simulate the Heroku environment to debug scaling (foreman start --formation="web=2" + Apache mod_proxy_http). Big Spring gotcha when you scale to 2+ dynos - make sure you are using Redis or Memcache for session storage when using webapp-runner. Spring uses HttpSession by default to store the security context... no session affinity or native Tomcat session replication.
Final caveat - in our case, none of our worker processing needed to be reflected to the end user. That said, we were using Pusher for other features (also a Heroku add-on). If you need to update the user when an async task completes, I recommend looking at it.
