I want to back up (and later restore) data from the GAE datastore using the export facilities that went live this year. I want to use cron and Java. I have found this post, which points at this page, but it's just for Python.
Originally I wanted to do this automatically every day from the Google Cloud Platform console, but I can't find a way of doing it. Now I am resorting to incorporating it into Java and a cron job. I need restore instructions as well as backup.
I'm not interested in using the datastore admin backup as this will no longer be available next year.
According to the docs, the way to do it is, indeed, through Cron for GAE and having a GAE module call the API to export.
The point is not the code itself, but understanding why this is so.
Currently, the easiest way to schedule tasks in GCP is through Cron jobs in GAE, but those can only call GAE modules. Following the docs that you pointed out, the Cron will be quite similar to the one described there.
Regarding the handler itself, you only need to call the Datastore Admin API authenticated with an account that has the proper permissions.
Since the Cloud Client Library does not have admin capabilities for Datastore, you'll have to either construct the call manually, or use the Datastore API Client Library.
Notice that, for GCP APIs, there are usually two client libraries available: the Cloud Client Library and the API Client Library. The first one is hand-crafted while the second one is auto-generated from the discovery document of each API.
If one specific functionality is not available through the Cloud Client Library (the recommended way of interacting with GCP APIs), you can always check the API Client Library for that same functionality.
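For illustration, a minimal sketch of such a cron-invoked handler, constructing the call manually (the project ID, bucket, and class name here are placeholders you would replace with your own):

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Collections;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import com.google.appengine.api.appidentity.AppIdentityService;
    import com.google.appengine.api.appidentity.AppIdentityServiceFactory;

    // Map this servlet to the URL referenced by your cron.xml entry.
    public class DatastoreExportServlet extends HttpServlet {
        @Override
        public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // Get an OAuth token for the app's default service account
            // (it needs permission to run Datastore exports).
            AppIdentityService identity = AppIdentityServiceFactory.getAppIdentityService();
            String token = identity.getAccessToken(
                    Collections.singletonList("https://www.googleapis.com/auth/datastore"))
                    .getAccessToken();

            // Call the Datastore Admin API export method directly.
            URL url = new URL("https://datastore.googleapis.com/v1/projects/your-project-id:export");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Bearer " + token);

            String body = "{\"outputUrlPrefix\": \"gs://your-backup-bucket\"}"; // placeholder bucket
            conn.getOutputStream().write(body.getBytes("UTF-8"));

            resp.setStatus(conn.getResponseCode());
        }
    }

Restoring works the same way: POST to the projects:import method instead, with an inputUrl field pointing at the .overall_export_metadata file that the export produced.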
Related
We have several Java standalone applications (in the form of Jar files) running on multiple servers. These applications mainly read and stream data between systems. We mainly use Java 8 in our development. I was recently put in charge; my main function is to manage and maintain these apps.
Currently, I check these apps manually by accessing the servers, checking if each app is running, and sometimes running some database queries to see if the app started pulling data. My problem is that in many cases some of these apps fail and shut down due to data issues or edge cases without anyone noticing. We need some monitoring and application recovery in place.
We don't have docker infrastructure in place. We plan to implement docker in the future, but for now this is not an option.
After research, the following are options I thought of or solutions I tried:
Have the apps create a socket client which sends a heartbeat to a monitoring app (which would need to be developed); a minimal sketch of this idea follows the list. I am keeping this as my last option.
I tried to use Eclipse Vert.x to wrap the apps into Verticles and then create a web view that shows me status and other info. After several tries, the apps failed to parse the data correctly (possibly due to my lack of understanding of the Vert.x library).
Use a third-party solution that does this, but I have no idea what solutions are out there. I am open to suggestions.
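For what it's worth, option 1 does not have to be elaborate. A minimal sketch of such a heartbeat sender (host, port, and interval are made-up placeholders, and the receiving monitor app is not shown); you would run it in a daemon thread from each app's main():

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.Socket;

    public class Heartbeat implements Runnable {
        private final String monitorHost = "monitor.internal"; // hypothetical monitor host
        private final int monitorPort = 7070;                  // hypothetical monitor port

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try (Socket socket = new Socket(monitorHost, monitorPort);
                     OutputStream out = socket.getOutputStream()) {
                    // One short line per beat; the monitor flags apps that go silent.
                    out.write(("alive " + System.currentTimeMillis() + "\n").getBytes("UTF-8"));
                } catch (IOException e) {
                    // Monitor unreachable; the app itself keeps running.
                }
                try {
                    Thread.sleep(30_000); // beat every 30 seconds
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }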
My requirements are:
Proper monitoring of the apps running and their status.
In case of failure, the app should start again while notifying the admin/developer.
I am willing to develop a solution or implement a third-party one. I need your guidance on this.
Thank you.
You could use spring-boot-actuator (see health). It comes with a built-in endpoint that provides some health checks (depending on your spring-boot project), but you can create your own as well.
Then, by making an HTTP request to http://{host}:{port}/{context}/actuator/health (replace with your values), you can see the status of those health checks and also use the response status code to monitor your application.
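As a rough sketch (the class name and staleness check are made up for illustration), a custom check is just a bean implementing HealthIndicator; a DOWN result flips the aggregate /actuator/health status, so an external poller only needs to look at the HTTP status code:

    import java.time.Duration;
    import java.time.Instant;

    import org.springframework.boot.actuate.health.Health;
    import org.springframework.boot.actuate.health.HealthIndicator;
    import org.springframework.stereotype.Component;

    @Component
    public class DataFeedHealthIndicator implements HealthIndicator {

        // Updated by the streaming code each time a batch is processed (hypothetical hook).
        private volatile Instant lastBatch = Instant.now();

        public void recordBatch() {
            lastBatch = Instant.now();
        }

        @Override
        public Health health() {
            Duration silence = Duration.between(lastBatch, Instant.now());
            if (silence.toMinutes() >= 5) {
                // No data pulled recently: report DOWN (HTTP 503 from the endpoint).
                return Health.down().withDetail("lastBatch", lastBatch.toString()).build();
            }
            return Health.up().build();
        }
    }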
Have you heard of Java Service Wrappers? They don't provide full management functionality, but they will monitor for JVM crashes and out-of-memory conditions and restart your application. Alerting should also be possible.
There is a small comparison table here: https://yajsw.sourceforge.io/#mozTocId284533
So some basic monitoring and management is included already. If you need more, I suggest using JMX (https://www.oracle.com/java/technologies/javase/javamanagement.html) or Prometheus (https://prometheus.io/ and https://github.com/prometheus/client_java).
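If you go the JMX route, the probe side can stay very small. A sketch (host and port are placeholders, and the target JVM must be started with the com.sun.management.jmxremote.* options):

    import java.lang.management.ManagementFactory;
    import java.lang.management.RuntimeMXBean;

    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxProbe {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://app-host:9010/jmxrmi"); // placeholder host/port
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection server = connector.getMBeanServerConnection();
                RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
                        server, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
                System.out.println("Up for " + runtime.getUptime() + " ms");
            }
            // A connection failure here is itself a signal that the app (or its JVM) is gone.
        }
    }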
I am a new user of Spark. I have a web service that allows a user to ask the server to perform a complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications.
However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to initiate the corresponding Spark application? Spark's documentation seems to suggest "spark-submit", but I would rather not pipe the command out to a terminal to perform this action. I saw an alternative, Spark-JobServer, which provides a RESTful interface to do exactly this, but my Spark applications are written in either Java or R, which does not seem to interface well with Spark-JobServer.
Is there another best practice for kicking off a Spark application from a web server (in Java) and waiting for a status result on whether the job succeeded or failed?
Any ideas of what other people are doing to accomplish this would be very helpful! Thanks!
I've had a similar requirement. Here's what I did:
To submit apps, I use the hidden Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
Using this same API you can query the status of a Driver, or kill your job later.
There's also another hidden UI Json API: http://[master-node]:[master-ui-port]/json/ which exposes all information available on the master UI in JSON format.
Using "Submission API" I submit a driver and using the "Master UI API" I wait until my Driver and App state are RUNNING
The web server can also act as the Spark driver. So it would have a SparkContext instance and contain the code for working with RDDs.
The advantage of this is that the Spark executors are long-lived. You save time by not having to start/stop them all the time. You can cache RDDs between operations.
A disadvantage is that since the executors are running all the time, they take up memory that other processes in the cluster could otherwise use. Another is that you cannot have more than one instance of the web server, since you cannot have more than one SparkContext for the same Spark application.
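In code terms this just means creating the context once at web-server startup and reusing it across requests; a sketch (the app name and master URL are placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public final class SparkHolder {
        // One long-lived context shared by all request handlers.
        private static final JavaSparkContext SC = new JavaSparkContext(
                new SparkConf()
                        .setAppName("web-analytics")              // placeholder
                        .setMaster("spark://spark-master:7077")); // placeholder

        public static JavaSparkContext get() {
            return SC;
        }

        private SparkHolder() {}
    }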
We are using Spark Job-server and it works fine with Java as well: just build a jar of the Java code and wrap it with a thin Scala layer to work with Spark Job-Server.
If I have an application that stores information in its datastore, is there a way to access that same datastore from a second application?
Yes you can, with the Remote API.
For example, you can use Remote API to access a production datastore from an app running on your local machine. You can also use Remote API to access the datastore of one App Engine app from a different App Engine app.
You need to configure the servlet (see the documentation for that) and import appengine-remote-api.jar into your project (you can find it in ..\appengine-java-sdk\lib\).
Just remember that ancestor queries do not work with the Remote API (see this).
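For reference, client-side usage looks roughly like this (the server, port, and credentials are placeholders; newer SDK versions offer other credential options):

    import java.io.IOException;

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.tools.remoteapi.RemoteApiInstaller;
    import com.google.appengine.tools.remoteapi.RemoteApiOptions;

    public class RemoteDatastoreClient {
        public static void main(String[] args) throws IOException {
            RemoteApiOptions options = new RemoteApiOptions()
                    .server("other-app.appspot.com", 443)          // the app that owns the data
                    .credentials("admin@example.com", "password"); // placeholder credentials
            RemoteApiInstaller installer = new RemoteApiInstaller();
            installer.install(options);
            try {
                // From here on, the normal datastore API talks to the remote app.
                DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
                ds.put(new Entity("Greeting"));
            } finally {
                installer.uninstall();
            }
        }
    }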
You didn't mention why you wanted to access the datastore of one application from another, but depending on the nature of your situation, App Engine modules might be a solution. These are structurally similar to separate applications, but they run under the same application "umbrella" and can access a common datastore.
You cannot directly access the datastore of another application. Your application must actively serve that data in order for another application to be able to access it. The easiest way to achieve this is via the Remote API, which needs a piece of code installed in order to serve the data.
If you would like to have two separate code bases (even serving different hostnames/URLs), then see the new App Engine Modules. They give you the ability to run totally different code on separate URLs and with different runtime settings (instances), while still being one application sharing all stateful services (datastore, task queue, memcache...).
I'm planning a web application where users will be able to upload and process their files. The specifics of the application are irrelevant to my questions, but let's assume that the application will deal with MP3 audio files. I'm going to split my application into two distinct parts: the front-end and the back-end.

The front-end application will be a usual web application serving HTML pages to users. Typically a user will upload his file and fill in an HTML form to specify which operations he would like to perform on the file. The files will be initially uploaded to a storage facility, such as Amazon S3, and later processed by a back-end server. I'm using the Play 2.0.4 framework to develop the front-end application and this is going very well for me. I managed to implement user authorization, drafted most of the UI and also implemented file upload to S3. The application is currently deployed on Heroku without any problems.

For my back-end server I'm considering using the Play 2 framework once again. The back-end server will receive a notification (an HTTP request) from the front-end server about the creation of a new job. The job specification will include a link to the original user file in storage and arguments describing the job. The job should be added to a queue.

Now the most important part is to delegate the actual processing to a third-party program, which will most certainly be a compiled command-line utility, such as SoX in the case of audio processing, written by good people using a programming language of their choice. As far as I know, it is possible to call an external program from Java, pass command-line arguments and collect the result. After processing is done, the back-end server will upload the processed file back to storage and send a notification (an HTTP request) to the front-end application, which will store a link to the processed file and display it to the user at some later time. To be able to use a command-line utility I'm going to deploy the back-end application to an Amazon EC2 instance with a Typesafe stack installation.
Here are some questions about this basic plan:
Is Play 2 a reasonable choice for the back-end, or should I look into alternatives? One of them seems to be CGI, which according to Wikipedia "is a standard method for web server software to delegate the generation of web content to executable files." Unfortunately I don't have any experience with that.
Will there be any problems implementing a job queue with Play?
Is it possible to install a command line utility on EC2 and call it from Play?
Should I expect any problems installing Typesafe stack on the EC2? This post briefly describes what I'm planning to do https://www.assembla.com/spaces/bufferine/wiki/Typesafe_stack_on_Amazon_EC2
Assuming that in the future the application will grow, how would I split the jobs among multiple instances on EC2? Should I create a separate job-balancing application in between my front-end and back-end?
I would appreciate any advice! Thanks!
Note: I'm using the Java API for the Play 2 framework, since I'm not familiar with the Scala language.
You may consider Akka for the processing; it's built into Play 2. It will help you manage tasks easily, and can even save hardware resources if you use its advanced features. There is a Java API that should cover all your needs. A separate back-end app isn't even strictly necessary: Play and Akka are stateless, so if you need more power you can scale out by simply adding identical instances. To make it run on EC2, just use the play dist command.
And yes, you can install whatever you want on EC2 and call it from your app.
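For example, invoking a command-line tool such as SoX from Java is just a ProcessBuilder call. A sketch (the sox arguments are made up; you would run this inside whatever worker or actor handles the job):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class SoxRunner {
        public static int convert(String inputPath, String outputPath) throws Exception {
            // Example invocation only; check the SoX docs for the options you actually need.
            ProcessBuilder pb = new ProcessBuilder("sox", inputPath, outputPath, "rate", "44100");
            pb.redirectErrorStream(true); // merge stderr into stdout for simpler collection

            Process process = pb.start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = out.readLine()) != null) {
                    System.out.println("[sox] " + line);
                }
            }
            return process.waitFor(); // 0 means success
        }
    }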
You may like:
http://akka.io/
http://www.playframework.com/documentation/2.1.0/JavaAkka
http://www.playframework.com/documentation/2.1.0/ProductionDist
Also, but in Scala:
http://blog.greweb.fr/2013/01/playcli-play-iteratees-unix-pipe/
http://blog.greweb.fr/2012/11/play-framework-enumerator-outputstream/
I'm working on a web-app using Google App Engine with GWT, and I need to use Google API's (Google Calendar, Documents and so...).
As far as I know, I must configure a domain with Google to set my domain as the callback of an OAuth authentication. Am I right?
If so, am I forced to deploy on GAE to test? I mean, I can't run locally because my localhost can't be a valid callback.
Do you know any way to debug locally even using Google API's?
I was advised to configure DynDNS, but that isn't a short-term solution (incompatible router)...
If you use AuthSub instead, I don't think you need to register a domain. The user just needs a Google account.
I have in the past used AuthSub together with Google Docs/Spreadsheet APIs on GAE and also been able to test it locally.
Unfortunately I can't give you my code and exact solution (it was a while ago). But one of the samples I based my code on extensively was the FetcherServlet; check this code out:
http://code.google.com/p/google-app-engine-samples/source/browse/trunk/retrieving-gdata-feeds-java/src/com/google/appengine/demo/web/FetcherServlet.java?r=122
Also, I guess you might already have read this page (but note their FetcherServlet uses OAuth, not AuthSub), so maybe just use it for some background info:
http://code.google.com/appengine/articles/java/retrieving_gdata_feeds.html
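If it helps, the basic AuthSub flow with the GData Java client looked roughly like this (the callback URL and scope are placeholders; note that AuthSub is the old pre-OAuth mechanism and has since been deprecated):

    import com.google.gdata.client.http.AuthSubUtil;

    public class AuthSubFlow {
        // Step 1: send the user's browser to this URL; Google redirects back to "next"
        // with a single-use token. A localhost URL works, which is why local testing is possible.
        public static String requestUrl() {
            String next = "http://localhost:8888/authsub-callback"; // placeholder callback
            String scope = "https://docs.google.com/feeds/";        // placeholder scope
            return AuthSubUtil.getRequestUrl(next, scope, false, true);
        }

        // Step 2: in the callback handler, upgrade the single-use token to a session token.
        public static String sessionToken(String callbackQueryString) throws Exception {
            String singleUseToken = AuthSubUtil.getTokenFromReply(callbackQueryString);
            return AuthSubUtil.exchangeForSessionToken(singleUseToken, null); // null = unregistered mode
        }
    }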