Convert an AWS SageMaker ML model into a Java library

When using AWS SageMaker, after you finish training a model, SageMaker writes the model as a model.tar.gz file to a specified S3 bucket. The next step the documentation recommends is to deploy the model on SageMaker. However, I do not want to deploy the model there. In my case, service-to-service latency is a concern, and I would also like to use the model's predictions in offline scenarios. Has anyone been able to take the model.tar.gz and turn it into a Java library? What tools did you use? How did you parse the model?

Most machine learning models in recent years have been developed in Python, and serving them from a Python environment is very common and performs well. You can see the flow in the SageMaker documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html), but the approach is completely open and can be reproduced outside SageMaker with NGINX and Gunicorn.
You can find Java libraries for running some of the common machine learning algorithms, mainly: https://github.com/jpmml
Nevertheless, check whether you really must have Java as your runtime environment. Java will add very little value here (if any) and a lot of compatibility issues.
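On the parsing part of the question, a first step that needs no SageMaker tooling is simply to look inside the archive. The sketch below uses only the JDK to list the entries of a gzip-compressed tar file such as model.tar.gz. It is a minimal illustration under stated assumptions: the class name ModelArchiveLister is hypothetical, it reads only the classic name and size header fields (no long-name or pax extensions), and what the entries actually contain depends on the algorithm that produced the model.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: lists the entries of a .tar.gz stream using only the JDK.
final class ModelArchiveLister {

    /** Returns one "name (N bytes)" string per entry in the gzip-compressed tar stream. */
    static List<String> list(InputStream gzippedTar) throws IOException {
        List<String> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(gzippedTar))) {
            byte[] header = new byte[512]; // every tar header occupies one 512-byte block
            while (true) {
                try {
                    in.readFully(header);
                } catch (EOFException e) {
                    break; // stream ended without an end-of-archive marker
                }
                if (header[0] == 0) {
                    break; // a block starting with NUL has no name: end of archive
                }
                String name = field(header, 0, 100);
                long size = Long.parseLong(field(header, 124, 12), 8); // size is octal text
                entries.add(name + " (" + size + " bytes)");
                skipFully(in, (size + 511) / 512 * 512); // data is padded to whole blocks
            }
        }
        return entries;
    }

    /** Reads a NUL-terminated ASCII header field and trims padding spaces. */
    private static String field(byte[] block, int offset, int length) {
        int end = offset;
        while (end < offset + length && block[end] != 0) {
            end++;
        }
        return new String(block, offset, end - offset, StandardCharsets.US_ASCII).trim();
    }

    private static void skipFully(InputStream in, long count) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = count;
        while (remaining > 0) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (read < 0) {
                throw new EOFException("archive is truncated");
            }
            remaining -= read;
        }
    }
}
```

Calling ModelArchiveLister.list(Files.newInputStream(Path.of("model.tar.gz"))) would show which files the training job produced, which in turn tells you whether a Java-side reader or a PMML export is a realistic option.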

Related

Is it possible to use more than one framework on the back end (Spring Boot + Django)?

tl;dr: Is Spring + Django back-end possible?
When I was new to the industry and still finding my way around the office, I got interested in Django and created a very small, basic application with it. When I met my team a few weeks later, they told me to go with the Spring framework. After spending half a year on Spring and the main project, I finally have time to work on things off-hours. But I don't want to lose either skill. A teammate (back when we were still in the office ;) ) once told me that they had worked on a project that started with Python code and later added features in Java. I have been unable to find any helpful results on Google (they mostly show Spring vs. Django).
How should I go about it? Is it too much to ask for? Is it worthwhile? Will I learn new concepts of application architecture that a noob like me would otherwise have missed? Please give me some insight.
Are there resources (docs) I can go through?
P.S. I'm not a diehard fan of either of the frameworks right now, just another coder testing waters.

You can't write Java in Python.
You can extend Python with C/C++, which is quite common: Extending Python with C or C++
As for the part where they told you they added features with Java:
It's common to build different parts of a project with different languages and tools. A microservice architecture is a common choice for these use cases. You write each part of the project in whatever language you want and then connect the parts using methods such as REST APIs or gRPC.
Imagine you are creating a website like YouTube that lets users upload videos. Users upload their files through a form, you store the files in your storage, and then you have to encode each video at different qualities. You can write the form handler in Python with Django to store the files. Then you can write another service in Java that handles the encoding, which is a heavy process. When an upload completes, the Django service sends the file or file path to the Java service through an internal REST API and tells it to start encoding. When encoding finishes, the Java service notifies the Django service, which then publishes the video on the feed; the feed itself can be written in yet another language.
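To make the Java side of that example concrete, here is a minimal sketch of an internal encoding endpoint built on the JDK's bundled com.sun.net.httpserver package. Everything named here is an assumption for illustration: the EncodingService class, the /encode path, and the "accepted:" reply are hypothetical, and the actual encoding work is left as a placeholder comment. A real project would more likely use Spring or a similar framework.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical internal service: receives a file path and acknowledges the encoding job.
final class EncodingService {

    /** Starts the service on localhost; port 0 lets the OS pick a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/encode", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // -1 means "no response body"
                exchange.close();
                return;
            }
            // The request body is assumed to carry the stored file's path as plain text.
            String path = new String(
                    exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8).trim();
            // Placeholder: a real service would queue the encoding job for `path` here
            // and notify the Django service once the job has finished.
            byte[] reply = ("accepted:" + path).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(202, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8081); // the port number is an arbitrary choice
        System.out.println("Listening on " + server.getAddress());
    }
}
```

The Django side would then POST the file path to this endpoint after an upload completes. Returning 202 (Accepted) rather than 200 signals that the heavy work happens asynchronously.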

I would say pick one framework and stick with it: for example, Django if you want to code in Python, and Spring if you want to code in Java. Learning both frameworks does bring a lot of value, though, because you can compare their benefits (e.g. Spring forces you to write clean code, Django has built-in and simpler database management).
I like Django's built-in tooling a lot; you only need to know Python for it to work. Spring requires a bit more knowledge, e.g. of Hibernate for database management. However, I predict Django will outgrow Spring at some point, because the cloud favors fast iteration and quick startup times (auto-scaling apps) over apps with large overhead and long boot times. That said, if you like Java, I can recommend JHipster for Java/Spring web app development: it gets you up to speed very quickly and teaches the ways of REST CRUD APIs fast.
To combine two programs: write your main logic in one app, and write a small service in the second language, making sure it's independent of the first app (no back-and-forth communication or complicated logic, just simple independent request/response, as if the main app was never there). Add a REST API to the second app and use e.g. HTTP requests to communicate.
What's possible in terms of combining languages:
Connecting different applications with each other by letting them communicate through their APIs. For example, a Python API developed with Flask or Django can send requests to a Java API developed with Spring, as long as they have a way to communicate (e.g. over HTTP, or via a queue such as RabbitMQ).
Connecting a web app to two different back ends by using a shared authentication system, for example a Keycloak authentication server that handles tokens and that your back-end applications know about.
What's not possible (and also not preferable):
Combining Java and Python code in the same program. There are some hacky ways to get it to work, but it's asking for trouble and not readable.
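The "communicate through their APIs" option also works in the other direction, with the Java app as the caller. Below is a minimal sketch using the JDK's java.net.http client (Java 11+). The PythonApiClient class, the base URI, and the example path are hypothetical; the only assumption about the Python side is that it answers GET requests with a 200 status and a text body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client for an API served by a separate Python (Flask/Django) application.
final class PythonApiClient {
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    private final URI baseUri;

    /** @param baseUri root of the Python API, ending in '/', e.g. http://localhost:8000/ */
    PythonApiClient(URI baseUri) {
        this.baseUri = baseUri;
    }

    /** Sends a GET request to the given relative path and returns the response body. */
    String fetch(String relativePath) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(baseUri.resolve(relativePath))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

Because the two apps only share an HTTP contract, either one can be rewritten or redeployed without touching the other, which is the independence the answer recommends.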

External plugin framework for my Java application

I built a large Java web application using Spring and MongoDB.
In some scenarios, I want to allow my users to upload their own code, which the application will run later when necessary.
I call this a "plugin framework"; the plugin is of course the user's code, which I would prefer to be in Node.js for now.
Is there a recommended or well-known architecture for this purpose?
I've read about pf4j and senecajs, but they are quite different from what I need.
Thanks!

You lose all control over code running on Node. The uploaded code can access the network, files, the database, you name it. That is not a good plan.
I suggest working with the JavaScript engine embedded in Java, called Rhino. There, you define which parts of the environment the code can access.
You can find samples of scripting in Java at http://docs.oracle.com/javase/7/docs/technotes/guides/scripting/programmer_guide/index.html for JDK 7, the Javadocs at https://docs.oracle.com/javase/8/docs/api/javax/script/ScriptEngine.html, and some information on the Java 8 changes at http://www.oracle.com/technetwork/articles/java/jf14-nashorn-2126515.html
UPDATE:
In the comment below, you state that you think you are safe if the code runs on another server. Actually, the problem is still the same; it just won't hit your application's server but the server running the JS code.
My advice stands. Implement a JS execution service using the built-in JavaScript engine (Rhino or Nashorn) and restrict the running JS to a sandbox; you control the script's reach out of the box through carefully implemented environment-access methods. It is actually pretty easy to get started, no more complicated than implementing a remote JavaScript execution engine on top of Node...
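As a rough sketch of what this looks like with the javax.script API: the plugin source is evaluated in its own script context, and only the objects the host chooses to expose are placed in its bindings. The PluginRunner class and the variable names are hypothetical. Two caveats apply. First, Nashorn was removed from the JDK in version 15, so on newer JDKs no engine is found unless one is added as a dependency. Second, limiting the bindings alone is not a complete sandbox; with Nashorn, access to Java classes additionally has to be restricted (for example with its ClassFilter mechanism).

```java
import java.util.Map;
import java.util.Optional;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

// Hypothetical host-side runner for user-supplied JavaScript plugins.
final class PluginRunner {

    /** Looks up a JavaScript engine; empty on JDKs that do not ship one. */
    static Optional<ScriptEngine> findEngine() {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName("javascript"));
    }

    /**
     * Evaluates the plugin in a fresh context whose engine scope holds only
     * the explicitly exposed objects, and returns the script's result.
     */
    static Object run(ScriptEngine engine, String pluginSource, Map<String, Object> exposed)
            throws ScriptException {
        ScriptContext context = new SimpleScriptContext();
        Bindings scope = engine.createBindings();
        scope.putAll(exposed); // the only host objects the plugin can see by name
        context.setBindings(scope, ScriptContext.ENGINE_SCOPE);
        return engine.eval(pluginSource, context);
    }
}
```

A call such as PluginRunner.run(engine, "price * 2", Map.of("price", 21)) evaluates the expression with price as the only exposed variable. Using a fresh context per plugin keeps one upload from seeing another's variables.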

How to fit and score machine learning models in a Java/JVM-based application

Could you please guide me on how to create and execute machine learning/statistical models (regression, decision trees, k-means clustering, naive Bayes, scorecard/linear/logistic regression, GBM, GLM, etc.) in a Java/JVM-based application (in production)?
We have an ETL-style Java-based product in which one can do most of the data preparation steps for machine learning, such as data ingestion from JDBC, files, HDFS, NoSQL, etc., plus joins and aggregations (which are required for feature engineering). Now we want to add analytics capabilities using machine learning/statistical modeling.
Right now, we use the JPMML evaluator to score models created in PMML format using R and Python (and KNIME), but this needs three separate and unconnected steps:
1- Prepare the data in our Java/JVM application and save the sample data (training and test) to a CSV file or a DB.
2- Create a machine learning model in R or Python (or KNIME) and export it in PMML 4.2 format.
3- Import/deploy the PMML in our Java-based application and use the JPMML evaluator to execute it in production.
I am sure this is a common problem in machine learning, as Java is generally preferred over Python or R in production. Could you suggest better approaches to create as well as execute a Python/scikit-learn-based machine learning model in a JVM-based application?
What are your thoughts on achieving steps #2 and #3 more seamlessly in a JVM-based application, without compromising performance and usability?
1- Call a Java program that internally calls the Python scikit-learn script (under the hood) to create a model in PMML, and then use the JPMML evaluator. To the user it will look like a single JVM-based application (better usability). I am not sure what the limitations and shortcomings of using PMML are, as not all features are supported in jpmml-sklearn.
2- Call a Java program that internally calls the Python script, does the model creation as well as the execution in an external Python environment, and serializes the model and the results to a file/CSV or an in-memory DB (or a cache such as Hazelcast), from where the parent Java application fetches the results. From my research, I can't use Jython to execute scikit-learn models.
3- Can I use Jep (Embed Python in Java) to embed CPython in the JVM? Has anybody tried it for scikit-learn models?
Alternatively, I should explore using Mahout or Weka, Java-based machine learning libraries, in my JVM-based application. (I need to support both Windows and non-Windows platforms.)
I am also exploring H2O.ai, which is Java-based. Has anybody tried it?
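Options 1 and 2 above both come down to a Java program launching an external Python process and collecting what it produces. A minimal sketch of that mechanism with ProcessBuilder follows. The ExternalScriptRunner class is hypothetical, the command is whatever the caller supplies, and the script's output is captured through a temporary file so that the timeout applies to the whole run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs an external command (e.g. a Python training script).
final class ExternalScriptRunner {

    /** Exit code and combined stdout/stderr of a finished process. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path log = Files.createTempFile("script-output", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)    // merge stderr into stdout
                    .redirectOutput(log.toFile()) // write output to the temp file
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("Command timed out after " + timeoutSeconds + " s");
            }
            return new Result(process.exitValue(), Files.readString(log));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

A call might look like ExternalScriptRunner.run(List.of("python3", "train_model.py", "--out", "model.pmml"), 600), where the script name and its flag are made up for illustration. The Java side would then load the produced PMML file or results file as in step 3.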

I use IntelliJ IDEA with the Python plugin. This way I have both Java and Python code in one and the same project. The data is in the database; the connection is always visible and accessible, regardless of whether I currently have a .java or a .py file in the editor. In the list of configurations you can have Python scripts, Java applications, Maven goals, etc.
Therefore I don't think you have to mix Python and Java code together (by calling Python scripts out of Java). That is completely unnecessary.
My workflow is (everything in IntelliJ IDEA):
1. Prepare the data (usually SQL)
2. Run a Python script that applies a pipeline of transformers to the pandas data frame constructed from a certain database table and outputs a PMML file.
3. Use the scikit-learn model in your Java application.

If you have an ETL with an HDFS back end, I would suggest deploying Spark on the cluster and using Spark's MLlib machine learning algorithms. They support the methods you mentioned above.
Do you mind giving some context on the size (rows, columns, types) of the data that you plan to work with? Java would not be my go-to language for ML, but Scala compiles to JVM bytecode and has a syntax similar to Java's (in addition to having a Java API).
If you're producing a proof of concept, then Java is fine, but if you're planning on working with big data, it doesn't really scale well.

I have found a decent solution for my problem. I am using H2O.ai, an open-source platform developed in Java for scalable machine learning. It offers APIs in Java (RESTful API), Python, R and Scala. It has best-in-class algorithms for classification, regression, clustering, etc. and integrates seamlessly with Apache Hadoop and Spark (Sparkling Water) as well, if someone has a Spark cluster. It also offers a deep learning algorithm based on a multi-layer feedforward artificial neural network. I am using the Java binding API/REST API and sometimes the low-level H2O API (for H2O 3 nodes cluster management).
I came across another Java-based alternative called Smile (Statistical Machine Intelligence and Learning Engine), which provides regression, classification, clustering, association rule mining, feature selection, etc. Does anybody have more feedback on these or similar Java-based ML libraries?

Embedding DITA Open Toolkit in a PHP-based application

We are looking to integrate DITA into our web application, which is an e-learning platform. The DITA Open Toolkit processes all files using Java. We are looking for a solution that allows us to work with DITA content on the fly from a PHP-based application.
Does anyone know of any PHP projects that are written to work with DITA maps and content?
After searching, we came across XMLmind DITA Converter (DITAC), and one of its listed features is:
"Designed to be easily embedded in any Java™, desktop or server-side, application."
But the documentation only describes how to embed it in a Java application.
Can anyone help us sort this out? I don't have any idea how to implement it in our PHP-based web application.

PHP as a dynamic XML rendering platform is limited by having only XSLT 1.0 as a native library for transforms within PHP as the logic layer. However, this standard LAMP/WAMP platform works fairly well for dynamic delivery of DITA content if you treat topics and maps as individually-addressable resources bypassing the usual multi-pass, map-driven processing.
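To illustrate what "treating a topic as an individually addressable resource" can mean in practice, here is a small sketch of a single-topic transform with an XSLT 1.0 stylesheet. It is written in Java with the JDK's built-in javax.xml.transform API rather than PHP; because the stylesheet is plain XSLT 1.0, the same stylesheet should also be usable from PHP's native XSLT support. The TopicRenderer class and the deliberately tiny stylesheet are hypothetical and handle only a title and body paragraphs. It is not how DITA-OT or expeDITA render content, and the sample topic omits the DOCTYPE so that no DTD needs to be fetched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical per-topic renderer: one DITA topic in, one HTML fragment out.
final class TopicRenderer {

    // Minimal XSLT 1.0 stylesheet (illustrative only): title -> h1, body/p -> p.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"html\" indent=\"no\"/>"
            + "<xsl:template match=\"/topic\">"
            + "<div class=\"topic\"><xsl:apply-templates select=\"title | body/p\"/></div>"
            + "</xsl:template>"
            + "<xsl:template match=\"title\"><h1><xsl:value-of select=\".\"/></h1></xsl:template>"
            + "<xsl:template match=\"p\"><p><xsl:value-of select=\".\"/></p></xsl:template>"
            + "</xsl:stylesheet>";

    /** Transforms the XML of a single topic into an HTML fragment. */
    static String render(String topicXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter html = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(topicXml)), new StreamResult(html));
        return html.toString();
    }
}
```

Each request then maps to one topic and one transform, with no map-driven, multi-pass build in between.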
I've been developing this concept into a DITA-based site-building tool that I've named expeDITA. I had put some earlier code for this project into SourceForge but I don't recommend using that code base--it was an RPC-based proof of concept whereas the latest version supports RESTful addressing with a front controller setup and vastly improved theming. The latest version is just about ready to put into a new project, and now that conference season is over for me, I can focus on prepping the docs and headers.
For the moment, you can check out this latest code running on a staging server at http://expedita.x10host.com/. But note that this free-hosted site seems to throttle access to the DTDs from time to time, hosing the class-based transforms for minutes at a time. Once I get the project into a repository, I'll set up a demo site on a less persnickety hosted account.
If you are looking for full DITA rendering, this is not the project for you. The typical use case here would be for any web presence for which DITA as source would be preferred over HTML. You might use it as a wiki for collecting SME contributions as DITA source, or to use DITA's filtering and flagging features to produce adaptive content for responsive themes, or to produce site content that can be aggregated as a single-page view or served via API as XML or JSON formats for consuming in mobile apps. I've even added slide capabilities that might fit into dynamic eLearning content delivery modes.
This blog post gives some background into the project and its goals: http://contelligencegroup.com/ditaperday/what-is-dita-for-the-web/ . I hope this is helpful information. Can you mention more about what goals you have for a hosted DITA application? Would the serve-on-demand model work okay for you, or do you require the map-driven extended features of DITA-OT/DITAC based processing?

How to integrate programs written in different programming languages?

I have two developers in my team. One will develop a Python application, the other will develop a Java application. The Java app generates a boolean value which is used by the Python app.
How can I integrate these applications? I have thought about using:
Return codes: The Python app calls the Java app, then the Java app uses the return code to communicate the boolean value.
Sockets: Connect both applications through sockets and exchange information. I think this is overkill.
Files: The Java app does its stuff, writes the output to a file, then the Python app reads this file and retrieves the boolean value it needs.
Any other suggestions? I'm not just looking for a solution, I'm also considering here aspects such as code organization and "beauty" of the overall solution.
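For reference, the return-code option could look roughly like the sketch below on the Java side. All names and the specific codes are assumptions made for illustration. The one design point worth noting is that the codes for true and false are deliberately different from 0 and 1, because a JVM that dies from an uncaught exception exits with status 1, and the caller should be able to tell "false" apart from "crashed".

```java
// Hypothetical sketch: a Java app that reports a boolean through its exit status.
final class BooleanExitCode {
    static final int TRUE_CODE = 10;  // assumed convention shared with the Python app
    static final int FALSE_CODE = 11; // assumed convention shared with the Python app

    static int toExitCode(boolean value) {
        return value ? TRUE_CODE : FALSE_CODE;
    }

    /** Decodes an exit status; any other code is treated as a failed run. */
    static boolean fromExitCode(int code) {
        if (code == TRUE_CODE) {
            return true;
        }
        if (code == FALSE_CODE) {
            return false;
        }
        throw new IllegalStateException("Process failed with exit code " + code);
    }

    /** Placeholder for the real computation; here it just parses the first argument. */
    static boolean computeResult(String[] args) {
        return args.length > 0 && Boolean.parseBoolean(args[0]);
    }

    public static void main(String[] args) {
        System.exit(toExitCode(computeResult(args)));
    }
}
```

The Python app would start this program as a child process (for example with the standard subprocess module) and apply the same mapping to the process's return code.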
Edit 1:
Thank you #user2387370 for the recommendation of using Jython, but I can't use it.
Edit 2: Thank you #RickyA, I'll have a look at messaging systems (such as zeromq, which you mentioned).

Use a messaging system like ZeroMQ. It has libraries for both languages and allows you to integrate them seamlessly.
Your proposed options will give you clunky interoperability (file locks, dead sockets, dead processes, etc.).
Also, this page lists some tools that can be used for Python/Java interop. I can't recommend one, since I have used none.
