Run a Python sklearn classifier from Java

I trained an SVC classifier in Python using sklearn and other libraries, building it as an sklearn Pipeline.
I dumped the trained model to a pickle file and wrote a second Python script that loads the pickle file and takes input from the command line to make a prediction. I can call this Python script from Java and it works fine.
The only issue is that it takes a lot of time: the script imports nltk, numpy and pandas, which are required for preprocessing the input argument, and I call the script multiple times, so that startup cost is paid on every call.
How can I work around this issue?
This is what my pipeline looks like:
pipeline = Pipeline([
    # Use FeatureUnion to combine the features from dataset
    ('union', FeatureUnion(
        transformer_list=[
            # Pipeline for getting POS
            ('ngrams', Pipeline([
                ('selector', ItemSelector(key='Sentence')),
                ('vect', CountVectorizer(analyzer='word')),
                ('tfidf', TfidfTransformer()),
            ])),
        ],
        # weight components in FeatureUnion
        transformer_weights={
            'ngrams': 0.7,
        },
    )),
    # Use a SVC classifier on the combined features
    ('clf', LinearSVC()),
])

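Here's an example of setting up a simple Flask REST API that serves a scikit-learn model. The server loads the model once at startup, so the import and unpickling cost is not paid on every prediction.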
import sys
import traceback
from flask import Flask, request, jsonify
import joblib  # older scikit-learn releases exposed this as sklearn.externals.joblib

app = Flask(__name__)

model_directory = 'model'
model_file_name = '%s/model.pkl' % model_directory

# This will be populated at startup, when the model is loaded
clf = None

@app.route('/predict', methods=['POST'])
def predict():
    if clf:
        try:
            json_ = request.json
            # Placeholder: build the query from the JSON payload in whatever
            # shape your pipeline expects
            query = json_
            prediction = clf.predict(query).tolist()
            return jsonify({'prediction': prediction})
        except Exception as e:
            return jsonify({'error': str(e), 'trace': traceback.format_exc()})
    else:
        return 'no model here'

if __name__ == '__main__':
    try:
        port = int(sys.argv[1])
    except Exception:
        port = 80

    try:
        clf = joblib.load(model_file_name)
        print('model loaded')
    except Exception as e:
        print('no model here:', e)
        clf = None

    app.run(host='0.0.0.0', port=port, debug=True)
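If running an HTTP server is more than you need, the same "load once, predict many times" idea might also be sketched as a long-running worker that Java starts a single time and talks to over stdin/stdout. This is only a sketch under assumptions: `serve` and `dummy_predict` are hypothetical names, and `dummy_predict` stands in for a call to `clf.predict` on the unpickled pipeline.

```python
import json
import sys


def serve(predict_fn, in_stream=sys.stdin, out_stream=sys.stdout):
    """Answer one JSON line per input line until the input is closed."""
    for line in in_stream:
        text = line.rstrip("\n")
        try:
            reply = {"prediction": predict_fn(text)}
        except Exception as exc:  # report the failure but keep the worker alive
            reply = {"error": str(exc)}
        out_stream.write(json.dumps(reply) + "\n")
        out_stream.flush()  # the caller blocks until it sees a full line


def dummy_predict(sentence):
    """Hypothetical stand-in for clf.predict on a loaded pipeline."""
    return "long" if len(sentence.split()) > 3 else "short"
```

A script would end with `serve(dummy_predict)`; the Java side would then write one sentence per line to the process's stdin and read one JSON line back for each, so the heavy imports are paid only when the worker starts.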

Related

Multiple stdin/stdout actions during one process call

I use Google Closure Compiler to compile JavaScript automatically from PHP (it has to be done that way, in PHP; however, there are no security limitations on the Windows machine). I wrote a simple PHP script which starts a process, passes the .js content to stdin and receives the recompiled .js via stdout. It works fine; the problem is that compiling, for example, 40 .js files takes almost 2 minutes on a strong machine. The major delay comes from Java starting a new instance of the .jar app for every script. Is there any way to modify the script below to create the process only once and send/receive .js content multiple times before the process ends?
function compileJScript($s) {
    $process = proc_open('java.exe -jar compiler.jar', array(
        0 => array("pipe", "r"), 1 => array("pipe", "w")), $pipes);
    if (is_resource($process)) {
        fwrite($pipes[0], $s);
        fclose($pipes[0]);
        $output = stream_get_contents($pipes[1]);
        fclose($pipes[1]);
        if (proc_close($process) == 0) // If fails, keep $s intact
            $s = $output;
    }
    return $s;
}
I can see several options, but I don't know whether they are possible or how to do them:
1. Create the process once and recreate only the pipes for every file.
2. Force Java to keep the JIT-ed .jar in memory for much faster re-execution.
3. If PHP can't do it, use a bridge (another .exe file which starts fast every time, transfers stdin/stdout and redirects it to a running compiler), if something like this even exists.
This is really a matter of coordination between the two processes.
Here is a quick 10-minute script I wrote (just for fun) that launches a JVM and sends it an integer value, which Java parses and returns incremented; PHP then sends it back, ad infinitum.
PHP.php
<?php
echo 'Compiling..', PHP_EOL;
system('javac Java.java');

echo 'Starting JVM..', PHP_EOL;
$pipes = null;
$process = proc_open('java Java', [0 => ['pipe', 'r'],
                                   1 => ['pipe', 'w']], $pipes);
if (!is_resource($process)) {
    exit('ERR: Cannot create java process');
}

list($javaIn, $javaOut) = $pipes;

$i = 1;
while (true) {
    fwrite($javaIn, $i); // <-- send the number
    fwrite($javaIn, PHP_EOL);
    fflush($javaIn);

    $reply = fgets($javaOut); // <-- blocking read (fgetss() was removed in PHP 8)
    $i = intval($reply);

    echo $i, PHP_EOL;

    sleep(1); // <-- wait 1 second
}
Java.java
import java.util.Scanner;

class Java {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        while (s.hasNextInt()) { // <-- blocking read
            int i = s.nextInt();
            System.out.print(i + 1); // <-- send it back
            System.out.print('\n');
            System.out.flush();
        }
    }
}
To run the script simply put those files in the same folder and do
$ php PHP.php
you should start seeing the numbers being printed like:
1
2
3
.
.
.
Note that while those numbers are printed by PHP, they are actually generated by Java
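The same keep-one-process-alive exchange can be sketched in Python as well. This is a minimal sketch, not a drop-in replacement for the PHP script: the child here is a hypothetical inline Python script standing in for `java Java`, and `ping_pong` is an illustrative name.

```python
import subprocess
import sys

# Hypothetical child: reads one integer per line and echoes it back incremented.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(int(line) + 1, flush=True)\n"
)


def ping_pong(rounds):
    """Start the child once and exchange `rounds` messages with it."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    values = []
    value = 1
    try:
        for _ in range(rounds):
            child.stdin.write(f"{value}\n")       # send the number
            child.stdin.flush()                   # make sure the child sees it
            value = int(child.stdout.readline())  # blocking read of the reply
            values.append(value)
    finally:
        child.stdin.close()  # end of input lets the child's loop finish
        child.wait()
        child.stdout.close()
    return values
```

The important points are the same as in the PHP version: one message per line, an explicit flush after every write on both sides, and a blocking line read for the reply.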
I don't think #1 from your list is possible because compiler.jar would need to have native support for keeping the process alive, which it doesn't (and if you consider that a compression algorithm needs the entire input before it can start processing data, it makes sense that the process doesn't stay alive).
According to "Anyway to Boost java JVM Startup Speed?", some people have been able to reduce their JVM startup times with Nailgun:
Nailgun is a client, protocol, and server for running Java programs
from the command line without incurring the JVM startup overhead.
Programs run in the server (which is implemented in Java), and are
triggered by the client (written in C), which handles all I/O.

How to save a Java object in Jython/Python

I'm building a Python UI using Tkinter. For the needs of the program, I have to connect Python with Java to do some stuff, so I'm using a simple Jython script as a linker. I can't use Tkinter with Jython because it's not supported.
Python (ui.py) -> Jython (linker.py) -> Java (compiled in jars)
To call the Jython function in Python I use subprocess as follows:
ui.py:
cmd = 'jython linker.py'
my_env = os.environ
my_env["JYTHONPATH"] = tons_of_jars
subprocess.Popen(cmd, shell=True, env=my_env)
Then, in the Jython file, linker.py, I import the Java classes already added on the JYTHONPATH, and I create an object with the name m and call to some functions of the Java class.
linker.py:
import handler.handler
m = handler.handler(foo, moo, bar)
m.schedule(moo)
m.getFullCalendar()
m.printgantt()
The thing is that I've created an m object that will be destroyed once the execution of jython linker.py ends.
So the question is: is it possible to save that m object somewhere so I can call it from ui.py whenever I want? If it's not possible, is there any other way to do this?
Thanks in advance.
I finally solved it by using ObjectOutputStream.
from java import io

def saveObject(x, fname="object.bin"):
    outs = io.ObjectOutputStream(io.FileOutputStream(fname))
    outs.writeObject(x)
    outs.close()

def loadObject(fname="object.bin"):
    ins = io.ObjectInputStream(io.FileInputStream(fname))
    x=ins.readObject()
    ins.close()
    return x

Pass real data to the Storms Spout using Non-JVM language in Twitter-Storm

I'm having trouble understanding how to pass real data to the spout.
For example:
I have these two files (they work fine):
#! /usr/bin/env python
import os, random, sys, time

for i in xrange(50):
    print("%s\t%s"%(os.getpid(), i))
    sys.stdout.flush()
    time.sleep(random.randint(0,5))
And
#! /usr/bin/env python
from __future__ import print_function
from select import select
from subprocess import Popen,PIPE

p = Popen(['./rand_lines.py'], stdout=PIPE, bufsize=1, close_fds=True, universal_newlines=True)

timeout = 0.1 # seconds
while p:
    # remove finished processes from the list
    if p.poll() is not None: # process ended
        print(p.stdout.read(), end='') # read the rest
        p.stdout.close()
        processes.remove(p)

    # wait until there is something to read
    rlist = select([p.stdout], [],[], timeout)[0]

    # read a line from each process that has output ready
    for f in rlist:
        print(f.readline(), end='') #NOTE: it can block
Now imagine that I want to pass those random lines to the spout for the future processing, I was trying this:
from uuid import uuid4
from select import select
from subprocess import Popen,PIPE
import storm

class TwitterSpout(storm.Spout):

    def initialize(self, conf, context):
        self.pid = os.getpid()
        try:
            self.p= Popen(['./rand_lines.py'], stdout=PIPE, bufsize=1, close_fds=True, universal_newlines=True)
        except OSError, e:
            self.log('%s'%e)
            sys.exit(1)
and then in nextTuple():
    def nextTuple(self):
        timeout = 0.1 # seconds
        while self.p:
            # remove finished processes from the list
            if self.p.poll() is not None: # process ended
                self.log ("%s"%self.p.stdout.read()) # read the rest
                self.p.stdout.close()
                processes.remove(self.p)

            # wait until there is something to read
            rlist = select([self.p.stdout], [],[], timeout)[0]

            # read a line from each process that has output ready
            for f in rlist:
                self.log ("%s%s"%f.readline()) #NOTE: it can block
                msgId = random.randint(0,500)
                self.log('MSG IN SPOUT %s\n'%msgId)
                storm.emit([f.readline()], id=msgId)
But this structure doesn't work: I always get the error "Pipe seems to be broken...", or, if I try different variations of this code, the process blocks and Storm never reaches nextTuple. Please help me solve this problem, or point me to an example of how to do something similar, or just give me some advice.
Thank you
There could be multiple issues.
There is no break in the while loop -- infinite loop.
You call f.readline() twice. You probably meant to call it only once after each select.
To avoid blocking, use data = os.read(f.fileno(), 1024) after select.
I don't know whether it is acceptable to block nextTuple() until the child process exits.
If all you do is read lines from the subprocess then you don't need the select:
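import os
from subprocess import Popen, PIPE

def iter_lines(*args, DEVNULL=open(os.devnull, 'r+')):
    p = Popen(args, stdin=DEVNULL, stdout=PIPE, stderr=DEVNULL,
              close_fds=True)
    for line in iter(p.stdout.readline, b''):  # expect b'\n' newline
        yield line
    p.stdout.close()
    # A generator's return value travels in StopIteration; raising
    # StopIteration explicitly is an error since Python 3.7 (PEP 479)
    return p.wait()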
Example:
# ...
        self.lines = iter_lines(sys.executable, '-u', 'rand_lines.py')
# ...
    def nextTuple(self):
        try:
            line = next(self.lines).decode('ascii', 'ignore')
        except StopIteration as e:
            self.exit_status = e.args[0]
        else:
            storm.emit([line.strip()])
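To try the generator approach outside Storm, something like the following self-contained sketch could be used. It assumes Python 3; `LineSource` and `next_line` are hypothetical names that play the role of the spout and its `nextTuple()`, returning the line instead of emitting it.

```python
import subprocess
import sys


def iter_lines(*args):
    """Yield raw stdout lines from a child; the exit status is the return value."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    with proc.stdout:
        for line in iter(proc.stdout.readline, b""):
            yield line
    return proc.wait()


class LineSource:
    """Hands out one line per call, the way nextTuple() emits one tuple."""

    def __init__(self, *args):
        self._lines = iter_lines(*args)
        self.exit_status = None

    def next_line(self):
        """Return the next stripped line, or None once the child has exited."""
        try:
            raw = next(self._lines)
        except StopIteration as stop:
            if self.exit_status is None:
                self.exit_status = stop.value  # value returned by the generator
            return None
        return raw.decode("ascii", "ignore").strip()
```

Each call blocks only until the child prints one more line, so there is no loop inside the method that could spin forever.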

Interact with java program using python [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Calling Java app with “subprocess” from Python and reading the Java app output
Basically, I want to interact with a Java program from Python while it is running, so that I can read its output and pass input to it.
I have managed to run a Java program from Python. I want to know whether I can access the output of the Java program in my Python program.
For example:
In the Java program: System.out.println("Enter no.");
In Python I should be able to get "Enter no." as a string and also pass a value to the Java program from Python.
What I have managed to do so far:
Python program :
import sys
import os.path,subprocess

def compile_java(java_file):
    subprocess.check_call(['javac', java_file])

def execute_java(java_file):
    java_class,ext = os.path.splitext(java_file)
    cmd = ['java', java_class]
    subprocess.call(cmd, shell=False)

def run_java(java_file):
    compile_java(java_file)
    execute_java(java_file)
Java Program :
import java.io.*;
import java.util.*;

class Hi
{
    public static void main(String args[])throws IOException
    {
        Scanner t=new Scanner(System.in);
        System.out.println("Enter any integer");
        int str1=t.nextInt();
        System.out.println("You entered"+str1);
    }
}
Thanx :)
If all you need is to get the output from a non-interactive execution of your Java program, use subprocess.check_output instead of subprocess.call.
http://docs.python.org/library/subprocess.html
You need Python 2.7 or newer for check_output to be available.
If you need to interact with the Java program, you can do so using Popen: communicate() sends input to the process and reads its output in a single exchange, while the process's stdin and stdout pipes let you write and read incrementally.
You can also use the pexpect Python library to automate this kind of interaction; pexpect abstracts a lot of the legwork involved in using Popen.communicate.
Note that these techniques apply for any kind of executable you need your Python program to interact with, not just Java; as long as it uses stdin and stdout, using these calls should work for you.
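As a rough illustration of the two calls mentioned above, here is a small sketch, assuming Python 3.7 or newer. The child is a hypothetical inline Python script standing in for `java Hi`, and `capture` and `run_with_input` are illustrative names.

```python
import subprocess
import sys

# Hypothetical stand-in for `java Hi`: prompts, reads an integer, echoes it.
CHILD = [
    sys.executable, "-c",
    "print('Enter any integer'); n = int(input()); print('You entered', n)",
]


def capture(cmd):
    """Non-interactive case: run cmd and return everything it printed."""
    return subprocess.check_output(cmd, text=True)


def run_with_input(number):
    """Feed one number to the child and return the lines it printed."""
    proc = subprocess.Popen(
        CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    output, _ = proc.communicate(f"{number}\n")  # send input, wait for exit
    return output.splitlines()
```

Note that `communicate()` waits for the process to finish, so it suits a single question-and-answer exchange; for a conversation with several turns you would write to `proc.stdin` and read from `proc.stdout` directly.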
The easiest way would be to use Jython, which is a complete Python implementation that runs in the JVM, and can interact with native Java code. But if you want to use CPython, and generally continue down the path you've sketched out above, you'll want to create a live Python Popen object that you can interact with. For example:
import sys
import os.path, subprocess

def compile_java(java_file):
    subprocess.check_call(['javac', java_file])

def execute_java(java_file):
    java_class, ext = os.path.splitext(java_file)
    cmd = ['java', java_class]
    # Pass the argument list directly (no shell) and use text-mode pipes
    return subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            universal_newlines=True)

def run_java(java_file):
    compile_java(java_file)
    process = execute_java(java_file)
    for i in range(10):
        process.stdin.write(str(i) + "\n")
    process.stdin.flush()

Python: How can I execute a jar file through a python script

I have been looking for an answer for how to execute a java jar file through python and after looking at:
Execute .jar from Python
How can I get my python (version 2.5) script to run a jar file inside a folder instead of from command line?
How to run Python egg files directly without installing them?
I tried to do the following (both my jar and python file are in the same directory):
import os

if __name__ == "__main__":
    os.system("java -jar Blender.jar")
and
import subprocess
subprocess.call(['(path)Blender.jar'])
Neither has worked. So I was thinking that I should use Jython instead, but I think there must be an easier way to execute jar files through Python.
Do you have any idea what I may be doing wrong? Or is there any other site where I can study more about my problem?
I would use subprocess this way:
import subprocess
subprocess.call(['java', '-jar', 'Blender.jar'])
But if you have a properly configured /proc/sys/fs/binfmt_misc/jar, you should be able to run the jar directly, as you wrote.
So, what exactly is the error you are getting?
Please post all the output you get from the failed execution.
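One way to collect that output is to capture the exit code together with both output streams. This is a minimal sketch, assuming Python 3.7 or newer; `run_and_report` is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit code, stdout text, stderr text)."""
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr
```

Calling `run_and_report(['java', '-jar', 'Blender.jar'])` would then return whatever Java printed on stderr, which is usually where the reason for a failed launch shows up. If the `java` executable itself cannot be found, `subprocess.run` raises `FileNotFoundError` instead of returning.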
This always works for me:
from subprocess import Popen, PIPE

def jarWrapper(*args):
    # Text-mode pipes so that lines come back as str rather than bytes
    process = Popen(['java', '-jar'] + list(args), stdout=PIPE, stderr=PIPE,
                    universal_newlines=True)
    ret = []
    while process.poll() is None:
        line = process.stdout.readline()
        if line != '' and line.endswith('\n'):
            ret.append(line[:-1])
    stdout, stderr = process.communicate()
    ret += stdout.split('\n')
    if stderr != '':
        ret += stderr.split('\n')
    # Drop the empty strings left over by split()
    return [line for line in ret if line != '']

args = ['myJarFile.jar', 'arg1', 'arg2', 'argN'] # Any number of args to be passed to the jar file

result = jarWrapper(*args)

print(result)
I used the following way to execute tika jar to extract the content of a word document. It worked and I got the output also. The command I'm trying to run is "java -jar tika-app-1.24.1.jar -t 42250_EN_Upload.docx"
from subprocess import PIPE, Popen
process = Popen(['java', '-jar', 'tika-app-1.24.1.jar', '-t', '42250_EN_Upload.docx'], stdout=PIPE, stderr=PIPE)
result = process.communicate()
print(result[0].decode('utf-8'))
Here the result is a tuple, hence result[0]. The output is also a bytes object (b'...'); to convert it into a normal string we decode it with 'utf-8'.
With arguments: a concrete example using Closure Compiler (https://developers.google.com/closure/) from Python:
import os
import re
src = 'test.js'
os.execlp("java", 'blablabla', "-jar", './closure_compiler.jar', '--js', src, '--js_output_file', '{}'.format(re.sub(r'\.js$', '.comp.js', src)))
(See also: "When using os.execlp, why `python` needs `python` as argv[0]".)
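Because os.execlp replaces the running Python process, nothing after that call is executed. If the script needs to carry on afterwards, the same argument list could instead be built once and handed to subprocess.call. The helpers below are a hypothetical sketch of that; `compiled_name` and `closure_command` are illustrative names, and the jar path is the one used in the example above.

```python
import re


def compiled_name(src):
    """Map 'test.js' to 'test.comp.js'; other names are returned unchanged."""
    return re.sub(r"\.js$", ".comp.js", src)


def closure_command(src, jar="./closure_compiler.jar"):
    """Build the argument list for compiling src with the given jar."""
    return ["java", "-jar", jar, "--js", src,
            "--js_output_file", compiled_name(src)]
```

With these, `subprocess.call(closure_command('test.js'))` would run the compiler as a child process and return its exit code.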
How about using os.system() like:
os.system('java -jar blabla...')
os.system(command)
Execute the command (a string) in a subshell. This is implemented by calling the Standard C function system(), and has the same limitations. Changes to sys.stdin, etc. are not reflected in the environment of the executed command.
