Py4j launch_gateway not connecting properly

I am trying to use py4j to open a gateway that I can use to pass objects from Java into Python. When I try to open a gateway with the py4j function launch_gateway, it does not seem to connect properly to my Java class. However, when I launch my Java class from the command line and then connect to it in Python using JavaGateway, everything works as expected. I would like to use the built-in method, as I am sure I am not accounting for things that have already been considered in the design of py4j, but I'm not sure what I'm doing wrong.
Let's say I want to create a gateway to the class sandbox.demo.solver.UtilityReporterEntryPoint.class. On the command line I can do this by executing the following:
java -cp /Users/grr/anaconda/share/py4j/py4j0.10.4.jar: sandbox.demo.solver.UtilityReporterEntryPoint py4j.GatewayServer
This launches as expected and I can use the methods in my class from within python after connecting to the gateway. So far so good.
My understanding of the py4j documentation would lead me to believe I should do the following to launch the gateway in python:
from py4j.java_gateway import JavaGateway, GatewayParameters, launch_gateway
port = launch_gateway(classpath='sandbox.demo.solver.UtilityReporterEntryPoint')
params = GatewayParameters(port=port)
gateway = JavaGateway(gateway_parameters=params)
I get no errors when executing these three lines, but when I try to access my java class methods with gateway.entry_point.someMethod() it fails with the following error:
Py4JError: An error occurred while calling t.getReport. Trace:
py4j.Py4JException: Target Object ID does not exist for this gateway :t
at py4j.Gateway.invoke(Gateway.java:277)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Obviously something is not getting called correctly within launch_gateway or I am feeding it the wrong information.
In the py4j source code for launch_gateway you can see that given the inputs you provide and those constructed by the function, a command is constructed that eventually gets called by subprocess.Popen. So given the input passed to launch_gateway above the command passed into Popen would be:
command = ['java', '-classpath', '/Users/grr/anaconda/share/py4j/py4j0.10.4.jar:sandbox.demo.solver.UtilityReporterEntryPoint', 'py4j.GatewayServer', '0']
Passing this command to Popen returns the listening port as expected. However, connecting to this listening port still does not allow access to my class methods.
Finally, passing the command as a single string to Popen without the final argument ('0') properly launches a gateway, which again operates as expected. Having glanced at the Java source code for py4j.GatewayServer.class, this makes no sense, as the main method seems to indicate that the class should exit with status 1 if the length of arguments is 0.
At this point I'm at a loss. I can hack my way to a workable solution, but as I said, I'm sure that ignores important aspects of the gateway behavior, and I don't like hacky solutions. I'd love to tag @Barthelemy in this one, but hopefully he reads this. Thanks in advance for any help.
EDIT
For now I have been able to work around this issue with the following steps.
Package the entire project, including all external dependencies, into a single jar file, magABM-all.jar, with 'Main-Class' set to UtilityReporterEntryPoint.
Include the if...else block that checks for --die-on-exit exactly as it appears in GatewayServer.java.
Use subprocess.Popen to run the project jar.
UtilityReporterEntryPoint.java
public static void main(String[] args) throws IOException {
    GatewayServer server = new GatewayServer(new UtilityReporterEntryPoint());
    System.out.println("Gateway Server Started");
    server.start();
    // Check the argument count first so a launch without arguments does not throw.
    if (args.length > 0 && args[0].equals("--die-on-exit")) {
        try {
            // Block until a line arrives or stdin closes, then shut down.
            BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in, Charset.forName("UTF-8")));
            stdin.readLine();
            System.exit(0);
        } catch (java.io.IOException e) {
            System.exit(1);
        }
    }
}
app.py
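import subprocess
import time

from py4j.java_gateway import JavaGateway

def setup_gateway():
    """Launch a py4j gateway using UtilityReporterEntryPoint."""
    process = subprocess.Popen('java -jar magABM-all.jar --die-on-exit', shell=True)
    time.sleep(0.5)  # crude wait for the Java server to start listening
    gateway = JavaGateway()
    return gateway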
This way I can still use gateway.shutdown if necessary, and if the Python process that starts the py4j gateway dies or is closed, the gateway is closed as well.
N.B. I would by no means consider this a final solution, as py4j was written by much smarter individuals with a clear purpose in mind, and I am sure there is a way to manage this exact workflow within the confines of py4j. This is just a stopgap.

There are a few issues:
The classpath parameter in launch_gateway should be a directory or a jar file, not a class name. For example, if you want to include additional Java libraries, you would add them to the classpath parameter.
The error you receive when you call gateway.entry_point.someMethod() means that you have no entry point. When you call launch_gateway, the JVM is started with GatewayServer.main, which launches a GatewayServer with no entry point: GatewayServer server = new GatewayServer(null, port). It is not possible currently to use launch_gateway and specify an entry point.
When you start the JVM with java -cp /Users/grr/anaconda/share/py4j/py4j0.10.4.jar: sandbox.demo.solver.UtilityReporterEntryPoint py4j.GatewayServer, I believe the JVM uses UtilityReporterEntryPoint as the main class. Although you did not provide the code, I assume that this class has a main method and that it launches a GatewayServer with an instance of UtilityReporterEntryPoint as the entry point. Note that there is a space between the colon and the class name, so UtilityReporterEntryPoint is seen as the main class and not as part of the classpath.

Related

Fortify issue - Command Injection

I am trying to run an HP Fortify security scan on my Java application. I had a few issues and have fixed them, but I am unable to find a fix for the issue below.
Command Injection
String hostname = execReadToString("hostname").split("\\.")[0];
public static String execReadToString(String execCommand) throws IOException {
    try (Scanner s = new Scanner(Runtime.getRuntime().exec(execCommand).getInputStream()).useDelimiter("\\A")) {
        return s.hasNext() ? s.next() : "";
    }
}
The method execReadToString() calls exec() to execute a command. This call might allow an attacker to inject malicious commands.
So I also tried ProcessBuilder.
private static void gethostname(String cmd1) throws IOException {
    if(Pattern.matches("[A-Za-z]+", cmd1)) {
        ProcessBuilder pb = new ProcessBuilder(cmd1);
        Process p = pb.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                p.getInputStream()));
        String readline;
        while ((readline = reader.readLine()) != null) {
            System.out.println(readline);
        }
    }
}
Even this gives me a security issue: "This start() call might allow an attacker to inject malicious commands."
What would be the ideal fix for this issue?
Thanks in advance

Usually this is flagged because user input is used to build the command string, which lets a user inject malicious code and change which command is ultimately run (even if you add validation, there will be ways to circumvent it).
In your case you seem to be hardcoding the command so this shouldn't be a problem, however, see the OWASP page on hardcoded command invocation (emphasis mine):
Unlike the previous examples, the command in this example is
hardcoded, so an attacker cannot control the argument passed to
system(). However, since the program does not specify an absolute path
for make, and does not scrub any environment variables prior to
invoking the command, the attacker can modify their $PATH variable to
point to a malicious binary named make and execute the CGI script from
a shell prompt. And since the program has been installed setuid root,
the attacker's version of make now runs with root privileges.
The environment plays a powerful role in the execution of system
commands within programs. Functions like system() and exec() use the
environment of the program that calls them, and therefore attackers
have a potential opportunity to influence the behavior of these calls.
Resolution:
Use native Java APIs / libraries to achieve what you want instead of running a command. This is probably the best option. Use commands only when unavoidable, e.g. for third-party tools which do not have a Java client library. This approach has the added advantage of being more portable and, in most cases, more efficient too. This library might help your scenario.
If you have to run a command, ensure you do not use user-supplied or external data even indirectly to construct it.
Or if you're hardcoding the command to run from the code, use absolute path to the command and do not use environment variables as part of it. For hostname (assuming you use the built-in command) this is usually /usr/bin/hostname but you can find the command path for your environment using which hostname.
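As a rough illustration of the first option, the host name can usually be obtained through java.net.InetAddress without spawning any process. This is only a sketch: the class and method names (HostNameLookup, shortHostName, shortName) are made up for this example, the "localhost" fallback is an assumption, and on some systems getHostName() may return an IP address or a name that differs from what the hostname command prints.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not taken from the question's code base.
final class HostNameLookup {

    /** Returns the part of a host name before the first dot. */
    static String shortName(String hostName) {
        int dot = hostName.indexOf('.');
        return dot < 0 ? hostName : hostName.substring(0, dot);
    }

    /** Looks up the local host name via the JDK instead of exec("hostname"). */
    static String shortHostName() {
        String name;
        try {
            name = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Assumed fallback for this sketch; pick whatever suits your application.
            name = "localhost";
        }
        return shortName(name);
    }

    public static void main(String[] args) {
        System.out.println(shortHostName());
    }
}
```

Because no external command is started, there is no exec() or start() call left for the scanner to flag on this code path.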

Using java code in the django framework

Okay, so I have a simple interface that I designed with the Django framework that takes natural language input from a user and stores it in a table.
Additionally, I have a pipeline that I built in Java using the cTAKES library to do named entity recognition, i.e. it takes the text input submitted by the user and annotates it with relevant UMLS tags.
What I want to do is take the input from the user and, once it's submitted, direct it into my Java cTAKES pipeline, then feed the annotated output back into the database.
I am pretty new to the web development side of this and can't really find anything on integrating scripts in this sense. So, if someone could point me to a useful resource or just in the general right direction that would be extremely helpful.
=========================
UPDATE:
Okay, so I have figured out that subprocess is the module I want to use in this context, and I have tried implementing some simple code based on the documentation, but I am getting an error:
Exception Type: OSError
Exception Value: [Errno 2] No such file or directory
Exception Location: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py in _execute_child, line 1335.
A brief overview of what I'm trying to do:
This is the code I have in views. Its intent is to take text input from the model form, POST that to the DB, and then pass that input into my script, which produces an XML file that is stored in another column in the DB. I'm very new to Django, so I'm sorry if this is a simple fix, but I couldn't find any helpful documentation relating Django to subprocess.
def queries_create(request):
    if not request.user.is_authenticated():
        return render(request, 'login_error.html')
    form = QueryForm(request.POST or None)
    if form.is_valid():
        instance = form.save(commit=False)
        instance.save()
        p=subprocess.Popen([request.POST['post'], './path/to/run_pipeline.sh'])
        p.save()
    context = {
        "title":"Create",
        "form": form,
    }
    return render(request, "query_form.html", context)
Model code snippet:
class Query(models.Model):
    problem/intervention = models.TextField()
    updated = models.DateTimeField(auto_now=True, auto_now_add=False)
    timestamp = models.DateTimeField(auto_now=False, auto_now_add=True)
UPDATE 2:
Okay, so the code no longer breaks after changing the subprocess call as below:
def queries_create(request):
    if not request.user.is_authenticated():
        return render(request, 'login_error.html')
    form = QueryForm(request.POST or None)
    if form.is_valid():
        instance = form.save(commit=False)
        instance.save()
        p = subprocess.Popen(['path/to/run_pipeline.sh'], stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
        (stdoutdata, stderrdata) = p.communicate()
        instance.processed_data = stdoutdata
        instance.save()
    context = {
        "title":"Create",
        "form": form,
    }
    return render(request, "query_form.html", context)
However, I am now getting a "Could not find or load main class pipeline.CtakesPipeline" error that I don't understand, since the script runs fine from the shell in this working directory. This is the script I am trying to call with subprocess.
#!/bin/bash
INPUT=$1
OUTPUT=$2
CTAKES_HOME="full/path/to/CtakesClinicalPipeline/apache-ctakes-3.2.2"
UMLS_USER="####"
UMLS_PASS="####"
CLINICAL_PIPELINE_JAR="full/path/to/CtakesClinicalPipeline/target/
CtakesClinicalPipeline-0.0.1-SNAPSHOT.jar"
[[ $CTAKES_HOME == "" ]] && CTAKES_HOME=/usr/local/apache-ctakes-3.2.2
CTAKES_JARS=""
for jar in $(find ${CTAKES_HOME}/lib -iname "*.jar" -type f)
do
CTAKES_JARS+=$jar
CTAKES_JARS+=":"
done
current_dir=$PWD
cd $CTAKES_HOME
java -Dctakes.umlsuser=${UMLS_USER} -Dctakes.umlspw=${UMLS_PASS} -cp
${CTAKES_HOME}/desc/:${CTAKES_HOME}/resources/:${CTAKES_JARS%?}:
${current_dir}/${CLINICAL_PIPELINE_JAR} -
-Dlog4j.configuration=file:${CTAKES_HOME}/config/log4j.xml -Xms512M -Xmx3g
pipeline.CtakesPipeline $INPUT $OUTPUT
cd $current_dir
I'm not sure how to go about fixing this error so any help is appreciated.

If I understand you correctly, you want to pipe the value of request.POST['post'] to the program run_pipeline.sh and store the output in a field of your instance.
You are calling subprocess.Popen incorrectly. It should be:
p = subprocess.Popen(['/path/to/run_pipeline.sh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
Then pass in the input and read the output:
(stdoutdata, stderrdata) = p.communicate(input=request.POST['post'])
Then save the data, e.g. in a field of your instance
instance.processed_data = stdoutdata
instance.save()
I suggest you first make sure to get the call to the subprocess working in a Python shell and then integrate it in your Django app.
Please note that creating a (potentially long-running) subprocess in a request is really bad practice and can lead to a lot of problems. The best practice is to delegate long-running tasks to a job queue. For Django, Celery is probably the most commonly used. There is a bit of setup involved, though.

How can I adapt my code to be used by different entities?

I have started adapting an older part of my code base and realised I had made the mistake of overusing the System.out.printf() method. Previously the class would handle commands given by the CLI user who was operating the server, however now I am adding the capability for connecting clients to essentially become administrators (assuming they have been issued with the admin status by the initial CLI user).
In order to save rewriting a lot of my code base I figured the best idea would be to issue certain commands given by the admin clients using the same class/methods as the CLI user (So the client's command has the exact same effect as a CLI user's command and so the client may see the same output a CLI user would).
My problem is that the method I am using for the CLI user's commands overuses System.out.printf() for command output. How can I adapt this class so that both CLI users and clients obtain the same output?
Things to note:
"client" refers to a Socket connection from a user who is connecting remotely and is using a username that is registered with the server.
The output of some of the methods contain strings that need to be given in "real-time" correspondence to the event; therefore returning the output String from the method would not be suitable in this scenario.
The following is a very rough 'pseudo' copy of the class outlining the issue. I am willing to show people the main class through a GitHub link or similar, but I did not want to initially swamp this question with code.
Code
public boolean executeCommand(String[] command) {
    switch (command[0]) {
        case "kill":
            return kill(command);
        case "clients":
            if (!clientList.isEmpty()) {
                for (String username : clientList.keySet())
                    System.out.printf("%s\t%s\n\n", username, clientList.get(username).getAddress());
            } else {
                System.out.println("No clients connected!");
            }
            return true;
        // ...and so on
        default:
            System.out.printf("\"%s\": command unknown.\n Type \"help\" for a list of commands.\n", command[0]);
            return false; // assumed: false signals an unrecognized command
    }
}
private boolean kill(String[] args) {
    args[1].disconnect(args[2]); // pseudocode: disconnect the client named args[1]
    System.out.printf("Killed %s with reason %s", args[1], args[2]);
    return true;
}

You can create your own PrintStream and install it with System.setOut() before you call your legacy code. All calls to System.out will then be written to your stream instead.
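A minimal sketch of that idea follows, assuming you want the CLI console to keep its output while a copy goes to the admin client in real time. TeeOutputStream, runWithTee and legacyCommand are hypothetical names invented for this example, and a ByteArrayOutputStream stands in for the client socket's output stream. Keep in mind that System.setOut() is JVM-wide, so output printed by other threads during the call would be copied as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical demo class, not part of the question's code base.
final class TeeOutputDemo {

    /** Forwards every byte to two underlying streams. */
    static final class TeeOutputStream extends OutputStream {
        private final OutputStream first;
        private final OutputStream second;

        TeeOutputStream(OutputStream first, OutputStream second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(int b) throws IOException {
            first.write(b);
            second.write(b);
        }

        @Override
        public void write(byte[] buf, int off, int len) throws IOException {
            first.write(buf, off, len);
            second.write(buf, off, len);
        }

        @Override
        public void flush() throws IOException {
            first.flush();
            second.flush();
        }
    }

    /** Stand-in for legacy code that prints straight to System.out. */
    static void legacyCommand() {
        System.out.printf("Killed %s with reason %s%n", "alice", "timeout");
    }

    /** Runs the action with System.out copied into extra, then restores System.out. */
    static void runWithTee(OutputStream extra, Runnable action) {
        PrintStream original = System.out;
        PrintStream tee = new PrintStream(new TeeOutputStream(original, extra), true);
        System.setOut(tee);
        try {
            action.run();
        } finally {
            tee.flush();
            System.setOut(original);
        }
    }

    public static void main(String[] args) {
        // In the real server this would be the admin client's socket.getOutputStream().
        ByteArrayOutputStream clientCopy = new ByteArrayOutputStream();
        runWithTee(clientCopy, TeeOutputDemo::legacyCommand);
        System.out.println("client received: " + clientCopy.toString().trim());
    }
}
```

The legacy method is left untouched; only the caller decides where the output goes, which fits the requirement that the client sees the text as it is produced rather than as a returned String.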

Sourcing r-files only once on Rserve

I have written a small Java client which does some calculations on an Rserve server. For this purpose, there are functions.r and libraries.r files on the server side, which have to be sourced before the actual calculation can be done.
Currently I load the files on every new connection:
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
public class RserveTester {
    public void doOnRserve() throws RserveException {
        RConnection c = new RConnection("rserve.domain.local");
        try {
            c.login("foo", "user");
            // Both files are re-sourced on every new connection.
            c.eval("source(\"/home/rserve/lib/libraries.r\")");
            c.eval("source(\"/home/rserve/lib/functions.r\")");
            c.eval("someCalculation()");
        } finally {
            c.close();
        }
    }
}
where doOnRserve() is called a couple of times a minute in response to events on the client side.
My question is: is it possible to source the libraries only once, so that they are available in all new R sessions without sourcing them individually?
I tried on the client side something like:
c.serverSource("/home/rserve/lib/libraries.r");
c.serverSource("/home/rserve/lib/functions.r");
This gives me the following exception (no idea why this does not work while eval does):
org.rosuda.REngine.Rserve.RserveException: serverSource failed, request status: access denied (local to the server)
Can I start the Rserve with a specific .Rprofile?
EDIT:
Basically, there seem to be three possible methods:
Let /home/rserve/.Rprofile source the .r files. But this seems to source them each time I call new RConnection().
Pass the source commands directly to R when starting Rserve (no idea how to do this).
My preferred method: do it from the client side using serverSource(), which throws these "access denied" exceptions.
EDIT2:
Rserve version v0.6-8 (338)
R version 2.15.2 for x86_64-pc-linux-gnu.

This is trivially done by adding source lines to your configuration file, i.e., putting
source "/foo/bar.R"
in /etc/Rserv.conf will source /foo/bar.R on startup. If you want to use another config file, use --RS-conf command line argument to specify it. Finally, Rserve 1.x supports --RS-source option on the command line as well.
Without the quotation marks around the file path, it may give a "File Not Found" error.
BTW: you mentioned serverSource() access denied - that means you did not enable control commands in Rserve (control enable in the configuration or --RS-enable-control on the command line).
PS: Please use stats-rosuda-devel mailing list for Rserve questions.

Yes you can. Always remember this:
R> fortunes::fortune("Yoda")
Evelyn Hall: I would like to know how (if) I can extract some of the information
from the summary of my nlme.
Simon Blomberg: This is R. There is no if. Only how.
-- Evelyn Hall and Simon 'Yoda' Blomberg
R-help (April 2005)
R>
Or as the documentation for Rserve states:
\description{ Starts Rserve in daemon mode (unix only).
Any additional
parameters not related to Rserve will be passed straight to the
underlying R. For configuration, usage and command line parameters
please consult the online documentation at
http://www.rforge.net/Rserve. Use \code{R CMD Rserve --help} for a
brief help.

Using Inline::Java in perl with threads

I am writing a trading program in perl with the newest Finance::InteractiveBrokers::TWS
module. I kick off a command line interface in a separate thread at the
beginning of my program but then when I try to create a tws object, my program
exits with this message:
As of Inline v0.30, use of the Inline::Config module is no longer supported or
allowed. If Inline::Config exists on your system, it can be removed. See the
Inline documentation for information on how to configure Inline. (You should
find it much more straightforward than Inline::Config :-)
I have the newest versions of Inline and Inline::Java. I looked at TWS.pm and it doesn't seem to be using Inline::Config. I set 'SHARED_JVM => 1' in the 'use Inline()' and 'Inline->bind()' calls in TWS.pm but that did not resolve the issue...
My Code:
use Finance::InteractiveBrokers::TWS;
use threads;
use threads::shared;
our $callback;
our $tws;
my $interface = UserInterface->new();
share($interface);
my $t = threads->create(sub{$interface->runUI()});
$callback= TWScallback->new();
$tws = Finance::InteractiveBrokers::TWS->new($manager); #This is where the program fails

So is Inline::Config installed on your system or not? A cursory inspection of the code is not sufficient to tell whether Perl is loading a module or not. There are too many esoteric ways (some intentional and some otherwise) to load a package or otherwise populate a namespace.
The error message in question comes from this line of code in Inline.pm:
croak M14_usage_Config() if %main::Inline::Config::;
so something in your program is populating the Inline::Config namespace. You should do what the program instructs you to do: find out where Inline/Config.pm is installed on your system (somewhere in your @INC path) and delete it.
