Python JPype can't see Java classes in JAR file

I am trying to work through what should be a simple problem, but am missing something. I have a very simple Java class:
package blah.blah;

public class Tester {
    public String testMethod1() { return "GotHere"; }
    public String testMethod2() { return "GotHereToo"; }
} // Tester
I am trying to get it to load into JPype:
import jpype
import jpype.imports
from jpype.types import *
path_to_jvm = "C:\\Java\\OracleJDK-8_241_x64\\jre\\bin\\server\\jvm.dll"
jpype.startJVM(path_to_jvm, classpath=["C:\\Users\\Administrator\\eclipse-workspace\\JpypeTest\\src;"])
from java.lang import System
print(System.getProperty("java.class.path"))
from java.io import ObjectInputStream
from blah.blah import Tester
tester = Tester()
print(tester.testMethod1())
print(tester.testMethod2())
jpype.shutdownJVM()
Everything works fine when I load the class directly from the source directory using the code above. It stops working when I try to load the class file from a JAR:
jpype.startJVM(path_to_jvm, classpath=["C:\\Users\\Administrator\\JpypeTest.jar;"])
The error I get is:
PS C:\Users\Administrator> python .\jpype_test.py
C:\Users\Administrator\JpypeTest.jar;
Traceback (most recent call last):
File ".\jpype_test.py", line 16, in <module>
from blah.blah import Tester
ModuleNotFoundError: No module named 'blah'
I have built the JAR file a few different ways from Eclipse 2020/09 using JAR export and from the command line using:
C:\Users\Administrator\eclipse-workspace\JpypeTest\src>"C:\Java\OracleJDK-8_241_x64\bin"\jar cf C:\Users\Administrator\JpypeTest.jar blah\blah\*.class
I have confirmed that the JVM and the python interpreter are both 64 bit. I have also done my best to ensure the JVM is exactly the same between Eclipse, the command line and Python.
From what I can see, the JAR looks fine regardless of how I build it. It contains the blah\blah directory and the Tester.class file under it. The manifest is the only other file in the JAR.
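Since a JAR is an ordinary zip archive, its layout can also be checked from Python with the standard `zipfile` module. The sketch below (the function name `inspect_jar` is hypothetical) confirms that the class sits at the forward-slash path matching its package and reports which package directory entries, such as `blah/`, the archive lacks. Whether missing directory entries affect JPype 1.0.2's import hook is an assumption worth testing, not a confirmed cause.

```python
import zipfile


def inspect_jar(jar_path, class_name):
    """Check a JAR for a class and for its package directory entries.

    class_name is dotted, e.g. "blah.blah.Tester".
    """
    # The zip format specifies forward slashes in entry names.
    class_entry = class_name.replace(".", "/") + ".class"
    package_parts = class_name.split(".")[:-1]
    # Directory entries end with "/", e.g. "blah/" and "blah/blah/".
    dir_entries = [
        "/".join(package_parts[:i]) + "/" for i in range(1, len(package_parts) + 1)
    ]
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
    return {
        "class_present": class_entry in names,
        "missing_dirs": [d for d in dir_entries if d not in names],
    }


# Example (path from the question):
# print(inspect_jar(r"C:\Users\Administrator\JpypeTest.jar", "blah.blah.Tester"))
```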
I have also tried creating the JAR file with the class file at different directory levels (i.e., with only one level of blah directory, and with no blah directory at all).
Here are the versions of the python software:
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Package      Version
------------ -------------------
certifi      2020.6.20
JPype1       1.0.2
pip          20.2.3
setuptools   50.3.0.post20201006
wheel        0.35.1
wincertstore 0.2
I clearly have the base plumbing working since it can see the class file. Any thoughts on why the JAR is failing?
Thanks a bunch for your consideration.

Check that you don't have a directory called "blah" anywhere Python looks for modules. It would hide your Java classes, as per https://jpype.readthedocs.io/en/latest/userguide.html#importing-java-classes :
One important caveat when dealing with importing Java modules. Python always imports local directories as modules before calling the Java importer. So any directory named java, com, or org will hide corresponding Java package. We recommend against naming directories as java or top level domain.
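A quick way to test this answer is to scan `sys.path` for anything named like the top-level Java package. The sketch below (the function name `find_shadowing_paths` is hypothetical) relies on the fact that since Python 3.3 a plain directory, even without an `__init__.py`, can be imported as a namespace package, and a same-named `.py` file is importable as a module; either one is picked up before JPype's importer gets a chance. As an alternative that sidesteps Python's import system, `jpype.JClass("blah.blah.Tester")` looks the class up through the JVM instead.

```python
import os
import sys


def find_shadowing_paths(top_level_package, search_path=None):
    """List locations on the search path that Python would import as `top_level_package`."""
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        base = entry or os.getcwd()  # an empty sys.path entry means the current directory
        as_dir = os.path.join(base, top_level_package)
        as_module = as_dir + ".py"
        if os.path.isdir(as_dir):
            hits.append(as_dir)  # becomes a (namespace) package, hiding the Java package
        elif os.path.isfile(as_module):
            hits.append(as_module)  # a plain module with the same name also wins
    return hits


# Example: print(find_shadowing_paths("blah"))
```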

Related

sun.misc.InvalidJarIndexException: Invalid index when importing from com.* package in Jython standalone

I'm getting an InvalidJarIndexException when trying to utilize Jython standalone JAR inside my application and I'm unable to figure out what I'm doing wrong.
As soon as I attempt to execute a Python script with an import statement for any Java class from a package starting with "com.", e.g.: "com.foo.Bar", the following exception is thrown (truncated):
Traceback (most recent call last):
File "<string>", line 1, in <module>
sun.misc.InvalidJarIndexException: Invalid index
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1152)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1062)
at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:1032)
at sun.misc.URLClassPath.findResource(URLClassPath.java:225)
at java.net.URLClassLoader$2.run(URLClassLoader.java:572)
at java.net.URLClassLoader$2.run(URLClassLoader.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findResource(URLClassLoader.java:569)
at java.lang.ClassLoader.getResource(ClassLoader.java:1089)
at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:233)
at org.python.core.ClasspathPyImporter.tryClassLoader(ClasspathPyImporter.java:221)
at org.python.core.ClasspathPyImporter.makeEntry(ClasspathPyImporter.java:208)
at org.python.core.ClasspathPyImporter.makeEntry(ClasspathPyImporter.java:18)
at org.python.core.util.importer.getModuleInfo(importer.java:174)
at org.python.core.util.importer.importer_find_module(importer.java:98)
at org.python.core.ClasspathPyImporter.ClasspathPyImporter_find_module(ClasspathPyImporter.java:134)
at org.python.core.ClasspathPyImporter$ClasspathPyImporter_find_module_exposer.__call__(Unknown Source)
at org.python.core.PyBuiltinMethodNarrow.__call__(PyBuiltinMethodNarrow.java:48)
at org.python.core.imp.find_module(imp.java:761)
at org.python.core.imp.import_next(imp.java:1158)
at org.python.core.imp.import_module_level(imp.java:1350)
at org.python.core.imp.importName(imp.java:1528)
at org.python.core.ImportFunction.__call__(__builtin__.java:1285)
at org.python.core.PyObject.__call__(PyObject.java:433)
at org.python.core.__builtin__.__import__(__builtin__.java:1232)
at org.python.core.imp.importOneAs(imp.java:1564)
at org.python.pycode._pyx0.f$0(<string>:1)
at org.python.pycode._pyx0.call_function(<string>)
at org.python.core.PyTableCode.call(PyTableCode.java:173)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1687)
at org.python.core.Py.exec(Py.java:1731)
at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:268)
at com.so.Script.execute(Script.java:20)
Here's all I'm doing in my code (I'm actually invoking this via a Swing action on a JMenuItem calling new Script().execute(), which is most likely irrelevant):
package com.so;

import org.python.core.PyDictionary;
import org.python.core.PySystemState;
import org.python.util.PythonInterpreter;

public class Script {
    public Script() {
    }

    public void execute() {
        PyDictionary table = new PyDictionary();
        PySystemState state = new PySystemState();
        PythonInterpreter interp = new PythonInterpreter(table, state);
        String script;
        script = "" +
            "import com.foo.Bar as Bar\n" +
            "";
        interp.exec(script);
    }
}
It doesn't even matter that there is no such package/class on my classpath. What baffles me most is that, suspecting a classpath problem, I created a separate mock project with exactly the same classpath (the same JAR files from the same locations on disk), and that project runs the actual script just fine.
What could I be doing wrong here?
This happens with Java 1.8u241 (x64) and both jython-standalone-2.7.2.jar and an earlier 2.7.1 version. The ClassLoader in the stack trace is attempting to resolve "com".
I found the culprit entirely by accident.
My broken project used an older (ancient) version of the Glazed Lists library, namely glazedlists-1.8.0_java15.jar. This JAR seems to be directly incompatible with jython-standalone-2.7.2.jar. As soon as you put them on the same classpath and attempt to execute a Python script that imports any Java package starting with "com.", you end up with an InvalidJarIndexException. Updating to a newer version of that JAR resolved the issue.
Therefore, if you encounter a similar exception when running Jython, I suggest updating your project's dependencies to their latest (or at least newer) versions. The same fix probably applies even if you are not using Jython at all.
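The answer does not say what inside the old JAR triggered the error. As a pointer rather than a confirmed diagnosis: `InvalidJarIndexException` is documented as being thrown when the `META-INF/INDEX.LIST` file (the JAR index) of a JAR contains incorrect information, so JARs that carry such an index are natural suspects when two libraries clash this way. The sketch below (function name hypothetical) lists them in a directory.

```python
import glob
import os
import zipfile


def jars_with_index(directory):
    """Return the JARs in `directory` that contain a META-INF/INDEX.LIST (a JAR index)."""
    found = []
    for jar_path in sorted(glob.glob(os.path.join(directory, "*.jar"))):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if "META-INF/INDEX.LIST" in jar.namelist():
                    found.append(jar_path)
        except zipfile.BadZipFile:
            continue  # not a readable archive; skip it rather than abort the scan
    return found


# Example: print(jars_with_index("/path/to/lib"))
```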

Matlab installation (LD_LIBRARY_PATH) messes up other library files

I am trying to install Matlab on a Linux machine, but setting LD_LIBRARY_PATH (as the installation requires) breaks other library files. I am not a Linux expert, but I have tried several things and cannot get it working correctly. I have even contacted Matlab support, got the issue elevated to the dev team, and was basically told "haha sucks to suck". I have seen a few other people online with the same issue, but either their questions were never answered or they had a slightly different problem and their solution didn't apply to me.
Installing on a VM running Ubuntu:
I set LD_LIBRARY_PATH as the instructions say, and then networking breaks: I can ping google.com, but I cannot nslookup google.com or visit it in a browser. nslookup gives this error:
nslookup: /usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64/libcrypto.so.1.0.0: no version information available (required by /usr/lib/libdns.so.100)
03-Feb-2016 11:32:22.361 ENGINE_by_id failed (crypto failure)
03-Feb-2016 11:32:22.362 error:25070067:DSO support routines:DSO_load:could not load the shared library:dso_lib.c:244:
03-Feb-2016 11:32:22.363 error:260B6084:engine routines:DYNAMIC_LOAD:dso not found:eng_dyn.c:447:
03-Feb-2016 11:32:22.363 error:2606A074:engine routines:ENGINE_by_id:no such engine:eng_list.c:418:id=gost
(null): dst_lib_init: crypto failure
The installation worked, though (I can run my Java programs that reference compiled Matlab functions). Unsetting LD_LIBRARY_PATH fixes networking, but then I can't run my programs anymore.
Installing on EC2 instance:
On an EC2 instance it does not break networking (nslookup is fine). Instead it breaks Python's libraries: any aws cli command fails with this error:
File "/usr/bin/aws", line 19, in <module>
import awscli.clidriver
File "/usr/lib/python2.7/dist-packages/awscli/clidriver.py", line 16, in <module>
import botocore.session
File "/usr/lib/python2.7/dist-packages/botocore/session.py", line 25, in <module>
import botocore.config
File "/usr/lib/python2.7/dist-packages/botocore/config.py", line 18, in <module>
from botocore.compat import six
File "/usr/lib/python2.7/dist-packages/botocore/compat.py", line 139, in <module>
import xml.etree.cElementTree
File "/usr/lib64/python2.7/xml/etree/cElementTree.py", line 3, in <module>
from _elementtree import *
ImportError: PyCapsule_Import could not import module "pyexpat"
Printing sys.path in Python shows lib-dynload is already there, though, so that doesn't seem to be the problem.
And when trying to run the program, I get:
Exception in thread "main" java.lang.LinkageError: libXt.so.6: cannot open shared object file: No such file or directory
at com.mathworks.toolbox.javabuilder.internal.DynamicLibraryUtils.dlopen(Native Method)
at com.mathworks.toolbox.javabuilder.internal.DynamicLibraryUtils.loadLibraryAndBindNativeMethods(DynamicLibraryUtils.java:134)
at com.mathworks.toolbox.javabuilder.internal.MWMCR.<clinit>(MWMCR.java:1529)
at VectorAddExample.VectorAddExampleMCRFactory.newInstance(VectorAddExampleMCRFactory.java:48)
at VectorAddExample.VectorAddExampleMCRFactory.newInstance(VectorAddExampleMCRFactory.java:59)
at VectorAddExample.VectorAddClass.<init>(VectorAddClass.java:62)
at com.mypackage.Example.main(Example.java:13)
I'm at a brick wall and really have no clue how to proceed.
Maybe something else already needs LD_LIBRARY_PATH set in order to work. Make sure you prepend to it rather than overwrite it:
export LD_LIBRARY_PATH=new/path:$LD_LIBRARY_PATH
Edit:
OK, if LD_LIBRARY_PATH was initially empty, this suggests that MATLAB ships shared libraries that are incompatible with your system's libraries:
nslookup: /usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64/libcrypto.so.1.0.0: no version information available (required by /usr/lib/libdns.so.100)
suggests that /usr/lib/libdns.so.100 needs libcrypto.so.1.0.0, which is now being resolved to the one that comes with MATLAB, which is incompatible.
You can check the dependencies of a shared library with
ldd /usr/lib/libcrypto.so.1.0.0
and hopefully you can find a configuration that keeps both MATLAB and your system happy. Unfortunately, this may involve a lot of trial and error.
If there is no such configuration, you can try setting LD_LIBRARY_PATH only when you run MATLAB:
LD_LIBRARY_PATH=$MATLAB_LD_LIBRARY_PATH matlab
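Setting the variable only for the MATLAB-dependent process can also be done from Python by passing a modified copy of the environment to `subprocess`. In the sketch below, `env_with_prepended_paths` is a hypothetical helper; the runtime directories in the usage comment are the MATLAB paths from this thread, and the Java classpath is a placeholder. It also drops empty entries: according to the ld.so documentation, a zero-length entry in LD_LIBRARY_PATH means the current directory, and `new/path:$LD_LIBRARY_PATH` produces one (a trailing colon) when the variable was unset.

```python
import os


def env_with_prepended_paths(var, new_dirs, base_env=None, sep=":"):
    """Return a copy of an environment with new_dirs prepended to var.

    Empty components are dropped, so prepending to an unset or empty
    variable does not leave a trailing separator behind.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = [p for p in env.get(var, "").split(sep) if p]
    env[var] = sep.join(list(new_dirs) + existing)
    return env


# Hypothetical usage: only the child process sees MATLAB's libraries,
# so system tools such as nslookup or the aws CLI keep resolving their own.
# import subprocess
# runtime_dirs = [
#     "/usr/local/MATLAB/MATLAB_Runtime/v90/runtime/glnxa64",
#     "/usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64",
#     "/usr/local/MATLAB/MATLAB_Runtime/v90/sys/os/glnxa64",
# ]
# subprocess.run(["java", "-cp", "CLASSPATH_HERE", "com.mypackage.Example"],
#                env=env_with_prepended_paths("LD_LIBRARY_PATH", runtime_dirs),
#                check=True)
```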
Edit 2:
Well, for the Python issue, it seems to boil down to pyexpat, which is a wrapper around the standard expat XML parser. Try the following (the file name is a guess, since I don't have a Linux machine at hand right now):
ldd /usr/local/lib/python2.7/site-packages/libpyexpat.so
and see what that depends on. Probably, it will be libexpat.so, which is now being resolved to MATLAB's version.
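Rather than guessing the file name, Python can report where `pyexpat` lives. This sketch is for Python 3 (the traceback above is from Python 2.7, where `imp.find_module` plays a similar role). `importlib.util.find_spec` locates an extension module without loading it, so it still works while LD_LIBRARY_PATH is breaking the import itself. The helper name `module_origin` is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, without importing it.

    Returns "built-in" for modules compiled into the interpreter and None
    when the module cannot be found.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


origin = module_origin("pyexpat")
if origin and origin != "built-in":
    print("ldd " + origin)  # the command to run to see what pyexpat links against
else:
    print("pyexpat origin:", origin)
```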
Try the following command:
export LD_LIBRARY_PATH=/usr/local/MATLAB/MATLAB_Runtime/v90/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v90/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v90/sys/os/glnxa64:$LD_LIBRARY_PATH
Perhaps not helpful for the OP, but if you are generating a Python package with MATLAB, you could modify the __init__.py file that MATLAB generates for your package.
Specifically, the generated __init__.py file contains the following line (as of MATLAB 2017a):
PLATFORM_DICT = {'Windows': ['PATH','dll',''], 'Linux': ['LD_LIBRARY_PATH','so','libmw'], 'Darwin': ['DYMCR_LIBRARY_PATH','dylib','libmw']}
For Linux platform, you could simply replace LD_LIBRARY_PATH with something else such as MCR_LIBRARY_PATH to prevent mucking with your shared libs.
sed -i -e 's/LD_LIBRARY_PATH/MCR_LIBRARY_PATH/g' /MY/PACKAGE/BUILD/PATH/__init__.py
Then, of course, export MCR_LIBRARY_PATH before running Python.

pyjnius "Class not found" when importing jar file

I'm trying to make pyjnius work with a jar file I built from java application, but I keep getting the "Class not found" error:
>>> import os
>>> os.environ['CLASSPATH'] = "~/workspace/myapp-Tools/Admin/Console/couchdb/myapp-web.jar"
>>> from jnius import autoclass
>>> bla = autoclass('com/myapp/webapp/server/helpers/licensee/CalculationHelper')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sam/workspace/myapp-Tools/Admin/Console/couchdb/virtualenv/local/lib/python2.7/site-packages/jnius/reflect.py", line 150, in autoclass
c = find_javaclass(clsname)
File "jnius_export_func.pxi", line 23, in jnius.find_javaclass (jnius/jnius.c:12815)
jnius.JavaException: Class not found 'com/myapp/webapp/server/helpers/licensee/CalculationHelper'
>>>
Of course, I've checked:
jar tf myapp-web.jar
and com/myapp/webapp/server/helpers/licensee/CalculationHelper.class is in there
I've also tried setting the classpath this way:
import jnius_config
jnius_config.set_classpath('.', '~/workspace/myapp-Tools/Admin/Console/couchdb/')
#import jnius
from jnius import autoclass
But this gave me the same result.
I'm working in a virtualenv, by the way.
I've tried all approaches I could find online, but it is simply not working. I had to manually install pyjnius because using pip got me an old version of it.
Any help would be greatly appreciated.
Edit: I tried this with a jar I didn't create myself, and I see a different error:
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import jnius_config
>>> jnius_config.add_classpath('/home/sam/workspace/someproject/*')
>>> jnius_config.expand_classpath()
'/home/sam/workspace/someproject/annotations.jar:/home/sam/workspace/someproject/junit-4.10.jar:/home/sam/workspace/someproject/ postgresql-8.1-408.jdbc3.jar'
>>> import jnius
>>> from jnius import autoclass
>>> test = autoclass('org/postgresql/geometric/PGcircle.class')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sam/workspace/myapp-Tools/Admin/Console/couchdb/virtualenv/local/lib/python2.7/site-packages/jnius/reflect.py", line 150, in autoclass
c = find_javaclass(clsname)
File "jnius_export_func.pxi", line 23, in jnius.find_javaclass (jnius/jnius.c:12815)
jnius.JavaException: Class not found 'org/postgresql/geometric/PGcircle/class'
>>> test = autoclass('org/postgresql/geometric/PGcircle')
Exception in thread "main" java.lang.NoClassDefFoundError: org/postgresql/geometric/PGcircle/class
Caused by: java.lang.ClassNotFoundException: org.postgresql.geometric.PGcircle.class
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sam/workspace/myapp-Tools/Admin/Console/couchdb/virtualenv/local/lib/python2.7/site-packages/jnius/reflect.py", line 156, in autoclass
for constructor in c.getConstructors():
File "jnius_export_class.pxi", line 562, in jnius.JavaMethod.__call__ (jnius/jnius.c:19385)
File "jnius_export_class.pxi", line 649, in jnius.JavaMethod.call_method (jnius/jnius.c:20409)
File "jnius_utils.pxi", line 43, in jnius.check_exception (jnius/jnius.c:3533)
jnius.JavaException: JVM exception occured
>>> test = autoclass('org/postgresql/geometric/PGcircl')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sam/workspace/myapp-Tools/Admin/Console/couchdb/virtualenv/local/lib/python2.7/site-packages/jnius/reflect.py", line 150, in autoclass
c = find_javaclass(clsname)
File "jnius_export_func.pxi", line 23, in jnius.find_javaclass (jnius/jnius.c:12815)
jnius.JavaException: Class not found 'org/postgresql/geometric/PGcircl'
>>>
And here is the output of jar tf for that jar:
sam#lambda ~/workspace$ jar tf ./someproject/postgresql-8.1-408.jdbc3.jar
META-INF/
META-INF/MANIFEST.MF
...
org/postgresql/geometric/PGbox.class
org/postgresql/geometric/PGcircle.class
org/postgresql/geometric/PGline.class
org/postgresql/geometric/PGlseg.class
org/postgresql/geometric/PGpath.class
org/postgresql/geometric/PGpoint.class
org/postgresql/geometric/PGpolygon.class
...
sam#lambda ~/workspace$
Again... any help will be greatly appreciated!
tl;dr: make sure the .java files are compiled to .class files that target a Java version no newer than the one installed on the system that will import them with pyjnius.
Longer version:
I had a very similar problem, with one big difference: some files worked without any problem and others (in the same directory) didn't.
The problem with the files that resulted in the 'Class not found' exception was that I compiled them under Windows, which has Java 8. Ubuntu however currently installs Java 7 when you run "sudo apt-get install default-jdk".
And so, pyjnius couldn't import the Java 8 files on my Java 7 Ubuntu install. It's strange that it throws a 'Class not found' exception, instead of something more descriptive. Changing the target output to 1.7 fixed my problem.
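To find which classes in a JAR target a newer Java than the installed runtime, the class-file header can be read directly: each `.class` file starts with the magic number 0xCAFEBABE, followed by a two-byte minor and a two-byte major version. Major 51 is Java 7 and 52 is Java 8, and a Java 7 runtime rejects anything above 51. The sketch below (function names hypothetical) reports the major version of every class in a JAR.

```python
import struct
import zipfile


def class_major_versions(jar_path):
    """Map each .class entry in a JAR to its class-file major version."""
    versions = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            with jar.open(name) as f:
                header = f.read(8)
            if len(header) < 8:
                continue  # truncated entry, not a usable class file
            # Big-endian: u4 magic, u2 minor_version, u2 major_version
            magic, _minor, major = struct.unpack(">IHH", header)
            if magic == 0xCAFEBABE:
                versions[name] = major
    return versions


def java_release(major):
    """Translate a major version to a Java release number (valid from 49 = Java 5 onward)."""
    return major - 44


# Example: list classes compiled for something newer than Java 7
# for name, major in class_major_versions("myapp-web.jar").items():
#     if major > 51:
#         print(name, "needs Java", java_release(major))
```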
I solved this problem by exporting the JAR as a runnable JAR file in Eclipse:
1. Create an empty main method somewhere if you don't have one (the export didn't work for me otherwise).
2. Go to File -> Export...
3. Select Java -> Runnable JAR file.
4. Click Next.
5. Select the main method in the Launch configuration.
6. Select your export destination.
7. Under Library handling, select "Copy required libraries into a sub-folder next to the generated JAR" (the only option that worked in my particular case, but you can also try the others).
8. Click Finish.
It works fine when I use my jar file.
Did you try to use the full path to define CLASSPATH?
Windows 7
Python 2.7.8
jnius 1.1-dev
This page will be useful as a reference.
http://www.hackzine.org/using-apache-tika-from-python-with-jnius.html
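One reason the full path can matter here: the JVM itself does not expand `~` in classpath entries, and assigning a string to `os.environ` does not expand it either. (Bash only expands `~` when it is unquoted, so `export CLASSPATH="~/..."` also keeps it literal.) The sketch below (helper name hypothetical) expands and absolutizes entries before they are handed to `jnius_config.set_classpath`, which has to happen before `jnius` is first imported.

```python
import os


def absolute_classpath(entries):
    """Expand '~' and make each classpath entry absolute.

    A trailing '*' wildcard is kept as-is (path normalization does not touch it).
    """
    return [os.path.abspath(os.path.expanduser(entry)) for entry in entries]


# Hypothetical usage, before the first `import jnius`:
# import jnius_config
# jnius_config.set_classpath(*absolute_classpath(
#     ["~/workspace/myapp-Tools/Admin/Console/couchdb/myapp-web.jar"]))
# from jnius import autoclass
```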
I wanted to leave a comment, but I don't have enough reputation for it, so I'm leaving an answer instead.
You'd better use "." rather than "/" in the class name when you call autoclass.
See the link below.
http://pyjnius.readthedocs.org/en/latest/api.html#jnius.autoclass
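The question passes class names with slashes and, in one attempt, a trailing `.class`, which ends up treated as part of the path (the error reports `org/postgresql/geometric/PGcircle/class`). A small normalizing helper avoids both mistakes; `to_autoclass_name` is hypothetical and not part of pyjnius.

```python
def to_autoclass_name(name):
    """Turn 'org/postgresql/geometric/PGcircle.class' into 'org.postgresql.geometric.PGcircle'."""
    if name.endswith(".class"):
        name = name[: -len(".class")]
    # Nested classes keep their '$' separator, e.g. 'java.util.Map$Entry'
    return name.replace("/", ".")


# Usage with pyjnius (requires the JAR on the classpath):
# from jnius import autoclass
# PGcircle = autoclass(to_autoclass_name("org/postgresql/geometric/PGcircle.class"))
```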
Old post, but I'm posting an answer in case it is useful to someone.
I see two issues there, not sure if you have tried these together:
Change
jnius_config.set_classpath('.', '~/workspace/myapp-Tools/Admin/Console/couchdb/')
To
jnius_config.set_classpath('.', '~/workspace/myapp-Tools/Admin/Console/couchdb/*')
And
test = autoclass('org/postgresql/geometric/PGcircl')
To
test = autoclass('org.postgresql.geometric.PGcircle')
Have you tried setting CLASSPATH via export and then running your Python script? This worked for me.
$ export CLASSPATH="~/workspace/myapp-Tools/Admin/Console/couchdb/myapp-web.jar"
I solved this issue by putting CLASSPATH in .bashrc:
CLASSPATH="~/documents/download/programs/tika-app.jar"
and it works properly
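The code below adds a jar to the classpath and then displays the classpath for further debugging:
import jnius_config
# Classpath options must be set before jnius is first imported, because that import starts the JVM
jnius_config.add_classpath("PATH_HERE/SOME.jar")
from jnius import autoclass, cast

# The classpath the JVM was actually started with
System = autoclass('java.lang.System')
print(System.getProperty('java.class.path'))

# On Java 8 the system class loader is a URLClassLoader, so its URLs can be listed as well
# (this cast does not hold on Java 9 and later)
ClassLoader = autoclass('java.lang.ClassLoader')
cl = ClassLoader.getSystemClassLoader()
ucl = cast('java.net.URLClassLoader', cl)
urls = ucl.getURLs()
tmp = [url.getFile() for url in urls]
print('\n'.join(tmp))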

Jython, subprocess and msvcrt... is it possible?

I'm trying to build a wrapper around a Python module, to embed it in my Java code.
It looks like this module uses many tricks such as subprocess, threading and so on.
(Actually, it is itself a module that controls a C utility provided as-is and only as a binary; I'm trying to avoid the cost of recoding the inner logic and the other tools this Python wrapper already provides.)
Anyway, when I instantiate my own wrapper from Java, I get:
------------------
Exception in thread "MainThread" Traceback (most recent call last):
File "<string>", line 1, in <module>
File "__pyclasspath__/mywrapper.py", line 303, in <module>
File "C:\jython2.5.2\Lib\subprocess.py", line 375, in <module>
import msvcrt
ImportError: No module named msvcrt
If I look on my hard disk, there is no msvcrt.py. Where is it supposed to live?
I am launching Jython with:
PythonInterpreter interpreter = new PythonInterpreter(null, new PySystemState());
PySystemState sys = Py.getSystemState();
sys.path.append(new PyString("C:/jython2.5.2/Lib"));
sys.platform = new PyString("win32"); // this is a trick for the wrapper to not fail on a inner plateform test detection with java1.7.0_03
msvcrt is not available in Jython. In CPython on Windows, msvcrt is a built-in module compiled into the Python interpreter (you can check this with sys.builtin_module_names). There is no msvcrt.py file.
Why you need "a trick for the wrapper to not fail on a inner plateform test detection with java1.7.0_03", I can't say. But setting sys.platform to win32 makes Jython try to import msvcrt when using subprocess, which doesn't work.
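The `sys.builtin_module_names` check mentioned above is a one-liner; the sketch below wraps it in a hypothetical helper. Under Jython `msvcrt` is absent regardless of what `sys.platform` claims, which is why faking `win32` sends subprocess down a code path that cannot succeed.

```python
import sys


def is_compiled_in(module_name):
    """True if the module is built into the running interpreter (no file on disk)."""
    return module_name in sys.builtin_module_names


print("interpreter platform:", sys.platform)
print("msvcrt compiled in:", is_compiled_in("msvcrt"))  # per the answer, True on CPython for Windows
```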

How to use Sqoop in Java Program?

I know how to use sqoop through command line.
But I don't know how to call a Sqoop command from a Java program.
Can anyone show some example code?
You can run sqoop from inside your java code by including the sqoop jar in your classpath and calling the Sqoop.runTool() method. You would have to create the required parameters to sqoop programmatically as if it were the command line (e.g. --connect etc.).
Please pay attention to the following:
Make sure that the sqoop tool name (e.g. import/export etc.) is the first parameter.
Pay attention to classpath ordering: the execution might fail because Sqoop requires version X of a library and you use a different version. Ensure that the libraries Sqoop requires are not overshadowed by your own dependencies. I encountered such a problem with commons-io (Sqoop requires v1.4) and got a NoSuchMethodError because I was using commons-io v1.2.
Each argument needs to be on a separate array element. For example, "--connect jdbc:mysql:..." should be passed as two separate elements in the array, not one.
The Sqoop parser accepts double-quoted parameters, so use double quotes if you need to (I suggest always). The only exception is the --fields-terminated-by parameter, which expects a single character, so don't double-quote it.
I'd suggest splitting the command-line-arguments creation logic and the actual execution so your logic can be tested properly without actually running the tool.
It would be better to use the --hadoop-home parameter, in order to prevent dependency on the environment.
The advantage of Sqoop.runTool() over Sqoop.main() is that runTool() returns the error code of the execution.
Hope that helps.
final int ret = Sqoop.runTool(new String[] { ... });
if (ret != 0) {
    throw new RuntimeException("Sqoop failed - return code " + Integer.toString(ret));
}
RL
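The argument rules in the answer above can be sketched as a pure function, here in Python for brevity; the resulting list corresponds element for element to the `String[]` passed to `Sqoop.runTool()`. The helper name `build_sqoop_import_args` is hypothetical; `--connect`, `--username`, `--table`, `--target-dir` and `--num-mappers` are standard `sqoop import` options. Keeping construction separate from execution lets the list be unit-tested without running Sqoop.

```python
def build_sqoop_import_args(connect, username, table, target_dir, num_mappers=4):
    """Build a Sqoop import argument list: tool name first, one element per token."""
    return [
        "import",                      # the tool name must be the first argument
        "--connect", connect,          # flag and value are separate elements
        "--username", username,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


# Example:
# build_sqoop_import_args("jdbc:mysql://HOSTNAME:PORT/DATABASE_NAME",
#                         "USERNAME", "TABLE_NAME", "/user/hdfs/TABLE_NAME")
```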
Below is sample code for using Sqoop in a Java program to import data from MySQL to HDFS/HBase. Make sure the Sqoop jar is on your classpath:
SqoopOptions options = new SqoopOptions();
options.setConnectString("jdbc:mysql://HOSTNAME:PORT/DATABASE_NAME");
//options.setTableName("TABLE_NAME");
//options.setWhereClause("id>10"); // this where clause works when importing whole table, ie when setTableName() is used
options.setUsername("USERNAME");
options.setPassword("PASSWORD");
//options.setDirectMode(true); // Make sure the direct mode is off when importing data to HBase
options.setNumMappers(8); // Default value is 4
options.setSqlQuery("SELECT * FROM user_logs WHERE $CONDITIONS limit 10");
options.setSplitByCol("log_id");
// HBase options
options.setHBaseTable("HBASE_TABLE_NAME");
options.setHBaseColFamily("colFamily");
options.setCreateHBaseTable(true); // Create HBase table, if it does not exist
options.setHBaseRowKeyColumn("log_id");
int ret = new ImportTool().run(options);
As Harel suggested, the return value of run() can be used for error handling. Hope this helps.
There is a trick that worked out pretty well for me: via SSH, you can execute the Sqoop command directly. All you need is an SSH library for Java.
This is independent of Java: you just need an SSH library and Sqoop installed on the remote system where you want to perform the import. Connect to that system over SSH and execute the commands that export data from MySQL to Hive.
Follow these steps:
Download the sshxcute Java library: https://code.google.com/p/sshxcute/
Then add it to the build path of the Java project containing the following code:
import net.neoremind.sshxcute.core.SSHExec;
import net.neoremind.sshxcute.core.ConnBean;
import net.neoremind.sshxcute.task.CustomTask;
import net.neoremind.sshxcute.task.impl.ExecCommand;

public class TestSSH {
    public static void main(String args[]) throws Exception {
        // Initialize a ConnBean object; the parameter list is IP, username, password
        ConnBean cb = new ConnBean("192.168.56.102", "root", "hadoop");
        // Pass the ConnBean instance to the static method SSHExec.getInstance(ConnBean) to retrieve a singleton SSHExec instance
        SSHExec ssh = SSHExec.getInstance(cb);
        // Connect to the server
        ssh.connect();
        CustomTask sampleTask1 = new ExecCommand("echo $SSH_CLIENT"); // Print the client IP you used to connect to the SSH server on the Hortonworks Sandbox
        System.out.println(ssh.exec(sampleTask1));
        CustomTask sampleTask2 = new ExecCommand("sqoop import --connect jdbc:mysql://192.168.56.101:3316/mysql_db_name --username=mysql_user --password=mysql_pwd --table mysql_table_name --hive-import -m 1 -- --schema default");
        ssh.exec(sampleTask2);
        ssh.disconnect();
    }
}
If you know the location of the executable and its command-line arguments, you can use a ProcessBuilder to run it as a separate Process, which Java can monitor for completion and return code.
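For the ProcessBuilder route, the equivalent idea in Python is `subprocess.run` with an argument list. This sketch assumes a `sqoop` executable on the PATH (pass a full path otherwise) and, like ProcessBuilder, hands the exit code back for the caller to check. The `run_tool` name and the usage arguments are hypothetical.

```python
import subprocess


def run_tool(args, executable="sqoop"):
    """Run a command-line tool as a child process and return its exit code.

    Output (stdout and stderr merged) is echoed so failures can be diagnosed.
    """
    completed = subprocess.run(
        [executable] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    print(completed.stdout, end="")
    return completed.returncode


# Hypothetical usage:
# code = run_tool(["import", "--connect", "jdbc:mysql://HOSTNAME:PORT/DATABASE_NAME",
#                  "--table", "TABLE_NAME"])
# if code != 0:
#     raise RuntimeError("Sqoop failed - return code %d" % code)
```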
Please follow the code given by Vikas; it worked for me. Include these JAR files on the classpath and import these packages:
import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;
Referenced libraries:
Sqoop-1.4.4 jar /sqoop
ojdbc6.jar /sqoop/lib (for oracle)
commons-logging-1.1.1.jar hadoop/lib
hadoop-core-1.2.1.jar /hadoop
commons-cli-1.2.jar hadoop/lib
commons-io-2.1.jar hadoop/lib
commons-configuration-1.6.jar hadoop/lib
commons-lang-2.4.jar hadoop/lib
jackson-core-asl-1.8.8.jar hadoop/lib
jackson-mapper-asl-1.8.8.jar hadoop/lib
commons-httpclient-3.0.1.jar hadoop/lib
JRE system library:
1. resources.jar jdk/jre/lib
2. rt.jar jdk/jre/lib
3. jsse.jar jdk/jre/lib
4. jce.jar jdk/jre/lib
5. charsets.jar jdk/jre/lib
6. jfr.jar jdk/jre/lib
7. dnsns.jar jdk/jre/lib/ext
8. sunec.jar jdk/jre/lib/ext
9. zipfs.jar jdk/jre/lib/ext
10. sunpkcs11.jar jdk/jre/lib/ext
11. localedata.jar jdk/jre/lib/ext
12. sunjce_provider.jar jdk/jre/lib/ext
Sometimes you get an error if your Eclipse project uses JDK 1.6 and the libraries you add were built for JDK 1.7; in that case, configure the JRE when creating the project in Eclipse.
Vikas, if I want to put the imported files into Hive, should I use options.parameter("--hive-import")?
