Running Hive locally with the native LZO libraries - java

I'm trying to run Hive locally on OS X Mountain Lion, following the instructions here:
https://github.com/twitter/hadoop-lzo
I've compiled the native OS X libraries and the jar, but I'm not sure how to launch Hive locally so that Hive/Hadoop uses the native libraries.
I've tried including them through the JAVA_LIBRARY_PATH environment variable, but I think that only applies to Hadoop in general:
export JAVA_LIBRARY_PATH="${SCRIPTS_DIR}/jars/native/Mac_OS_X-x86_64-64"
When I run Hive with the LzopCodec, e.g.:
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzopCodec;
I get the following error when I run a query that runs a map/reduce job:
SELECT COUNT(*) from test_table;
Job running in-process (local Hadoop)
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: native-lzo library not available
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:959)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:995)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:262)
Caused by: java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:155)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100)
at com.hadoop.compression.lzo.LzopCodec.getCompressor(LzopCodec.java:135)
at com.hadoop.compression.lzo.LzopCodec.createOutputStream(LzopCodec.java:70)
at org.apache.hadoop.hive.ql.exec.Utilities.createCompressedStream(Utilities.java:868)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:246)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:234)
... 14 more
I've also tried setting LD_LIBRARY_PATH through mapred.child.env in a Hive script (no luck):
SET mapred.child.env="LD_LIBRARY_PATH=../../scripts/jars/native/Mac_OS_X-x86_64-64";

Reading the clear instructions again:
How do I configure Hadoop to use these classes?
# Copy the native library
tar -cBf - -C build/hadoop-gpl-compression-0.1.0-dev/lib/native . | tar -xBvf - -C /path/to/hadoop/dist/lib/native
Basically, I just needed to copy the built native library into my Hadoop installation:
ant compile-native tar
cp -r build/hadoop-lzo-0.4.17-SNAPSHOT/lib/native/Mac_OS_X-x86_64-64 /usr/local/Cellar/hadoop/1.1.2/libexec/lib/native/
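To check that the copied library ends up somewhere the JVM will actually look, a small Java program can print java.library.path and search it for the library's platform-specific file name. This is a sketch: it assumes hadoop-lzo's native library is named gplcompression (so the file is whatever System.mapLibraryName("gplcompression") returns on your JVM, e.g. libgplcompression.so on Linux), which you should confirm against the files in your build's lib/native directory. The class name NativeLibFinder is hypothetical.

```java
import java.io.File;
import java.util.Optional;

class NativeLibFinder {

    // Returns the first directory entry on searchPath that contains the
    // platform-specific file for libName (e.g. "libfoo.so" or "libfoo.dylib").
    static Optional<File> find(String libName, String searchPath) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as "a::b"
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: hadoop-lzo's native library is called "gplcompression".
        String libName = args.length > 0 ? args[0] : "gplcompression";
        String path = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + path);
        System.out.println("looking for " + System.mapLibraryName(libName));
        System.out.println(find(libName, path)
                .map(f -> "found: " + f.getAbsolutePath())
                .orElse("not found on java.library.path"));
    }
}
```

Run it with the same -Djava.library.path that the hive/hadoop launcher passes, e.g. java -Djava.library.path=/usr/local/Cellar/hadoop/1.1.2/libexec/lib/native/Mac_OS_X-x86_64-64 NativeLibFinder. Since the failing query ran in-process ("Job running in-process (local Hadoop)"), it is the Hive JVM's own java.library.path that has to contain the library.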


Error: Could not find or load main class org.apache.spark.launcher.Main in Java Spark

I am trying to run a Spark application written in Java on GKP. I am able to build the image and place it in the container, but when I run the application with the spark-submit command I get this error:
Error: Could not find or load main class org.apache.spark.launcher.Main
I am using jdk-11 and spark-3.2.1.
I am running the application from IntelliJ with Maven. I also tried adding the spark-launcher Maven dependency, but the issue persists.
What is going wrong with these versions?
NOTE: I can see the spark-launcher jar in the spark-3.2.1 jars folder as well.
I had that error message. It may have several root causes, but this is how I investigated and solved the problem (on Linux):
Instead of launching spark-submit, run bash -x spark-submit to see which line fails.
Repeat that several times (since spark-submit calls nested scripts) until you find the underlying process being called. In my case it was something like:
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp \
'/opt/spark-2.2.0-bin-hadoop2.7/conf/:/opt/spark-2.2.0-bin-hadoop2.7/jars/*' \
-Xmx1g org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name 'Spark shell' spark-shell
So spark-submit launches a Java process that can't find the org.apache.spark.launcher.Main class in the files under /opt/spark-2.2.0-bin-hadoop2.7/jars/* (see the -cp option above). Running ls in that jars folder showed 4 files instead of the full Spark distribution (~200 files). It was probably a problem during the installation process, so I reinstalled Spark, checked the jars folder, and it worked like a charm.
So, you should:
check the java command (the -cp option);
check your jars folder (does it contain at least all the spark-*.jar files?).
Hope it helps.
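The jars-folder check can also be scripted. This sketch counts the spark-*.jar files in a jars directory and reports whether a spark-launcher jar is among them; the class name is hypothetical, and the "spark-launcher" file-name prefix is an assumption based on the standard Spark distribution layout (e.g. spark-launcher_2.12-3.2.1.jar).

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SparkJarsCheck {

    // Lists the spark-*.jar files directly inside jarsDir (empty if the dir is missing).
    static List<String> sparkJars(File jarsDir) {
        String[] names = jarsDir.list();
        if (names == null) {
            return Collections.emptyList();
        }
        return Arrays.stream(names)
                .filter(n -> n.startsWith("spark-") && n.endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
    }

    // True if one of the jars looks like the launcher jar (assumed naming scheme).
    static boolean hasLauncher(List<String> jars) {
        return jars.stream().anyMatch(n -> n.startsWith("spark-launcher"));
    }

    public static void main(String[] args) {
        // Default: $SPARK_HOME/jars, or ./jars if SPARK_HOME is unset.
        String home = System.getenv().getOrDefault("SPARK_HOME", ".");
        File jarsDir = args.length > 0 ? new File(args[0]) : new File(home, "jars");
        List<String> jars = sparkJars(jarsDir);
        System.out.println(jars.size() + " spark-*.jar files in " + jarsDir);
        System.out.println("spark-launcher jar present: " + hasLauncher(jars));
    }
}
```

A count of a handful of jars instead of a couple of hundred points to the same incomplete installation described above.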

bazel remote worker deployable jar not working

I have an issue packaging bazel-remote-worker into a deployable jar.
I ran the following command:
bazel build //src/tools/remote_worker:remote_worker_deploy.jar
But when I try to run the jar I get this error:
➜ bazel git:(master) ✗ java -jar remote_worker_deploy.jar --work_path=/tmp/test --listen_port=3030
*** Initializing in-memory cache server.
*** Not using remote cache. This should be used for testing only!
Exception in thread "main" java.lang.UnsatisfiedLinkError: no unix in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at com.google.devtools.build.lib.UnixJniLoader.loadJni(UnixJniLoader.java:28)
at com.google.devtools.build.lib.unix.NativePosixFiles.<clinit>(NativePosixFiles.java:136)
at com.google.devtools.build.lib.unix.UnixFileSystem.createDirectory(UnixFileSystem.java:309)
at com.google.devtools.build.lib.vfs.Path.createDirectory(Path.java:829)
at com.google.devtools.build.lib.vfs.FileSystemUtils.createDirectoryAndParentsWithCache(FileSystemUtils.java:692)
at com.google.devtools.build.lib.vfs.FileSystemUtils.createDirectoryAndParents(FileSystemUtils.java:652)
at com.google.devtools.build.remote.RemoteWorker.<init>(RemoteWorker.java:114)
at com.google.devtools.build.remote.RemoteWorker.main(RemoteWorker.java:621)
The only way I can start it is by running the executable from bazel-bin:
bazel-bin/src/tools/remote_worker/remote_worker --work_path=/tmp/test --listen_port=3030
I'm running the latest Bazel (currently a3e26835890a543ff84cce90c879f9196ae06348) on macOS Sierra.
I tried both oracle-jdk-1.8.131 and openjdk-1.8.91, and it behaved the same.
The end goal is to create a Docker image that runs this jar, but even inside the openjdk:8 image this jar behaves the same...
Apparently we're not packing the native code into the deploy jar. I'd actually prefer to refactor the RemoteWorker to avoid most of Bazel's internal libraries, although it's unlikely to happen soon. You could ship the libunix.so with the deploy jar and set the java.library.path appropriately. Alternatively, you can take the entire runfiles tree after building the remote worker (bazel-bin/src/tools/remote_worker/remote_worker.runfiles/).
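The deploy jar fails because System.loadLibrary("unix") only searches the directories in java.library.path, and the native library is not among them. Before shipping the library next to the jar, a small pre-flight check like this sketch can confirm that a given -Djava.library.path really makes the library loadable. The class name is hypothetical, and the file it looks for is whatever System.mapLibraryName("unix") returns on your JVM (libunix.so on Linux).

```java
class NativeLoadCheck {

    // Tries System.loadLibrary and reports the outcome instead of crashing.
    static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("could not load " + System.mapLibraryName(libName)
                    + ": " + e.getMessage());
            System.err.println("java.library.path = "
                    + System.getProperty("java.library.path"));
            return false;
        }
    }

    public static void main(String[] args) {
        String libName = args.length > 0 ? args[0] : "unix";
        System.exit(tryLoad(libName) ? 0 : 1);
    }
}
```

For example, run java -Djava.library.path=/opt/worker/native NativeLoadCheck unix, then start the worker with the same flag: java -Djava.library.path=/opt/worker/native -jar remote_worker_deploy.jar --work_path=/tmp/test --listen_port=3030. The /opt/worker/native directory is a placeholder for wherever you put the library.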
Since the question was asked, the paths in the Bazel source tree have changed. Nowadays the commands to build and use the _deploy.jar are as follows:
bazel build src/tools/remote:worker_deploy.jar
mkdir -p /tmp/cas /tmp/cache /tmp/work
java -jar bazel-bin/src/tools/remote/worker_deploy.jar \
--cas_path /tmp/cas --disk_cache /tmp/cache --work_path /tmp/work
bazel build --spawn_strategy=remote \
--remote_cache=grpc://${IP}:8080 --remote_executor=grpc://${IP}:8080

How to run ZooInspector from Windows

Here's what I did:
Downloaded Apache ZooKeeper 3.4.6 (.tar file), extracted to C:\cygwin\home\user\zookeeper-3.4.6\
Ran ant at the root of the ZooKeeper folder (C:\cygwin\home\user\zookeeper-3.4.6)
Navigated to C:\cygwin\home\user\zookeeper-3.4.6\contrib\ZooInspector\
Ran ant, and I get the following error:
Output:
Buildfile: C:\cygwin\home\Jean\zookeeper-3.4.6\contrib\ZooInspector\build.xml
BUILD FAILED
C:\cygwin\home\user\zookeeper-3.4.6\contrib\ZooInspector\build.xml:19: Cannot find C:\cygwin\home\user\zookeeper-3.4.6\contrib\build-contrib.xml imported from C:\cygwin\home\user\zookeeper-3.4.6\contrib\ZooInspector\build.xml
Total time: 0 seconds
This leaves me with no .cmd or .sh file to execute. How come the build-contrib.xml file isn't there?
Also, I noticed that there seems to be an already-compiled ZooInspector JAR file: zookeeper-3.4.6-ZooInspector.jar. However, attempting to run it with the following command yields failure too:
$ java -cp zookeeper-3.4.6-ZooInspector.jar:lib/* org.apache.zookeeper.inspector.ZooInspector
Error: Could not find or load main class org.apache.zookeeper.inspector.ZooInspector
This is a bit frustrating -- setting up the ZooKeeper server was straightforward but for some reason I just can't figure out how to run this standalone GUI. What am I missing?
For Windows:
@echo off
set cp="./*;./lib/*;../../*;../../lib/*"
java -cp %cp% org.apache.zookeeper.inspector.ZooInspector
ZooInspector 3.4.6 (the one bundled with ZooKeeper 3.4.6) doesn't seem to be able to connect to a running ZooKeeper instance on Windows.
Better use zkui:
https://github.com/echoma/zkui/wiki/Download
ZooInspector just needs 3 libraries and 1 jar to load its main class.
The main class lives in zookeeper-3.3.0-ZooInspector.jar, and it needs jtoaster-1.0.4.jar, zookeeper-3.3.0.jar and log4j-1.2.15.jar.
After downloading the tar.gz file from the Apache servers, untar it and build it with ant. Then copy zookeeper-3.3.0.jar and log4j-1.2.15.jar to contrib/ZooInspector/lib/. Finally, cd to contrib/ZooInspector and launch this command (with -jar, java ignores any -cp option, so put the main jar and lib/* on the classpath explicitly; on Windows use ; instead of :):
java -cp "zookeeper-3.3.0-ZooInspector.jar:lib/*" org.apache.zookeeper.inspector.ZooInspector
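Before launching, a quick check that the four files named above are where the command expects them can save a round of guessing. This sketch assumes the 3.3.0 file names from this answer and the contrib/ZooInspector layout (main jar in the current directory, dependencies in lib/); the class name is hypothetical, and the names need adjusting for other releases.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ZooInspectorJars {

    // File names taken from the answer above (ZooKeeper 3.3.0); adjust for other versions.
    static final String[] REQUIRED = {
        "zookeeper-3.3.0-ZooInspector.jar",
        "lib/jtoaster-1.0.4.jar",
        "lib/zookeeper-3.3.0.jar",
        "lib/log4j-1.2.15.jar",
    };

    // Returns the required relative paths that are missing under baseDir.
    static List<String> missing(File baseDir) {
        List<String> result = new ArrayList<>();
        for (String rel : REQUIRED) {
            if (!new File(baseDir, rel).isFile()) {
                result.add(rel);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File base = new File(args.length > 0 ? args[0] : ".");
        List<String> gaps = missing(base);
        System.out.println(gaps.isEmpty()
                ? "all ZooInspector jars present"
                : "missing: " + gaps);
    }
}
```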
I ran into the same issue today and created a pre-compiled version, which should work on Windows as well. You can find details here:
https://www.admon.org/scripts/zooinspector-zookeeper-graphic-interface/

Running Rake Tasks In WAS without JRuby

Related to: Executing rake tasks on an exploded war on tomcat without jruby being installed
I'm trying to run rake tasks in my Tomcat server that doesn't have JRuby installed. I'm using warbler to create a war file.
Using the answer to the linked question, I ran:
java -cp lib/jruby-core*.jar:lib/jruby-stdlib*.jar org.jruby.Main -S rake -T
This gets me the error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/jruby/Main
Caused by: java.lang.ClassNotFoundException: org.jruby.Main
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
ls lib gets me:
ems-gems-activerecord-jdbc-adapter-1.2.2-lib-arjdbc-jdbc-adapter_java.jar
gems-gems-jdbc-sqlite3-3.7.2-lib-sqlite-jdbc-3.7.2.jar
gems-gems-jruby-jars-1.6.8-lib-jruby-core-1.6.8.jar
gems-gems-jruby-jars-1.6.8-lib-jruby-stdlib-1.6.8.jar
gems-gems-jruby-rack-1.1.10-lib-jruby-rack-1.1.10.jar
gems-gems-json-1.7.5-java-lib-json-ext-generator.jar
gems-gems-json-1.7.5-java-lib-json-ext-parser.jar
gems-gems-therubyrhino_jar-1.7.4-jar-rhino-1.7R4.jar
gems-gems-warbler-1.3.6-lib-warbler_jar.jar
jruby-core-1.6.8.jar
jruby-rack-1.1.10.jar
jruby-stdlib-1.6.8.jar
ojdbc6.jar
Opening up the jruby-core-1.6.8.jar file, I can see that there is indeed an org/jruby/Main.class file.
As one can see from the file listing, there is no jruby-complete jar file, so I can't run the command from https://stackoverflow.com/a/9982556/684934
What am I doing wrong, and is there by now a better way to do this?
I was working on a similar problem 2 months ago, so things may have changed, but I needed to include all the jars in my classpath, use bin stubs, and set GEM_HOME to get everything working. There may be a simpler way, but the posts you referenced didn't work for me either.
I actually had JRuby installed (but I was only using it to build the concatenated classpath), so my setup was something like:
cd /path/to/application/
export GEM_HOME=/path/to/application/gems
export CLASSPATH=$(jruby -e 'puts Dir["lib/*.jar"].join(":")')
RAILS_ENV=production java -cp $CLASSPATH org.jruby.Main bin/rake -T
Also useful: the jruby-jars gem can be included in your Gemfile to set the version of JRuby that warbler bundles (I was using gem 'jruby-jars', '1.7.0.preview2', as 1.7.0 wasn't released yet).
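The jruby -e one-liner above only joins every lib/*.jar path with ':'. The same can be done in plain Java without JRuby installed. Note that java -cp does not expand partial patterns such as lib/jruby-core*.jar (its only wildcard is a bare * entry like lib/*, which matches every .jar/.JAR file in that directory), and the shell cannot expand lib/jruby-core*.jar:lib/jruby-stdlib*.jar either because no single file matches that whole word, which is one plausible reason the original command could not find org.jruby.Main. A sketch with a hypothetical class name, using File.pathSeparator so it also yields ';' on Windows:

```java
import java.io.File;
import java.util.Arrays;
import java.util.stream.Collectors;

class ClasspathBuilder {

    // Joins every *.jar directly inside libDir with the platform path separator
    // (":" on Linux/macOS, ";" on Windows), sorted for a stable order.
    static String build(File libDir) {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return ""; // directory missing or unreadable
        }
        return Arrays.stream(jars)
                .map(File::getPath)
                .sorted()
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        System.out.println(build(new File(args.length > 0 ? args[0] : "lib")));
    }
}
```

Used from the application directory, this could replace the jruby call: export CLASSPATH=$(java -cp /path/to/compiled/classes ClasspathBuilder lib), where /path/to/compiled/classes is wherever you compiled the class.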

Where do I install a jdbc driver on ubuntu?

I'm trying to install the MS SQL JDBC driver on Ubuntu to be used with Sqoop for Hadoop. I'm totally new to Java and Linux, so I'm not sure where to extract everything to.
Just put it in the runtime classpath or add its path to the runtime classpath.
How to do it depends on how you're executing the program. If you're using the java command in a console to execute a .class file, use the -cp argument to specify the paths to classes and/or JAR files to be included in the classpath. The classpath is basically a collection of absolute/relative file system paths where Java looks for JAR files and classes.
Assuming that you've downloaded a .zip, extract it and look for a .jar file (usually in a /lib folder). For starters, it's easiest to put the .jar in the current working directory and then execute your program (the one with the Class.forName("com.mysql.jdbc.Driver"); line) as follows:
java -cp .:mysql.jar com.example.YourClass
The . signifies the current directory and : is the path separator (correct for Ubuntu; on Windows it's ;).
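To confirm that the driver jar actually made it onto the classpath, a tiny program can try to load the driver class by name, the same way the Class.forName line does. The sketch (hypothetical class name) defaults to com.microsoft.sqlserver.jdbc.SQLServerDriver, the driver class of Microsoft's SQL Server JDBC driver; pass another class name, such as com.mysql.jdbc.Driver, to check a different driver.

```java
class DriverCheck {

    // Returns true if the named class can be found on the current classpath.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = args.length > 0 ? args[0]
                : "com.microsoft.sqlserver.jdbc.SQLServerDriver";
        System.out.println(driver + (onClasspath(driver) ? " found" : " NOT found"));
    }
}
```

For example, java -cp .:sqljdbc42.jar DriverCheck prints "found" only if the jar in the current directory really contains the driver.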
To install the driver, you can:
Download the driver from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=11774
Unzip and untar it (gzip -d sqljdbc_6.0.7507.100_enu.tar.gz and tar -xf sqljdbc_6.0.7507.100_enu.tar)
Install it by copying the correct version into /usr/share/java (It will need to be world readable.) (sudo cp sqljdbc42.jar /usr/share/java/)
In the tomcat directory (/usr/share/tomcat8/lib but it could be tomcat7 if you are running a different version.) run sudo ln -s ../../java/sqljdbc42.jar sqljdbc42.jar (with the correct version names from below).
If you are using Maven, see Setting up maven dependency for SQL Server
The correct version is as follows (from the System Requirements section):
Sqljdbc.jar requires a JRE of 5 and supports the JDBC 3.0 API
Sqljdbc4.jar requires a JRE of 6 and supports the JDBC 4.0 API
Sqljdbc41.jar requires a JRE of 7 and supports the JDBC 4.1 API
Sqljdbc42.jar requires a JRE of 8 and supports the JDBC 4.2 API
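The version list above is a simple lookup on the Java major version. This sketch (hypothetical class name) applies it to the running JVM using the java.specification.version property. It assumes the four jar names listed above, which belong to the 6.0 driver release; since the list stops at Java 8, returning sqljdbc42.jar for newer JVMs is an assumption to check against Microsoft's page for your release.

```java
class JdbcJarPicker {

    // "1.8" -> 8 (pre-Java 9 scheme), "11" -> 11 (Java 9+ scheme).
    static int majorVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    // Maps a Java major version to the jar name from the System Requirements list.
    static String jarFor(int major) {
        if (major < 5) {
            throw new IllegalArgumentException("no driver listed for Java " + major);
        }
        switch (major) {
            case 5: return "sqljdbc.jar";
            case 6: return "sqljdbc4.jar";
            case 7: return "sqljdbc41.jar";
            default: return "sqljdbc42.jar"; // assumption: Java 8 and newer
        }
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + major + " -> " + jarFor(major));
    }
}
```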
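On Java 8 and earlier, you can also just put your JDBC jar file into the JRE extension directory, /usr/lib/jvm/java-8-oracle/jre/lib/ext, with this command (the lib/ext extension mechanism was removed in Java 9, so this does not work on newer JDKs):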
sudo cp ojdbc6.jar /usr/lib/jvm/java-8-oracle/jre/lib/ext
