I have a spark scala program which loads a jar I wrote in java. From that jar a static function is called, which tried to read a serialized object from a file (Pattern.class), but throws a java.lang.ClassNotFoundException.
Running the spark program locally works, but on the cluster workers it doesn't. It's especially weird because before I try to read from the file, I instantiate a Pattern object and there are no problems.
I am sure that the Pattern objects I wrote in the file are the same as the Pattern objects I am trying to read.
I've checked the jar in the slave machine and the Pattern class is there.
Does anyone have any idea what the problem might be ? I can add more detail if it's needed.
This is the Pattern class
public class Pattern implements Serializable {
private static final long serialVersionUID = 588249593084959064L;
public static enum RelationPatternType {NONE, LEFT, RIGHT, BOTH};
RelationPatternType type;
String entity;
String pattern;
List<Token> tokens;
Relation relation = null;
public Pattern(RelationPatternType type, String entity, List<Token> tokens, Relation relation) {
this.type = type;
this.entity = entity;
this.tokens = tokens;
this.relation = relation;
if (this.tokens != null)
this.pattern = StringUtils.join(" ", this.tokens.toString());
}
}
I am reading the file from S3 the following way:
AmazonS3 s3Client = new AmazonS3Client(credentials);
S3Object confidentPatternsObject = s3Client.getObject(new GetObjectRequest("xxx","confidentPatterns"));
objectData = confidentPatternsObject.getObjectContent();
ois = new ObjectInputStream(objectData);
confidentPatterns = (Map<Pattern, Tuple2<Integer, Integer>>) ois.readObject();
LE: I checked the classpath at runtime and the path to the jar was not there. I added it for the executors but I still have the same problem. I don't think that was it, as I have the Pattern class inside the jar that is calling the readObject function.
Would suggest this adding this kind method to find out the classpath resources before call, to make sure that everything is fine from caller's point of view
public static void printClassPathResources() {
final ClassLoader cl = ClassLoader.getSystemClassLoader();
final URL[] urls = ((URLClassLoader) cl).getURLs();
LOG.info("Print All Class path resources under currently running class");
for (final URL url : urls) {
LOG.info(url.getFile());
}
}
This is sample configuration spark 1.5
--conf "spark.driver.extraLibrayPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \
--conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')
As described by this Trouble shooting guide :Class Not Found: Classpath Issues
Another common issue is seeing class not defined when compiling Spark programs this is a slightly confusing topic because spark is actually running several JVM’s when it executes your process and the path must be correct for each of them. Usually this comes down to correctly passing around dependencies to the executors. Make sure that when running you include a fat Jar containing all of your dependencies, (I recommend using sbt assembly) in the SparkConf object used to make your Spark Context. You should end up writing a line like this in your spark application:
val conf = new SparkConf().setAppName(appName).setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))
This should fix the vast majority of class not found problems. Another option is to place your dependencies on the default classpath on all of the worker nodes in the cluster. This way you won’t have to pass around a large jar.
The only other major issue with class not found issues stems from different versions of the libraries in use. For example if you don’t use identical versions of the common libraries in your application and in the spark server you will end up with classpath issues. This can occur when you compile against one version of a library (like Spark 1.1.0) and then attempt to run against a cluster with a different or out of date version (like Spark 0.9.2). Make sure that you are matching your library versions to whatever is being loaded onto executor classpaths. A common example of this would be compiling against an alpha build of the Spark Cassandra Connector then attempting to run using classpath references to an older version.
Related
I'm using Google OR-tools library (v6.4) for a project (though my question is not specific to this library). This consists of one jar, which has a few native dependencies (a bunch of ".so"/".dylib" object files, depending on the OS). This build for my project is being made on Ubuntu 14.04
The problem I'm facing: On trying to load a specific object file at runtime (using System.load()), I'm getting an UnsatisfiedLinkError with the message as "undefined symbol" (I've added the stacktrace below). However, I am loading the object file defining this symbol just before this, so I'm not sure why this error is being thrown.
I'm loading the dependencies in the following way: The object files are being packed into the jar created by Maven during build, and are being extracted and loaded (using System.load()) at runtime. The method for that is as follows:
public class EnvironmentUtils {
public static void loadResourceFromJar(String prefix, String suffix) {
String tempFilesDirectory = System.getProperty("java.io.tmpdir");
File tempFile = null;
try {
tempFile = new File(tempFilesDirectory + "/" + prefix + suffix);
tempFile.deleteOnExit();
try (final InputStream inputStream = EnvironmentUtils.class.getClassLoader().
getResourceAsStream(prefix+suffix)) {
if (inputStream == null) {
throw new RuntimeException(prefix + suffix + " was not found inside JAR.");
} else {
Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
}
}
System.load(tempFile.getAbsolutePath());
} catch (Exception e) {
//Log top 10 lines of stack trace
}
}
}
This method is being called inside a static block for all dependencies:
public class DummyClass {
static {
String sharedLibraryExtension = EnvironmentUtils.getSharedLibraryExtension(); //.so for linux, .dylib for Mac
String jniLibraryExtension = EnvironmentUtils.getJniLibraryExtension(); //.so for linux, .jnilib for Mac
EnvironmentUtils.loadResourceFromJar("libfap", sharedLibraryExtension);
EnvironmentUtils.loadResourceFromJar("libcvrptw_lib", sharedLibraryExtension);
EnvironmentUtils.loadResourceFromJar("libortools", sharedLibraryExtension);
EnvironmentUtils.loadResourceFromJar("libdimacs", sharedLibraryExtension);
EnvironmentUtils.loadResourceFromJar("libjniortools", jniLibraryExtension);
}
}
On running System.load() for libdimacs.so, an UnsatisfiedLinkError is thrown. Stacktrace:
java.lang.UnsatisfiedLinkError: /tmp/libdimacs.so: /tmp/libdimacs.so: undefined symbol: _ZN6google14FlagRegistererC1IbEEPKcS3_S3_PT_S5_
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:809)
at java.lang.System.load(System.java:1086)
at com.(PROJECT_NAME).utils.EnvironmentUtils.loadResourceFromJar(EnvironmentUtils.java:78)
at com.(PROJECT_NAME).DummyClass.<clinit>(DummyClass.java:28)
However, this symbol "_ZN6google14FlagRegistererC1IbEEPKcS3_S3_PT_S5_" is present in libortools.so, which is being loaded before libdimacs. I verified this by running the following command:
objdump -t (LIBRARY_PATH)/libortools.so | grep _ZN6google14FlagRegistererC1IbEEPKcS3_S3_PT_S5_
This gave me the following output:
0000000000ce12cc gw F .text 00000091 _ZN6google14FlagRegistererC1IbEEPKcS3_S3_PT_S5_
So it would seem that the symbol should have been defined at the time of the System.load() call, unless there was some issue in loading the containing object file. To check if the object file had been loaded correctly, I used the approach detailed in this solution. Apart from the class detailed in that answer, I added the following lines after System.load() call in EnvironmentUtils.loadResourceFromJar() to print the most recently loaded library name:
public class EnvironmentUtils {
public static void loadResourceFromJar(String prefix, String suffix) {
...
System.load(tempFile.getAbsolutePath());
final String[] libraries = ClassScope.getLoadedLibraries(ClassLoader.getSystemClassLoader());
System.out.println(libraries[libraries.length - 1]);
}
}
The output (till just before the UnsatisfiedLinkError) is as follows:
/tmp/libfap.so
/tmp/libcvrptw_lib.so
/tmp/libortools.so
So libortools.so seems to be loading correctly, which means the symbol should be loaded in memory. The exact same code is working perfectly with the corresponding Mac (".dylib") dependencies (Built on MacOS Sierra 10.12.5). Would appreciate any advice on resolving this. Thank you.
I'm apologize that the java artifact may be broken currently...
you can use c++filt to demangle the symbol ;)
c++filt _ZN6google14FlagRegistererC1IbEEPKcS3_S3_PT_S5_
google::FlagRegisterer::FlagRegisterer<bool>(char const*, char const*, char const*, bool*, bool*)
In fact gflag has recently change its namespace from google:: to gflags:: and glog or protobobuf? try to find the correct one and I guess it failed...
note: Still not completely sure whose is the bad guy who use the google:: namespace since libortools merge all its static dependencies but I guess now you understand the bug...
note2: I have a patch in mizux/shared branch https://github.com/google/or-tools/commit/805bc0600f4b5645114da704a0eb04a0b1058e28#diff-e8590fe6fb5044985c8bf8c9e73c0d88R114
warning: this branch is currently broken and not ready yet. I'm trying ,for unix, to move from static to dynamic dependencies, so I need to fix all rpath, transitives deps etc... and in the process I also had to fix this issue (that I didn't reproduced while using static dependencies)
If too long to finish (we should create a release 6.7.2 or 6.8 (i.e. new artifact) by the end of May 2018) which maybe only contains this fix and not my branch...
I'm writing an intelij plugin and would like to download the platform specific artefact at runtime.
I've loaded the platform specific jar into a class loader but the ChromiumExtractor cannot access the nested resources when prefixed with "/". So I can access the resource as "chromium-mac.zip" but the library cannot.
I've tried to unzip the nested zipped chromium artefact into the correct directory but this does not leading to a working solution. So now I've been trying to piece together the way the library extracts the artefact but it's rather tedious as the code is obfuscated.
Does the jxbrowser plugin have some support for retrieving the artefact at runtime. Could such support be added (jxbtrowser devs use SO for support questions etc, this is a message to them :D ) ?
Approach taken :
// inside intelij plugin . The plugin has the jxbrowser-6.6.jar
// and license.jar loaded into the classloader. the platform specific
// artefact will be retrieved manual).
val cl = URLClassLoader(arrayOf(URL("file://.../jxbrowser-mac-6.6.jar")), Browser::class.java.classLoader)
val backup = Thread.currentThread().contextClassLoader
try {
Thread.currentThread().contextClassLoader = cl
// can access like this
Thread.currentThread().contextClassLoader.getResource("chromium-mac.zip")
val ce = ChromiumExtractor.create()
// cannot access as resource is retrieved "/chromium-mac.zip" ?
ce.extract(BrowserPreferences.getChromiumDir())
browser = Browser()
} finally {
Thread.currentThread().contextClassLoader = backup
}
The following does the trick, The resource jar had to be in the same class loader as the client jar (as well as the license). It would be nice if JxBrowser added a helper for this that is capable of performing the download and initialising chromium, perhaps taking just a path for a persistent storage directory.
private fun initializeJxBrowser(): Browser {
if(ChromiumExtractor.create().shouldExtract(BrowserPreferences.getChromiumDir())) {
val cl = URLClassLoader(arrayOf(
URL("file:.../license.jar"),
URL("file:.../jxbrowser-mac-6.6.jar"),
URL("file:../jxbrowser-6.6.jar")
))
cl.loadClass("com.teamdev.jxbrowser.chromium.BrowserContext")
.getMethod("defaultContext")
.invoke(null)
}
return Browser()
}
I'm trying to write a UDF for Hadoop Hive, that parses User Agents. Following code works fine on my local machine, but on Hadoop I'm getting:
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String MyUDF .evaluate(java.lang.String) throws org.apache.hadoop.hive.ql.metadata.HiveException on object MyUDF#64ca8bfb of class MyUDF with arguments {All Occupations:java.lang.String} of size 1',
Code:
import java.io.IOException;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.*;
import com.decibel.uasparser.OnlineUpdater;
import com.decibel.uasparser.UASparser;
import com.decibel.uasparser.UserAgentInfo;
public class MyUDF extends UDF {
public String evaluate(String i) {
UASparser parser = null;
parser = new UASparser();
String key = "";
OnlineUpdater update = new OnlineUpdater(parser, key);
UserAgentInfo info = null;
info = parser.parse(i);
return info.getDeviceType();
}
}
Facts that come to my mind I should mention:
I'm compiling with Eclipse with "export runnable jar file" and extract required libraries into generated jar option
I'm uploading this "fat jar" file with Hue
Minimum working example I managed to run:
public String evaluate(String i) {
return "hello" + i.toString()";
}
I guess the problem lies somewhere around that library (downloaded from https://udger.com) I'm using, but I have no idea where.
Any suggestions?
Thanks, Michal
It could be a few things. Best thing is to check the logs, but here's a list of a few quick things you can check in a minute.
jar does not contain all dependencies. I am not sure how eclipse builds a runnable jar, but it may not include all dependencies. You can do
jar tf your-udf-jar.jar
to see what was included. You should see stuff from com.decibel.uasparser. If not, you have to build the jar with the appropriate dependencies (usually you do that using maven).
Different version of the JVM. If you compile with jdk8 and the cluster runs jdk7, it would also fail
Hive version. Sometimes the Hive APIs change slightly, enough to be incompatible. Probably not the case here, but make sure to compile the UDF against the same version of hadoop and hive that you have in the cluster
You should always check if info is null after the call to parse()
looks like the library uses a key, meaning that actually gets data from an online service (udger.com), so it may not work without an actual key. Even more important, the library updates online, contacting the online service for each record. This means, looking at the code, that it will create one update thread per record. You should change the code to do that only once in the constructor like the following:
Here's how to change it:
public class MyUDF extends UDF {
UASparser parser = new UASparser();
public MyUDF() {
super()
String key = "PUT YOUR KEY HERE";
// update only once, when the UDF is instantiated
OnlineUpdater update = new OnlineUpdater(parser, key);
}
public String evaluate(String i) {
UserAgentInfo info = parser.parse(i);
if(info!=null) return info.getDeviceType();
// you want it to return null if it's unparseable
// otherwise one bad record will stop your processing
// with an exception
else return null;
}
}
But to know for sure, you have to look at the logs...yarn logs, but also you can look at the hive logs on the machine you're submitting the job on ( probably in /var/log/hive but it depends on your installation).
such a problem probably can be solved by steps:
overide the method UDF.getRequiredJars(), make it returning a hdfs file path list which values are determined by where you put the following xxx_lib folder into your hdfs. Note that , the list mist exactly contains each jar's full hdfs path strings ,such as hdfs://yourcluster/some_path/xxx_lib/some.jar
export your udf code by following "Runnable jar file exporting wizard" (chose "copy required libraries into a sub folder next to the generated jar". This steps will result in a xxx.jar and a lib folder xxx_lib next to xxx.jar
put xxx.jar and the folders xxx_lib to your hdfs filesystem according to your code in step 0.
create a udf using: add jar ${the-xxx.jar-hdfs-path}; create function your-function as $}qualified name of udf class};
Try it. I test this and it works
I ran into library loading problems after creating a jar from my code via maven. I use intelliJ idea on Ubuntu. I broke the problem down to this situation:
Calling the following code from within idea it prints the path correctly.
package com.myproject;
public class Starter {
public static void main(String[] args) {
File classpathRoot = new File(Starter.class.getResource("/").getPath());
System.out.println(classpathRoot.getPath());
}
}
Output is:
/home/ted/java/myproject/target/classes
When I called mvn install and try to run it from command line using the following command I'm getting a NullPointerException since class.getResource() returns null:
cd /home/ted/java/myproject/target/
java -cp myproject-0.1-SNAPSHOT.jar com.myproject.Starter
same for calling:
cd /home/ted/java/myproject/target/
java -Djava.library.path=. -cp myproject-0.1-SNAPSHOT.jar com.myproject.Starter
It doesn't matter if I use class.getClassLoader().getRessource("") instead. Same problem when accessing single files inside of the target directory instead via class.getClassLoader().getRessource("file.txt").
I want to use this way to load native files in the same directory (not from inside the jar). What's wrong with my approach?
The classpath loading mechanism in the JVM is highly extensible, so it's often hard to guarantee a single method that would work in all cases. e.g. What works in your IDE may not work when running in a container because your IDE and your container probably have highly specialized class loaders with different requirements.
You could take a two tiered approach. If the method above fails, you could get the classpath from the system properties, and scan it for the jar file you're interested in and then extract the directory from that entry.
e.g.
public static void main(String[] args) {
File f = findJarLocation("jaxb-impl.jar");
System.out.println(f);
}
public static File findJarLocation(String entryName) {
String pathSep = System.getProperty("path.separator");
String[] pathEntries = System.getProperty("java.class.path").split(pathSep);
for(String entry : pathEntries) {
File f = new File(entry);
if(f.getName().equals(entryName)) {
return f.getParentFile();
}
}
return null;
}
I am starting to switch from a well-known Java build system to Gradle to build all my projects, and after barely two hours into it I have already been able to publish a new version of one of my projects without a problem -- a breeze.
But now I encounter a difficulty. In short, I need to replicate the functionality of this Maven plugin which generates the necessary files for a ServiceLoader-enabled service.
In short: given a base class foo.bar.MyClass, it generates a file named META-INF/services/foo.bar.MyClass whose content is a set of classes in the current project which implement that interface/extend that base class. Such a file would look like:
com.mycompany.MyClassImpl
org.othercompany.MyClassImpl
In order to do this, it uses I don't know what as a classloader, loads the Class objects for com.myCompany.MyClassImpl or whatever and checks whether this class implements the wanted interface.
I am trying to do the same in Gradle. Hours of googling led me to this plugin, but after discussing with its author a little, it appears this plugin is able to merge such files, not create them. So, I have to do that myself...
And I am a real beginner both with Gradle and Groovy, which does not help! Here is my current code, link to the full build.gradle here; output (which I managed to get somehow; doesn't work from a clean dir) shown below (and please bear with me... I do Java, and I am final happy; Groovy is totally new to me):
/*
* TEST CODE
*/
final int CLASS_SUFFIX = ".class".length();
final URLClassLoader classLoader = this.class.classLoader;
// Where the classes are: OK
final File classesDir = sourceSets.main.output.classesDir;
final String basePath = classesDir.getCanonicalPath();
// Add them to the classloader: OK
classLoader.addURL(classesDir.toURI().toURL())
// Recurse over each file
classesDir.eachFileRecurse {
// You "return" from a closure, you do not "continue"...
if (!isPotentialClass(it))
return;
// Transform into a class name
final String path = it.getAbsolutePath();
final String name = path.substring(basePath.length() + 1);
final String className = name.substring(0, name.length() - CLASS_SUFFIX)
.replace('/', '.');
// Try and load it
try {
classLoader.loadClass(className);
println(className);
} catch (NoClassDefFoundError ignored) {
println("failed to load " + className + ": " + ignored);
}
}
boolean isPotentialClass(final File file)
{
return file.isFile() && file.name.endsWith(".class")
}
The output:
com.github.fge.msgsimple.InternalBundle
failed to load com.github.fge.msgsimple.bundle.MessageBundle: java.lang.NoClassDefFoundError: com/github/fge/Frozen
failed to load com.github.fge.msgsimple.bundle.MessageBundleBuilder: java.lang.NoClassDefFoundError: com/github/fge/Thawed
com.github.fge.msgsimple.bundle.PropertiesBundle$1
com.github.fge.msgsimple.bundle.PropertiesBundle
com.github.fge.msgsimple.provider.MessageSourceProvider
com.github.fge.msgsimple.provider.LoadingMessageSourceProvider$1
com.github.fge.msgsimple.provider.LoadingMessageSourceProvider$2
com.github.fge.msgsimple.provider.LoadingMessageSourceProvider$3
com.github.fge.msgsimple.provider.LoadingMessageSourceProvider$Builder
com.github.fge.msgsimple.provider.LoadingMessageSourceProvider
com.github.fge.msgsimple.provider.MessageSourceLoader
com.github.fge.msgsimple.provider.StaticMessageSourceProvider$Builder
com.github.fge.msgsimple.provider.StaticMessageSourceProvider$1
com.github.fge.msgsimple.provider.StaticMessageSourceProvider
com.github.fge.msgsimple.source.MessageSource
com.github.fge.msgsimple.source.MapMessageSource$Builder
com.github.fge.msgsimple.source.MapMessageSource$1
com.github.fge.msgsimple.source.MapMessageSource
com.github.fge.msgsimple.source.PropertiesMessageSource
com.github.fge.msgsimple.locale.LocaleUtils
com.github.fge.msgsimple.serviceloader.MessageBundleFactory
com.github.fge.msgsimple.serviceloader.MessageBundleProvider
:compileJava UP-TO-DATE
The problem is in the two first lines: Frozen and Thawed are in a different project, which is in the compile classpath but not in the classpath I managed to grab so far... As such, these classes cannot even load.
How do I modify that code so as to have the full compile classpath availabe? Is my first question. Second question: how do I plug that code, when it works, into the build process?
Here are some hints:
Create a new URLClassLoader, rather than reusing an existing one.
Initialize the class loader with sourceSets.main.compileClasspath (which is an Iterable<File>) rather than classesDir.
Turn the code into a Gradle task class. For more information, see "Writing a simple task class" in the Gradle User Guide.
Ideally, you'd use a library like ASM to analyze the code, rather than using a class loader. To avoid the case where you cannot load a class because it internally references a class that's not on the compile class path, you may want to initialize the class loader with sourceSets.main.runtimeClasspath instead.