Loading MLlib models outside Spark - Java

I'm training a model in Spark with MLlib and saving it:
val model = SVMWithSGD.train(training, numIterations)
model.save(sc, "~/model")
but I'm having trouble loading it from a Java app without Spark to make real-time predictions.
SparkConf sconf = new SparkConf().setAppName("Application").setMaster("local");
SparkContext sc = new SparkContext(sconf);
SVMModel model = SVMModel.load(sc, "/model");
I'm getting:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at ModelUser$.main(ModelUser.scala:11)
at ModelUser.main(ModelUser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
Is there a way to load the model in a normal Java app?

Have a look at PMML model export here.

PMML model export in Spark is no longer maintained, and only the old RDD API supports it.
I've been using jpmml-sparkml to solve this problem. It also has a Java runtime (jpmml-evaluator) for standalone model execution.
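For illustration, a rough sketch of standalone scoring with jpmml-evaluator could look like the following. Treat it as an outline, not a drop-in implementation: the builder and the String-keyed argument map match recent jpmml-evaluator versions (older versions key fields by FieldName), and the file name and feature value are placeholders.
// Sketch only: assumes a recent jpmml-evaluator on the classpath; adjust names to your version.
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorUtil;
import org.jpmml.evaluator.FieldValue;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;

public class StandaloneScorer {
    public static void main(String[] args) throws Exception {
        // Load a PMML file that was exported from Spark (e.g. via jpmml-sparkml).
        Evaluator evaluator = new LoadingModelEvaluatorBuilder()
                .load(new File("model.pmml"))
                .build();
        evaluator.verify();

        // Build one input record: map each declared input field to a prepared value.
        Map<String, FieldValue> arguments = new LinkedHashMap<>();
        for (InputField inputField : evaluator.getInputFields()) {
            Object rawValue = 0.0; // placeholder feature value
            arguments.put(inputField.getName(), inputField.prepare(rawValue));
        }

        // Score the record and decode the result map into plain Java values.
        Map<String, ?> results = EvaluatorUtil.decodeAll(evaluator.evaluate(arguments));
        System.out.println(results);
    }
}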

Related

Flink cannot find Groovy class during checkpoint

I have a problem in Flink. My real-time compute engine uses Groovy scripts to extend the available compute types (sum, average, count, and so on). We define a standard compute interface (AbstractCompute), so to add a compute type to the framework I only need to implement AbstractCompute and store the Groovy script in the database. The application then reads the script for each task and loads it into the JVM with a GroovyClassLoader.
This approach works fine outside of Flink. The problem is that at checkpoint time Flink uses a different class loader (FlinkUserCodeClassLoaders$ChildFirstClassLoader) to load the object instantiated from the Groovy script, instead of the GroovyClassLoader that compiled it.
Code
// Init Groovy ClassLoader
CompilerConfiguration classLoaderConfig = new CompilerConfiguration();
classLoaderConfig.setSourceEncoding("UTF-8");
CLASS_LOADER = new GroovyClassLoader(Thread.currentThread().getContextClassLoader(), classLoaderConfig);
......
......
// parse the script, create an instance, and put it into the cache
Class clazz = CLASS_LOADER.parseClass(computeType.getScript());
AbstractComputable computableObject = (AbstractComputable) clazz.newInstance();
removeComputeType(computeType);
// store the custom compute object in the cache
IndicatorCache.COMPUTABLE_OBJECT_CACHE.put(computeType.getId().intValue(), computableObject);
......
......
AbstractComputable computable = IndicatorCache.COMPUTABLE_OBJECT_CACHE.get(indicator.getComputeType());
if (computable == null) {
    if (log.isDebugEnabled()) {
        log.debug("without computeType:{} in cache", indicator.getComputeType());
    }
    return false;
}
indicator.setComputableObject(computable);
Exception stack:
com.esotericsoftware.kryo.KryoException: Unable to find class: com.xxx.xxx.common.computable.CurValueCompute
Serialization trace:
computableObject (com.xxx.xxx.common.pojo.property.IndicatorProperty)
normalIndicatorList (com.xxx.xxx.common.pojo.property.ComputeTuple)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:231)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:577)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:310)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:89)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:154)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.xxx.xxx.common.computable.CurValueCompute
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
... 41 common frames omitted
How do I use the Groovy dynamic language correctly in Flink?
Flink needs to know the types it is processing. Otherwise it is not possible to serialize and deserialize the instances. Therefore, the class definitions need to be contained in the user code jar which you submit to the Flink cluster.
If you want to support dynamically loaded classes, then you should serialize these instances into a generic format (e.g. AbstractComputeContainer) which you completely resolve in the user code function where you have the GroovyClassLoader.
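As a hedged sketch of that pattern: ship only the script source with the record and compile it inside the user-code function, so the Groovy-generated class is never serialized by Flink/Kryo. The ScriptedEvent type, its getScript()/getValues() accessors, and the compute(...) signature below are assumptions made up for illustration.
import groovy.lang.GroovyClassLoader;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.codehaus.groovy.control.CompilerConfiguration;

public class GroovyComputeMapper extends RichMapFunction<ScriptedEvent, Double> {

    // transient: created per task in open(), never checkpointed or shipped with the job graph
    private transient GroovyClassLoader groovyClassLoader;

    @Override
    public void open(Configuration parameters) {
        CompilerConfiguration config = new CompilerConfiguration();
        config.setSourceEncoding("UTF-8");
        // Parent the Groovy loader on Flink's user-code class loader for this task.
        groovyClassLoader = new GroovyClassLoader(
                getRuntimeContext().getUserCodeClassLoader(), config);
    }

    @Override
    public Double map(ScriptedEvent event) throws Exception {
        // The record carries only the script source (a plain String); resolve it here.
        // A production version would cache the compiled class per script id.
        Class<?> clazz = groovyClassLoader.parseClass(event.getScript());
        AbstractComputable computable = (AbstractComputable) clazz.newInstance();
        return computable.compute(event.getValues()); // compute(...) signature is assumed
    }
}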

Jython 2.2 - java.lang.ExceptionInInitializerError when accessing class .getPackage()

I have a Java class myObj with a static block as shown:
static {
    Class<myObj> klass = myObj.class;
    log.info("\nClientAPIVersion : " + klass.getPackage().getImplementationVersion());
}
I'm creating an instance of this class in Jython 2.2.
When I run my Python script using:
java -Dlog4j.configuration=file://${CLASSPATH}/log4j.xml -jar ~/jython_2.2/jython.jar test.py
I get an exception as shown:
File "test.py", line 42, in ?
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
java.lang.ExceptionInInitializerError: java.lang.ExceptionInInitializerError
I found online that java.lang.ExceptionInInitializerError occurs due to a crash in the static block.
When I remove the log.info, the Python code executes correctly.
I think that klass.getPackage() might be null, but if so, it should simply print 'null' in the log. Why the exception?
What is the source of the problem, and how can I fix it? Thanks
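For reference, a defensive variant of the static block (an untested sketch; the null check and the catch-all are my own additions) would at least keep the initializer from blowing up:
static {
    try {
        Package pkg = myObj.class.getPackage(); // may be null depending on the class loader
        String version = (pkg != null) ? pkg.getImplementationVersion() : "unknown";
        log.info("\nClientAPIVersion : " + version);
    } catch (Throwable t) {
        // Never let version logging fail class initialization.
        System.err.println("Could not determine ClientAPIVersion: " + t);
    }
}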

UDF using Java methods breaks on spark

I wrote this code in a Databricks environment, but when I try it in my local environment it breaks...
val _event_day_of_week = (event_date_of_event: String) => {
  import java.time.LocalDate
  import java.time.format.DateTimeFormatter
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  val dayOfWeek: String = LocalDate.parse(event_date_of_event.substring(0, 10), formatter).getDayOfWeek.toString
  dayOfWeek
}
val event_day_of_weekUDF = udf(_event_day_of_week)
df.select($"uuid", event_day_of_weekUDF($"event_date_of_event") as "event_day_of_week").first
Error:
Exception in thread "main" java.lang.NullPointerException
at com.faniak.ml.eventBuzz$.delayedEndpoint$com$faniak$ml$eventBuzz$1(eventBuzz.scala:72)
at com.faniak.ml.eventBuzz$delayedInit$body.apply(eventBuzz.scala:17)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.faniak.ml.eventBuzz$.main(eventBuzz.scala:17)
at com.faniak.ml.eventBuzz.main(eventBuzz.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
The version is Spark 2.1.
The problem had nothing to do with the UDFs. When prototyping on Apache Spark, do not extend the Scala App trait, because it does not work properly with Spark.
object EventBuzzDataset extends App {
In order to make it work, you should write:
object EventBuzzDataset {
  def main(args: Array[String]): Unit = { /* ... */ }
}
The problem is well detailed here:
https://issues.apache.org/jira/browse/SPARK-4170
and
https://github.com/apache/spark/pull/3497
Thanks to @puhlen for the hint!

InvocationTargetException. Cannot cast class X to class X. When invoked in Scala Imain through spark-submit

So, I have the following use case.
I'm simplifying the usage of Spark DataFrames for a particular domain by providing a DSL-like interface.
All this code goes into a fat jar created by the Maven Shade plugin (fat jar = without Spark and Hadoop dependencies).
This fat jar has a main class, let's call it JavaMain.
Inside JavaMain, I make a REST call to get a string whose contents are valid DSL.
I instantiate an IMain object with an initial Settings object and bind a few variables using the imain.bind method.
However, this bind fails with the following error:
Set failed in bind(results, com.dhruv.dsl.DslDataFrame.DSLResults, com.dhruv.dsl.DslDataFrame$DSLResults#7650a5f3)
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.callEither(IMain.scala:738)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:625)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:661)
at scala.tools.nsc.interpreter.IMain.bind(IMain.scala:662)
at com.thoughtworks.dsl.DSL.run(DSL.scala:44)
at com.thoughtworks.dsl.JavaMain.run(JavaMain.java:30)
at com.thoughtworks.dsl.JavaMain.main(JavaMain.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: com.thoughtworks.dsl.DslDataFrame$DSLResults cannot be cast to com.thoughtworks.dsl.DslDataFrame$DSLResults
at $line3.$eval$.set(<console>:6)
at $line3.$eval.set(<console>)
... 21 more
More context:
I had classpath issues when trying this out, although it seems I haven't been able to resolve them all.
Earlier, when creating the Settings object, I was doing something like this:
val settings = {
  val x = new Settings()
  x.classpath.value += File.pathSeparator + System.getProperty("java.class.path")
  x.usejavacp.value = true
  x.verbose.value = true
  x
}
However, this didn't seem to work: when doing a spark-submit, only the Spark- and Hadoop-related jars were on the classpath.
I then added the following to the classpath:
val urLs: Array[URL] = Thread.currentThread.getContextClassLoader.asInstanceOf[URLClassLoader].getURLs
and did the following:
val settings = {
  val x = new Settings()
  x.classpath.value += File.pathSeparator + urLs(0)
  x.usejavacp.value = true
  x.verbose.value = true
  x
}
This is the code I'm using to bind the objects:
interpreter.bind("notagin", new SomeDummyObject)
This throws the exception I attached earlier.
Interestingly, the following code works (i.e. importing and instantiating the same object inside the interpreter doesn't cause a problem):
interpreter.interpret(
  """
    import com.dhruv.dsl.operations._
    import com.dhruv.dsl.implicits._
    import com.dhruv.dsl.DslDataFrame._
    import org.apache.spark.sql.Column
    import com.dhruv.dsl._
    implicit def RichColumn(column: Column): RichColumn = new RichColumn(column)
    val justdont = new SomeDummyObject()
    justdont.justdontcallme(thatJson)
  """
)
Another detail I'm aware of, and which bothers me, is that IMain internally changes the class loader. I'm not sure whether that is causing the issue.
Any help is more than appreciated.
Okay, so we figured out how to solve the problem for our case.
I think IMain uses a different class loader to load classes than the one they are supposed to be loaded with. Anyway, the following solves the problem; I'm leaving it here for others to have a look at.
val interpreter = new IMain(settings) {
  override protected def parentClassLoader: ClassLoader = this.getClass.getClassLoader
}

Using Google-Reflection within Groovy causes exception whereas equivalent Java code works

I'm trying to use some code from another answer on SO, and while the code runs in Java, from Groovy it causes an exception.
The code in question is:
Reflections reflections = new Reflections(new ConfigurationBuilder()
    .setScanners(new SubTypesScanner(false /* don't exclude Object.class */), new ResourcesScanner())
    .setUrls(ClasspathHelper.forClassLoader(classLoadersList.toArray(new ClassLoader[0])))
    .filterInputsBy(new FilterBuilder()
        .include(prefix("net.initech"))
        .exclude(prefix("net.initech.util"))));
The line where the exception is getting thrown seems to be ClasspathHelper.forClassLoader(...).
This happens regardless of whether I'm using @CompileStatic or not. I also tried just using this.getClassLoader(), and the same issue occurs.
The exception is:
Exception in thread "main" java.lang.NoClassDefFoundError: javax/servlet/ServletContext
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2688)
at java.lang.Class.getDeclaredMethods(Class.java:1962)
at org.codehaus.groovy.reflection.stdclasses.CachedSAMClass.getAbstractMethods(CachedSAMClass.java:91)
at org.codehaus.groovy.reflection.stdclasses.CachedSAMClass.getSAMMethod(CachedSAMClass.java:155)
at org.codehaus.groovy.reflection.ClassInfo.isSAM(ClassInfo.java:280)
at org.codehaus.groovy.reflection.ClassInfo.createCachedClass(ClassInfo.java:270)
at org.codehaus.groovy.reflection.ClassInfo.access$400(ClassInfo.java:36)
at org.codehaus.groovy.reflection.ClassInfo$LazyCachedClassRef.initValue(ClassInfo.java:441)
at org.codehaus.groovy.reflection.ClassInfo$LazyCachedClassRef.initValue(ClassInfo.java:432)
at org.codehaus.groovy.util.LazyReference.getLocked(LazyReference.java:46)
at org.codehaus.groovy.util.LazyReference.get(LazyReference.java:33)
at org.codehaus.groovy.reflection.ClassInfo.getCachedClass(ClassInfo.java:89)
at org.codehaus.groovy.reflection.ReflectionCache.getCachedClass(ReflectionCache.java:107)
at groovy.lang.MetaClassImpl.(MetaClassImpl.java:163)
at groovy.lang.MetaClassImpl.(MetaClassImpl.java:187)
at groovy.lang.MetaClassImpl.(MetaClassImpl.java:193)
at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createNormalMetaClass(MetaClassRegistry.java:158)
at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createWithCustomLookup(MetaClassRegistry.java:148)
at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.create(MetaClassRegistry.java:131)
at org.codehaus.groovy.reflection.ClassInfo.getMetaClassUnderLock(ClassInfo.java:175)
at org.codehaus.groovy.reflection.ClassInfo.getMetaClass(ClassInfo.java:192)
at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.getMetaClass(MetaClassRegistryImpl.java:255)
at org.codehaus.groovy.runtime.InvokerHelper.getMetaClass(InvokerHelper.java:859)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallStaticSite(CallSiteArray.java:72)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.createCallSite(CallSiteArray.java:159)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at net.initech.DeltaCodeGen.main(DeltaCodeGen.groovy:27)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.ClassNotFoundException: javax.servlet.ServletContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 35 more
I can work around this by adding the following to my pom.xml:
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>servlet-api</artifactId>
    <version>6.0.37</version>
</dependency>
but I shouldn't have to, and I don't have it in the Java version.
You might be running into the well-known problem that the Groovy compiler sometimes needs runtime dependencies to be put on its compile class path. This is because the compiler uses Java reflection to access its compile class path. There are concrete plans to fix this in an upcoming release (don't remember if it's 2.x or 3.0).
Looks like the domain you wish to scan is "net.initech". In that case, why not use ClasspathHelper.forPackage("net.initech") (and leave the exclude pattern)?
Second, what's the idea of using new ClassLoader[0]?
Also, note that using new SubTypesScanner(false) is not best practice, as it might create a huge metadata store of all classes (well, all classes are derived from Object).
Basically, Reflections is not intended to list all classes (though it obviously can), but to aggregate types based on some criteria (annotation, supertype, and so on).
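For illustration, a sketch of that configuration (method names as in the 0.9.x Reflections API; keep or drop the exclude filter as you prefer):
// Sketch: let Reflections derive the URLs from the package prefix instead of the class loaders.
import static org.reflections.util.FilterBuilder.prefix;

import org.reflections.Reflections;
import org.reflections.scanners.ResourcesScanner;
import org.reflections.scanners.SubTypesScanner;
import org.reflections.util.ClasspathHelper;
import org.reflections.util.ConfigurationBuilder;
import org.reflections.util.FilterBuilder;

Reflections reflections = new Reflections(new ConfigurationBuilder()
    .setUrls(ClasspathHelper.forPackage("net.initech"))
    .setScanners(new SubTypesScanner(), new ResourcesScanner())
    .filterInputsBy(new FilterBuilder()
        .include(prefix("net.initech"))
        .exclude(prefix("net.initech.util"))));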
