UDF using Java methods breaks on Spark - java

This code works in my Databricks environment, but it breaks when I run it in my local environment:
import org.apache.spark.sql.functions.udf // needed for udf(...)

val _event_day_of_week = (event_date_of_event: String) => {
  import java.time.LocalDate
  import java.time.format.DateTimeFormatter
  val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  val dayOfWeek: String = LocalDate.parse(event_date_of_event.substring(0, 10), formatter).getDayOfWeek.toString
  dayOfWeek
}
val event_day_of_weekUDF = udf(_event_day_of_week)

df.select($"uuid", event_day_of_weekUDF($"event_date_of_event") as "event_day_of_week").first
Error:
Exception in thread "main" java.lang.NullPointerException
at com.faniak.ml.eventBuzz$.delayedEndpoint$com$faniak$ml$eventBuzz$1(eventBuzz.scala:72)
at com.faniak.ml.eventBuzz$delayedInit$body.apply(eventBuzz.scala:17)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.faniak.ml.eventBuzz$.main(eventBuzz.scala:17)
at com.faniak.ml.eventBuzz.main(eventBuzz.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
The Spark version is 2.1.

The problem had nothing to do with the UDFs. When prototyping on Apache Spark, do not extend the Scala App trait: its delayedInit mechanism defers field initialization, so fields can still be uninitialized (null) when Spark code references them, producing NullPointerExceptions like the one above.
object EventBuzzDataset extends App {
To make it work, you should write:
object EventBuzzDataset {
  def main(args: Array[String]): Unit = {
    // ...
  }
}
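For a complete picture, a minimal Spark 2.x driver along these lines might look as follows (a sketch only; the SparkSession setup is assumed, since the original post does not show it):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object EventBuzzDataset {
  def main(args: Array[String]): Unit = {
    // Build the session explicitly inside main, so nothing relies on delayedInit
    val spark = SparkSession.builder()
      .appName("EventBuzzDataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._
    // ... define the UDF and run the select from the question here ...
    spark.stop()
  }
}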
The problem is well detailed here:
https://issues.apache.org/jira/browse/SPARK-4170
and
https://github.com/apache/spark/pull/3497
Thanks to @puhlen for the hint!

Related

How do I import singleton object from Scala package in Java?

I am trying to use the ARIMA object (Scala), imported from a package, in my Java program. The compilation succeeds, meaning that the ARIMA class is recognized at compile time, but at runtime there is a NoClassDefFoundError for the ARIMA object. The ARIMAModel class imports without any problem, since it is a class.
Is there any way to use the Scala object from my Java program?
Here is the source code for the object in the Scala package.
File: .../com/cloudera/sparkts/models/ARIMA.scala
package com.cloudera.sparkts.models
object ARIMA {
  def autoFit(ts: Vector, maxP: Int = 5, maxD: Int = 2, maxQ: Int = 5): ARIMAModel = {
    ...
  }
}

class ARIMAModel(...) {
  ...
}
Here is my Java code.
File: src/main/java/SingleSeriesARIMA.java
import com.cloudera.sparkts.models.ARIMA;
import com.cloudera.sparkts.models.ARIMAModel;
public class SingleSeriesARIMA {
    public static void main(String[] args) {
        ...
        ARIMAModel arimaModel = ARIMA.autoFit(tsVector, 1, 0, 1);
        ...
    }
}
Here is the error.
Exception in thread "main" java.lang.NoClassDefFoundError: com/cloudera/sparkts/models/ARIMA
at SingleSeriesARIMA.main(SingleSeriesARIMA.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.cloudera.sparkts.models.ARIMA
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
I am using Scala 2.11.8 and Java 1.8.
You need to supply the dependency that contains the ARIMA object to the Spark cluster using the --jars option, as below:
spark-submit --jars <path>/<to>/sparkts-0.4.1.jar --class SingleSeriesARIMA target/simple-project-1.0.jar
This passes the extra dependency along with the application jar so that it is available at Spark runtime.
To call the ARIMA object from Java, use:
ARIMA$.MODULE$.autoFit(tsVector, 1, 0, 1);
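This works because the Scala compiler turns every object into a singleton holder class, here ARIMA$, with a static MODULE$ field holding the single instance. A minimal illustration of the pattern (the Greeter object is hypothetical, not part of spark-ts):

object Greeter {
  def greet(name: String): String = s"Hello, $name"
}

// From Java, this compiles to the class Greeter$ and is called as:
//   String s = Greeter$.MODULE$.greet("world");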

Got NoClassDefFoundError when passing map of lambda

I am creating an Android app in Kotlin, and something strange is happening with lambdas. I pass mapOf(1 to {...}, 2 to {...}) and get a NoClassDefFoundError or ClassNotFoundException.
I tried rewriting it as a desktop program and got the same error, but with a different stack trace.
fun main(args: Array<String>) {
    call(mapOf(
        1 to { "asd" },
        2 to { 999 }
    ))
}

fun call(x: Map<Int, () -> Any>) {
}
This is the stack trace:
Exception in thread "main" java.lang.NoClassDefFoundError: kotlin/jvm/internal/Intrinsics
at TestKt.main(test.kt)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jetbrains.kotlin.runner.AbstractRunner.run(runners.kt:61)
at org.jetbrains.kotlin.runner.Main.run(Main.kt:109)
at org.jetbrains.kotlin.runner.Main.main(Main.kt:119)
Caused by: java.lang.ClassNotFoundException: kotlin.jvm.internal.Intrinsics
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 8 more
It is not a problem in your code; it is a problem with how you run your application.
Kotlin comes with its own standard class library, kotlin-runtime.jar. You should have this library on the classpath:
java -cp $KOTLIN_HOME/lib/kotlin-runtime.jar:MyApp.jar com.my.AppKt
or you should compile the application with -include-runtime:
kotlinc app.kt -include-runtime -d MyApp.jar

Jython 2.2 - java.lang.ExceptionInInitializerError when accessing class .getPackage()

I have a Java class myObj with a static block as shown:
static {
    Class<myObj> klass = myObj.class;
    log.info("\nClientAPIVersion : " + klass.getPackage().getImplementationVersion());
}
I'm creating an instance of this class in Jython 2.2.
When I run my Python script using:
java -Dlog4j.configuration=file://${CLASSPATH}/log4j.xml -jar ~/jython_2.2/jython.jar test.py
I get an exception as shown:
File "test.py", line 42, in ?
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
java.lang.ExceptionInInitializerError: java.lang.ExceptionInInitializerError
I found online that java.lang.ExceptionInInitializerError occurs due to a crash in the static block.
When I remove the log.info call, the Python code executes correctly.
I think that klass.getPackage() might be null, but if so, it should simply print 'null' in the log. Why the exception?
What is the source of the problem, and how can I fix it? Thanks.

Loading Mllib models outside Spark

I'm training a model in Spark with MLlib and saving it:
val model = SVMWithSGD.train(training, numIterations)
model.save(sc, "~/model")
but I'm having trouble loading it from a Java app without Spark, to make real-time predictions.
SparkConf sconf = new SparkConf().setAppName("Application").setMaster("local");
SparkContext sc = new SparkContext(sconf);
SVMModel model = SVMModel.load(sc, "/model");
I'm getting:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at ModelUser$.main(ModelUser.scala:11)
at ModelUser.main(ModelUser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
Is there a way to load the model in a normal Java app?
Have a look at PMML model export here.
PMML model export in Spark is not maintained anymore, and only the old RDD API supports it.
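For the asker's case specifically, mllib's SVMModel implements PMMLExportable, so the old RDD API can export it directly (a sketch):

model.toPMML(sc, "/tmp/svm-model.pmml") // write the PMML out to a path
val pmmlXml: String = model.toPMML()    // or get the XML in-process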
I've been using jpmml-sparkml to solve the problem. It also has a Java runtime for standalone model execution.
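As a sketch of that route (note that jpmml-sparkml converts spark.ml PipelineModels rather than mllib models, and PMMLBuilder as the entry point is an assumption about recent versions of the library, not something from the original answer):

import org.jpmml.sparkml.PMMLBuilder
import org.jpmml.model.JAXBUtil
import javax.xml.transform.stream.StreamResult
import java.io.File

// df is the training DataFrame, pipelineModel a fitted spark.ml PipelineModel
val pmml = new PMMLBuilder(df.schema, pipelineModel).build()
JAXBUtil.marshalPMML(pmml, new StreamResult(new File("model.pmml")))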

Jimfs path to ceylon Directory

I'm trying to create a directory (using Ceylon's file module) in the Jimfs file system, but I'm running into problems: the Jimfs provider is not installed when the file system is accessed from Ceylon.
This is my test program:
// File: test.se.gustavkarlsson.autogit.file.watcher.run
import ceylon.file {
    Nil,
    parseURI
}
import com.google.common.jimfs {
    Jimfs {
        jimFs=newFileSystem
    }
}

shared void run() {
    value fs = jimFs();
    value jPath = fs.getPath("directory");
    value uri = jPath.toUri().string;
    value path = parseURI(uri);
    value resource = path.resource;
    assert (is Nil resource);
    resource.createDirectory();
}
When run, it prints the following stack trace:
ceylon run: Provider "jimfs" not found
java.nio.file.ProviderNotFoundException: Provider "jimfs" not found
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:276)
at ceylon.file.internal.createSystem_.createSystem(ConcreteSystem.ceylon:64)
at ceylon.file.createSystem_.createSystem(System.ceylon:43)
at test.se.gustavkarlsson.autogit.file.watcher.run_.run(run.ceylon:17)
at test.se.gustavkarlsson.autogit.file.watcher.run_.main(run.ceylon)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at ceylon.modules.api.runtime.SecurityActions.invokeRunInternal(SecurityActions.java:57)
at ceylon.modules.api.runtime.SecurityActions.invokeRun(SecurityActions.java:48)
at ceylon.modules.api.runtime.AbstractRuntime.invokeRun(AbstractRuntime.java:75)
at ceylon.modules.api.runtime.AbstractRuntime.execute(AbstractRuntime.java:122)
at ceylon.modules.api.runtime.AbstractRuntime.execute(AbstractRuntime.java:106)
at ceylon.modules.Main.execute(Main.java:69)
at ceylon.modules.Main.main(Main.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.modules.Module.run(Module.java:312)
at org.jboss.modules.Main.main(Main.java:460)
at ceylon.modules.bootstrap.CeylonRunTool.run(CeylonRunTool.java:244)
at com.redhat.ceylon.common.tools.CeylonTool.run(CeylonTool.java:491)
at com.redhat.ceylon.common.tools.CeylonTool.execute(CeylonTool.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.redhat.ceylon.launcher.Launcher.runInJava7Checked(Launcher.java:114)
at com.redhat.ceylon.launcher.Launcher.run(Launcher.java:41)
at com.redhat.ceylon.launcher.Launcher.run(Launcher.java:34)
at com.redhat.ceylon.launcher.Launcher.main(Launcher.java:27)
Any ideas on how to install that provider?
I'm running Ceylon 1.2.0 on Linux with Jimfs 1.0 (I also tested 1.1-rc1). Working with Jimfs the "intended" way (pure Java NIO) works fine.
This is related to module visibility: we need to add a "read" (in Jigsaw terminology) from the JDK to the Jimfs module.
I've opened https://github.com/ceylon/ceylon/issues/5995 to investigate.
