JSON4s can't find constructor w/spark - java

I've run into an issue with attempting to parse json in my spark job. I'm using spark 1.1.0, json4s, and the Cassandra Spark Connector, with DSE 4.6. The exception thrown is:
org.json4s.package$MappingException: Can't find constructor for BrowserData org.json4s.reflect.ScalaSigReader$.readConstructor(ScalaSigReader.scala:27)
org.json4s.reflect.Reflector$ClassDescriptorBuilder.ctorParamType(Reflector.scala:108)
org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:98)
org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:95)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
My code looks like this:
case class BrowserData(navigatorObjectData: Option[NavigatorObjectData],
flash_version: Option[FlashVersion],
viewport: Option[Viewport],
performanceData: Option[PerformanceData])
.... other case classes
def parseJson(b: Option[String]): Option[String] = {
implicit val formats = DefaultFormats
for {
browserDataStr <- b
browserData = parse(browserDataStr).extract[BrowserData]
navObject <- browserData.navigatorObjectData
userAgent <- navObject.userAgent
} yield (userAgent)
}
def getJavascriptUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
implicit val formats = DefaultFormats
rows.collectFirst { case r if r.getStringOption("browser_data").isDefined =>
parseJson(r.getStringOption("browser_data"))
}.flatten
}
def getRequestUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
rows.collectFirst { case r if r.getStringOption("ua").isDefined =>
r.getStringOption("ua")
}.flatten
}
def checkUa(rows: Iterable[com.datastax.spark.connector.CassandraRow], sessionId: String): Option[Boolean] = {
for {
jsUa <- getJavascriptUa(rows)
reqUa <- getRequestUa(rows)
} yield (jsUa == reqUa)
}
def run(name: String) = {
val rdd = sc.cassandraTable("beehive", name).groupBy(r => r.getString("session_id"))
val counts = rdd.map(r => (checkUa(r._2, r._1)))
counts
}
I use :load to load the file into the REPL, and then call the run function. The failure is happening in the parseJson function, as far as I can tell. I've tried a variety of things to try to get this to work. From similar posts, I've made sure my case classes are in the top level in the file. I've tried compiling just the case class definitions into a jar, and including the jar in like this: /usr/bin/dse spark --jars case_classes.jar
I've tried adding them to the conf like this: sc.getConf.setJars(Seq("/home/ubuntu/case_classes.jar"))
And still the same error. Should I compile all of my code into a jar? Is this a spark issue or a JSON4s issue? Any help at all appreciated.

Related

Cannot resolve symbol union

I am using below function to list the directories. It works in Azure databricks but when I am adding in IntelliJ project code, it is not able to resolve "union" keyword. Do I need import anything here?
def listLeafDirectories(path: String): Array[String] =
dbutils.fs.ls(path).map(file => {
// Work around double encoding bug
val path = file.path.replace("%25", "%").replace("%25", "%")
if (file.isDir) listLeafDirectories(path)
else Array[String](path.substring(0,path.lastIndexOf("/")+1))
}).reduceOption(_ union _).getOrElse(Array()).distinct
ADB Notebook successful execution:
Sorry, my mistake.
In the screenshot provided highlighting error, I have recurse flag which I was not passing to the function (in same function).
So below one worked:
def listDirectories(dir: String, recurse: Boolean): Array[String] = {
dbutils.fs.ls(dir).map(file => {
val path = file.path.replace("%25", "%").replace("%25", "%")
if (file.isDir) listDirectories(path,recurse)
else Array[String](path.substring(0, path.lastIndexOf("/")+1))
}).reduceOption(_ union _).getOrElse(Array()).distinct
}

scala matchers works from jar not from sbt

I have a code blurb thats doing some reflection on scala class files and looking for annotations,something like
// Create a new class loader with the directory
val cl = new URLClassLoader(allpaths.toArray)
getTypeNamesFromPath(sourceDir).map(s => {
val cls = cl.loadClass(s)
cls.getAnnotations.map(an =>
an match {
case q: javax.ws.rs.Path => println("found annotation")
case _ =>
})
})
This code prints "found annotation" when assembling this in a jar and running using java -jar, but doesn't print anything when running from sbt run.
sbt version is 13.8,
scala version 2.11.7
for completeness
private def getTypeNamesFromPath(file: File, currentPath: mutable.Stack[String] = new mutable.Stack[String]()): List[String] = {
if (file.isDirectory) {
var list = List[String]()
for (f <- file.listFiles()) {
currentPath.push(f.getName)
list = list ++ getTypeNamesFromPath(f, currentPath)
}
return list
}
if (currentPath.isEmpty)
throw new IllegalArgumentException(file.getAbsolutePath)
currentPath.pop()
if (file.getName.endsWith(".class")) {
return List(currentPath.foldRight("")((s, b) => if (b.isEmpty) s else b + "." + s) + "." + file.getName.stripSuffix(".class"))
}
return List()
}
so it turns out i need two things...
use URLClassLoader from scala.tools.nsc.util.ScalaClassLoader.URLClassLoader, this will solve loading the right classes.
set the fork := true in build.sbt, because once you do step #1, you will run into issues with class name clashes because sbt runs with own class paths and own version of class loader, more on it here http://www.scala-sbt.org/0.13.0/docs/Detailed-Topics/Forking.html

How to link classes from JDK into scaladoc-generated doc?

I'm trying to link classes from the JDK into the scaladoc-generated doc.
I've used the -doc-external-doc option of scaladoc 2.10.1 but without success.
I'm using -doc-external-doc:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar#http://docs.oracle.com/javase/7/docs/api/, but I get links such as index.html#java.io.File instead of index.html?java/io/File.html.
Seems like this option only works for scaladoc-generated doc.
Did I miss an option in scaladoc or should I fill a feature request?
I've configured sbt as follows:
scalacOptions in (Compile,doc) += "-doc-external-doc:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar#http://docs.oracle.com/javase/7/docs/api"
Note: I've seen the Opts.doc.externalAPI util in the upcoming sbt 0.13. I think a nice addition (not sure if it's possible) would be to pass a ModuleID instead of a File. The util would figure out which file corresponds to the ModuleID.
I use sbt 0.13.5.
There's no out-of-the-box way to have the feature of having Javadoc links inside scaladoc. And as my understanding goes, it's not sbt's fault, but the way scaladoc works. As Josh pointed out in his comment You should report to scaladoc.
There's however a workaround I came up with - postprocess the doc-generated scaladoc so the Java URLs get replaced to form proper Javadoc links.
The file scaladoc.sbt should be placed inside a sbt project and whenever doc task gets executed, the postprocessing via fixJavaLinksTask task kicks in.
NOTE There are lots of hardcoded paths so use it with caution (aka do the polishing however you see fit).
import scala.util.matching.Regex.Match
autoAPIMappings := true
// builds -doc-external-doc
apiMappings += (
file("/Library/Java/JavaVirtualMachines/jdk1.8.0_11.jdk/Contents/Home/jre/lib/rt.jar") ->
url("http://docs.oracle.com/javase/8/docs/api")
)
lazy val fixJavaLinksTask = taskKey[Unit](
"Fix Java links - replace #java.io.File with ?java/io/File.html"
)
fixJavaLinksTask := {
println("Fixing Java links")
val t = (target in (Compile, doc)).value
(t ** "*.html").get.filter(hasJavadocApiLink).foreach { f =>
println("fixing " + f)
val newContent = javadocApiLink.replaceAllIn(IO.read(f), fixJavaLinks)
IO.write(f, newContent)
}
}
val fixJavaLinks: Match => String = m =>
m.group(1) + "?" + m.group(2).replace(".", "/") + ".html"
val javadocApiLink = """\"(http://docs\.oracle\.com/javase/8/docs/api/index\.html)#([^"]*)\"""".r
def hasJavadocApiLink(f: File): Boolean = (javadocApiLink findFirstIn IO.read(f)).nonEmpty
fixJavaLinksTask <<= fixJavaLinksTask triggeredBy (doc in Compile)
I took the answer by #jacek-laskowski and modified it so that it avoid hard-coded strings and could be used for any number of Java libraries, not just the standard one.
Edit: the location of rt.jar is now determined from the runtime using sun.boot.class.path and does not have to be hard coded.
The only thing you need to modify is the map, which I have called externalJavadocMap in the following:
import scala.util.matching.Regex
import scala.util.matching.Regex.Match
val externalJavadocMap = Map(
"owlapi" -> "http://owlcs.github.io/owlapi/apidocs_4_0_2/index.html"
)
/*
* The rt.jar file is located in the path stored in the sun.boot.class.path system property.
* See the Oracle documentation at http://docs.oracle.com/javase/6/docs/technotes/tools/findingclasses.html.
*/
val rtJar: String = System.getProperty("sun.boot.class.path").split(java.io.File.pathSeparator).collectFirst {
case str: String if str.endsWith(java.io.File.separator + "rt.jar") => str
}.get // fail hard if not found
val javaApiUrl: String = "http://docs.oracle.com/javase/8/docs/api/index.html"
val allExternalJavadocLinks: Seq[String] = javaApiUrl +: externalJavadocMap.values.toSeq
def javadocLinkRegex(javadocURL: String): Regex = ("""\"(\Q""" + javadocURL + """\E)#([^"]*)\"""").r
def hasJavadocLink(f: File): Boolean = allExternalJavadocLinks exists {
javadocURL: String =>
(javadocLinkRegex(javadocURL) findFirstIn IO.read(f)).nonEmpty
}
val fixJavaLinks: Match => String = m =>
m.group(1) + "?" + m.group(2).replace(".", "/") + ".html"
/* You can print the classpath with `show compile:fullClasspath` in the SBT REPL.
* From that list you can find the name of the jar for the managed dependency.
*/
lazy val documentationSettings = Seq(
apiMappings ++= {
// Lookup the path to jar from the classpath
val classpath = (fullClasspath in Compile).value
def findJar(nameBeginsWith: String): File = {
classpath.find { attributed: Attributed[File] => (attributed.data ** s"$nameBeginsWith*.jar").get.nonEmpty }.get.data // fail hard if not found
}
// Define external documentation paths
(externalJavadocMap map {
case (name, javadocURL) => findJar(name) -> url(javadocURL)
}) + (file(rtJar) -> url(javaApiUrl))
},
// Override the task to fix the links to JavaDoc
doc in Compile <<= (doc in Compile) map {
target: File =>
(target ** "*.html").get.filter(hasJavadocLink).foreach { f =>
//println(s"Fixing $f.")
val newContent: String = allExternalJavadocLinks.foldLeft(IO.read(f)) {
case (oldContent: String, javadocURL: String) =>
javadocLinkRegex(javadocURL).replaceAllIn(oldContent, fixJavaLinks)
}
IO.write(f, newContent)
}
target
}
)
I am using SBT 0.13.8.

Strange NullPointerException in Actors in scala using play framework

I want to do some parallel computation and I'm getting a really strange java.lang.NullPointerException on calling ANY functions outside the object I have.
Take a look:
case class Return(session: String, job: Int)
case class Ready(n: Int)
case class DoJob(session: String, job: Int)
case class NotReady
object Notifications extends Controller with Secure {
class AtorMeio extends Actor {
import scala.collection.mutable.{Map => MMap}
val job: MMap[(String, Int), Option[Int]] = MMap()
def act {
loop {
react {
case DoJob(session, jobn) =>
if (job.get((session, jobn)).isEmpty) {
jobn match {
case 1 =>
job.put((session, jobn), None)
val n = Messaging.oi //Messaging.retrieveNumberOfMessages(new FlagTerm(new Flags(Flags.Flag.SEEN), false))
job.put((session, jobn), Some(n))
case 2 =>
// do!
}
}
case Return(session, jobn) =>
if (job.get((session, jobn)).isDefined && job.get((session, jobn)).get.isDefined) {
val ret = job.get((session, jobn)).get.get
job.remove((session, jobn))
reply(Ready(ret))
}
else
reply(NotReady)
}
}
}
}
private var meuator: AtorMeio = null
lazy val ator = {
if (Option(meuator).isEmpty) {
meuator = new AtorMeio
meuator.start
}
meuator
}
def pendingNotifications = {
ator ! DoJob(session.getId, 1)
ator !? Return(session.getId, 1) match {
case Ready(ret) =>
if (ret.toString != Option[String](params.get("current")).getOrElse("-1")) "true" else Suspend("80s")
case _ =>
}
}
}
I'm getting an error in executing Messaging.oi which is basically an object with:
def oi = 4
Here is the stacktrace:
controllers.Notifications$AtorMeio#1889d53: caught java.lang.NullPointerException
java.lang.NullPointerException
at controllers.Messaging$.oi(Messaging.scala:108)
at controllers.Notifications$AtorMeio$$anonfun$act$1$$anonfun$apply$1.apply(Notifications.scala:38)
at controllers.Notifications$AtorMeio$$anonfun$act$1$$anonfun$apply$1.apply(Notifications.scala:31) at scala.actors.ReactorTask.run(ReactorTask.scala:34)
at scala.actors.ReactorTask.compute(ReactorTask.scala:66)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:147)
at scala.concurrent.forkjoin.ForkJoinTask.quietlyExec(ForkJoinTask.java:422)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.mainLoop(ForkJoinWorkerThread.java:340)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:325)
Line 108 is exactly this oneliner def. Ahh entrance point is def pendingNotifications.
Anyone can help? Thanks a lot!
Have you tried replacing
private var meuator: AtorMeio = null
by either:
private var meuator: AtorMeio = None
Configure your breakpoints view in your debugger to halt/break on NullPointerExceptions ...
And: you did see you set this to null here:
private var meuator: AtorMeio = null
?? Or?
Ok people, after digging a lot I discovered the problem: Somehow, somewhere if you have the "Controller" class from play framework mixed in, it crashes mercifully. So I just wrapped this thing into a 'clean' class and it worked.

Is there a good library to embed a command prompt in a scala (or java) application

I have an application that I'd like to have a prompt in. If it helps, this is a graph database implementation and I need a prompt just like any other database client (MySQL, Postgresql, etc.).
So far I have my own REPL like so:
object App extends Application {
REPL ! Read
}
object REPL extends Actor {
def act() {
loop {
react {
case Read => {
print("prompt> ")
var message = Console.readLine
this ! Eval(message)
}
case More(sofar) => {
//Eval didn't see a semicolon
print(" --> ")
var message = Console.readLine
this ! Eval(sofar + " " + message)
}
case Eval(message) => {
Evaluator ! Eval(message)
}
case Print(message) => {
println(message)
//And here's the loop
this ! Read
}
case Exit => {
exit()
}
case _ => {
println("App: How did we get here")
}
}
}
}
this.start
}
It works, but I would really like to have something with history. Tab completion is not necessary.
Any suggestions on a good library? Scala or Java works.
Just to be clear I don't need an REPL to evaluate my code (I get that with scala!), nor am I looking to call or use something from the command line. I'm looking for a prompt that is my user experience when my client app starts up.
Scala itself, and lots of programs out there, uses a readline-like library for its REPL. Specifically, JLine.
I found another question about this, for which the answers don't seem promising.
BeanShell does some of what you want: http://www.beanshell.org/
I got it. these two blogs really helped.
http://danielwestheide.com/blog/2013/01/09/the-neophytes-guide-to-scala-part-8-welcome-to-the-future.html
http://danielwestheide.com/blog/2013/01/16/the-neophytes-guide-to-scala-part-9-promises-and-futures-in-practice.html
def interprete(code: String) : Future[String] = {
val p = Promise[String]()
Future {
var result = reader.readLine()
p.success(result)
}
writer.write(code + "\n")
writer.flush()
p.future
}
for (ln <- io.Source.stdin.getLines){
val f = interprete(ln)
f.onComplete {
case Success(s) =>
println("future returned: " + s)
case Failure(ex) =>
println(s"interpreter failed due to ${ex.getMessage}")
}
}

Categories