I am using below function to list the directories. It works in Azure databricks but when I am adding in IntelliJ project code, it is not able to resolve "union" keyword. Do I need import anything here?
def listLeafDirectories(path: String): Array[String] =
dbutils.fs.ls(path).map(file => {
// Work around double encoding bug
val path = file.path.replace("%25", "%").replace("%25", "%")
if (file.isDir) listLeafDirectories(path)
else Array[String](path.substring(0,path.lastIndexOf("/")+1))
}).reduceOption(_ union _).getOrElse(Array()).distinct
ADB Notebook successful execution:
Sorry, my mistake.
In the screenshot provided highlighting error, I have recurse flag which I was not passing to the function (in same function).
So below one worked:
def listDirectories(dir: String, recurse: Boolean): Array[String] = {
dbutils.fs.ls(dir).map(file => {
val path = file.path.replace("%25", "%").replace("%25", "%")
if (file.isDir) listDirectories(path,recurse)
else Array[String](path.substring(0, path.lastIndexOf("/")+1))
}).reduceOption(_ union _).getOrElse(Array()).distinct
}
Related
I'm trying to take the JSON output of an analysis tool, and turn it into a Java list.
I'm doing this with Scala, and while it doesn't sound that hard, I've run into trouble.
So I was hoping the following code would do it:
def returnWarnings(json: JSONObject): java.util.List[String] ={
var bugs = new ArrayBuffer[String]
for (warning <- json.getJSONArray("warnings").){ //might need to add .asScala
bugs += (warning.getString("warning_type") + ": " + warning.getString("message"))
}
val rval: java.util.List[String] = bugs.asJava
rval
}
This block produces two errors when I try to compile it:
Error:(18, 42) value foreach is not a member of org.json.JSONArray
for (warning <- json.getJSONArray("warnings")){ //might need to add .asScala
^
and
Error:(21, 49) value asJava is not a member of scala.collection.mutable.ArrayBuffer[String]
val rval: java.util.List[String] = bugs.asJava
^
I don't know what's wrong with my for loop.
EDIT: with a bit more reading, I figured out what was up with the loop. see https://stackoverflow.com/a/6376083/5843840
The second error is especially baffling, because as far as I can tell it should work. It is really similar to the code from this documentation
scala> val jul: java.util.List[Int] = ArrayBuffer(1, 2, 3).asJava
jul: java.util.List[Int] = [1, 2, 3]
You should try the following:
import scala.collection.JavaConverters._
def returnWarnings(input: JSONObject): java.util.List[String] = {
val warningsArray = input.getJSONArray("warnings")
val output = (0 until warningsArray.length).map { i =>
val warning = warningsArray.getJSONObject(i)
warning.getString("warning_type") + ": " + warning.getString("message")
}
output.asJava
}
That final conversion could be done implicitly (without invoking .asJava), by importing scala.collection.JavaConversions._
I have a code blurb thats doing some reflection on scala class files and looking for annotations,something like
// Create a new class loader with the directory
val cl = new URLClassLoader(allpaths.toArray)
getTypeNamesFromPath(sourceDir).map(s => {
val cls = cl.loadClass(s)
cls.getAnnotations.map(an =>
an match {
case q: javax.ws.rs.Path => println("found annotation")
case _ =>
})
})
This code prints "found annotation" when assembling this in a jar and running using java -jar, but doesn't print anything when running from sbt run.
sbt version is 13.8,
scala version 2.11.7
for completeness
private def getTypeNamesFromPath(file: File, currentPath: mutable.Stack[String] = new mutable.Stack[String]()): List[String] = {
if (file.isDirectory) {
var list = List[String]()
for (f <- file.listFiles()) {
currentPath.push(f.getName)
list = list ++ getTypeNamesFromPath(f, currentPath)
}
return list
}
if (currentPath.isEmpty)
throw new IllegalArgumentException(file.getAbsolutePath)
currentPath.pop()
if (file.getName.endsWith(".class")) {
return List(currentPath.foldRight("")((s, b) => if (b.isEmpty) s else b + "." + s) + "." + file.getName.stripSuffix(".class"))
}
return List()
}
so it turns out i need two things...
use URLClassLoader from scala.tools.nsc.util.ScalaClassLoader.URLClassLoader, this will solve loading the right classes.
set the fork := true in build.sbt, because once you do step #1, you will run into issues with class name clashes because sbt runs with own class paths and own version of class loader, more on it here http://www.scala-sbt.org/0.13.0/docs/Detailed-Topics/Forking.html
I've run into an issue with attempting to parse json in my spark job. I'm using spark 1.1.0, json4s, and the Cassandra Spark Connector, with DSE 4.6. The exception thrown is:
org.json4s.package$MappingException: Can't find constructor for BrowserData org.json4s.reflect.ScalaSigReader$.readConstructor(ScalaSigReader.scala:27)
org.json4s.reflect.Reflector$ClassDescriptorBuilder.ctorParamType(Reflector.scala:108)
org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:98)
org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:95)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
My code looks like this:
case class BrowserData(navigatorObjectData: Option[NavigatorObjectData],
flash_version: Option[FlashVersion],
viewport: Option[Viewport],
performanceData: Option[PerformanceData])
.... other case classes
def parseJson(b: Option[String]): Option[String] = {
implicit val formats = DefaultFormats
for {
browserDataStr <- b
browserData = parse(browserDataStr).extract[BrowserData]
navObject <- browserData.navigatorObjectData
userAgent <- navObject.userAgent
} yield (userAgent)
}
def getJavascriptUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
implicit val formats = DefaultFormats
rows.collectFirst { case r if r.getStringOption("browser_data").isDefined =>
parseJson(r.getStringOption("browser_data"))
}.flatten
}
def getRequestUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
rows.collectFirst { case r if r.getStringOption("ua").isDefined =>
r.getStringOption("ua")
}.flatten
}
def checkUa(rows: Iterable[com.datastax.spark.connector.CassandraRow], sessionId: String): Option[Boolean] = {
for {
jsUa <- getJavascriptUa(rows)
reqUa <- getRequestUa(rows)
} yield (jsUa == reqUa)
}
def run(name: String) = {
val rdd = sc.cassandraTable("beehive", name).groupBy(r => r.getString("session_id"))
val counts = rdd.map(r => (checkUa(r._2, r._1)))
counts
}
I use :load to load the file into the REPL, and then call the run function. The failure is happening in the parseJson function, as far as I can tell. I've tried a variety of things to try to get this to work. From similar posts, I've made sure my case classes are in the top level in the file. I've tried compiling just the case class definitions into a jar, and including the jar in like this: /usr/bin/dse spark --jars case_classes.jar
I've tried adding them to the conf like this: sc.getConf.setJars(Seq("/home/ubuntu/case_classes.jar"))
And still the same error. Should I compile all of my code into a jar? Is this a spark issue or a JSON4s issue? Any help at all appreciated.
I'm trying to link classes from the JDK into the scaladoc-generated doc.
I've used the -doc-external-doc option of scaladoc 2.10.1 but without success.
I'm using -doc-external-doc:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar#http://docs.oracle.com/javase/7/docs/api/, but I get links such as index.html#java.io.File instead of index.html?java/io/File.html.
Seems like this option only works for scaladoc-generated doc.
Did I miss an option in scaladoc or should I fill a feature request?
I've configured sbt as follows:
scalacOptions in (Compile,doc) += "-doc-external-doc:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar#http://docs.oracle.com/javase/7/docs/api"
Note: I've seen the Opts.doc.externalAPI util in the upcoming sbt 0.13. I think a nice addition (not sure if it's possible) would be to pass a ModuleID instead of a File. The util would figure out which file corresponds to the ModuleID.
I use sbt 0.13.5.
There's no out-of-the-box way to have the feature of having Javadoc links inside scaladoc. And as my understanding goes, it's not sbt's fault, but the way scaladoc works. As Josh pointed out in his comment You should report to scaladoc.
There's however a workaround I came up with - postprocess the doc-generated scaladoc so the Java URLs get replaced to form proper Javadoc links.
The file scaladoc.sbt should be placed inside a sbt project and whenever doc task gets executed, the postprocessing via fixJavaLinksTask task kicks in.
NOTE There are lots of hardcoded paths so use it with caution (aka do the polishing however you see fit).
import scala.util.matching.Regex.Match
autoAPIMappings := true
// builds -doc-external-doc
apiMappings += (
file("/Library/Java/JavaVirtualMachines/jdk1.8.0_11.jdk/Contents/Home/jre/lib/rt.jar") ->
url("http://docs.oracle.com/javase/8/docs/api")
)
lazy val fixJavaLinksTask = taskKey[Unit](
"Fix Java links - replace #java.io.File with ?java/io/File.html"
)
fixJavaLinksTask := {
println("Fixing Java links")
val t = (target in (Compile, doc)).value
(t ** "*.html").get.filter(hasJavadocApiLink).foreach { f =>
println("fixing " + f)
val newContent = javadocApiLink.replaceAllIn(IO.read(f), fixJavaLinks)
IO.write(f, newContent)
}
}
val fixJavaLinks: Match => String = m =>
m.group(1) + "?" + m.group(2).replace(".", "/") + ".html"
val javadocApiLink = """\"(http://docs\.oracle\.com/javase/8/docs/api/index\.html)#([^"]*)\"""".r
def hasJavadocApiLink(f: File): Boolean = (javadocApiLink findFirstIn IO.read(f)).nonEmpty
fixJavaLinksTask <<= fixJavaLinksTask triggeredBy (doc in Compile)
I took the answer by #jacek-laskowski and modified it so that it avoid hard-coded strings and could be used for any number of Java libraries, not just the standard one.
Edit: the location of rt.jar is now determined from the runtime using sun.boot.class.path and does not have to be hard coded.
The only thing you need to modify is the map, which I have called externalJavadocMap in the following:
import scala.util.matching.Regex
import scala.util.matching.Regex.Match
val externalJavadocMap = Map(
"owlapi" -> "http://owlcs.github.io/owlapi/apidocs_4_0_2/index.html"
)
/*
* The rt.jar file is located in the path stored in the sun.boot.class.path system property.
* See the Oracle documentation at http://docs.oracle.com/javase/6/docs/technotes/tools/findingclasses.html.
*/
val rtJar: String = System.getProperty("sun.boot.class.path").split(java.io.File.pathSeparator).collectFirst {
case str: String if str.endsWith(java.io.File.separator + "rt.jar") => str
}.get // fail hard if not found
val javaApiUrl: String = "http://docs.oracle.com/javase/8/docs/api/index.html"
val allExternalJavadocLinks: Seq[String] = javaApiUrl +: externalJavadocMap.values.toSeq
def javadocLinkRegex(javadocURL: String): Regex = ("""\"(\Q""" + javadocURL + """\E)#([^"]*)\"""").r
def hasJavadocLink(f: File): Boolean = allExternalJavadocLinks exists {
javadocURL: String =>
(javadocLinkRegex(javadocURL) findFirstIn IO.read(f)).nonEmpty
}
val fixJavaLinks: Match => String = m =>
m.group(1) + "?" + m.group(2).replace(".", "/") + ".html"
/* You can print the classpath with `show compile:fullClasspath` in the SBT REPL.
* From that list you can find the name of the jar for the managed dependency.
*/
lazy val documentationSettings = Seq(
apiMappings ++= {
// Lookup the path to jar from the classpath
val classpath = (fullClasspath in Compile).value
def findJar(nameBeginsWith: String): File = {
classpath.find { attributed: Attributed[File] => (attributed.data ** s"$nameBeginsWith*.jar").get.nonEmpty }.get.data // fail hard if not found
}
// Define external documentation paths
(externalJavadocMap map {
case (name, javadocURL) => findJar(name) -> url(javadocURL)
}) + (file(rtJar) -> url(javaApiUrl))
},
// Override the task to fix the links to JavaDoc
doc in Compile <<= (doc in Compile) map {
target: File =>
(target ** "*.html").get.filter(hasJavadocLink).foreach { f =>
//println(s"Fixing $f.")
val newContent: String = allExternalJavadocLinks.foldLeft(IO.read(f)) {
case (oldContent: String, javadocURL: String) =>
javadocLinkRegex(javadocURL).replaceAllIn(oldContent, fixJavaLinks)
}
IO.write(f, newContent)
}
target
}
)
I am using SBT 0.13.8.
I have an application that I'd like to have a prompt in. If it helps, this is a graph database implementation and I need a prompt just like any other database client (MySQL, Postgresql, etc.).
So far I have my own REPL like so:
object App extends Application {
REPL ! Read
}
object REPL extends Actor {
def act() {
loop {
react {
case Read => {
print("prompt> ")
var message = Console.readLine
this ! Eval(message)
}
case More(sofar) => {
//Eval didn't see a semicolon
print(" --> ")
var message = Console.readLine
this ! Eval(sofar + " " + message)
}
case Eval(message) => {
Evaluator ! Eval(message)
}
case Print(message) => {
println(message)
//And here's the loop
this ! Read
}
case Exit => {
exit()
}
case _ => {
println("App: How did we get here")
}
}
}
}
this.start
}
It works, but I would really like to have something with history. Tab completion is not necessary.
Any suggestions on a good library? Scala or Java works.
Just to be clear I don't need an REPL to evaluate my code (I get that with scala!), nor am I looking to call or use something from the command line. I'm looking for a prompt that is my user experience when my client app starts up.
Scala itself, and lots of programs out there, uses a readline-like library for its REPL. Specifically, JLine.
I found another question about this, for which the answers don't seem promising.
BeanShell does some of what you want: http://www.beanshell.org/
I got it. these two blogs really helped.
http://danielwestheide.com/blog/2013/01/09/the-neophytes-guide-to-scala-part-8-welcome-to-the-future.html
http://danielwestheide.com/blog/2013/01/16/the-neophytes-guide-to-scala-part-9-promises-and-futures-in-practice.html
def interprete(code: String) : Future[String] = {
val p = Promise[String]()
Future {
var result = reader.readLine()
p.success(result)
}
writer.write(code + "\n")
writer.flush()
p.future
}
for (ln <- io.Source.stdin.getLines){
val f = interprete(ln)
f.onComplete {
case Success(s) =>
println("future returned: " + s)
case Failure(ex) =>
println(s"interpreter failed due to ${ex.getMessage}")
}
}