Delete directory recursively in Scala

I am writing the following (with Scala 2.10 and Java 6):
import java.io._
def delete(file: File) {
  if (file.isDirectory)
    Option(file.listFiles).map(_.toList).getOrElse(Nil).foreach(delete(_))
  file.delete
}
How would you improve it? The code seems to work, but it ignores the return value of java.io.File.delete. Can it be done more easily with scala.io instead of java.io?

Using the pure Scala + Java way:
import scala.reflect.io.Directory
import java.io.File

val directory = new Directory(new File("/sampleDirectory"))
directory.deleteRecursively()
deleteRecursively() returns false on failure.

Try this code that throws an exception if it fails:
def deleteRecursively(file: File): Unit = {
  if (file.isDirectory) {
    file.listFiles.foreach(deleteRecursively)
  }
  if (file.exists && !file.delete) {
    throw new Exception(s"Unable to delete ${file.getAbsolutePath}")
  }
}
You could also fold or map over the deletes if you want to return a value for each one.
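As an illustration (this is only a sketch, not part of the original answer; the helper name is made up), you could collect the outcome of every delete like this:
import java.io.File

// deleteWithResults: same recursive walk, but returns one (path, deleted) pair per entry visited
def deleteWithResults(file: File): List[(String, Boolean)] = {
  val children =
    if (file.isDirectory) Option(file.listFiles).map(_.toList).getOrElse(Nil)
    else Nil
  children.flatMap(deleteWithResults) :+ (file.getPath -> file.delete())
}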

Using the scala-io library:
import scalax.file.Path
import java.io.IOException

val path = Path.fromString("/tmp/testfile")
try {
  path.deleteRecursively(continueOnFailure = false)
} catch {
  case e: IOException => // some file could not be deleted
}
or better, you could use a Try:
import scala.util.Try

val path: Path = Path("/tmp/file")
Try(path.deleteRecursively(continueOnFailure = false))
which will either result in a Success[Int] containing the number of files deleted, or a Failure[IOException].
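As a small sketch of handling that result (using the path value from above, and relying only on the behaviour the answer describes):
import scala.util.{Failure, Success, Try}

Try(path.deleteRecursively(continueOnFailure = false)) match {
  case Success(count) => println(s"Deleted: $count")
  case Failure(e)     => println(s"Could not delete everything: ${e.getMessage}")
}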

From http://alvinalexander.com/blog/post/java/java-io-faq-how-delete-directory-tree
Using Apache Commons IO:
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.WildcardFileFilter;

public void deleteDirectory(String directoryName) throws IOException {
  try {
    FileUtils.deleteDirectory(new File(directoryName));
  } catch (IOException ioe) {
    // log the exception here
    ioe.printStackTrace();
    throw ioe;
  }
}
The Scala version can just do this:
import java.io.File
import org.apache.commons.io.FileUtils

FileUtils.deleteDirectory(new File(outputFile))
You will also need the commons-io dependency from the Maven repository.
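If you build with sbt, the dependency can be declared roughly like this (the version number here is only an example; use whatever is current):
// build.sbt
libraryDependencies += "commons-io" % "commons-io" % "2.11.0"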

Using the Java NIO.2 API:
import java.io.IOException
import java.nio.file.{Files, Paths, Path, SimpleFileVisitor, FileVisitResult}
import java.nio.file.attribute.BasicFileAttributes

def remove(root: Path): Unit = {
  Files.walkFileTree(root, new SimpleFileVisitor[Path] {
    override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
      Files.delete(file)
      FileVisitResult.CONTINUE
    }
    override def postVisitDirectory(dir: Path, exc: IOException): FileVisitResult = {
      Files.delete(dir)
      FileVisitResult.CONTINUE
    }
  })
}

remove(Paths.get("/tmp/testdir"))
It's a pity that the NIO.2 API has been with us for so many years, yet few people use it, even though it is far superior to the old File API.

Expanding on Vladimir Matveev's NIO2 solution:
object Util {
  import java.io.IOException
  import java.nio.file.{Files, Paths, Path, SimpleFileVisitor, FileVisitResult}
  import java.nio.file.attribute.BasicFileAttributes

  def remove(root: Path, deleteRoot: Boolean = true): Unit =
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      override def visitFile(file: Path, attributes: BasicFileAttributes): FileVisitResult = {
        Files.delete(file)
        FileVisitResult.CONTINUE
      }
      override def postVisitDirectory(dir: Path, exception: IOException): FileVisitResult = {
        // always delete subdirectories; keep the root itself when deleteRoot is false
        if (deleteRoot || dir != root) Files.delete(dir)
        FileVisitResult.CONTINUE
      }
    })

  def removeUnder(string: String): Unit = remove(Paths.get(string), deleteRoot = false)
  def removeAll(string: String): Unit = remove(Paths.get(string))
  def removeUnder(file: java.io.File): Unit = remove(file.toPath, deleteRoot = false)
  def removeAll(file: java.io.File): Unit = remove(file.toPath)
}
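A quick usage sketch (the path is made up):
Util.removeAll("/tmp/testdir")    // deletes the directory and everything inside it
Util.removeUnder("/tmp/testdir")  // deletes only the contents, keeps /tmp/testdir itself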

Using Java 6 without any dependencies, this is pretty much the only way to do it.
The problem with your function is that it returns Unit (which, by the way, I would make explicit: def delete(file: File): Unit = { ... }).
I took your code and modified it to return each file path paired with its deletion status:
def delete(file: File): Array[(String, Boolean)] = {
  Option(file.listFiles).map(_.flatMap(f => delete(f))).getOrElse(Array()) :+ (file.getPath -> file.delete)
}
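You could then, for example, report the paths that could not be deleted (a sketch; the directory is a placeholder):
val results = delete(new java.io.File("/tmp/testdir"))
val failed  = results.collect { case (path, false) => path }
if (failed.nonEmpty) println(s"Could not delete: ${failed.mkString(", ")}")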

To add to Slavik Muz's answer:
def deleteFile(file: File): Boolean = {
  def childrenOf(file: File): List[File] =
    Option(file.listFiles()).getOrElse(Array.empty).toList

  @annotation.tailrec
  def loop(files: List[File]): Boolean = files match {
    case Nil ⇒ true
    case child :: parents if child.isDirectory && child.listFiles().nonEmpty ⇒
      loop((childrenOf(child) :+ child) ++ parents)
    case fileOrEmptyDir :: rest ⇒
      println(s"deleting $fileOrEmptyDir")
      fileOrEmptyDir.delete()
      loop(rest)
  }

  if (!file.exists()) false
  else loop(childrenOf(file) :+ file)
}

This one uses java.io for listing, together with Apache Commons IO's FileUtils for the deletion, and deletes directories whose names match a regex string, whether or not they contain anything:
import java.io.File
import org.apache.commons.io.FileUtils

for (file <- new File("<path as String>").listFiles
     if file.getName.matches("[1-9]*"))
  FileUtils.deleteDirectory(file)
The directory structure is e.g. A/1/, A/2/, A/300/ ..., which is why the regex string is [1-9]*. I couldn't find a File API in Scala that supports regex filtering (maybe I missed something).
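For what it's worth, plain java.io listing can be combined with a regex filter directly; a sketch (the parent path is a placeholder):
import java.io.File

val matchingDirs = Option(new File("/some/parent").listFiles)
  .getOrElse(Array.empty[File])
  .filter(f => f.isDirectory && f.getName.matches("[1-9]*"))
// matchingDirs can then be handed to whichever delete routine you prefer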

Getting a little lengthy, but here's one that combines the recursion of Garrette's solution with the NPE-safety of the original question:
def deleteFile(path: String) = {
  val penultimateFile = new File(path.split('/').take(2).mkString("/"))
  def getFiles(f: File): Set[File] =
    Option(f.listFiles)
      .map(a => a.toSet)
      .getOrElse(Set.empty)
  def getRecursively(f: File): Set[File] = {
    val files = getFiles(f)
    val subDirectories = files.filter(path => path.isDirectory)
    subDirectories.flatMap(getRecursively) ++ files + penultimateFile
  }
  getRecursively(penultimateFile).foreach { file =>
    if (getFiles(file).isEmpty && file.getAbsoluteFile().exists) file.delete
  }
}

This is a recursive method that cleans everything in a directory and returns the count of deleted files:
import java.io.File
import scala.annotation.tailrec

def cleanDir(dir: File): Int = {
  @tailrec
  def loop(list: Array[File], deletedFiles: Int): Int = {
    if (list.isEmpty) deletedFiles
    else {
      if (list.head.isDirectory && !list.head.listFiles().isEmpty) {
        loop(list.head.listFiles() ++ list.tail ++ Array(list.head), deletedFiles)
      } else {
        val isDeleted = list.head.delete()
        if (isDeleted) loop(list.tail, deletedFiles + 1)
        else loop(list.tail, deletedFiles)
      }
    }
  }
  loop(dir.listFiles(), 0)
}
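Usage might look like this (the directory is a placeholder):
val deleted = cleanDir(new File("/tmp/cache"))
println(s"Removed $deleted files")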

What I ended up with:
def deleteRecursively(f: File): Boolean = {
  if (f.isDirectory) f.listFiles match {
    case files: Array[File] => files.foreach(deleteRecursively)
    case null =>
  }
  f.delete()
}

os-lib makes it easy to delete recursively with a one-liner:
os.remove.all(os.pwd/"dogs")
os-lib uses java.nio under the hood; it just doesn't expose all the Java ugliness. See the os-lib documentation for more info on how to use the library.
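If you build with sbt, the coordinates are roughly as follows (the version is just an example; check the project page for the latest one):
// build.sbt
libraryDependencies += "com.lihaoyi" %% "os-lib" % "0.8.1"
os.remove.all also accepts an absolute os.Path, e.g. os.remove.all(os.Path("/tmp/testdir")).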

You can also do this by executing an external system command:
import sys.process._

def delete(path: String) = {
  s"""rm -rf ${path}""".!!
}
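Note that !! throws an exception when the command exits with a non-zero status; if you would rather inspect the exit code yourself, here is a sketch (still Unix-only, of course):
import sys.process._

def delete(path: String): Boolean = {
  // The Seq form avoids shell interpretation of spaces or special characters in the path
  Seq("rm", "-rf", path).! == 0
}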

Related

ExifTool: Removing all metadata of an image except one in java application

I want to remove all the metadata of an image except "Copyright". I am using exiftool for this. The command is "exiftool -all= -tagsFromFile @ -copyright Tunis_Bab_Souika_1899.jpg", but when I do this from my Java application it somehow deletes all the tags.
Here is the code snippet:
val outputConsumer = ArrayListOutputConsumer()
val exiftoolCmd = ExiftoolCmd()
exiftoolCmd.setOutputConsumer(outputConsumer)

val operation = ETOperation()
println("File name" + sourceImage.toFile().absolutePath)
operation.addImage(sourceImage.toFile().absolutePath)

// exiftool -all= -tagsFromFile @ -copyright Tunis_Bab_Souika_1899.jpg
operation.addRawArgs("-all=")
operation.addRawArgs("-tagsFromFile @")
operation.addRawArgs("-copyright")

println("About to execute")
try {
    exiftoolCmd.run(operation)
    println("Inside try")
} catch (e: java.lang.Exception) {
    throw RuntimeException(e)
}

val output = outputConsumer.output
    .stream()
    .map { obj: String -> obj.trim { it <= ' ' } }
    .collect(toList())
println("output$output")
Answering my own question: this worked for me:
operation.delTags("all")
operation.tagsFromFile("@")
operation.getTags("copyright")
operation.addImage(sourceImage.toFile().absolutePath)

Spark-Scala Unable to infer schema (Defer input path validation into DataSource)

SPARK-26039: this happens while loading an empty ORC folder. Is there any way to bypass it?
val df = spark.read.format("orc").load(orcFolderPath)
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided
I am getting this error, presumably because the ORC reader is trying to infer the schema. I want to bypass this special case: a blank folder sometimes turns up in the repository, but it still has to be checked.
try {
  spark.read.format("orc").load(path)
} catch {
  case ex: org.apache.spark.sql.AnalysisException => {
    null
  }
}
I tried catching the exception this way. Any other approach would be helpful.
I found one more solution; it also doesn't seem like the best one:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def pathStatus(path: String): Boolean = {
  val config: Configuration = new Configuration()
  val fs: FileSystem = FileSystem.get(config)
  if (fs.globStatus(new Path(path)) == null) {
    false
  } else {
    true
  }
}
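Putting the two together, a sketch (readOrcIfPresent is a made-up name; for the emptiness check to be meaningful, the string passed to pathStatus would need to be a glob that matches the data files, e.g. ending in /*):
import org.apache.spark.sql.DataFrame

def readOrcIfPresent(path: String): Option[DataFrame] =
  if (pathStatus(path)) Some(spark.read.format("orc").load(path))
  else None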

Scala matchers work from jar, not from sbt

I have a code blurb that does some reflection on Scala class files and looks for annotations, something like:
// Create a new class loader with the directory
val cl = new URLClassLoader(allpaths.toArray)
getTypeNamesFromPath(sourceDir).map(s => {
  val cls = cl.loadClass(s)
  cls.getAnnotations.map(an =>
    an match {
      case q: javax.ws.rs.Path => println("found annotation")
      case _ =>
    })
})
This code prints "found annotation" when assembled into a jar and run with java -jar, but doesn't print anything when run via sbt run.
The sbt version is 0.13.8, the Scala version 2.11.7.
For completeness:
private def getTypeNamesFromPath(file: File, currentPath: mutable.Stack[String] = new mutable.Stack[String]()): List[String] = {
  if (file.isDirectory) {
    var list = List[String]()
    for (f <- file.listFiles()) {
      currentPath.push(f.getName)
      list = list ++ getTypeNamesFromPath(f, currentPath)
    }
    return list
  }
  if (currentPath.isEmpty)
    throw new IllegalArgumentException(file.getAbsolutePath)
  currentPath.pop()
  if (file.getName.endsWith(".class")) {
    return List(currentPath.foldRight("")((s, b) => if (b.isEmpty) s else b + "." + s) + "." + file.getName.stripSuffix(".class"))
  }
  return List()
}
So it turns out I need two things:
Use URLClassLoader from scala.tools.nsc.util.ScalaClassLoader.URLClassLoader; this takes care of loading the right classes.
Set fork := true in build.sbt, because once you do the first step you will run into class-name clashes, since sbt runs with its own classpath and its own class loader; more on that here: http://www.scala-sbt.org/0.13.0/docs/Detailed-Topics/Forking.html
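For reference, the second point is a one-liner in build.sbt:
// build.sbt
fork := true  // run the application in a separate JVM rather than inside sbt's own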

JSON4s can't find constructor w/spark

I've run into an issue attempting to parse JSON in my Spark job. I'm using Spark 1.1.0, json4s, and the Cassandra Spark Connector, with DSE 4.6. The exception thrown is:
org.json4s.package$MappingException: Can't find constructor for BrowserData
  org.json4s.reflect.ScalaSigReader$.readConstructor(ScalaSigReader.scala:27)
  org.json4s.reflect.Reflector$ClassDescriptorBuilder.ctorParamType(Reflector.scala:108)
  org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:98)
  org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:95)
  scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
My code looks like this:
case class BrowserData(navigatorObjectData: Option[NavigatorObjectData],
                       flash_version: Option[FlashVersion],
                       viewport: Option[Viewport],
                       performanceData: Option[PerformanceData])

.... other case classes

def parseJson(b: Option[String]): Option[String] = {
  implicit val formats = DefaultFormats
  for {
    browserDataStr <- b
    browserData = parse(browserDataStr).extract[BrowserData]
    navObject <- browserData.navigatorObjectData
    userAgent <- navObject.userAgent
  } yield (userAgent)
}

def getJavascriptUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  implicit val formats = DefaultFormats
  rows.collectFirst { case r if r.getStringOption("browser_data").isDefined =>
    parseJson(r.getStringOption("browser_data"))
  }.flatten
}

def getRequestUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  rows.collectFirst { case r if r.getStringOption("ua").isDefined =>
    r.getStringOption("ua")
  }.flatten
}

def checkUa(rows: Iterable[com.datastax.spark.connector.CassandraRow], sessionId: String): Option[Boolean] = {
  for {
    jsUa <- getJavascriptUa(rows)
    reqUa <- getRequestUa(rows)
  } yield (jsUa == reqUa)
}

def run(name: String) = {
  val rdd = sc.cassandraTable("beehive", name).groupBy(r => r.getString("session_id"))
  val counts = rdd.map(r => (checkUa(r._2, r._1)))
  counts
}
I use :load to load the file into the REPL and then call the run function. The failure happens in the parseJson function, as far as I can tell. I've tried a variety of things to get this to work. From similar posts, I've made sure my case classes are at the top level of the file. I've tried compiling just the case class definitions into a jar and including the jar like this: /usr/bin/dse spark --jars case_classes.jar
I've tried adding them to the conf like this: sc.getConf.setJars(Seq("/home/ubuntu/case_classes.jar"))
And still the same error. Should I compile all of my code into a jar? Is this a Spark issue or a json4s issue? Any help at all is appreciated.

How to link classes from JDK into scaladoc-generated doc?

I'm trying to link classes from the JDK into the scaladoc-generated doc.
I've used the -doc-external-doc option of scaladoc 2.10.1 but without success.
I'm using -doc-external-doc:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar#http://docs.oracle.com/javase/7/docs/api/, but I get links such as index.html#java.io.File instead of index.html?java/io/File.html.
It seems this option only works for scaladoc-generated doc.
Did I miss a scaladoc option, or should I file a feature request?
I've configured sbt as follows:
scalacOptions in (Compile,doc) += "-doc-external-doc:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar#http://docs.oracle.com/javase/7/docs/api"
Note: I've seen the Opts.doc.externalAPI util in the upcoming sbt 0.13. I think a nice addition (not sure if it's possible) would be to pass a ModuleID instead of a File. The util would figure out which file corresponds to the ModuleID.
I use sbt 0.13.5.
There's no out-of-the-box way to get Javadoc links inside scaladoc. As far as I understand, it's not sbt's fault but the way scaladoc works. As Josh pointed out in his comment, you should report it to scaladoc.
There is, however, a workaround I came up with: postprocess the generated scaladoc so the Java URLs get replaced to form proper Javadoc links.
The file scaladoc.sbt should be placed inside an sbt project; whenever the doc task gets executed, the postprocessing via the fixJavaLinksTask task kicks in.
NOTE: There are lots of hardcoded paths, so use it with caution (i.e., do the polishing however you see fit).
import scala.util.matching.Regex.Match

autoAPIMappings := true

// builds -doc-external-doc
apiMappings += (
  file("/Library/Java/JavaVirtualMachines/jdk1.8.0_11.jdk/Contents/Home/jre/lib/rt.jar") ->
    url("http://docs.oracle.com/javase/8/docs/api")
)

lazy val fixJavaLinksTask = taskKey[Unit](
  "Fix Java links - replace #java.io.File with ?java/io/File.html"
)

fixJavaLinksTask := {
  println("Fixing Java links")
  val t = (target in (Compile, doc)).value
  (t ** "*.html").get.filter(hasJavadocApiLink).foreach { f =>
    println("fixing " + f)
    val newContent = javadocApiLink.replaceAllIn(IO.read(f), fixJavaLinks)
    IO.write(f, newContent)
  }
}

val fixJavaLinks: Match => String = m =>
  m.group(1) + "?" + m.group(2).replace(".", "/") + ".html"

val javadocApiLink = """\"(http://docs\.oracle\.com/javase/8/docs/api/index\.html)#([^"]*)\"""".r

def hasJavadocApiLink(f: File): Boolean = (javadocApiLink findFirstIn IO.read(f)).nonEmpty

fixJavaLinksTask <<= fixJavaLinksTask triggeredBy (doc in Compile)
I took the answer by @jacek-laskowski and modified it so that it avoids hard-coded strings and can be used for any number of Java libraries, not just the standard one.
Edit: the location of rt.jar is now determined at runtime from sun.boot.class.path and does not have to be hard coded.
The only thing you need to modify is the map, which I have called externalJavadocMap in the following:
import scala.util.matching.Regex
import scala.util.matching.Regex.Match

val externalJavadocMap = Map(
  "owlapi" -> "http://owlcs.github.io/owlapi/apidocs_4_0_2/index.html"
)

/*
 * The rt.jar file is located in the path stored in the sun.boot.class.path system property.
 * See the Oracle documentation at http://docs.oracle.com/javase/6/docs/technotes/tools/findingclasses.html.
 */
val rtJar: String = System.getProperty("sun.boot.class.path").split(java.io.File.pathSeparator).collectFirst {
  case str: String if str.endsWith(java.io.File.separator + "rt.jar") => str
}.get // fail hard if not found

val javaApiUrl: String = "http://docs.oracle.com/javase/8/docs/api/index.html"

val allExternalJavadocLinks: Seq[String] = javaApiUrl +: externalJavadocMap.values.toSeq

def javadocLinkRegex(javadocURL: String): Regex = ("""\"(\Q""" + javadocURL + """\E)#([^"]*)\"""").r

def hasJavadocLink(f: File): Boolean = allExternalJavadocLinks exists {
  javadocURL: String =>
    (javadocLinkRegex(javadocURL) findFirstIn IO.read(f)).nonEmpty
}

val fixJavaLinks: Match => String = m =>
  m.group(1) + "?" + m.group(2).replace(".", "/") + ".html"

/* You can print the classpath with `show compile:fullClasspath` in the SBT REPL.
 * From that list you can find the name of the jar for the managed dependency.
 */
lazy val documentationSettings = Seq(
  apiMappings ++= {
    // Lookup the path to jar from the classpath
    val classpath = (fullClasspath in Compile).value
    def findJar(nameBeginsWith: String): File = {
      classpath.find { attributed: Attributed[File] => (attributed.data ** s"$nameBeginsWith*.jar").get.nonEmpty }.get.data // fail hard if not found
    }
    // Define external documentation paths
    (externalJavadocMap map {
      case (name, javadocURL) => findJar(name) -> url(javadocURL)
    }) + (file(rtJar) -> url(javaApiUrl))
  },
  // Override the doc task to fix the links to Javadoc
  doc in Compile <<= (doc in Compile) map { target: File =>
    (target ** "*.html").get.filter(hasJavadocLink).foreach { f =>
      //println(s"Fixing $f.")
      val newContent: String = allExternalJavadocLinks.foldLeft(IO.read(f)) {
        case (oldContent: String, javadocURL: String) =>
          javadocLinkRegex(javadocURL).replaceAllIn(oldContent, fixJavaLinks)
      }
      IO.write(f, newContent)
    }
    target
  }
)
I am using SBT 0.13.8.
