Good morning. I am creating a Spark application with Scala.
I need to run a shared library (libfreebayes.so) in a distributed node environment. libfreebayes.so runs an external program written in C++ called freebayes. However, the following error occurs:
java.lang.UnsatisfiedLinkError: Native Library /usr/lib/libfreebayes.so already loaded in another classloader
The CreateFreebayesInput method must run on a partition-by-partition basis. Is there a problem with loading libfreebayes.so for each partition? This application works properly in Spark local mode. How do I get it to work in yarn-cluster mode? I cannot sleep because of this problem. Help me. :-<
import java.io.{File, FileReader, FileWriter, PrintWriter}
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
object sparkFreebayes {
  def main(args : Array[String]) {
    val conf = new SparkConf().setAppName("sparkFreebayes")
    val sc = new SparkContext(conf)
    val appId = sc.applicationId
    val appName = sc.appName
    val referencePath="/mnt/data/hg38.fa"
    val input = sc.textFile(args(0))
    val executerNum = args(1).toInt
    val outputDir = args(2)
    val inputDir = "/mnt/partitionedSam/"
    val header = input.filter(x => x.startsWith("#"))
    val body = input.filter(x => !x.startsWith("#"))
    val partitioned = body.map{x => (x.split("\t")(2),x)}.repartitionAndSortWithinPartitions(new tmpPartitioner(executerNum)).persist()
    val cHeader = header.collect.mkString("\n")
    val sorted = partitioned.map( x => (x._2) )
    CreateFreebayesInput(sorted)
    def CreateFreebayesInput(sortedRDD : RDD[String]) = {
      sortedRDD.mapPartitionsWithIndex { (idx, iter) =>
        val tmp = iter.toList
        val outputPath = outputDir+"/"+appId+"_Out_"+idx+".vcf"
        val tmp2 = List(cHeader) ++ tmp
        val samString = tmp2.mkString("\n")
        val jni = new FreeBayesJni
        val file = new File(inputDir + "partitioned_" + idx + ".sam")
        val fw = new FileWriter(file)
        fw.write(samString)
        fw.close()
        // Run freebayes only if the partition file was written and is non-empty
        if (file.exists() && file.length() != 0) {
          System.loadLibrary("freebayes")
          val freebayesParameter = Array("-f","/mnt/data/hg38.fa",file.getPath,"-file",outputPath)
          jni.freebayes_native(freebayesParameter.length,freebayesParameter)
          //runFreebayes(file.getPath, referencePath, outputPath )
        }
        // Return the partition's lines (header included) as the result iterator
        tmp2.iterator
      }
    }.collect()
  }
}
The FreeBayesJni class is as follows:
class FreeBayesJni {
  @native def freebayes_native(argc: Int, args: Array[String]): Int
}
My spark-submit command:
spark-submit --class partitioning --master yarn-cluster ScalaMvnProject.jar FullOutput_sorted.sam 7 /mnt/OutVcf
Thank you.
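Not a confirmed fix, but one thing that may be worth trying: the JVM lets a given native library be loaded by only one class loader, so calling System.loadLibrary from code that ends up in more than one class loader inside the same executor JVM can produce exactly this error. A common pattern is to route the load through a single guard that runs it at most once. The sketch below shows only that guard. `LoadOnce` and `LoadOnceDemo` are hypothetical names, and a counter stands in for the real System.loadLibrary("freebayes") call so the example runs without the native library.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper (not from the original post): wraps a loading action so it
// runs at most once for the lifetime of this instance, even under concurrent calls.
final class LoadOnce(load: () => Unit) {
  // lazy val initialization is thread-safe and happens on first access only.
  private lazy val loaded: Boolean = { load(); true }
  def ensureLoaded(): Boolean = loaded
}

object LoadOnceDemo {
  // Counts how many times the loading action actually ran.
  val loadCount = new AtomicInteger(0)

  // In the Spark job this action would be () => System.loadLibrary("freebayes");
  // a counter stands in for it here so the sketch runs without the native library.
  val freebayesLib = new LoadOnce(() => { loadCount.incrementAndGet(); () })

  def main(args: Array[String]): Unit = {
    (1 to 3).foreach(_ => freebayesLib.ensureLoaded()) // e.g. once per partition
    println(loadCount.get())
  }
}
```

The guard only helps if the class that holds it is itself loaded by a single class loader in each executor. One way to arrange that, as an assumption to verify on your cluster, is to put the jar containing the guard and FreeBayesJni on the executor class path with spark.executor.extraClassPath instead of shipping it only inside the application jar.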
I want to convert an Any object into an object of its runtime type. Given the class name (a string) at runtime, how do I convert an Any object to its actual type?
I tried converting the class name into a Class object using Class.forName:
val clazz = Class.forName("my.package.Animal")
val any: Any = Animal(1, "simba")
any.asInstanceOf[clazz] // Compilation Error // Looking for a solution
Try using the compiler toolbox:
package my.package
import scala.tools.reflect.ToolBox
import scala.reflect.runtime.universe._
case class Animal(id: Int, name: String)
object App {
val any: Any = Animal(1, "simba")
val className = "my.package.Animal"
val mirror = runtimeMirror(getClass.getClassLoader)
val tb = mirror.mkToolBox()
tb.eval(tb.parse(
s"""
import my.package.App._
val animal = any.asInstanceOf[$className]
println(animal.id)
println(animal.name)
"""))
}
libraryDependencies += scalaOrganization.value % "scala-reflect" % scalaVersion.value
libraryDependencies += scalaOrganization.value % "scala-compiler" % scalaVersion.value
Or use Scala reflection:
import scala.reflect.runtime.universe._
val mirror = runtimeMirror(getClass.getClassLoader)
val classSymbol = mirror.staticClass(className)
val typ = classSymbol.toType
val idMethodSymbol = typ.decl(TermName("id")).asMethod
val nameMethodSymbol = typ.decl(TermName("name")).asMethod
val instanceMirror = mirror.reflect(any)
val idMethodMirror = instanceMirror.reflectMethod(idMethodSymbol)
val nameMethodMirror = instanceMirror.reflectMethod(nameMethodSymbol)
println(idMethodMirror())
println(nameMethodMirror())
libraryDependencies += scalaOrganization.value % "scala-reflect" % scalaVersion.value
Or use Java reflection:
val clazz = Class.forName(className)
val idMethod = clazz.getMethod("id")
val nameMethod = clazz.getMethod("name")
println(idMethod.invoke(any))
println(nameMethod.invoke(any))
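For context on why any.asInstanceOf[clazz] cannot compile: a type argument has to be known at compile time, while clazz is an ordinary runtime value. What a Class object can do is check the value at runtime with cast, after which members are reached reflectively as in the Java reflection variant above. A minimal self-contained sketch follows. `RuntimeCastSketch` and `readMember` are hypothetical names, and the class name is taken from classOf[Animal].getName only so the example does not depend on a particular package.

```scala
case class Animal(id: Int, name: String)

object RuntimeCastSketch {
  // Looks up the class by name, verifies that `value` is an instance of it,
  // then invokes the zero-argument method `memberName` reflectively.
  def readMember(className: String, memberName: String, value: Any): AnyRef = {
    val clazz: Class[_] = Class.forName(className)
    // Runtime check: throws ClassCastException if value is not an instance of clazz.
    val checked: Any = clazz.cast(value)
    clazz.getMethod(memberName).invoke(checked)
  }

  def main(args: Array[String]): Unit = {
    val any: Any = Animal(1, "simba")
    val className = classOf[Animal].getName
    println(readMember(className, "id", any))
    println(readMember(className, "name", any))
  }
}
```

The static type of the result is still AnyRef. A statically typed Animal can only come from code that names Animal at compile time, which is what the toolbox variant achieves by compiling a snippet at runtime.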
I'm running a DbUnit test, and the row ordering of table3 is different on every run, so Assertion.assertEquals fails.
val expectedDataSet = new CsvDataSet(new File(BatchJobIntegrationTest.getTestResource("folder/expected/")))
val actualDataSet = connection.createDataSet(Array(
"table1",
"table2",
"table3"
))
Assertion.assertEquals(expectedDataSet, actualDataSet)
I tried this, but it didn't work:
Assertion.assertEquals(new SortedDataSet(expectedDataSet), new SortedDataSet(actualDataSet))
It turns out the primary key was the issue. Excluding the key column and sorting both tables by the remaining columns helped:
val expectedTable = expectedDataSet.getTable("table")
val actualTable = actualDataSet.getTable("table")
val actualFilteredTable = DefaultColumnFilter.excludedColumnsTable(actualTable, Array("table_id"))
val expectedFilteredTable = DefaultColumnFilter.excludedColumnsTable(expectedTable, Array("table_id"))
//Assertion.assertEquals(expectedTable, actualFilteredTable)
val expectedColumns = expectedFilteredTable.getTableMetaData().getColumns()
val sortedExpected = new SortedTable(expectedFilteredTable, expectedColumns)
val sortedActual = new SortedTable(actualFilteredTable, expectedColumns)
Assertion.assertEquals(sortedExpected, sortedActual)
Is it possible to fail my request?
I would like to set Status = KO in the asLongAs() section. If I get WorkflowFailed = true or Count > 8, I want to fail that request with Status = KO.
I have seen session.markAsFailed mentioned somewhere, but how and where do I use it?
Thanks.
Here is the code:
class LaunchResources extends Simulation {
  val scenarioRepeatCount = Integer.getInteger("scenarioRepeatCount", 1).toInt
  val userCount = Integer.getInteger("userCount", 1).toInt
  val UUID = System.getProperty("UUID", "24d0e03")
  val username = System.getProperty("username", "p1")
  val password = System.getProperty("password", "P12")
  val testServerUrl = System.getProperty("testServerUrl", "https://someurl.net")
  val count = new java.util.concurrent.atomic.AtomicInteger(0)
  val httpProtocol = http
    .baseURL(testServerUrl)
    .basicAuth(username, password)
    .connection("""keep-alive""")
    .contentTypeHeader("""application/vnd+json""")
  val headers_0 = Map(
    """Cache-Control""" -> """no-cache""",
    """Origin""" -> """chrome-extension://fdmmgasdw1dojojpjoooidkmcomcm""")
  val scn = scenario("LaunchAction")
    .repeat (scenarioRepeatCount) {
      exec(http("LaunchAResources")
        .post( """/api/actions""")
        .headers(headers_0)
        .body(StringBody(s"""{"UUID": "$UUID", "stringVariables" : {"externalFilePath" : "/Test.mp4"}}"""))
        .check(jsonPath("$.id").saveAs("WorkflowID")))
      .exec(http("SaveWorkflowStatus")
        .get("""/api/actions/${WorkflowID}""")
        .headers(headers_0)
        .check(jsonPath("$.status").saveAs("WorkflowStatus")))
    }
    .asLongAs(session => session.attributes("WorkflowStatus") != "false" && count.getAndIncrement() < 8) {
      doIf(session => session("WorkflowFailed").validate[String].map(WorkflowFailed => !WorkflowFailed.contains("true")).recover(true))
      {
        pause(pauseTime)
        .exec(http("SaveWorkflowStatus")
          .get("""/api/actions/${WorkflowID}""")
          .headers(headers_0)
          .check(jsonPath("$.running").saveAs("WorkflowStatus"))
          .check(jsonPath("$.failed").saveAs("WorkflowFailed")))
        .exec(session => {
          val wflowStatus1 = session.get("WorkflowStatus").asOption[String]
          val wflowFailed1 = session.get("WorkflowFailed").asOption[String]
          println("Inner Loop Workflow Status: ========>>>>>>>> " + wflowStatus1.getOrElse("COULD NOT FIND STATUS"))
          println("Inner Loop Workflow Failed?? ========>>>>>>>> " + wflowFailed1.getOrElse("COULD NOT FIND STATUS"))
          println("Count =====>> " + count)
          session})
      }
    }
  setUp(scn.inject(atOnceUsers(userCount))).protocols(httpProtocol)
}
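There's a method available on the session for this. The session is immutable, so markAsFailed returns a new session, and that new session is what the exec block has to return: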
exec(session => session.markAsFailed)
I have a Seq[Byte] in Scala. How do I convert it to a Java byte[] or an InputStream?
Wouldn't this do the job?
val a: Seq[Byte] = List()
a.toArray
You can copy the contents of a Seq with copyToArray.
val myseq: Seq[Byte] = ???
val myarray = new Array[Byte](myseq.size)
myseq.copyToArray(myarray)
Note that this will iterate through the Seq twice, which may be undesirable, impossible, or just fine, depending on your use.
A sensible option:
val byteSeq: Seq[Byte] = ???
val byteArray: Array[Byte] = byteSeq.toArray
val inputStream = new java.io.ByteArrayInputStream(byteArray)
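To see the sensible option end to end, here is a small self-contained sketch that converts a Seq[Byte] and reads the stream back. `SeqToStreamSketch`, `toInputStream` and `readAll` are hypothetical names used only for illustration. Note that InputStream.read() reports each byte as an unsigned value in the range 0 to 255, so a byte of -1 comes back as 255.

```scala
import java.io.{ByteArrayInputStream, InputStream}

object SeqToStreamSketch {
  // Copies the sequence into an array once and wraps it in a stream.
  def toInputStream(bytes: Seq[Byte]): ByteArrayInputStream =
    new ByteArrayInputStream(bytes.toArray)

  // Drains the stream one byte at a time until read() signals end of stream (-1).
  def readAll(in: InputStream): List[Int] =
    Iterator.continually(in.read()).takeWhile(_ != -1).toList

  def main(args: Array[String]): Unit = {
    val byteSeq: Seq[Byte] = Seq(1, 2, -1).map(_.toByte)
    println(readAll(toInputStream(byteSeq)))
  }
}
```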
A less sensible option:
object HelloWorld {
  implicit class ByteSequenceInputStream(val byteSeq: Seq[Byte]) extends java.io.InputStream {
    private var pos = 0
    val size = byteSeq.size
    override def read(): Int = pos match {
      case `size` => -1 // backticks match against the value in the variable
      case _ => {
        // read() must return an unsigned value (0-255), so mask off the sign bits
        val result = byteSeq(pos) & 0xff
        pos = pos + 1
        result
      }
    }
  }
  val testByteSeq: Seq[Byte] = List(1, 2, 3, 4, 5).map(_.toByte)
  def testConversion(in: java.io.InputStream): Unit = {
    var done = false
    while (! done) {
      val result = in.read()
      println(result)
      done = result == -1
    }
  }
  def main(args: Array[String]): Unit = {
    testConversion(testByteSeq)
  }
}
I'm trying to iterate over a java.util.Iterator with Scala but am having trouble casting the objects to the correct class.
I get the error:
type mismatch; found: java.util.Iterator[?0] where type ?0
required: java.util.Iterator[net.percederberg.mibble.MibSymbol]
val iter:util.Iterator[MibSymbol] = mib_obj.getAllSymbols.iterator()
The code looks like this:
import java.io.File
import java.util
import net.percederberg.mibble._
import scala.collection.immutable.HashMap
import scala.collection.JavaConversions._
object Bacon {
  def main(args:Array[String]) {
    println("hello")
    val mib_obj:Mib = loadMib(new File("/Users/tjones24/dev/mibs/DOCS-IF-MIB.my"))
    val iter:util.Iterator[MibSymbol] = mib_obj.getAllSymbols.iterator()
    while(iter.hasNext()) {
      var obj:MibSymbol = iter.next()
      println(obj.getName())
    }
  }
  def loadMib(file: File): Mib = {
    var loader: MibLoader = new MibLoader()
    loader.addDir(file.getParentFile())
    return loader.load(file)
  }
}
Use an explicit typecast asInstanceOf[util.Iterator[MibSymbol]]:
def main(args: Array[String]) {
  println("hello")
  val mib_obj: Mib = loadMib(new File("/Users/tjones24/dev/mibs/DOCS-IF-MIB.my"))
  val x = mib_obj.getAllSymbols.iterator()
  // Cast to java.util.Iterator; a bare Iterator here would mean scala.Iterator
  val iter: util.Iterator[MibSymbol] = x.asInstanceOf[util.Iterator[MibSymbol]]
  while (iter.hasNext()) {
    var obj: MibSymbol = iter.next()
    println(obj.getName())
  }
}
def loadMib(file: File): Mib = {
  var loader: MibLoader = new MibLoader()
  loader.addDir(file.getParentFile())
  return loader.load(file)
}
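NOTE: Because of type erasure, the cast itself is not checked at runtime. If an element is not actually a MibSymbol, the ClassCastException is thrown only when that element is read.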
EDIT1: You can also use a for comprehension:
val mib_obj: Mib = loadMib(new File("/Users/tjones24/dev/mibs/DOCS-IF-MIB.my"))
for ( obj <- mib_obj.getAllSymbols) {
println(obj.asInstanceOf[MibSymbol].getName())
}
import scala.collection.JavaConversions._ does all the magic for you. You only need to ensure that the types are correct.
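If you would rather not rely on an unchecked cast, pattern matching on each element checks the type at runtime and skips anything unexpected. The sketch below is self-contained, so it uses a plain java.util.Collection[_] of strings as a stand-in for what getAllSymbols returns. `WildcardIterationSketch`, `untypedSymbols` and `names` are hypothetical names. It also uses the explicit asScala from scala.collection.JavaConverters rather than the implicit JavaConversions, which was deprecated in Scala 2.12 and removed in 2.13. On 2.13 and later, scala.jdk.CollectionConverters is the replacement.

```scala
import java.util.{ArrayList => JArrayList, Collection => JCollection}
import scala.collection.JavaConverters._

object WildcardIterationSketch {
  // Stand-in for a Java API that returns an untyped collection, like getAllSymbols.
  def untypedSymbols(): JCollection[_] = {
    val list = new JArrayList[AnyRef]()
    list.add("ifIndex")
    list.add("ifDescr")
    list.add(Integer.valueOf(42)) // not a String, so it is skipped below
    list
  }

  // Keeps only the elements that really are Strings; each element is checked
  // at runtime by the pattern match instead of being cast blindly.
  def names(symbols: JCollection[_]): List[String] =
    symbols.asScala.collect { case s: String => s }.toList

  def main(args: Array[String]): Unit =
    names(untypedSymbols()).foreach(println)
}
```

With Mibble the pattern would be `case s: MibSymbol => s.getName()`, assuming getAllSymbols returns an untyped collection as the error message suggests.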