Scala Class Loader for Avro Tools Run Method Written in Java

I'm having some difficulty working out how to load an "Avro Tools" class and call its run method. The issue sits somewhere between Java/Scala interfacing and class-loading mechanics. Because a different version of Avro is used elsewhere in a Spark app for loading data files, I need to treat this particular call as a siloed invocation of another version of avro-tools.
The following is my code:
package samples
import java.io.{ByteArrayOutputStream, InputStream, PrintStream}
import org.junit.runner.RunWith
import org.specs2.mutable._
import org.specs2.runner._
import scala.collection.JavaConverters._
@RunWith(classOf[JUnitRunner])
class MySpecTest extends Specification {
  "Class Loader" should {
    "load and implement a class" in {
      var classLoader = new java.net.URLClassLoader(
        Array(new java.io.File("./avro-tools-1.9.1.jar").toURI.toURL),
        this.getClass.getClassLoader)
      var clazzDFRT = classLoader.loadClass("org.apache.avro.tool.DataFileRepairTool")
      val objDFRT = clazzDFRT.getConstructor().newInstance()
      val toolCmdArgsAsJava = List("-o", "all", "questionable.avro", "fixed.avro").asJava
      val stdin: InputStream = null
      val out: ByteArrayOutputStream = new ByteArrayOutputStream
      val stdout = new PrintStream(out) // added stdout in edit#1
      val err = System.err
      val toolClassArgsAsJava = List(stdin, stdout, // changed out to stdout in edit#1
        err, toolCmdArgsAsJava).asJava
      // parameterTypes: Class[_] *
      // public int run(InputStream stdin, PrintStream out, PrintStream err, List<String> args)
      val paramClasses: Array[Class[_]] = Array(classOf[InputStream], classOf[PrintStream], classOf[PrintStream], classOf[java.util.List[_]])
      val method = clazzDFRT.getMethod("run", paramClasses: _*)
      // the following produces a "wrong number of arguments" exception
      method.invoke(objDFRT.asInstanceOf[Object], toolClassArgsAsJava)
      // sidebar: is this the end result for the unit test - want out str with summary
      out.toString("UTF-8").contains("File Summary")
    }
  }
}
I seem to have some issue in the invoke part, but maybe the whole approach is a little off. I need to be able to invoke the method as well as load, instantiate or ...
How can I fix this so the entire code segment runs (and repairs a broken Avro file)?

It is hard to tell the exact nature of the problem since you didn't include the exception or stack trace. I am not sure why you are loading the avro tools dynamically instead of including the jar statically as part of your build.
You are not specifying the parameters of getMethod correctly here:
// public int run( InputStream stdin, PrintStream out, PrintStream err, List<String> args)
val method = clazzDFRT.getMethod("run", Class[_] : _*)
getMethod needs the actual Class objects of the parameter types:
val params: Array[Class[_]] = Array(classOf[InputStream], classOf[PrintStream], classOf[PrintStream], classOf[java.util.List[_]])
val method = clazzDFRT.getMethod("run", params : _*)
or
val method = clazzDFRT.getMethod("run", classOf[InputStream], classOf[PrintStream], classOf[PrintStream], classOf[java.util.List[_]])
To fix the invoke, you cannot pass the parameters wrapped in a single list. The invoke method takes variable arguments, so you need to pass them in directly:
method.invoke(objDFRT.asInstanceOf[Object], stdin, stdout, err, toolCmdArgsAsJava)
or
method.invoke(objDFRT.asInstanceOf[Object], Array(stdin, stdout, err, toolCmdArgsAsJava): _*)
Notice the second option uses an Array, not a List.
I suggest you read up on the documentation for using varargs in Java and Scala:
* https://docs.oracle.com/javase/8/docs/technotes/guides/language/varargs.html
* http://daily-scala.blogspot.com/2009/11/varargs.html
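Putting the pieces together, here is a minimal end-to-end sketch (untested; it assumes avro-tools-1.9.1.jar sits in the working directory and that it runs inside the spec body, as in the question):
import java.io.{ByteArrayOutputStream, InputStream, PrintStream}
import scala.collection.JavaConverters._

val classLoader = new java.net.URLClassLoader(
  Array(new java.io.File("./avro-tools-1.9.1.jar").toURI.toURL),
  this.getClass.getClassLoader)
val clazzDFRT = classLoader.loadClass("org.apache.avro.tool.DataFileRepairTool")
val objDFRT = clazzDFRT.getConstructor().newInstance()

val stdin: InputStream = null
val out = new ByteArrayOutputStream
val stdout = new PrintStream(out)
val args = List("-o", "all", "questionable.avro", "fixed.avro").asJava

// look up run(InputStream, PrintStream, PrintStream, List) and pass one argument per parameter
val run = clazzDFRT.getMethod("run",
  classOf[InputStream], classOf[PrintStream], classOf[PrintStream], classOf[java.util.List[_]])
run.invoke(objDFRT.asInstanceOf[Object], stdin, stdout, System.err, args)
// the tool writes its summary to stdout, which is captured in out
out.toString("UTF-8").contains("File Summary")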

Related

The optimal way to lock and write to a file in Scala on Linux

I'm having a hard time finding the correct way to do any advanced file-system operations on Linux using Scala.
The one I really can't figure out is best described by the following pseudo-code:
with fd = open(path, append | create):
    with flock(fd, exclusive_lock):
        fd.write(string)
Basically: open a file in append mode (create it if it doesn't exist), get an exclusive lock on it and write to it (with the implicit unlock and close afterwards).
Is there an easy, clean and efficient way of doing this if I know my program will be run on Linux only? (Preferably without glossing over the exceptions that should be handled.)
Edit:
The answer I got is, as far as I've seen and tested, correct. However, it's quite verbose, so I'm marking it as accepted but leaving this snippet here, which is the one I ended up using (not sure if it's correct, but as far as I can see it does everything I need):
val fc = FileChannel.open(Paths.get(file_path), StandardOpenOption.CREATE, StandardOpenOption.APPEND)
try {
  fc.lock()
  fc.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)))
} finally { fc.close() }
You can use FileChannel.lock and FileLock to get what you wanted:
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.charset.StandardCharsets
import java.nio.file.{Path, Paths, StandardOpenOption}
import scala.util.{Failure, Success, Try}

object ExclusiveFsWrite {
  def main(args: Array[String]): Unit = {
    val path = Paths.get("/tmp/file")
    val buffer = ByteBuffer.wrap("Some text data here".getBytes(StandardCharsets.UTF_8))
    val fc = getExclusiveFileChannel(path)
    try {
      fc.write(buffer)
    }
    finally {
      // channel close will also release the lock
      fc.close()
    }
    ()
  }

  private def getExclusiveFileChannel(path: Path): FileChannel = {
    // append if the file exists, or create a new file if it does not
    val fc = FileChannel.open(path, StandardOpenOption.WRITE, StandardOpenOption.APPEND,
      StandardOpenOption.CREATE)
    if (fc.size > 0) {
      // set position to the end (redundant under APPEND, which always writes at the end)
      fc.position(fc.size)
    }
    // get an exclusive lock
    Try(fc.lock()) match {
      case Success(lock) =>
        println("Is shared lock: " + lock.isShared)
        fc
      case Failure(ex) =>
        Try(fc.close())
        throw ex
    }
  }
}
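If you prefer the shorter form from the question's edit but want the lock/close handling factored out, a loan-pattern helper is one option. A minimal sketch (withExclusiveAppend is a hypothetical name, not a library method; closing the channel releases the lock, as noted above):
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.charset.StandardCharsets
import java.nio.file.{Path, Paths, StandardOpenOption}

def withExclusiveAppend[A](path: Path)(body: FileChannel => A): A = {
  val fc = FileChannel.open(path, StandardOpenOption.WRITE,
    StandardOpenOption.APPEND, StandardOpenOption.CREATE)
  try {
    fc.lock() // exclusive by default; released when the channel closes
    body(fc)
  } finally fc.close()
}

// usage
withExclusiveAppend(Paths.get("/tmp/file")) { fc =>
  fc.write(ByteBuffer.wrap("one line\n".getBytes(StandardCharsets.UTF_8)))
}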

How can I prevent memory leaks in Scala code?

I have the code below that first reads a file and then puts that information in a HashMap (indexCategoryVectors). The HashMap contains a String (key) and a Long (value). The code uses the Long value to access a specific position in another file with RandomAccessFile.
Based on the information read from this last file, and after some manipulation, the code writes new information to another file (filename4). The only variable that accumulates information is the buffer (var buffer = new ArrayBuffer[Map[Int, Double]]()), but after each iteration the buffer is cleared (buffer.clear).
The foreach command should run more than 4 million times, and what I'm seeing is memory accumulating. I tested the code with a million iterations and it used more than 32GB of memory. I don't know the reason for that; maybe it's about garbage collection or something else in the JVM. Does anybody know what I can do to prevent this memory leak?
def main(args: Array[String]): Unit = {
  val indexCategoryVectors = getIndexCategoryVectors("filename1")
  val uriCategories = getMappingURICategories("filename2")
  val raf = new RandomAccessFile("filename3", "r")
  var buffer = new ArrayBuffer[Map[Int, Double]]()
  // through each hashmap key
  uriCategories.foreach(uri => {
    var emptyInterpretation = true
    uri._2.foreach(categoria => {
      val position = indexCategoryVectors.get(categoria)
      // go to position
      raf.seek(position.get)
      var vectorSpace = parserVector(raf.readLine)
      buffer += vectorSpace
      // write the information of buffer in file
      writeInformation("filename4")
      buffer.clear
    })
  })
  println("Success!")
}

Getting print calls in LuaJ

I'm writing a Java program which uses Lua scripts to determine what to output to certain areas of the program. Currently, my code looks like this:
Globals globals = JsePlatform.standardGlobals();
LuaValue chunk = globals.loadfile(dir.getAbsolutePath() + "/" + name);
chunk.call();
String output = chunk.tojstring();
The problem is that calling tojstring() returns the return values from the Lua script. This is fine, but I also need the print calls, as that's what will be displayed on screen. As of now, the print output goes directly to the console, and I cannot figure out a way to retrieve it.
I've tried digging through the documentation but have had little success. I will change from LuaJ if needed.
Expanding on Joseph Boyle's answer (a few years later): you can also point a PrintStream at a ByteArrayOutputStream (no need to go through a file on disk) if that's your poison. I did this in a JUnit test with LuaJ and it works:
@Test
public void testPrintToStringFromLuaj() throws IOException {
    String PRINT_HELLO = "print (\"hello world\")";
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PrintStream printStream = new PrintStream(baos, true, "utf-8");
    Globals globals = JsePlatform.standardGlobals();
    globals.STDOUT = printStream;
    LuaValue load = globals.load(PRINT_HELLO);
    load.call();
    String content = new String(baos.toByteArray(), StandardCharsets.UTF_8);
    printStream.close();
    assertThat(content, is("hello world\n"));
}
I actually was able to solve the problem by changing the STDOUT variable in the globals object to point at a temporary file, and then reading the data back from that file.
Probably not the best solution, but it works perfectly fine.
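For reference, a sketch of that temp-file variant (written in Scala against the same LuaJ API; Globals.STDOUT is the same public PrintStream field used above, and the temp-file prefix is arbitrary):
import java.io.{File, PrintStream}
import java.nio.file.Files
import org.luaj.vm2.lib.jse.JsePlatform

val tmp = File.createTempFile("lua-out", ".txt")
val globals = JsePlatform.standardGlobals()
globals.STDOUT = new PrintStream(tmp) // route print() into the temp file
globals.load("print('hello world')").call()
globals.STDOUT.flush()
val captured = new String(Files.readAllBytes(tmp.toPath), "UTF-8")
// captured == "hello world\n"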

Java 8 Nashorn - capturing engine.eval("print('hello world')") into a String object?

Is there a way to capture the result of print('hello world') inside Nashorn and place it in a String object?
I have tried this:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PrintStream ps = new PrintStream(baos, true);
System.setOut(ps);
String result = (String) engine.eval("print('hello world')");
if (result != null) {
    System.out.println(result);
} else {
    System.out.println(baos.toString());
}
When the engine evaluates this JavaScript it just prints to stdout, so I figured I'd redirect stdout to my own OutputStream and then convert that to a String, but it doesn't work.
Any thoughts?
You are setting the System.out stream after you have created your engine so it’s very likely that the engine’s context has already recorded the value of System.out/System.err at construction time.
Therefore you still see the output of the script on the console. Even worse, you don’t see the output of your own later-on System.out.println anymore as you have redirected System.out to your ByteArrayOutputStream.
So don’t modify System.out; instead, change the output writer in the context of your scripting engine: engine.getContext().setWriter(stringWriter).
Complete code:
StringWriter sw = new StringWriter();
engine.getContext().setWriter(sw);
String result = (String) engine.eval("print('hello world')");
System.out.println("Result returned by eval(): " + result);
System.out.println("Captured output: " + sw);
I suspect the problem is with:
if (result != null)
Evaluating the print(...) statement will still give you the return value of print, which is probably undefined. But my guess is that you still have the right stuff in baos, and if you changed your statement to if (false) it would work as expected.
That said, this may not be the best way to do this, depending on what you're trying to do.
You don't want to cast the eval result; that would be the JS undefined value. After you set a Writer instance on the current ScriptContext, you eval the code that writes via print, and then pick up the accumulated output from the Writer instance (StringWriter's toString gives you that String).
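Condensed, here is a short sketch of the approach described above (in Scala, using only the standard javax.script API that ships with Java 8):
import java.io.StringWriter
import javax.script.ScriptEngineManager

val engine = new ScriptEngineManager().getEngineByName("nashorn")
val sw = new StringWriter()
engine.getContext.setWriter(sw) // capture print() output instead of System.out
engine.eval("print('hello world')") // returns undefined, so ignore the result
println("Captured output: " + sw.toString) // "hello world\n"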

How can I translate this deserialization code from java to scala?

I'm a Scala/Java noob, so sorry if this has a relatively easy solution, but I'm trying to access a model in an external file (an Apache OpenNLP model), and I'm not sure where I'm going wrong. Here's how you'd do it in Java, and here's what I'm trying:
import java.io._
val nlpModelPath = new java.io.File( "." ).getCanonicalPath + "/lib/models/en-sent.bin"
val modelIn: InputStream = new FileInputStream(nlpModelPath)
which works fine, but trying to instantiate an object based off the model in that binary file is where I'm failing:
val sentenceModel = new modelIn.SentenceModel // type SentenceModel is not a member of java.io.InputStream
val sentenceModel = new modelIn("SentenceModel") // not found: type modelIn
I've also tried a DataInputStream:
val file = new File(nlpModelPath)
val dis = new DataInputStream(file)
val sentenceModel = dis.SentenceModel() // value SentenceModel is not a member of java.io.DataInputStream
I'm not sure what I'm missing; maybe some method to convert the stream into some binary object from which I can pull in methods? Thank you for any pointers.
The problem is that you're using the wrong syntax (please don't take it personally, but why don't you read a beginner Java book or even just a tutorial first if you're planning to stick with Java or Scala for some time?).
The code you would write in Java:
SentenceModel model = new SentenceModel(modelIn);
will look similar in Scala:
val model: SentenceModel = new SentenceModel(modelIn)
// or just
val model = new SentenceModel(modelIn)
The problem you got with this syntax is that you forgot to import the definition of SentenceModel, so the compiler simply has no clue what SentenceModel is.
Add
import opennlp.tools.sentdetect.SentenceModel
at the top of your .scala file and this will fix it.
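For completeness, a small Scala sketch of the whole load, reusing nlpModelPath from the question (SentenceDetectorME is the standard OpenNLP class that consumes the model; the sample sentence text is made up):
import java.io.FileInputStream
import opennlp.tools.sentdetect.{SentenceDetectorME, SentenceModel}

val modelIn = new FileInputStream(nlpModelPath)
try {
  val sentenceModel = new SentenceModel(modelIn)
  val detector = new SentenceDetectorME(sentenceModel)
  // sentDetect returns one String per detected sentence
  detector.sentDetect("First sentence. Second sentence.").foreach(println)
} finally modelIn.close()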
