Scala macros and the JVM's method size limit

Scala macros and the JVM's method size limit - java

I'm replacing some code generation components in a Java program with Scala macros, and am running into the Java Virtual Machine's limit on the size of the generated byte code for individual methods (64 kilobytes).
For example, suppose we have a large-ish XML file that represents a mapping from integers to integers that we want to use in our program. We want to avoid parsing this file at run time, so we'll write a macro that will do the parsing at compile time and use the contents of the file to create the body of our method:
import scala.language.experimental.macros
import scala.reflect.macros.Context
object BigMethod {
// For this simplified example we'll just make some data up.
val mapping = List.tabulate(7000)(i => (i, i + 1))
def lookup(i: Int): Int = macro lookup_impl
def lookup_impl(c: Context)(i: c.Expr[Int]): c.Expr[Int] = {
import c.universe._
val switch = reify(new scala.annotation.switch).tree
val cases = mapping map {
case (k, v) => CaseDef(c.literal(k).tree, EmptyTree, c.literal(v).tree)
}
c.Expr(Match(Annotated(switch, i.tree), cases))
}
}
In this case the compiled method would be just over the size limit, but instead of a nice error saying that, we're given a giant stack trace with a lot of calls to TreePrinter.printSeq and are told that we've slain the compiler.
I have a solution that involves splitting the cases into fixed-sized groups, creating a separate method for each group, and adding a top-level match that dispatches the input value to the appropriate group's method. It works, but it's unpleasant, and I'd prefer not to have to use this approach every time I write a macro where the size of the generated code depends on some external resource.
Is there a cleaner way to tackle this problem? More importantly, is there a way to deal with this kind of compiler error more gracefully? I don't like the idea of a library user getting an unintelligible "That entry seems to have slain the compiler" error message just because some XML file that's being processed by a macro has crossed some (fairly low) size threshhold.

Imo putting data into .class isn't really a good idea.
They are parsed as well, they're just binary. But storing them in JVM may have negative impact on performance of the garbagge collector and JIT compiler.
In your situation, I would pre-compile the XML into a binary file of proper format and parse that. Elligible formats with existing tooling can be e.g. FastRPC or good old DBF. Or maybe pre-fill an ElasticSearch repository if you need quick advanced lookups and searches. Some implementations of the latter may also provide basic indexing which could even leave the parsing out - the app would just read from the respective offset.

Since somebody has to say something, I followed the instructions at Importers to try to compile the tree before returning it.
If you give the compiler plenty of stack, it will correctly report the error.
(It didn't seem to know what to do with the switch annotation, left as a future exercise.)
apm#mara:~/tmp/bigmethod$ skalac bigmethod.scala ; skalac -J-Xss2m biguser.scala ; skala bigmethod.Test
Error is java.lang.RuntimeException: Method code too large!
Error is java.lang.RuntimeException: Method code too large!
biguser.scala:5: error: You ask too much of me.
Console println s"5 => ${BigMethod.lookup(5)}"
^
one error found
as opposed to
apm#mara:~/tmp/bigmethod$ skalac -J-Xss1m biguser.scala
Error is java.lang.StackOverflowError
Error is java.lang.StackOverflowError
biguser.scala:5: error: You ask too much of me.
Console println s"5 => ${BigMethod.lookup(5)}"
^
where the client code is just that:
package bigmethod
object Test extends App {
Console println s"5 => ${BigMethod.lookup(5)}"
}
My first time using this API, but not my last. Thanks for getting me kickstarted.
package bigmethod
import scala.language.experimental.macros
import scala.reflect.macros.Context
object BigMethod {
// For this simplified example we'll just make some data up.
//final val size = 700
final val size = 7000
val mapping = List.tabulate(size)(i => (i, i + 1))
def lookup(i: Int): Int = macro lookup_impl
def lookup_impl(c: Context)(i: c.Expr[Int]): c.Expr[Int] = {
def compilable[T](x: c.Expr[T]): Boolean = {
import scala.reflect.runtime.{ universe => ru }
import scala.tools.reflect._
//val mirror = ru.runtimeMirror(c.libraryClassLoader)
val mirror = ru.runtimeMirror(getClass.getClassLoader)
val toolbox = mirror.mkToolBox()
val importer0 = ru.mkImporter(c.universe)
type ruImporter = ru.Importer { val from: c.universe.type }
val importer = importer0.asInstanceOf[ruImporter]
val imported = importer.importTree(x.tree)
val tree = toolbox.resetAllAttrs(imported.duplicate)
try {
toolbox.compile(tree)
true
} catch {
case t: Throwable =>
Console println s"Error is $t"
false
}
}
import c.universe._
val switch = reify(new scala.annotation.switch).tree
val cases = mapping map {
case (k, v) => CaseDef(c.literal(k).tree, EmptyTree, c.literal(v).tree)
}
//val res = c.Expr(Match(Annotated(switch, i.tree), cases))
val res = c.Expr(Match(i.tree, cases))
// before returning a potentially huge tree, try compiling it
//import scala.tools.reflect._
//val x = c.Expr[Int](c.resetAllAttrs(res.tree.duplicate))
//val y = c.eval(x)
if (!compilable(res)) c.abort(c.enclosingPosition, "You ask too much of me.")
res
}
}

Related

Java Zookeeper API weird ZNode behavior. Unable to delete ZNode properly. It has unexpected results

I am trying to create a persistent ZNode and storing the number of lines of a particular file that I have processed. Creation works just like it should, so does reading data from the node, but deletion doesn't work if it's in the same code. I'd explain what I mean.
I have created functions:
setOrCreateFileCheckpoint(fileName: String, lineNumber: Int) :- checks if the ZNode exists, creates it if it doesn't and sets the stored value to lineNumber
getFileCheckpoint(fileName: String) :- returns the value stored in the ZNode
deleteFileCheckpoint(fileName: String) :- deletes the ZNode
below is the code for all three:
/*
updates or creates a checkpoint for a file being processed
*/
def setOrCreateFileCheckpoint(fileName: String, lineNumber: Int): Unit =
{
val fileCheckpointPath = checkpointPoolPath + "/" +fileName
val zk = getZookeeper
val zkCuratorClient = getZookeeperCuratorClient
if ( zk.exists(fileCheckpointPath, false) == null)
{
val node = new PersistentNode(zkCuratorClient, CreateMode.PERSISTENT, false, fileCheckpointPath, lineNumber.toString.getBytes())
node.start()
}
else
zk.setData(fileCheckpointPath, lineNumber.toString.getBytes(), -1)
}
/*
gets checkpoint for a file
*/
def getFileCheckpoint(fileName: String): Int =
{
val fileCheckpointPath = checkpointPoolPath + "/" +fileName
val zk = getZookeeper
val zkCuratorClient = getZookeeperCuratorClient
if ( zk.exists(fileCheckpointPath, false) != null)
new String(zk.getData(fileCheckpointPath, false, null)).toInt
else
0
}
/*
deletes the file checkpoint so that we don't keep accumulating zNodes on the zookeeper
*/
def deleteFileCheckpoint(fileName: String): Unit =
{
val fileCheckpointPath = checkpointPoolPath + "/" +fileName
val zk = getZookeeper
if ( zk.exists(fileCheckpointPath, false) == null)
{
throw RuntimeException("Trying to delete checkpoint that doesn't exist for file: " + fileName)
}
else
{
/*println(zk.exists(fileCheckpointPath, false).getVersion)
zk.delete(fileCheckpointPath, zk.exists(fileCheckpointPath, false).getVersion)*/
deleteChildren(zk, fileCheckpointPath, true)
}
}
Following is the code I am testing and am perplexed by:
ZookeeperUtility.setOrCreateFileCheckpoint("file1", 2000) //let's call it cre1
println(ZookeeperUtility.getFileCheckpoint("file1")) //let's call it get1
ZookeeperUtility.deleteFileCheckpoint("file1") //let's call it del1
println("del1")
ZookeeperUtility.deleteFileCheckpoint("file1") //let's call in del2
println("del2")
Run 1:
Step1: I Run the code shown above
Result: Error encountered on the del2
Step2: Comment out cre1 and run the code again
Result: Node is fetched, gives the correct value as result error encountered on del2. This is mind boggling. I can't understand why. The node is supposed to be deleted.
Step3: with cre1 still commented, same as previous step, run the code again
Result: Node doesn't exist gives 0 at get1 which means node doesn't exist. error is encountered at del1. Which is what should've happened in step2 itself
Run2:
Step1: Comment out del2, run the code
Result: creates node, fetches correct data, exits normally
Step2: Comment out cre1, run the code
Result: Fetches the value 2000 from a node that was supposed to be deleted. exits normally
Step3: Run the same code as step2 again
Result: fetches 0, error encountered on del1.
If I run the code one step at a time, if I only create in one run, only fetch in the next run and only delete in the run after that, everything works just like it should. I am on the brink of pulling my hair out.
P.S. The code is written in Scala but I am using the Java API. Scala can seemlessly work with Java classes.
If you look at the deleteFileCheckpoint function I have commented out a part, I have tried that approach as well. It has the exact same behvaior.

This is mind boggling. I can't understand why. The node is supposed to be deleted.
I'm not sure why you're surprised. You are creating a PersistentNode which exists to automatically recreate the node should it get deleted. In fact, all the surrounding code is very puzzling. It's duplicating what PersistentNode does internally. You don't need to do all that other stuff. Just use PersistentNode.
Further, code like checkExists() followed by an action based on the result will almost never work in production. ZooKeeper is highly concurrent and eventually consistent. This is why you should always use Curator's recipes instead of hand-coding solutions.

Returning value from persisted Groovy code without using java.io.File

In my end product, I provide the ability to extend the application code at runtime using small Groovy scripts that are edited via a form and whose code are persisted in the SQL database.
The scheme that these "custom code" snippets follow is generally returning a value based on input parameters. For example, during the invoicing of a service, the rating system might use a published schedule of predetermined rates, or values defined in a contract in the application, through custom groovy code, if an "overridden" value is returned, then it should be used.
In the logic that would determine the "override" value of the rate, I've incorporated something like these groovy code snippets that return a value, or if they return null, then the default value is used. E.g.
class GroovyRunner {
static final GroovyClassLoader classLoader = new GroovyClassLoader()
static final String GROOVY_CODE = MyDatabase().loadCustomCode()
static final String GROOVY_CLASS = MyDatabase().loadCustomClassName()
static final String TEMPDIR = System.getProperty("java.io.tmpdir")
double getOverrideRate(Object inParameters) {
def file = new File(TEMPDIR+GROOVY_CLASS+".groovy")
BufferedWriter bw = new BufferedWriter(new FileWriter(file))
bw.write(GROOVY_CODE)
bw.close()
Class gvy = classLoader.parseClass(file)
GroovyObject obj = (GroovyObject) gvy.getDeclaredConstructor().newInstance()
return Double.valueOf(obj.invokeMethod("getRate",inParameters)
}
}
And then, in the user-created custom groovy code:
class RateInterceptor {
def getRate(Object inParameters) {
def businessEntity = (SomeClass) inParameters
return businessEntity.getDiscount() == .5 ? .5 : null
}
}
The problem with this is that these "custom code" bits in GROOVY_CODE above, are pulled from a database during runtime, and contain an intricate groovy class. Since this method will be called numerous times in succession, it is impractical to create a new File object each time it is run.
Whether I use GroovyScriptEngine, or the GroovyClassLoader, these both involve the need of a java.io.File object. This makes the code execute extremely slowly, as the File will have to be created after the custom groovy code is retrieved from the database. Is there any way to run groovy code that can return a value without creating a temporary file to execute it?

The straight solution for your case would be using GroovyClassLoader.parseClass(String text)
http://docs.groovy-lang.org/latest/html/api/groovy/lang/GroovyClassLoader.html#parseClass(java.lang.String)
The class caching should not be a problem because you are creating each time a new GroovyClassLoader
However think about using groovy scripts instead of classes
your rate interceptor code could be like this:
def businessEntity = (SomeClass) context
return businessEntity.getDiscount() == .5 ? .5 : null
or even like this:
context.getDiscount() == .5 ? .5 : null
in script you could declare functions, inner classes, etc
so, if you need the following script will work also:
class RateInterceptor {
def getRate(SomeClass businessEntity) {
return businessEntity.getDiscount() == .5 ? .5 : null
}
}
return new RateInterceptor().getRate(context)
The java code to execute those kind of scripts:
import groovy.lang.*;
...
GroovyShell gs = new GroovyShell();
Script script = gs.parse(GROOVY_CODE);
// bind variables
Binding binding = new Binding();
binding.setVariable("context", inParams);
script.setBinding(binding);
// run script
Object ret = script.run();
Note that parsing of groovy code (class or script) is a heavy operation. And if you need to speedup your code think about caching of parsed class into some in-memory cache or even into a map
Map<String, Class<groovy.lang.Script>>

Straight-forward would be also:
GroovyShell groovyShell = new GroovyShell()
Closure groovy = { String name, String code ->
String script = "{ Map params -> $code }"
groovyShell.evaluate( script, name ) as Closure
}
def closure = groovy( 'SomeName', 'params.someVal.toFloat() * 2' )
def res = closure someVal:21
assert 42.0f == res

Simple Code to Copy Same Name Properties?

I have an old question sustained in my mind for a long time. When I was writing code in Spring, there are lots dirty and useless code for DTO, domain objects. For language level, I am hopeless in Java and see some light in Kotlin. Here is my question:
Style 1 It is common for us to write following code (Java, C++, C#, ...)
// annot: AdminPresentation
val override = FieldMetadataOverride()
override.broadleafEnumeration = annot.broadleafEnumeration
override.hideEnumerationIfEmpty = annot.hideEnumerationIfEmpty
override.fieldComponentRenderer = annot.fieldComponentRenderer
Sytle 2 Previous code can be simplified by using T.apply() in Kotlin
override.apply {
broadleafEnumeration = annot.broadleafEnumeration
hideEnumerationIfEmpty = annot.hideEnumerationIfEmpty
fieldComponentRenderer = annot.fieldComponentRenderer
}
Sytle 3 Can such code be even simplified to something like this?
override.copySameNamePropertiesFrom (annot) { // provide property list here
broadleafEnumeration
hideEnumerationIfEmpty
fieldComponentRenderer
}
First Priority Requirments
Provide property name list only one time
The property name is provided as normal code, so as to we can get IDE auto complete feature.
Second Priority Requirements
It's prefer to avoid run-time cost for Style 3. (For example, 'reflection' may be a possible implementation, but it do introduce cost.)
It's prefer to generated code like style1/style2 directly.
Not care
The final syntax of Style 3.
I am a novice for Kotlin language. Is it possible to use Kotlin to define somthing like 'Style 3' ?

It should be pretty simple to write a 5 line helper to do this which even supports copying every matching property or just a selection of properties.
Although it's probably not useful if you're writing Kotlin code and heavily utilising data classes and val (immutable properties). Check it out:
fun <T : Any, R : Any> T.copyPropsFrom(fromObject: R, vararg props: KProperty<*>) {
// only consider mutable properties
val mutableProps = this::class.memberProperties.filterIsInstance<KMutableProperty<*>>()
// if source list is provided use that otherwise use all available properties
val sourceProps = if (props.isEmpty()) fromObject::class.memberProperties else props.toList()
// copy all matching
mutableProps.forEach { targetProp ->
sourceProps.find {
// make sure properties have same name and compatible types
it.name == targetProp.name && targetProp.returnType.isSupertypeOf(it.returnType)
}?.let { matchingProp ->
targetProp.setter.call(this, matchingProp.getter.call(fromObject))
}
}
}
This approach uses reflection, but it uses Kotlin reflection which is very lightweight. I haven't timed anything, but it should run almost at same speed as copying properties by hand.
Now given 2 classes:
data class DataOne(val propA: String, val propB: String)
data class DataTwo(var propA: String = "", var propB: String = "")
You can do the following:
var data2 = DataTwo()
var data1 = DataOne("a", "b")
println("Before")
println(data1)
println(data2)
// this copies all matching properties
data2.copyPropsFrom(data1)
println("After")
println(data1)
println(data2)
data2 = DataTwo()
data1 = DataOne("a", "b")
println("Before")
println(data1)
println(data2)
// this copies only matching properties from the provided list
// with complete refactoring and completion support
data2.copyPropsFrom(data1, DataOne::propA)
println("After")
println(data1)
println(data2)
Output will be:
Before
DataOne(propA=a, propB=b)
DataTwo(propA=, propB=)
After
DataOne(propA=a, propB=b)
DataTwo(propA=a, propB=b)
Before
DataOne(propA=a, propB=b)
DataTwo(propA=, propB=)
After
DataOne(propA=a, propB=b)
DataTwo(propA=a, propB=)

Different types of Groovy execution in Java

I have 3 questions regarding using Groovy in java. They are all related together so I'm only creating one question here.
1) There are: GroovyClassLoader, GroovyShell, GroovyScriptEngine. But what is the difference between using them?
For example for this code:
static void runWithGroovyShell() throws Exception {
new GroovyShell().parse(new File("test.groovy")).invokeMethod("hello_world", null);
}
static void runWithGroovyClassLoader() throws Exception {
Class scriptClass = new GroovyClassLoader().parseClass(new File("test.groovy"));
Object scriptInstance = scriptClass.newInstance();
scriptClass.getDeclaredMethod("hello_world", new Class[]{}).invoke(scriptInstance, new Object[]{});
}
static void runWithGroovyScriptEngine() throws Exception {
Class scriptClass = new GroovyScriptEngine(".").loadScriptByName("test.groovy");
Object scriptInstance = scriptClass.newInstance();
scriptClass.getDeclaredMethod("hello_world", new Class[]{}).invoke(scriptInstance, new Object[]{});
}
2) What is the best way to load groovy script so it remains in memory in compiled form, and then I can call a function in that script when I need to.
3) How do I expose my java methods/classes to groovy script so that it can call them when needed?

Methods 2 and 3 both return the parsed class in return. So you can use a map to keep them in memory once they are parsed and successfully loaded.
Class scriptClass = new GroovyClassLoader().parseClass(new File("test.groovy"));
map.put("test.groovy",scriptClass);
UPDATE:
GroovyObject link to the groovy object docs.
Also this is possible to cast the object directly as GroovyObject and other java classes are indistinguishable.
Object aScript = clazz.newInstance();
MyInterface myObject = (MyInterface) aScript;
myObject.interfaceMethod();
//now here you can also cache the object if you want to
Cannot comment on efficiency. But I guess if you keep the loaded classes in memory, one time parsing would not hurt much.
UPDATE For efficiency:
You should use GroovyScriptEngine, it uses script caching internally.
Here is the link: Groovy Script Engine
Otherwise you can always test it using some performance benchmarks yourself and you would get rough idea. For example: Compiling groovy scripts with all three methods in three different loops and see which performs better. Try using same and different scripts, to see if caching kicks in, in some way.
UPDATE FOR PASSING PARAMS TO AND FROM SCRIPT
Binding class will help you sending the params to and from the script.
Example Link
// setup binding
def binding = new Binding()
binding.a = 1
binding.setVariable('b', 2)
binding.c = 3
println binding.variables
// setup to capture standard out
def content = new StringWriter()
binding.out = new PrintWriter(content)
// evaluate the script
def ret = new GroovyShell(binding).evaluate('''
def c = 9
println 'a='+a
println 'b='+b
println 'c='+c
retVal = a+b+c
a=3
b=2
c=1
''')
// validate the values
assert binding.a == 3
assert binding.getVariable('b') == 2
assert binding.c == 3 // binding does NOT apply to def'd variable
assert binding.retVal == 12 // local def of c applied NOT the binding!
println 'retVal='+binding.retVal
println binding.variables
println content.toString()

Struggle against habits formed by Java when migrating to Scala

What are the most common mistakes that Java developers make when migrating to Scala?
By mistakes I mean writing a code that does not conform to Scala spirit, for example using loops when map-like functions are more appropriate, excessive use of exceptions etc.
EDIT: one more is using own getters/setters instead of methods kindly generated by Scala

It's quite simple: Java programmer will tend to write imperative style code, whereas a more Scala-like approach would involve a functional style.
That is what Bill Venners illustrated back in December 2008 in his post "How Scala Changed My Programming Style".
That is why there is a collection of articles about "Scala for Java Refugees".
That is how some of the SO questions about Scala are formulated: "help rewriting in functional style".

One obvious one is to not take advantage of the nested scoping that scala allows, plus the delaying of side-effects (or realising that everything in scala is an expression):
public InputStream foo(int i) {
final String s = String.valueOf(i);
boolean b = s.length() > 3;
File dir;
if (b) {
dir = new File("C:/tmp");
} else {
dir = new File("/tmp");
}
if (!dir.exists()) dir.mkdirs();
return new FileInputStream(new File(dir, "hello.txt"));
}
Could be converted as:
def foo(i : Int) : InputStream = {
val s = i.toString
val b = s.length > 3
val dir =
if (b) {
new File("C:/tmp")
} else {
new File("/tmp")
}
if (!dir.exists) dir.mkdirs()
new FileInputStream(new File(dir, "hello.txt"))
}
But this can be improved upon a lot. It could be:
def foo(i : Int) = {
def dir = {
def ensuring(d : File) = { if (!d.exists) require(d.mkdirs); d }
def b = {
def s = i.toString
s.length > 3
}
ensuring(new File(if (b) "C:/tmp" else "/tmp"));
}
new FileInputStream(dir, "hello.txt")
}
The latter example does not "export" any variable beyond the scope which it is needed. In fact, it does not declare any variables at all. This means it is easier to refactor later. Of course, this approach does lead to hugely bloated class files!

A couple of my favourites:
It took me a while to realise how truly useful Option is. A common mistake carried from Java is to use null to represent a field/variable that sometimes does not have a value. Recognise that you can use 'map' and 'foreach' on Option to write safer code.
Learn how to use 'map', 'foreach', 'dropWhile', 'foldLeft', ... and other handy methods on Scala collections to save writing the kind of loop constructions you see everywhere in Java, which I now perceive as verbose, clumsy, and harder to read.

A common mistake is to go wild and overuse a feature not present in Java once you "grokked" it. E.g. newbies tend to overuse pattern matching(*), explicit recursion, implicit conversions, (pseudo-) operator overloading and so on. Another mistake is to misuse features that look superficially similar in Java (but ain't), like for-comprehensions or even if (which works more like Java's ternary operator ?:).
(*) There is a great cheat sheet for pattern matching on Option: http://blog.tmorris.net/scalaoption-cheat-sheet/

I haven't adopted lazy vals and streams so far.
In the beginning, a common error (which the compiler finds) is to forget the semicolon in a for:
for (a <- al;
b <- bl
if (a < b)) // ...
and where to place the yield:
for (a <- al) yield {
val x = foo (a).map (b).filter (c)
if (x.cond ()) 9 else 14
}
instead of
for (a <- al) {
val x = foo (a).map (b).filter (c)
if (x.cond ()) yield 9 else yield 14 // why don't ya yield!
}
and forgetting the equals sign for a method:
def yoyo (aka : Aka) : Zirp { // ups!
aka.floskel ("foo")
}

Using if statements. You can usually refactor the code to use if-expressions or by using filter.
Using too many vars instead of vals.
Instead of loops, like others have said, use the list comprehension functions like map, filter, foldLeft, etc. If there isn't one available that you need (look carefully there should be something you can use), use tail recursion.
Instead of setters, I keep the spirit of functional programming and have my objects immutable. So instead I do something like this where I return a new object:
class MyClass(val x: Int) {
def setX(newx: Int) = new MyClass(newx)
}
I try to work with lists as much as possible. Also, to generate lists, instead of using a loop, use the for/yield expressions.

Using Arrays.
This is basic stuff and easily spotted and fixed, but will slow you down initially when it bites your ass.
Scala has an Array object, while in Java this is a built in artifact. This means that initialising and accessing elements of the array in Scala are actually method calls:
//Java
//Initialise
String [] javaArr = {"a", "b"};
//Access
String blah1 = javaArr[1]; //blah1 contains "b"
//Scala
//Initialise
val scalaArr = Array("c", "d") //Note this is a method call against the Array Singleton
//Access
val blah2 = scalaArr(1) //blah2 contains "d"

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.