How to increase performance of Groovy? - java

I'm using Groovy to execute some pieces of Java code.
For my purposes Groovy is easy to use, since the code I have to execute has an arbitrary number of parameters that I cannot predict; it depends on the user input.
The input I'm talking about is OWL axioms, which are nested.
This is my code:
//The reflection
static void reflectionToOwl() {
Binding binding = new Binding(); //155 ms
GroovyShell shell = new GroovyShell(binding);
while (!OWLMapping.axiomStack.isEmpty()) {
String s = OWLMapping.axiomStack.pop();
shell.evaluate(s); //350 ms
}
}
This is the only bottleneck in my program: the more data I have to process, the longer I have to wait.
Do you have any suggestions?

If you need to increase Groovy performance, you can use the @CompileStatic annotation.
It makes the Groovy compiler perform Java-style compile-time checks and then compile statically, bypassing the Groovy meta-object protocol.
You can annotate just a specific method (or, as below, a whole class) with it, but make sure you don't use any dynamic features in that scope.
As an example:
import groovy.transform.CompileStatic
@CompileStatic
class Static {
}
class Dynamic {
}
println Static.declaredMethods.length
Static.declaredMethods.collect { it.name }.each { println it }
println('-' * 100)
println Dynamic.declaredMethods.length
Dynamic.declaredMethods.collect{ it.name }.each { println it }
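The statically compiled class does not get some of the extra methods generated for the dynamic one ($getCallSiteArray and $createCallSiteArray):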
6
invokeMethod
getMetaClass
setMetaClass
$getStaticMetaClass
setProperty
getProperty
8
invokeMethod
getMetaClass
setMetaClass
$getStaticMetaClass
$getCallSiteArray
$createCallSiteArray
setProperty
getProperty

As the first answer indicated, @CompileStatic would have been the first option on my list of tricks as well.
Depending on your use case, pre-parsing the script expressions and calling 'run' on them at execution time might be an option here. The following code demonstrates the idea:
def exprs = [
"(1..10).sum()",
"[1,2,3].max()"
]
def shell = new GroovyShell()
def scripts = time("parse exprs") {
exprs.collect { expr ->
shell.parse(expr) // here we pre-parse the strings to groovy Script instances
}
}
def standardBindings = [someKey: 'someValue', someOtherKey: 'someOtherValue']
scripts.eachWithIndex { script, i ->
time("run $i") {
script.binding = new Binding(standardBindings)
def result = script.run() // execute the pre-parsed Script instance
}
}
// just a small method for timing operations
def time(str, closure) {
def start = System.currentTimeMillis()
def result = closure()
def delta = System.currentTimeMillis() - start
println "$str took $delta ms -> result $result"
result
}
which prints:
parse exprs took 23 ms -> result [Script1@1165b38, Script2@4c12331b]
run 0 took 7 ms -> result 55
run 1 took 1 ms -> result 3
on my admittedly aging laptop.
The above code operates in two steps:
1. Parse the String expressions into Script instances using shell.parse. This can be done in a background thread, on startup, or otherwise while the user is not waiting for results.
2. At "execution time", call script.run() on the pre-parsed Script instances. This should be faster than calling shell.evaluate, which has to parse and compile the script text every time it is called.
The takeaway here is that if your use case allows for pre-parsing and has a need for runtime execution speed, it's possible to get quite decent performance with this pattern.
An example application I have used this in is a generic feed file import process where the expressions were customer editable data mapping expressions and the data was millions of lines of product data. You parse the expressions once and call script.run millions of times. In this kind of scenario pre-parsing saves a lot of cycles.
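The parse-once/run-many pattern is not specific to Groovy. The sketch below shows the same idea in plain Java with a cache keyed by the expression text, so each distinct expression is parsed only once no matter how many rows it is applied to. GroovyShell is deliberately not used here: the `parse` method is a hypothetical, deliberately tiny stand-in for shell.parse that only understands "x * K" and "x + K", and the cache only pays off when the same expression text is run repeatedly (as in the feed-import example), not when every string is different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

public class PreParsedExpressions {
    // Counts how often the (expensive) parse step actually runs.
    static final AtomicInteger parseCount = new AtomicInteger();

    // Cache of parsed expressions, keyed by their source text.
    private static final Map<String, IntUnaryOperator> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for shell.parse(expr): understands only "x * K" and "x + K".
    static IntUnaryOperator parse(String expr) {
        parseCount.incrementAndGet();
        String e = expr.replace(" ", "");
        if (e.startsWith("x*")) {
            int k = Integer.parseInt(e.substring(2));
            return x -> x * k;
        }
        if (e.startsWith("x+")) {
            int k = Integer.parseInt(e.substring(2));
            return x -> x + k;
        }
        throw new IllegalArgumentException("unsupported expression: " + expr);
    }

    // Parse on first use only, then reuse the parsed form (the script.run() step).
    static int run(String expr, int x) {
        return CACHE.computeIfAbsent(expr, PreParsedExpressions::parse).applyAsInt(x);
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int row = 0; row < 1_000_000; row++) {
            sum += run("x * 2", row % 10);
        }
        // One million evaluations, but only a single parse.
        System.out.println("sum=" + sum + ", parses=" + parseCount.get()); // sum=9000000, parses=1
    }
}
```

With Groovy itself the cached value would be the Script (or its Class) returned by shell.parse; keep in mind that a Script instance holds its own Binding, so sharing one instance across threads needs care.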

Instead of Groovy you can also use BeanShell.
It is super easy to use and very lightweight; see the BeanShell website.
Probably not every Java feature is supported, but give it a try.

Related

In Karate, what is the advantage of wrapping a Java function in a JavaScript function?

I can wrap a Java function like this:
* def myJavaMethod =
"""
function() {
var Utils = Java.type('Utils');
// use Number type in constructor
var obj = new Utils(...);
return obj.myJavaMethod();
}
"""
But why would I? I can use Java functions straight in the test scenarios, like this:
Scenario: Test exec and error value
* def Utils = Java.type('Utils');
* def ret = Utils.exec('echo "From Java" > /home/richard//karate-test/karate-0.9.6/out.txt');
* match read('out.txt') == "From Java\n";
* match Utils.exec('exit 123') == 123
Note: exec is a static method that uses a shell to execute the command given.
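For reference, here is a minimal Java sketch of what a static Utils.exec helper like the one described might look like. It is hypothetical (the question does not show the real implementation) and assumes a POSIX `sh` is available on the PATH; running the command through `sh -c` is what makes shell syntax such as `>` redirection and `exit 123` work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utils {
    // Runs the command through "sh -c" (assumes a POSIX shell) and returns its exit code.
    public static int exec(String command) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .inheritIO() // let the command's output show up in the test log
                    .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for: " + command, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exec("exit 123")); // 123 on a POSIX system
    }
}
```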
At least I tested this for static methods and that works fine without the JavaScript detour.
It seems that the JavaScript wrapper only adds an extra layer of complication.
Besides that, with the 'call' syntax I can only pass one parameter (that admittedly can be an entire object or array).
But, I can pass parameters straight to a Java function (using normal syntax) and even use the result in a match.
(I assume parameters and results are implicitly converted to a JavaScript value, which could be a JSON object or array.)
So far, I fail to see the advantage of explicitly wrapping Java code inside JavaScript wrappers. I assume the problem is me: am I missing some important point here?
The only advantages are less typing and easier reuse when the same code is needed in many places. For example, instead of:
* def foo = java.util.UUID.randomUUID() + ''
You define it once:
* def uuid = function(){ return java.util.UUID.randomUUID() + '' }
And then call it wherever, and it is super-concise:
* def json = { foo: '#(uuid())' }

Groovy gmongo batch processing

I'm currently trying to run a batch processing job in Groovy with the GMongo driver. The collection is about 8 GB, and my problem is that my script tries to load everything into memory. Ideally I'd like to process it in batches, similar to what Spring Batch does, but in Groovy scripts.
I've tried batchSize(), but it still retrieves the entire collection into memory and only then applies my logic in batches.
Here's my example:
mongoDb.collection.find().collect { doc ->
//logic
}
According to the official documentation:
https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/#read-operations-cursors
def myCursor = db.collection.find()
while (myCursor.hasNext()) {
print(myCursor.next())
}
After some deliberation, I found the following solution works best, for these reasons:
Unlike iterating the cursor, it doesn't retrieve documents one at a time for processing (which can be terribly slow).
Unlike the GMongo batch function, it also doesn't try to load the entire collection into memory only to cut it up into batches for processing, which tends to be heavy on machine resources.
The code below is efficient and light on resources, depending on your batch size.
def skipSize = 0
def limitSize = 1000 // your batch size; if it comes from a String (e.g. configuration), convert it with Integer.valueOf(...)
def dbSize = db.collectionName.count()
// round up (ceiling division) so the final, partial batch is not skipped
def dbRunCount = (dbSize + limitSize - 1).intdiv(limitSize)
dbRunCount.times {
    db.collectionName.find()
        .skip(skipSize)
        .limit(limitSize)
        .each { event ->
            //run your business logic processing
        }
    //calculate the next skipSize
    skipSize += limitSize
}
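The batching arithmetic can be checked without a database. In this plain-Java sketch, the hypothetical `fetchPage` method stands in for `find().skip(skip).limit(limit)` by slicing an in-memory list; the point is that the number of runs must be computed with ceiling division so the last, partial batch is still processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SkipLimitBatches {
    // Hypothetical stand-in for collection.find().skip(skip).limit(limit):
    // returns at most `limit` elements starting at offset `skip`.
    static <T> List<T> fetchPage(List<T> collection, int skip, int limit) {
        int from = Math.min(skip, collection.size());
        int to = Math.min(skip + limit, collection.size());
        return collection.subList(from, to);
    }

    // Processes the collection batch by batch and returns the size of each batch.
    static <T> List<Integer> processInBatches(List<T> collection, int batchSize, Consumer<T> logic) {
        long total = collection.size();                  // like count()
        long runs = (total + batchSize - 1) / batchSize; // ceiling division
        List<Integer> batchSizes = new ArrayList<>();
        int skip = 0;
        for (long run = 0; run < runs; run++) {
            List<T> page = fetchPage(collection, skip, batchSize);
            page.forEach(logic);                         // business logic per document
            batchSizes.add(page.size());
            skip += batchSize;                           // next skip offset
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        List<Integer> docs = IntStream.range(0, 2400).boxed().collect(Collectors.toList());
        System.out.println(processInBatches(docs, 1000, doc -> { })); // [1000, 1000, 400]
    }
}
```

Rounding to the nearest integer instead (Math.round(2400 / 1000.0) is 2) would run only two batches and silently skip the last 400 documents.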

Groovy Postbuild in Jenkins, parsing the log for strings and counting them

I am new to Groovy and am trying to set up a Groovy Postbuild step in Jenkins that counts occurrences of a string and decides whether the build succeeded based on the final count.
Here is my example code:
class Main {
def manager = binding.getVariable("manager")
def log = manager.build.logFile.text
def list = log
def JobCount = list.count {it.startsWith("====") && it.contains("COMPLETE")}
if (JobCount == 7) {
manager.listener.logger.println("All Jobs Completed Successfully")
} else {
manager.addWarningBadge("Not All Jobs Have Completed Successfully")
manager.buildUnstable()
}
}
I am looking for a specific string that gets printed to the console when the test has completed successfully. The string is "====JOB COMPLETE====" and I should have 7 instances of this string if all 7 tests passed correctly.
Currently, when I run this code, I get the following error:
Script1.groovy: 6: unexpected token: if @ line 6, column 5.
if (JobCount == 7)
^
Any help would be greatly appreciated.
manager.build.logFile.text returns the whole file as a single String.
What you need is readLines(), which returns a list of lines:
def list = manager.build.logFile.readLines()
def JobCount = list.count {it.startsWith("====") && it.contains("COMPLETE")}
And, as mentioned below, the Jenkins Groovy Postbuild plugin runs Groovy scripts, so you will have to get rid of the enclosing class declaration (Main).
You have statements directly inside your class, without being in a method, which is not allowed in Java/Groovy. Since this is Groovy, you can run this as a script without the class at all, or put the offending code (the if statement) inside a method and call the method.
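Both fixes (split the log into lines before counting, and keep the statements inside a method or at script level rather than directly in a class body) can be sketched in plain Java. This is not Jenkins code: the manager object is left out, and the log text is passed in as a String; EXPECTED_JOBS = 7 comes from the question.

```java
public class LogCounter {
    static final int EXPECTED_JOBS = 7;

    // Counts lines such as "====JOB COMPLETE====" in the full console log.
    static long countCompletedJobs(String logText) {
        return logText.lines() // one element per line, like readLines()
                .filter(line -> line.startsWith("====") && line.contains("COMPLETE"))
                .count();
    }

    static boolean allJobsCompleted(String logText) {
        return countCompletedJobs(logText) == EXPECTED_JOBS;
    }

    public static void main(String[] args) {
        String log = "build started\n====JOB COMPLETE====\nsome output\n====JOB COMPLETE====\n";
        System.out.println(countCompletedJobs(log)); // 2
        System.out.println(allJobsCompleted(log)
                ? "All Jobs Completed Successfully"
                : "Not All Jobs Have Completed Successfully");
    }
}
```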
Perhaps you're missing a closing }
def JobCount = list.count {it.startsWith("====") && it.contains("COMPLETE")}

How to make Java 8 Nashorn fast?

I'm using Java 8 Nashorn to render CommonMark to HTML server side. If I compile and cache and reuse a CompiledScript, a certain page takes 5 minutes to render. However, if I instead use eval, and cache and reuse the script engine, rendering the same page takes 3 seconds.
Why is CompiledScript so slow? (sample code follows)
What's a good approach for running JavaScript code in Nashorn over and over again as quickly as possible, while avoiding compiling the JavaScript code more than once?
This is the server-side Scala snippet that calls Nashorn in a way that takes 5 minutes when run 200 times (I'm converting many comments from CommonMark to HTML). This code is based on this blog article.
if (engine == null) {
val script = scala.io.Source.fromFile("public/res/remarkable.min.js").mkString
engine = new js.ScriptEngineManager(null).getEngineByName("nashorn")
compiledScript = engine.asInstanceOf[js.Compilable].compile(s"""
var global = this;
$script;
remarkable = new Remarkable({});
remarkable.render(__source__);""");
}
engine.put("__source__", "**bold**")
val htmlText = compiledScript.eval()
Edit: Note that the $script above is re-evaluated 200 times. I did test a version that evaluated it only once, but apparently I introduced a bug, because that version wasn't faster than 5 minutes, although it should have been one of the fastest (see Halfbit's answer). Here's the fast version:
...
val newCompiledScript = newEngine.asInstanceOf[js.Compilable].compile(s"""
var global;
var remarkable;
if (!remarkable) {
global = this;
$script;
remarkable = new Remarkable({});
}
remarkable.render(__source__);""")
...
/Edit
Whereas this takes 2.7 seconds (when run 200 times):
if (engine == null) {
engine = new js.ScriptEngineManager(null).getEngineByName("nashorn")
engine.eval("var global = this;")
engine.eval(new jio.FileReader("public/res/remarkable.min.js"))
engine.eval("remarkable = new Remarkable({});")
}
engine.put("source", "**bold**")
val htmlText = engine.eval("remarkable.render(source)")
I would actually have guessed that the CompiledScript version (the topmost snippet) would have been faster. Anyway, I suppose I'll have to cache the rendered HTML server side.
(Linux Mint 17 & Java 8 u20)
Update:
I just noticed that using invokeFunction at the end instead of eval is almost twice as fast: it takes only 1.7 seconds. This is roughly as fast as my Java 7 version, which used JavaScript code compiled by Rhino to Java bytecode (as a separate and complicated step in the build process). Perhaps this is as fast as it can get?
if (engine == null) {
engine = new js.ScriptEngineManager(null).getEngineByName("nashorn")
engine.eval("var global = this;")
engine.eval(new jio.FileReader("public/res/remarkable.min.js"))
engine.eval("remarkable = new Remarkable({});")
engine.eval(
"function renderCommonMark(source) { return remarkable.render(source); }")
}
val htmlText = engine.asInstanceOf[js.Invocable].invokeFunction(
"renderCommonMark", "**bold1**")
The variant of your code that uses CompiledScript seems to re-evaluate remarkable.min.js 200 times, while your eval-based version does this only once. This explains the huge difference in runtimes.
With just remarkable.render(__source__) precompiled, the CompiledScript-based variant is slightly faster than the eval- and invokeFunction-based ones (on my machine, Oracle Java 8u25).
CompiledScript has been improved a bit in 8u40. You can download an early-access build of JDK 8u40 at https://jdk8.java.net/download.html
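The diagnosis above (the expensive library load ends up inside the unit that is executed 200 times) can be illustrated without Nashorn, which is no longer shipped with current JDKs. In this sketch the hypothetical `loadLibrary` method stands in for evaluating remarkable.min.js and creating the Remarkable object, and its "renderer" is only a placeholder; the counter shows how often the expensive step runs under each pattern.

```java
import java.util.function.UnaryOperator;

public class LoadOnceRenderer {
    static int libraryLoads = 0;

    // Hypothetical stand-in for evaluating remarkable.min.js and `new Remarkable({})`.
    static UnaryOperator<String> loadLibrary() {
        libraryLoads++;
        return source -> "<p>" + source + "</p>"; // placeholder, not real CommonMark rendering
    }

    // Slow pattern: the executed unit includes the library, so every call reloads it.
    static String renderReloadingEachTime(String source) {
        return loadLibrary().apply(source);
    }

    // Fast pattern: load the library once, keep the renderer, only "render" per call.
    // (Not thread-safe; a server would initialize it eagerly or synchronize.)
    private static UnaryOperator<String> renderer;

    static String renderWithCachedLibrary(String source) {
        if (renderer == null) {
            renderer = loadLibrary();
        }
        return renderer.apply(source);
    }

    public static void main(String[] args) {
        libraryLoads = 0;
        for (int i = 0; i < 200; i++) renderReloadingEachTime("**bold**");
        System.out.println("library loads, reload each time: " + libraryLoads); // 200
        libraryLoads = 0;
        for (int i = 0; i < 200; i++) renderWithCachedLibrary("**bold**");
        System.out.println("library loads, cached: " + libraryLoads); // 1
    }
}
```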

How can users (safely) program their own filters in Java?

I want my users to be able to write their own filters when requesting a List in Java.
Option 1) I'm thinking about JavaScript with Rhino.
I get the user's filter as a JavaScript string and then call isAccepted(myItem) in this script.
Depending on the reply, I accept the element or not.
Option 2) I'm thinking about Groovy.
My user can write a Groovy script in a text field. When the user searches with this filter, the Groovy script is compiled to Java bytecode (on the first call) and I call the Java method isAccepted().
Depending on the reply, I accept the element or not.
My application relies heavily on this functionality, and it will be called intensively on my server.
So I believe speed is key.
Option 2 (Groovy) thinking:
Correct me if I'm wrong, but I think the main advantage of Groovy in my case is speed; however, my users could compile and run unwanted code on my server... (any workaround?)
Option 1 (JavaScript) thinking:
I think that in most people's minds JavaScript is more like a toy. Even if that's not my view at all, it is probably my customers' view, and they may not trust it much. Do you agree?
Another drawback I expect, from my reading on the web, is speed.
And again, my users could access Java and run unwanted code on my server... (any workaround?)
More info:
I'm running my application on Google App Engine for the main web service of my app.
The filter will be applied 20 times per call.
The filters will (most of the time) be simple.
Any idea how to make this filter safe for my server?
Any other approach to make it work?
My thoughts:
You'll have to use your own classloader when compiling your script, to prevent any other classes from being accessible from the script. Not sure if that is possible in GAE.
You'll have to use Java's SecurityManager features to prevent a script from accessing the file system, network, etc. Not sure if that is possible in GAE.
Looking only at the two items above, it looks incredibly complicated and brittle to me. If you can't find an existing project that provides these sandboxing features, you should stay away from it.
Designing a Domain Specific Language that will allow the expressions you decide are legal is a lot safer, and looking at the above items, you will have to think very hard anyway at what you want to allow. From there to designing the language is not a big step.
Be careful not to implement the DSL with Groovy closures (an internal DSL), because that is just Groovy and you are hackable too. You need to define an external language and parse it. I recommend the parser combinator library jparsec to define the grammar; no parser generator (compiler-compiler) is needed in that case.
http://jparsec.codehaus.org/
FYI, here's a little parser I wrote with jparsec (Groovy code):
//import some static methods, this will allow more concise code
import static org.codehaus.jparsec.Parsers.*
import static org.codehaus.jparsec.Terminals.*
import static org.codehaus.jparsec.Scanners.*
import org.codehaus.jparsec.functors.Map as FMap
import org.codehaus.jparsec.functors.Map4 as FMap4
import org.codehaus.jparsec.functors.Map3 as FMap3
import org.codehaus.jparsec.functors.Map2 as FMap2
/**
* Uses jparsec combinator parser library to construct an external DSL parser for the following grammar:
* <pre>
* pipeline := routingStep*
* routingStep := IDENTIFIER '(' parameters? ')'
* parameters := parameter (',' parameter)*
* parameter := (IDENTIFIER | QUOTED_STRING) ':' QUOTED_STRING
* </pre>
*/
class PipelineParser {
//=======================================================
//Pass 1: Define which terminals are part of the grammar
//=======================================================
//operators
private static def OPERATORS = operators(',', '(', ')', ':')
private static def LPAREN = OPERATORS.token('(')
private static def RPAREN = OPERATORS.token(')')
private static def COLON = OPERATORS.token(':')
private static def COMMA = OPERATORS.token(',')
//identifiers tokenizer
private static def IDENTIFIER = Identifier.TOKENIZER
//single quoted strings tokenizer
private static def SINGLE_QUOTED_STRING = StringLiteral.SINGLE_QUOTE_TOKENIZER
//=======================================================
//Pass 2: Define the syntax of the grammar
//=======================================================
//PRODUCTION RULE: parameter := (IDENTIFIER | QUOTED_STRING) ':' QUOTED_STRING
@SuppressWarnings("GroovyAssignabilityCheck")
private static def parameter = sequence(or(Identifier.PARSER,StringLiteral.PARSER), COLON, StringLiteral.PARSER, new FMap3() {
def map(paramName, colon, paramValue) {
new Parameter(name: paramName, value: paramValue)
}
})
//PRODUCTION RULE: parameters := parameter (',' parameter)*
@SuppressWarnings("GroovyAssignabilityCheck")
private static def parameters = sequence(parameter, sequence(COMMA, parameter).many(), new FMap2() {
def map(parameter1, otherParameters) {
if (otherParameters != null) {
[parameter1, otherParameters].flatten()
} else {
[parameter1]
}
}
})
//PRODUCTION RULE: routingStep := IDENTIFIER '(' parameters? ')'
@SuppressWarnings("GroovyAssignabilityCheck")
private static def routingStep = sequence(Identifier.PARSER, LPAREN, parameters.optional(), RPAREN, new FMap4() {
def map(routingStepName, lParen, parameters, rParen) {
new RoutingStep(
name: routingStepName,
parameters: parameters ?: []
)
}
})
//PRODUCTION RULE: pipeline := routingStep*
@SuppressWarnings("GroovyAssignabilityCheck")
private static def pipeline = routingStep.many().map(new FMap() {
def map(from) {
new Pipeline(
routingSteps: from
)
}
})
//Combine the above tokenizers to create the tokenizer that will parse the stream and spit out the tokens of the grammar
private static def tokenizer = or(OPERATORS.tokenizer(), SINGLE_QUOTED_STRING, IDENTIFIER)
//This parser will be used to define which input sequences need to be ignored
private static def ignored = or(JAVA_LINE_COMMENT, JAVA_BLOCK_COMMENT, WHITESPACES)
/**
* Parser that is used to parse extender pipelines.
* <pre>
* def parser=PipelineParser.parser
* Pipeline pipeline=parser.parse(pipelineStr)
* </pre>
* Returns an instance of {@link Pipeline} containing the AST representation of the parsed string.
*/
//Create a syntactic pipeline parser that will use the given tokenizer to parse the input into tokens, and will ignore sequences that are matched by the given parser.
static def parser = pipeline.from(tokenizer, ignored.skipMany())
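}
//Minimal AST classes built by the parser above (assumed definitions; the original snippet does not show them)
class Parameter { String name; String value }
class RoutingStep { String name; List<Parameter> parameters }
class Pipeline { List<RoutingStep> routingSteps }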
Some thoughts:
Whether you use JavaScript or Groovy, it will be run in a context that you provide to the script, so the script should not be able to access anything that you don't want it to (but of course, if you go this route, you should test it extensively to be sure).
You'd probably be safer by having the filter expression specified as data, rather than as executable code, if possible. Of course, this depends on how complex the filter expressions are. Perhaps you can break up the representation into something like field, comparator, and value, or something similar, that can be treated as data and evaluated in a regular way?
If you're worried about what the user can inject via a scripting language, you're probably safer with JavaScript. I don't think that performance should be a problem, but again, I'd suggest extensive testing to be sure.
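The "filter as data" idea can be sketched in plain Java: each condition is a (field, comparator, value) triple that the server evaluates itself, so no user-supplied code is ever compiled or run. The class names, the `Op` values, and the AND-only combination are illustrative assumptions, not an existing library; a UI "filter builder" would produce the list of conditions.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DataFilter {
    // Hypothetical set of comparators the UI is allowed to offer.
    public enum Op { EQUALS, NOT_EQUALS, LESS_THAN, GREATER_THAN, CONTAINS }

    // One filter condition stored as data: field, comparator, value.
    public static final class Condition {
        final String field;
        final Op op;
        final Object value;

        public Condition(String field, Op op, Object value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    // All conditions must hold (logical AND); a missing field simply fails the comparison.
    public static boolean isAccepted(Map<String, Object> item, List<Condition> filter) {
        for (Condition c : filter) {
            if (!matches(item.get(c.field), c.op, c.value)) {
                return false;
            }
        }
        return true;
    }

    private static boolean matches(Object actual, Op op, Object expected) {
        switch (op) {
            case EQUALS:
                return Objects.equals(actual, expected);
            case NOT_EQUALS:
                return !Objects.equals(actual, expected);
            case LESS_THAN:
            case GREATER_THAN: {
                if (!(actual instanceof Number) || !(expected instanceof Number)) return false;
                int cmp = Double.compare(((Number) actual).doubleValue(), ((Number) expected).doubleValue());
                return op == Op.LESS_THAN ? cmp < 0 : cmp > 0;
            }
            case CONTAINS:
                return actual instanceof String && expected instanceof String
                        && ((String) actual).contains((String) expected);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> item = Map.of("price", 12, "name", "red shirt");
        List<Condition> cheapShirts = List.of(
                new Condition("price", Op.LESS_THAN, 20),
                new Condition("name", Op.CONTAINS, "shirt"));
        System.out.println(isAccepted(item, cheapShirts)); // true
    }
}
```

Because the server only interprets a fixed set of operators, the security question reduces to validating field names and values, and evaluation stays cheap enough to run many times per request.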
I would never let users input arbitrary code. It's brittle, insecure and a bad user experience. Not knowing anything about your users, my guess is that you will spend a lot of time answering questions. If most of your filters are simple, why not create a little filter builder for them instead?
As far as Groovy vs. JavaScript goes, I think Groovy is easier to understand and better for scripting, but that's just my opinion.
