Dynamically generating high-performance functions in Clojure/Java

I'm trying to use Clojure to dynamically generate functions that can be applied to large volumes of data - i.e. a requirement is that the functions be compiled to bytecode in order to execute fast, but their specification is not known until run time.
e.g. suppose I specify functions with a simple DSL like:
(def my-spec [:add [:multiply 2 :param0] 3])
I would like to create a function compile-spec such that:
(compile-spec my-spec)
would return a compiled function of one parameter x that returns 2x+3.
What is the best way to do this in Clojure?

Hamza Yerlikaya has already made the most important point, which is that Clojure code is always compiled. I'm just adding an illustration and some information on some low-hanging fruit for your optimisation efforts.
Firstly, the above point about Clojure's code always being compiled includes closures returned by higher-order functions and functions created by calling eval on fn / fn* forms and indeed anything else that can act as a Clojure function. Thus you don't need a separate DSL to describe functions, just use higher order functions (and possibly macros):
(defn make-affine-function [a b]
  (fn [x] (+ (* a x) b)))

((make-affine-function 31 47) 5)
;; => 202
Things would be more interesting if your specs were to include information about the types of parameters, as then you could be interested in writing a macro to generate code using those type hints. The simplest example I can think of would be a variant of the above:
(defmacro make-primitive-affine-function [t a b]
  (let [cast #(list (symbol (name t)) %)
        x    (gensym "x")]
    `(fn [~x] (+ (* ~(cast a) ~(cast x)) ~(cast b)))))

((make-primitive-affine-function :int 31 47) 5)
;; => 202
Use :int, :long, :float or :double (or the non-namespace-qualified symbols of corresponding names) as the first argument to take advantage of unboxed primitive arithmetic appropriate for your argument types. Depending on what your function's doing, this may give you a very significant performance boost.
Other types of hints are normally provided with the #^Foo bar syntax (^Foo bar does the same thing in 1.2); if you want to add them to macro-generated code, investigate the with-meta function (you'll need to merge '{:tag Foo} into the metadata of the symbols representing the formal arguments to your functions or let-introduced locals that you wish to put type hints on).
Oh, and in case you'd still like to know how to implement your original idea...
You can always construct the Clojure expression to define your function -- (list 'fn ['x] (a-magic-function-to-generate-some-code some-args ...)) -- and call eval on the result. That would enable you to do something like the following (it would be simpler to require that the spec includes the parameter list, but here's a version assuming arguments are to be fished out from the spec, are all called paramFOO and are to be lexicographically sorted):
(require '[clojure.walk :as walk])

(defn compile-spec [spec]
  (let [params (atom #{})]
    (walk/prewalk
     (fn [item]
       (when (and (symbol? item) (.startsWith (name item) "param"))
         (swap! params conj item))
       item)
     spec)
    (eval `(fn [~@(sort @params)] ~@spec))))

(def my-spec '[(+ (* 31 param0) 47)])

((compile-spec my-spec) 5)
;; => 202
The vast majority of the time, there is no good reason to do things this way and it should be avoided; use higher-order functions and macros instead. However, if you're doing something like, say, evolutionary programming, then it's there, providing the ultimate flexibility -- and the result is still a compiled function.

Even if you don't AOT compile your code, as soon as you define a function it gets compiled to bytecode on the fly.


Z3 producing different models when run multiple times

I've been using Z3 with the Java bindings for 2 years now.
For some reason, I've always generated the SMTLib2 code myself as a String and then used the parseSMTLib2String to build the corresponding Z3 Expr.
As far as I can remember, every time I entered the exact same input twice with this method, I always got the same model.
But I recently decided to change and to use the Java API directly, building the expressions with ctx.mk...(). Basically, I'm no longer generating the String and then parsing it; I let Z3 do the job of building the Z3 Expr.
What happens now is that I get different models, while I've checked that the solver does indeed store the exact same code.
My Java code looks something like this:
static final Context context = new Context();
static final Solver solver = context.mkSolver();

public static void someFunction() {
    solver.add(context.mk...()); // Add some bool expr to the solver
    Status status = solver.check();
    if (status == SATISFIABLE) {
        System.out.println(solver.getModel()); // Prints different model with same expr
    }
}
I'm making more than one call to someFunction() during runtime, and the checked expression context.mk...() changes. But if I run my program twice, the same sequence of expressions is checked, and it sometimes gives me different models from one run to another.
I've tried disabling the auto-config parameter and setting my own random seed, but Z3 still produces different models sometimes. I'm only using bounded Integer variables and uninterpreted functions.
Am I using the API in the wrong way?
I could add the whole SMTLib2 code to this question if needed but it isn't really short and contains multiple solver calls (I don't even know which of them will produce a different model from one execution to another, I just know that some do).
I should point out that I've read the following threads but found the answers to be either outdated or (if I understood correctly) in favour of "Z3 is deterministic and should produce the same model for the same input":
Z3 timing variation
Randomness in Z3 Results
different run time for the same code in Z3
Edit:
Surprisingly enough, with the following code I seem to always get the same models, and Z3 now seems deterministic. However, the memory consumption is huge compared to my previous code, since I need to keep the context in memory for a while. Any idea what I could do to achieve the same behaviour with less memory use?
public static void someFunction() {
    Context context = new Context();
    Solver solver = context.mkSolver();
    solver.add(context.mk...()); // Add some bool expr to the solver
    Status status = solver.check();
    if (status == SATISFIABLE) {
        System.out.println(solver.getModel()); // Seems to always print the same model :-)
    }
}
Here is the memory consumption I get from calling the method "someFunction" multiple times: (memory-consumption chart not reproduced here)
As long as it doesn't toggle between SAT and UNSAT on the same problem, it's not a bug.
One of the answers you linked explains what's happening:
Randomness in Z3 Results
"That being said, if we solve the same problem twice in the same execution path, then Z3 can produce different models. Z3 assigns internal unique IDs to expressions. The internal IDs are used to break ties in some heuristics used by Z3. Note that the loop in your program is creating/deleting expressions. So, in each iteration, the expressions representing your constraints may have different internal IDs, and consequently the solver may produce different solutions."
Perhaps when it's parsing it's assigning the same ids, whereas with the API it may differ, although I'd find that a bit hard to believe...
If you need this behavior and you're sure it was doing this from the SMT encoding, you could always print the expressions from the API then parse them.
I think I spotted the specific parts of code producing these strange opposite behaviours.
Maybe the Z3 experts around can tell me if I'm completely wrong.
First of all, if I try the same code (no matter if it's manually generated code or code generated with the API) twice in a single run of my program, I sometimes end up with different models. That is something I didn't notice before, and this actually isn't a real problem for me.
My main concern however is what happens if I run my program twice, checking the exact same code during the two runs.
When I'm generating the code manually, I end up with functions definitions like this:
(declare-fun var!0 () Int)
(declare-fun var!2 () Int)
(declare-fun var!42 () Int)

(assert (and
  (or (= var!0 0) (= var!0 1))
  (or (= var!2 0) (= var!2 1))
  (or (= var!42 0) (= var!42 1))))

(define-fun fun ((i! Int)) Int
  (ite (= i! 0) var!0
  (ite (= i! 1) var!2
  (ite (= i! 2) var!42 -1))))
As far as I can tell (and from what I've read about it (see here)), the API doesn't support defining the "fun" function the way I did.
So what I did to define it with the API was something like this:
(declare-fun var!0 () Int)
(declare-fun var!2 () Int)
(declare-fun var!42 () Int)

(assert (and
  (or (= var!0 0) (= var!0 1))
  (or (= var!2 0) (= var!2 1))
  (or (= var!42 0) (= var!42 1))))

(declare-fun fun (Int) Int)
(assert (forall ((i! Int))
  (ite (= i! 0) (= (fun i!) var!0)
  (ite (= i! 1) (= (fun i!) var!2)
  (ite (= i! 2) (= (fun i!) var!42)
       (= (fun i!) -1))))))
It seems that with the first method, checking the same code for different runs always (or at least so often that it isn't a real problem for me) gives the same models.
With the second method, checking the same code for different runs very often gives different models.
Can anybody tell me if there is indeed some logic behind what I've described, given how Z3 actually works?
Since I need my results to be as reproducible as possible, I went back to the manual code generation, and it seems to work perfectly fine. I would love to see a function in the API allowing us to define functions directly, rather than having to use the "forall" encoding, and see whether what I just described holds.

What is a distributive function under IDFS and why is pointer analysis non-distributive?

I'm doing an inter-procedural analysis project in Java at the moment and I'm looking into using an IFDS solver to compute the control-flow graph of a program. I'm finding it hard to follow the maths involved in the description of the IFDS framework and graph reachability. I've read in several places that it's not possible to compute the points-to sets of a program using this solver, as "pointer analysis is known to be a non-distributive problem." [1] Other sources have said that this often specifically concerns 'strong updates', which from what I can gather are field write statements.
I think I can basically follow how the solver computes edges and works out the dataflow facts. But I don't quite follow what this: f(A ∪ B) = f(A) ∪ f(B) means in practical terms as a definition of a distributive function, and therefore what it means to say that points-to analysis deals with non-distributive functions.
The linked source [1] gives an example specific to field write statements:
A a = new A();
A b = a;
A c = new C();
b.f = c;
It claims that in order to reason about the assignment to b.f one must also take into account all aliases of the base b. I can follow this. But what I don't understand is what are the properties of this action that make it non-distributive.
A similar (I think) example from [2]:
x = y.n
where, before the statement, there are points-to edges y-->obj1 and obj1.n-->obj2 (obj1 and obj2 being heap objects). They claim:
it is not possible to correctly deduce that the edge x-->obj2 should be generated after the statement if we consider each input edge independently. The flow function for this statement is a function of the points-to graph as a whole and cannot be decomposed into independent functions of each edge and then merged to get a correct result.
I think I almost understand what, at least the first, example is saying, but I am not getting the concept of distributive functions, which is blocking me from getting the full picture. Can anyone explain what a distributive or non-distributive function is in practical terms with regards to pointer analysis, without using set theory, which I am having difficulty following?
[1] http://karimali.ca/resources/pubs/conf/ecoop/SpaethNAB16.pdf
[2] http://dl.acm.org/citation.cfm?doid=2487568.2487569 (paywall, sorry)
The distributiveness of a flow function is defined as: f(a Π b) = f(a) Π f(b), with Π being the merge function. In IFDS, Π is defined as the set union ∪.
What this means is that it doesn't matter whether you apply the merge function before or after the flow function: you will get the same result in the end.
In a traditional data-flow analysis, you go through the statements of your CFG and propagate sets of data-flow facts. So with a flow function f, for each statement you compute f(in, stmt) = out, with in and out the sets of information you want to keep. For example, given the in-set {(a, allocA), (b, allocA)} (denoting that the allocation site of objects a and b is allocA) and the statement "b.f = new X();" (which we will name allocX), you would likely get the out-set {(a, allocA), (b, allocA), (a.f, allocX), (b.f, allocX)}, because a and b are aliased.
IFDS explodes the in-set into its individual data-flow facts. So for each fact, instead of running your flow-function once with your entire in-set, you run it on each element of the in-set: ∀ d ∈ in, f(d, stmt) = out_d. The framework then merges all out_d together into the final out-set.
The issue here is that for each flow function, you don't have access to the entire in-set, meaning that for the example we presented above, running the flow-function f((a, allocA)) on the statement would yield a first out-set {(a, allocA)}, f((b, allocA)) would yield a second out-set {(b, allocA)}, and f(0) would yield a third out-set {(0), (b.f, allocX)}.
So the global out-set after you merge the results would be {(a, allocA), (b, allocA), (b.f, allocX)}. We are missing the fact {(a.f, allocX)} because when running the flow function f(0), we only know that the in-fact is 0 and that the statement is "b.f = new X();". Because we don't know that a and b refer to the allocation site allocA, we don't know that they are aliased, and we therefore cannot know that a.f should also point to allocX after the statement.
IFDS runs on the assumption of distributiveness: merging the out-sets after running the flow-function should yield the same results as merging the in-sets before running the flow-function.
In other words, if you need to combine information from multiple elements on the in-set to create a certain data-flow fact in your out-set, then you are not distributive, and should not express your problem in IFDS (unless you do something to handle those combination cases, like the authors of the paper you refer to as [1] did).
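To make the distributivity condition concrete, here is a small self-contained Java sketch (the fact strings such as "a->allocA" are made up for illustration, not part of any IFDS framework): a gen/kill-style flow function satisfies f(A ∪ B) = f(A) ∪ f(B), while a function that must see the facts (a, allocA) and (b, allocA) *together* before it may emit (a.f, allocX) does not.

```java
import java.util.*;
import java.util.function.Function;

public class Distributivity {
    // A gen/kill flow function: remove killed facts, add generated ones.
    // Such functions are distributive: f(A ∪ B) = f(A) ∪ f(B).
    static Set<String> genKill(Set<String> in) {
        Set<String> out = new HashSet<>(in);
        out.remove("killedFact");
        out.add("generatedFact");
        return out;
    }

    // A function that needs two facts together to produce a third,
    // mimicking the alias case: only if we know both "a->allocA" and
    // "b->allocA" can we conclude "a.f->allocX" after "b.f = new X();".
    static Set<String> aliasAware(Set<String> in) {
        Set<String> out = new HashSet<>(in);
        if (in.contains("a->allocA") && in.contains("b->allocA")) {
            out.add("a.f->allocX");
        }
        out.add("b.f->allocX");
        return out;
    }

    static Set<String> union(Set<String> x, Set<String> y) {
        Set<String> u = new HashSet<>(x);
        u.addAll(y);
        return u;
    }

    // Check f(a ∪ b) = f(a) ∪ f(b) for one particular pair of in-sets.
    static boolean distributesOn(Function<Set<String>, Set<String>> f,
                                 Set<String> a, Set<String> b) {
        return f.apply(union(a, b)).equals(union(f.apply(a), f.apply(b)));
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("a->allocA", "killedFact"));
        Set<String> b = new HashSet<>(Arrays.asList("b->allocA"));
        System.out.println(distributesOn(Distributivity::genKill, a, b));    // true
        System.out.println(distributesOn(Distributivity::aliasAware, a, b)); // false
    }
}
```

Splitting the in-set {a->allocA, b->allocA} across two calls loses exactly the combined fact a.f->allocX, which is the missing-fact scenario described above.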

Dynamic (tree) structure of functions as process (and implementation in Clojure)

This will all sound probably very strange as I was not able to find an exact term for what I am trying to do.
I am developing an application that, given a set of rules (which are easily translated into functions) and input/output pairs (that are not so easily translated into the code), would allow to construct a tree of the rules to be applied to the given input to reach given output.
It is similar to an expert system, except the goal is not to determine a "better" (by some quality measure) rule/function tree (actually, mapping input to output is secondary in itself) but to be able to build those trees according to some restrictions or business rules.
I am trying to do this in Clojure, but I will appreciate more general advice as well since I cannot even figure out how to call this sort of thing in short.
Going into details, supposedly I have a large flat map of details.
I have a long list of functions that I transformed to do almost the same thing: each function takes this large flat map of details and applies its rule to whatever value(s) the rule concerns. Each function has side effects (it logs what it does) and, supposedly, a single boolean output that is used by the (to-be-built) tree structure to determine which branch to go into (if the tree splits on this function).
The idea is that I can take one function and declare it as a root of the tree. Then take either one another function and declare it to be next function to do, or take two functions and declare them as two next branches from the root depending on the root function output. And so on and so forth.
And I need to be able to do it as many times as I want, producing a tree that fits some requirements.
I will be doing all of the logic, but I need a structure that would allow me to apply the tree of functions (that I can even construct myself as long as I only need to specify it as something as simple as a list) to the given input map, without having to manually code the whole process tree for every tree I will be trying to do.
Real life example would be a large tree-like data structure (input that we can flat down easily) that every client can want to be described (side effect of functions) according to his own set of rules when it is processed (reaches output).
Does this "procedure" have a more common name than this long description?
Are there any functionalities within Java/Clojure that can be used for it, or should I try doing it myself?
For those who know Clojure, I basically need a variation of one of the (->) family that can take a tree of functions, like:
(tree-> input-buffer side-effect-buffer output-buffer
        (f1 (f2 f4 (f5 f7)) (f3 f6)))
Edit below: adding examples:
This is just one part of a more general solution I am looking for:
A (mini)game that is based around alchemy (more generally, a mix of real chemistry and alchemy).
In this case, input is grouped measurable/observable characteristics of a concoction, for example:
(def concoction
  {:observable {:color {:r 50 :g 50 :b 50} :opacity 50}
   :measurable {:acidity 50 :density 50 :fluidity 50 :composition "TO DO"}
   :testable   {:duration 10 :known-effects [list of maps] :safety 10 :after "TO DO"}})
Output is a vector of maps each of which is similar to:
{:ingredient "ingredient-a" :amount 20 :application {:type "mix" :order 0}}
The (stand-alone) functions in general consist of these parts:
Get one (or more) characteristics of the concoction.
Apply some restricted logic to the chosen characteristics (few entries from table of individual effects of ingredient on the resulting concoction, table of application types or huge table of combined effects of two or more ingredients).
Log processed characteristics into shared log/info output.
Append result of application of the logic to the output.
Return boolean (for now, it will be int later) that signals what level of success this step had in terms of producing output.
I changed the logic around a bit, so now I have only one function that applies a given piece of logic to the input (instead of an almost infinite number of similar functions), similar to:
(defn apply-logic [input logic-to-apply]
  (let [chalist (apply (:input logic-to-apply) input)
        out     (apply (:logic logic-to-apply) chalist)]
    (info-out (apply (:format logic-to-apply) out))
    (return-grade out chalist)))
;; info-out uses info-output and output variables set with let somewhere outside
Then I would have a tree of logic to apply instead of functions:
(def tree-1112 '(logic1
                 (logic2
                  (logic3
                   (logic4 logic5)))
                 (logic6
                  (logic7)
                  (logic8
                   (logic9)))))
And some sort of apply-tree-logic:
(defn apply-tree-logic [some-tree input]
  (if (apply-logic input (take-root some-tree))
    (apply-tree-logic (take-branch first some-tree) input)
    (apply-tree-logic (take-branch last some-tree) input)))
Practically speaking, if I could do exactly what I showed in these examples, it would be pretty close to implementing it all myself. But then it would take me ages to optimize all of this.
It sounds like what you are trying to do is similar in some respects to Plumbing.
From the Plumbing README (describing its stats-graph example): "A Graph is just a map from keywords to keyword functions. In this case, stats-graph represents the steps in taking a sequence of numbers (xs) and producing univariate statistics on those numbers (i.e., the mean m and the variance v). The names of arguments to each fnk can refer to other steps that must happen before the step executes. For instance, in the above, to execute :v, you must first execute the :m and :m2 steps (mean and mean-square respectively)."
As far as I understood, you want to find all the paths in a graph from an input node to an output node, where every node of the graph is some value and every edge is a function application, and to make a tree of them.
Here is some sketchy (and partial) solution for that:
let's say we want to get a list of arithmetic ops to make one number from another. We have functions description: a collection of pairs predicate to applicable functions. Predicate checks, if corresponding functions are applicable to some input:
(def fns
  [[zero?             {:add3 #(+ 3 %)}]
   [#(== % 1)         {:add2 #(+ 2 %) :sub10 #(- 10 %)}]
   [even?             {:mul3 #(* 3 %) :add2 #(+ 2 %) :add1 inc}]
   [#(> % 50)         {:sub49 #(- % 49)}]
   [(constantly true) {:add1 inc}]])

(defn applicable-fns [fns input]
  (some (fn [[pred f-map]]
          (when (pred input)
            f-map))
        fns))
In the REPL:
(applicable-fns fns 1)
;; {:add2  #function[reactive-clj.core/fn--21334],
;;  :sub10 #function[reactive-clj.core/fn--21336]}
As we can't look through all the numbers, let's just limit our domain to numbers from -100 to 100:
(defn in-domain? [i] (<= -100 i 100))
Now to the function: Clojure has a nice mechanism for traversing tree-like nested structures: zippers.
Here is an example of a function that computes the chain of functions from input to output:
(require '[clojure.zip :as z])

(defn get-path [input output next-fns domain-pred]
  (loop [visited #{}
         curr (z/zipper identity
                        #(map (fn [[k v]] [k (v (second %))])
                              (next-fns (second %)))
                        (constantly nil)
                        [nil input])]
    (let [curr-out (-> curr z/node second)]
      (cond (z/end? curr) nil

            (or (visited curr-out) (not (domain-pred curr-out)))
            (recur (conj visited curr) (-> curr z/remove z/next))

            (= output curr-out) (conj (z/path curr) (z/node curr))

            :else (recur (conj visited curr-out)
                         (z/next curr))))))
It's quite a simple one (easier to understand when you see input and output):
(get-path 1 21 (partial applicable-fns fns) in-domain?)
;; => [[nil 1] [:add2 3] [:add1 4] [:mul3 12] [:add2 14]
;;     [:add2 16] [:add2 18] [:add2 20] [:add1 21]]

(get-path 51 29 (partial applicable-fns fns) in-domain?)
;; => [[nil 51] [:sub49 2] [:mul3 6] [:mul3 18] [:add2 20]
;;     [:add2 22] [:add2 24] [:add2 26] [:add2 28] [:add1 29]]
So these pairs are the result of a depth-first search for the path. It's not the shortest one, but the first one that was valid. You can read it as (-> 1 add2 add1 mul3 ... add1) => 21.
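If it helps to see the same idea without zippers, here is a rough Java transcription of the search (a sketch; the op table mirrors the fns vector above, and all names are my own): a depth-first search with a visited set finds some valid chain of ops within the domain, not necessarily the shortest.

```java
import java.util.*;

public class PathSearch {
    // Ops applicable to a value v, mirroring the first matching row of the
    // fns table above (zero?, = 1, even?, > 50, else).
    static List<Map.Entry<String, Integer>> applicable(int v) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        if (v == 0) {
            out.add(Map.entry("add3", v + 3));
        } else if (v == 1) {
            out.add(Map.entry("add2", v + 2));
            out.add(Map.entry("sub10", 10 - v));
        } else if (v % 2 == 0) {
            out.add(Map.entry("mul3", v * 3));
            out.add(Map.entry("add2", v + 2));
            out.add(Map.entry("add1", v + 1));
        } else if (v > 50) {
            out.add(Map.entry("sub49", v - 49));
        } else {
            out.add(Map.entry("add1", v + 1));
        }
        return out;
    }

    // Depth-first search from input to output within [-100, 100], cutting
    // cycles with a visited set. Returns the op names of some valid chain,
    // or null if none is found.
    static List<String> getPath(int input, int output,
                                Set<Integer> visited, List<String> path) {
        if (input == output) return path;
        if (input < -100 || input > 100 || !visited.add(input)) return null;
        for (Map.Entry<String, Integer> op : applicable(input)) {
            List<String> extended = new ArrayList<>(path);
            extended.add(op.getKey());
            List<String> found = getPath(op.getValue(), output, visited, extended);
            if (found != null) return found;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(getPath(1, 21, new HashSet<>(), new ArrayList<>()));
    }
}
```

The exact path found depends on the order in which ops are tried, just as the zipper version depends on map iteration order, so only its validity, not its shape, is guaranteed.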
If you really need all the paths from input to output, you should read up on algorithms for graphs with cycles (which can be a really non-trivial task). But the most interesting question for me is: why do you need all the paths, and do you really need them? What is your final goal? How will you use this tree of functions?

Converting Java collections to Clojure data structures

I am creating a Clojure interface to a Java API with a method that returns a java.util.LinkedHashSet.
Firstly, is the idiomatic Clojure way of handling this to convert the LinkedHashSet to a Clojure data structure?
Secondly, what is the best method for converting Java collections into Clojure data structures?
There are lots of options, since Clojure plays very nicely with Java collections. It depends on exactly what data structure you want to use in Clojure.
Here's some examples:
;; create a HashSet
(def a (java.util.HashSet.))
(dotimes [i 10] (.add a i))
;; Get all the values as a sequence
(seq a)
=> (0 1 2 3 4 5 6 7 8 9)
;; build a new HashSet containing the values from a
(into #{} a)
=> #{0 1 2 3 4 5 6 7 8 9}
;; Just use the HashSet directly (high performance, no copy required)
(.contains a 1)
=> true
(.contains a 100)
=> false
Regarding when to use each of these, I'd suggest the following advice:
If you are trying to wrap a Java library and present a clean Clojure API, then I'd suggest converting to the equivalent Clojure data structures. This is what Clojure users will expect, and you can hide the potentially messy Java interop details. As a bonus, this will make things immutable so that you don't run the risk of Java collections mutating while you use them.
If you just want to use the Java API quickly and efficiently, just use Java interop directly on the Java collections.
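The same copy-versus-share trade-off can be seen on the Java side, which may make the advice above more tangible: copying snapshots the data (like converting with into), while Collections.unmodifiableSet gives a cheap read-only view that still tracks later mutations of the underlying collection. A small sketch:

```java
import java.util.*;

public class CopyVsView {
    public static void main(String[] args) {
        Set<String> original = new LinkedHashSet<>(Arrays.asList("a", "b"));

        // A copy is a snapshot: later mutations of the original don't show.
        Set<String> copy = new LinkedHashSet<>(original);

        // An unmodifiable view is cheap (no copy) but still reflects
        // changes made through the underlying collection.
        Set<String> view = Collections.unmodifiableSet(original);

        original.add("c");

        System.out.println(copy.contains("c")); // false: snapshot
        System.out.println(view.contains("c")); // true: live view
    }
}
```

Clojure's immutable structures behave like the snapshot, which is exactly why converting at the API boundary avoids the mutating-collection risk mentioned above.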
The idiomatic way to convert Java collections to Clojure is to use the seq function, which is already called by most functions operating on sequences.
user> (def s (java.util.LinkedHashSet.))
#'user/s
user> (seq s)
nil
user> (.add s "foo")
true
user> (seq s)
("foo")
I honestly don't know if there's a universally accepted practice, but here's Chris Houser arguing against Java-to-Clojure adapters, since you break compatibility with the original Java API.
To perform the translation you asked for, simply use into:
user=> (import java.util.LinkedHashSet)
java.util.LinkedHashSet
user=> (def x (LinkedHashSet.))
#'user/x
user=> (.add x "test")
true
user=> (def y (into #{} x))
#'user/y
user=> y
#{"test"}

Does Common Lisp have something like Java's Set interface/implementing classes?

I need something like this: a collection of elements which contains no duplicates of any element. Does Common Lisp, specifically SBCL, have anything like this?
For a quick solution, just use hash tables, as has been mentioned before.
However, if you prefer a more principled approach, you can take a look at FSet, which is “a functional set-theoretic collections library”. Among others, it contains classes and operations for sets and bags.
(EDIT:) The cleanest way would probably be to define your set-oriented operations as generic functions. A set of generic functions is basically equivalent to a Java interface, after all. You can simply implement methods on the standard HASH-TABLE class as a first prototype and allow other implementations as well.
Look at cl-containers. There is a set-container class.
You could use lists, though they can prove to be inefficient for representing large sets. This is done using ADJOIN or PUSHNEW to add a new element to a list, and DELETE or REMOVE to do the opposite.
(let ((set (list)))
  (pushnew 11 set)
  (pushnew 42 set)
  (pushnew 11 set)
  (print set)                ; set={42,11}
  (setq set (delete 42 set))
  (print set))               ; set={11}
One thing to watch out for is that all these operators use EQL by default to test for potential duplicates in the set (much as Java uses the equals method). That's OK for sets holding numbers or characters, but for sets of other objects, a `deeper' equality test such as EQUAL should be specified as a :TEST keyword parameter, e.g. for a set of strings:
(let ((set (list)))
  (pushnew "foo" set :test #'equal)
  (pushnew "bar" set :test #'equal)
  (pushnew "foo" set :test #'equal) ; EQUAL decides that "foo"="foo"
  (print set))                      ; set={"bar","foo"}
Lisp's counterparts to some of Java's Set operations are:
addAll -> UNION or NUNION
containsAll -> SUBSETP
removeAll -> SET-DIFFERENCE or NSET-DIFFERENCE
retainAll -> INTERSECTION or NINTERSECTION
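For comparison, here is roughly what those four operations look like on a java.util.Set (a sketch; note that addAll, removeAll and retainAll mutate the receiver, which is why the code copies first, much as the non-destructive Lisp versions return fresh lists):

```java
import java.util.*;

public class SetOps {
    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new HashSet<>(Arrays.asList(2, 3, 4));

        // addAll ~ UNION (in-place, like NUNION)
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);                       // {1, 2, 3, 4}

        // containsAll ~ SUBSETP
        boolean subset = a.containsAll(Arrays.asList(1, 2)); // true

        // removeAll ~ SET-DIFFERENCE (in-place, like NSET-DIFFERENCE)
        Set<Integer> diff = new HashSet<>(a);
        diff.removeAll(b);                     // {1}

        // retainAll ~ INTERSECTION (in-place, like NINTERSECTION)
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);                    // {2, 3}

        System.out.println(union + " " + subset + " " + diff + " " + inter);
    }
}
```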
Yes, it has sets. See this section on "Sets" from Practical Common Lisp.
Basically, you can create a set with pushnew and adjoin, query it with member, member-if and member-if-not, and combine it with other sets with functions like intersection, union, set-difference, set-exclusive-or and subsetp.
Easily solvable using a hash table.
(let ((h (make-hash-table :test 'equalp))) ; if you're storing symbols
  (loop for i from 0 upto 20
        do (setf (gethash i h) (format nil "Value ~A" i)))
  (loop for i from 10 upto 30
        do (setf (gethash i h) (format nil "~A eulaV" i)))
  (loop for k being the hash-keys of h using (hash-value v)
        do (format t "~A => ~A~%" k v)))
outputs
0 => Value 0
1 => Value 1
...
9 => Value 9
10 => 10 eulaV
11 => 11 eulaV
...
29 => 29 eulaV
30 => 30 eulaV
Not that I'm aware of, but you can use hash tables for something quite similar.
Lisp hash tables are a built-in datatype; see the HyperSpec for the details.
Personally, I would just implement a function which takes a list and returns a unique set. I've put something together which works for me:
(defun make-set (list-in &optional (list-out '()))
  (if (endp list-in)
      (nreverse list-out)
      (make-set (cdr list-in)
                (adjoin (car list-in) list-out :test 'equal))))
Basically, the adjoin function prepends an item to a list non-destructively if and only if the item is not already present in the list, accepting an optional test function (one of the Common Lisp "equal" functions). You can also use pushnew to do so destructively, but I find the tail-recursive implementation to be far more elegant. So, Lisp does export several basic functions that allow you to use a list as a set; no built-in datatype is needed because you can just use different functions for prepending things to a list.
My data source for all of this (not the function, but the info) has been a combination of the Common Lisp HyperSpec and Common Lisp the Language (2nd Edition).
