This will probably all sound very strange, as I was not able to find an exact term for what I am trying to do.
I am developing an application that, given a set of rules (which are easily translated into functions) and input/output pairs (which are not so easily translated into code), would allow me to construct a tree of rules to be applied to the given input to reach the given output.
It is similar to an expert system, except the goal is not to determine the "best" (by some quality measure) rule/function tree (actually, mapping input to output is secondary in itself), but to be able to build those trees according to some restrictions or business rules.
I am trying to do this in Clojure, but I would appreciate more general advice as well, since I cannot even figure out a short name for this sort of thing.
Going into details: suppose I have a large flat map of details.
I have a long list of functions that I transformed to all do almost the same thing: each function takes this large flat map of details and applies its rule to whatever value(s) that rule concerns. Each function has side effects (it logs what it does) and, presumably, a single boolean output that is used by the (to-be-built) tree structure to determine which branch to go into (if the tree splits on this function).
The idea is that I can take one function and declare it the root of the tree. Then I either take one other function and declare it the next function to apply, or take two functions and declare them as the two branches from the root, chosen depending on the root function's output. And so on and so forth.
And I need to be able to do it as many times as I want, producing a tree that fits some requirements.
I will be writing all of the logic, but I need a structure that would let me apply a tree of functions to the given input map (a tree I could even construct myself, as long as I only need to specify it as something as simple as a list), without having to manually code the whole process for every tree I will be trying out.
A real-life example would be a large tree-like data structure (input that we can easily flatten) that every client may want described (the side effect of the functions) according to their own set of rules when it is processed (reaches output).
Does this "procedure" have a more common name than this long description?
Are there any facilities within Java/Clojure that can be used for this, or should I try doing it myself?
For those who know Clojure, I basically need a variation on the (->) family that can take a tree of functions, like
(tree-> input-buffer side-effect-buffer output-buffer
        (f1 (f2 f4 (f5 f7)) (f3 f6)))
Edit: adding examples below.
This is just one part of a more general solution I am looking for:
A (mini)game that is based around alchemy (more generally, a mix of real chemistry and alchemy).
In this case, input is grouped measurable/observable characteristics of a concoction, for example:
(def concoction
  {:observable {:color {:r 50 :g 50 :b 50} :opacity 50}
   :measurable {:acidity 50 :density 50 :fluidity 50 :composition "TO DO"}
   :testable {:duration 10 :known-effects [list of maps] :safety 10 :after "TO DO"}})
Output is a vector of maps each of which is similar to:
{:ingredient "ingredient-a" :amount 20 :application {:type "mix" :order 0}}
The (stand-alone) functions in general consist of several parts:
Get one (or more) characteristics of the concoction.
Apply some restricted logic to the chosen characteristics (few entries from table of individual effects of ingredient on the resulting concoction, table of application types or huge table of combined effects of two or more ingredients).
Log processed characteristics into shared log/info output.
Append result of application of the logic to the output.
Return a boolean (for now; it will be an int later) that signals what level of success this step had in terms of producing output.
I changed the logic around a bit, so now I have only one function that applies a given piece of logic to the input (instead of having an almost infinite number of similar functions), similar to:
(defn apply-logic [input logic-to-apply]
  (let [chalist (apply (:input logic-to-apply) input)
        out (apply (:logic logic-to-apply) chalist)]
    (info-out (apply (:format logic-to-apply) out))
    (return-grade out chalist)))
; info-out uses info-output and output variables set with let somewhere outside
Then I would have a tree of logic to apply instead of functions:
(def tree-1112 '(logic1
                 (logic2
                  (logic3
                   (logic4 logic5)))
                 (logic6
                  (logic7)
                  (logic8
                   (logic9)))))
And some sort of apply-tree-logic:
(defn apply-tree-logic [some-tree input]
  (if (apply-logic input (take-root some-tree))
    (apply-tree-logic (take-branch first some-tree) input)
    (apply-tree-logic (take-branch last some-tree) input)))
Practically speaking, if I could do exactly what I showed in these examples, it would be pretty close to implementing it all myself.
But then it would take me ages to optimize all of this.
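For what it's worth, the branching idea from the question can be sketched in a handful of lines. Here is a minimal Python sketch (hypothetical names; it assumes each tree node is a (function, success-branch, failure-branch) triple, and each function logs as a side effect and returns a boolean):

```python
def apply_tree(tree, data):
    # tree is (fn, on_true, on_false); leaves have None branches.
    # Each fn takes the flat map of details and returns a boolean.
    fn, on_true, on_false = tree
    branch = on_true if fn(data) else on_false
    if branch is not None:
        return apply_tree(branch, data)
    return data

log = []  # shared side-effect buffer

def check_acidity(d):
    log.append('acidity checked: %s' % d['acidity'])  # side effect: logging
    return d['acidity'] > 40

def dilute(d):
    log.append('diluting')
    d['acidity'] -= 20
    return True

def concentrate(d):
    log.append('concentrating')
    d['acidity'] += 20
    return True

tree = (check_acidity, (dilute, None, None), (concentrate, None, None))
result = apply_tree(tree, {'acidity': 50})
assert result['acidity'] == 30
assert log == ['acidity checked: 50', 'diluting']
```

The tree itself stays plain data (nested tuples), so it can be built and rearranged at run time without hand-coding each process tree.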
It sounds like what you are trying to do is similar in some respects to Plumbing.
A Graph is just a map from keywords to keyword functions. The stats-graph example in the Plumbing README represents the steps in taking a sequence of numbers (xs) and producing univariate statistics on those numbers (i.e., the mean m and the variance v). The names of the arguments to each fnk can refer to other steps that must happen before the step executes. For instance, to execute :v, you must first execute the :m and :m2 steps (mean and mean-square respectively).
As far as I understood, you want to find all the paths in a graph from an input node to an output node, where every node of the graph is some value and every edge is a function application, and to make a tree of them.
Here is some sketchy (and partial) solution for that:
Let's say we want to get a list of arithmetic ops that make one number from another. We have a function description: a collection of pairs of predicate and applicable functions. The predicate checks whether the corresponding functions are applicable to some input:
(def fns
  [[zero? {:add3 #(+ 3 %)}]
   [#(== % 1) {:add2 #(+ 2 %) :sub10 #(- 10 %)}]
   [even? {:mul3 #(* 3 %) :add2 #(+ 2 %) :add1 inc}]
   [#(> % 50) {:sub49 #(- % 49)}]
   [(constantly true) {:add1 inc}]])
(defn applicable-fns [fns input]
  (some (fn [[pred f-map]]
          (when (pred input)
            f-map))
        fns))
In the REPL:
(applicable-fns fns 1)
;; {:add2 #function[reactive-clj.core/fn--21334],
;;  :sub10 #function[reactive-clj.core/fn--21336]}
As we can't look through all the numbers, let's just limit our domain to numbers from -100 to 100:
(defn in-domain? [i] (<= -100 i 100))
Now to the function: Clojure has a nice mechanism for traversing tree-like nested structures: zippers.
Here is an example of a function that computes the chain of functions from input to output:
(require '[clojure.zip :as z])

(defn get-path [input output next-fns domain-pred]
  (loop [visited #{}
         curr (z/zipper identity
                        #(map (fn [[k v]] [k (v (second %))])
                              (next-fns (second %)))
                        (constantly nil)
                        [nil input])]
    (let [curr-out (-> curr z/node second)]
      (cond (z/end? curr) nil
            (or (visited curr-out) (not (domain-pred curr-out)))
            (recur (conj visited curr) (-> curr z/remove z/next))
            (= output curr-out) (conj (z/path curr) (z/node curr))
            :else (recur (conj visited curr-out)
                         (z/next curr))))))
It's quite a simple one (easier to understand once you see the input and output):
(get-path 1 21 (partial applicable-fns fns) in-domain?)
;; => [[nil 1] [:add2 3] [:add1 4] [:mul3 12] [:add2 14]
;;     [:add2 16] [:add2 18] [:add2 20] [:add1 21]]
(get-path 51 29 (partial applicable-fns fns) in-domain?)
;; => [[nil 51] [:sub49 2] [:mul3 6] [:mul3 18] [:add2 20]
;;     [:add2 22] [:add2 24] [:add2 26] [:add2 28] [:add1 29]]
So these pairs are the result of a depth-first search for the path. It's not the shortest one, but the first one that was valid. You can read it as (-> 1 add2 add1 mul3 ... add1) => 21.
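For readers unfamiliar with zippers, the same depth-first search can be sketched in plain Python (hypothetical names mirroring applicable-fns and in-domain?; this reproduces the search idea, not the zipper mechanics):

```python
def applicable_fns(fns, value):
    # fns: list of (predicate, {name: fn}) pairs; first matching predicate wins.
    for pred, f_map in fns:
        if pred(value):
            return f_map
    return {}

def get_path(start, goal, fns, in_domain, path=None, visited=None):
    # Depth-first search over values; returns the first chain of
    # (op-name, value) pairs that reaches the goal, or None.
    if path is None:
        path = [(None, start)]
    if visited is None:
        visited = set()
    value = path[-1][1]
    if value == goal:
        return path
    if value in visited or not in_domain(value):
        return None
    visited.add(value)
    for name, fn in applicable_fns(fns, value).items():
        result = get_path(start, goal, fns, in_domain,
                          path + [(name, fn(value))], visited)
        if result is not None:
            return result
    return None

fns = [
    (lambda x: x == 0, {'add3': lambda x: x + 3}),
    (lambda x: x == 1, {'add2': lambda x: x + 2, 'sub10': lambda x: 10 - x}),
    (lambda x: x % 2 == 0, {'mul3': lambda x: x * 3,
                            'add2': lambda x: x + 2,
                            'add1': lambda x: x + 1}),
    (lambda x: x > 50, {'sub49': lambda x: x - 49}),
    (lambda x: True, {'add1': lambda x: x + 1}),
]

path = get_path(1, 21, fns, lambda i: -100 <= i <= 100)
assert path[0] == (None, 1) and path[-1][1] == 21
```

As with the zipper version, the result is the first valid path found, not the shortest one.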
If you really need all the paths from input to output, you should read up on algorithms for graphs with cycles (which can be a really non-trivial task). But the most interesting question for me is: why do you need all the paths, and do you really need them? What is your final goal? How will you use this function tree?
I'm doing an inter-procedural analysis project in Java at the moment, and I'm looking into using an IFDS solver to compute the control flow graph of a program. I'm finding it hard to follow the maths involved in the description of the IFDS framework and graph reachability. I've read in several places that it's not possible to compute the points-to sets of a program using this solver because "pointer analysis is known to be a non-distributive problem." [1] Other sources have said that this often specifically concerns 'strong updates', which from what I can gather are field write statements.
I think I can basically follow how the solver computes edges and works out the dataflow facts. But I don't quite follow what f(A ∪ B) = f(A) ∪ f(B) means in practical terms as a definition of a distributive function, and therefore what it means to say that points-to analysis deals with non-distributive functions.
The linked source [1] gives an example specific to field write statements:
A a = new A();
A b = a;
A c = new C();
b.f = c;
It claims that in order to reason about the assignment to b.f, one must also take into account all aliases of the base b. I can follow this. But what I don't understand is which properties of this action make it non-distributive.
A similar (I think) example from [2]:
x = y.n
Where before the statement there are points-to edges y-->obj1 and obj1.n-->obj2 (where obj1 and 2 are heap objects). They claim
it is not possible to correctly deduce that the edge x-->obj2 should be generated after the statement if we consider each input edge independently. The flow function for this statement is a function of the points-to graph as a whole and cannot be decomposed into independent functions of each edge and then merged to get a correct result.
I think I almost understand what, at least the first, example is saying, but I am not getting the concept of distributive functions, which is blocking me from getting the full picture. Can anyone explain, on a practical basis with regard to pointer analysis, what a distributive or non-distributive function is, without the set theory I am having difficulty following?
[1] http://karimali.ca/resources/pubs/conf/ecoop/SpaethNAB16.pdf
[2] http://dl.acm.org/citation.cfm?doid=2487568.2487569 (paywall, sorry)
The distributivity of a flow function is defined as f(a Π b) = f(a) Π f(b), with Π being the merge function. In IFDS, Π is defined as set union ∪.
What this means is that it doesn't matter whether you apply the merge function before or after the flow function; you will get the same result in the end.
In a traditional data-flow analysis, you go through the statements of your CFG and propagate sets of data-flow facts. So with a flow function f, for each statement you compute f(in, stmt) = out, with in and out the sets of information you want to keep. For example, for the in-set {(a, allocA), (b, allocA)} (denoting that the allocation site of objects a and b is allocA) and the statement "b.f = new X();" (whose allocation site we will name allocX), you would likely get the out-set {(a, allocA), (b, allocA), (a.f, allocX), (b.f, allocX)}, because a and b are aliased.
IFDS explodes the in-set into its individual data-flow facts. So for each fact, instead of running your flow-function once with your entire in-set, you run it on each element of the in-set: ∀ d ∈ in, f(d, stmt) = out_d. The framework then merges all out_d together into the final out-set.
The issue here is that for each flow function, you don't have access to the entire in-set, meaning that for the example we presented above, running the flow-function f((a, allocA)) on the statement would yield a first out-set {(a, allocA)}, f((b, allocA)) would yield a second out-set {(b, allocA)}, and f(0) would yield a third out-set {(0), (b.f, allocX)}.
So the global out-set after you merge the results would be {(a, allocA), (b, allocA), (b.f, allocX)}. We are missing the fact {(a.f, allocX)} because when running the flow function f(0), we only know that the in-fact is 0 and that the statement is "b.f = new X();". Because we don't know that a and b refer to the allocation site allocA, we don't know that they are aliased, and we therefore cannot know that a.f should also point to allocX after the statement.
IFDS runs on the assumption of distributiveness: merging the out-sets after running the flow-function should yield the same results as merging the in-sets before running the flow-function.
In other words, if you need to combine information from multiple elements of the in-set to create a certain data-flow fact in your out-set, then your problem is not distributive, and you should not express it in IFDS (unless you do something to handle those combination cases, as the authors of the paper you refer to as [1] did).
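The aliasing example can be made concrete with a tiny hypothetical flow function for the statement b.f = new X(); sketched in Python: it can only emit (a.f, allocX) when it sees both aliasing facts together, so merging before and merging after the flow function give different results:

```python
def flow(facts):
    # Flow function for "b.f = new X();" over points-to facts (var, alloc_site).
    out = set(facts)
    out.add(('b.f', 'allocX'))
    # a.f also points to allocX, but only if a aliases b -- a fact we can
    # only discover by seeing ('a', 'allocA') and ('b', 'allocA') together.
    if ('a', 'allocA') in facts and ('b', 'allocA') in facts:
        out.add(('a.f', 'allocX'))
    return out

A = {('a', 'allocA')}
B = {('b', 'allocA')}

merged_then_flow = flow(A | B)        # merge first, then apply f
flow_then_merged = flow(A) | flow(B)  # apply f fact-by-fact, then merge

assert ('a.f', 'allocX') in merged_then_flow
assert ('a.f', 'allocX') not in flow_then_merged  # distributivity fails
```

This is exactly the fact IFDS loses when it explodes the in-set and runs the flow function on each fact independently.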
I'm working on implementing probabilistic matching for person-record searching. As part of this, I plan to have blocking performed before any scoring is done. Currently there are a lot of good options for transforming strings so that they can be stored and then searched for, with similar strings matching each other (things like Soundex, Metaphone, etc.).
However, I've struggled to find something similar for purely numeric values. For example, it would be nice to be able to block on a social security number and not have numbers that are slightly off or have transposed digits removed from the results. 123456789 should share blocking results with 123456780 or 213456789.
Now, there are certainly ways to simply compare two numerical values to determine how similar they are, but what can I do when there are millions of numbers in the database? It's obviously impractical to compare them all (and that would certainly defeat the point of blocking).
What would be nice would be something where those three SSNs above could somehow be transformed into some other value that would be stored. Purely for example, imagine those three numbers ended up as AAABBCCC after this magical transformation. However, something like 987654321 would be ZZZYYYYXX and 123547698 would be AAABCCBC or something like that.
So, my question is, is there a good transformation for numeric values like there exists for alphabetical values? Or, is there some other approach that might make sense (besides some highly complex or low performing SQL or logic)?
The first thing to realize is that social security numbers are basically strings of digits. You really want to treat them like you would strings rather than numbers.
The second thing to realize is that your blocking function maps from a record to a list of strings that identify comparison worthy sets of items.
Here is some Python code to get you started. (I know you asked for Java, but I think the Python is clear, and you aren't paying me enough to write it in Java :P.) The basic idea is to take your input record, simulate roughing it up in multiple ways (to get your blocking keys), and then group by any match on those blocking keys.
import itertools

def transpositions(s):
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos + 1:]

def all_blocks(s):
    return itertools.chain([s], transpositions(s), substitutions(s))

def are_blocked_candidates(s1, s2):
    return bool(set(all_blocks(s1)) & set(all_blocks(s2)))

assert not are_blocked_candidates('1234', '5555')
assert are_blocked_candidates('1234', '1239')
assert are_blocked_candidates('1234', '2134')
assert not are_blocked_candidates('1234', '1255')
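To block millions of records without pairwise comparisons, the same keys can be stored in an index and grouped on. Here is a sketch continuing the code above (build_index and candidates are made-up names for illustration):

```python
import itertools
from collections import defaultdict

def transpositions(s):
    for pos in range(len(s) - 1):
        yield s[:pos] + s[pos + 1] + s[pos] + s[pos + 2:]

def substitutions(s):
    for pos in range(len(s)):
        yield s[:pos] + '*' + s[pos + 1:]

def all_blocks(s):
    return itertools.chain([s], transpositions(s), substitutions(s))

def build_index(ssns):
    # Map each blocking key to the set of SSNs that produce it.
    index = defaultdict(set)
    for ssn in ssns:
        for key in all_blocks(ssn):
            index[key].add(ssn)
    return index

def candidates(index, ssn):
    # All records sharing at least one key with the query, via lookup only.
    found = set()
    for key in all_blocks(ssn):
        found |= index.get(key, set())
    return found

index = build_index(['123456789', '123456780', '213456789', '987654321'])
assert candidates(index, '123456789') == {'123456789', '123456780', '213456789'}
```

In a database you would store the keys in an indexed column, one row per (key, record) pair, so the candidate lookup is a simple join rather than a scan.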
The title is self-explanatory. This was an interview question. In Java, List is an interface, so it has to be backed by some concrete collection.
I feel this is a trick question meant to confuse. Am I correct or not? How should I answer this question?
Assuming you don't have a copy of the original List, and the randomizing algorithm is truly random, then no, you cannot restore the original List.
The explanation is far more important on this type of question than the answer. To be able to explain it fully, you need to describe it using the mathematical definitions of Function and Map (not the Java class definitions).
A Function is a Map of elements in one domain to another domain. In our example, the first domain is the "order" in the first list, and the second domain is the "order" in the second list. Anything that can get from the first domain to the second domain, where each element in the first domain goes to only one element in the second domain, is a Function.
What they want to know is whether there is an Inverse Function: a corresponding function that can "map back" the elements from the second domain to the elements in the first domain. Some functions (squaring a number, or F(x) = x*x) cannot be reversed, because one element in the second domain might map back to multiple (or no) elements in the first domain. In the squaring example:
F(x) = x * x
F(3) = 9 or ( 3 -> 9)
F(12) = 144 or ( 12 -> 144)
F(-11) = 121 or (-11 -> 121)
F(-3) = 9 or ( -3 -> 9)
Attempting the inverse function, we need a function where:
9 maps to 3
144 maps to 12
121 maps to -11
9 maps to -3
Since 9 must map to both 3 and -3, and a Map must have only one destination for every origin, constructing an inverse function of x*x is not possible; that's why mathematicians fudge it with the square root operator and say "plus or minus".
Going back to our randomized list: if you know that the map is truly random, then you know that the output value is truly independent of the input value. Thus, if you attempted to create the inverse function, you would run into the same dilemma. Knowing that the function is random tells you that the input cannot be calculated from the output; so even though you "know" the function, you cannot make any assumptions about the input even if you have the output.
Unless it is pseudo-random (it only appears to be random) and you can gather enough information to reverse the now-not-truly-random function.
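As a concrete illustration of the pseudo-random case: if you know the shuffle's seed, you can replay the same permutation and invert it. A Python sketch (shuffle_with_seed and unshuffle are hypothetical helpers, not library functions):

```python
import random

def shuffle_with_seed(items, seed):
    # A deterministic shuffle: the same seed always yields the same permutation.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled

def unshuffle(shuffled, seed):
    # Replay the same permutation on the indices, then invert it.
    n = len(shuffled)
    perm = shuffle_with_seed(list(range(n)), seed)
    original = [None] * n
    for new_pos, old_pos in enumerate(perm):
        original[old_pos] = shuffled[new_pos]
    return original

data = [10, 20, 30, 40, 50]
scrambled = shuffle_with_seed(data, seed=42)
assert unshuffle(scrambled, seed=42) == data
```

With a truly random (or unknown-seed) shuffle, no such replay is possible, which is the whole point of the answer above.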
If you have not kept some external order information (this includes things like JVM trickery with ghost copies), and the items are not implicitly ordered, you cannot recover the original ordering.
When information is lost, it is lost. If the structure of the list is the only place recording the order you want, and you disturb that order, it's gone for good.
There's a user's view, and there's internals. There's the question as understood and the question as can be interpreted.
The user's view is that list items are blocks of memory, and that the pointer to the next item is a set of (4? 8? they keep changing the numbers) bytes inside this memory. So when the list is randomized and the pointer to the next item is changed, that area of memory is overwritten and can't be recovered.
The question as understood is that you are given a list after it had been randomized.
Internals: I'm not a Java or OS guy, but you should look into situations where the way the process actually executes differs from the naive view. Maybe Java randomizes lists by copying all the cells, so the old list is still kept in memory somewhere? Maybe it keeps backup values of pointers? Maybe the pointers are kept in an external table, separate from the list, and can be reconstructed? Maybe. Internals.
Understanding: who says you don't have access to the list before it was randomized? You could have just printed it out! Or maybe you have a trace of the execution? Or who said you're using Java's built-in list? Maybe you are using your own version-controlled list? Or maybe you're using your own reversible-randomize method?
Edwin Buck's answer is great but it all depends what the interviewer was looking for.
I'm trying to use Clojure to dynamically generate functions that can be applied to large volumes of data - i.e. a requirement is that the functions be compiled to bytecode in order to execute fast, but their specification is not known until run time.
e.g. suppose I specify functions with a simple DSL like:
(def my-spec [:add [:multiply 2 :param0] 3])
I would like to create a function compile-spec such that:
(compile-spec my-spec)
Would return a compiled function of one parameter x that returns 2x+3.
What is the best way to do this in Clojure?
Hamza Yerlikaya has already made the most important point, which is that Clojure code is always compiled. I'm just adding an illustration and some information on low-hanging fruit for your optimisation efforts.
Firstly, the above point about Clojure code always being compiled includes closures returned by higher-order functions, functions created by calling eval on fn / fn* forms, and indeed anything else that can act as a Clojure function. Thus you don't need a separate DSL for describing functions; just use higher-order functions (and possibly macros):
(defn make-affine-function [a b]
  (fn [x] (+ (* a x) b)))
((make-affine-function 31 47) 5)
; => 202
Things would be more interesting if your specs were to include information about the types of parameters, as then you could be interested in writing a macro to generate code using those type hints. The simplest example I can think of would be a variant of the above:
(defmacro make-primitive-affine-function [t a b]
  (let [cast #(list (symbol (name t)) %)
        x (gensym "x")]
    `(fn [~x] (+ (* ~(cast a) ~(cast x)) ~(cast b)))))
((make-primitive-affine-function :int 31 47) 5)
; => 202
Use :int, :long, :float or :double (or the non-namespace-qualified symbols of corresponding names) as the first argument to take advantage of unboxed primitive arithmetic appropriate for your argument types. Depending on what your function's doing, this may give you a very significant performance boost.
Other types of hints are normally provided with the #^Foo bar syntax (^Foo bar does the same thing in 1.2); if you want to add them to macro-generated code, investigate the with-meta function (you'll need to merge '{:tag Foo} into the metadata of the symbols representing the formal arguments to your functions or let-introduced locals that you wish to put type hints on).
Oh, and in case you'd still like to know how to implement your original idea...
You can always construct the Clojure expression defining your function -- (list 'fn ['x] (a-magic-function-to-generate-some-code some-args ...)) -- and call eval on the result. That enables you to do something like the following (it would be simpler to require that the spec include the parameter list, but here's a version assuming the arguments are to be fished out of the spec, are all called paramFOO, and are to be sorted lexicographically):
(require '[clojure.walk :as walk])
(defn compile-spec [spec]
  (let [params (atom #{})]
    (walk/prewalk
     (fn [item]
       (if (and (symbol? item) (.startsWith (name item) "param"))
         (do (swap! params conj item)
             item)
         item))
     spec)
    (eval `(fn [~@(sort @params)] ~@spec))))
(def my-spec '[(+ (* 31 param0) 47)])
((compile-spec my-spec) 5)
; => 202
The vast majority of the time, there is no good reason to do things this way and it should be avoided; use higher-order functions and macros instead. However, if you're doing something like, say, evolutionary programming, then it's there, providing the ultimate flexibility -- and the result is still a compiled function.
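As an aside, the construct-an-expression-then-compile idea is not Clojure-specific. Here is a rough Python analogue of compile-spec (hypothetical sketch: it assumes a single param0 parameter and only :add/:multiply operators, mirroring the question's DSL):

```python
OPS = {':add': '+', ':multiply': '*'}

def spec_to_expr(spec):
    # Recursively turn [':add', [':multiply', 2, ':param0'], 3]
    # into the source string "((2 * param0) + 3)".
    if isinstance(spec, list):
        op, *args = spec
        return '(' + (' %s ' % OPS[op]).join(spec_to_expr(a) for a in args) + ')'
    if isinstance(spec, str) and spec.startswith(':param'):
        return spec[1:]  # ':param0' -> 'param0'
    return repr(spec)

def compile_spec(spec):
    # compile() produces real bytecode; eval() of the code object yields
    # the resulting function, just like eval of an fn form in Clojure.
    source = 'lambda param0: ' + spec_to_expr(spec)
    return eval(compile(source, '<spec>', 'eval'))

my_spec = [':add', [':multiply', 2, ':param0'], 3]
f = compile_spec(my_spec)
assert f(5) == 13  # 2*5 + 3
```

The same caveat applies as in the Clojure answer: closures over ordinary higher-order functions are usually the better tool, with eval reserved for cases like evolutionary programming where the spec genuinely only exists at run time.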
Even if you don't AOT compile your code, as soon as you define a function it gets compiled to bytecode on the fly.
I need something like this: a collection of elements which contains no duplicates of any element. Does Common Lisp, specifically SBCL, have anything like this?
For a quick solution, just use hash tables, as has been mentioned before.
However, if you prefer a more principled approach, you can take a look at FSet, which is “a functional set-theoretic collections library”. Among others, it contains classes and operations for sets and bags.
(EDIT:) The cleanest way would probably be to define your set-oriented operations as generic functions. A set of generic functions is basically equivalent to a Java interface, after all. You can simply implement methods on the standard HASH-TABLE class as a first prototype and allow other implementations as well.
Look at cl-containers. There is a set-container class.
You could use lists, though they can prove to be inefficient for representing large sets. This is done using ADJOIN or PUSHNEW to add a new element to a list, and DELETE or REMOVE to do the opposite.
(let ((set (list)))
  (pushnew 11 set)
  (pushnew 42 set)
  (pushnew 11 set)
  (print set)              ; set={42,11}
  (setq set (delete 42 set))
  (print set))             ; set={11}
One thing to watch out for is that all these operators use EQL by default to test for potential duplicates in the set (much as Java uses the equals method). That's OK for sets holding numbers or characters, but for sets of other objects, a "deeper" equality test such as EQUAL should be specified as a :TEST keyword parameter, e.g. for a set of strings:
(let ((set (list)))
  (pushnew "foo" set :test #'equal)
  (pushnew "bar" set :test #'equal)
  (pushnew "foo" set :test #'equal) ; EQUAL decides that "foo"="foo"
  (print set))                      ; set={"bar","foo"}
Lisp's counterparts to some of Java's Set operations are:
addAll -> UNION or NUNION
containsAll -> SUBSETP
removeAll -> SET-DIFFERENCE or NSET-DIFFERENCE
retainAll -> INTERSECTION or NINTERSECTION
Yes, it has sets. See this section on "Sets" from Practical Common Lisp.
Basically, you can create a set with pushnew and adjoin, query it with member, member-if and member-if-not, and combine it with other sets with functions like intersection, union, set-difference, set-exclusive-or and subsetp.
Easily solvable using a hash table.
(let ((h (make-hash-table :test 'equalp))) ; if you're storing symbols
  (loop for i from 0 upto 20
        do (setf (gethash i h) (format nil "Value ~A" i)))
  (loop for i from 10 upto 30
        do (setf (gethash i h) (format nil "~A eulaV" i)))
  (loop for k being the hash-keys of h using (hash-value v)
        do (format t "~A => ~A~%" k v)))
outputs
0 => Value 0
1 => Value 1
...
9 => Value 9
10 => 10 eulaV
11 => 11 eulaV
...
29 => 29 eulaV
30 => 30 eulaV
Not that I'm aware of, but you can use hash tables for something quite similar.
Lisp hash tables are CLOS-based; see the HyperSpec for details.
Personally, I would just implement a function which takes a list and returns a unique set. I've drafted something that works for me:
(defun make-set (list-in &optional (list-out '()))
  (if (endp list-in)
      (nreverse list-out)
      (make-set
       (cdr list-in)
       (adjoin (car list-in) list-out :test 'equal))))
Basically, the adjoin function prepends an item to a list non-destructively if and only if the item is not already present in the list, accepting an optional test function (one of the Common Lisp "equal" functions). You can also use pushnew to do this destructively, but I find the tail-recursive implementation far more elegant. So Lisp does export several basic functions that let you use a list as a set; no built-in datatype is needed, because you can just use different functions for prepending things to a list.
My data source for all of this (not the function, but the info) has been a combination of the Common Lisp HyperSpec and Common Lisp the Language (2nd Edition).