I am trying to call the .toUpperCase method inside a doto macro as follows, but doto returns the lowercase string:
(doto (java.lang.String. "clojure")
(.toUpperCase))
This returns "clojure". When I macroexpand the form, I see that the return value is the object that was created:
(clojure.core/let [G__7359 (java.lang.String. "clojure")] (.toUpperCase G__7359) G__7359)
But why don't I get the uppercased answer?
doto is part of Clojure's Java interop features. It is designed to make it possible to write Java without soooo darn many parens. So
Foo foo = new Foo();
foo.setX(); foo.setY(); foo.makeFactory(); foo.applyPhaseOfMoon();
where the method calls alone take 8 parens, becomes:
(doto foo .setX .setY .makeFactory .applyPhaseOfMoon)
which has a total of two.
In this case, if we dig into the expansion of your example:
user> (doto "hi" .toUpperCase)
"hi"
expands to:
user> (macroexpand-1 '(doto "hi" .toUpperCase))
(clojure.core/let [G__110453 "hi"]
(.toUpperCase G__110453)
G__110453)
where the second line does this:
user> (.toUpperCase "hi")
"HI"
and then throws the answer away and returns the saved value from the start. I personally never see doto used in practice outside of places where people are translating Java to Clojure in order to call some API.
From the documentation:
Evaluates x then calls all of the methods and functions with the value
of x supplied at the front of the given arguments. The forms are
evaluated in order. Returns x.
doto returns the original argument, not the result of any of the calls made on it. I believe doto is generally intended for side effects. This is why you get the original string back.
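To make the "Returns x" rule concrete, here is a minimal sketch of the same behaviour in Python (used only for illustration; the helper name doto is hypothetical and merely mirrors the macro). Every step is called for its side effect and its result is discarded:

```python
def doto(x, *steps):
    """Call each step with x for its side effects, then return x itself."""
    for step in steps:
        step(x)  # the return value is deliberately thrown away
    return x

# Strings are immutable, so upper() cannot change the original object.
print(doto("clojure", str.upper))                # clojure

# A mutable object shows what doto is good for: the side effects stick.
print(doto([3, 1, 2], list.sort, list.reverse))  # [3, 2, 1]
```

The first call mirrors the question: the uppercased copy is created and then dropped, so the untouched original comes back.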
What you are looking for is the .. threading macro ( https://clojuredocs.org/clojure.core/_.. ):
Expands into a member access (.) of the first member on the first
argument, followed by the next member on the result, etc. For
instance:
(.. System (getProperties) (get "os.name"))
expands to:
(. (. System (getProperties)) (get "os.name"))
but is easier to write, read, and understand.
In your case:
(doto "clojure" .toUpperCase)
; => "clojure"
(.. "clojure" toUpperCase)
; => "CLOJURE"
Related
Is there any built-in support for arrays in XQuery? For example, how would we implement
the simple Java program below in XQuery?
(I am not asking to translate the entire program into XQuery, just asking
how to implement the array in line 2 of the code below. I am
also using MarkLogic / xdmp functions.)
java.lang.String test = new String("Hello XQuery");
char[] characters = test.toCharArray();
for(int i = 0; i<characters.length; i++) {
if(characters[i] == (char)13) {
characters[i] = (char) 0x00;
}
}
Legend:
hex 0x00 dec 0 : null
hex 0x0d dec 13: carriage return
hex 0x0a dec 10: line feed
hex 0x22 dec 34: dquote
The problem with converting your sample code to XQuery is not the absence of support for arrays, but the fact that x00 is not a valid character in XML. If it weren't for this problem, you could express your query with the simple function call:
translate($input, '&#x0D;', '&#x00;')
Now, you could argue that's cheating: it just so happens that there's a function that does exactly what you are trying to do by hand. But if this function didn't exist, you could program it in XQuery: there are sufficient primitives available for strings to allow you to manipulate them any way you want. If you need to (and it's rarely necessary) you can convert a string to a sequence of integers using the function string-to-codepoints(), and then take advantage of all the XQuery facilities for manipulating sequences.
The lesson is, when you use a declarative language like XQuery or XSLT, don't try to use the same low-level programming techniques you were forced to use in more primitive languages. There's usually a much more direct way of expressing the problem.
XQuery has built-in support for sequences. The function tokenize() (as suggested by @harish.ray) returns a sequence. You can also construct one yourself using parentheses and commas:
let $mysequence := (1, 2, 3, 4)
Sequences are ordered lists, so you can rely on that. That is slightly different from a node-set returned from an XPath expression, which is usually in document order.
On a side note: actually, every value in XQuery is a sequence (a node-set is just a sequence of nodes). Even if a function is declared to return one string or int, you can treat that returned value as if it is a sequence of one item. No explicit conversion is necessary, because a single item and a sequence of one item are the same thing in XQuery. Functions like fn:exists() and fn:empty() always work.
HTH!
Just for fun, here's how I would do this in XQuery if fn:translate did not exist. I think Michael Kay's suggestion would end up looking similar.
let $test := "Hello XQuery"
return codepoints-to-string(
for $c in string-to-codepoints($test)
return if ($c eq 32) then 44 else $c)
Note that I changed the transformation because of the problem he pointed out: 0 is not a legal codepoint. So instead I translated spaces to commas.
With MarkLogic, another option is to use http://docs.marklogic.com/json:array and its associated functions. The json:set-item-at function would allow coding in a vaguely imperative style. Coding both variations might be a good learning exercise.
There are two ways to do this.
Firstly you can create an XmlResults object using
XmlManager.createResults(), and use XmlResults.add() to add your
strings to this. You can then use the XmlResults object to set the
value of a variable in XmlQueryContext, which can be used in your
query.
Example:
XmlResults values = XmlManager.createResults();
values.add(new XmlValue("value1"));
values.add(new XmlValue("value2"));
XmlQueryContext.setVariableValue("files", values);
The alternative is to split the string in XQuery. You
can do this using the tokenize() function, which works using a
regular expression to match the string separator.
http://www.w3.org/TR/xpath-functions/#func-tokenize
Thanks.
A little outlook: XQuery 3.1 will provide native support for arrays. See http://www.w3.org/TR/xquery-31/ for more details.
You can construct an array like this:
let $myArray := tokenize('a b c d e f g', '\s')
return $myArray[3] (: -> "c" :)
Please note that the first index of this pseudo-array is 1 not 0!
Since the question "How to use or implement arrays in XQuery?" is phrased generically (and thus shows up in search results on this topic), I would like to add a generic answer for future reference (making it a Community Wiki, so others may expand):
As Christian Grün has already hinted at, with XQuery 3.1 XQuery got a native array datatype, which is a subtype of the function datatype.
Since an array is an 'ordered list of values' and an XPath/XQuery sequence is as well, the first question that may arise is: "What's the difference?" The answer is simple: a sequence cannot contain another sequence. All sequences are automatically flattened. Not so an array, which can be an array of arrays. Just like sequences, arrays in XQuery can also hold any mix of other datatypes.
The native XQuery array datatype can be expressed in either of two ways: as [] or via array {}. The difference is that, with the former constructor, a comma is considered a 'hard' comma, meaning that the following array consists of two members:
[ ("apples", "oranges"), "plums" ]
while the following will consist of three members:
array { ("apples", "oranges"), "plums" }
which means that the expression within the curly braces is resolved to a flat sequence first, and then memberized into an array.
Since Array is a subtype of function, an array can be thought of as an anonymous function, that takes a single parameter, the numeric index. To get the third member of an array, named $foo, we thus can write:
$foo(3)
If an array contains another array as a member you can chain the function calls together, as in:
$foo(3)(5)
Along with the array datatype, special operators have been added, which make it easy to look up the values of an array. One such operator (also used by the new Map datatype) is the question mark followed by an integer (or an expression that evaluates to zero or more integers).
$foo?(3)
would, again, return the third member within the array, while
$foo?(3, 6)
would return the members 3 and 6.
The parentheses can be left out when working with literal integers. However, they are needed to form the lookup index from a dynamic expression, as in:
$foo?(3 to 6)
Here, the expression in the parens gets evaluated to a sequence of integers, and thus the expression returns a sequence of all members from index position 3 to index position 6.
The asterisk * is used as wildcard operator. The expression
$foo?*
will return a sequence of all items in the array. Again, chaining is possible:
$foo?3?5
matches the previous example of $foo(3)(5).
More in-depth information can be found in the official spec: XML Path Language (XPath) 3.1 / 3.11.2 Arrays
Also, a new set of functions specific to arrays has been implemented. These functions reside in the namespace http://www.w3.org/2005/xpath-functions/array, which is conventionally bound to the prefix array, and are documented here: XPath and XQuery Functions and Operators 3.1 / 17.3 Functions that Operate on Arrays
I have a file that contains 10 lines - I want to retrieve it, and then split them with a newline("\n") delimiter.
here's what I did
val data = io.Source.fromFile("file.txt").toString;
But this causes an error when I try to split the file on newlines.
I then tried
val data = io.Source.fromFile("file.txt").mkString;
And it worked.
What the heck? Can someone tell me what the difference between the two methods is?
Let's look at the types, shall we?
scala> import scala.io._
import scala.io._
scala> val foo = Source.fromFile("foo.txt")
foo: scala.io.BufferedSource = non-empty iterator
scala>
Now, the variable that you have read the file foo.txt into is an iterator. If you call toString() on it, it doesn't return the contents of the file, but rather the String representation of the iterator you've created. OTOH, mkString() reads the iterator (that is, iterates over it) and constructs a long String from the values read from it.
For more info, look at this console session:
scala> foo.toString
res4: java.lang.String = non-empty iterator
scala> res4.foreach(print)
non-empty iterator
scala> foo.mkString
res6: String =
"foo
bar
baz
quux
dooo
"
scala>
The toString method is supposed to return the string representation of an object. It is often overridden to provide a meaningful representation. The mkString method is defined on collections and is a method which joins the elements of the collection with the provided string. For instance, try something like:
val a = List("a", "b", "c")
println(a.mkString(" : "))
and you will get "a : b : c" as the output. The mkString method has created a string from your collection by joining the elements of the collection with the string you provided. In the particular case you posted, the mkString call joined the elements returned by the BufferedSource iterator with the empty string (this is because you called mkString with no arguments). This results in simply concatenating all of the strings (yielded by the BufferedSource iterator) in the collection together.
On the other hand, calling toString here doesn't really make sense, as what you are getting (when you don't get an error) is the string representation of the BufferedSource iterator; which just tells you that the iterator is non-empty.
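The same distinction shows up in other languages. As a rough analogy only (Python, not Scala; str and "".join stand in for toString and mkString here), describing an iterator is not the same as consuming it:

```python
chunks = iter(["foo\n", "bar\n", "baz\n"])

# Analogue of toString: describes the iterator object, not its contents.
description = str(chunks)
print(description)

# Analogue of mkString: consumes the iterator and concatenates its elements.
contents = "".join(chunks)
print(contents.split("\n"))

# The iterator is now exhausted, so a second pass yields nothing.
print(list(chunks))  # []
```

Only the joined string can usefully be split on newlines; the description is just a label for the iterator.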
They're different methods in different classes. In this case, mkString is a method in the trait GenTraversableOnce. toString is defined on Any (and is very often overridden).
The easiest way (or at least the way I usually use) to find this out is to use the documentation at http://www.scala-lang.org/api/current/index.html. Start with the type of your variable:
val data = io.Source.fromFile("file.txt")
is of type
scala.io.BufferedSource
Go to the doc for BufferedSource, and look for mkString. In the doc for mkString (hit the down arrow over to the left) you'll see that it comes from
Definition Classes TraversableOnce → GenTraversableOnce
And do the same thing with toString.
I think the problem is understanding what the Source class is doing. It seems from your code that you expect Source.fromFile to retrieve the content of a file, when really what it does is point to the start of the file.
This is typical when working with I/O operations, where you have to open a "connection" to a resource (in this case, a connection to your filesystem), read/write several times and then close that "connection". In your example you open a connection to a file and you have to read the contents of the file line by line until you reach the end. Keep in mind that when you read you are loading information into memory, so in most scenarios it's not a good idea to load the whole file into memory (which is what mkString is going to do).
On the other hand, mkString is made to iterate over all the elements of a collection, so in this case what it does is read the whole file and build a single String in memory. Be careful, because if the file is big your code will fail. Normally when working with I/O you should use a buffer to read some content, then process/save that content and then load more content (into the same buffer), avoiding problems with memory. For example: read 5 lines --> parse --> save parsed lines --> read next 5 lines --> etc.
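The read-a-few-lines-at-a-time pattern can be sketched as follows, in Python for brevity. The helper name read_in_batches and the batch size of 5 are illustrative assumptions, not part of any Scala API, and an in-memory buffer stands in for the open file:

```python
import io
from itertools import islice

def read_in_batches(lines, batch_size=5):
    """Yield lists of at most batch_size lines from any line iterator."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch

# An in-memory stand-in for an open file with 12 lines.
source = io.StringIO("".join(f"line {i}\n" for i in range(12)))
for batch in read_in_batches(source):
    # Parse and save each batch here; only one batch is held at a time.
    print(len(batch), batch[0].strip())
```

At most batch_size lines are in memory at once, however large the source is.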
Note also that "toString" retrieves nothing from the file; it just tells you "you can read lines, the file is not empty".
I'm developing a software to generate a Turing Machine from a regular expression.
[ EDIT: To clarify, the OP wants to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task. OP is seeking to perform the task of creating a TM from a regular expression, not using a regular expression. ]
First I'll explain a bit what I have done and then what is my specific problem:
I've modeled the regular expression as follows:
RegularExpression (interface): the classes below implement this interface
Simple (e.g. "aaa", "bbb", "abcde"): this is a leaf class; it does not have any subexpressions
ComplexWithoutOr (e.g. "a(ab)*", "(a(ab)c(b))*"): this class contains a list of RegularExpression.
ComplexWithOr (e.g. "a(a|b)", "(a((ab)|c(b)))"): this class contains an Or operation, which contains a list of RegularExpression. It represents the "a|b" part of the first example and the "(ab)|c(b)" part of the second one.
Variable (e.g. "awcw", where w ∈ {a,b}*): this is not yet implemented, but the idea is to model it as a leaf class with some different logic from Simple. It represents the "w" part of the example.
It is important that you understand and agree with the model above. If you have questions, please leave a comment before you continue reading...
When it comes to MT (Turing Machine) generation, I have different levels of complexity:
Simple: this type of expression is already working. It generates a new state for each letter and moves right. If, in any state, the letter read is not the expected one, it starts a "rollback circuit" that finishes with the MT head in the initial position and stops in a non-final state.
ComplexWithoutOr: this is where my problem is. Here, the algorithm generates an MT for each subexpression and concatenates them. This works for some simple cases, but I have problems with the rollback mechanism.
Here is an example that does not work with my algorithm:
"(ab)*abac": this is a ComplexWithoutOr expression that contains a ComplexWithoutOr expression "(ab)*" (which has a Simple expression "ab" inside) and a Simple expression "abac".
My algorithm first generates an MT1 for "ab". This MT1 is used by the MT2 for "(ab)*", so if MT1 succeeds it enters MT1 again; otherwise MT1 rolls back and MT2 finishes successfully. In other words, MT2 cannot fail.
Then it generates an MT3 for "abac". The output of MT2 is the input of MT3. The output of MT3 is the result of the execution.
Now, let's suppose the input string is "abac". As you can see, it matches the regular expression. So let's see what happens when the MT is executed.
MT1 succeeds the first time, on "ab". MT1 fails the second time, on "ac", and rolls back, putting the MT head at the 3rd position, "a". MT2 finishes successfully and the input is forwarded to MT3. MT3 fails, because "ac" != "abac". So the MT does not recognize "abac".
Do you understand the problem? Do you know any solution for this?
I'm using Java to develop it, but the language is not important; I'd like to discuss the algorithm.
It is not entirely clear to me what exactly you are trying to implement. It looks like you want to make a Turing Machine (or any FSM in general) that accepts only those strings that are also accepted by the regular expression. In effect, you want to convert a regular expression to a FSM.
Actually that is exactly what a real regex matcher does under the hood. I think this series of articles by Russ Cox covers a lot of what you want to do.
Michael Sipser, in Introduction to the Theory of Computation, proves in chapter 1 that regular expressions are equivalent to finite automata in their descriptive power. Part of the proof involves constructing a nondeterministic finite automaton (NDFA) that recognizes the language described by a specific regular expression. I'm not about to copy half that chapter, which would be quite hard due to the notation used, so I suggest you borrow or purchase the book (or perhaps a Google search using these terms will turn up a similar proof) and use that proof as the basis for your algorithm.
As Turing machines can simulate an NDFA, I assume an algorithm to produce an NDFA is good enough.
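As a rough illustration of that approach, here is a minimal sketch in Python of an NFA built in the style of Thompson's construction and simulated by tracking the set of all reachable states. It is not the asker's design; all names are hypothetical, and it assumes well-formed patterns using only literals, concatenation, "|", "*" and parentheses. Because every alternative is followed at once, no rollback is needed, which is exactly what "(ab)*abac" on "abac" requires:

```python
from itertools import count

_ids = count()  # source of fresh state numbers

# A fragment is (start, accept, edges); an edge is (src, symbol, dst),
# where symbol None marks an epsilon (empty) transition.
def literal(ch):
    s, a = next(_ids), next(_ids)
    return (s, a, [(s, ch, a)])

def empty():
    return literal(None)

def concat(f, g):
    return (f[0], g[1], f[2] + g[2] + [(f[1], None, g[0])])

def union(f, g):
    s, a = next(_ids), next(_ids)
    links = [(s, None, f[0]), (s, None, g[0]), (f[1], None, a), (g[1], None, a)]
    return (s, a, f[2] + g[2] + links)

def star(f):
    s, a = next(_ids), next(_ids)
    links = [(s, None, f[0]), (s, None, a), (f[1], None, f[0]), (f[1], None, a)]
    return (s, a, f[2] + links)

def parse(pattern):
    """Recursive-descent parser that returns the NFA fragment for pattern."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        frag = concatenation()
        while peek() == "|":
            pos += 1
            frag = union(frag, concatenation())
        return frag

    def concatenation():
        frag = empty()
        while peek() is not None and peek() not in "|)":
            frag = concat(frag, repetition())
        return frag

    def repetition():
        nonlocal pos
        frag = atom()
        while peek() == "*":
            pos += 1
            frag = star(frag)
        return frag

    def atom():
        nonlocal pos
        ch = peek()
        pos += 1
        if ch == "(":
            frag = alternation()
            pos += 1  # skip the closing ")"
            return frag
        return literal(ch)

    return alternation()

def closure(states, edges):
    """All states reachable from states through epsilon transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, sym, dst in edges:
            if src == s and sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def matches(pattern, text):
    start, accept, edges = parse(pattern)
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, sym, dst in edges if src in current and sym == ch}
        current = closure(moved, edges)
    return accept in current

print(matches("(ab)*abac", "abac"))    # True
print(matches("(ab)*abac", "ababac"))  # True
print(matches("(ab)*abac", "abab"))    # False
```

The problematic input from the question is accepted because the "zero repetitions of (ab)" path stays alive alongside the "one repetition" path until the input decides between them.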
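In the Chomsky hierarchy, regular expressions describe Type-3 (regular) languages, whereas Turing machines recognize Type-0 (recursively enumerable) languages. This means that a TM can recognize any language a regex describes, but not vice versa.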
I'm trying to use Clojure to dynamically generate functions that can be applied to large volumes of data - i.e. a requirement is that the functions be compiled to bytecode in order to execute fast, but their specification is not known until run time.
e.g. suppose I specify functions with a simple DSL like:
(def my-spec [:add [:multiply 2 :param0] 3])
I would like to create a function compile-spec such that:
(compile-spec my-spec)
Would return a compiled function of one parameter x that returns 2x+3.
What is the best way to do this in Clojure?
Hamza Yerlikaya has already made the most important point, which is that Clojure code is always compiled. I'm just adding an illustration and some information on some low-hanging fruit for your optimisation efforts.
Firstly, the above point about Clojure's code always being compiled includes closures returned by higher-order functions and functions created by calling eval on fn / fn* forms and indeed anything else that can act as a Clojure function. Thus you don't need a separate DSL to describe functions, just use higher order functions (and possibly macros):
(defn make-affine-function [a b]
(fn [x] (+ (* a x) b)))
((make-affine-function 31 47) 5)
; => 202
Things would be more interesting if your specs were to include information about the types of parameters, as then you could be interested in writing a macro to generate code using those type hints. The simplest example I can think of would be a variant of the above:
(defmacro make-primitive-affine-function [t a b]
(let [cast #(list (symbol (name t)) %)
x (gensym "x")]
`(fn [~x] (+ (* ~(cast a) ~(cast x)) ~(cast b)))))
((make-primitive-affine-function :int 31 47) 5)
; => 202
Use :int, :long, :float or :double (or the non-namespace-qualified symbols of corresponding names) as the first argument to take advantage of unboxed primitive arithmetic appropriate for your argument types. Depending on what your function's doing, this may give you a very significant performance boost.
Other types of hints are normally provided with the #^Foo bar syntax (^Foo bar does the same thing in 1.2); if you want to add them to macro-generated code, investigate the with-meta function (you'll need to merge '{:tag Foo} into the metadata of the symbols representing the formal arguments to your functions or let-introduced locals that you wish to put type hints on).
Oh, and in case you'd still like to know how to implement your original idea...
You can always construct the Clojure expression to define your function -- (list 'fn ['x] (a-magic-function-to-generate-some-code some-args ...)) -- and call eval on the result. That would enable you to do something like the following (it would be simpler to require that the spec includes the parameter list, but here's a version assuming arguments are to be fished out from the spec, are all called paramFOO and are to be lexicographically sorted):
(require '[clojure.walk :as walk])
(defn compile-spec [spec]
(let [params (atom #{})]
(walk/prewalk
(fn [item]
(if (and (symbol? item) (.startsWith (name item) "param"))
(do (swap! params conj item)
item)
item))
spec)
(eval `(fn [~@(sort @params)] ~@spec))))
(def my-spec '[(+ (* 31 param0) 47)])
((compile-spec my-spec) 5)
; => 202
The vast majority of the time, there is no good reason to do things this way and it should be avoided; use higher-order functions and macros instead. However, if you're doing something like, say, evolutionary programming, then it's there, providing the ultimate flexibility -- and the result is still a compiled function.
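For comparison, the higher-order-function route can also be applied directly to the keyword-style spec from the question. The following is only a sketch of the idea in Python rather than Clojure; compile_spec, OPS and the string spelling of the spec are assumptions made for illustration. The spec is walked once, and what comes back is a plain closure:

```python
import operator
from functools import reduce

# Hypothetical operator table for the DSL from the question.
OPS = {"add": operator.add, "multiply": operator.mul}

def compile_spec(spec):
    """Turn a nested [op, arg, ...] spec into a one-argument function."""
    if isinstance(spec, list):
        op = OPS[spec[0]]
        args = [compile_spec(arg) for arg in spec[1:]]  # walked once, up front
        return lambda x: reduce(op, (arg(x) for arg in args))
    if spec == "param0":
        return lambda x: x
    return lambda x: spec  # anything else is treated as a constant

my_spec = ["add", ["multiply", 2, "param0"], 3]
f = compile_spec(my_spec)
print(f(5))  # 13
```

Calling f never inspects the spec again; it only runs the closures that compile_spec built.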
Even if you don't AOT compile your code, as soon as you define a function it gets compiled to bytecode on the fly.
I was wondering whether anyone had managed to use the 'listing.' command in JPL to examine the contents of the Prolog knowledgebase? JPL requires you construct queries and will return solutions based on the variables which you set in the query. For example (Java):
Query q = new Query("holdsAt((X,Y) = true, 3)");
while ( q.hasMoreSolutions() ){
Hashtable s = q.nextSolution();
System.out.println(s.get("X")+", "+s.get("Y"));
}
I can't see how this would work for listing/0, or even listing/1 which requires an instantiated input. At the moment I am playing around with code of the form
predicate_property(L,interpreted),
\+ predicate_property(L, built_in),
\+ predicate_property(L,imported_from(_)),
current_predicate( X, L), current_predicate(X/Z).
which returns for a function existing in the knowledgebase:
myFunction:-
myGoal1,
myGoal2.
the answer:
L = myFunction(_G403,_G404),
X = myFunction,
Z = 2
But it's not sufficient as none of the goals are returned. I suppose what I require (if the listing function cannot be called using JPL), is a function which returns as a variable the predicate head along with a list of the relevant goals which must be satisfied. Unfortunately, I'm not familiar with the internals of the listing function, so I'm not sure how to go about doing this.
Thanks in advance
I have a function which is working for the time being, but I am concerned that it is less efficient than a 'listing' call
getClauses(Y):-
predicate_property(L,interpreted),
\+ predicate_property(L, built_in),
\+ predicate_property(L,imported_from(_)),
current_predicate( X, L),
current_predicate(X/Z),
findall((L, T), clause(L, T), Y).
which returns for a predicate existing in the knowledgebase:
myPredicate:-
myGoal1,
myGoal2.
the result:
?- getClauses(Y).
Y = [ (myPredicate, myGoal1, myGoal2)]
Note that this will not work for predicates which have been imported from other modules.