Tree Transformations Using Visitor Pattern - java

(Disclaimer: these examples are given in the context of building a compiler, but this question is all about the Visitor pattern and does not require any knowledge of compiler theory.) I'm going through Andrew Appel's Modern Compiler Implementation in Java to try to teach myself compiler theory (so no, this isn't homework) and I'm having trouble understanding how he wants to use the Visitor pattern to transform an AST to an IR tree. (Note: I'm doing this in Python so I can learn Python also, which is why the upcoming examples are not in Java.) As I understand it, the visit and accept methods in the Visitor pattern are void-typed by design, so if I have something like
class PlusExp(Exp):
    def __init__(self, exp_left, exp_right):
        self.exp_left = exp_left
        self.exp_right = exp_right
    def accept(self, v):
        v.visit_plus_exp(self)
then I would like to be able to write a visitor method like
def visit_plus_exp(self, plus_exp):
    return BINOP(BinOp.PLUS,
                 plus_exp.exp_left.accept(self),
                 plus_exp.exp_right.accept(self))
which would translate the two child expressions into IR and then link them up with the BINOP representing the plus expression. Of course, this isn't possible unless I modify all the accept functions to return extra info, and that is also messy because sometimes you just want a print visitor that doesn't return anything. Yet, this text insists that a visitor is the right way to go, and in Java at that, which means it can be done without the flexibility of Python. I can't think of any solutions that aren't incredibly hacky - can anyone enlighten me as to the intended design?

A SAX parser is a kind of visitor. To avoid adding a return value to the method, you can use a stack:
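class Visitor {
    Stack<Node> stack = new Stack<Node>();
    // . . .
    void visitPlus(PlusExp pe) {
        // Each child visit pushes its translated node onto the stack.
        pe.left.accept(this);
        pe.right.accept(this);
        // Pop in reverse order: the right operand was pushed last.
        Node b = stack.pop();
        Node a = stack.pop();
        stack.push(new BinOp(BinOp.PLUS, a, b));
    }
}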

Look at source code of THIS compiler. I think that the guy has used Visitor pattern.

Caveat: I haven't read that book.
The method may be void-typed, but in Java (which the book was written for) it is also part of an object. So the visitor can build up the structure in a member variable, thus maintaining the necessary context between calls.
So, for instance, your print visitor would be appending to a StringBuilder that is held as a member variable (or as a final local variable in a method that created the visitor object -- this is fairly common in Java, where creating small anonymous-inner-class objects is a common habit).
In Python, you could similarly let the visitor method access a non-method-local variable to maintain context and build the structure, e.g. via a closure or a small object.
Update -- small bit of code added as example from comment below
result = new Node();
result.left.add(n1.accept(this));
result.right.add(n2.accept(this));
return result;
or
result = new Node();
this.nextLoc.add(result);
this.nextLoc = result.left;
n1.accept(this);
this.nextLoc = result.right;
n2.accept(this);
The first is prettier (though still crappy comment example code), but the second would let you keep the void return type if you really needed to.
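For what it's worth, a common way in Java to keep a single accept method and still get a value back is to make the visitor generic in its result type. The sketch below is only an illustration of that idea and is not taken from Appel's book; all class names (NumExp, ToIrVisitor, PrintVisitor and so on) are made up, and a string stands in for the IR tree. A visitor that has nothing to return uses Void and returns null.

```java
// Hypothetical names throughout; a sketch, not the book's code.
interface Visitor<R> {
    R visitNum(NumExp e);
    R visitPlus(PlusExp e);
}

abstract class Exp {
    // One accept method serves every visitor, whatever it returns.
    abstract <R> R accept(Visitor<R> v);
}

class NumExp extends Exp {
    final int value;
    NumExp(int value) { this.value = value; }
    @Override <R> R accept(Visitor<R> v) { return v.visitNum(this); }
}

class PlusExp extends Exp {
    final Exp left, right;
    PlusExp(Exp left, Exp right) { this.left = left; this.right = right; }
    @Override <R> R accept(Visitor<R> v) { return v.visitPlus(this); }
}

// A visitor that returns a value; the string stands in for an IR tree.
class ToIrVisitor implements Visitor<String> {
    public String visitNum(NumExp e) { return "CONST(" + e.value + ")"; }
    public String visitPlus(PlusExp e) {
        return "BINOP(PLUS, " + e.left.accept(this) + ", " + e.right.accept(this) + ")";
    }
}

// A visitor with nothing to return: it accumulates into a member variable.
class PrintVisitor implements Visitor<Void> {
    final StringBuilder out = new StringBuilder();
    public Void visitNum(NumExp e) { out.append(e.value); return null; }
    public Void visitPlus(PlusExp e) {
        out.append('(');
        e.left.accept(this);
        out.append(" + ");
        e.right.accept(this);
        out.append(')');
        return null;
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        Exp e = new PlusExp(new NumExp(1), new NumExp(2));
        System.out.println(e.accept(new ToIrVisitor())); // BINOP(PLUS, CONST(1), CONST(2))
        PrintVisitor p = new PrintVisitor();
        e.accept(p);
        System.out.println(p.out); // (1 + 2)
    }
}
```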

Related

Java equivalent of Smalltalk's become:

Is there a way to swap myself (this) with some other object in Java?
In Smalltalk we could write
Object subclass:myClass [
"in my method I swap myself with someone else"
swapWith:anObject [
self become:anObject.
^nil
]
]
myClass subclass:subClass [
]
obj := myClass new.
obj swapWith:subClass new.
obj inspect.
Result is An instance of subClass, obviously.
I need to do following in Java:
I am in a one-directional hierarchy (directed acyclic graph)
in one of my methods (event listener method to be exact) I decide that I am not the best suited object to be here, so:
I create a new object (from a subclass of my class to be exact), swap myself with it, and let myself be garbage-collected in the near future
So, in short, how can I achieve in Java self become: (someClass new:someParameters)? Are there some known design patterns I could use?
In its most general form, arbitrary object swapping is impossible to reconcile with static typing. The two objects might have different interfaces, so this could compromise type safety. If you impose constraints on how objects can be swapped, such a feature can be made type safe. Such a feature never became mainstream, but it has been investigated in research. Look, for instance, at Gilgul.
Closely related is reclassification, the ability to change the class of an object dynamically. This is possible in Smalltalk with some primitives. Again, this puts type safety at risk and never became mainstream, but it has been investigated in research. Look at Wide Classes, Fickle, or Plaid.
A poor man's solution to object swapping is a proxy that you interpose between the client and the object to swap, or the use of the state and strategy design patterns.
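A rough sketch of that proxy/state idea, under the assumption that every client holds the proxy and never the real object. The names Handle, Behavior, Basic and Special are hypothetical. The proxy passes itself to the current behavior, so the behavior can replace itself from inside an event handler, which is about as close to "self become:" as plain Java gets.

```java
// All names are hypothetical; a sketch of proxy + state, not a library API.
interface Behavior {
    String onEvent(Handle self, String event);
}

// The proxy. Clients keep a reference to this object only.
class Handle {
    private Behavior current;
    Handle(Behavior initial) { this.current = initial; }

    // Swap the object behind the proxy; clients notice nothing.
    void become(Behavior next) { this.current = next; }

    String onEvent(String event) { return current.onEvent(this, event); }
}

class Basic implements Behavior {
    public String onEvent(Handle self, String event) {
        if (event.equals("upgrade")) {
            // "I am not the best suited object to be here": replace myself.
            self.become(new Special());
            return "basic: replaced myself";
        }
        return "basic: " + event;
    }
}

class Special extends Basic {
    @Override
    public String onEvent(Handle self, String event) {
        return "special: " + event;
    }
}

class BecomeDemo {
    public static void main(String[] args) {
        Handle obj = new Handle(new Basic());
        System.out.println(obj.onEvent("click"));   // basic: click
        System.out.println(obj.onEvent("upgrade")); // basic: replaced myself
        System.out.println(obj.onEvent("click"));   // special: click
    }
}
```

Once nothing else references the old Basic instance, it becomes eligible for garbage collection.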
Here is an interesting thread on the official forum. I believe that object encapsulation in combination with strong typing makes this impossible in Java. Plus, for the already slow JVM, this could lead to disaster...
this is a reserved word in Java and you cannot override it.
What you're trying to do can be implemented with a simple reference. You just iterate (or walk your graph) and change the reference to point at whichever object you want to be active.
Consider this:
List<String> stringList = new ArrayList<String>();
// fill your list
String longestWord = "";
for (String s : stringList) {
    if (longestWord.length() < s.length()) {
        longestWord = s;
    }
}
longestWord is pointing to another object now.

Regarding two lines of java code

I am trying to learn a Java-based program, but I am pretty new to Java. I am quite confused by the following two lines of Java code. I think my confusion comes from concepts such as “class” and “cast”, but I just do not know how to analyze them.
For this one
XValidatingObjectCorpus<Classified<CharSequence>> corpus
= new XValidatingObjectCorpus<Classified<CharSequence>>(numFolds);
What is <Classified<CharSequence>> used for in terms of Java programming? How should I understand its relationship to XValidatingObjectCorpus and corpus?
For the second one
LogisticRegressionClassifier<CharSequence> classifier
= LogisticRegressionClassifier.<CharSequence>train(para1, para2, para3)
How should I understand the right side, LogisticRegressionClassifier.<CharSequence>train? What is the difference between LogisticRegressionClassifier.<CharSequence>train and LogisticRegressionClassifier<CharSequence> classifier?
These are called generics. They tell Java to make an instance of the outer class - either XValidatingObjectCorpus or LogisticRegressionClassifier - using the type of the inner object.
Normally, these are used for collections, such as ArrayList or HashMap.
What is the relationship between XValidatingObjectCorpus and corpus?
corpus is just a name given to the new XValidatingObjectCorpus object that you make with that statement (hence the = new... part).
What does LogisticRegressionClassifier.<CharSequence>train mean?
I have no idea, really. I suggest looking at the API for that (I think this is the right class).
What is the difference between LogisticRegressionClassifier.<CharSequence>train and LogisticRegressionClassifier<CharSequence> classifier?
You can't really compare these two. The one on the left of the = is the object identifier, and the one on the right is the allocator (probably the wrong word, but it is what it does, kind of).
Together, the two define an instance of LogisticRegressionClassifier, saying to create that type of object, call it classifier, and then give it the value returned by the train() method. Again, look at the API to understand it more.
By the way, these look like wretched examples to be learning Java with. Start with something simple, or at least an easier part of the code. It looks like someone had way too much fun with long names (the API has even longer names). Seriously though, I only just got to fully understanding this, and Java was my main language for quite a while (It gets really confusing when you try and do simple things). Anyways, good luck!
public class Sample<T> { // T is a type parameter; it can be substituted with any reference type.
    static <T> Sample<T> train(int par1, int par2, int par3) {
        // A generic method: it returns a Sample that works with a particular type argument,
        // be it Integer or CharSequence (see the main method).
        return new Sample<T>();
    }

    public static void main(String... a) {
        int par1 = 0, par2 = 0, par3 = 1;
        // Here you get back a Sample object which works with a sequence of characters.
        Sample<CharSequence> sample = Sample.<CharSequence>train(par1, par2, par3);
        // Here you get back a Sample object which works with Integer values.
        Sample<Integer> sample1 = Sample.<Integer>train(par1, par2, par3);
    }
}
<Classified<CharSequence>> is a generic parameter.
LogisticRegressionClassifier<CharSequence> is a generic type.
LogisticRegressionClassifier.<CharSequence>train is a call to a generic method with an explicit type argument.
Java Generics Tutorial
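To put the three terms side by side, here is a small sketch using only java.util classes. GenericsDemo and listOf are made-up names for illustration; Collections.emptyList is a real generic method in the JDK. It shows a generic type on the left of the assignment, an explicit type argument after the dot on the right, and the more usual case where the compiler infers the type argument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GenericsDemo {
    // A generic method: the <T> before the return type declares its own type parameter.
    static <T> List<T> listOf(T first, T second) {
        List<T> result = new ArrayList<>();
        result.add(first);
        result.add(second);
        return result;
    }

    public static void main(String[] args) {
        // Left side: a variable of the generic type List<CharSequence>.
        // Right side: the type argument is given explicitly, so T = CharSequence.
        List<CharSequence> explicit =
                GenericsDemo.<CharSequence>listOf("a", new StringBuilder("b"));

        // Usually the explicit form is unnecessary: here T = String is inferred.
        List<String> inferred = listOf("a", "b");

        // The same syntax with a JDK method.
        List<Integer> empty = Collections.<Integer>emptyList();

        System.out.println(explicit.size() + " " + inferred.size() + " " + empty.size()); // 2 2 0
    }
}
```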

calling java method one after each other with "dots" in between

I see the following code syntax:
ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setDebugEnabled(true)
.setOAuthConsumerKey("x11")
.setOAuthConsumerSecret("x33")
.setOAuthAccessToken("x55")
.setOAuthAccessTokenSecret("x66");
It calls all the methods one after another without repeating the object instance.
How does this work, and how can I program my own class so that its methods can be called this way?
Make each of those methods return the same object they are called on:
class MyClass{
    public MyClass f(){
        //do stuff
        return this;
    }
}
It's a pretty common pattern. Have you ever seen this in C++?
int number=654;
cout<<"this string and a number: "<<number<<endl;
Each call to operator << returns the same ostream that was passed as its left operand (in this case cout), and since the calls are evaluated left to right, the number is correctly printed after the string.
That style of writing code is called 'fluent' and it is equivalent to this:
cb.setDebugEnabled(true);
cb.setOAuthConsumerKey("x11");
cb.setOAuthConsumerSecret("x33");
cb.setOAuthAccessToken("x55");
cb.setOAuthAccessTokenSecret("x66");
Each method returns 'this' in order to accommodate this style.
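A slightly fuller sketch of a class written in this style. RequestConfig and its setters are hypothetical names, not part of the ConfigurationBuilder API from the question. Every setter stores its value and then returns this, which is what lets the next call hang off the previous one.

```java
// Hypothetical class for illustration only.
class RequestConfig {
    private boolean debug;
    private String consumerKey = "";
    private int timeoutSeconds = 30;

    // Each setter mutates this object and returns it, enabling chaining.
    RequestConfig setDebug(boolean debug) {
        this.debug = debug;
        return this;
    }

    RequestConfig setConsumerKey(String consumerKey) {
        this.consumerKey = consumerKey;
        return this;
    }

    RequestConfig setTimeoutSeconds(int timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
        return this;
    }

    String describe() {
        return "debug=" + debug + ", key=" + consumerKey + ", timeout=" + timeoutSeconds;
    }
}

class FluentDemo {
    public static void main(String[] args) {
        RequestConfig config = new RequestConfig()
                .setDebug(true)
                .setConsumerKey("x11")
                .setTimeoutSeconds(5);
        System.out.println(config.describe()); // debug=true, key=x11, timeout=5
    }
}
```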
If you use this style, be prepared to encounter some inconvenience while debugging your code, because if you want to single-step into only one of those methods instead of each and every single one of them, the debugger might not necessarily give you the option to do that.
This is similar to the Builder design pattern.
Here you can find an excellent example of it inspired by Josh Bloch's code from Effective Java 2nd ed.

Flyweight Examples in Java

I am trying to create a flyweight object in Java. I've worked with a similar concept in Objective-C (Singleton Classes in Objective-C // I believe they are the same thing).
I am trying to find a tutorial, an example, or an explanation online to learn how to create a flyweight object and use it, but I've searched on Google and I can't find anything decent. I went through 10 pages and they basically all plagiarize from one website which just explains the concept. I understand the concept - I need something to help me/teach me how to implement it in Java.
Does anyone have any suggestions/tutorials?
Thanks!
The Wikipedia entry for the flyweight pattern has a concrete Java example.
EDIT to try and help the OP understand the pattern:
As noted in my comment below, the point of the flyweight pattern is that you're sharing a single instance of something rather than creating new, identical objects.
Using the Wiki example, the CoffeeFlavorFactory will only create a single instance of any given CoffeeFlavor (this is done the first time a Flavor is requested). Subsequent requests for the same flavor return a reference to the original, single instance.
public static void main(String[] args)
{
    CoffeeFlavorFactory flavorFactory = new CoffeeFlavorFactory();
    CoffeeFlavor a = flavorFactory.getCoffeeFlavor("espresso");
    CoffeeFlavor b = flavorFactory.getCoffeeFlavor("espresso");
    CoffeeFlavor c = flavorFactory.getCoffeeFlavor("espresso");
    // This is comparing the reference value, not the contents of the objects
    if (a == b && b == c)
        System.out.println("I have three references to the same object!");
}
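For completeness, a factory that behaves this way could look roughly like the sketch below. The class names follow the Wikipedia example, but the body is an assumption for illustration and not a copy of that article's code: a map from flavor name to the one shared instance, filled lazily on first request.

```java
import java.util.HashMap;
import java.util.Map;

// The flyweight: immutable, so it is safe to share between many users.
final class CoffeeFlavor {
    private final String name;
    CoffeeFlavor(String name) { this.name = name; }
    String getName() { return name; }
}

// Assumed implementation: one cached instance per flavor name.
class CoffeeFlavorFactory {
    private final Map<String, CoffeeFlavor> cache = new HashMap<>();

    CoffeeFlavor getCoffeeFlavor(String name) {
        // Creates the flavor on the first request, returns the cached one afterwards.
        return cache.computeIfAbsent(name, CoffeeFlavor::new);
    }

    int getTotalFlavorsMade() {
        return cache.size();
    }
}

class FlyweightDemo {
    public static void main(String[] args) {
        CoffeeFlavorFactory factory = new CoffeeFlavorFactory();
        CoffeeFlavor a = factory.getCoffeeFlavor("espresso");
        CoffeeFlavor b = factory.getCoffeeFlavor("espresso");
        factory.getCoffeeFlavor("latte");
        System.out.println(a == b);                        // true
        System.out.println(factory.getTotalFlavorsMade()); // 2
    }
}
```

Note that a plain HashMap is not thread-safe; if the factory is shared between threads, something like ConcurrentHashMap would be the safer choice.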
To follow up on the Wikipedia example that Brian cited...
Usually, if you want to cache some objects (such as CoffeeFlavors) and have them shared between a number of flyweights (the CoffeeOrders), then you would make them statically available. But this is not at all necessary. The important part is that the CoffeeOrders are being given the shared objects when they're constructed.
If the Orders are always only created by one singleton, like a "CoffeeOrderFactory," then the factory can keep a non-static cache of Flavors. However you accomplish it, your goal is to get all the Orders in the whole system to use the same exact set of Flavor objects. But at the end of the day, if you want to avoid creating many instances of CoffeeFlavor, then it usually needs to be created statically, just to make sure there's only one cache.
Get it?
I got this case. I think my solution was flyweight.
INPUT
A: C E
B: D C
C: E
A: B
It asked me to create a tree and sort its children by name. Something like this:
A: B C E
B: C D
C: E
It's an easy task actually. But please notice that the first 'A' and the second 'A' in the input must refer to same object. Hence I coded something like this
public Node add(String key){
    Node node = nodes.get(key);
    if (null == node){
        node = new Node(key);
        nodes.put(key, node);
    }
    return node;
}
This is the simplified version of the actual problem, but you should have the idea now.
I also found this example, which has a good Java code example.
The "java.lang.Character" uses the flyweight pattern to cache all US-ASCII characters : see in class java.lang.Character$CharacterCache used by the Character.valueOf() method
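This is easy to observe from user code. The small check below relies only on documented JDK behavior: Character.valueOf always caches values in the range '\u0000' to '\u007F', and Integer.valueOf does the same for -128 to 127. The class name is made up.

```java
class CharacterCacheDemo {
    public static void main(String[] args) {
        // Both calls return the same cached Character object for an ASCII char.
        Character a1 = Character.valueOf('a');
        Character a2 = Character.valueOf('a');
        System.out.println(a1 == a2); // true: same reference, not just equal values

        // Integer applies the same idea to small values.
        Integer i1 = Integer.valueOf(100);
        Integer i2 = Integer.valueOf(100);
        System.out.println(i1 == i2); // true

        // Outside the guaranteed range, only equals() is reliable.
        Integer big1 = Integer.valueOf(100000);
        Integer big2 = Integer.valueOf(100000);
        System.out.println(big1.equals(big2)); // true
    }
}
```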

How do I turn a conditional chain into faster less ugly code?

I have 9 different grammars. One of these will be loaded depending on what the first line of text is in the file being parsed.
I was thinking about deriving the lexer/parser spawning into sep. classes and then instantiating them as soon as I get a match -- not sure whether that would slow me down or not though. I guess some benchmarking is in order.
Really, speed is definitely my goal here but I know this is ugly code.
Right now the code looks something like this:
sin.mark(0)
site = findsite(txt)
sin.reset()
if ( site == "site1") {
loadlexer1;
loadparser1;
} else if (site == "site2") {
loadlexer2;
loadparser2;
}
.................
} else if (site == "site8") {
loadlexer8;
loadparser8;
}
findsite(txt) {
...................
if line.indexOf("site1-identifier") {
site = site1;
} else if(line.indexOf("site2-identifier") {
site = site2;
} else if(line.indexOf("site3-identifier") {
site = site3;
}
.........................
} else if(line.indexOf("site8-identifier") {
site = site8;
}
}
some clarifications
1) yes, I truly have 9 different grammars I built with antlr so they will ALL have their own lexer/parser objs.
2) yes, as of right now we are comparing strings and obviously that'll be replaced with some sort of integer map.
I've also considered sticking the site identifiers into one regex, however I don't believe that will speed anything up.
3) yes, this is pseudocode so I wouldn't get too picky on the semantics here..
4) kdgregory is correct in noting that I am unable to create one instance of the lexer/parser pair
I like the hash idea to make the code a little bit better looking, however I don't think it's going to speed me up any.
The standard approach is to use a Map to connect the key strings to the lexers that will handle them:
Map<String,Lexer> lexerMap = new HashMap<String,Lexer>();
lexerMap.put("source1", new Lexer01());
lexerMap.put("source2", new Lexer02());
// and so on
Once you've retrieved the string that identifies the lexer to use, you'd look it up in the Map like so:
String grammarId = // read it from a file, whatever
Lexer myLexer = lexerMap.get(grammarId);
Your example code has a few quirks, however. First, the indexOf() calls indicate that you don't have a stand-alone string, and Map won't look inside the string. So you need to have some way to extract the actual key from whatever string you read.
Second, lexers and parsers usually maintain state, so you won't be able to create a single instance and reuse it. That indicates that you need to create a factory class, and store it in the map (this is the Abstract Factory pattern).
If you expect to have lots of different lexers/parsers, then it makes sense to use a map-driven approach. For a small number, an if-else chain is probably your best bet, properly encapsulated (this is the Factory Method pattern).
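A minimal sketch of the map-of-factories variant, assuming Java 8 or later. The Lexer interface and the Lexer01/Lexer02 classes here are placeholders, not ANTLR types; with real ANTLR lexers the factory would presumably take the input stream as an argument, which this sketch leaves out. The map stores a Supplier per site, so every lookup produces a fresh, stateful instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Placeholder types standing in for the generated lexers.
interface Lexer {
    String name();
}

class Lexer01 implements Lexer {
    public String name() { return "lexer01"; }
}

class Lexer02 implements Lexer {
    public String name() { return "lexer02"; }
}

class LexerRegistry {
    // Store factories, not instances, because lexers keep state.
    private final Map<String, Supplier<Lexer>> factories = new HashMap<>();

    LexerRegistry() {
        factories.put("site1", Lexer01::new);
        factories.put("site2", Lexer02::new);
    }

    Lexer create(String site) {
        Supplier<Lexer> factory = factories.get(site);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown site: " + site);
        }
        return factory.get(); // a new lexer on every call
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        LexerRegistry registry = new LexerRegistry();
        System.out.println(registry.create("site2").name()); // lexer02
    }
}
```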
Using polymorphism is almost guaranteed to be faster than string manipulation, and will be checked for correctness at compile time. Is site really a String? If so, FindSite should be called GetSiteName. I would expect FindSite to return a Site object that knows the appropriate lexer and parser.
Another speed issue is speed of coding. It would definitely be better to have your different lexers and parsers in individual classes (perhaps with shared functionality in another). It'll make your code slightly smaller, and it will be significantly easier for someone to understand.
Something like:
Map<String,LexerParserTuple> lptmap = new HashMap<String,LexerParserTuple>();
LexerParserTuple lpt = lptmap.get(site);
lpt.loadlexer();
lpt.loadparser();
combined with some regex magic rather than string.indexOf() to grab the names of the sites should dramatically clean up your code.
Replace Conditional With Polymorphism
For a half-measure, for findsite(), you could simply set up a HashMap to get you from site identifier to site. An alternative cleanup would be simply to return the site string, thus:
String findsite(txt) {
...................
if line.indexOf("site1-identifier")
return site1;
if(line.indexOf("site2-identifier")
return site2;
if(line.indexOf("site3-identifier")
return site3;
...
}
Using indexOf() in this way isn't really expressive; I'd use equals() or contains().
Suppose your code is inefficient.
Will it take more time than (say) 1% of the time to actually parse the input?
If not, you've got bigger "fish to fry".
I was thinking about deriving the lexer/parser spawning into sep. classes and then instantiating them as soon as I get a match
It looks like you have the answer already. That would create code that is more flexible, but not necessarily faster.
I guess some benchmarking is in order
Yes, measure both approaches and make an informed decision. My guess is the way you have it already would be enough.
Perhaps, if what bothers you is having a "kilometric" method, you could split it into separate functions using the Extract Method refactoring.
The most important thing is to have first a solution that does the job even though it is slow, and once you have it working, profile it and detect points where the performance could be improved. Remember the "Rules of optimization"
I would change findsite to return a site type (a superclass) and then leverage polymorphism...
That should be faster than string manipulation...
Do you need separate lexers ?
Use a Map to configure a site to loadstrategy structure. Then a simple lookup is required based on 'site' and you execute the appropriate strategy. Same can be done for findSite().
You could have a map of identifiers to sites, then just iterate over the map entries.
// define this as a static somewhere ... build from a properties file
Map<String,String> m = new HashMap<String,String>(){{
    put("site1-identifier","site1");
    put("site2-identifier","site2");
}};
// in your method
for(Map.Entry<String,String> entry : m.entrySet()){
    if( line.contains(entry.getKey())){
        return entry.getValue();
    }
}
cleaner: yes
faster: dunno...should be fast enough
You could use reflection possibly
char site = line.charAt(4);
Method lexerMethod = this.getClass().getMethod( "loadLexer" + site, *parameters types here*)
Method parserMethod = this.getClass().getMethod( "loadparser" + site, *parameters types here*)
lexerMethod.invoke(this, *parameters here*);
parserMethod.invoke(this, *parameters here*);
I don't know about Java, but some languages allow switch to take strings.
switch(site)
{
case "site1": loadlexer1; loadparser1; break;
case "site2": loadlexer2; loadparser2; break;
...
}
As for the second bit, use a regex to extract the identifier and switch on that. You might be better off using an enum.
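A sketch of the enum idea, with made-up names. Each constant carries the identifier that marks its site, fromLine plays the role of findsite, and the switch then works on the enum rather than on strings. The strings returned by load are placeholders for the real lexer/parser setup.

```java
// Hypothetical names; the identifiers are the ones used in the question.
enum Site {
    SITE1("site1-identifier"),
    SITE2("site2-identifier");

    private final String identifier;

    Site(String identifier) {
        this.identifier = identifier;
    }

    // Stands in for findsite(): pick the site whose identifier occurs in the line.
    static Site fromLine(String line) {
        for (Site site : values()) {
            if (line.contains(site.identifier)) {
                return site;
            }
        }
        throw new IllegalArgumentException("No known site identifier in: " + line);
    }
}

class EnumDispatchDemo {
    // Placeholder for loading the matching lexer/parser pair.
    static String load(Site site) {
        switch (site) {
            case SITE1: return "lexer1/parser1";
            case SITE2: return "lexer2/parser2";
            default: throw new AssertionError("Unhandled site: " + site);
        }
    }

    public static void main(String[] args) {
        Site site = Site.fromLine("header with site2-identifier inside");
        System.out.println(load(site)); // lexer2/parser2
    }
}
```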
