Give instructions to the Java parser/lexer

Is there any way to give instructions directly to the parser and lexer from the Java code level? If not, how could one go about doing this at all?
The issue is that I want to have the parser evaluate a variable, back up, then assign the value of that variable as an Object name. Like this:
String s = "text";
SomeClass (s) = new SomeClass();
parser reads--> ok, s evaluates to be "text"...
parser backtracks, while holding "text" in memory and assigns "text" as the name of the new instance of SomeClass, such that one can now do this:
text.callSomeMethod();
I need to do this because I have to instantiate an arbitrary number of objects of SomeClass. Each one has to have a unique name, and it would be ideal to do something like this:
while (someArbitrarySet.hasNext()) {
String s = "token" + Math.random();
SomeClass (s) = new SomeClass();
(s).callSomeMethod();
}
I hope this makes sense...

What you're asking for is what some languages call MACROS. They're also sometimes known as preprocessor definitions, or simply "defines".
A decision was made not to have includes, macros, and the like in Java because they introduce additional code maintenance concerns, which the designers concluded would lead to code that was not in the style they wanted.
However, just because it's not built into the compiler doesn't mean you couldn't add it to your build script.
As part of your build, you copy all files to a src-comp directory, and as you do, replace your tokens as they're defined.
I don't recommend doing it, but that doesn't mean it isn't possible.

What you describe (creating new named variables at runtime) is possible in interpreted languages like JavaScript, Lua, Bash, but not with a compiled language like Java. When the loop is executed, there is no source code there to manipulate, and all named variables have to be defined before.
Apart from this, your variables don't need a "unique" name. If you are using them sequentially (one after another), you could just as well write your loop like this:
while (someArbitrarySet.hasNext()) {
SomeClass sC = new SomeClass();
sC.callSomeMethod();
}
If you really need your objects at the same time, put them in some sort of data structure. The simplest would be an array; you could also use a Collection (like an ArrayList), or a Map (like CajunLuke wrote) if you want to find them again by key.
In fact, an array (in Java) is nothing else than a collection of variables (all of the same type), which you can index by an int.
(The scripting languages which allow creating new variables at runtime also implement this with some kind of map String → (anything), where this map is either method/script-local or belongs to some surrounding object.)
You wrote in a comment to the question (better add those things to the question itself, it has an "edit" button):
Without getting into too many details, I'm writing an application that runs within a larger program. Normally, the objects would get garbage-collected after I was done with them, but the larger program maintains them, thus the need for a unique name for each. If I don't give each a unique name, the old object will get overwritten, but it is still needed in the context of the greater program.
So, you want to retain the objects to avoid garbage collection? Use an array (or List or anything else).
The thing is, if you want your larger program to be able to use these objects, you somehow have to give them to this larger program anyway. And then this program would have to retain references to these objects, thereby avoiding garbage collection. So it looks like you want to solve a problem which does not exist by means which do not exist :-)
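To make the suggestion concrete, here is a minimal sketch of keeping every instance reachable in a List instead of inventing variable names. SomeClass and callSomeMethod come from the question; the stand-in body of SomeClass, the Retainer class, and its method names are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the question's SomeClass; the real class is not shown there.
class SomeClass {
    private boolean called = false;
    void callSomeMethod() { called = true; }
    boolean wasCalled() { return called; }
}

// Hypothetical holder: as long as this list is reachable, so are the objects in it.
class Retainer {
    private final List<SomeClass> instances = new ArrayList<SomeClass>();

    // Creates 'count' objects, calls the method on each, and keeps them all.
    List<SomeClass> createAll(int count) {
        for (int i = 0; i < count; i++) {
            SomeClass sC = new SomeClass();
            sC.callSomeMethod();
            instances.add(sC);   // the list index plays the role of the "name"
        }
        return instances;
    }
}
```

The local variable sC is reused on every pass, yet no object is overwritten, because each one is also referenced from the list.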

Not really an answer to the question you asked, but a possible solution to your problem: using a map.
Map<String, SomeClass> variables = new HashMap<String, SomeClass>(); // typed, so get() returns a SomeClass
while (someArbitrarySet.hasNext()) {
String s = "token" + Math.random();
variables.put(s, new SomeClass());
variables.get(s).callSomeMethod();
}
That way, you can use the "variable name" as the keys into the map, and you can get by without messing with the lexer/parser.
I really hope there is a way to do specifically what you state in Java - it would be really cool.

No. That's not possible.
Even if you could, I can't think of a way to invoke them, because there wouldn't be any compiled code that could successfully reference them.
So the options are the one described by CajunLuke, or to create your own Java parser, probably using the ANTLR sample Java grammar, and hook what you need in there.
Consider the map solution.

This is answered in How do you use Java 1.6 Annotation Processing to perform compile time weaving? .
In short, there is an annotation processing tool that allows you to extend java syntax, and create DSLs that compile to java annotations.
Under JDK 1.5 you had to use apt instead of javac, but under 1.6 annotation processing is built into javac and controlled by the -processor flag. From javac -help:
-processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
-processorpath <path> Specify where to find annotation processors

Related

Keeping track of what's in a Collection in pre-generics Java?

For a bunch of reasons that (believe it or not) are not as unsound as you may think, we are still (sigh) using Java 1.4 to build and run our code (though we plan to finally move to Java 7 by the end of the year).
Our existing code that uses Collection classes doesn't do a very good job of making it clear what is expected to be in the Collection. Obviously, you can read the code, see which downcasts end up being done, and infer from that, but you can't just look at a method declaration and know what the Collection object that is a method argument or method return value actually holds.
In new code that I'm writing and when I am in older code that uses Collections, I've been adding in-line comments to Collections declarations to show what would have been declared if generics were being used. For example:
Map/*<String, Set<Integer>>*/ theMap = new HashMap/*<String, Set<Integer>>*/();
or
List/*<Actions>*/ someMethod(List/*<Job>*/ jobs);
In keeping with the frowning at subjectivity here at SO, rather than asking what you think of this (though admittedly I'd like to know -- I do find it a bit ugly but still like having the type info there) I'd instead just ask what, if anything, you do to make it clear what is being held by pre-generics Collection objects.

What we recommended back in the old days -- and I was a Java Architect at Sun when Java 1.1 was the New Thing -- was to write a class around the structure (I don't think 1.1 even had Collection as a base class) so that the typecasts happened in code you control instead of in user code. So, for example, something like
public class ArrayOfFoo {
Object [] ary; // ctor left as exercise
public void set(int index, Foo value){
ary[index] = (Object) value; // cast strictly not needed, any Foo is an Object
}
public Foo get(int index){ // returns a Foo, so the return type cannot be void
return (Foo) ary[index]; // cast needed, not every Object is a Foo
}
}
Sounds like the code base you have isn't built to this convention; if you're writing new code, there's no reason you can't start. Failing that, your convention isn't bad, but it's easy to forget the cast and then have to search to find out why you're getting a bad cast exception. It's mildly better to resort to some variant of Hungarian notation, or the Smalltalk 'aVariable' convention, by encoding the type in the names, so that you use
Object[] fooAry = new Object[aZillion];
fooAry[42] = new Foo();
Foo aFoo = (Foo) fooAry[42]; // the name fooAry reminds you which cast is needed

Use clear variable identifiers such as jobList, actionList, or dictionaryMap. If you're concerned with the type of objects they contain, you could even make it a convention to always let the identifier of a Collection hint about which type of objects it holds.
The inlined comments aren't that bad an idea, actually. When I ported a 1.5 project back to 1.4 I did just that (instead of removing the type parameters). It worked out quite well.

I'd recommend writing tests. For various reasons:
You should be writing tests anyway!
You can assert the type of a collection member very easily to ensure that all your code paths are adding the right types to the collection
You can use the test to write code that serves as an "example" of how to use the collection correctly
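As an illustration of the second point, a test could use a small helper like the one below to check what a raw collection actually holds. This is only a sketch: the class and method names are hypothetical, and it is written without generics or for-each so that it should also be usable from 1.4-era code.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical test helper for raw (pre-generics) collections.
class CollectionTypeCheck {

    // True only if every element is an instance of 'expected'.
    // A null element counts as a mismatch, because isInstance(null) is false.
    static boolean allInstancesOf(Collection items, Class expected) {
        for (Iterator it = items.iterator(); it.hasNext();) {
            if (!expected.isInstance(it.next())) {
                return false;
            }
        }
        return true;
    }
}
```

A test would call the method under test and then assert that allInstancesOf(result, Job.class) holds, where Job stands for whatever element type the inline comment promises.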

If you just need binary compatibility to 1.4 you could consider using a tool to downgrade the class files back to 1.4 and thus start to develop in 1.6 or 1.7 right now. You would of course need to avoid any API that hasn't been there in 1.4 (unfortunately you can't compile code with generics against the 1.4 jars directly as they don't declare any generic types). The Bytecode is still the same (at least with 1.6, I don't know for sure about 1.7). One free tool that can do the trick is ProGuard. It can do much more sophisticated things and can also remove all traces of generics in the class files. Just turn off the obfuscation and optimization if you don't need it. It will also warn you if some missing API was used in the processed code if you feed it the 1.4 libraries.
I'm aware that is considered a hack by many but we had a similar requirement where we needed some code to still run on a Personal Java VM (this is essentially Java 1.1) and several other exotic VMs and this approach worked quite well. We started with ProGuard and then made our own tool for the task to be able to implement a few workarounds for some Bugs in the diverse VMs.

Is it possible to create Java enums based on input file?

I'm using Java 6.
Suppose I have a file availableFruits.txt
APPLE
ORANGE
BANANA
Suppose I want an enum FruitType that contains values listed in availableFruits.txt, will I be able to do this?

You can't populate an enum type at execution time, no - at least, not without something like BCEL, or by calling the Java compiler.
You can write code to create a Java source file, of course, and build that when you build your app, if you don't need it to be changed afterwards.
Otherwise, I'd just create a wrapper class which is able to take a set of known values and reuse them. Exactly what you need to do will depend on how you wanted to use the enum, of course.

Well the point of an Enum is to use it at compile time.
If you don't know at compile time what values your Enum has then it's not an Enum it's a collection.
If you do know and you just want to create a class file based on the values in the text file, then yes, it's possible by reading the text file and then generating the source code.
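A build-time generator along these lines could be quite small. The sketch below only produces the source text; reading availableFruits.txt, writing the .java file, and compiling it are left to the build script. The class and method names are hypothetical, and the values are assumed to be valid Java identifiers.

```java
import java.util.List;

// Hypothetical build-time helper: turns a list of names into enum source text.
class EnumSourceGenerator {

    static String generate(String enumName, List<String> values) {
        StringBuilder src = new StringBuilder();
        src.append("public enum ").append(enumName).append(" {\n");
        for (int i = 0; i < values.size(); i++) {
            src.append("    ").append(values.get(i).trim());
            // comma after every constant except the last one
            src.append(i < values.size() - 1 ? ",\n" : "\n");
        }
        src.append("}\n");
        return src.toString();
    }
}
```

Because the enum exists before the rest of the application is compiled, other code can refer to FruitType.APPLE normally; the price is a rebuild whenever the text file changes.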

I expect it's possible, by writing your own ClassLoader subclass, creating the bytecode for the enum in a byte array, and using defineClass. Hard, maybe, but possible. I expect once you know the byte sequence for an enum, it's not that hard to custom-generate it from the info in the JVM spec.
Now, whether it's a good idea...well, I suspect only in a very small number of edge cases. (I can't think of one; I mean, having created it, you'd have to generate code to use it, right?) Otherwise, you're probably better off with a Map or similar.

No, not unless you generate the enum source file from the text file.

As everyone else said- no. It's not possible. Your best shot is to use the Registry pattern. Read in the values, store them in some sort of query-able map. Sort of like an Enum.
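One possible shape for such a registry is sketched below. FruitType here is an ordinary class, not a real enum, and all names are hypothetical; the constructor takes the names as an Iterable so the file reading can stay outside the sketch.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical enum-like value: one shared instance per name.
final class FruitType {
    private final String name;
    private final int ordinal;
    FruitType(String name, int ordinal) { this.name = name; this.ordinal = ordinal; }
    String name() { return name; }
    int ordinal() { return ordinal; }
    @Override public String toString() { return name; }
}

// Hypothetical registry filled at startup, e.g. from the lines of availableFruits.txt.
final class FruitRegistry {
    private final Map<String, FruitType> byName = new LinkedHashMap<String, FruitType>();

    FruitRegistry(Iterable<String> names) {
        for (String raw : names) {
            String n = raw.trim();
            // skip blank lines and duplicates
            if (n.length() > 0 && !byName.containsKey(n)) {
                byName.put(n, new FruitType(n, byName.size()));
            }
        }
    }

    // Mirrors Enum.valueOf: unknown names are rejected.
    FruitType valueOf(String name) {
        FruitType t = byName.get(name);
        if (t == null) throw new IllegalArgumentException("No fruit named " + name);
        return t;
    }

    // Mirrors values(): all known fruits in input order.
    Collection<FruitType> values() {
        return Collections.unmodifiableCollection(byName.values());
    }
}
```

Since each name maps to exactly one instance, callers can compare fruits with ==, much as they would with enum constants, but the compiler cannot check the names for you.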

As everyone pointed out, it's not possible. However, you could create a Map where the key is the value you read from your file (APPLE, ORANGE, BANANA) and the value is an associated value (an int, for example).
This way you could basically achieve the same goal without the type safety, of course.
int i = fruitsMap.get("BANANA"); // get the associated value

You can with dynamically generated code. e.g. Using the Compiler API. I have written a wrapper for that API so you can compile classes in memory. See the code below.
The problem you have is that it's not very useful, as you cannot use these values except in classes which were compiled AFTER your enum was compiled. You can use Enum.valueOf() etc., but a lot of the value of enums is lost.
As others have suggested, using a Map would be simpler and give the same benefit. I would only use the enum if you have a library that has to be passed an Enum (or if you plan more generated code).
public static Class generateEnum(String className, List<String> enums) {
StringBuilder code = new StringBuilder();
code.append("package enums; public enum " + className + " {\n"); // the declared name must be simple, not package-qualified
for (String s : enums)
code.append("\t"+s+",\n");
code.append("}");
return CompilerUtils.CACHED_COMPILER
.loadFromJava("enums."+className, code.toString());
}
One of the things I find useful with text-generated code is that you can write it to a file and debug it even at run time. (The library supports this.) If you use bytecode generation, it's harder to debug.
The library is called Essence JCF. (And it doesn't require a custom class loader)

How would you do this in a dynamic language like JavaScript? It would be just a string with one of the values "APPLE", "ORANGE", "BANANA".
Java types (classes, interfaces, enums) exist only for the compiler to do some optimizations and type checking, to make refactoring possible, etc. At runtime you need neither optimizations, type checking, nor refactoring, so a normal "string" is OK, just like in JavaScript, where every object is either a number (Double in Java), a string (String in Java) or a complex object (Map in Java) - that's all you need to do anything at runtime, even in Java.

How to identify if an object returned was created during the execution of a method - Java

Original Question: Given a method I would like to determine if an object returned is created within the execution of that method. What sort of static analysis can or should I use?
Reworked Questions: Given a method I would like to determine if an object created in that method may be returned by that method. So, if I go through and add all instantiations of the return type within that method to a set, is there an analysis that will tell me, for each member of the set, if it may or may not be returned. Additionally, would it be possible to not limit the set to a single method but, all methods called by the original method to account for delegation?
This is not specific to any invocation.
It looks like method escape analysis may be the answer.
Thanks everyone for your suggestions.

Your question seems to call for a simple "reaching" analysis ("does a new value reach a return statement?") if you are interested in any invocation and only in values created by a method-local new. If you need to know whether any invocation can return a new value from any subcomputation, you need to compute the possible call graph and determine whether any called function can return a new value, or pass a new value from a called function to its parent.
There are a number of Java static analysis frameworks.
SOOT is a byte-code based analysis framework. You could probably implement your static query using this.
The DMS Software Reengineering Toolkit is a generic engine for building custom analyzers and transformation tools. It has a full Java front end, and computes various useful base analyses (def/use chains, call graph) on source code. It can process class files but presently only to get type information.
If you wanted a dynamic analysis, either by itself or as a way to tighten up the static analysis, DMS can be used to instrument the source code in arbitrary ways by inserting code to track allocations.

I'm not sure if this would work for your circumstances, but one simple approach would be to populate a newly added 'instantiatedTime' field in the constructor of the object and compare that with the time the method call was made. This assumes you have access to the source for the object in question.
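A sketch of that idea follows, with one deliberate substitution: instead of a wall-clock 'instantiatedTime' it records a creation sequence number, since two timestamps taken close together can be equal. Tracked and CreationCheck are hypothetical names, and the approach only covers classes whose source you can change.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class whose source we control; each instance records a creation sequence number.
class Tracked {
    private static final AtomicLong COUNTER = new AtomicLong();
    final long createdSeq = COUNTER.incrementAndGet();

    // Current high-water mark; anything created later gets a larger number.
    static long checkpoint() { return COUNTER.get(); }
}

class CreationCheck {
    private static final Tracked CACHED = new Tracked();

    static Tracked fresh()  { return new Tracked(); }  // creates an object during the call
    static Tracked cached() { return CACHED; }         // returns a pre-existing object

    // True if 't' was constructed after the given checkpoint was taken.
    static boolean createdSince(long checkpoint, Tracked t) {
        return t.createdSeq > checkpoint;
    }
}
```

Take a checkpoint, call the method, then pass the result to createdSince. Note that an object created by lazy static initialization during the call will also count as created within it.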

Are you sure static analysis is the right tool for the job? Static analysis can give you a result in some cases but not in all.
When running the JVM under a debugger, it assigns objects increasing object IDs, which you can fetch via System.identityHashCode(Object o). You can use this fact to build a test case that creates an object (the checkpoint), and then calls the method. If the returned object has an id greater than the checkpoint id, then you know the object was created in the method.
Disclaimer: this is observed behaviour under a debugger, under Windows XP.

I have a feeling that this is impossible to do without a specially modified JVM. Here are some approaches ... and why they won't work in general.
The Static Analysis approach will work in simple cases. However, something like this is likely to stump any current generation static analysis tool:
// Bad design alert ... don't try this at home!
public class LazySingletonStringFactory {
private String s;
public String create(String initial) {
if (s == null) {
s = new String(initial);
}
return s;
}
}
For a static analyser to figure out if a given call to LazySingletonStringFactory.create(...) returns a newly created String it must figure out that it has not been called previously. The Halting Problem tells us that this is theoretically impossible in some cases, and in practice this is beyond the "state of the art".
The IdentityHashCode approach may work in a single-threaded application that completes without the garbage collector running. However, if the GC runs you will get incorrect answers. And if you have multiple threads, then (depending on the JVM) you may find that objects are allocated in different "spaces" resulting in object "id" creation sequence that is no longer monotonic across all threads.
The Code Instrumentation approach works if you can modify the code of the Classes you are concerned about, either direct source-code changes, annotation-based code injection or by some kind of bytecode processing. However, in general you cannot do these things for all classes.
(I'm not aware of any other approaches that are materially different to the above three ... but feel free to suggest them as a comment.)

Not sure of a reliable way to do this statically.
You could use:
AspectJ or a similar AOP library could be used to instrument classes and increment a counter on object creation
a custom classloader (or JVM agent, but classloader is easier) could be used similarly

Simple List of All Java Standard Classes and Methods?

I'm building a very simple Java parser, to look for some specific usage models. This is in no way lex/yacc or any other form of interpreter/compiler for purposes of running the code.
When I encounter a word or a set of two words separated by a dot ("word.word"), I would like to know if that's a standard Java class (and method), e.g. "Integer", or some user defined name. I'm not interested in whether the proper classes were included/imported in the code (i.e. if the code compiles well), and the extreme cases of user defined classes that override the names of standard Java classes also do not interest me. In other words: I'm okay with false negatives, I'm only interested in being "mostly" right.
Is there a place where I could find a simple list of all the names of all Java standard classes and methods, in a form easily saved into a text file or database? (J2SE is okay, but J2EE is better). I'm familiar with http://java.sun.com/j2se/ etc, but it seems I need a terrible amount of manual work to extract all the names from there. Also, the most recent JDK is not necessary, I can live with 1.4 or 1.5.
Clarification: I'm not working in Java but in Python, so I can't use Java-specific commands in my parsing mechanism.
Thanks

What's wrong with the javadoc? The index lists all classes, methods, and static variables. You can probably grep for parenthesis.

To get all classes and methods you can look at the index on
http://java.sun.com/javase/6/docs/api/index-files/index-1.html
This will be tens of thousands of classes and methods, which can be overwhelming.
I suggest instead you use auto-complete in your IDE. This will show you all the matching classes/methods appropriate based on context.
e.g. say you have a variable
long time = System.
This will show you all the methods in System which return a long value, such as
long time = System.nanoTime();
Even if you know a lot of the method/classes, this can save you a lot of typing.

If you just want to create a list of all classes in Java and their methods (so that you can populate a database or an XML file), you may want to write an Eclipse-plugin that looks at the entire JavaCore model, and scans all of its classes (e.g., by searching all subtypes of Object). Then enumerate all the methods. You can do that technically to any library by including it in your context.

IBM had a tool for creating XML from JavaDocs, if I am not mistaken:
http://www.ibm.com/developerworks/xml/library/x-tipjdoc/index.html

There's also an option to either parse classlist file from jre/lib folder or open the jsse.jar file, list all classes there and make a list of them in dot-separated form by yourself.
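A minimal sketch of the jar-listing variant is below. JarClassLister is a hypothetical name, and which jar to point it at (for example jsse.jar or rt.jar in an older JRE's lib folder) is an assumption about your installation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the classes in a jar as dot-separated names.
class JarClassLister {

    // "java/lang/Integer.class" -> "java.lang.Integer"; returns null for non-class entries.
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Reads every entry of the jar at 'jarPath' and keeps the class names.
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<String>();
        JarFile jar = new JarFile(jarPath);
        try {
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
                String name = toClassName(e.nextElement().getName());
                if (name != null) {
                    names.add(name);
                }
            }
        } finally {
            jar.close();
        }
        return names;
    }
}
```

Nested classes keep their binary form (java.util.Map$Entry), so a consumer that wants source-style names would still have to replace the $ itself. This yields class names only; method names would need a further step.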

When I encounter a word or a set of two words separated by a dot ("word.word"), I would like to know if that's a standard Java class (and method), e.g. "Integer", or some user defined name.
If that's what you're after, you could do without a (limited) list of Java classes by using some simple reflection:
http://java.sun.com/developer/technicalArticles/ALT/Reflection/
try {
Class.forName("word.word");
System.out.println("This is a valid class!");
} catch (ClassNotFoundException e) {
System.out.println("This is not a valid class.");
}
Something like this should be enough for your purposes, with the added benefit of not being limited to a subset of classes, and extensible by any libraries on the classpath.
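Since the question also asks about methods, the same idea can be extended a little. The following is a sketch with hypothetical names; note that Class.forName needs the fully qualified name, so a bare "Integer" has to be tried as "java.lang.Integer" (or against a list of candidate packages) by the caller.

```java
import java.lang.reflect.Method;

// Hypothetical helper built on reflection; it only sees classes visible on the classpath.
class StandardNameCheck {

    private static Class<?> find(String className) throws ClassNotFoundException {
        // 'false' avoids running static initializers just to test for existence.
        return Class.forName(className, false, StandardNameCheck.class.getClassLoader());
    }

    // True if a class with this fully qualified name can be found.
    static boolean isKnownClass(String className) {
        try {
            find(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // True if the class exists and has at least one public method with this name.
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            for (Method m : find(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

For a parser written in another language, a small program like this could be run once to dump the names it needs into a text file.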

Why are variables declared with their interface name in Java? [duplicate]

This question already has answers here:
What does it mean to "program to an interface"?
(33 answers)
Closed 6 years ago.
This is a real beginner question (I'm still learning the Java basics).
I can (sort of) understand why methods would return a List<String> rather than an ArrayList<String>, or why they would accept a List parameter rather than an ArrayList. If it makes no difference to the method (i.e., if no special methods from ArrayList are required), this would make the method more flexible, and easier to use for callers. The same thing goes for other collection types, like Set or Map.
What I don't understand: it appears to be common practice to create local variables like this:
List<String> list = new ArrayList<String>();
While this form is less frequent:
ArrayList<String> list = new ArrayList<String>();
What's the advantage here?
All I can see is a minor disadvantage: a separate "import" line for java.util.List has to be added. Technically, "import java.util.*" could be used, but I don't see that very often either, probably because the "import" lines are added automatically by some IDE.

When you read
List<String> list = new ArrayList<String>();
you get the idea that all you care about is being a List<String> and you put less emphasis on the actual implementation. Also, you restrict yourself to members declared by List<String> and not the particular implementation. You don't care if your data is stored in a linear array or some fancy data structure, as long as it looks like a List<String>.
On the other hand, reading the second line gives you the idea that the code cares about the variable being ArrayList<String>. By writing this, you are implicitly saying (to future readers) that you shouldn't blindly change actual object type because the rest of the code relies on the fact that it is really an ArrayList<String>.

Using the interface allows you to quickly change the underlying implementation of the List/Map/Set/etc.
It's not about saving keystrokes, it's about changing implementation quickly. Ideally, you shouldn't be exposing the underlying specific methods of the implementation and just use the interface required.
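A small sketch of what "changing the implementation quickly" looks like in practice; InterfaceDemo and its methods are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class InterfaceDemo {

    // Depends only on the List interface, so any implementation can be passed in.
    static String joinAll(List<String> items) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String item : items) {
            if (!first) {
                sb.append(",");
            }
            sb.append(item);
            first = false;
        }
        return sb.toString();
    }

    static String demo(boolean linked) {
        // Only this line names a concrete class; the code below it does not care which.
        List<String> list = linked ? new LinkedList<String>() : new ArrayList<String>();
        list.add("a");
        list.add("b");
        return joinAll(list);
    }
}
```

Had list been declared as ArrayList<String> and joinAll taken an ArrayList<String>, switching to LinkedList would mean touching every declaration along the way.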

I would suggest thinking about this from the other end around. Usually you want a List or a Set or any other Collection type - and you really do not care in your code how exactly this is implemented. Hence your code just works with a List and does whatever it needs to do (also phrased as "always code to interfaces").
When you create the List, you need to decide what actual implementation you want. For most purposes ArrayList is "good enough", but your code really doesn't care. By sticking to using the interface you convey this to the future reader.
For instance I have a habit of having debug code in my main method which dumps the system properties to System.out - it is usually much nicer to have them sorted. The easiest way is to simply let "Map map = new TreeMap(properties);" and THEN iterate through them, as TreeMap returns the keys sorted.
When you learn more about Java, you will also see that interfaces are very helpful in testing and mocking, since you can create objects with behaviour specified at runtime conforming to a given interface. An advanced (but simple) example can be seen at http://www.exampledepot.com/egs/java.lang.reflect/ProxyClass.html
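To give a flavour of the last point without following the link, here is a minimal sketch using java.lang.reflect.Proxy from the standard library. ProxyDemo and listWithFixedSize are hypothetical names; a real mocking library does far more than this.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

class ProxyDemo {

    // Builds a List whose size() always answers 'fakeSize'; every other call is unsupported.
    @SuppressWarnings("unchecked")
    static List<String> listWithFixedSize(final int fakeSize) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                if (method.getName().equals("size")) {
                    return Integer.valueOf(fakeSize);
                }
                throw new UnsupportedOperationException(method.getName());
            }
        };
        // The proxy class is generated at runtime and implements List.
        return (List<String>) Proxy.newProxyInstance(
                ProxyDemo.class.getClassLoader(),
                new Class<?>[] { List.class },
                handler);
    }
}
```

Code written against List<String> accepts this object without knowing it is a fake, which is only possible because that code never asked for an ArrayList.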

If you later want to change the implementation of the list and use, for example, LinkedList (maybe for better performance), you don't have to change the whole code (and the API, if it's a library). If order doesn't matter, you should return Collection, so that later on you can easily change it to a Set (for example a TreeSet, if you need the items sorted).

The best explanation I can come up with (because I don't program in Java as frequently as in other languages) is that it makes it easier to change the "back-end" list type while maintaining the same code/interface everything else is relying on. If you declare it as a more specific type first, then later decide you want a different kind... if something happens to use an ArrayList-specific method, that's extra work.
Of course, if you actually need ArrayList-specific behavior, you'd go with the specific variable type instead.

The point is to identify the behavior you want/need and then use the interface that provides that behavior. That is the type for your variable. Then, use the implementation that meets your other needs - efficiency, etc. This is what you create with "new". This duality is one of the major ideas behind OOD. The issue is not particularly significant when you are dealing with local variables, but it rarely hurts to follow good coding practices all the time.

Basically this comes from people who have to run large projects, possibly for other reasons - you hear it all the time. Why, I don't actually know. If you need an ArrayList, HashMap, HashSet or whatever else, I see no point in eliminating methods by casting to an interface.
Let us say, for example, that I recently learned how to use HashSet and implemented it as a principal data structure. Suppose, for whatever reason, I went to work on a team. Would not that person need to know that the data was keyed on hashing approaches rather than being ordered by some basis? The back-end approach noted by Twisol works in C/C++ where you can expose the headers and sell a library thus; if someone knows how to do that in Java I would imagine they would use JNI - at which point it seems simpler to me to use C/C++, where you can expose the headers and build libs using established tools for that purpose.
By the time you can get someone who can install a jar file in the extensions dir, it would seem to me that entity could be just a few short steps away - I dropped several crypto libs in the extensions directory, that was handy, but I would really like to see a clear, concise basis elucidated. I imagine they do that all the time.
At this point it sounds to me like classic obfuscation, but beware: You have some coding to do before the issue is of consequence.
