I am new to ANTLR and writing in Java, and I am currently trying to figure out how I can make the parser identify the token "var" as either an int, a string, etc., just as in JavaScript, where you use either var or let. I am trying to make my own programming language which has explicit data types, so that it would be easier for a beginner to start coding without worrying whether he/she is using an int/string/char and so on.
I can't seem to find any documentation for it online, so I am hoping that someone here can teach me how I would make this possible.
This isn't really a task you'd accomplish directly with ANTLR.
ANTLR generates code to produce a Parser for you. That means it will process your input and produce a data structure (a ParseTree, in ANTLR's case) that correctly categorizes all of your input (assuming it's syntactically correct; otherwise you get an error message).
In your case you'd have a ParseTree that would correctly identify that you have a var keyword, an Identifier (your variable name), an = and a value. This would probably be the result of matching a parse rule for something like assignmentStmt.
With that ParseTree in memory, you'll have listener and/or visitor classes that ANTLR generates to make it quite easy to navigate that ParseTree.
With everything parsed out for you (by ANTLR), it would be up to you, in your own code, to do the type inference (what you are describing is type inference rather than "explicit typing"). Or, if you want to allow any type to be assigned to your variable, you really don't have anything you need to do. (You have a typeless language, and no need to verify types. Your runtime would, of course, want to keep track of the type of the currently assigned value, but would allow assignment of a new value of any type.)
ANTLR's job is to correctly identify all the parts according to your syntax (type checking is a semantic concern, and not something the parser concerns itself with). It does not create symbol tables for you, or attempt to do type inference. These are tasks that are up to you once the input is parsed.
Side note: JavaScript is dynamically typed, so you just have a variable or constant that can hold anything; the variable itself has no type (inferred or explicit).
Explicit typing would be something like:
var myString : String;
Implicit typing would be something like:
var myVar = "String"
and you would have code that essentially says, "They have assigned a String to myVar, so henceforth myVar is a String type and will accept no value other than a String."
In JavaScript, you're just getting a variable, and you can turn right around and assign it a numeric value, an object, or anything you like (it's typeless).
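To make the type-inference idea concrete, here is a minimal sketch of the symbol-table bookkeeping you would write yourself after parsing. It assumes a toy language with only int, string and bool literals; the class and method names (InferringSymbolTable, declare, assign) are hypothetical and not part of ANTLR. In a real project you would call declare and assign from the listener or visitor methods ANTLR generates for your assignment rules, passing in the literal values you extracted from the ParseTree.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical symbol table for a toy language in which `var x = <literal>;`
// fixes x's type from the first value assigned (implicit typing).
class InferringSymbolTable {
    // Minimal set of types for the toy language (an assumption).
    enum Type { INT, STRING, BOOL }

    private final Map<String, Type> types = new HashMap<>();

    // Infer a type from the Java object that stands in for a parsed literal.
    static Type typeOf(Object value) {
        if (value instanceof Integer) return Type.INT;
        if (value instanceof String) return Type.STRING;
        if (value instanceof Boolean) return Type.BOOL;
        throw new IllegalArgumentException("unsupported literal: " + value);
    }

    // For `var name = value;`: record the inferred type.
    void declare(String name, Object value) {
        if (types.containsKey(name)) {
            throw new IllegalStateException(name + " is already declared");
        }
        types.put(name, typeOf(value));
    }

    // For `name = value;`: reject a value whose type differs from the inferred one.
    void assign(String name, Object value) {
        Type declared = types.get(name);
        if (declared == null) {
            throw new IllegalStateException(name + " is not declared");
        }
        Type actual = typeOf(value);
        if (actual != declared) {
            throw new IllegalStateException(
                "cannot assign " + actual + " to " + name + " of type " + declared);
        }
    }

    Type typeOfVariable(String name) {
        return types.get(name);
    }

    public static void main(String[] args) {
        InferringSymbolTable table = new InferringSymbolTable();
        table.declare("myVar", "String");  // myVar is now a STRING
        table.assign("myVar", "another");  // fine: same type
        try {
            table.assign("myVar", 42);     // rejected: INT is not STRING
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For a typeless language you would simply skip the check in assign and store the value itself, perhaps keeping its current type alongside it for the runtime.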
A parser is a tool to determine if certain input is syntactically correct and can convert the input to a specific data structure, if that's the case (as Mike Cargal explained). That means a parser is a tool to deal with the syntax of the input.
Specifying types and other meta information of the input is applying meaning to certain strings, which is commonly called semantic processing.
Knowing that a parser is a syntax tool should make clear that a parser cannot be used to apply semantics. It's important to differentiate between syntax and semantics to understand which tool can do what.
How to apply semantics in the way you seem to want is a topic of its own and too broad to be handled in a single question.
Related
I am so used to C++ & C#, where I can type bool. In Java, I am required to type boolean. Also, it requires me to type String with an uppercase 'S'. I would love to be able to create project-wide aliases for these types, so that I can create variables by typing bool and string. Do you have any ideas?
If you want to actually change the syntax of the language, that is not only impossible, but a terrible idea. Aliases would make code indecipherable to others.
However, it is possible to only type 'bool'+{TAB} and 'string'+{TAB} and have NetBeans change it to 'boolean' or 'String', respectively. In fact, you have much more flexibility than that (for example, you could make it 'boo'+{SPACE}='boolean' or 'bo'+{ENTER}='boolean').
Take a look at 'NetBeans->Preferences->Editor->Code Templates' if that is the kind of thing you need; should be pretty self-explanatory.
I am writing a compiler for a subset of the Java language, and while writing the scanner I was wondering how to recognize types that are not primitive but come from a library. For example, the String type is not primitive in the Java language. In general, how do I recognize new types from libraries as "keywords" in the language without knowing them in advance? Thanks!
Try reading through http://www.antlr.org/grammar/1152141644268/Java.g (pay attention to the type declaration and its usage); it will give you some insight.
In brief, when you parse the source, you know where you expect to see a type: after modifiers in fields and methods, just before parameter names in a formal parameter list, etc. And when you know that String is some type, you can check the import table to get its fully qualified name, and go find the actual class on the classpath.
Now, as Bart mentions in the comments (now deleted), the real resolution happens later in the pipeline. At the parsing stage, you only create a node saying that you have a type which is an Identifier which is String. So the answer at this moment would be, "you don't."
Update: try following this thesis, it should help you.
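A rough sketch of that later resolution step, under simplifying assumptions: the parser has already recorded a type as a plain identifier, the import table is given as two lists, and reflection (Class.forName) stands in for searching the classpath. The TypeResolver class is hypothetical; it ignores same-package types, nested types and ambiguous on-demand imports, which a real Java compiler must handle, and a real compiler would read class files rather than load classes into its own JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical resolver run after parsing: the parser only recorded
// "there is a type whose name is String"; this pass finds the actual class.
class TypeResolver {
    private final List<String> singleTypeImports; // e.g. "java.util.ArrayList"
    private final List<String> onDemandImports;   // e.g. "java.util" for "import java.util.*;"

    TypeResolver(List<String> singleTypeImports, List<String> onDemandImports) {
        this.singleTypeImports = singleTypeImports;
        this.onDemandImports = onDemandImports;
    }

    Optional<Class<?>> resolve(String simpleName) {
        // 1. A single-type import such as "import java.util.ArrayList;" names the class directly.
        for (String imp : singleTypeImports) {
            if (imp.endsWith("." + simpleName)) {
                return load(imp);
            }
        }
        // 2. Otherwise try java.lang (implicitly imported), then every on-demand import.
        List<String> packages = new ArrayList<>();
        packages.add("java.lang");
        packages.addAll(onDemandImports);
        for (String pkg : packages) {
            Optional<Class<?>> found = load(pkg + "." + simpleName);
            if (found.isPresent()) {
                return found;
            }
        }
        return Optional.empty(); // unknown type: report a semantic error
    }

    // Reflection stands in for "go find the actual class in the classpath".
    private static Optional<Class<?>> load(String qualifiedName) {
        try {
            // initialize = false: we only want to know that the class exists
            return Optional.of(Class.forName(qualifiedName, false, TypeResolver.class.getClassLoader()));
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

The key point is the one made above: the scanner never treats String as a keyword; it stays an identifier until this semantic pass looks it up.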
I used to write in a very strongly typed language, for example, Java. I need to tell the compiler what type of variable I will put in... for example...
public static void sayHello(String aName)
I can ensure that the user will pass a string to me...
But if I use PHP, I can do this...
function sayHello($aName)
I can still call sayHello, but I don't know what the parameter type is... I can make the name more informative, like this:
function sayHelloWithString($aName)
But I can't stop the user from passing an int to me... it may cause a lot of errors. How can I stop it? Any ideas or experience to share? Thank you.
How about not stopping the user from passing in an int?
In PHP, you could check is_string, but of course you'll miss out on objects that define __toString, or the implicit conversion of numbers to strings.
If you must make your program cry in pain when a developer tries something different, you can specify a type in later versions of PHP (e.g. function foo(ObjectType $bar)...)*
In most loosely typed languages, you want to set up fall-backs for the major types:
number
array
string
generic object
Be liberal in what you accept, be strict in what you send.
* Primitive (scalar) types are not supported for type hinting in PHP 5; PHP 7.0 added scalar type declarations such as string and int.
There are a few ways to deal with this...
1. Use an IDE that supports docblocks. This deals with pre-runtime type checking while you write code.
2. Use type checking within your function. This only helps with runtime type checking, and you won't know about problems while writing your code.
3. Depending on the type, you can use built-in type hinting. In PHP 5 this only works for non-scalar values, specifically array and a class name.
1 - To implement #1 using a good IDE, you can docblock your function as such:
/**
 * Say hello to someone.
 *
 * @param string $aName
 **/
public function sayHello($aName) {
    // ...
}
2 - To implement #2, use the is_ functions:
public function sayHello($aName) {
    if (!is_string($aName)) {
        throw new InvalidArgumentException("Type not correct.");
    }
    // Normal execution
}
3 - You can't do this with your method above, but you can with something like this. It's much the same as #2, except that it throws a catchable fatal error (a TypeError in PHP 7) rather than an InvalidArgumentException.
public function manipulateArray(array $anArray) {
    // ...
}
It's worth noting that most of this is pretty irrelevant unless you're writing publicly usable library code. You should know what your methods accept, and if you're trying to write good-quality code in the first place, you should be checking this beforehand.
Using a good IDE (I recommend PhpStorm a thousand times over), you can and should use DocBlocks everywhere you can for all of your classes. Not only will they help when writing APIs and normal code, they also document your code: if you need to look at the code 6 months later, chances are you're not going to remember it 100% :-)
Additionally, there's a lot more you can do with docblocks than just define parameter types; look it up.
You can check if what they passed is a string using:
http://php.net/manual/en/function.is-string.php
Then provide appropriate error handling.
function sayHello($aName) {
if (is_string($aName)) {
//string OK!
} else {
echo "sayHello() only takes strings!";
}
}
In PHP you can check whether the variable that has been passed is a string by using the is_string function:
<?php
if (is_string($aName)) {
echo "Yes";
} else {
echo "No";
}
?>
Hope that helps.
Alternatively (or additionally), use type casting to convert the variable to the required type:
http://us3.php.net/manual/en/language.types.type-juggling.php
You have the option of checking to make sure the parameter is of the right type. However, it's worth considering what you'd do if it isn't. If you're just going to throw an exception, you might be better off assuming it's the right type and letting an exception be thrown when something you do isn't allowed. If you're not going to add any more useful information to the exception/error that would already be thrown, then there's not much point in checking in the first place.
As to giving the user an indication of what type you want, I generally stick with including it in the variable name:
function sayHello($aNameStr)
function addItems($itemList)
...etc...
That, plus reasonable documentation, will mean the user can look at the function and figure out what they should be passing in in the first place.
Some scripting languages have tools that can help you. For example, use strict in Perl requires declaring each variable before using it. But the language is still weakly typed by definition.
Sometimes naming conventions help. For example, we inherited from the good old Fortran tradition that int variable names should start with i, j, k, l, m or n. This convention is still used today, at least for indexes.
Is there any way to give instructions directly to the parser and lexer from the Java code level? If not, how could one go about doing this at all?
The issue is that I want to have the parser evaluate a variable, back up, then assign the value of that variable as an Object name. Like this:
String s = "text";
SomeClass (s) = new SomeClass();
parser reads--> ok, s evaluates to be "text"...
parser backtracks, while holding "text" in memory and assigns "text" as the name of the new instance of SomeClass, such that one can now do this:
text.callSomeMethod();
I need to do this because I have to instantiate an arbitrary number of objects of SomeClass. Each one has to have a unique name, and it would be ideal to do something like this:
while (someArbitrarySet.hasNext()) {
String s = "token" + Math.random();
SomeClass (s) = new SomeClass();
(s).callSomeMethod();
}
I hope this makes sense...
What you're asking for is what some languages call MACROS. They're also sometimes known as preprocessor definitions, or simply "defines".
A decision was made not to have includes, macros and the like in Java, because they introduce additional code maintenance concerns; the designers concluded they would lead to code that was not in the style they wanted.
However, just because it's not built into the compiler doesn't mean you couldn't add it to your build script.
As part of your build, you copy all files to a src-comp directory, and as you do, replace your tokens as they're defined.
I don't recommend doing it, but that doesn't mean it isn't possible.
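The token-replacement part of such a build step might look like the following sketch. TokenPreprocessor and its define/process methods are hypothetical, and copying the files to the src-comp directory is left out. Note that this naive version also replaces tokens inside string literals and comments, which is exactly the kind of maintenance concern mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical build-time preprocessing step: copy a source file's text
// with every defined token replaced before handing it to javac.
class TokenPreprocessor {
    // LinkedHashMap applies the definitions in the order they were added.
    private final Map<String, String> defines = new LinkedHashMap<>();

    void define(String token, String replacement) {
        defines.put(token, replacement);
    }

    String process(String source) {
        String result = source;
        for (Map.Entry<String, String> def : defines.entrySet()) {
            // Word boundaries on both sides: MAX does not touch MAXIMUM.
            Pattern token = Pattern.compile("\\b" + Pattern.quote(def.getKey()) + "\\b");
            // quoteReplacement: treat '$' and backslashes in the replacement literally.
            result = token.matcher(result).replaceAll(Matcher.quoteReplacement(def.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        TokenPreprocessor pp = new TokenPreprocessor();
        pp.define("MAX", "100");
        System.out.println(pp.process("int a = MAX; int MAXIMUM = 1;"));
    }
}
```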
What you describe (creating new named variables at runtime) is possible in interpreted languages like JavaScript, Lua or Bash, but not in a compiled language like Java. When the loop is executed, there is no source code there to manipulate, and all named variables have to be defined beforehand.
Apart from this, your variables don't need a "unique" name, if you are using them sequentially (one after another), you could just as well write your loop as this:
while (someArbitrarySet.hasNext()) {
    someArbitrarySet.next(); // advance the iterator, otherwise the loop never ends
    SomeClass sC = new SomeClass();
    sC.callSomeMethod();
}
If you really need your objects at the same time, put them in some sort of data structure. The simplest would be an array; you could also use a Collection (like an ArrayList), or a Map (as CajunLuke wrote) if you want to find them again by key.
In fact, an array (in Java) is nothing else than a collection of variables (all of the same type), which you can index by an int.
(And the scripting languages which allow creating new variables on runtime implement this also with some kind of map String → (anything), where this map is either method/script-local or belonging to some surrounding object.)
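A minimal sketch of how an interpreter might implement such runtime variables: each scope is a String → Object map, and a lookup falls back to the enclosing scope. The Environment class is hypothetical and only illustrates the idea; it is not how any particular scripting language is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical runtime environment for a small interpreter: variables are
// just entries in a String -> Object map, so new names can appear at runtime.
class Environment {
    private final Map<String, Object> values = new HashMap<>();
    private final Environment parent; // enclosing scope, or null for the global scope

    Environment(Environment parent) {
        this.parent = parent;
    }

    // Create (or overwrite) a variable in this scope.
    void define(String name, Object value) {
        values.put(name, value);
    }

    // Look the name up in this scope first, then in the enclosing scopes.
    Object get(String name) {
        if (values.containsKey(name)) {
            return values.get(name);
        }
        if (parent != null) {
            return parent.get(name);
        }
        throw new IllegalStateException("undefined variable: " + name);
    }
}
```

In Java itself you get the same effect by holding such a map in your own code, which is what the Map answer below does.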
You wrote in a comment to the question (better add those things to the question itself, it has an "edit" button):
Without getting into too many details, I'm writing an application that runs within a larger program. Normally, the objects would get garbage-collected after I was done with them, but the larger program maintains them, thus the need for a unique name for each. If I don't give each a unique name, the old object will get overwritten, but it is still needed in the context of the greater program.
So, you want to retain the objects to avoid garbage collection? Use an array (or List or anything else).
The thing is, if you want your larger program to be able to use these objects, you somehow have to give them to this larger program anyway. And then this program would have to retain references to these objects, thereby avoiding garbage collection. So it looks you want to solve a problem which does not exist by means which do not exist :-)
Not really an answer to the question you asked, but a possible solution to your problem: using a map.
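import java.util.HashMap;
import java.util.Map;

Map<String, SomeClass> variables = new HashMap<>();
while (someArbitrarySet.hasNext()) {
    someArbitrarySet.next(); // advance the iterator
    String s = "token" + Math.random();
    variables.put(s, new SomeClass());
    variables.get(s).callSomeMethod();
}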
That way, you can use the "variable name" as the keys into the map, and you can get by without messing with the lexer/parser.
I really hope there is a way to do specifically what you state in Java - it would be really cool.
No. That's not possible.
Even if you could, I can't think of a way to invoke them, because there wouldn't be any compiling code that could successfully reference them.
So the options are the one described by CajunLuke, or to create your own Java parser, probably using the ANTLR sample Java grammar, and hook what you need in there.
Consider the map solution.
This is answered in How do you use Java 1.6 Annotation Processing to perform compile time weaving? .
In short, there is an annotation processing tool that lets you hook into compilation and generate code from annotations; it cannot change Java's syntax, but you can build annotation-based DSLs with it.
Under JDK 1.5 you had to use apt instead of javac, but under 1.6, annotation processing is part of javac and controlled by the -processor flag. From javac -help:
-processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
-processorpath <path> Specify where to find annotation processors
I have a parser written in the Bigloo Scheme functional language which I need to compile into a Java class. The whole parser is written as a single function. Unfortunately, this is causing the JVM compiler to throw a "Method too large" warning and later give a "far label in localvar" error. Is there any way I can circumvent this error? I read somewhere about a DontCompileHugeMethods option; does it work? Splitting the function doesn't seem to be a viable option to me :( !!
Is there any possible way where I can circumvent this error?
Well, the root cause of this compiler error is that there are hard limits in the format of bytecode files. In this case, the problem is that a single method can contain at most 65535 bytes of bytecode (its code_length must be less than 65536; see the JVM spec).
The only workaround is to split the method.
Split the method into related operations, or move utility code into separate methods.
Well, the case is a bit different here: the method only consists of a single function call. Now this function has a huge parameter list (the whole of the parser, actually!!). So I have no clue how to split this!!
The way to split up such a beast could be:
define data holder objects for your parameters (put sets of parameters in objects according to the ontology of your data model),
build those data holder objects in their own context
pass the parameter objects to the function
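The asker's code is Bigloo Scheme, so the following Java sketch only shows the shape of the split at the JVM level. LexerParams, GrammarParams and SplitParser are hypothetical names: each group of related parameters gets its own holder class, each holder is built in its own method, and the former huge function receives a few objects instead of a huge parameter list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data holder for one group of related parameters.
final class LexerParams {
    final List<String> keywords;
    final String operators;

    LexerParams(List<String> keywords, String operators) {
        this.keywords = keywords;
        this.operators = operators;
    }
}

// Hypothetical data holder for another group of related parameters.
final class GrammarParams {
    final Map<String, List<String>> rules;

    GrammarParams(Map<String, List<String>> rules) {
        this.rules = rules;
    }
}

class SplitParser {
    // Each group is built in its own method, so no single method
    // has to contain all of the construction bytecode.
    static LexerParams buildLexerParams() {
        return new LexerParams(Arrays.asList("if", "else", "while"), "+-*/=");
    }

    static GrammarParams buildGrammarParams() {
        Map<String, List<String>> rules = new HashMap<>();
        rules.put("stmt", Arrays.asList("assignment", "ifStmt"));
        rules.put("assignment", Arrays.asList("ID", "=", "expr"));
        return new GrammarParams(rules);
    }

    // The former huge function now takes a handful of parameter objects.
    static String parserSummary(LexerParams lexer, GrammarParams grammar) {
        return lexer.keywords.size() + " keywords, "
            + lexer.operators.length() + " operators, "
            + grammar.rules.size() + " rules";
    }

    public static void main(String[] args) {
        System.out.println(parserSummary(buildLexerParams(), buildGrammarParams()));
    }
}
```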
Quick and dirty: assign all your parameters to class variables of the same name (you must rename your parameters) at the beginning of your function, then start chopping your function into pieces and put those pieces in functions. This should guarantee that your function will basically operate with the same semantics.
But, this will not lead to pretty code!
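A sketch of the quick-and-dirty variant, again in Java and with hypothetical names: the parameters are copied into fields, and each chunk of the original body becomes a private method that reads and writes those fields. Any local variable shared between chunks has to become a field too. Because the state now lives in the object, this only preserves the original semantics if the function is not called recursively or from several threads on the same instance.

```java
// Hypothetical "quick and dirty" split: parameters are copied into fields,
// then the body is chopped into methods that work on those fields.
class ChoppedFunction {
    // Fields named after the former parameters (the parameters were renamed).
    private String source;
    private int offset;
    // A local variable shared between chunks, promoted to a field.
    private StringBuilder output;

    String run(String sourceParam, int offsetParam) {
        // Copy every parameter into the field of the same name first.
        source = sourceParam;
        offset = offsetParam;
        output = new StringBuilder();
        // The original body, cut into consecutive pieces.
        part1();
        part2();
        return output.toString();
    }

    private void part1() {
        // First chunk of the original body: skip everything before the offset.
        output.append(source.substring(offset));
    }

    private void part2() {
        // Second chunk: continues from the state part1 left behind.
        output.reverse();
    }

    public static void main(String[] args) {
        System.out.println(new ChoppedFunction().run("xxabc", 2));
    }
}
```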