I am writing a compiler for a subset of the Java language, and while writing the scanner I was wondering how to recognize types that are not primitive but come from a library. For example, the String type is not primitive in Java. In general, how do I recognize new types from libraries as "keywords" in the language without knowing them in advance? Thanks!
Try getting through http://www.antlr.org/grammar/1152141644268/Java.g (pay attention to the type declaration and its usage) — it will give you some insight.
In brief, when you parse the source, you know where to expect a type: after the modifiers in field and method declarations, just before the parameter names in a formal parameter list, and so on. Once you know that String is some type, you can check the import table to get its full name, and then find the actual class on the classpath.
Now, as Bart mentions in the comments (now deleted), the real resolution happens later in the pipeline. At the parsing stage, you only create a node saying that you have a type which is an Identifier which is String. So the answer at this moment would be, "you don't."
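The later resolution step mentioned above could be sketched as follows. This is only a minimal illustration, not how any particular compiler does it: ImportTable, addImport and resolve are hypothetical names, only single-type imports and the implicit java.lang import are handled, and the running JVM's own classes stand in for the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical import table: maps simple type names to fully qualified names. */
final class ImportTable {
    private final Map<String, String> singleTypeImports = new LinkedHashMap<>();

    /** Records a single-type import such as "java.util.List". */
    void addImport(String qualifiedName) {
        String simple = qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        singleTypeImports.put(simple, qualifiedName);
    }

    /** Resolves an identifier found in a type position to a full name, if possible. */
    Optional<String> resolve(String simpleName) {
        String imported = singleTypeImports.get(simpleName);
        if (imported != null) {
            return Optional.of(imported);
        }
        // java.lang is imported implicitly; here the JVM itself is asked
        // whether such a class exists (an assumption made for brevity).
        try {
            return Optional.of(Class.forName("java.lang." + simpleName).getName());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ImportTable table = new ImportTable();
        table.addImport("java.util.List");
        System.out.println(table.resolve("List"));
        System.out.println(table.resolve("String"));
    }
}
```

The scanner and parser never call anything like this; they only record that an Identifier appeared where a type was expected. A real compiler would also have to handle on-demand imports, types from the same package and nested types.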
Update: try following this thesis, it should help you.
I am new to ANTLR and am writing in Java, and I am currently trying to figure out how to make the parser identify the token "var" as an int, a string, and so on, just as in JavaScript, where you use either var or let. I am trying to make my own programming language which has explicit data types, so that it would be easier for a beginner to start coding without worrying whether he or she is using an int, a string, a char and so on.
I can't seem to find any documentation for it online, so I am hoping that someone here can teach me how to make this possible.
This isn't really a task you'd accomplish directly with ANTLR.
ANTLR generates the code for a parser. That parser processes your input and produces a data structure (a ParseTree in ANTLR's case) that correctly categorizes all of your input (assuming it's syntactically correct; you get an error message otherwise).
In your case you'd have a ParseTree that correctly identifies a var keyword, an Identifier (your variable name), an = and a value. This would probably be the result of matching a parse rule for something like assignmentStmt.
With that ParseTree in memory, you'll have listener and/or visitor classes that ANTLR generates to make it quite easy to navigate that ParseTree.
With everything parsed out for you (by ANTLR), it would be up to you, in your own code, to do the type inference (what you are describing is type inference rather than "explicit typing"). Or, if you want to allow any type to be assigned to your variable, you really don't have anything you need to do. (You have a typeless language, and no need to verify types. Your runtime would, of course, want to keep track of the type of the currently assigned value, but would allow assignment of a new value of any type.)
ANTLR's job is to correctly identify all the parts according to your syntax (type checking is a semantic concern, and not something the parser concerns itself with). It does not create symbol tables for you or attempt to do type inference. These are tasks that are up to you once the input is parsed.
Side note: JavaScript variables are untyped (the language is dynamically typed), so you just have a variable or constant that can hold anything; there is no type attached to it (inferred or explicit).
Explicit typing would be something like:
var myString : String;
Implicit typing would be something like:
var myVar = "String"
and you would have code that essentially says, "They have assigned a String to myVar, so henceforth myVar is a String type and will accept no value other than a String."
In JavaScript, you're just getting a variable, and you can turn right around and assign it a numeric value, an object, or anything you like (it's typeless).
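The implicit-typing rule described above ("myVar is a String from now on") is something you would write yourself on top of the parse tree. Here is a rough sketch in Java of what such a check might look like. InferringSymbolTable, assign and typeOf are made-up names for illustration, values are assumed to be non-null, and Java runtime classes stand in for your language's types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical symbol table: the first assignment fixes a variable's type. */
final class InferringSymbolTable {
    private final Map<String, Class<?>> types = new HashMap<>();

    /** Records an assignment; rejects it if the variable already has another type. */
    void assign(String name, Object value) {
        Class<?> actual = value.getClass();          // assumes value is not null
        Class<?> declared = types.putIfAbsent(name, actual);
        if (declared != null && !declared.equals(actual)) {
            throw new IllegalStateException(name + " is " + declared.getSimpleName()
                    + ", cannot assign " + actual.getSimpleName());
        }
    }

    /** Returns the inferred type, or null if the variable was never assigned. */
    Class<?> typeOf(String name) {
        return types.get(name);
    }

    public static void main(String[] args) {
        InferringSymbolTable table = new InferringSymbolTable();
        table.assign("myVar", "String");   // var myVar = "String"
        table.assign("myVar", "other");    // fine, still a String
        try {
            table.assign("myVar", 42);     // rejected: myVar is a String
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In an ANTLR-based project, a listener or visitor would call something like assign whenever it walks over an assignment node. A typeless language in the JavaScript sense would simply skip the check.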
A parser is a tool to determine if certain input is syntactically correct and can convert the input to a specific data structure, if that's the case (as Mike Cargal explained). That means a parser is a tool to deal with the syntax of the input.
Specifying types and other meta information of the input is applying meaning to certain strings, which is commonly called semantic processing.
Knowing that a parser is a syntax tool should make it clear that a parser cannot be used to apply semantics. It's important to differentiate between syntax and semantics in order to understand which tool can do what.
How to apply semantics in the way you seem to want is a topic of its own, and too broad to be handled in a single question.
I am implementing a Spark process in Java, and I want to build, from an RDD of the same parametrized type, a Dataset<Try<MyPojo>> for a custom MyPojo class, where Try is the Scala Try. In Scala, the encoder would be provided implicitly, but in Java I need to provide it explicitly.
Now, I can get a working Encoder<MyPojo> using Encoders.bean(MyPojo.class). And I expect that there is some code, used by the Scala implicit, that builds an Encoder<Try<T>> from an Encoder<T>. But I cannot find it.
[Note: I just tried in Scala and no implicit was found for type Try... So the question is valid in Scala too]
So, how am I supposed to do this?
After some searching, I reached the conclusion that:
it is not possible (or maybe it is, but it would be overly complicated)
and that's because it is not the way a Dataset is meant to be used
At first, I considered Dataset to be a super, more generic version of RDD. But it is not. Actually, it is less generic with respect to types, because the type stored in a Dataset should be "flat" or "flatten-able".
A traditional POJO either has a flat structure (each field has a value type that can be represented by one column) or can be flattened when its fields have a POJO type. On the other hand, there is no trivial way to "flatten" a type such as Try, which is basically either some type (MyPojo in the question) or an Exception.
That conclusion also applies to all non-POJO types, such as interfaces, which can have several implementations. Obviously this leads to a question: what about classes that are not POJOs, e.g. because they contain a field of type Try or of an interface type? Encoders.bean would probably fail at runtime. So much for type-safety...
Well, in conclusion, to solve my problem which is to keep track of failed items, I think I will go for an addition of an "error" column. Or something like that.
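One way to picture the "error column" idea is a flat bean that carries either the payload or an error message in ordinary fields. The sketch below is plain Java with no Spark dependency. ParsedRow and its fields are hypothetical, and an Integer payload stands in for MyPojo, whose fields the question does not show.

```java
/** Hypothetical flat row: one field for the payload, one for the error. */
final class ParsedRow {
    private Integer value;  // null when processing failed
    private String error;   // null when processing succeeded

    public ParsedRow() { }

    public Integer getValue() { return value; }
    public void setValue(Integer value) { this.value = value; }
    public String getError() { return error; }
    public void setError(String error) { this.error = error; }

    /** Processes one raw item, recording a failure instead of throwing. */
    static ParsedRow from(String raw) {
        ParsedRow row = new ParsedRow();
        try {
            row.setValue(Integer.parseInt(raw));
        } catch (NumberFormatException e) {
            row.setError(e.getMessage());
        }
        return row;
    }

    public static void main(String[] args) {
        ParsedRow ok = ParsedRow.from("42");
        ParsedRow failed = ParsedRow.from("not a number");
        System.out.println(ok.getValue() + " / " + ok.getError());
        System.out.println(failed.getValue() + " / " + failed.getError());
    }
}
```

Both fields map to simple columns, so the either-or nature of Try is expressed by one of the two being null rather than by a nested type.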
I'm trying to find some useful cases for reflection on my own, and I've just discovered an interesting thought in this post:
It says:
say you have an object of an unknown type in Java
Can you provide a couple of examples of that with a plain English explanation?
I am so used to C++ & C# where I can type bool. In Java, I am required to type boolean. Also, it requires me to type String with uppercase 'S'. I would love to be able to create project wide aliases for these variable types to enable me to create variables by typing bool and string. Do you have any ideas?
If you want to actually change the syntax of the language, that is not only impossible, but a terrible idea. Aliases would make code indecipherable to others.
However, it is possible to only type 'bool'+{TAB} and 'string'+{TAB} and have NetBeans change it to 'boolean' or 'String', respectively. In fact, you have much more flexibility than that (for example, you could make it 'boo'+{SPACE}='boolean' or 'bo'+{ENTER}='boolean').
Take a look at 'NetBeans->Preferences->Editor->Code Templates' if that is the kind of thing you need; should be pretty self-explanatory.
I am used to writing in a strongly typed language, for example Java. I need to tell the compiler what type of variable I will pass in, for example:
public static void sayHello(String aName)
I can ensure that the user will pass a string to me.
But in PHP, I can do this:
function sayHello($aName)
I can still call sayHello, but I don't know what the parameter type is. I can make the name more informative, like this:
function sayHelloWithString($aName)
But I can't stop the user from passing an int to me, and that may cause a lot of errors. How can I stop it? Any ideas or experience to share? Thank you.
How about not stopping the user from passing in an int?
In PHP, you could check is_string, but of course you'll miss out on objects that have __toString set, and on the implicit conversion of numbers to strings.
If you must make your program cry in pain when a developer tries something different, you could specify a type in the later versions of PHP (i.e. function foo(ObjectType $bar)...)*
In most loosely typed languages, you want to set up fall-backs for the major types:
number
array
string
generic object
Be liberal in what you accept, be strict in what you send.
* Before PHP 7, primitive (scalar) types are not supported for type hinting
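For readers coming from Java, as the asker is, the fall-backs for the major types listed above might look roughly like this if the parameter were declared as Object. It is only an illustration of "be liberal in what you accept". Greeter is a made-up class and the formatting choices are arbitrary.

```java
import java.util.Collection;
import java.util.stream.Collectors;

/** Hypothetical greeter that accepts several kinds of argument. */
final class Greeter {
    static String sayHello(Object aName) {
        final String text;
        if (aName instanceof CharSequence) {
            text = aName.toString();                     // string
        } else if (aName instanceof Number) {
            text = aName.toString();                     // number, converted to text
        } else if (aName instanceof Collection) {
            text = ((Collection<?>) aName).stream()      // array-like: greet everyone
                    .map(e -> String.valueOf(e))
                    .collect(Collectors.joining(", "));
        } else {
            text = String.valueOf(aName);                // generic object: use toString()
        }
        return "Hello, " + text + "!";                   // strict in what is sent: always a String
    }

    public static void main(String[] args) {
        System.out.println(sayHello("Ann"));
        System.out.println(sayHello(42));
        System.out.println(sayHello(java.util.Arrays.asList("Ann", "Bob")));
    }
}
```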
There's a few ways to deal with this...
Use an IDE that supports docblocks. This deals with the pre-runtime type checking when writing code.
Use type checking within your function. This only helps with the runtime type checking, and you won't know when writing your code.
Depending on the type, you can use built-in type hinting. Before PHP 7, however, this only works for non-scalar values, specifically array and class names.
1 - To implement #1 using a good IDE, you can docblock your function as such:
/**
 * Say hello to someone.
 *
 * @param string $aName
 */
public function sayHello($aName) {
    // ...
}
2 - To implement #2, use the is_ functions:
public function sayHello($aName) {
    if (!is_string($aName)) {
        throw new InvalidArgumentException("Type not correct.");
    }
    // Normal execution
}
3 - You can't do this with your method above, but something like this works. It is much the same as #2, except that it raises a catchable fatal error rather than an InvalidArgumentException.
public function manipulateArray(array $anArray) {
    // ...
}
It's worth noting that most of this is pretty irrelevant unless you're writing publicly usable library code. You should know what your methods accept, and if you're trying to write good quality code in the first place, you should be checking this beforehand.
Using a good IDE (I recommend PhpStorm a thousand times over), you can and should use DocBlocks everywhere you can for all of your classes. Not only will it help when writing APIs and normal code, but you can also use it to document your code. If you need to look at the code 6 months later, chances are you're not going to remember it 100% :-)
Additionally, there's a lot more you can do with docblocks than just define parameter types; look it up.
You can check if what they passed is a string using:
http://php.net/manual/en/function.is-string.php
Then provide appropriate error handling.
function sayHello($aName) {
if (is_string($aName)) {
//string OK!
} else {
echo "sayHello() only takes strings!";
}
}
In PHP you can check whether the variable that has been passed is a string by using the is_string function:
<?php
if (is_string($aName)) {
echo "Yes";
} else {
echo "No";
}
?>
Hope that helps.
Alternatively, or additionally, use type casting to convert the variable to the required type:
http://us3.php.net/manual/en/language.types.type-juggling.php
You have the option of checking to make sure the parameter is of the right type. However, it's worth considering what you'd do if it isn't. If you're just going to throw an exception, you might be better off assuming it's the right type and letting the exception be thrown when something you do isn't allowed. If you're not going to add any more useful information to the exception/error that would already be thrown, then there's not much point in checking it in the first place.
As to giving the user an indication of what type you want, I generally stick with including it in the variable name:
function sayHello($aNameStr)
function addItems($itemList)
...etc...
That, plus reasonable documentation, will mean the user can look at the function and figure out what they should be passing in in the first place.
Some scripting languages have tools that can help you. For example, use strict in Perl requires each variable to be declared before use. But the language is still weakly typed by definition.
Sometimes naming conventions help. For example, we inherited from good old Fortran the tradition that the names of int variables should start with i, j, k, l, m, or n. This convention is still used, at least for indexes.