JavaCC using input as a 'token' - java

I've been puzzling over this for days and searching doesn't seem to give any results. Makes me wonder if it's possible. For example:
funct functionNAME (Object o) { o+1 };
The point is that the user has to use the identifier 'o' within the curly braces and not some other identifier. This is, of course, specified by the input in the (Object o) part, where 'o' can be anything. Basically, the identifier within the curly braces must be the same as the identifier defined in the parameter. I know I can store the matched token and print it to the screen, but is it possible to use it as a lexical token itself? Thanks.

Yes, there is a better way to do this. You need a symbol table. The job of a symbol table is to keep track of which identifiers can be used at each point in the program. Generally the symbol table also contains other information about the identifiers, such as what they represent (e.g. variable or function name) and what their types are.
Using a symbol table you can detect the use of variables that are not in scope during parsing for many languages but not all. E.g. C and Pascal are languages where identifiers must be declared before they are used (with a few exceptions). But other languages (e.g. Java) allow identifiers to be declared after they are used and in that case it is best not to try to detect errors such as use of an undeclared variable until after the program is parsed. (Indeed in Java you need to wait until all files are parsed, as identifiers might be declared in another file.)
I'll assume a simple scenario, which is that you only need to record information about variables, that there is no type information, and that things must be declared before use. That will get you started. I haven't bothered about adding the function name to the symbol table.
Suppose a symbol table is a stack of things called frames. Each frame is a mutable set of strings. (Later you may want to change that to a mutable map from strings to some additional information.)
void Start(): { }
{
<FUNCTION>
<IDENTIFIER>
{ symtab.pushNewFrame() ; }
<LBRACKET> Parameters() <RBRACKET>
<LBRACE> Expression() <RBRACE>
{symtab.popFrame() ; }
}
void Parameters() : {}
{
( Parameter() (<COMMA> Parameter() )* )?
}
void Parameter() : { Token x ; }
{
<OBJECT> x=<IDENTIFIER>
{ if( symtab.topFrame().contains(x.image) ) reportError( ... ) ; }
{ symtab.topFrame().add(x.image) ; }
}
void Expression() : { }
{
Exp1() ( <PLUS> Exp1() )*
}
void Exp1() : { Token y ; }
{
y = <IDENTIFIER>
{ if( ! symtab.topFrame().contains(y.image) ) reportError( ... ) ; }
|
<NUMBER>
}
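A minimal Java sketch of the symbol table described above, assuming the stack-of-frames design where each frame is a mutable set of names. The class name SymbolTable and the isDeclared helper are assumptions; pushNewFrame, popFrame and topFrame follow the calls used in the grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical symbol table: a stack of frames, each frame a set of declared names.
class SymbolTable {
    private final Deque<Set<String>> frames = new ArrayDeque<Set<String>>();

    // Open a new scope, e.g. when entering a function.
    public void pushNewFrame() {
        frames.push(new HashSet<String>());
    }

    // Close the innermost scope, e.g. when leaving a function.
    public void popFrame() {
        frames.pop();
    }

    // The innermost scope; null if no frame has been pushed.
    public Set<String> topFrame() {
        return frames.peek();
    }

    // Assumed helper: true if the name is declared in any open frame.
    public boolean isDeclared(String name) {
        for (Set<String> frame : frames) {
            if (frame.contains(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SymbolTable symtab = new SymbolTable();
        symtab.pushNewFrame();
        symtab.topFrame().add("o");
        System.out.println(symtab.topFrame().contains("o")); // true
        System.out.println(symtab.topFrame().contains("p")); // false
        symtab.popFrame();
    }
}
```

If type information is needed later, each Set<String> frame could be swapped for a Map from names to that information.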

You can store the image of the identifier matched for o and then check whether the identifier inside the curly braces is the same; if it is not, throw an exception.

Okay, I have worked out a way to get what I want, based on the example I gave in the original post. It is a simplified variant of the solution I implemented in my own grammar, just as a proof of concept. Trivial things such as token definitions are left out for simplicity.
void Start():
{
Token x, y;
}
{
<FUNCTION>
<FUNCTION_NAME>
<LBRACKET>
<OBJECT>
x = <PARAMETER>
<RBRACKET>
<LBRACE>
y = <PARAMETER>
{
if (!x.image.equals(y.image))
{
System.out.println("Identifier must be specified in the parameters.");
System.exit(0);
}
}
<PLUS>
<DIGIT>
<RBRACE>
<COLON>
}
Is there a better way to do this?


Is there a standard function to determine if a String is a valid variable/function name in Kotlin/Java?

In Kotlin/Java, is there a standard function to determine if a String is a valid variable/function name without having to wrap it in back-ticks?
As in
functionIAmLookingFor("do-something") shouldBe false
functionIAmLookingFor("doSomething") shouldBe true
Edit: I do not want to enclose everything in backticks.
The reason why we need this: we have a tool that serializes instances into compilable Kotlin. And we have encountered the following edge case:
enum class AssetType { TRANSFERRABLE, `NON-TRANSFERRABLE` }
So when we reflect an instance with a field NON-TRANSFERRABLE, we need to wrap it in back-ticks:
val actual = MyAsset( type = `NON-TRANSFERRABLE` )
This is why I'm asking this. Currently we are just saying in README that we do not support any names that require back-ticks at this time.
You could do it manually:
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isJavaIdentifierPart(char)
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isJavaIdentifierStart(char)
Something like this:
boolean isJavaIdentifier(String s) {
if (s == null || s.isEmpty()) return false;
if (!Character.isJavaIdentifierStart(s.charAt(0))) {
return false;
}
for (int i = 1, n = s.length(); i < n; ++i) {
if (!Character.isJavaIdentifierPart(s.charAt(i))) {
return false;
}
}
return true;
}
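If the check is meant for Java source names, the JDK also ships javax.lang.model.SourceVersion, whose isIdentifier and isKeyword methods cover the same ground and additionally let reserved words such as class be rejected. A small sketch; the class name IdentifierCheck and the method name isValidJavaName are hypothetical:

```java
import javax.lang.model.SourceVersion;

// Hypothetical helper built on the JDK's SourceVersion utility.
class IdentifierCheck {

    // True if s is a syntactically valid Java identifier that is not a keyword or literal.
    public static boolean isValidJavaName(String s) {
        return s != null
                && SourceVersion.isIdentifier(s)
                && !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        System.out.println(isValidJavaName("doSomething"));  // true
        System.out.println(isValidJavaName("do-something")); // false
        System.out.println(isValidJavaName("class"));        // false
    }
}
```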
I don't know about Kotlin, but I don't think there is much difference, using 770grappenmaker's answer as a reference.
I took a quick look at the Kotlin compiler's lexer. It has some predefined macros; here is an excerpt:
LETTER = [:letter:]|_
IDENTIFIER_PART=[:digit:]|{LETTER}
PLAIN_IDENTIFIER={LETTER} {IDENTIFIER_PART}*
ESCAPED_IDENTIFIER = `[^`\n]+`
IDENTIFIER = {PLAIN_IDENTIFIER}|{ESCAPED_IDENTIFIER}
FIELD_IDENTIFIER = \${IDENTIFIER}
Source: https://github.com/JetBrains/kotlin/blob/master/compiler/psi/src/org/jetbrains/kotlin/lexer/Kotlin.flex
These appear to be regex-style lexer definitions (the file is a JFlex specification); you could combine them to suit your needs and simply match against them. As far as I can tell, this is how the compiler validates identifier names.
Edit: of course, this is the code of the lexer, which means that if it finds any other token, the name is invalid. All tokens and how to identify them are defined in that file and in the KtTokens file. You could use this information as a reference to find illegal tokens. For Java, use NoDataFound's answer.
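As a rough illustration of combining the excerpted macros, PLAIN_IDENTIFIER can be approximated with java.util.regex. This is only a sketch: mapping [:letter:] to \p{L} and [:digit:] to \p{Nd} is an assumption, the class and method names are hypothetical, and Kotlin keywords (which would still need back-ticks) are not rejected.

```java
import java.util.regex.Pattern;

// Hypothetical approximation of the lexer's PLAIN_IDENTIFIER macro.
class KotlinPlainIdentifier {

    // LETTER = [:letter:]|_  and  IDENTIFIER_PART = [:digit:]|{LETTER}
    // Assumption: [:letter:] ~ \p{L}, [:digit:] ~ \p{Nd}.
    private static final Pattern PLAIN =
            Pattern.compile("[\\p{L}_][\\p{L}\\p{Nd}_]*");

    // True if s could be written without back-ticks according to the pattern above.
    // Does not check for Kotlin keywords.
    public static boolean isPlainIdentifier(String s) {
        return s != null && PLAIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlainIdentifier("doSomething"));      // true
        System.out.println(isPlainIdentifier("do-something"));     // false
        System.out.println(isPlainIdentifier("NON-TRANSFERRABLE")); // false
    }
}
```

A serializer could use such a check to decide, name by name, whether to emit back-ticks rather than wrapping everything.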

For loops with ANTLR 4

I'm working on a university project in which we have to develop a programming language from scratch. We use ANTLR 4 to generate the parse tree. I'm currently working on getting a for loop to work; I have the grammar and can extract the values. My current problem is how to loop over the statements in the body of the for loop.
Here is my grammar:
grammar LCT;
program: stmt*;
stmt: assignStmt
| invocationStmt
| show
| forStatement
;
assignStmt: VAR ID '=' expr;
invocationStmt: name=ID ((expr COMMA)* expr)?;
expr: ID | INT | STRING;
show: 'show' (INT | STRING | ID);
block : '{' statement* '}' ;
statement : block
| show
| assignStmt
;
forStatement : 'loop' ('(')? forConditions (')')? statement* ;
forConditions : iterator=expr 'from' startExpr=INT range='to' endExpr=INT ;
//tokens
COMMA: ',';
VAR: 'var';
INT: [0-9]+;
STRING: '"' (~('\n' | '"'))* '"';
ID: [a-zA-Z_] [a-zA-Z0-9_]*;
WS: [ \n\t\r]+ -> skip;
And this is the current listener, which supports assigning and printing ints:
package LCTlang;
import java.util.HashMap;
public class LCTCustomBaseListener extends LCTBaseListener {
HashMap<String, Integer> variableMap = new HashMap<>();
String[] keyWords = {"show", "var"};
@Override public void exitAssignStmt(LCTParser.AssignStmtContext ctx) {
this.variableMap.put(ctx.ID().getText(),
Integer.parseInt(ctx.expr().getText()));
}
@Override public void exitInvocationStmt(LCTParser.InvocationStmtContext ctx) {
this.variableMap.put(ctx.name.getText(),
Integer.parseInt(ctx.ID().getText()));
}
@Override
public void exitShow(LCTParser.ShowContext ctx) {
if(ctx.INT() != null) {
System.out.println(ctx.INT().getText());
}
else if(ctx.STRING() != null) {
// Print the string literal itself; ctx.ID() is null in this branch
System.out.println(ctx.STRING().getText());
}
else if(ctx.ID() != null) {
System.out.println(this.variableMap.get(ctx.ID().getText()));
}
}
@Override public void exitForStatement(LCTParser.ForStatementContext ctx) {
int start = Integer.parseInt(ctx.forConditions().startExpr.getText());
int end = Integer.parseInt(ctx.forConditions().endExpr.getText());
String varName = ctx.forConditions().iterator.getText();
int i;
for (i = start; i < end; i++) {
for (LCTParser.StatementContext state : ctx.statement()){
System.out.println(state);
}
}
}
}
My problem is in the looping of the statements, and how that is done.
A listener is going to be a poor choice for execution. You've turned the tree navigation over to a Tree Walker (hitting each node only once), that calls you back when it encounters nodes you're interested in. You won't convince it to walk the children nodes of some iteration node (while, for, etc.) more than once, and that's pretty much the point of iteration structures. It won't detect that a node is a call to a function and then navigate to that function. It's JUST walking through the ParseTree.
For some fairly simple grammars (usually something like an expression evaluator, maybe a calculator), you could set up a visitor that returns whatever your expression datatype is (probably a Float for a calculator).
In your case, I'd suggest that ANTLR has provided its value. It's a Parser and has provided a ParseTree for you. Now it's up to you to write code that utilizes that parse tree to execute the functionality. You're now in the world of creating a runtime for your language. Thank ANTLR for making it easy to parse, and providing nice error messages and robust error recovery.
To execute your logic, you'll need to write code that uses those data structures, keeps up with the current value of variables, and, based on those values, decides to execute everything contained in that for/while/... loop. You'll have similar runtime work to evaluate boolean expressions to decide whether to execute children in if/else statements, etc. This runtime will also have to keep up with call stacks of functions calling other functions, etc. In short, executing your resulting logic will involve referencing the parsed input, but won't particularly look like navigating your parse Tree.
(Note: many people find a full parse tree to be a bit tedious to navigate (it tends to have a lot of intermediate nodes). In the past, I've written a Visitor to produce something more like an AST (Abstract Syntax Tree). It's a trimmed down tree that has the structure I want for further processing. This is not necessarily required, but you may find it easier to work with.)
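To make the idea of a separate runtime concrete, here is a minimal Java sketch of a tree-walking interpreter in which a loop node executes its body once per iteration. It deliberately uses a small hand-written tree instead of the ANTLR-generated context classes; all class and method names (LoopInterpreterSketch, Stmt, Show, Loop, run, demo) are hypothetical, and the exclusive upper bound mirrors the i < end in the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-runtime: statements execute themselves against a variable map.
class LoopInterpreterSketch {

    interface Stmt {
        void execute(Map<String, Integer> vars, List<String> out);
    }

    // show <variable>: appends the variable's current value to the output.
    static final class Show implements Stmt {
        private final String varName;
        Show(String varName) { this.varName = varName; }
        public void execute(Map<String, Integer> vars, List<String> out) {
            out.add(String.valueOf(vars.get(varName)));
        }
    }

    // loop <iterator> from <start> to <end>: runs the body for start <= i < end.
    static final class Loop implements Stmt {
        private final String iterator;
        private final int start;
        private final int end;
        private final List<Stmt> body;
        Loop(String iterator, int start, int end, List<Stmt> body) {
            this.iterator = iterator;
            this.start = start;
            this.end = end;
            this.body = body;
        }
        public void execute(Map<String, Integer> vars, List<String> out) {
            for (int i = start; i < end; i++) {
                vars.put(iterator, i);   // expose the loop counter to the body
                for (Stmt s : body) {    // the body is executed again on every iteration
                    s.execute(vars, out);
                }
            }
        }
    }

    static List<String> run(List<Stmt> program) {
        Map<String, Integer> vars = new HashMap<String, Integer>();
        List<String> out = new ArrayList<String>();
        for (Stmt s : program) {
            s.execute(vars, out);
        }
        return out;
    }

    // Equivalent of: loop i from 0 to 3 show i
    static List<String> demo() {
        List<Stmt> body = Arrays.<Stmt>asList(new Show("i"));
        List<Stmt> program = Arrays.<Stmt>asList(new Loop("i", 0, 3, body));
        return run(program);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [0, 1, 2]
    }
}
```

The same shape carries over if the tree is built from the parse tree by a visitor: the visitor's job is then only to construct Stmt objects, and execution happens afterwards.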

Is it possible to extend a rule using a specified set of parameters or pass parameters into a rule in Drools?

I have a large number of almost duplicate rules to implement. They are exactly the same, but the right-hand sides are slightly different, e.g.:
rule "AS77" extends "IDV01"
@description(" IF Individual Identifier = E, THEN Other Description is required. ")
when
$source : Owner($errCode : "427")
then
end
rule "AS78" extends "IDV01"
@description(" IF Individual Identifier = E, THEN Other Description is required. ")
when
$source : JntOwner($errCode : "428")
then
end
rule "IDV01"
@description("IF Individual Identifier = E, THEN Other Description is required. ")
when
IDVerify(idType == "E", otherDscrp == null) from $source
then
_reject.getErrorCode().add($errCode.toString());
end
I would like to do something like the above, but I know that I can't because "$source" is in the child rule. I also know that I can't switch it around and have the rule with the condition extend the other rules, because a Drools rule can only extend one other rule. In Java I would write a method like
static void evalIdVerify(IDVerify idv, String errorCode) {
if ("E".equals(idv.getIdType()) && idv.getOtherDescript() == null) {
_reject.getErrorCode().add(errorCode);
}
}
and use it as required. Is there any way to write a rule that can be called like a method and takes parameters? Or some other solution that isn't just writing the same rule over and over again? I have no control over the classes and these rules are modified every year by a third party so for maintainability I would really like them to only be defined once. Any help would be appreciated. Thanks.
But you can "switch it around". All you need to do is to insert the IDVerify object as a fact.
rule "IDV01"
@description("IF Individual Identifier = E, THEN Other Description is required. ")
when
$id: IDVerify(idType == "E", otherDscrp == null)
then
end
rule "IDV01 for Owner" extends "IDV01"
when
$source: Owner($errCode : "427", id == $id)
then
_reject.getErrorCode().add($errCode.toString());
end
Another option is to write it as a single rule for each check:
rule "IDV01 for Owner"
when
$source: Owner($errCode : "427",
$id.idType == "E", $id.otherDscrp == null)
then
_reject.getErrorCode().add($errCode.toString());
end
You can also put the IDVerify test into a boolean function or static method, which reduces the amount of code duplication.
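A sketch of what such a static boolean method might look like in Java. The nested IDVerify class is only a stand-in for the real fact class, the getter name getOtherDscrp is assumed from the otherDscrp property used in the rules, and IdVerifyChecks and isMissingOtherDescription are hypothetical names. How the method is made visible to the DRL depends on the Drools version in use.

```java
// Hypothetical holder for checks shared between several rules.
class IdVerifyChecks {

    // Stand-in for the real IDVerify fact class (assumed getters).
    static final class IDVerify {
        private final String idType;
        private final String otherDscrp;
        IDVerify(String idType, String otherDscrp) {
            this.idType = idType;
            this.otherDscrp = otherDscrp;
        }
        public String getIdType() { return idType; }
        public String getOtherDscrp() { return otherDscrp; }
    }

    // True when Individual Identifier = E but Other Description is missing.
    public static boolean isMissingOtherDescription(IDVerify idv) {
        return idv != null
                && "E".equals(idv.getIdType())
                && idv.getOtherDscrp() == null;
    }

    public static void main(String[] args) {
        System.out.println(isMissingOtherDescription(new IDVerify("E", null)));   // true
        System.out.println(isMissingOtherDescription(new IDVerify("E", "text"))); // false
        System.out.println(isMissingOtherDescription(new IDVerify("A", null)));   // false
    }
}
```

Each per-owner rule would then only differ in the fact type it matches and the error code it adds, while the condition itself lives in one place.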

Use column as regex pattern with QueryDSL

I'm trying to use a column as a regular expression to match against a user provided string, but can't figure out how to do it with QueryDSL. Mostly I can't figure out how to put the user supplied string on the lefthand side of the expression.
Basically I'm looking to do something similar to the following, where ~ is my database's symbol for regex matching…
SELECT * FROM thing WHERE 'user supplied string' ~ thing.match
The following works:
Expressions.predicate(Ops.MATCHES, Expressions.constant(...), path)
I don't know whether this is the best way, but the only solution I was able to get working was to subclass StringExpression:
class ConstantExpression extends StringExpression {
private final Constant<String> constant;
public static ConstantExpression create(String constant) {
return new ConstantExpression(new ConstantImpl<String>(constant));
}
public ConstantExpression(Constant<String> mixin) {
super(mixin);
constant = mixin;
}
@Override
public <R,C> R accept(Visitor<R,C> v, C context) {
return v.visit(constant, context);
}
}
Then I was able to use that as the lefthand side of the equation…
createJPAQueryFactory().from(qthing)
.where(ConstantExpression.create("user supplied…").like(thing.match))

Error "Cyclic linking detected" while calling a referenced object in a ScopeProvider

I am currently implementing cross-referencing for my Xtext DSL. A DSL file can contain more than one XImportSection, and in some special cases an XImportSection does not necessarily contain all import statements. This means I need to customize the "XImportSectionNamespaceScopeProvider" to find/build the correct XImportSection. During the implementation I ran into unexpected behavior of the editor and/or some validation.
I used the following DSL code snippet to test my implementation:
delta MyDelta {
adds {
package my.pkg;
import java.util.List;
public class MyClass
implements List
{
}
}
modifies my.pkg.MyClass { // (1)
adds import java.util.ArrayList;
adds superclass ArrayList<String>;
}
}
The DSL source code is described by the following grammar rules (not complete!):
AddsUnit:
{AddsUnit} 'adds' '{' unit=JavaCompilationUnit? '}';
ModifiesUnit:
'modifies' unit=[ClassOrInterface|QualifiedName] '{'
modifiesPackage=ModifiesPackage?
modifiesImports+=ModifiesImport*
modifiesSuperclass=ModifiesInheritance?
'}';
JavaCompilationUnit:
=> (annotations+=Annotation*
'package' name=QualifiedName EOL)?
importSection=XImportSection?
typeDeclarations+=ClassOrInterfaceDeclaration;
ClassOrInterfaceDeclaration:
annotations+=Annotation* modifiers+=Modifier* classOrInterface=ClassOrInterface;
ClassOrInterface: // (2a)
ClassDeclaration | InterfaceDeclaration | EnumDeclaration | AnnotationTypeDeclaration;
ClassDeclaration: // (2b)
'class' name=QualifiedName typeParameters=TypeParameters?
('extends' superClass=JvmTypeReference)?
('implements' interfaces=Typelist)?
body=ClassBody;
To provide better tool support, a ModifiesUnit references the class which is modified. This Xtext specific implementation enables hyperlinking to the class.
I am currently working on a customized XImportSectionScopeProvider which provides all namespace scopes for a ModifiesUnit. The default implementation contains a method protected List<ImportNormalizer> internalGetImportedNamespaceResolvers(EObject context, boolean ignoreCase) that assumes there is only one class-like element in a source file. But in my language there can be more than one. For this reason I have to customize it.
My idea now is the following implementation (using the Xtend programming language):
override List<ImportNormalizer> internalGetImportedNamespaceResolvers(EObject context, boolean ignoreCase) {
switch (context) {
ModifiesUnit: context.buildImportSection
default: // ... anything else
}
}
Before I started this work, the reference worked fine and nothing unexpected happened. My goal now is to build a customized XImportSection for the ModifiesUnit which is used by Xbase to resolve references to JVM types. To do that, I need a copy of the XImportSection of the referenced ClassOrInterface. To get access to the XImportSection, I first call ModifiesUnit.getUnit(). Directly after this call is executed, the editor shows the unexpected behaviour. The minimal implementation which leads to the error looks like this:
def XImportSection buildImportSection(ModifiesUnit u) {
val ci = u.unit // As soon as this expression is executed, the error occurs!
// ...
}
Here, I don't know what is going on internally, but it produces an error. The editor shows the following error on the qualified name at (1): "Cyclic linking detected : ModifiesUnit.unit->ModifiesUnit.unit".
My questions are: What does it mean? Why does Xtext show this error? Why does it appear if I access the referenced object?
I also figured out a strange thing there: In my first approach my code threw a NullPointerException. Ok, I tried to figure out why by printing the object ci. The result is:
org.deltaj.scoping.deltaJ.impl.ClassOrInterfaceImpl@4642f064 (eProxyURI: platform:/resource/Test/src/My.dj#xtextLink_::0.0.0.1.1::0::/2)
org.deltaj.scoping.deltaJ.impl.ClassDeclarationImpl@1c70366 (name: MyClass)
OK, it seems that this method is executed twice and that Xtext resolves the proxy between the first and second execution. That is fine for me as long as the received object is the correct one at some point. I handle it with an if-instanceof statement.
But why do I get two references there? Does it rely on the ParserRule ClassOrInterface (2a) which only is an abstract super rule of ClassDeclaration (2b)? But why is Xtext not able to resolve the reference for the ClassOrInterface?
OK, I have now found a solution to my problem. While experimenting with my implementation, I saw that the "Problems" view still contained unresolved references. This made me rethink what my implementation did. First, I decided to build the returned List<ImportNormalizer> directly instead of building an XImportSection which would then be converted to this list. While implementing this, I noticed that I had built the scope only for ModifiesUnit elements themselves instead of for the elements which need the scope within a ModifiesUnit. This is the reason for the cyclic linking error: computing the scope used to resolve ModifiesUnit.unit accessed ModifiesUnit.unit again. Now I build the list only if it is needed. As a result, the cyclic linking error no longer occurs and all references to JVM types are resolved correctly without any errors in the Problems view.
My implementation now looks like this:
class DeltaJXImportSectionNamespaceScopeProvider extends XImportSectionNamespaceScopeProvider {
override List<ImportNormalizer> internalGetImportedNamespaceResolvers(EObject context, boolean ignoreCase) {
// A scope is only provided for elements which really need one, i.e. elements
// which are contained in a JavaCompilationUnit or a ModifiesUnit.
if (context.checkElement) { // (1)
return Collections.emptyList
}
// Finding the container which contains the import section
val container = context.jvmUnit // (2)
// For a non-null container, create the import normalizer list depending on the returned element. If the container is
// null, no scope is needed.
return if (container != null) { // (3)
switch (container) {
JavaCompilationUnit: container.provideJcuImportNormalizerList(ignoreCase)
ModifiesUnit: container.provideMcuImportNormalizerList(ignoreCase)
}
} else {
Collections.emptyList
}
}
// Iterates upwards through the AST until a ModifiesUnit or a JavaCompilationUnit is found. (2)
def EObject jvmUnit(EObject o) {
switch (o) {
ModifiesUnit: o
JavaCompilationUnit: o
default: o.eContainer?.jvmUnit // null-safe: yields null when the root is reached without finding a unit
}
}
// Creates the list with all imports of a JCU (3a)
def List<ImportNormalizer> provideJcuImportNormalizerList(JavaCompilationUnit jcu, boolean ignoreCase) {
val is = jcu.importSection
return if (is != null) {
is.getImportedNamespaceResolvers(ignoreCase)
} else {
Collections.emptyList
}
}
// Creates the list of all imports of a ModifiesUnit. This implementation is similar to
// getImportedNamespaceResolvers(XImportSection, boolean) // (3b)
def List<ImportNormalizer> provideMcuImportNormalizerList(ModifiesUnit mu, boolean ignoreCase) {
val List<ImportNormalizer> result = Lists.newArrayList
result.addAll((mu.unit.jvmUnit as JavaCompilationUnit).provideJcuImportNormalizerList(ignoreCase))
for (imp : mu.modifiesImports) {
if (imp instanceof AddsImport) {
val decl = imp.importDeclaration
if (!decl.static) {
result.add(decl.transform(ignoreCase))
}
}
}
result
}
// Creates an ImportNormalizer for a given XImportDeclaration
def ImportNormalizer transform(XImportDeclaration decl, boolean ignoreCase) {
var value = decl.importedNamespace
if (value == null) {
value = decl.importedTypeName
}
return value.createImportedNamespaceResolver(ignoreCase)
}
// Returns true for container elements which do not need a scope of their own and are therefore skipped. (1)
def checkElement(EObject o) {
return o instanceof DeltaJUnit || o instanceof Delta || o instanceof AddsUnit || o instanceof ModifiesUnit ||
o instanceof RemovesUnit
}
}
As one can see, elements which do not need namespaces for correct scopes are ignored (1).
For each element which might need namespaces for a correct scope, the ancestor element which directly contains the imports is determined (2).
With the correct ancestor element, the namespace list can be calculated (3) for JCUs (3a) and MUs (3b).
