The target class is:
class Example{
public void m(){
System.out.println("Hello" + 1);
}
}
I want to get the full source text of the MethodInvocation "System.out.println("Hello" + 1)" so that I can run a regex check on it. How can I do this?
public class Rule extends BaseTreeVisitor implements JavaFileScanner {
@Override
public void visitMethodInvocation(MethodInvocationTree tree) {
//get the string of MethodInvocation
//some regex check
super.visitMethodInvocation(tree);
}
}
I have written code inspection rules with Eclipse JDT and IDEA PSI, whose expression tree nodes expose this kind of attribute. I wonder why Sonar's tree nodes only expose the first and last token instead.
Thanks!
An old question, but I have a solution.
This works for any sort of tree.
@Override
public void visitMethodInvocation(MethodInvocationTree tree) {
int firstLine = tree.firstToken().line();
int lastLine = tree.lastToken().line();
String rawText = getRelevantLines(firstLine, lastLine);
// do your thing here with rawText
}
private String getRelevantLines(int startLine, int endLine) {
StringBuilder builder = new StringBuilder();
// Token lines are 1-based, while subList is 0-based with an exclusive end index.
context.getFileLines().subList(startLine - 1, endLine).forEach(builder::append);
return builder.toString();
}
If you want to refine this further, you can also use firstToken().column() to trim the first line, or use the method name in your regex.
If you want more lines or a bigger scope, use the parent of that tree: tree.parent().
This will also handle cases where the expression/params/etc span multiple lines.
There might be a better way... but I don't know of any other way. May update if I figure out something better.
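To illustrate the column refinement mentioned above, here is a minimal sketch that cuts the exact text out of the file lines. It is deliberately independent of the Sonar API: the class TokenRangeText and the method extract are hypothetical names, and it assumes 1-based token lines, 0-based columns, and an end column computed as the last token's column plus the length of its text.

```java
import java.util.Arrays;
import java.util.List;

public class TokenRangeText {

    /**
     * Returns the source text between two positions.
     * Assumed convention: lines are 1-based, columns are 0-based,
     * and endColumn is exclusive (last token column + last token length).
     */
    static String extract(List<String> lines, int firstLine, int firstColumn,
                          int lastLine, int endColumn) {
        if (firstLine == lastLine) {
            return lines.get(firstLine - 1).substring(firstColumn, endColumn);
        }
        StringBuilder sb = new StringBuilder();
        // Tail of the first line, starting at the first token.
        sb.append(lines.get(firstLine - 1).substring(firstColumn)).append('\n');
        // Whole lines strictly between the first and the last line.
        for (int i = firstLine; i < lastLine - 1; i++) {
            sb.append(lines.get(i)).append('\n');
        }
        // Head of the last line, up to the end of the last token.
        sb.append(lines.get(lastLine - 1), 0, endColumn);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "class Example{",
            "public void m(){",
            "System.out.println(\"Hello\" + 1);",
            "}",
            "}");
        // The invocation sits on line 3 and is 31 characters long; the ';' is excluded.
        System.out.println(extract(lines, 3, 0, 3, 31));
    }
}
```

Inside the visitor, the four position arguments would come from tree.firstToken() and tree.lastToken(); check which position accessors your sonar-java version offers before relying on this.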
I am building a simple grammar parser with regex. It works, but now I want to add an Abstract Syntax Tree, and I still don't understand how to set it up. I have included the parser.
The parser gets a string and tokenizes it with the lexer.
Each token has a value and a type.
Any idea how to set up nodes to build an AST?
public class Parser {
lexer lex;
Hashtable<String, Integer> data = new Hashtable<String, Integer>();
public Parser( String str){
ArrayList<Token> token = new ArrayList<Token>();
String[] strpatt = { "[0-9]*\\.[0-9]+", //0
"[a-zA-Z_][a-zA-Z0-9_]*",//1
"[0-9]+",//2
"\\+",//3
"\\-",//4
"\\*",//5
"\\/",//6
"\\=",// 7
"\\)",// 8
"\\("//9
};
lex = new lexer(strpatt, "[\\ \t\r\n]+");
lex.set_data(str);
}
public int peek() {
//System.out.println(lex.peek().type);
return lex.peek().type;
}
public boolean peek( String[] regex) {
return lex.peek(regex);
}
public void set_data( String s) {
lex.set_data(s);
}
public Token scan() {
return lex.scan();
}
public int goal() {
int ret = 0;
while(peek() != -1) {
ret = expr();
}
return ret;
}
}
Currently, you are simply evaluating as you parse:
ret = ret * term()
The easiest way to think of an AST is that it is just a different kind of evaluation. Instead of producing a numeric result from numeric sub-computations, as above, you produce a description of the computation from descriptions of the sub-computations. The description is represented as a small structure which contains the essential information:
ret = BuildProductNode(ret, term());
Or, perhaps
ret = BuildBinaryNode(Product, ret, term());
It's a tree because the Node objects which are being passed around refer to other Node objects without there ever being a cycle or a node with two different parents.
Clearly there are a lot of details missing from the above, particularly the precise nature of the Node object. But it's a rough outline.
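To make the outline a little more concrete, here is one possible shape for the Node object. This is only a sketch: the names Node, NumberNode, BinaryNode, Op and buildBinaryNode are made up for illustration, and the eval method is there just to show that the original numeric result can still be recovered from the tree.

```java
public class AstSketch {

    /** Operators a binary node can carry (hypothetical names). */
    enum Op { SUM, DIFFERENCE, PRODUCT, QUOTIENT }

    /** Common type of every tree node. */
    interface Node {
        int eval();
    }

    /** Leaf node: a numeric literal. */
    static final class NumberNode implements Node {
        final int value;
        NumberNode(int value) { this.value = value; }
        public int eval() { return value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    /** Inner node: an operator applied to two sub-trees. */
    static final class BinaryNode implements Node {
        final Op op;
        final Node left;
        final Node right;
        BinaryNode(Op op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public int eval() {
            switch (op) {
                case SUM:        return left.eval() + right.eval();
                case DIFFERENCE: return left.eval() - right.eval();
                case PRODUCT:    return left.eval() * right.eval();
                case QUOTIENT:   return left.eval() / right.eval();
            }
            throw new IllegalStateException("unknown operator " + op);
        }
        @Override public String toString() {
            return "(" + op + " " + left + " " + right + ")";
        }
    }

    /** What the parser would call instead of computing ret * term(). */
    static Node buildBinaryNode(Op op, Node left, Node right) {
        return new BinaryNode(op, left, right);
    }

    public static void main(String[] args) {
        // Tree for 1 + 2 * 3
        Node tree = buildBinaryNode(Op.SUM, new NumberNode(1),
                buildBinaryNode(Op.PRODUCT, new NumberNode(2), new NumberNode(3)));
        System.out.println(tree);
        System.out.println(tree.eval());
    }
}
```

With something like this, the parsing methods return Node instead of int, and each place that used to compute a value calls buildBinaryNode or creates a leaf.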
I am new to ANTLR, I have a list of functions which are mostly of nested types.
Below are the examples for functions:
1. Function.add(Integer a,Integer b)
2. Function.concat(String a,String b)
3. Function.mul(Integer a,Integer b)
Suppose the input is:
Function.concat(Function.substring(String,Integer,Integer),String)
Using ANTLR from a Java program, how do I define the grammar and validate that the function names are correct and that the parameter count and data types are correct? The validation has to be recursive, because the functions can be deeply nested.
validate test class:
public class FunctionValidate {
public static void main(String[] args) {
FunctionValidate fun = new FunctionValidate();
fun.test("FUNCTION.concat(1,2)");
}
private String test(String source) {
CodePointCharStream input = CharStreams.fromString(source);
return compile(input);
}
private String compile(CharStream source) {
MyFunctionsLexer lexer = new MyFunctionsLexer(source);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
MyFunctionsParser parser = new MyFunctionsParser(tokenStream);
FunctionContext tree = parser.function();
ArgumentContext tree1= parser.argument();
FunctionValidateVisitorImpl visitor = new FunctionValidateVisitorImpl();
visitor.visitFunction(tree);
visitor.visitArgument(tree1);
return null;
}
}
Visitor impl:
public class FunctionValidateVisitorImpl extends MyFunctionsParserBaseVisitor<String> {
@Override
public String visitFunction(MyFunctionsParser.FunctionContext ctx) {
String function = ctx.getText();
System.out.println("------>"+function);
return null;
}
@Override
public String visitArgument(MyFunctionsParser.ArgumentContext ctx){
String param = ctx.getText();
System.out.println("------>"+param);
return null;
}
}
The statement System.out.println("------>"+param); does not print the argument; it only prints ------>.
This task can be accomplished by implementing two main steps:
1) Parse given input and build an Abstract Syntax Tree (AST).
2) Traverse the tree and validate each function and each argument, one after another, using the Listener or Visitor pattern.
Fortunately, ANTLR provides tools for implementing both steps.
Here's a simple grammar I wrote based on your example. It does recursive parsing and builds the AST. You may want to extend its functionality to meet your needs.
Lexer:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Z]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
Parser:
parser grammar MyFunctionsParser;
options {
tokenVocab=MyFunctionsLexer;
}
function : FUNCTION '.' NAME '('(argument (',' argument)*)')';
argument: (NAME | function);
An important thing to notice here: the parser does not distinguish between valid (from your point of view) and invalid functions, arguments, numbers of arguments, etc.
So a function like Function.whatever(InvalidArg) is also a valid construct from the parser's point of view. To further validate the input and test whether it meets your requirements (a predefined list of functions and their arguments), you have to traverse the tree using a Listener or a Visitor (I think a Visitor fits here perfectly).
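The validation step itself can be sketched without ANTLR. In the sketch below, Arg is a hypothetical stand-in for the parse-tree node, and the KNOWN table is an assumption built from the examples in the question (the String return type of substring is a guess). In a real visitor, the same recursion would live in visitFunction and visitArgument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SignatureCheck {

    /** Hypothetical stand-in for a parse-tree node: a function call or a typed leaf. */
    static final class Arg {
        final String name;     // function name, or the type name for a leaf
        final List<Arg> args;  // null for a leaf
        private Arg(String name, List<Arg> args) { this.name = name; this.args = args; }
        static Arg leaf(String type) { return new Arg(type, null); }
        static Arg call(String name, Arg... args) { return new Arg(name, Arrays.asList(args)); }
    }

    /** Return type plus expected parameter types of one known function. */
    static final class Signature {
        final String returnType;
        final List<String> paramTypes;
        Signature(String returnType, String... paramTypes) {
            this.returnType = returnType;
            this.paramTypes = Arrays.asList(paramTypes);
        }
    }

    // Assumed function table, based on the examples in the question.
    static final Map<String, Signature> KNOWN = new HashMap<>();
    static {
        KNOWN.put("add", new Signature("Integer", "Integer", "Integer"));
        KNOWN.put("mul", new Signature("Integer", "Integer", "Integer"));
        KNOWN.put("concat", new Signature("String", "String", "String"));
        KNOWN.put("substring", new Signature("String", "String", "Integer", "Integer"));
    }

    /** Returns the type of the node (null if unknown) and records problems in errors. */
    static String typeOf(Arg node, List<String> errors) {
        if (node.args == null) {
            return node.name; // a leaf carries its own type
        }
        Signature sig = KNOWN.get(node.name);
        if (sig == null) {
            errors.add("unknown function: " + node.name);
            for (Arg a : node.args) {
                typeOf(a, errors); // still validate nested calls
            }
            return null;
        }
        if (sig.paramTypes.size() != node.args.size()) {
            errors.add(node.name + ": expected " + sig.paramTypes.size()
                    + " arguments but got " + node.args.size());
        }
        for (int i = 0; i < node.args.size(); i++) {
            String actual = typeOf(node.args.get(i), errors); // recursion handles nesting
            if (i < sig.paramTypes.size() && actual != null
                    && !actual.equals(sig.paramTypes.get(i))) {
                errors.add(node.name + ": argument " + (i + 1) + " should be "
                        + sig.paramTypes.get(i) + " but is " + actual);
            }
        }
        return sig.returnType;
    }

    public static void main(String[] args) {
        Arg input = Arg.call("concat",
                Arg.call("substring", Arg.leaf("String"), Arg.leaf("Integer"), Arg.leaf("Integer")),
                Arg.leaf("String"));
        List<String> errors = new ArrayList<>();
        typeOf(input, errors);
        System.out.println(errors.isEmpty() ? "valid" : errors.toString());
    }
}
```

Returning the type from each call is what makes the nested case work: the outer function only needs to compare the returned type of each argument against its own signature.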
To get a better understanding of what it is, I'd recommend reading this and this. But if you want to get deeper into the subject, you should definitely look at the "Dragon Book", which covers the topic exhaustively.
Hi everyone, I have the following code in the .jjt file for my abstract syntax tree. It keeps track of the line at which each node is created in the file that is passed to it, but I cannot access this variable from my semantic checker class.
The code is below and any help would be appreciated! I've tried everything and I'm losing hope at this stage.
This is the integer in the .jjt file I'd like to access
TOKEN_MGR_DECLS :
{
static int commentNesting = 0;
public static int linenumber = 0;
}
SKIP : /*STRUCTURES AND CHARACTERS TO SCAPE*/
{
" "
| "\t"
| "\n" {linenumber++;}
| "\r"
| "\f"
}
An example of one of my nodes
void VariableDeclaration() #VariableDeclaration : {Token t; String id; String type;}
{
t = <VARIABLE> id = Identifier() <COLON> type = Type()
}
My semantic checker class
public class SemanticCheckVisitor implements "My jjt file visitor" {
public Object visit(VariableDeclaration node, Object data) {
node.childrenAccept(this, data);
return data;
}
}
How would it be possible to get the line number at which this node was declared?
Thanks everyone.
You can see an example of this in the Teaching Machine's Java parser, which is here.
First you need to modify your SimpleNode type to include a field for the line number. In the TM I added a declaration
private SourceCoords myCoords ;
where SourceCoords is a type that includes not only the line number, but also information about what file the line was in. You can just use an int field. Also in SimpleNode you need to declare some methods like this
public void setCoords( SourceCoords toSet ) { myCoords = toSet ; }
public SourceCoords getCoords() { return myCoords ; }
You might want to declare them in the Node interface too.
Over in your .jjt file, use the option
NODE_SCOPE_HOOK=true;
And declare two methods in your parser class
void jjtreeOpenNodeScope(Node n) {
((SimpleNode)n).setCoords( new SourceCoords( file, getToken(1).beginLine ) ) ;
}
void jjtreeCloseNodeScope(Node n) {
}
Hmm. I probably should have declared the methods in Node to avoid that ugly cast.
One more thing: you are keeping count of the lines yourself. It's better to get the line number from the token, like I did. Your counter will generally be one token ahead, but when the parser looks ahead, it could be several tokens ahead.
If the token manager isn't keeping count of the lines correctly, then use your own count, but communicate it to the parser through an extra field added to the Token class.
Generally it's a bad idea to compute anything in the token manager and then use it in the parser, unless it's information you store in the tokens.
Most people understand the innate benefits that enum brings to a program versus the use of int or String. See here and here if you don't know. Anyway, I came across a problem I wanted to solve that is on the same playing field as using int or String to represent a constant instead of using an enum. It deals specifically with String.format(...).
With String.format, there seems to be a large opening for programmatic error that isn't found at compile-time. This can make fixing errors more complex and / or take longer.
This was the issue that I set out to fix (or hack a solution for). I came close, but not close enough. For this problem, my approach is most certainly over-engineered. I understand that, but I just want to find a good compile-time solution that requires the least amount of boilerplate code.
I was writing some non-production code just to write code with the following rules.
Abstraction was key.
Readability was very important
Yet the simplest way to the above was preferred.
I am running on...
Java 7 / JDK 1.7
Android Studio 0.8.2
These are unsatisfactory
Is there a typesafe alternative to String.format(...)
How to get string.format to complain at compile time
My Solution
My solution uses the same idea that enums do. "You should use enum types any time you need to represent a fixed set of constants ... data sets where you know all possible values at compile time" (docs.oracle.com). The first argument of String.format seems to fit that bill. You know the whole string beforehand, and you can split it up into several parts (or just one), so it can be represented as a fixed set of "constants".
By the way, my project is a simple calculator of the kind you have probably seen online already: 2 input numbers, 1 result, and 4 buttons (+, -, ×, and ÷). I also have a second, duplicate calculator that has only 1 input number, but everything else is the same.
Enum - Expression.java & DogeExpression.java
public enum Expression implements IExpression {
Number1 ("%s"),
Operator (" %s "),
Number2 ("%s"),
Result (" = %s");
protected String defaultFormat;
protected String updatedString = "";
private Expression(String format) { this.defaultFormat = format; }
// I think implementing this in every enum is a necessary evil. Could use a switch statement instead. But it would be nice to have a default update method that you could overload if needed. Just wish the variables could be hidden.
public <T> boolean update(T value) {
String replaceValue
= this.equals(Expression.Operator)
? value.toString()
: Number.parse(value.toString()).toString();
this.updatedString = this.defaultFormat.replace("%s", replaceValue);
return true;
}
}
...and...
public enum DogeExpression implements IExpression {
Total ("Wow. Such Calculation. %s");
// Same general code as public enum Expression
}
Current Issue
IExpression.java - This is a HUGE issue. Without this fixed, my solution cannot work!!
public interface IExpression {
public <T> boolean update(T Value);
class Update { // I cannot have static methods in interfaces in Java 7. Workaround
public static String print() {
String replacedString = "";
// for (Expression expression : Expression.values()) { // ISSUE!! Switch to this for Expression
for (DogeExpression expression : DogeExpression.values()) {
replacedString += expression.updatedString;
}
return replacedString;
}
}
}
So Why Is This An Issue
IExpression.java had to be hacked to work with Java 7. I feel that Java 8 would have played a lot nicer with me. However, the issue I am having is paramount to getting my current implementation working. The issue is that IExpression does not know which enum to iterate through, so for now I have to comment and uncomment code to get it to work.
How can I fix the above issue??
How about something like this:
public enum Operator {
addition("+"),
subtraction("-"),
multiplication("x"),
division("÷");
private final String expressed;
private Operator(String expressed) { this.expressed = expressed; }
public String expressedAs() { return this.expressed; }
}
public class ExpressionBuilder {
private Number n1;
private Number n2;
private Operator o1;
private Number r;
public void setN1(Number n1) { this.n1 = n1; }
public void setN2(Number n2) { this.n2 = n2; }
public void setO1(Operator o1) { this.o1 = o1; }
public void setR(Number r) { this.r = r; }
public String build() {
final StringBuilder sb = new StringBuilder();
sb.append(format(n1));
sb.append(o1.expressedAs());
sb.append(format(n2));
sb.append(" = ");
sb.append(format(r));
return sb.toString();
}
private String format(Number n) {
return n.toString(); // Could use java.text.NumberFormat
}
}
I have a string which contains an underscore as shown below:
123445_Lisick
I want to remove all the characters from the String after the underscore. I have tried the code below, it's working, but is there any other way to do this, as I need to put this logic inside a for loop to extract elements from an ArrayList.
public class Test {
public static void main(String args[]) throws Exception {
String str = "123445_Lisick";
int a = str.indexOf("_");
String modfiedstr = str.substring(0, a);
System.out.println(modfiedstr);
}
}
Another way is to use the split method.
String str = "123445_Lisick";
String[] parts = str.split("_");
String modfiedstr = parts[0];
I don't think that really buys you anything though. There's really nothing wrong with the method you're using.
Your method is fine. Though not explicitly stated in the API documentation, I feel it's safe to assume that indexOf(char) will run in O(n) time. Since your string is unordered and you don't know the location of the underscore a priori, you cannot avoid this linear search time. Once you have completed the search, extraction of the substring will be needed for future processing. It's generally safe to assume that, for simple operations like this in a reasonably mature language, the library functions will have been optimized.
Note, however, that you are making two implicit assumptions:
an underscore will exist within the String
if there is more than one underscore in the string, only the text before the first one should be kept
If either of these assumptions will not always hold, you will need to make adjustments to handle those situations. In either case, you should at least defensively check for a -1 returned from indexOf(char), indicating that '_' is not in the string. Assuming the entire String is desired in this situation, you could use something like this:
public static String stringBefore(String source, char delim) {
if(source == null) return null;
int index = source.indexOf(delim);
// Keep everything before the first delimiter; return the whole string if it is absent.
return (index >= 0) ? source.substring(0, index) : source;
}
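Since the question mentions doing this for every element of an ArrayList, here is a small sketch of the same defensive check applied in a loop. The class name, the helper names before and prefixes, and every sample value other than "123445_Lisick" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixBeforeUnderscore {

    /** Returns the text before the first delim, or the whole string if delim is absent. */
    static String before(String source, char delim) {
        if (source == null) {
            return null;
        }
        int index = source.indexOf(delim);
        return index >= 0 ? source.substring(0, index) : source;
    }

    /** Applies before(...) with '_' to every element and collects the results. */
    static List<String> prefixes(List<String> items) {
        List<String> result = new ArrayList<>(items.size());
        for (String item : items) {
            result.add(before(item, '_'));
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the first entry comes from the question; the others are hypothetical.
        List<String> items = new ArrayList<>(
                Arrays.asList("123445_Lisick", "42_a_b", "nounderscore"));
        System.out.println(prefixes(items));
    }
}
```

Keeping the check in one helper means the loop body stays a single call, and entries without an underscore pass through unchanged instead of throwing a StringIndexOutOfBoundsException.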
You could also use something like that:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
public static void main(String[] args) {
String str = "123445_Lisick";
Pattern pattern = Pattern.compile("^([^_]*).*");
Matcher matcher = pattern.matcher(str);
String modfiedstr = null;
if (matcher.find()) {
modfiedstr = matcher.group(1);
}
System.out.println(modfiedstr);
}
}
The regex captures everything from the start of the input string up to, but not including, the first _ character.
However, as @Bill the Lizard wrote, I don't think there is anything wrong with the way you are doing it now. I would do it the same way you did.