JavaCC: How to handle tokens that contain common words - java

I'm trying to create a parser for source code like this:
[code table 1.0]
code table code_table_name
id = 500
desc = "my code table one"
end code table
... and here below is the grammar I defined:
PARSER_BEGIN(CodeTableParser)
...
PARSER_END(CodeTableParser)
/* skip spaces */
SKIP: {
" "
| "\t"
| "\r"
| "\n"
}
/* reserved words */
TOKEN [IGNORE_CASE]: {
<CODE_TAB_HEADER: "[code table 1.0]">
| <CODE_TAB_END: "end" (" ")+ <CODE_TAB_BEGIN>>
| <CODE_TAB_BEGIN: <IDENT> | "code" (" ")+ "table">
| <ID: "id">
| <DESC: "desc">
}
/* token images */
TOKEN: {
<NUMBER: (<DIGIT>)+>
| <IDENT: (<ALPHA>)+>
| <VALUE: (<ALPHA> ["[", "]"])+>
| <STRING: <QUOTED>>
}
TOKEN: {
<#ALPHA: ["A"-"Z", "a"-"z", "0"-"9", "$", "_", "."]>
| <#DIGIT: ["0"-"9"]>
| <#QUOTED: "\"" (~["\""])* "\"">
}
void parse():
{
}
{
expression() <EOF>
}
void expression():
{
Token tCodeTab;
}
{
<CODE_TAB_HEADER>
<CODE_TAB_BEGIN>
tCodeTab = <IDENT>
(
<ID>
<DESC>
)*
<CODE_TAB_END>
}
The problem is that the parser correctly identifies the token CODE_TAB_BEGIN ("code table"), but it doesn't identify the token IDENT ("code_table_name"), since that name starts with a word already contained in CODE_TAB_BEGIN (i.e. "code"). The parser complains that "code is followed by invalid character _".
Having said that, I'm wondering what I'm missing in order to let the parser work correctly. I'm a newbie and any help would be really appreciated ;-)
Thanks,
j3d

Your lexer will never produce an IDENT because the production
<CODE_TAB_BEGIN: <IDENT> | "code" (" ")+ "table">
says that every IDENT can be a CODE_TAB_BEGIN and, as this production comes first, it beats the production for IDENT by the first match rule. (RTFFAQ)
Replace that production by
<CODE_TAB_BEGIN: "code" (" ")+ "table">
You will run into trouble with ID and DESC, but this gets you past the second line of input.
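The tie-break described above can be imitated outside JavaCC. The following is only a minimal sketch, not JavaCC's implementation: it uses java.util.regex as a stand-in for the token definitions, takes the longest match at the start of the input, and on a tie keeps the rule that was declared first. The names FirstMatchDemo, winner and rules are made up for illustration, and the IGNORE_CASE option is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchDemo {
    // Stand-in for (<ALPHA>)+ from the grammar in the question.
    static final String IDENT = "[A-Za-z0-9$_.]+";

    // Longest match at the start of the input wins; on a tie the rule
    // declared first wins (LinkedHashMap keeps declaration order).
    static String winner(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Builds the two competing rules. The keyword alternative is listed first
    // because java.util.regex tries alternatives in order, not longest-first.
    static Map<String, Pattern> rules(boolean identInsideBegin) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        String begin = "code +table" + (identInsideBegin ? "|" + IDENT : "");
        rules.put("CODE_TAB_BEGIN", Pattern.compile(begin));
        rules.put("IDENT", Pattern.compile(IDENT));
        return rules;
    }

    public static void main(String[] args) {
        // Original production: both rules match all 15 characters, the first one wins.
        System.out.println(winner(rules(true), "code_table_name"));   // CODE_TAB_BEGIN
        // Corrected production: only IDENT matches the name.
        System.out.println(winner(rules(false), "code_table_name"));  // IDENT
        // The keyword itself is still recognised because it is the longer match.
        System.out.println(winner(rules(false), "code table x"));     // CODE_TAB_BEGIN
    }
}
```

With the original production both rules match the whole name, so the earlier one is chosen and IDENT never gets a chance; once the IDENT alternative is removed, only IDENT matches.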

Related

javacc (ph-javacc-maven-plugin) generates java switch with case `\`

I'm new to JavaCC. I tried to process an existing JavaCC grammar (it's the JSR341, EL 3.0 grammar). It generates (almost) correct Java. However, the generated code contains an illegal switch statement. I'm using the ph-javacc-maven-plugin.
private int jjMoveStringLiteralDfa0_0(){
switch(curChar)
{
case '#':
return jjMoveStringLiteralDfa1_0(0x8L);
case '$':
return jjMoveStringLiteralDfa1_0(0x4L);
case '\': // should be '\\'
return jjStartNfaWithStates_0(0, 4, 2);
default :
return jjMoveNfa_0(7, 0);
}
}
This is the offending grammar section from JSR341 (although I'm not sure it's the grammar itself) that's causing the problem:
<DEFAULT> TOKEN :
{
< LITERAL_EXPRESSION:
((~["\\", "$", "#"])
| ("\\" ("\\" | "$" | "#"))
| ("$" ~["{", "$"])
| ("#" ~["{", "#"])
)+
| "$"
| "#"
>
|
< START_DYNAMIC_EXPRESSION: "${" > {stack.push(DEFAULT);}:
IN_EXPRESSION
|
< START_DEFERRED_EXPRESSION: "#{" > {stack.push(DEFAULT);}:
IN_EXPRESSION
}
<DEFAULT> SKIP : { "\\" }
I played around with the options (JAVA_UNICODE_ESCAPE, UNICODE_INPUT) and the grammar, but without result.
Question: how do I make javacc generate valid Java switch statement, i.e., with '\\' instead of '\'?
The observed behaviour is an issue and will be solved in parser-generator-cc 1.1.0.

Newline in datatable Gherkin/Cucumber

I have this datatable in my cucumber scenario:
| name | value |
| Description | one \n two \n three |
I want the values to appear in the textarea like this:
one
two
three
Because I need to make bullet points out of them.
So my actual question is, is it possible to use newline characters in one line or is there a better way to approach this?
EDIT: to clarify, it's not working with the code written above:
WebDriverException: unknown error: Runtime.evaluate threw exception: SyntaxError: Invalid or unexpected token
EDIT 2: I'm using a bit of unusual code to access the value, seeing as it is a p element and this is normally not possible:
js.executeScript("document.getElementsByTagName('p')[0].innerHTML = ' " + row.get("value") + " ' ");
This has been working for other rows, though; maybe it fails because I'm using \n now?
You can try this way:
WebDriver driver = new ChromeDriver();
driver.get("https://stackoverflow.com/questions/51786797/newline-in-datatable-gherkin-cucumber/51787544#51787544");
Thread.sleep(3000); // pause to wait until page loads
String s = "SOME<br>WORDS";
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("document.getElementsByTagName('div')[0].innerHTML = '" + s + "';");
Output:
SOME
WORDS
So the main idea is to use the <br> tag as the line separator.
In your case it would be like this:
| name | value |
| Description | one<br>two<br>three |
and code would be:
// make sure, that row.get("value") returns a string
js.executeScript("document.getElementsByTagName('p')[0].innerHTML = ' " + row.get("value") + " ' ");
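If the feature file should keep \n in the cell, the value could be converted before it is spliced into the script. This is a minimal sketch under assumptions: the class GherkinCellToHtml and the method toHtml are hypothetical helpers, and since the cell may arrive either with the two characters backslash + n or with a real line break (which is not valid inside a single-quoted JavaScript literal), the sketch handles both.

```java
class GherkinCellToHtml {
    // Matches either the two-character sequence backslash + n or a real
    // line break, together with any whitespace around it.
    private static final String LINE_BREAK = "\\s*(?:\\\\n|\\R)\\s*";

    // Replaces every line separator in the cell with a <br> tag.
    static String toHtml(String cell) {
        return cell.trim().replaceAll(LINE_BREAK, "<br>");
    }

    public static void main(String[] args) {
        System.out.println(toHtml("one \\n two \\n three")); // one<br>two<br>three
        System.out.println(toHtml("one \n two \n three"));   // one<br>two<br>three
    }
}
```

The result of toHtml(row.get("value")) could then be used in place of row.get("value") in the executeScript call. Selenium's executeScript also accepts extra arguments that the script can read as arguments[0], which avoids building the script text by string concatenation.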

Spring MVC endpoint not reached when passing an empty map {} in a GET request

I'm having trouble reaching an endpoint.
The following call works and returns a response:
http://localhost:28080/restServices/apps/1762/users/USERNAME/?password=PASSWORD
But whenever I add the following data parameter, it stops working. Could anybody help me with this issue?
http://localhost:28080/restServices/apps/1762/users/USERNAME/?password=PASSWORD&data={}
@RequestMapping(value = "/apps/{appId_}/users/{username_}", method = RequestMethod.GET)
@ResponseBody
@Transactional
public UserResponseDTO getUserAndToken(@PathVariable Long appId_, @PathVariable String username_, @RequestParam("password") String password_, @RequestParam("data") String datas) throws Exception {
//do stuff
}
EDIT
Without any change to the code, this works on Tomcat 7.0.63, while other versions (7.0.73, 8.0.x and later) do not work.
You forgot to specify {appId_} in value = "/apps/users/{username_}"
Fix: value = "/apps/{appId_}/users/{username_}"
~~~
Your method accepts @RequestParam("data") String datas
and you're sending data={}
String should be quoted so the fix is data=""
This behaviour is the result of a change in newer Tomcat versions.
As a quick fix you can downgrade to an older version; I downgraded Tomcat to 7.0.63.
After more research on Spring MVC, I found that the request is rejected when the URL contains any of the invalid characters listed below.
Excluded US-ASCII Characters disallowed within the URI syntax:
control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
space = <US-ASCII coded character 20 hexadecimal>
delims = "<" | ">" | "#" | "%" | <">
Unwise characters are allowed but may cause problems:
unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
The following characters are reserved within a query component and have special meaning within a URI/URL:
reserved = ";" | "/" | "?" | ":" | "#" | "&" | "=" | "+" | "$" | ","
For more information visit.
Solution:
encodeURI("http://localhost:28080/restServices/apps/1762/users/USERNAME/?password=PASSWORD&data={}")
> http://localhost:28080/restServices/apps/1762/users/USERNAME/?password=PASSWORD&data=%7B%7D
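encodeURI is a JavaScript function. When the request is built from Java, the same percent-encoding of the braces can be obtained with java.net.URLEncoder. This is a minimal sketch; the class QueryValueEncoding and the method encodeQueryValue are hypothetical names. Note that URLEncoder applies form encoding, so a space becomes "+" rather than "%20"; only the parameter value is encoded, not the whole URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryValueEncoding {
    // Percent-encodes a single query parameter value as UTF-8.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:28080/restServices/apps/1762/users/USERNAME/?password=PASSWORD";
        // Prints the URL ending in &data=%7B%7D
        System.out.println(base + "&data=" + encodeQueryValue("{}"));
    }
}
```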

Consume only the commented (/** ..... */) section of a Java file through ANTLR 4 and skip the rest

I'm new to ANTLR and am getting familiar with ANTLR 4. How can I consume only the commented section (/** ... */) of a Java file (or any file) and skip the rest?
I have the following file "t.txt":
t.txt
/**
#Key1("value1")
#Key2("value2")
*/
This is the text that we need to skip. Only wanted to read the above commented section.
//END_OF_FILE
And my grammar file is as follows:
MyGrammar.g4
grammar MyGrammar;
file : (pair | LINE_COMMENT)* ;
pair : ID VALUE ;
ID : '#' ('A'..'Z') (~('('|'\r'|'\n') | '\\)')* ;
VALUE : '(' (~('\r'|'\n'))*;
COMMENT : '/**' .*? '*/';
WS : [\t\r\n]+ -> skip;
LINE_COMMENT
: '#' ~('\r'|'\n')* ('\r'|'\n'|EOF)
;
I know the COMMENT rule will read the commented section, but I'm stuck on how to skip the rest of the file content and force ANTLR to read ID and VALUE from the COMMENT content only.
You can use lexical modes for this. Simply switch to another mode when the lexer stumbles upon "/**" and ignore everything else.
Note that lexical modes cannot be used in a combined grammar. You will have to define a separate lexer- and parser-grammar.
A small demo:
AnnotationLexer.g4
lexer grammar AnnotationLexer;
ANNOTATION_START
: '/**' -> mode(INSIDE), skip
;
IGNORE
: . -> skip
;
mode INSIDE;
ID
: '#' [A-Z] (~[(\r\n] | '\\)')*
;
VALUE
: '(' ~[\r\n]*
;
ANNOTATION_END
: '*/' -> mode(DEFAULT_MODE), skip
;
IGNORE_INSIDE
: [ \t\r\n] -> skip
;
AnnotationParser.g4
parser grammar AnnotationParser;
options {
tokenVocab=AnnotationLexer;
}
parse
: pair* EOF
;
pair
: ID VALUE {System.out.println("ID=" + $ID.text + ", VALUE=" + $VALUE.text);}
;
And now simply use the lexer and parser:
String input = "/**\n" +
"\n" +
"#Key1(\"value1\")\n" +
"#Key2(\"value2\")\n" +
"\n" +
"*/\n" +
"\n" +
"This is the text that we need to skip. Only wanted to read the above commented section.\n" +
"\n" +
"//END_OF_FILE";
AnnotationLexer lexer = new AnnotationLexer(new ANTLRInputStream(input));
AnnotationParser parser = new AnnotationParser(new CommonTokenStream(lexer));
parser.parse();
which will produce the following output:
ID=#Key1, VALUE=("value1")
ID=#Key2, VALUE=("value2")
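To cross-check the expected pairs without generating a lexer and parser, the same extraction can be approximated in plain Java. This sketch deliberately swaps ANTLR's lexical modes for two regular expressions, so it only illustrates the "look inside /** ... */ and ignore the rest" idea; the class DocCommentPairs and its method extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DocCommentPairs {
    // Matches one /** ... */ block; DOTALL lets '.' span line breaks.
    private static final Pattern BLOCK =
            Pattern.compile("/\\*\\*(.*?)\\*/", Pattern.DOTALL);
    // Inside a block: '#', an upper-case letter and everything up to '(' (the ID),
    // then '(' and the rest of the line (the VALUE).
    private static final Pattern PAIR =
            Pattern.compile("(#[A-Z][^(\\r\\n]*)(\\([^\\r\\n]*)");

    // Collects "ID=..., VALUE=..." lines from every /** ... */ block in the source.
    static List<String> extract(String source) {
        List<String> out = new ArrayList<>();
        Matcher block = BLOCK.matcher(source);
        while (block.find()) {
            Matcher pair = PAIR.matcher(block.group(1));
            while (pair.find()) {
                out.add("ID=" + pair.group(1) + ", VALUE=" + pair.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "/**\n#Key1(\"value1\")\n#Key2(\"value2\")\n*/\n"
                + "Text outside the comment, even #Other(\"x\"), is ignored.\n"
                + "//END_OF_FILE";
        extract(input).forEach(System.out::println);
    }
}
```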

How can I access blocks of text as an attribute that are matched using a greedy=false option in ANTLR?

I have a rule in my ANTLR grammar like this:
COMMENT : '/*' (options {greedy=false;} : . )* '*/' ;
This rule simply matches c-style comments, so it will accept any pair of /* and */ with any arbitrary text lying in between, and it works fine.
What I want to do now is capture all the text between the /* and the */ when the rule matches, to make it accessible to an action. Something like this:
COMMENT : '/*' e=((options {greedy=false;} : . )*) '*/' {System.out.println("got: " + $e.text);
This approach doesn't work, during parsing it gives "no viable alternative" upon reaching the first character after the "/*"
I'm not really clear on if/how this can be done - any suggestions or guidance welcome, thanks.
Note that you can simply do:
getText().substring(2, getText().length()-2)
on the COMMENT token since the first and the last 2 characters will always be /* and */.
You could also remove the options {greedy=false;} : since both .* and .+ are ungreedy (although without the . they are greedy) (i).
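The substring arithmetic can be checked in isolation. This is a small sketch in plain Java, independent of ANTLR; the class CommentBody and the method body are hypothetical names, and the guard against strings that are not /* ... */ comments is an addition that the one-liner above does not have.

```java
class CommentBody {
    // Returns the text between the leading "/*" and the trailing "*/".
    static String body(String comment) {
        if (comment.length() < 4 || !comment.startsWith("/*") || !comment.endsWith("*/")) {
            throw new IllegalArgumentException("not a /* ... */ comment: " + comment);
        }
        return comment.substring(2, comment.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(">" + body("/* abc */") + "<"); // > abc <
    }
}
```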
EDIT
Or use setText(...) on the Comment token to discard the /* and */ immediately. A little demo:
file T.g:
grammar T;
@parser::members {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(
"/* abc */ \n" +
" \n" +
"/* \n" +
" DEF \n" +
"*/ "
);
TLexer lexer = new TLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TParser parser = new TParser(tokens);
parser.parse();
}
}
parse
: ( Comment {System.out.printf("parsed :: >\%s<\%n", $Comment.getText());} )+ EOF
;
Comment
: '/*' .* '*/' {setText(getText().substring(2, getText().length()-2));}
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Then generate a parser & lexer, compile all .java files and run the parser containing the main method:
java -cp antlr-3.2.jar org.antlr.Tool T.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar TParser
(or `java -cp .;antlr-3.2.jar TParser` on Windows)
which will produce the following output:
parsed :: > abc <
parsed :: >
DEF
<
(i) The Definitive ANTLR Reference, Chapter 4, Extended BNF Subrules, page 86.
Try this:
COMMENT :
'/*' {StringBuilder comment = new StringBuilder();} ( options {greedy=false;} : c=. {comment.appendCodePoint(c);} )* '*/' {System.out.println(comment.toString());};
Another way which will actually return the StringBuilder object so you can use it in your program:
COMMENT returns [StringBuilder comment]:
'/*' {comment = new StringBuilder();} ( options {greedy=false;} : c=. {comment.append((char)c);} )* '*/';
