I have to read a Java file from Java code and determine the greatest nesting depth of if statements in it.
For example:
if (someCondition)
{
if (someCondition)
{
// Expression
}
}
In this case the program should report that the greatest nested if depth is 2.
The problem is that the position of the curly brace after the if is uncertain.
For example, it can be like:
Opening and closing braces on the same line:
if (someCondition){}
OR
Opening brace on the next line:
if (someCondition)
{
}
OR
Conditions without curly braces:
if (someCondition)
if (someCondition) // Single line without curly brace
Can anybody suggest the best way to get the required nesting count?
You'll need to parse the Abstract Syntax Tree (AST) of the Java source code. See Java library for code analysis. Once you have the AST, you can do a search to find the longest path of nested conditionals.
As the answer already said, you should rely on the AST rather than scanning the code by hand for this. The AST will never be wrong; your own reading abilities most often will.
I don't know a complete solution right now, but I suggest you spend some time looking at existing tools for computing software metrics. Nesting depth is a typical metric and there should be tools around.
If you can't find anything, you can at least fall back to writing something like an Eclipse plugin. In that case, you could simply load the Java file in the Eclipse editor; Eclipse performs all the hard work for you and gives you the AST for free. Determining the nesting depth of a given AST is then a simple task. Developing a prototype for that shouldn't take more than a few hours, and it's easy to extend it to cover your whole project and answer questions like "which Java file in our project has the maximum nesting depth, and what depth is that?". But then again, someone else will surely point out an existing tool that already does this and much more.
I82Much's answer will certainly get you there, but feels a little like cheating.
Knowing little about your project, I would think that a simple stack mechanism with a max-value record would do the trick: push on { and pop on }. Once you have that basic model working, simply add the special case of control statements with one-line bodies (this is valid for if, for, while, ...). In those cases, you'll be looking for those keywords followed by a ( ... ) pair. Once you've encountered that combination, if the scan encounters either another control statement or a semicolon before it encounters a {, then this is one of those special cases and you should push (using a special marker indicating to pop on ; rather than }).
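As a starting point, the basic brace-counting part of that idea might look like this (a minimal sketch; it ignores braces inside strings and comments, and the brace-less special case still needs the keyword scanning described above):

```java
public class NestingDepth {
    // Tracks the current brace depth with a counter (a stack degenerates to a
    // counter when every pushed symbol is the same) and records the maximum.
    public static int maxBraceDepth(String source) {
        int depth = 0;
        int max = 0;
        for (char c : source.toCharArray()) {
            if (c == '{') {
                depth++;                     // push
                max = Math.max(max, depth);  // record deepest nesting seen so far
            } else if (c == '}') {
                depth--;                     // pop
            }
        }
        return max;
    }

    public static void main(String[] args) {
        String code = "if (a)\n{\n  if (b)\n  {\n  }\n}\n";
        System.out.println(maxBraceDepth(code)); // prints 2
    }
}
```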
Background:
Before asking my question, I wish to state that I have checked the following links:
Identify loops in java byte code
goto in Java bytecode
http://blog.jamesdbloom.com/JavaCodeToByteCode_PartOne.html
I can detect the loops in the bytecode (class files) using a dominator-analysis-based approach of detecting back edges in the control-flow graph (https://en.wikipedia.org/wiki/Control_flow_graph).
My problem:
After detection of loops, you can end up having two loops (defined by two distinct back edges) sharing the same loop head. As far as I can tell, this can arise in two cases: (Case 1) in the source code, you have a for or while loop with a continue statement; (Case 2) in the source code, you have two loops, an outer do-while and an inner loop, with no instructions between them.
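For concreteness, the two cases described above might look like this in source code (hypothetical examples; the exact bytecode depends on the compiler):

```java
public class LoopCases {
    // Case 1: a single loop whose continue produces an extra jump back to the loop head.
    static int case1(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) continue; // second back edge to the same head
            sum += i;
        }
        return sum;
    }

    // Case 2: an outer do-while with an inner loop at the very start of its body,
    // so both back edges can target the same instruction in the CFG.
    static int case2(int n) {
        int sum = 0, i = 0, j = 0;
        do {
            while (j < n) { sum++; j++; } // inner loop begins at the outer loop's head
            j = 0;
            i++;
        } while (i < n);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(case1(5)); // 1 + 3 = 4
        System.out.println(case2(2)); // 2 outer iterations x 2 inner increments = 4
    }
}
```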
My question is the following: By only looking at the bytecode, how can you distinguish between these two cases?
My thoughts:
In a do-while loop (without any continue statements), you don't expect a goto instruction that jumps back to the loop head, in other words, one creating a back edge.
For a while or for loop (again without any continue statements), it appears that there can be a goto instruction (I am not sure if there must be one). My compiler (a standard 1.7 compiler) generates this goto outside of the loop, not as a back edge, unlike what is mentioned in the given links (this goto creates a control flow to the head of the loop, but not as a jump back from the end of the loop).
So my guess is (repeating, in the case of two back edges): if one of them is a back edge created by a goto instruction, then there is only one loop in the source code and it includes a continue statement (Case 1). Otherwise, there are two loops in the source code (Case 2).
Thank you.
When two loops are equivalent, all you can do is take the simplest one.
E.g. there is no way to tell the difference between while (true), do { } while (true), and for (;;).
If you have do { something(); } while (false) this loop might not appear in the byte code at all.
As Peter Lawrey already pointed out, there is no way to determine the source code form by looking at the bytecode. To name an example closer to your intention, the following single-loop code
do action(); while(condition1() || condition2());
produces exactly the same code as the nested loop
do do action(); while(condition1()); while(condition2());
Likewise the following loop
do {
action();
if(condition1()) continue;
break;
} while(condition2());
produces exactly the same code as
do action(); while(condition1() && condition2());
with current javac, whereas surprisingly
do {
action();
if(!condition1()) break;
} while(condition2());
does not, which only shows how much the exact form depends on compiler internals. The next version of javac might compile them differently.
This is the code sample which I want to parse. I want getSaveablePaymentMethodsSmartList() as a token when I override the function in the parserBaseListener.java file created by ANTLR.
/** #suppress */
public any function getSaveablePaymentMethodsSmartList() {
if(!structKeyExists(variables, "saveablePaymentMethodsSmartList")) {
variables.saveablePaymentMethodsSmartList = getService("paymentService").getPaymentMethodSmartList();
variables.saveablePaymentMethodsSmartList.addFilter('activeFlag', 1);
variables.saveablePaymentMethodsSmartList.addFilter('allowSaveFlag', 1);
variables.saveablePaymentMethodsSmartList.addInFilter('paymentMethodType', 'creditCard,giftCard,external,termPayment');
if(len(setting('accountEligiblePaymentMethods'))) {
variables.saveablePaymentMethodsSmartList.addInFilter('paymentMethodID', setting('accountEligiblePaymentMethods'));
}
}
return variables.saveablePaymentMethodsSmartList;
}
I already have a grammar that parses function declarations, but I need a new rule that can associate doctype comments with a function declaration and give the function name as a separate token if there is a doctype comment associated with it.
The grammar looks like this:
functionDeclaration
: accessType? typeSpec? FUNCTION identifier
LEFTPAREN parameterList? RIGHTPAREN
functionAttribute* body=compoundStatement
;
You want grammar rules that:
return X if something "far away" in the source is an A,
return Y if something far away is a B (or ...).
In general, this is context dependency. It is not handled well by context-free grammars, which are what ANTLR approximates with its BNF rules. In essence, what you think you want to do is encode the history of what the parser has seen long ago, to influence what is being produced now. Generally, that is hard.
The usual solution to something like this is to not address it in the grammar at all. Instead:
have the grammar rules produce an X regardless of what is far away,
build a tree as you parse (ANTLR does this for you); this captures not only X but everything about the parsed entity, including tokens for A that are far away
walk over the tree, interpreting a found X as Y if the tree contains the A (usually far away in the tree).
For your specific case of docstring-influences-function-name, you can probably get away with encoding that history.
You need (IMHO, ugly) grammar rules that look something like this:
functionDeclaration: documented_function | undocumented_function ;
documented_function: docstring accessType? typeSpec? FUNCTION
documented_function_identifier rest_of_function ;
undocumented_function: accessType? typeSpec? FUNCTION
identifier rest_of_function ;
rest_of_function: // avoids duplication, not pretty
LEFTPAREN parameterList? RIGHTPAREN
functionAttribute* body=compoundStatement ;
You have to recognize the docstring as an explicit token that can be "seen" by the parser, which means modifying your lexer to turn docstring comments (normally treated like whitespace) into tokens. [This is the first ugly thing.] Then, having seen such a docstring, the lexer has to switch to a lexical mode that will pick up identifier-like text and produce documented_function_identifier, and then switch back to normal mode. [This is the second ugly thing.] What you are doing is literally implementing a context dependency.
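A hedged illustration of what those two ugly pieces might look like in an ANTLR4 lexer grammar (the token names, patterns, and keyword set here are invented for the sketch, not taken from your grammar):

```antlr
// Promote doc comments from skipped trivia to a real token, and switch into a
// mode that relabels the next identifier-like text.
DOCSTRING : '/**' .*? '*/' -> pushMode(AFTER_DOC) ;

mode AFTER_DOC;
AD_WS       : [ \t\r\n]+ -> skip ;
AD_FUNCTION : 'function'  -> type(FUNCTION) ;  // re-emit keywords with their normal types
AD_PUBLIC   : 'public'    -> type(PUBLIC) ;    // PUBLIC/ANY token names are assumptions
AD_ANY      : 'any'       -> type(ANY) ;
DOCUMENTED_FUNCTION_IDENTIFIER
            : [a-zA-Z_] [a-zA-Z_0-9]* -> popMode ;  // back to normal lexing
```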
The reason you can accomplish this in spite of my remarks about context dependency is that A is not very far away; it is within a few tokens of X.
So you could do it this way. I would not do this; you are trying to make the parser do too much. Stick to the "usual solution". (You'll have a different problem: your A is a comment/whitespace, and probably isn't stored in the tree by ANTLR. You'll have to solve that; I'm not an ANTLR expert.)
This question already has answers here:
Is it ok if I omit curly braces in Java? [closed]
(16 answers)
Closed 9 years ago.
I am using an if condition without braces in Java, something like
if(somecondition)
//Only one line Business logic
but some people told me to always use braces, even for one-line statements, like this:
if(somecondition){
//Only one line Business logic
}
What is the better way according to Java standards?
There's no real "standard". I prefer always using braces, because if you don't, you risk someone adding an innocent-looking logging statement, turning your code from
if(somecondition)
//Only one line Business logic
into
if(somecondition)
log.debug("condition was true");
//Only one line Business logic
and then things stop working :-)
That's a matter of taste. I would either use braces, or use no braces but write the whole statement on one line, to improve readability.
Also, you might consider using the ternary operator:
booleanExpression ? value1 : value2
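For example, an if/else that only chooses between two values collapses into one expression (a minimal sketch; names are illustrative):

```java
public class TernaryExample {
    static int pick(boolean someCondition, int value1, int value2) {
        // Equivalent to: if (someCondition) { return value1; } else { return value2; }
        return someCondition ? value1 : value2;
    }

    public static void main(String[] args) {
        System.out.println(pick(true, 1, 2)); // prints 1
    }
}
```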
In addition to @radai's answer: if you have a real evil mind, when you see an if with no braces, you can make enemies by adding a semicolon on the same line as the if, but at the 800th column of the line (or something).
Like this:
if(condition) /*a loooot of whitespace*/ ;
//Only one line Business logic that will get executed whatever is the condition
This is why I prefer to use braces and recommend that people use them.
No naked if statements. You're just asking for trouble. Always use { }
It is better to use braces when checking for errors or updating the code.
Imagine:
if(answer.equals("add"))
addedValue += Scanner.readInt();
but you have a new requirement to add only the absolute value, so you change it to:
if(answer.equals("add"))
valueToBeAdded = Scanner.readInt();
if(valueToBeAdded < 0) valueToBeAdded = - valueToBeAdded;
addedValue += valueToBeAdded;
This is not really a correct algorithm; it is just an example of what can happen.
Using if statements with braces is the better way per Java coding standards, because it increases readability and reduces unwanted errors.
The two statements have exactly the same effect, but I have suffered so often from the lack of braces that I always comment that there should be braces even around one-line statements. This makes the code easier to maintain and can save a lot of headaches. My experience shows that one-line if statements often turn into multi-line statements in later iterations, so what you save by not writing two { the first time, you will pay back later.
According to Java coding standards, braces are better because without them the compiler has to work around more, and there could also be a performance issue.
I have an ANTLR grammar for input of the form:
A - B: more,things
However, sometimes either A or B can be missing, such as:
- B: more, things //Skip
A - : some, other, things //Skip
C - D: yet, more, things //Read this one and following
E - F: things
I want ANTLR to skip over those lines (where either side of - is missing) and continue processing the rest.
Basically, something like this:
- B: more, things {if (!hasBothParts()) { continueAtNextLine();} };
Following the book, I provided a @rulecatch and this catch clause after my appropriate parser rule:
catch[RecognitionException re]{
reportError(re);
consumeUntil(input, NEWLINE);
input.consume();
}
EDIT 2 - I tried doing this instead:
while (input.LA(1) != 6){
input.consume();
}
This worked as expected. 6 is the token identifier for "NEWLINE" in my grammar. I don't know how to do a comparison like input.LA(1) != "\n" or something similar. I know it's not correct to do it this way; I am just experimenting. If you know the right way, please tell me! :)
But this works, unlike the first loop. I suspect consumeUntil is not seeing the NEWLINE because it is on the hidden channel.
The NullPointerException seems to be caused by the input fast-forwarding to EOF; hence the tree grammar hits a null when it does input.LT(1).
However, in doing so, I get a NullPointerException in my tree grammar:
Exception in thread "main" java.lang.NullPointerException
at org.antlr.runtime.tree.BaseTreeAdaptor.isNil(BaseTreeAdaptor.java:70)
at org.antlr.runtime.tree.CommonTreeNodeStream.nextElement(CommonTreeNodeStream.java:93)
at org.antlr.runtime.misc.LookaheadStream.fill(LookaheadStream.java:94)
at org.antlr.runtime.misc.LookaheadStream.sync(LookaheadStream.java:88)
at org.antlr.runtime.misc.LookaheadStream.LT(LookaheadStream.java:119)
....
The behavior I want is for the parser to skip over the lines missing components and proceed with the remaining lines. The tree parser should not be a problem, I assume?
The ANTLR book does not mention anything regarding this issue.
Also, I think the ANTLR Error Recovery page mentions something along those lines, but the solution provided is fairly complex/ugly and dates back to 2004. So, is there a better way of doing this relatively simple thing in ANTLR?
Thank you.
EDIT If this helps, the error was caused by this line in the generated tree grammar:
retval.start = input.LT(1);
This is, I assume, being called with nothing left; i.e., LT(1) is returning null, since the input was skipped past.
I worked for the last 5 days to understand how the unification algorithm works in Prolog.
Now I want to implement such an algorithm in Java.
I thought maybe the best way is to manipulate the string and decompose its parts using some data structure such as a stack.
To make it clear, suppose the user input is:
a(X,c(d,X)) = a(2,c(d,Y)).
I already take it as one string and split it into two strings (Expression 1 and Expression 2).
Now, how can I know whether the next char(s) are a variable, a constant, etc.?
I can do it with nested ifs, but that doesn't seem like a good solution.
I tried to use inheritance, but the problem remains: how can I know the type of the characters being read?
First you need to parse the inputs and build expression trees. Then apply Milner's unification algorithm (or some other unification algorithm) to figure out the mapping of variables to constants and expressions.
A really good description of Milner's algorithm may be found in the Dragon Book: "Compilers: Principles, Techniques and Tools" by Aho, Sethi, and Ullman. (Milner's algorithm can also cope with unification of cyclic graphs, and the Dragon Book presents it as a way to do type inference.) By the sounds of it, you could benefit from learning a bit about parsing, which is also covered by the Dragon Book.
EDIT: Other answers have suggested using a parser generator; e.g. ANTLR. That's good advice, but (judging from your example) your grammar is so simple that you could also get by with using StringTokenizer and a hand-written recursive descent parser. In fact, if you've got the time (and inclination) it is worth implementing the parser both ways as a learning exercise.
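For terms of the shape shown in the question, such a hand-written recursive descent parser is quite small. A sketch (the classification convention and output format are invented for illustration; it returns a debug string rather than a tree, to keep it self-contained):

```java
import java.util.ArrayList;
import java.util.List;

// Tiny recursive descent parser for Prolog-style terms such as a(X,c(d,X)).
// Assumed convention: names starting with an upper-case letter are variables,
// everything else is a constant or functor.
public class TermParser {
    private final String src;
    private int pos = 0;

    public TermParser(String input) { this.src = input.replace(" ", ""); }

    // term := name ( '(' term (',' term)* ')' )?
    // Returns a debug string like compound:a[var:X, compound:c[const:d, var:X]]
    public String parseTerm() {
        String name = parseName();
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++; // consume '('
            List<String> args = new ArrayList<>();
            args.add(parseTerm());
            while (pos < src.length() && src.charAt(pos) == ',') {
                pos++; // consume ','
                args.add(parseTerm());
            }
            pos++; // consume ')'
            return "compound:" + name + args;
        }
        return (Character.isUpperCase(name.charAt(0)) ? "var:" : "const:") + name;
    }

    private String parseName() {
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(new TermParser("a(X,c(d,X))").parseTerm());
        // prints compound:a[var:X, compound:c[const:d, var:X]]
    }
}
```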
It sounds like this problem is more to do with parsing than unification specifically. Using something like ANTLR might help in terms of turning the original string into some kind of tree structure.
(It's not quite clear what you mean by "do it by nested", but if you mean that you're doing something like trying to read an expression, and recursing when meeting each "(", then that's actually one of the right ways to do it -- this is at heart what the code that ANTLR generates for you will do.)
If you are more interested in the mechanics of unifying things than you are in parsing, then one perfectly good way to do this is to construct the internal representation in code directly, and put off the parsing aspect for now. This can get a bit annoying during development, as your Prolog-style statements are now a rather verbose set of Java statements, but it lets you focus on one problem at a time, which is usually helpful.
(If you structure things this way, this should make it straightforward to insert a proper parser later, that will produce the same sort of tree as you have until then been constructing by hand. This will let you attack the two problems separately in a reasonably neat fashion.)
Before you get to do the semantics of the language, you have to convert the text into a form that's easy to operate on. This process is called parsing and the semantic representation is called an abstract syntax tree (AST).
A simple recursive descent parser for Prolog might be hand-written, but it's more common to use a parser toolkit such as Rats! or ANTLR.
In an AST for Prolog, you might have a class for Term, with CompoundTerm, Variable, and Atom all being Terms. Polymorphism allows the arguments to a compound term to be any Term.
Your unification algorithm then becomes: unify the name (and arity) of corresponding compound terms, and recursively unify each pair of corresponding arguments.
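A minimal sketch of that structure and algorithm, using the class names from above (details are illustrative; like many practical Prolog systems, it omits the occurs check):

```java
import java.util.HashMap;
import java.util.Map;

abstract class Term {}
class Variable extends Term { final String name; Variable(String n) { name = n; } }
class Atom extends Term { final String name; Atom(String n) { name = n; } }
class CompoundTerm extends Term {
    final String name; final Term[] args;
    CompoundTerm(String n, Term... a) { name = n; args = a; }
}

public class Unifier {
    // Unifies a and b, extending the bindings map; returns false on a clash.
    static boolean unify(Term a, Term b, Map<String, Term> bindings) {
        a = walk(a, bindings);
        b = walk(b, bindings);
        if (a instanceof Variable) { bindings.put(((Variable) a).name, b); return true; }
        if (b instanceof Variable) { bindings.put(((Variable) b).name, a); return true; }
        if (a instanceof Atom && b instanceof Atom)
            return ((Atom) a).name.equals(((Atom) b).name);
        if (a instanceof CompoundTerm && b instanceof CompoundTerm) {
            CompoundTerm ca = (CompoundTerm) a, cb = (CompoundTerm) b;
            if (!ca.name.equals(cb.name) || ca.args.length != cb.args.length) return false;
            for (int i = 0; i < ca.args.length; i++)
                if (!unify(ca.args[i], cb.args[i], bindings)) return false;
            return true;
        }
        return false;
    }

    // Follows variable bindings until reaching an unbound variable or a non-variable term.
    static Term walk(Term t, Map<String, Term> bindings) {
        while (t instanceof Variable && bindings.containsKey(((Variable) t).name))
            t = bindings.get(((Variable) t).name);
        return t;
    }

    public static void main(String[] args) {
        // a(X, c(d, X)) = a(2, c(d, Y))
        Term left = new CompoundTerm("a", new Variable("X"),
                new CompoundTerm("c", new Atom("d"), new Variable("X")));
        Term right = new CompoundTerm("a", new Atom("2"),
                new CompoundTerm("c", new Atom("d"), new Variable("Y")));
        Map<String, Term> bindings = new HashMap<>();
        System.out.println(unify(left, right, bindings)); // prints true, with X=2 and Y=2
    }
}
```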