We are developing an Eclipse plugin to remove System.out statements (sysouts) from the projects in a workspace, and we have only partially achieved our goal. If a sysout is on one line, we can delete it easily. But if it spans several lines (which usually happens because of code formatting), we run into trouble.
For example:
System.out.println("Hello World");
The regular expression to remove this line would be simple:
System.out.println*
But if the code is this:
System.out.println
        ("HelloWorld");
This is where the problem arises. Can anyone suggest how I can remove this using a Java regular expression?
I suggest
String regex = "System\\.out\\.println[^)]+[)]\\s*;";
Here, [^)]+ scans everything (including newlines) up to the first closing parenthesis. However, this will fail in several cases:
(possibly unbalanced) parentheses inside the printed text
commented-out code
the few cases where it is possible to omit the ';'
cases where System.out is assigned to another variable, instead of being used directly
Go the extra mile and use Eclipse's built-in Java parser, which understands lexical issues and comments and can flag any compile-time reference to System.out.
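A quick way to see both the strength and the main weakness of the regex suggested above is to run it through Matcher.replaceAll. In this minimal sketch (the class and method names are hypothetical), the pattern is written with its backslashes escaped as a Java string literal requires. The formatter-split call is removed, while a call whose string contains ')' is left untouched, because [^)]+ cannot get past the first closing parenthesis.

```java
import java.util.regex.Pattern;

public class SysoutStripper {
    // "[^)]+" also matches newlines, so a call split across lines by the
    // formatter is still caught up to its first ')'.
    private static final Pattern SYSOUT =
            Pattern.compile("System\\.out\\.println[^)]+[)]\\s*;");

    static String strip(String source) {
        return SYSOUT.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String multiLine = "int a = 1;\nSystem.out.println\n        (\"HelloWorld\");\nint b = 2;";
        System.out.println(strip(multiLine)); // the sysout lines are gone

        // Known limitation: the ')' inside the literal stops [^)]+, no ';'
        // follows it, so this statement is not removed at all.
        String tricky = "System.out.println(\"smile :)\");";
        System.out.println(strip(tricky));
    }
}
```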
I have inherited a code base that, although very well written, uses a rather bizarre convention:
public void someMethod(String pName, Integer pAge, Context pContext)
{
...
}
I'd like to make the following two changes to the entire code:
public void someMethod(String name, Integer age, Context context) { ... }
Opening brace on the same line as the method declaration
Use a camelCase name for all parameters of the method, without this weird "p" prefix
Can Checkstyle help me here? I've been looking, but I can't find a way to rename all parameters in all method signatures to something more pleasant.
If you are willing to use the Eclipse IDE, it offers a very handy feature for auto-formatting code:
http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Freference%2Fpreferences%2Fjava%2Fcodestyle%2Fref-preferences-formatter.htm
It is pretty self-explanatory and straightforward, in my opinion.
Eclipse also supports regex-based search and replace.
Open Search > File... and enter the following regex in the Containing text field:
\b[p]([A-Z][a-z]+)\b
And tick both Case sensitive and Regular expression.
Then press Replace...
In the dialog that pops up, enter
\1
in the With: field and tick Regular expression.
Edit: Unfortunately, the current version of Eclipse does not support the \L flag for capture groups, so you are still stuck with an uppercase leading letter.
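Outside the Eclipse dialog, the lowercase step that \L would provide takes only a few lines of plain Java with Matcher.appendReplacement. This is a minimal sketch assuming the same pattern as the search above (the class name is hypothetical). It is a text substitution, not a refactoring: it also rewrites matching words in strings and comments and can clash with existing names, so review the result (Eclipse's Refactor > Rename is the safe route for individual parameters).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixRenamer {
    // Same pattern as the Eclipse search: a lone "p" followed by a capitalized word.
    private static final Pattern P_PREFIX = Pattern.compile("\\bp([A-Z][a-z]+)\\b");

    static String rename(String source) {
        Matcher m = P_PREFIX.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group(1);
            // Lower-case only the first letter: "Name" -> "name".
            String camel = Character.toLowerCase(word.charAt(0)) + word.substring(1);
            m.appendReplacement(out, Matcher.quoteReplacement(camel));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: public void someMethod(String name, Integer age, Context context)
        System.out.println(rename("public void someMethod(String pName, Integer pAge, Context pContext)"));
    }
}
```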
To answer your question about Checkstyle: no. Checkstyle is a tool for analyzing code, not for changing it.
Using checkstyle to format code
(Question from Oct'12)
I also did some research; here's another Stack Overflow question aiming at practically the same thing. The solution offered there is similarly labor-intensive.
Can I automatically refactor an entire java project and rename uppercase method parameters to lowercase?
(Question from Oct'10)
Hopefully my title isn't completely terrible; I don't really know what this should be called. I'm trying to write a very basic Scheme parser in Java, and the issue I'm having is with the implementation.
I open a file, and I want to parse individual tokens:
while(sc.hasNext()) {
System.out.println(sc.next());
}
Generally, this is fine for getting tokens. But in Scheme, recognizing the beginning and end of a list is crucial, and my program's functionality depends on it, so I need a way to treat a token such as:
(define
or
poly))
as multiple tokens, where each parenthesis is its own token:
(
define
poly
)
)
If I can do that, I can properly recognize the different symbols to add to my symbol table, and know when and how to add nodes to my parse tree.
The Java API shows that the Scanner class doesn't have any methods that do exactly what I want. The closest thing I can think of is using the parentheses as custom delimiters, which would make each token clean enough for my logic to recognize more easily, but then what happens to my parentheses?
Another approach I'm considering is forgoing the Java Scanner and just scanning character by character until I find a complete symbol.
What should I do? Work around the Java Scanner methods, or just take a character-by-character approach?
First, you need to get your terminology straight. (define is not a single token; it's a ( token followed by a define one. Similarly, poly)) is not a single token, it's three.
Don't let java.util.Scanner (that's what you're using, right?) throw you for a loop -- when you say "Generally, to get tokens, this is fine", I say no, it's not. Don't settle for what it provides if it's not enough.
To correctly tokenize Scheme code, I'd expect you to need at least the ability to handle regular languages. That would probably be very tough to do with Scanner, so here are a few alternatives:
learn and apply a tried-and-true parsing tool like ANTLR or Lex; it will be beneficial for any of your future parsing projects
roll your own regular-expression approach for tokenizing (I don't know Scheme well enough to be sure that this will work), but don't forget that full parsing needs at least a context-free grammar
learn about parser combinators and recursive descent parsing, which are relatively easy to implement by hand -- and you'll end up learning a ton about Java's type system
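For the tokenizing step alone, the "roll your own regular expression" option can be quite small. A minimal sketch (hypothetical class name) that makes every parenthesis its own token and splits everything else on whitespace. It does not handle Scheme strings, comments, or quote characters, and turning the tokens into a tree still needs a context-free technique such as recursive descent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeTokenizer {
    // A token is either a single parenthesis or a run of characters that are
    // neither whitespace nor parentheses.
    private static final Pattern TOKEN = Pattern.compile("[()]|[^\\s()]+");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(define poly))")); // [(, define, poly, ), )]
    }
}
```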
I've recently started working on a project with someone who prefers the code to be commented in a very particular manner. Instead of adding the comments all by hand, I was wondering whether NetBeans has any way to add custom comments to new functions and logic statements.
The comment would be a // after each closing brace, followed by the name of what it closes.
For example:
public void funcA(int arg){
if(arg>2)
{
System.out.println("hi");
} //if
} //funcA
I looked around, but all I could find was how to create your own code template for a new class, which isn't what I need. Any help? Thanks.
If your code is consistently indented as in your question, you can manage with a regular expression like this:
Capture:
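(?sm)^([ \t]*)if.*?^\1\{.*?^\1\}
Replace:
$0 //if
Here (?s) makes the dot match newlines (DOTALL in Java, single-line mode in some other tools), and (?m) makes ^ match at the start of every line rather than only at the start of the file. The backreference \1, anchored with ^, ensures that the opening and closing braces sit at exactly the same indentation as the if statement, and are therefore its matching pair.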
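To try the pattern from plain Java rather than an editor's find/replace dialog, here is a minimal sketch with hypothetical class and method names. In java.util.regex, ^ matches only at the start of the input unless MULTILINE mode is on, so the sketch enables (?m) together with (?s), and it uses [ \t]* so the captured indentation cannot span line breaks. Like any indentation-based regex, it assumes consistently formatted code.

```java
import java.util.regex.Pattern;

public class IfCommenter {
    // (?s): '.' also matches newlines; (?m): '^' matches at every line start.
    // \1 forces the '{' and '}' lines to use exactly the if's indentation.
    private static final Pattern IF_BLOCK =
            Pattern.compile("(?sm)^([ \\t]*)if.*?^\\1\\{.*?^\\1\\}");

    static String annotate(String source) {
        // $0 is the whole match: the if statement up to its closing brace.
        return IF_BLOCK.matcher(source).replaceAll("$0 //if");
    }

    public static void main(String[] args) {
        String src = "public void funcA(int arg){\n"
                + "    if(arg>2)\n"
                + "    {\n"
                + "        System.out.println(\"hi\");\n"
                + "    }\n"
                + "}";
        System.out.println(annotate(src)); // the if's closing brace gains " //if"
    }
}
```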
I'm trying to write a regex that finds all variables (and only variables, ignoring methods completely) in a given piece of JavaScript code. The code that executes the regex is written in Java.
For now, I've got something like this:
Matcher matcher=Pattern.compile(".*?([a-z]+\\w*?).*?").matcher(string);
while(matcher.find()) {
System.out.println(matcher.group(1));
}
So, when the value of "string" is variable*func()*20, the printout is:
variable
func
That's not what I want. Simply negating ( won't do: it makes the regex capture extra characters or cut some off, and functions are still captured. For now, I have the following code:
Matcher matcher=Pattern.compile(".*?(([a-z]+\\w*)(\\(?)).*?").matcher(formula);
while(matcher.find()) {
if(matcher.group(3).isEmpty()) {
System.out.println(matcher.group(2));
}
}
It works and the printout is correct, but I don't like the additional check. Any ideas?
EDIT (2011-04-12):
Thank you for all the answers. Some of you asked why I would need something like this, and you are right: for bigger, more complicated scripts, the only sane solution would be to parse them. In my case, however, that would be excessive. The scraps of JS I'm working on are intended to be simple formulas, something like (a+b)/2: no comments, string literals, arrays, etc., only variables and (probably) some built-in functions. I need the list of variables to check whether they can be initialized at this point (and whether they are initialized at all). I realize that all of this could also be done manually with RPN (which would be safer), but these formulas are going to be wrapped in a bigger script and evaluated in a web browser, so this way is more convenient.
This may be a bit dirty, but it's assumed that whoever writes these formulas (probably me, most of the time) knows what they are doing and is able to check that they work correctly.
If anyone finds this question wanting to do something similar, they should know the risks and difficulties. I do, at least I hope so ;)
It's important to take into account all the sound advice that regex is not the best tool for this job. But you might get away with a quick-and-dirty regex if your rule is simple enough (and you are aware of the limitations of that rule):
Pattern regex = Pattern.compile(
"\\b # word boundary\n" +
"[A-Za-z]# 1 ASCII letter\n" +
"\\w* # 0+ alnums\n" +
"\\b # word boundary\n" +
"(?! # Lookahead assertion: Make sure there is no...\n" +
" \\s* # optional whitespace\n" +
" \\( # opening parenthesis\n" +
") # ...at this position in the string",
Pattern.COMMENTS);
This matches an identifier as long as it's not followed by a parenthesis. Of course, now you need group(0) instead of group(1). And of course this matches lots of other stuff (inside strings, comments, etc.)...
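A small driver for the pattern above, applied to formulas like the ones in the question. This sketch (hypothetical class and method names) writes the same pattern on one line instead of using Pattern.COMMENTS and collects group(0) for each match; it inherits all the limitations just mentioned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaVariables {
    // An identifier that is NOT followed by optional whitespace and '('.
    private static final Pattern IDENT_NOT_CALL =
            Pattern.compile("\\b[A-Za-z]\\w*\\b(?!\\s*\\()");

    static List<String> variables(String formula) {
        List<String> names = new ArrayList<>();
        Matcher m = IDENT_NOT_CALL.matcher(formula);
        while (m.find()) {
            names.add(m.group()); // group(0): the whole identifier
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(variables("variable*func()*20")); // [variable]
        System.out.println(variables("(a+b)/2"));             // [a, b]
    }
}
```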
If you are rethinking the use of regex and wondering what else you could do, you could consider using an AST to access your source programmatically. This answer shows how you could use the Eclipse Java AST to build a syntax tree for Java source. I guess you could do something similar for JavaScript.
A regex won't cut it in this case, because Java isn't regular. Your best bet is to get a parser that understands Java syntax and build on that. Luckily, ANTLR has a Java 1.6 grammar (and a 1.5 grammar).
For your rather limited use case, you could probably easily extend the variable assignment rules and get the info you need. There's a bit of a learning curve, but this will probably be your best bet for a quick and accurate solution.
It's pretty well established that regex cannot be reliably used to parse structured input. See here for the famous response: RegEx match open tags except XHTML self-contained tags
As any given sequence of characters may or may not change meaning depending on previous or subsequent sequences of characters, you cannot reliably identify a syntactic element without both lexing and parsing the input text. Regex can be used for the former (breaking an input stream into tokens), but cannot be used reliably for the latter (assigning meaning to tokens depending on their position in the stream).
I'm writing a simple scripting language on top of Java/the JVM, where you can also embed Java code using curly braces ({}). The problem is: how do I parse this in the grammar? I have two options:
1] Allow everything to be in it, such as: [a-z|a-Z|0-9|_|$], and go on
2] Get an extra Java grammar and use that grammar to parse that small code (is it actually possible and efficient?)
Option 2] is basically a double check, since the Java code is also checked when it is evaluated. My last question: is there a way to dynamically execute Java code that also works with objects which have been created at runtime?
Thanks,
William van Doorn
1] Allow everything to be in it, such as: [a-z|a-Z|0-9|_|$], and go on
You can't just do that: you'll have to account for nested opening and closing braces.
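To illustrate why option 1] needs brace counting, here is a minimal sketch (hypothetical class and method names) that finds the end of an embedded block by tracking nesting depth. It deliberately ignores braces inside Java string or character literals and comments, which a real implementation (or the Java grammar of option 2]) would have to handle.

```java
public class EmbeddedJavaExtractor {
    // Returns the text between the '{' at openIndex and its matching '}'.
    // Naive: braces inside literals or comments are still counted.
    static String extractBlock(String script, int openIndex) {
        if (openIndex < 0 || script.charAt(openIndex) != '{') {
            throw new IllegalArgumentException("No '{' at index " + openIndex);
        }
        int depth = 0;
        for (int i = openIndex; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return script.substring(openIndex + 1, i);
                }
            }
        }
        throw new IllegalArgumentException("Unbalanced braces");
    }

    public static void main(String[] args) {
        String script = "print 1 { for (int i = 0; i < 3; i++) { System.out.println(i); } } print 2";
        // The inner loop body's braces do not end the block early.
        System.out.println(extractBlock(script, script.indexOf('{')));
    }
}
```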
2] Get an extra java grammar and use that grammar to parse that small code (is it actually possible and efficient?)
Yes that's possible. But I suggest you first get something working, and then worry about efficiency (is that really an issue here?).
... is there a way to dynamically execute Java code that also works with objects which have been created at runtime?
Yes, since Java 6, there's a way to compile source files dynamically. See the JavaCompiler API.
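A minimal sketch of the JavaCompiler route, assuming the program runs on a JDK (ToolProvider.getSystemJavaCompiler() returns null on a plain JRE) and using hypothetical names (DynamicCompileDemo, Snippet, run). It writes generated source to a temporary directory, compiles it, loads the class with a URLClassLoader, and passes it an object that only exists at runtime. Because no -d option is given, javac writes the .class file next to the source.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class DynamicCompileDemo {
    // Compiles className from source, then calls its static run(Object) with arg.
    static Object compileAndRun(String className, String source, Object arg) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (running on a JRE?)");
        }
        try {
            Path dir = Files.createTempDirectory("snippets");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            // run(in, out, err, args...) returns 0 on success; null streams mean System.in/out/err.
            if (compiler.run(null, null, null, file.toString()) != 0) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> cls = loader.loadClass(className);
                Method run = cls.getMethod("run", Object.class);
                return run.invoke(null, arg); // static method, so no receiver
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String source = "public class Snippet {\n"
                + "    public static Object run(Object o) {\n"
                + "        java.util.List<?> list = (java.util.List<?>) o;\n"
                + "        return \"size=\" + list.size();\n"
                + "    }\n"
                + "}\n";
        List<String> names = new ArrayList<>(); // created at runtime, passed into compiled code
        names.add("a");
        names.add("b");
        System.out.println(compileAndRun("Snippet", source, names)); // size=2
    }
}
```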
I propose enclosing your Java code in a character such as '`', which is not used in Java syntax and rarely appears in literals.
JavaCode : '`' ( EscapeSequence | ~('\\'|'`') )* '`'
    ;
Use the java.g grammar provided with the ANTLR examples to get the definition of EscapeSequence, etc.
The only catch is that programmers have to write this character ('`') using its escape code (for example the Unicode escape \u0060) whenever they need it as a literal.
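If you hand-write the lexer instead of using ANTLR, the same delimiter idea looks roughly like this. It is a sketch under simplifying assumptions (hypothetical class name): the script's own syntax is ignored, and any backslash pair is copied through unchanged rather than validated against java.g's EscapeSequence rule, so an escaped character can never close the block.

```java
import java.util.ArrayList;
import java.util.List;

public class BacktickJavaLexer {
    // Collects the bodies of all backtick-delimited Java blocks in a script.
    static List<String> javaBlocks(String script) {
        List<String> blocks = new ArrayList<>();
        int i = 0;
        while (i < script.length()) {
            if (script.charAt(i) != '`') {
                i++; // outside a Java block: skip script text
                continue;
            }
            StringBuilder body = new StringBuilder();
            int j = i + 1;
            while (j < script.length() && script.charAt(j) != '`') {
                char c = script.charAt(j);
                if (c == '\\' && j + 1 < script.length()) {
                    // Copy the backslash and the character after it as one unit.
                    body.append(c).append(script.charAt(j + 1));
                    j += 2;
                } else {
                    body.append(c);
                    j++;
                }
            }
            if (j >= script.length()) {
                throw new IllegalArgumentException("Unterminated Java block at index " + i);
            }
            blocks.add(body.toString());
            i = j + 1; // continue after the closing backtick
        }
        return blocks;
    }

    public static void main(String[] args) {
        String script = "let x = `int a = 1;` + `String s = \"\\u0060\";`";
        System.out.println(javaBlocks(script));
    }
}
```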