I have a lot of Java files in which I have to search for a method; if it is present, I have to add a line inside this method, but only if that line does not already exist. The line has to be added before the closing brace of the method.
So far I have the following code:
import os
import ntpath

extensions = set(['.java', '.kt'])

for subdir, dirs, files in os.walk("/src/main"):
    for file in files:
        filepath = subdir + os.sep + file
        extension = os.path.splitext(filepath)[1]
        if extension in extensions:
            with open(filepath) as f:
                source = f.read()
            if 'onCreate(' in source:
                print(ntpath.basename(filepath))
                if 'onPause' in source:
                    print("is Activity and contains onPause\n")
                    # Check if Config.pauseCollectingLifecycleData(); is in this code block; if it exists do nothing, if it does not exist add it to the end of the code block before }
                if 'onResume' in source:
                    print("is Activity and contains onResume\n")
                    # Check if Config.resumeCollectingLifecycleData(); is in this code block; if it exists do nothing, if it does not exist add it to the end of the code block before }
But I am not sure where to go from here, Python not being my first language. Could you guide me in the right direction?
Example:
I am looking for a method with the following signature:
public void onPause(){
    super.onPause();
    // Add my line here
}

If the line already exists, nothing should be added:

public void onPause(){
    super.onPause();
    Config.pauseCollectingLifecycleData(); // Line exists, do nothing
}
This is actually quite difficult. First of all, your if "onPause" in sourcecode approach currently doesn't distinguish between defining onPause() and calling it. And second of all, finding the correct closing } isn't trivial. Naively, you might just count opening and closing curlies ({ increments the blocklevel, } decrements it), and assume that the } that makes the blocklevel zero is the closing curly of the method. However, this might be wrong! Because the method might contain some string literal containing (possibly unbalanced) curlies. Or comments with curlies. This would mess up the blocklevel count.
To do this properly, you would have to build an actual Java parser. That's a lot of work, even when using libraries such as tatsu.
If you're fine with a rather volatile kludge, you can try and use the blocklevel count mentioned above together with the indentation as a clue (assuming your source code is decently indented). Here's something I've hacked up as a starting point:
def augment_function(sourcecode, function, line_to_insert):
    in_function = False
    blocklevel = 0
    insert_before = None
    source = sourcecode.split("\n")
    for line_no, line in enumerate(source):
        if in_function:
            if "{" in line:
                blocklevel += 1
            if "}" in line:
                blocklevel -= 1
                if blocklevel == 0:
                    insert_before = line_no
                    indent = len(line) - len(line.lstrip(" ")) + 4  # 4 = your indent level
                    break
        elif function in line and "public " in line:
            in_function = True
            if "{" in line:
                blocklevel += 1
    if insert_before:
        source.insert(insert_before, " " * indent + line_to_insert)
    return "\n".join(source)
# test code:
java_code = """class Foo {
    private int foo;
    public void main(String[] args) {
        foo = 1;
    }
    public void setFoo(int f)
    {
        foo = f;
    }
    public int getFoo(int f) {
        return foo;
    }
}
"""
print(augment_function(java_code, "setFoo", "log.debug(\"setFoo\")"))
Note that this is vulnerable to all sorts of edge cases (such as { in a string or in a comment, or tab indent instead of space, or possibly a thousand other things). This is just a starting point for you.
Related
I have a string that looks like the one below; the lines are joined by line breaks. In this string, the first 2 lines and the last 2 lines are fixed: "public class MyClass {\n public void code() {\n"
String doc =
"public class MyClass {
public void code() {
try (...) {
...
}
}
}"
I only want to take out the lines in the middle of the code() method, which means dropping the first 2 lines and the last 2 lines. This is what I did in my project:
String[] lines = doc.split("\\r?\\n");
String[] codes = Arrays.copyOfRange(lines, 2, lines.length - 2);
String result = String.join("\n", codes);
Do you have a better way to fetch the string in the middle?
The only real answer: use an existing parser framework, such as javaparser.
Seriously, that simple.
Anything else means: you are spending time and energy to solve a solved problem. The result will be deficient, compared to any mature product, and it will be a constant liability in the future. You can get your tool to work with code you have in front of you right now, but the second your tool gets used to "parse" slightly different code, it will most likely break.
In case you are asking for educational purposes: learn how a compiler works, what it takes to tokenize Java source code, and how to turn it into an abstract syntax tree (AST) representation.
Assuming the task is meant for basic educational purposes or a quick hack (otherwise @GhostCat's answer draws first):
Method detection alone, taken seriously, is not so easy. Basically you have to start implementing your own syntax parser for a fraction of the Java language: chop everything into single words, skip the class declaration, wait for "static", "public", "protected", "private", "synchronized" (hope I didn't forget one), skip over them and the return type definition ("void", "String"...), then you are at the name, then come optional type parameters ("<T>"), then "(", then optionally method parameters, etc.
Perhaps there are restrictions to the task that make it less complicated. You should ask for clarification.
The problem in any case will be to find the closing brace and skip it. If you can afford to neglect things such as braces in strings (String s = "ab{{c";) or in comments ("/* {{{ */"), it is enough to count up for each { occurring after e.g. "public void code() {" and count down for each "}". When the brace count is 0 and you see another "}", that one is the closing brace of the method, and it can be skipped along with everything until the next method declaration.
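For illustration, here is a rough sketch of that counting approach in Java. The class and method names are my own, and it deliberately ignores the string/comment problem mentioned above:

public class MethodBodyExtractor {

    // Naive brace counting: returns the text between the opening and closing
    // brace of the method whose signature is passed in, or null if not found.
    // Braces inside string literals or comments will throw the count off.
    static String extractBody(String source, String methodSignature) {
        int start = source.indexOf(methodSignature);
        if (start < 0) return null;                     // method not present
        int i = source.indexOf('{', start);             // opening brace of the method
        int bodyStart = i + 1;
        int depth = 0;
        for (; i >= 0 && i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') depth++;
            if (c == '}' && --depth == 0) {
                return source.substring(bodyStart, i);  // everything between the braces
            }
        }
        return null;                                    // unbalanced braces
    }

    public static void main(String[] args) {
        String doc = "public class MyClass {\n"
                   + "    public void code() {\n"
                   + "        try { /* ... */ } catch (Exception e) { }\n"
                   + "    }\n"
                   + "}";
        System.out.println(extractBody(doc, "public void code()"));
    }
}

Run on this example, it prints the try/catch line plus the surrounding whitespace, i.e. everything strictly between the braces of code().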
If that's not precise enough, or your requirements are of a more serious nature, you'd have to get into parsing, e.g. using antlr or Javaparser. Here's a project that seems to do a similar task.
Learning JavaParser takes some amount of time. It isn't difficult, and there is Javadoc documentation available on the Internet (see here)... but unfortunately, there isn't a lot of text to read in the documentation pages themselves. This class prints out the method bodies from a source-code file that is saved as a String.
Every method in the class is printed...
import com.github.javaparser.ast.*;
import com.github.javaparser.ast.stmt.BlockStmt;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.visitor.VoidVisitor;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;
import com.github.javaparser.*;
import java.io.IOException;
import java.util.Optional;
public class MethodBody
{
    static final String src =
        "public class MyClass {"          + '\n' +
        "    public void code() {"        + '\n' +
        "        try {"                   + '\n' +
        "            /* do stuff */ "     + '\n' +
        "        }"                       + '\n' +
        "        catch (Exception e) { }" + '\n' +
        "    }"                           + '\n' +
        "}";

    public static void main(String[] argv) throws IOException
    {
        CompilationUnit cu = StaticJavaParser.parse(src);

        VoidVisitor<?> visitor = new VoidVisitorAdapter<Void>()
        {
            public void visit(MethodDeclaration md, Void arg)
            {
                System.out.println("Method Name: " + md.getName());
                Optional<BlockStmt> optBody = md.getBody();
                if (! optBody.isPresent())
                {
                    System.out.println("No Method Body Definition\n");
                    return;
                }
                System.out.println("Method Body:\n" + optBody.get().toString() + "\n\n");
            }
        };

        visitor.visit(cu, null);
    }
}
The above code will print this to terminal:
Method Name: code
Method Body:
{
    try {
        /* do stuff */
    } catch (Exception e) {
    }
}
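Building on the same idea, JavaParser can also perform the insertion asked about in the very first question of this thread: find the method, check whether the call is already in the body, and otherwise append it as the last statement, i.e. right before the closing brace. Below is a minimal sketch, assuming a recent JavaParser 3.x; the class name and the hard-coded source string are just placeholders for illustration.

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;

public class AddMissingCall
{
    static final String src =
        "public class MyActivity {"   + '\n' +
        "    public void onPause() {" + '\n' +
        "        super.onPause();"    + '\n' +
        "    }"                       + '\n' +
        "}";

    public static void main(String[] argv)
    {
        CompilationUnit cu = StaticJavaParser.parse(src);

        cu.findAll(MethodDeclaration.class).stream()
          .filter(md -> md.getNameAsString().equals("onPause"))
          .forEach(md -> md.getBody().ifPresent(body -> {
              boolean alreadyThere = body.getStatements().stream()
                  .anyMatch(s -> s.toString().contains("Config.pauseCollectingLifecycleData()"));
              if (! alreadyThere) {
                  // addStatement appends at the end of the block,
                  // i.e. right before the closing brace.
                  body.addStatement(StaticJavaParser.parseStatement(
                      "Config.pauseCollectingLifecycleData();"));
              }
          }));

        System.out.println(cu); // prints the modified source
    }
}

Note that printing the CompilationUnit re-formats the code with JavaParser's pretty printer; if the original formatting has to survive untouched, JavaParser's LexicalPreservingPrinter is the place to look.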
Let's say we have a java file that looks like this :
class Something {
    public static void main(String[] args){
        System.out.println("Hello World!");
    }
}
I would like to write some Kotlin code that would go through this Java file and detect how many lines there are in the method body (here, only the main method). Empty lines are counted!
My approach is to simply use the File.forEachLine method to read the Java file line by line. I can write code to detect the method signature. Now I want to be able to determine where the method ends. I don't know how to do that.
If I simply look for "}", my code could mistakenly assume that we are at the end of the method body while in reality we are at the end of an if statement's body within the method.
How can I avoid this pitfall?
One way to approach this is keeping track of the number of open brackets ('{') and close brackets ('}') seen. At the start of the method, the count will increment to 1. Assuming the method is validly structured, at the end of the method the number of unclosed brackets should be 0. Pseudocode like this should work:
int numLines = 1 (assuming method start line counts)
int numBrackets = 1 (after finding method open bracket for method)
while(numBrackets > 0)
    if char = '{' -> numBrackets++
    if char = '}' -> numBrackets--
    if char = newline -> numLines++
if numBrackets not 0 -> FAIL
return numLines
Edit
As noted by Gidds below, this pseudocode is insufficient. A more complete answer will need to account for the fact that not all brackets affect the method structure. One way to approach this is by keeping track of the context of the current character being parsed, and only incrementing/decrementing the bracket count when in a valid context (i.e. not inside a string literal, a comment, etc.). Though as noted by Gidds, this will increase complexity. Updated pseudocode:
int numLines = 1
int numValidBrackets = 1
Context context = Context(MethodStructure)
while(numValidBrackets > 0)
    context.acceptNextChar(char)
    if char = newline -> numLines++
    if(context.state() != MethodStructure) continue;
    if char = '{' -> numValidBrackets++
    if char = '}' -> numValidBrackets--
if numValidBrackets not 0 -> FAIL
return numLines
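For concreteness, here is a rough sketch of that context tracking in Java (the same logic ports directly to Kotlin). The class name, method name and the example in main are my own, escape handling is minimal, and the input is assumed to be syntactically valid. The count follows the convention of the pseudocode above: the line with the opening brace and the line with the closing brace both count.

public class MethodLineCounter {

    // Counts the lines of a method body, starting at the line that contains the
    // method's opening brace and ending at the line with the matching closing
    // brace. Braces inside string/char literals and comments are ignored.
    static int countBodyLines(String source, int openBraceIndex) {
        int depth = 1;                 // we start just after the opening '{'
        int lines = 1;                 // the method start line counts
        boolean inString = false, inChar = false;
        boolean inLineComment = false, inBlockComment = false;
        for (int i = openBraceIndex + 1; i < source.length() && depth > 0; i++) {
            char c = source.charAt(i);
            char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
            if (c == '\n') {
                lines++;
                inLineComment = false;         // a line comment ends at the newline
                continue;
            }
            if (inLineComment) continue;
            if (inBlockComment) {
                if (c == '*' && next == '/') { inBlockComment = false; i++; }
            } else if (inString) {
                if (c == '\\') i++;            // skip the escaped character
                else if (c == '"') inString = false;
            } else if (inChar) {
                if (c == '\\') i++;
                else if (c == '\'') inChar = false;
            } else {
                switch (c) {
                    case '/':  if (next == '/') { inLineComment = true; i++; }
                               else if (next == '*') { inBlockComment = true; i++; }
                               break;
                    case '"':  inString = true;  break;
                    case '\'': inChar = true;    break;
                    case '{':  depth++;          break;
                    case '}':  depth--;          break;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String src = "class Something {\n"
                   + "    public static void main(String[] args){\n"
                   + "        System.out.println(\"Hello World!\");\n"
                   + "    }\n"
                   + "}";
        // Opening brace of the main method, found however you like.
        int open = src.indexOf('{', src.indexOf("main"));
        System.out.println(countBodyLines(src, open));   // prints 3
    }
}

For the Something example from the question it prints 3: the signature line, the println line and the line with the closing brace.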
I have a method called render_something which can create a lot of whitespace, for example:
#render_something('xxx')
The result can be:
<a href="#">
    something that is generated from render_something
</a>
But actually I want it to be like this:
something that is generated from render_something
Does Velocity have something like this?
#trim(#render_something('xxx'))
I just read this article on Velocity Whitespace Gobbling which suggests a few work-arounds including Velocity Whitespace Truncated By Line Comment.
This basically suggests commenting out line breaks by putting comments at the end of each line. It also suggests not indenting the code in your macros to prevent superfluous (one of my favourite words) spaces occurring.
TBH it's not a great solution but may suit your needs. Simply put ## at the end of each line in your macro and that will make things a little bit nicer... sort of
It seems the native Java trim() just works.
$someValue.trim() works for me
Solution
In the class where you create the VelocityEngine, add a method as follows
public String trim(String str) {
    return str.trim()/*.replace("\n", "").replace("\r", "")*/;
}
then add the following to the VelocityContext that you create:
context.put("trimmer", this);
and finally in the velocity template do the following
$trimmer.trim("#render_something('xxx')")
Why does it work?
Although the behavior of Velocity is clearly defined, it can be a bit tricky to see how it works sometimes. The separate trim() method is necessary to get the character sequence from the template into a Java method where you can call the actual trim() on the String. As far as I know there is no trim inside Velocity, but you can always call back to Java with tricks like this one.
The double quotes are necessary because #render_something is just a macro, not a function call; this means the results of the statements in the macro are put verbatim into the place where the macro is "executed".
I struggled a while to find a straightforward solution to whitespace gobbling, so here is the one I finally came up with. It is inspired by Vadzim's answer and this page http://wiki.apache.org/velocity/StructuredGlobbingResourceLoader
The StructuredGlobbingResourceLoader we can find on that page has a complex behaviour and doesn't get rid of every kind of whitespace, so I modified it to get this simple behaviour: "Delete any whitespace at the beginning of the lines, and add a comment at the end of each line" (which prevents the line break from being evaluated). The filter is applied on the input stream at loading time.
This kind of velocity template
#if($value)
the value is $value
#end
is transformed to
#if($value)##
the value is $value##
#end##
Then if you want to have line breaks or whitespace at the beginning of a line, you'll have to put($br,"\n") and put($sp," ") in your context like Vadzim explained, and use them explicitly in your template. This way of doing things will allow you to keep indented templates, with maximum control.
take the class from this page http://wiki.apache.org/velocity/StructuredGlobbingResourceLoader
change the extended class to the kind of loader you need (this one uses the webapp loader)
replace the read() method with the code I provide
use the class as your resource loader in your properties. Example for the webapp loader: webapp.resource.loader.class=...StructuredGlobbingResourceLoader
public int read() throws IOException {
    int ch;
    switch (state) {
        case bol: // beginning of line, read until non-indentation character
            while (true) {
                ch = in.read();
                if (ch != (int) ' ' && ch != (int) '\t') {
                    state = State.content;
                    return processChar(ch);
                }
            }
        case content:
            ch = in.read();
            return processChar(ch);
        // eol states replace all "\n" by "##\n"
        case eol1:
            state = State.eol2;
            return (int) '#';
        case eol2:
            state = State.bol;
            return (int) '\n';
        case eof:
            return -1;
    }
    return -1;
}

// Return the normal character if not end of file or \n
private int processChar(int ch) {
    switch (ch) {
        case -1:
            state = State.eof;
            return -1;
        case (int) '\n':
            state = State.eol1;
            return (int) '#';
        default:
            return ch;
    }
}
Any feedback on my implementation is welcome
Inspired by Velocity Whitespace Truncated By Line Comment one could use block comments instead of line comments for a better looking result:
#foreach( $record in $records )#**
*##if( $record.id == 0 )#**
*##end
#end
With a decent syntax highlighting the comments aren't very obtrusive.
Here is my alternative solution to velocity whitespace gobbling that allows tabbing template structure.
Each template text is preprocessed on first load in custom ResourceLoader:
private String enhanceTemplate(String body) {
    if (!body.startsWith("##preserveWhitespace")) {
        body = body.replaceAll("(##.*)?[ \\t\\r]*\\n+[ \\t\\r]*", Matcher.quoteReplacement("##\n"));
        body = body.trim();
    }
    return body;
}
This replaces all newlines and adjacent spaces with just one commented-out newline.
Line breaks and trailing spaces can be inserted explicitly with the $br and $sp variables from the default context:
private static final VelocityContext DEFAULT_CONTEXT = new VelocityContext(new HashMap<String, String>() {{
    put("sp", " ");
    put("br", "\n");
}});
In some cases, I've had to essentially minimize my script like I would js or css. It works well, though it is not as easy for humans to read. Just one other option to eliminate the excess space:
<ul class="tabs">#foreach($par in $bodypars)#set( $parLen = ${_MathTool.toInteger($bodypars.size())} )#set( $parLn = $parLen - 1 )#set( $thClass = 'tb'+${parLn} )#set( $thaClass = '' )#if( $foreach.index == 1 )#set( $thClass = ${thClass}+' selected' )#set( $thaClass = ' selected' )#end#if($foreach.index != 0 && $parLen <= $maxTabs)#set ( $btitle = $_XPathTool.selectSingleNode($par,'item-subtitle') )<li class="${thClass}">#if($!btitle && $btitle != '')$_SerializerTool.serialize($btitle, true)#end</li>#end#end</ul>
You can use the standard Java trim(), paying attention to whether your variable is an object instead of a String.
$string.trim() // works fine
$object.trim() // exception
Have a good day!
I'm trying to write a simple interactive (using System.in as source) language using antlr, and I have a few problems with it. The examples I've found on the web are all using a per line cycle, e.g.:
while(readline)
    result = parse(line)
    doStuff(result)
But what if I'm writing something like Pascal/SMTP/etc., with a "the first line must look like X" requirement? I know it can be checked in doStuff, but I think logically it is part of the syntax.
Or what if a command is split into multiple lines? I can try
while(readline)
    lines.add(line)
    try
        result = parse(lines)
        lines = []
        doStuff(result)
    catch
        nop
But with this I'm also hiding real errors.
Or I could reparse all lines every time, but:
it will be slow
there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Dutow wrote:
Or I could reparse all lines every time, but:
it will be slow
there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code, it sure is possible. You also don't need to re-parse the entire token stream for it.
Let's say you want to parse a very simple language line by line, where each line is either a program declaration, a uses declaration, or a statement.
It should always start with a program declaration, followed by zero or more uses declarations followed by zero or more statements. uses declarations cannot come after statements and there can't be more than one program declaration.
For simplicity, a statement is just a simple assignment: a = 4 or b = a.
An ANTLR grammar for such a language could look like this:
grammar REPL;
parse
  : programDeclaration EOF
  | usesDeclaration EOF
  | statement EOF
  ;

programDeclaration
  : PROGRAM ID
  ;

usesDeclaration
  : USES idList
  ;

statement
  : ID '=' (INT | ID)
  ;

idList
  : ID (',' ID)*
  ;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
But, we'll need to add a couple of checks of course. Also, by default, a parser takes a token stream in its constructor, but since we're planning to trickle tokens into the parser line by line, we'll need to create a new constructor in our parser. You can add custom members to your lexer or parser classes by putting them in a @parser::members { ... } or @lexer::members { ... } section respectively. We'll also add a couple of boolean flags to keep track of whether the program declaration has happened already and whether uses declarations are allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer which gets fed to the parser.
All of that would look like:
@parser::members {
  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse(); // the entry point of our parser
  }
}
Now inside our grammar, we're going to use a couple of gated semantic predicates to check that we're parsing declarations in the correct order. And after parsing a certain declaration or statement, we'll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these boolean flags is done through each rule's @after { ... } section, which gets executed (not surprisingly) after the tokens from that parser rule are matched.
Your final grammar file now looks like this (including some System.out.println's for debugging purposes):
grammar REPL;
@parser::members {
  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse();
  }
}
parse
  : programDeclaration EOF
  | {programDeclDone}? (usesDeclaration | statement) EOF
  ;

programDeclaration
@after{
  programDeclDone = true;
}
  : {!programDeclDone}? PROGRAM ID {System.out.println("\t\t\t program <- " + $ID.text);}
  ;

usesDeclaration
  : {usesDeclAllowed}? USES idList {System.out.println("\t\t\t uses <- " + $idList.text);}
  ;

statement
@after{
  usesDeclAllowed = false;
}
  : left=ID '=' right=(INT | ID) {System.out.println("\t\t\t " + $left.text + " <- " + $right.text);}
  ;

idList
  : ID (',' ID)*
  ;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
which can be tested with the following class:
import org.antlr.runtime.*;
import java.util.Scanner;
public class Main {
    public static void main(String[] args) throws Exception {
        Scanner keyboard = new Scanner(System.in);
        REPLParser parser = new REPLParser();
        while (true) {
            System.out.print("\n> ");
            String input = keyboard.nextLine();
            if (input.equals("quit")) {
                break;
            }
            parser.process(input);
        }
        System.out.println("\nBye!");
    }
}
To run this test class, do the following:
# generate a lexer and parser:
java -cp antlr-3.2.jar org.antlr.Tool REPL.g
# compile all .java source files:
javac -cp antlr-3.2.jar *.java
# run the main class on Windows:
java -cp .;antlr-3.2.jar Main
# or on Linux/Mac:
java -cp .:antlr-3.2.jar Main
As you can see, you can only declare a program once:
> program A
program <- A
> program B
line 1:0 rule programDeclaration failed predicate: {!programDeclDone}?
uses cannot come after statements:
> program X
program <- X
> uses a,b,c
uses <- a,b,c
> a = 666
a <- 666
> uses d,e
line 1:0 rule usesDeclaration failed predicate: {usesDeclAllowed}?
and you must start with a program declaration:
> uses foo
line 1:0 rule parse failed predicate: {programDeclDone}?
Here's an example of how to parse input from System.in without first manually parsing it one line at a time and without making major compromises in the grammar. I'm using ANTLR 3.4. ANTLR 4 may have addressed this problem already. I'm still using ANTLR 3, though, and maybe someone else with this problem still is too.
Before getting into the solution, here are the hurdles I ran into that keep this seemingly trivial problem from being easy to solve:
The built-in ANTLR classes that derive from CharStream consume the entire stream of data up-front. Obviously an interactive mode (or any other indeterminate-length stream source) can't provide all the data.
The built-in BufferedTokenStream and derived class(es) will not end on a skipped or off-channel token. In an interactive setting, this means that the current statement can't end (and therefore can't execute) until the first token of the next statement or EOF has been consumed when using one of these classes.
The end of the statement itself may be indeterminate until the next statement begins.
Consider a simple example:
statement: 'verb' 'noun' ('and' 'noun')*
;
WS: //etc...
Interactively parsing a single statement (and only a single statement) isn't possible. Either the next statement has to be started (that is, hitting "verb" in the input), or the grammar has to be modified to mark the end of the statement, e.g. with a ';'.
I haven't found a way to manage a multi-channel lexer with my solution. It doesn't hurt me since I can replace my $channel = HIDDEN with skip(), but it's still a limitation worth mentioning.
A grammar may need a new rule to simplify interactive parsing.
For example, my grammar's normal entry point is this rule:
script
  : statement* EOF -> ^(STMTS statement*)
  ;
My interactive session can't start at the script rule because it won't end until EOF. But it can't start at statement either because STMTS might be used by my tree parser.
So I introduced the following rule specifically for an interactive session:
interactive
  : statement -> ^(STMTS statement)
  ;
In my case, there are no "first line" rules, so I can't say how easy or hard it would be to do something similar for them. It may be a matter of making a rule like so and execute it at the beginning of the interactive session:
interactive_start
  : first_line
  ;
The code behind a grammar (e.g., code that tracks symbols) may have been written under the assumption that the lifespan of the input and the lifespan of the parser object would effectively be the same. For my solution, that assumption doesn't hold. The parser gets replaced after each statement, so the new parser must be able to pick up the symbol tracking (or whatever) where the last one left off. This is a typical separation-of-concerns problem so I don't think there's much else to say about it.
The first problem mentioned, the limitations of the built-in CharStream classes, was my only major hang-up. ANTLRStringStream has all the workings that I need, so I derived my own CharStream class off of it. The base class's data member is assumed to have all the past characters read, so I needed to override all the methods that access it. Then I changed the direct read to a call to (new method) dataAt to manage reading from the stream. That's basically all there is to this. Please note that the code here may have unnoticed problems and does no real error handling.
public class MyInputStream extends ANTLRStringStream {

    private InputStream in;

    public MyInputStream(InputStream in) {
        super(new char[0], 0);
        this.in = in;
    }

    @Override
    // copied almost verbatim from ANTLRStringStream
    public void consume() {
        if (p < n) {
            charPositionInLine++;
            if (dataAt(p) == '\n') {
                line++;
                charPositionInLine = 0;
            }
            p++;
        }
    }

    @Override
    // copied almost verbatim from ANTLRStringStream
    public int LA(int i) {
        if (i == 0) {
            return 0; // undefined
        }
        if (i < 0) {
            i++; // e.g., translate LA(-1) to use offset i=0; then data[p+0-1]
            if ((p + i - 1) < 0) {
                return CharStream.EOF; // invalid; no char before first char
            }
        }
        // Read ahead
        return dataAt(p + i - 1);
    }

    @Override
    public String substring(int start, int stop) {
        if (stop >= n) {
            // Read ahead.
            dataAt(stop);
        }
        return new String(data, start, stop - start + 1);
    }

    private int dataAt(int i) {
        ensureRead(i);
        if (i < n) {
            return data[i];
        } else {
            // Nothing to read at that point.
            return CharStream.EOF;
        }
    }

    private void ensureRead(int i) {
        if (i < n) {
            // The data has been read.
            return;
        }
        int distance = i - n + 1;
        ensureCapacity(n + distance);
        // Crude way to copy from the byte stream into the char array.
        for (int pos = 0; pos < distance; ++pos) {
            int read;
            try {
                read = in.read();
            } catch (IOException e) {
                // TODO handle this better.
                throw new RuntimeException(e);
            }
            if (read < 0) {
                break;
            } else {
                data[n++] = (char) read;
            }
        }
    }

    private void ensureCapacity(int capacity) {
        if (capacity > n) {
            char[] newData = new char[capacity];
            System.arraycopy(data, 0, newData, 0, n);
            data = newData;
        }
    }
}
Launching an interactive session is similar to the boilerplate parsing code, except that UnbufferedTokenStream is used and the parsing takes place in a loop:
MyLexer lex = new MyLexer(new MyInputStream(System.in));
TokenStream tokens = new UnbufferedTokenStream(lex);

// Handle "first line" parser rule(s) here.

while (true) {
    MyParser parser = new MyParser(tokens);
    // Set up the parser here.
    MyParser.interactive_return r = parser.interactive();
    // Do something with the return value.
    // Break on some meaningful condition.
}
Still with me? Okay, well that's it. :)
If you are using System.in as source, which is an input stream, why not just have ANTLR tokenize the input stream as it is read and then parse the tokens?
You have to put it in doStuff....
For instance, if you're declaring a function, the parse would return a function, right? Without a body; that's fine, because the body will come later. You'd do what most REPLs do.
I know this is silly but I can't overcome my curiosity. Is it possible to write a shell script to format a piece of Java code?
For example, if a user writes in a code:
public class Super{
    public static void main(String[] args){
        System.out.println("Hello world");
        int a=0;
        if(a==100)
        {
            System.out.println("Hello world");
        }
        else
        {
            System.out.println("Hello world with else");
        }
    }
}
I would like to write a shell script which would make the code like this.
public class Super
{
    public static void main(String[] args)
    {
        System.out.println("Hello world");
        int a=0;
        if(a==100){
            System.out.println("Hello world");
        }
        else{
            System.out.println("Hello world with else");
        }
    }
}
To be precise, we should change the formatting of the curly braces: for try/catch or control structures the opening brace should go on the same line, and for a function/method/class it should come on the next line. I have little knowledge of sed and awk, which could do this task so easily. Also, I know this can be done using Eclipse.
Well, I've had some free time on my hands, so I decided to relive my good old linux days :]
After reading a bit about awk and sed, I've decided that it might be better to use both, as it is easier to add indentation in awk and parse strings in sed.
Here is the ~/sed_script that formats the source file:
# delete indentation
s/^ \+//g
# format lines with class
s/^\(.\+class.\+\) *\({.*\)$/\1\n\2/g
# format lines with methods
s/^\(public\|private\)\( \+static\)\?\( \+void\)\? \+\(.\+(.*)\) *\({.*\)$/\1\2\3 \4\n\5/g
# format lines with other structures
/^\(if\|else\|for\|while\|case\|do\|try\)\([^{]*\)$/,+1 { # get lines not containing '{'
# along with the next line
  /.*{.*/ d            # delete the next line with '{'
  s/\([^{]*\)/\1 {/g   # and add '{' to the first line
}
And here is the ~/awk_script that adds indentation:
BEGIN { depth = 0 }
/}/ { depth = depth - 1 }
{
    getPrefix(depth)
    print prefix $0
}
/{/ { depth = depth + 1 }
function getPrefix(depth) {
    prefix = ""
    for (i = 0; i < depth; i++) { prefix = prefix " " }
    return prefix
}
And you use them like that:
> sed -f ~/sed_script ~/file_to_format > ~/.tmp_sed
> awk -f ~/awk_script ~/.tmp_sed
It is far from a proper formatting tool, but I hope it will do OK as a sample script for reference :] Good luck with your learning.
A quick, flawed attempt, but one that works on your sample input:
BEGIN {depth = 0;}
/{$/ {depth = depth + 1}
/^}/ {depth = depth - 1}
{prefix = ""; for (i = 0; i < depth; i++) { prefix = prefix " "} print prefix $0 ; }
This is an awk script: place it in a file and do
awk -f awk_script_file source_file
Obvious flaws with this include:
It doesn't catch braceless places where you'd like indentation like
if (foo)
    bar();
It will modify the indent depth based on braces in comments and string literals
It won't detect { braces followed by comments
I think this could be done through a simple 'sed' script.
Use one variable (bCount) that stores the number of '{' (opening brackets) minus the number of '}' (closing brackets).
Then I would go through the file and insert tabs according to the current count of brackets in use.
So
public class Super{                            //bCount=0
    public static void main(String[] args){   //bCount=1
        System.out.println("Hello world");    //bCount=2
        int a=0;                               //bCount=2
        ....and so on
so insert 0 tabs on line 0,
1 tab on line 1,
2 tabs on lines 3 and 4, and so on...
It is definitely possible... I just don't understand why you would want to spend your time doing it? :] There are enough tools to do that and any decent IDE provides a way to re-format the source code (including Eclipse).
For example, to format in Eclipse 3.4 (should be similar in other versions) just right click on your project or a file and select "Source > Format" from the menu.
And if you need to change the way it formats the code, just go to the "Preferences > Java > Code Style > Formatter" and change the template. As far as I know, it is very similar in JDeveloper and NetBeans.
Have a look at the CLI for Jalopy. Jalopy is a pretty powerful source formatter.
Consider using Jindent, which is "a simple Java Indent Tool using Emacs". It's a free shell script which is part of the Ptolemy project at Berkeley.