Java Code formatting using shell script - java

I know this is silly but I can't overcome my curiosity. Is it possible to write a shell script to format a piece of java code?
For example, if a user writes code like this:
public class Super{
public static void main(String[] args){
System.out.println("Hello world");
int a=0;
if(a==100)
{
System.out.println("Hello world");
}
else
{
System.out.println("Hello world with else");
}
}
}
I would like to write a shell script that reformats it like this:
public class Super
{
public static void main(String[] args)
{
System.out.println("Hello world");
int a=0;
if(a==100){
System.out.println("Hello world");
}
else{
System.out.println("Hello world with else");
}
}
To be precise, the script should change the placement of curly braces: for try/catch and control structures the opening brace should move to the same line, and for functions, methods and classes it should go on the next line. I have little knowledge of sed and awk, which I understand can do this task quite easily. I also know this can be done using Eclipse.

Well, I've had some free time on my hands, so I decided to relive my good old linux days :]
After reading a bit about awk and sed, I've decided that it might be better to use both, as it is easier to add indentation in awk and parse strings in sed.
Here is the ~/sed_script that formats the source file:
# delete indentation
s/^ \+//g
# format lines with class
s/^\(.\+class.\+\) *\({.*\)$/\1\n\2/g
# format lines with methods
s/^\(public\|private\)\( \+static\)\?\( \+void\)\? \+\(.\+(.*)\) *\({.*\)$/\1\2\3 \4\n\5/g
# format lines with other structures
/^\(if\|else\|for\|while\|case\|do\|try\)\([^{]*\)$/,+1 { # get lines not containing '{'
# along with the next line
/.*{.*/ d # delete the next line with '{'
s/\([^{]*\)/\1 {/g # and add '{' to the first line
}
And here is the ~/awk_script that adds indentation:
BEGIN { depth = 0 }
/}/ { depth = depth - 1 }
{
getPrefix(depth)
print prefix $0
}
/{/ { depth = depth + 1 }
function getPrefix(depth) {
prefix = ""
for (i = 0; i < depth; i++) { prefix = prefix " "}
return prefix
}
And you use them like that:
> sed -f ~/sed_script ~/file_to_format > ~/.tmp_sed
> awk -f ~/awk_script ~/.tmp_sed
It is far from a proper formatting tool, but I hope it will do OK as a sample script for reference :] Good luck with your learning.

A quick, flawed attempt, but one that works on your sample input:
BEGIN {depth = 0;}
/{$/ {depth = depth + 1}
/^}/ {depth = depth - 1}
{prefix = ""; for (i = 0; i < depth; i++) { prefix = prefix " "} print prefix $0 ; }
This is an awk script: place it in a file and do
awk -f awk_script_file source_file
Obvious flaws with this include:
It doesn't catch braceless places where you'd like indentation like
if (foo)
bar();
It will modify the indent depth based on braces in comments and string literals
It won't detect { braces followed by comments

I think this could be done through a simple 'sed' script.
Use 1 variable (bCount) that stores the amount of '{' (opening brackets) - (minus) the amount of '}' (closing brackets)
Then I would go through the file and insert tabs according to the current count of braces.
So
public class Super{ //bCount=0
public static void main(String[] args){ //bCount=1
System.out.println("Hello world"); //bCount=2
int a=0; //bCount=2
....and so on
so insert 0 tabs on line 0
1 tab on line 1
2 tabs on lines 2 and 3, and so on...
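The counting idea above can be sketched in a few lines. This is only a rough Java illustration of the same bCount logic, not a sed script; the class and method names are made up for the example, it assumes Java 11 or later (for String.repeat), and it deliberately ignores braces inside strings and comments.

```java
import java.util.ArrayList;
import java.util.List;

final class BraceIndenter {
    // Re-indents each line with one tab per currently open '{'.
    // Assumption: braces inside string literals and comments are counted too,
    // so this is only suitable for simple, well-behaved input.
    static List<String> indent(List<String> lines) {
        List<String> out = new ArrayList<>();
        int depth = 0; // number of '{' seen so far minus number of '}' seen so far
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                out.add("");
                continue;
            }
            // A line that starts with '}' closes a block, so print it one level shallower.
            int printDepth = line.startsWith("}") ? Math.max(depth - 1, 0) : depth;
            out.add("\t".repeat(printDepth) + line);
            // Update the running count from every brace on this line.
            for (char c : line.toCharArray()) {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth = Math.max(depth - 1, 0);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
            "public class Super{",
            "public static void main(String[] args){",
            "System.out.println(\"Hello world\");",
            "}",
            "}");
        indent(src).forEach(System.out::println);
    }
}
```

The count is updated after the line is printed, which is why a line such as "} else {" ends up at the same level as the matching "if".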

It is definitely possible... I just don't understand why you would want to spend your time doing it? :] There are enough tools to do that and any decent IDE provides a way to re-format the source code (including Eclipse).
For example, to format in Eclipse 3.4 (should be similar in other versions) just right click on your project or a file and select "Source > Format" from the menu.
And if you need to change the way it formats the code, just go to the "Preferences > Java > Code Style > Formatter" and change the template. As far as I know, it is very similar in JDeveloper and NetBeans.

Have a look at the CLI for Jalopy. Jalopy is a pretty powerful source formatter.

Consider using Jindent, which is "a simple Java Indent Tool using Emacs". It's a free shell script that is part of the Ptolemy project at Berkeley.

Related

Adding a line in a method block of java code using python

I have a lot of java files wherein I have to search for a method, if present I have to add a line inside this method "If this line does not already exist". This line has to be added before the closing brace of the method.
So far I have the following code:
import os
import ntpath

extensions = set(['.java','.kt'])
for subdir, dirs, files in os.walk("/src/main"):
    for file in files:
        filepath = subdir + os.sep + file
        extension = os.path.splitext(filepath)[1]
        if extension in extensions:
            if 'onCreate(' in open(filepath).read():
                print (ntpath.basename(filepath))
                if 'onPause' in open (filepath).read():
                    print ("is Activity and contains onPause\n")
                    #Check if Config.pauseCollectingLifecycleData(); is in this code block, if exists do nothing, if does not exist add to the end of code block before }
                if 'onResume' in open (filepath).read():
                    print ("is Activity and contains onResume\n")
                    #Check if Config.resumeCollectingLifecycleData(); is in this code block, if exists do nothing, if does not exist add to the end of code block before }
But I am not sure where to go from here, Python not being my first language. Could I request to be guided in the right direction.
Example:
I am looking for a method with the following signature:
public void onPause(){
super.onPause();
// Add my line here
}
public void onPause(){
super.onPause();
Config.pauseCollectingLifecycleData(); // Line exists do nothing
}
This is actually quite difficult. First of all, your if "onPause" in sourcecode approach currently doesn't distinguish between defining onPause() and calling it. And second of all, finding the correct closing } isn't trivial. Naively, you might just count opening and closing curlies ({ increments the blocklevel, } decrements it), and assume that the } that makes the blocklevel zero is the closing curly of the method. However, this might be wrong! Because the method might contain some string literal containing (possibly unbalanced) curlies. Or comments with curlies. This would mess up the blocklevel count.
To do this properly, you would have to build an actual Java parser. That's a lot of work, even when using libraries such as tatsu.
If you're fine with a rather volatile kludge, you can try and use the blocklevel count mentioned above together with the indentation as a clue (assuming your source code is decently indented). Here's something I've hacked up as a starting point:
def augment_function(sourcecode, function, line_to_insert):
    in_function = False
    blocklevel = 0
    insert_before = None
    source = sourcecode.split("\n")
    for line_no, line in enumerate(source):
        if in_function:
            if "{" in line:
                blocklevel += 1
            if "}" in line:
                blocklevel -= 1
                if blocklevel == 0:
                    # this } closes the method: insert just before it
                    insert_before = line_no
                    indent = len(line) - len(line.lstrip(" ")) + 4  #4=your indent level
                    break
        elif function in line and "public " in line:
            in_function = True
            if "{" in line:
                blocklevel += 1
    if insert_before:
        source.insert(insert_before, " "*indent + line_to_insert)
    return "\n".join(source)

# test code:
java_code = """class Foo {
    private int foo;
    public void main(String[] args) {
        foo = 1;
    }
    public void setFoo(int f)
    {
        foo = f;
    }
    public int getFoo(int f) {
        return foo;
    }
}
"""
print(augment_function(java_code, "setFoo", "log.debug(\"setFoo\");"))
Note that this is vulnerable to all sorts of edge cases (such as { in a string or in a comment, or tab indent instead of space, or possibly a thousand other things). This is just a starting point for you.
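One of the edge cases mentioned above, braces inside string literals or comments, can be reduced (not eliminated) with a small hand-written scanner that skips those regions while counting. The following is a minimal Java sketch of that idea, not a parser; the class and method names are hypothetical, and it assumes ordinary string and char literals (Java text blocks and other exotic cases are not handled properly).

```java
final class BraceCounter {
    // Returns (number of '{') - (number of '}') in source, ignoring braces that
    // appear inside "string literals", 'char literals', // line comments
    // and /* block comments */.
    static int netDepth(String source) {
        int depth = 0;
        int i = 0;
        int n = source.length();
        while (i < n) {
            char c = source.charAt(i);
            if (c == '/' && i + 1 < n && source.charAt(i + 1) == '/') {
                // Line comment: jump to the end of the line (or end of input).
                int eol = source.indexOf('\n', i);
                i = (eol < 0) ? n : eol;
            } else if (c == '/' && i + 1 < n && source.charAt(i + 1) == '*') {
                // Block comment: jump past the closing */ (or to end of input).
                int end = source.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '"' || c == '\'') {
                // Literal: advance to the matching unescaped quote.
                i++;
                while (i < n && source.charAt(i) != c) {
                    if (source.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                }
                i++;
            }
        }
        return depth;
    }
}
```

Feeding a method body to netDepth character range by character range would give the same block level as the naive count, except that curlies in literals and comments no longer disturb it.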

Java String Analysis for complete string regular expression

I am looking for a tool like Java String Analysis (JSA) that could sum up a string as a regex. I have tried to do that with JSA, but there I need to search for a specific method like StringBuffer.append or other string operations.
I have strings like that:
StringBuilder test=new StringBuilder("hello ");
boolean codition=false;
if(codition){
test.append("world");
}
else{
test.append("other world");
}
test.append(" so far");
for(int i=0;i<args.length;i++){
test.append(" again hello");
}
// regularExpression = "hello (world|other world) so far( again hello)*"
And my JSA implementation looks like that so far:
public static void main(String[] args) {
StringAnalysis.addDirectoryToClassPath("bootstrap.jar");
StringAnalysis.loadClass("org.apache.catalina.loader.Extension");
List<ValueBox> list = StringAnalysis.getArgumentExpressions("<java.lang.StringBuffer: java.lang.StringBuffer append(java.lang.String)>", 0);
StringAnalysis sa = new StringAnalysis(list);
for (ValueBox e : list) {
Automaton a = sa.getAutomaton(e);
if (a.isFinite()) {
Iterator<String> si = a.getFiniteStrings().iterator();
StringBuilder sb = new StringBuilder();
while (si.hasNext()) {
sb.append((String) si.next());
}
System.out.println(sb.toString());
} else if (a.complement().isEmpty()) {
System.out.println(e.getValue());
} else {
System.out.println("common prefix:" + a.getCommonPrefix());
}
}
}
I would greatly appreciate any help with the JSA tool, or a hint to another tool. My biggest issue with building the regex is the control flow structure around the string constants.
I'm not aware of a tool which yields you a regex out of the box.
But since you have issues with the CFG, I would recommend writing a static analysis tailored to your problem. You could use a static analysis/bytecode framework like OPAL (Scala) or Soot (Java). You will find tutorials on each project page.
Once you set it up you can load the target jar. You should be able to leverage the control flow of the program then like in the following example:
1 public static void example(String unknown) {
2 String source = "hello";
3 if(Math.random() * 20 > 5){
4 source += "world";
5 } else {
6 source += "unknown";
7 }
8 source += unknown;
}
If your analysis finds a String or StringBuilder which is initialized you can start to build your regular expression. Line number two for instance would bring your regex to "hello". If you meet a conditional in the control flow of your program you can analyze each path and combine them via an "|" later on.
Then branch: "world" (line 4)
Else branch: "unknown" (line 6)
This could be summarized at line 7 as (world)|(unknown) and appended to the regex built before the conditional.
If you encounter a variable you either can trace it back if you do an inter-procedural analysis or you have to use the wildcard operator ".*" otherwise.
Final regex: "hello((world)|(unknown)).*"
I hope this leads you to the solution you are after.
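To make the combination step concrete, here is a small Java sketch that assembles the regex for the example method by hand. It does not use OPAL or Soot; it only shows how literals, branch alternatives and the wildcard would be glued together once an analysis has found them. The class and helper names are hypothetical, and Pattern.quote is used so that metacharacters in the constants stay literal.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class RegexFromPaths {
    // Wildcard used for values the analysis cannot trace back.
    static final String UNKNOWN = ".*";

    // A string constant found in the code, quoted so it is matched literally.
    static String literal(String s) {
        return Pattern.quote(s);
    }

    // The alternatives contributed by the branches of one conditional, joined with '|'.
    static String branch(List<String> alternatives) {
        return alternatives.stream()
                .map(a -> "(" + a + ")")
                .collect(Collectors.joining("|", "(", ")"));
    }

    // Regex for the example method: "hello", then one of the two branches, then anything.
    static String exampleRegex() {
        return literal("hello")
                + branch(List.of(literal("world"), literal("unknown")))
                + UNKNOWN;
    }

    public static void main(String[] args) {
        String regex = exampleRegex();
        System.out.println(regex);
        System.out.println(Pattern.matches(regex, "helloworld and more"));
    }
}
```

Apart from the quoting added by Pattern.quote, the result has the same shape as the final regex given above.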
Apache Lucene has some tools around finite state automata and regular expressions. In particular, you can take the union of automata, so I'd guess you can easily build an automaton accepting a finite number of words.

How can I trim whitespace by Velocity

I have a method called render_something which can create a lot of whitespace, for example:
#render_something('xxx')
The result can be:
<a href="#">
something generated by render_something
</a>
Which actually I want it to be like this:
something generated by render_something
Does Velocity have something like this?
#trim(#render_something('xxx'))
I just read this article on Velocity Whitespace Gobbling which suggests a few work-arounds including Velocity Whitespace Truncated By Line Comment.
This basically suggests commenting out line breaks by putting comments at the end of each line. It also suggests not indenting the code in your macros to prevent superfluous (one of my favourite words) spaces occurring.
TBH it's not a great solution but may suit your needs. Simply put ## at the end of each line in your macro and that will make things a little bit nicer... sort of
It seems just java native trim() works.
$someValue.trim() works for me
Solution
In the class where you create the VelocityEngine, add a method as follows
public String trim(String str) {
return str.trim()/*.replace("\n", "").replace("\r", "")*/;
}
then add the following to the VelocityContext that you create:
context.put("trimmer", this);
and finally in the velocity template do the following
$trimmer.trim("#render_something('xxx')")
Why does it work?
Although the behavior of Velocity is clearly defined, it can be a bit tricky to see how it works sometimes. The separate trim()-method is necessary to get the char-sequence from the template into a Java method where you can call the actual trim() on the String. As far as I know there is no trim inside Velocity, but you always can call back to Java with tricks like this one.
The double-quotes are necessary because the #render_something is just a macro, not a function call, this means the results of the statements in the macro are put verbatim into the point where the macro is "executed".
I struggled for a while to find a straightforward solution to whitespace gobbling, so here is the one I finally came up with. It is inspired by Vadzim's answer and this page: http://wiki.apache.org/velocity/StructuredGlobbingResourceLoader
The StructuredGlobbingResourceLoader we can find on the website has a complex behaviour and doesn’t get rid of any kind of whitespace, so I modified it to get the simple behaviour: "Delete any whitespace at the beginning of the lines, and add a comment at the end of each line" (which prevents the linebreak evaluation). The filter is applied on the input stream at loading time.
This kind of velocity template
#if($value)
the value is $value
#end
is transformed to
#if($value)##
the value is $value##
#end##
Then if you want to have line breaks or whitespace at the beginning of a line, you'll have to put($br,"\n") and put($sp," ") in your context as Vadzim explained and use them explicitly in your template. This approach lets you keep indented templates, with maximum control.
take the class from this page http://wiki.apache.org/velocity/StructuredGlobbingResourceLoader
change the extended class to the kind of loader you need (this one uses the webapp loader)
replace the read() method with the code I provide
use the class as your resource loader in your properties. Example for the webapp loader: webapp.resource.loader.class=...StructuredGlobbingResourceLoader
public int read() throws IOException {
int ch;
switch(state){
case bol: //beginning of line, read until non-indentation character
while(true){
ch = in.read();
if (ch!=(int)' ' && ch!=(int)'\t'){
state = State.content;
return processChar(ch);
}
}
case content:
ch = in.read();
return processChar(ch);
//eol states replace all "\n" by "##\n"
case eol1:
state = State.eol2;
return (int)'#';
case eol2:
state = State.bol;
return (int)'\n';
case eof:
return -1;
}
return -1;
}
//Return the normal character if not end of file or \n
private int processChar(int ch){
switch(ch){
case -1:
state = State.eof;
return -1;
case (int)'\n':
state = State.eol1;
return (int)'#';
default:
return ch;
}
}
Any feedback on my implementation is welcome
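For readers who just want to see the effect of that filter without wiring up a resource loader, the same transformation can be sketched as a plain string function. This is only an illustration under the rules stated above (drop leading spaces and tabs, turn every "\n" into "##\n"); the class and method names are made up, and it is not the loader itself.

```java
final class VelocityLineGobbler {
    // Removes indentation at the start of each line and ends every line with "##"
    // so that Velocity treats the line break as part of a comment.
    // Like the read() method above, a final line without a trailing newline
    // is left without the "##" suffix.
    static String gobble(String template) {
        StringBuilder out = new StringBuilder();
        boolean atLineStart = true;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (atLineStart && (c == ' ' || c == '\t')) {
                continue; // still inside the indentation: drop it
            }
            if (c == '\n') {
                out.append("##\n"); // comment out the line break
                atLineStart = true;
            } else {
                out.append(c);
                atLineStart = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(gobble("#if($value)\n    the value is $value\n#end\n"));
    }
}
```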
Inspired by Velocity Whitespace Truncated By Line Comment one could use block comments instead of line comments for a better looking result:
#foreach( $record in $records )#**
*##if( $record.id == 0 )#**
*##end
#end
With a decent syntax highlighting the comments aren't very obtrusive.
Here is my alternative solution to velocity whitespace gobbling that allows tabbing template structure.
Each template text is preprocessed on first load in custom ResourceLoader:
private String enhanceTemplate(String body) {
if (!body.startsWith("##preserveWhitespace")) {
body = body.replaceAll("(##.*)?[ \\t\\r]*\\n+[ \\t\\r]*", Matcher.quoteReplacement("##\n"));
body = body.trim();
}
return body;
}
This replaces all newlines and adjacent spaces with just one commented newline.
Line breaks and trailing spaces can be inserted explicitly with $br and $sp variables from the default context:
private static final VelocityContext DEFAULT_CONTEXT = new VelocityContext(new HashMap<String, String>() {{
put("sp", " ");
put("br", "\n");
}});
In some cases, I've had to essentially minimize my script like I would js or css. It works well, though it is not as easy for humans to read. Just one other option to eliminate the excess space:
<ul class="tabs">#foreach($par in $bodypars)#set( $parLen = ${_MathTool.toInteger($bodypars.size())} )#set( $parLn = $parLen - 1 )#set( $thClass = 'tb'+${parLn} )#set( $thaClass = '' )#if( $foreach.index == 1 )#set( $thClass = ${thClass}+' selected' )#set( $thaClass = ' selected' )#end#if($foreach.index != 0 && $parLen <= $maxTabs)#set ( $btitle = $_XPathTool.selectSingleNode($par,'item-subtitle') )<li class="${thClass}">#if($!btitle && $btitle != '')$_SerializerTool.serialize($btitle, true)#end</li>#end#end</ul>
You can use the standard Java trim(), but make sure your variable is actually a String rather than some other object.
$string.trim() //works fine
$object.trim() //exception
Have a good day!

Write to same location in a console window with java

I would like to write a character to the same location in a console window.
The characters I would like to write are / - \ _. This will get me a little spinner I can display to show progress or loading.
How can you write the chars to the same location though? Otherwise, you will wind up with something like this /-\_/-\_/-\
With Java 6 you can use the Console to do something like this:
import java.io.Console;
class Main {
public static void main(String[] args) throws InterruptedException {
String[] spinner = new String[] {"\u0008/", "\u0008-", "\u0008\\", "\u0008|" };
Console console = System.console();
console.printf("|");
for (int i = 0; i < 1000; i++) {
Thread.sleep(150);
console.printf("%s", spinner[i % spinner.length]);
}
}
}
\u0008 is the backspace character. Printing it moves the cursor back one position, so the next character overwrites the previous one. By first printing a | and then prefixing every subsequent character with \u0008, you get the spinner behavior.
Note that this might not be 100% compatible with all consoles (and that System.console() can return null).
Also note that you don't necessarily have to use the console class, as printing this sequence to standard output commonly works just as well.
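As a rough sketch of that last point, the same spinner can be written against System.out, which also avoids the null check on System.console(). The class name and the frame helper are made up for illustration, and whether the backspace is honoured still depends on the terminal (many IDE consoles ignore it).

```java
final class Spinner {
    private static final char[] FRAMES = {'/', '-', '\\', '|'};

    // A backspace followed by the spinner character for the given step.
    static String frame(int step) {
        return "\b" + FRAMES[step % FRAMES.length];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.print("|"); // initial character that the first backspace steps over
        for (int i = 0; i < 20; i++) {
            Thread.sleep(100);
            System.out.print(frame(i));
            System.out.flush(); // make sure the frame shows up immediately
        }
        System.out.println();
    }
}
```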
I don't think Java natively allows for that. You need to use some external library - maybe JCurses can help you.

Interactive Antlr

I'm trying to write a simple interactive (using System.in as source) language using antlr, and I have a few problems with it. The examples I've found on the web are all using a per line cycle, e.g.:
while(readline)
result = parse(line)
doStuff(result)
But what if I'm writing something like Pascal/SMTP/etc., where the first line is required to look like X? I know it can be checked in doStuff, but I think logically it is part of the syntax.
Or what if a command is split into multiple lines? I can try
while(readline)
lines.add(line)
try
result = parse(lines)
lines = []
doStuff(result)
catch
nop
But with this I'm also hiding real errors.
Or I could reparse all lines everytime, but:
it will be slow
there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Dutow wrote:
Or I could reparse all lines everytime, but:
it will be slow
there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code, it sure is possible. You also don't need to re-parse the entire token stream for it.
Let's say you want to parse a very simple language line by line, where each line is either a program declaration, a uses declaration, or a statement.
It should always start with a program declaration, followed by zero or more uses declarations followed by zero or more statements. uses declarations cannot come after statements and there can't be more than one program declaration.
For simplicity, a statement is just a simple assignment: a = 4 or b = a.
An ANTLR grammar for such a language could look like this:
grammar REPL;
parse
: programDeclaration EOF
| usesDeclaration EOF
| statement EOF
;
programDeclaration
: PROGRAM ID
;
usesDeclaration
: USES idList
;
statement
: ID '=' (INT | ID)
;
idList
: ID (',' ID)*
;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
But, we'll need to add a couple of checks of course. Also, by default, a parser takes a token stream in its constructor, but since we're planning to trickle tokens in the parser line-by-line, we'll need to create a new constructor in our parser. You can add custom members in your lexer or parser classes by putting them in a @parser::members { ... } or @lexer::members { ... } section respectively. We'll also add a couple of boolean flags to keep track whether the program declaration has happened already and if uses declarations are allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer which gets fed to the parser.
All of that would look like:
@parser::members {
boolean programDeclDone;
boolean usesDeclAllowed;
public REPLParser() {
super(null);
programDeclDone = false;
usesDeclAllowed = true;
}
public void process(String source) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(source);
REPLLexer lexer = new REPLLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
super.setTokenStream(tokens);
this.parse(); // the entry point of our parser
}
}
Now inside our grammar, we're going to check through a couple of gated semantic predicates if we're parsing declarations in the correct order. And after parsing a certain declaration, or statement, we'll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these boolean flags is done through each rule's @after { ... } section that gets executed (not surprisingly) after the tokens from that parser rule are matched.
Your final grammar file now looks like this (including some System.out.println's for debugging purposes):
grammar REPL;
@parser::members {
boolean programDeclDone;
boolean usesDeclAllowed;
public REPLParser() {
super(null);
programDeclDone = false;
usesDeclAllowed = true;
}
public void process(String source) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(source);
REPLLexer lexer = new REPLLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
super.setTokenStream(tokens);
this.parse();
}
}
parse
: programDeclaration EOF
| {programDeclDone}? (usesDeclaration | statement) EOF
;
programDeclaration
@after{
programDeclDone = true;
}
: {!programDeclDone}? PROGRAM ID {System.out.println("\t\t\t program <- " + $ID.text);}
;
usesDeclaration
: {usesDeclAllowed}? USES idList {System.out.println("\t\t\t uses <- " + $idList.text);}
;
statement
@after{
usesDeclAllowed = false;
}
: left=ID '=' right=(INT | ID) {System.out.println("\t\t\t " + $left.text + " <- " + $right.text);}
;
idList
: ID (',' ID)*
;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
which can be tested with the following class:
import org.antlr.runtime.*;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws Exception {
Scanner keyboard = new Scanner(System.in);
REPLParser parser = new REPLParser();
while(true) {
System.out.print("\n> ");
String input = keyboard.nextLine();
if(input.equals("quit")) {
break;
}
parser.process(input);
}
System.out.println("\nBye!");
}
}
To run this test class, do the following:
# generate a lexer and parser:
java -cp antlr-3.2.jar org.antlr.Tool REPL.g
# compile all .java source files:
javac -cp antlr-3.2.jar *.java
# run the main class on Windows:
java -cp .;antlr-3.2.jar Main
# or on Linux/Mac:
java -cp .:antlr-3.2.jar Main
As you can see, you can only declare a program once:
> program A
program <- A
> program B
line 1:0 rule programDeclaration failed predicate: {!programDeclDone}?
uses cannot come after statements:
> program X
program <- X
> uses a,b,c
uses <- a,b,c
> a = 666
a <- 666
> uses d,e
line 1:0 rule usesDeclaration failed predicate: {usesDeclAllowed}?
and you must start with a program declaration:
> uses foo
line 1:0 rule parse failed predicate: {programDeclDone}?
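Stripped of the ANTLR machinery, the ordering rules that the two flags enforce amount to a very small state machine. The following plain-Java sketch (hypothetical class and enum names, no ANTLR dependency) mirrors the behaviour shown in the three sessions above and may help when reasoning about which predicate fails when.

```java
final class DeclarationOrder {
    enum Kind { PROGRAM, USES, STATEMENT }

    private boolean programDeclDone = false; // set once a program declaration was accepted
    private boolean usesDeclAllowed = true;  // cleared by the first statement

    // Returns true if a line of the given kind is allowed at this point,
    // updating the flags the same way the grammar does after a successful match.
    boolean accept(Kind kind) {
        switch (kind) {
            case PROGRAM:
                if (programDeclDone) {
                    return false; // only one program declaration
                }
                programDeclDone = true;
                return true;
            case USES:
                // needs a program declaration first and no statement so far
                return programDeclDone && usesDeclAllowed;
            case STATEMENT:
                if (!programDeclDone) {
                    return false; // must start with a program declaration
                }
                usesDeclAllowed = false;
                return true;
            default:
                return false;
        }
    }
}
```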
Here's an example of how to parse input from System.in without first manually parsing it one line at a time and without making major compromises in the grammar. I'm using ANTLR 3.4. ANTLR 4 may have addressed this problem already. I'm still using ANTLR 3, though, and maybe someone else with this problem still is too.
Before getting into the solution, here are the hurdles I ran into that keep this seemingly trivial problem from being easy to solve:
The built-in ANTLR classes that derive from CharStream consume the entire stream of data up-front. Obviously an interactive mode (or any other indeterminate-length stream source) can't provide all the data.
The built-in BufferedTokenStream and derived class(es) will not end on a skipped or off-channel token. In an interactive setting, this means that the current statement can't end (and therefore can't execute) until the first token of the next statement or EOF has been consumed when using one of these classes.
The end of the statement itself may be indeterminate until the next statement begins.
Consider a simple example:
statement: 'verb' 'noun' ('and' 'noun')*
;
WS: //etc...
Interactively parsing a single statement (and only a single statement) isn't possible. Either the next statement has to be started (that is, hitting "verb" in the input), or the grammar has to be modified to mark the end of the statement, e.g. with a ';'.
I haven't found a way to manage a multi-channel lexer with my solution. It doesn't hurt me since I can replace my $channel = HIDDEN with skip(), but it's still a limitation worth mentioning.
A grammar may need a new rule to simplify interactive parsing.
For example, my grammar's normal entry point is this rule:
script
: statement* EOF -> ^(STMTS statement*)
;
My interactive session can't start at the script rule because it won't end until EOF. But it can't start at statement either because STMTS might be used by my tree parser.
So I introduced the following rule specifically for an interactive session:
interactive
: statement -> ^(STMTS statement)
;
In my case, there are no "first line" rules, so I can't say how easy or hard it would be to do something similar for them. It may be a matter of making a rule like so and execute it at the beginning of the interactive session:
interactive_start
: first_line
;
The code behind a grammar (e.g., code that tracks symbols) may have been written under the assumption that the lifespan of the input and the lifespan of the parser object would effectively be the same. For my solution, that assumption doesn't hold. The parser gets replaced after each statement, so the new parser must be able to pick up the symbol tracking (or whatever) where the last one left off. This is a typical separation-of-concerns problem so I don't think there's much else to say about it.
The first problem mentioned, the limitations of the built-in CharStream classes, was my only major hang-up. ANTLRStringStream has all the workings that I need, so I derived my own CharStream class off of it. The base class's data member is assumed to have all the past characters read, so I needed to override all the methods that access it. Then I changed the direct read to a call to (new method) dataAt to manage reading from the stream. That's basically all there is to this. Please note that the code here may have unnoticed problems and does no real error handling.
public class MyInputStream extends ANTLRStringStream {
private InputStream in;
public MyInputStream(InputStream in) {
super(new char[0], 0);
this.in = in;
}
@Override
// copied almost verbatim from ANTLRStringStream
public void consume() {
if (p < n) {
charPositionInLine++;
if (dataAt(p) == '\n') {
line++;
charPositionInLine = 0;
}
p++;
}
}
@Override
// copied almost verbatim from ANTLRStringStream
public int LA(int i) {
if (i == 0) {
return 0; // undefined
}
if (i < 0) {
i++; // e.g., translate LA(-1) to use offset i=0; then data[p+0-1]
if ((p + i - 1) < 0) {
return CharStream.EOF; // invalid; no char before first char
}
}
// Read ahead
return dataAt(p + i - 1);
}
@Override
public String substring(int start, int stop) {
if (stop >= n) {
//Read ahead.
dataAt(stop);
}
return new String(data, start, stop - start + 1);
}
private int dataAt(int i) {
ensureRead(i);
if (i < n) {
return data[i];
} else {
// Nothing to read at that point.
return CharStream.EOF;
}
}
private void ensureRead(int i) {
if (i < n) {
// The data has been read.
return;
}
int distance = i - n + 1;
ensureCapacity(n + distance);
// Crude way to copy from the byte stream into the char array.
for (int pos = 0; pos < distance; ++pos) {
int read;
try {
read = in.read();
} catch (IOException e) {
// TODO handle this better.
throw new RuntimeException(e);
}
if (read < 0) {
break;
} else {
data[n++] = (char) read;
}
}
}
private void ensureCapacity(int capacity) {
if (capacity > n) {
char[] newData = new char[capacity];
System.arraycopy(data, 0, newData, 0, n);
data = newData;
}
}
}
Launching an interactive session is similar to the boilerplate parsing code, except that UnbufferedTokenStream is used and the parsing takes place in a loop:
MyLexer lex = new MyLexer(new MyInputStream(System.in));
TokenStream tokens = new UnbufferedTokenStream(lex);
//Handle "first line" parser rule(s) here.
while (true) {
MyParser parser = new MyParser(tokens);
//Set up the parser here.
MyParser.interactive_return r = parser.interactive();
//Do something with the return value.
//Break on some meaningful condition.
}
Still with me? Okay, well that's it. :)
If you are using System.in as source, which is an input stream, why not just have ANTLR tokenize the input stream as it is read and then parse the tokens?
You have to put it in doStuff....
For instance, if you're declaring a function, the parse would return a function right? without body, so, that's fine, because the body will come later. You'd do what most REPL do.
