I'm trying to remove a LineComment from a Java file via the AST. I read the document from a source file, create an AST parser (AST.JLS3), and then create a CompilationUnit and an ASTRewrite instance.
doc = new Document( doctext );
parser = ASTParser.newParser( AST.JLS3 );
parser.setSource( doc.get().toCharArray() );
cu = (CompilationUnit) parser.createAST( null );
astRewrite = ASTRewrite.create( cu.getAST() );
Nothing special so far; I'm able to add and remove fields and so on. Now I'm trying to remove comments from the unit with the following code:
@SuppressWarnings( "unchecked" )
final List<Comment> comments = (List<Comment>) cu.getCommentList();
final Iterator<Comment> commentIter = comments.iterator();
while ( commentIter.hasNext() ) {
final Comment curComment = commentIter.next();
if ( curComment.isLineComment() ) {
final LineComment lineComment = (LineComment) curComment;
lineComment.accept( new CommentFieldVisitor( cu, doc.get(), astRewrite ) );
}
}
Here's the visitor that should perform the action and remove the comment.
public class CommentFieldVisitor extends ASTVisitor {
final CompilationUnit cu;
final String sourceCode;
final ASTRewrite astRewrite;
public CommentFieldVisitor( final CompilationUnit cu, final String sourceCode, final ASTRewrite astRewrite ) {
this.cu = cu;
this.sourceCode = sourceCode;
this.astRewrite = astRewrite;
}
@Override
public boolean visit( final LineComment commentNode ) {
int start = commentNode.getStartPosition();
int end = start + commentNode.getLength();
final String comment = sourceCode.substring( start, end );
final String fieldComment = Config.INSTANCE.getTargetFieldComment();
if ( comment != null && comment.equalsIgnoreCase( fieldComment ) ) {
System.out.println( "REMOVE COMMENT" );
assert astRewrite != null : "ERROR: AST Rewriter is null";
astRewrite.remove( commentNode, null );
}
return false;
}
}
I iterate over all comments in the compilation unit and create a visitor for every comment in the list. The visitor checks whether the content of the comment equals a preconfigured string; if it does, the comment should be removed. However, if I call
astRewrite.remove( commentNode, null );
I always get an NPE from inside the remove method. Both astRewrite and commentNode are non-null (the remove code is reached).
Does anyone have an idea what I might be doing wrong, or another approach for removing such a comment via the AST?
I managed it via a workaround which uses Comment.getStartPosition() and Comment.getLength(). I use these methods to extract the comments from my source code file and replace them with "". After that I need to re-create the AST from the modified source code. This is far from perfect, but I didn't find an alternative solution.
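For reference, a minimal sketch of that workaround (assuming doc and cu from the question and the target comment text from Config.INSTANCE.getTargetFieldComment(); this only illustrates the approach, not the exact code I used):
// Collect the offsets of matching line comments first ...
@SuppressWarnings( "unchecked" )
final List<Comment> comments = (List<Comment>) cu.getCommentList();
final String fieldComment = Config.INSTANCE.getTargetFieldComment();
final List<int[]> ranges = new ArrayList<>();
for ( final Comment comment : comments ) {
    if ( comment.isLineComment() ) {
        final int start = comment.getStartPosition();
        final int end = start + comment.getLength();
        if ( doc.get().substring( start, end ).equalsIgnoreCase( fieldComment ) ) {
            ranges.add( new int[] { start, end } );
        }
    }
}
// ... then delete them back to front so earlier offsets stay valid ...
final StringBuilder modified = new StringBuilder( doc.get() );
for ( int i = ranges.size() - 1; i >= 0; i-- ) {
    modified.delete( ranges.get( i )[0], ranges.get( i )[1] );
}
// ... and finally re-create the AST from the modified source.
final ASTParser reparser = ASTParser.newParser( AST.JLS3 );
reparser.setSource( modified.toString().toCharArray() );
final CompilationUnit newCu = (CompilationUnit) reparser.createAST( null );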
A tiny bit late here but I'm sure the answer will help other people.
I can confirm the NPE from the remove method when it lacks the required context of an ICompilationUnit to execute. This results in:
REMOVE COMMENT
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.eclipse.jdt.core.dom.StructuralPropertyDescriptor.isChildListProperty()" because "property" is null
at org.eclipse.jdt.core.dom.rewrite.ASTRewrite.remove(ASTRewrite.java:398)
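For context, this is roughly what parsing with that context looks like inside an Eclipse plug-in (a sketch only; icu stands for an ICompilationUnit obtained from the Java model, which is assumed here):
// Sketch: parse from an ICompilationUnit so the rewrite has workspace context.
ASTParser parser = ASTParser.newParser(AST.getJLSLatest());
parser.setKind(ASTParser.K_COMPILATION_UNIT);
parser.setSource(icu);              // ICompilationUnit from the Java model (assumed)
parser.setResolveBindings(true);
CompilationUnit cu = (CompilationUnit) parser.createAST(null);
ASTRewrite astRewrite = ASTRewrite.create(cu.getAST());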
Here's how I do it when I don't have such a context:
String sourceCode = "/* block comment */ public class Hello {} // line comment";
Document doc = new Document(sourceCode);
ASTParser parser = ASTParser.newParser(AST.getJLSLatest());
parser.setSource(doc.get().toCharArray());
CompilationUnit cu = (CompilationUnit) parser.createAST(null);
ASTRewrite astRewrite = ASTRewrite.create(cu.getAST());
TextEdit edits = astRewrite.rewriteAST(doc, null);
final List<Comment> comments = cu.getCommentList();
List<TextEdit> textEdits = new ArrayList<>();
for (Comment curComment : comments) {
if (curComment.isLineComment()) {
final LineComment lineComment = (LineComment) curComment;
int commentStart = lineComment.getStartPosition();
int commentLength = lineComment.getLength();
int commentEnd = commentStart + commentLength;
String comment = sourceCode.substring(commentStart, commentEnd);
if (comment != null && comment.equalsIgnoreCase("// line comment")) {
textEdits.add(new DeleteEdit(commentStart, commentLength));
}
}
}
edits.addChildren(textEdits.toArray(TextEdit[]::new));
try {
edits.apply(doc);
System.out.println(doc.get());
} catch (MalformedTreeException | BadLocationException e) {
e.printStackTrace();
}
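When run, this should print the source with the line comment stripped and the block comment left intact, i.e. something like /* block comment */ public class Hello {}.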
See also this project https://github.com/JnRouvignac/AutoRefactor and especially these 2 classes below:
ASTCommentRewriter.java for the solution above with the list of DeleteEdit
CommentsCleanUp.java for the usage of ASTRewrite.remove()
Related
I am a computer science university student working on my first 'big' project outside of class. I'm attempting to read through large text files (2,000–3,000 lines of text) line by line with a BufferedReader. When a keyword from a list of enums is located, I want to send the current line from the BufferedReader to the appropriate method to be handled appropriately.
I have a solution, but I have a feeling in my gut that there is a much better way to handle this situation. Any suggestions or feedback would be greatly appreciated.
Current Solution
I am looping through the list of enums, then checking whether the current enum's toString() value is in the current line from the BufferedReader using the String.contains method.
If the enum is located, the enum is used in a switch statement for the appropriate method call. (I have 13 cases in total; I just wanted to keep the code sample short.)
try (BufferedReader reader = new BufferedReader(new FileReader(inputFile.getAbsoluteFile()))){
while ((currentLine = reader.readLine()) != null) {
for (GameFileKeys gameKey : GameFileKeys.values()) {
if (currentLine.contains(gameKey.toString())) {
switch (gameKey) {
case SEAT -> seatAndPlayerAssignment(currentTableArr, currentLine);
case ANTE -> playerJoinLate(currentLine);
}
}
}
}
}
Previous Solution
Originally, I had a nasty list of if statements checking whether the current line contained one of the keywords and then handling it appropriately. Clearly that is far from optimal, but my gut tells me that my current solution is also less than optimal.
try (BufferedReader reader = new BufferedReader(new FileReader(inputFile.getAbsoluteFile()))){
while ((currentLine = reader.readLine()) != null) {
if (currentLine.contains(GameFileKeys.SEAT.toString())) {
seatAndPlayerAssignment(currentTableArr, currentLine);
}
else if (currentLine.contains(GameFileKeys.ANTE.toString())) {
playerJoinLate(currentLine);
}
}
}
Enum Class
In case you need this, or have any general feedback for how I'm implementing my enums.
public enum GameFileKeys {
ANTE("posts ante"),
SEAT("Seat ");
private final String gameKey;
GameFileKeys(String str) {
this.gameKey = str;
}
@Override
public String toString() {
return gameKey;
}
}
I cannot improve over the core of your code: the looping on values() of the enum, performing a String#contains for each enum object’s string, and using a switch. I can make a few minor suggestions.
I suggest you not override the toString method on your enum. The Object#toString method is generally best used only for debugging and logging, not logic or presentation.
Your string passed to constructor of the enum is likely similar to the idea of a display name commonly seen in such enums. The formal enum name (all caps) is used internally within Java, while the display name is used for display to the user or exchanged with external systems. See the Month and DayOfWeek enums as examples offering a getDisplayName method.
Also, an enum should be named in the singular. This avoids confusion with any collections of the enum’s objects.
By the way, looks like you have a stray SPACE in your second enum's argument.
At first I thought it would help to have a list of all the display names, and a map of display name to enum object. However, in the end neither is needed for your purpose. I kept those as they might prove interesting.
public enum GameFileKey
{
ANTE( "posts ante" ),
SEAT( "Seat" );
private String displayName = null;
private static final List < String > allDisplayNames = Arrays.stream( GameFileKey.values() ).map( GameFileKey :: getDisplayName ).toList();
private static final Map < String, GameFileKey > mapOfDisplayNameToGameFileKey = Arrays.stream( GameFileKey.values() ).collect( Collectors.toUnmodifiableMap( GameFileKey :: getDisplayName , Function.identity() ) );
GameFileKey ( String str ) { this.displayName = str; }
public String getDisplayName ( ) { return this.displayName; }
public static GameFileKey forDisplayName ( final String displayName )
{
return
Objects.requireNonNull(
GameFileKey.mapOfDisplayNameToGameFileKey.get( displayName ) ,
"None of the " + GameFileKey.class.getCanonicalName() + " enum objects has a display name of: " + displayName + ". Message # 4dcefee2-4aa2-48cf-bf66-9a4bde02ac37." );
}
public static List < String > allDisplayNames ( ) { return GameFileKey.allDisplayNames; }
}
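A quick usage sketch of the lookup helpers above (a hypothetical call site, not part of the original question):
GameFileKey key = GameFileKey.forDisplayName( "posts ante" );
System.out.println( key );                           // ANTE
System.out.println( GameFileKey.allDisplayNames() ); // [posts ante, Seat]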
You can use a stream of the lines of your file being processed. Just FYI, not necessarily better than your code.
public class Demo
{
public static void main ( String[] args )
{
Demo app = new Demo();
app.demo();
}
private void demo ( )
{
try
{
Path path = Demo.getFilePathToRead();
Stream < String > lines = Files.lines( path );
lines.forEach(
line -> {
for ( GameFileKey gameKey : GameFileKey.values() )
{
if ( line.contains( gameKey.getDisplayName() ) )
{
switch ( gameKey )
{
case SEAT -> this.seatAndPlayerAssignment( line );
case ANTE -> this.playerJoinLate( line );
}
}
}
}
);
}
catch ( IOException e )
{
throw new RuntimeException( e );
}
}
private void playerJoinLate ( String line )
{
System.out.println( "line = " + line );
}
private void seatAndPlayerAssignment ( String line )
{
System.out.println( "line = " + line );
}
public static Path getFilePathToRead ( ) throws IOException
{
Path tempFile = Files.createTempFile( "bogus" , ".txt" );
Files.write( tempFile , "apple\nSeat\norange\nposts ante\n".getBytes() );
return tempFile;
}
}
When run:
line = Seat
line = posts ante
I need to extract Word document comments and the text they comment on. Below is my current solution, but it is not working as expected.
public class Main {
public static void main(String[] args) throws Exception {
var document = new Document("sample.docx");
NodeCollection<Paragraph> paragraphs = document.getChildNodes(PARAGRAPH, true);
List<MyComment> myComments = new ArrayList<>();
for (Paragraph paragraph : paragraphs) {
var comments = getComments(paragraph);
int commentIndex = 0;
if (comments.isEmpty()) continue;
for (Run run : paragraph.getRuns()) {
var runText = run.getText();
for (int i = commentIndex; i < comments.size(); i++) {
Comment comment = comments.get(i);
String commentText = comment.getText();
if (paragraph.getText().contains(runText + commentText)) {
myComments.add(new MyComment(runText, commentText));
commentIndex++;
break;
}
}
}
}
myComments.forEach(System.out::println);
}
private static List<Comment> getComments(Paragraph paragraph) {
@SuppressWarnings("unchecked")
NodeCollection<Comment> comments = paragraph.getChildNodes(COMMENT, false);
List<Comment> commentList = new ArrayList<>();
comments.forEach(commentList::add);
return commentList;
}
static class MyComment {
String text;
String commentText;
public MyComment(String text, String commentText) {
this.text = text;
this.commentText = commentText;
}
@Override
public String toString() {
return text + "-->" + commentText;
}
}
}
sample.docx contents are shown as a screenshot in the original post.
And the output is (which is incorrect):
factors-->This is word comment
%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
Expected output is:
factors-->This is word comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->First paragraph comment
Please help me with a better way of extracting Word document comments and the text they comment on. If you need additional details, let me know and I will provide them.
The commented text is marked by the special nodes CommentRangeStart and CommentRangeEnd. These nodes have an Id that corresponds to the id of the Comment the range is linked to, so you need to extract the content between the corresponding start and end nodes.
By the way, the code example in the Aspose.Words API reference shows how to print the contents of all comments and their comment ranges using a document visitor. It looks like exactly what you are looking for.
EDIT: You can use code like the following to accomplish your task. I did not provide the full code for extracting content between nodes; it is available on GitHub.
Document doc = new Document("C:\\Temp\\in.docx");
// Get the comments in the document.
Iterable<Comment> comments = doc.getChildNodes(NodeType.COMMENT, true);
Iterable<CommentRangeStart> commentRangeStarts = doc.getChildNodes(NodeType.COMMENT_RANGE_START, true);
Iterable<CommentRangeEnd> commentRangeEnds = doc.getChildNodes(NodeType.COMMENT_RANGE_END, true);
for (Comment c : comments)
{
System.out.println(String.format("Comment %d : %s", c.getId(), c.toString(SaveFormat.TEXT)));
CommentRangeStart start = null;
CommentRangeEnd end = null;
// Search for an appropriate start and end.
for (CommentRangeStart s : commentRangeStarts)
{
if (c.getId() == s.getId())
{
start = s;
break;
}
}
for (CommentRangeEnd e : commentRangeEnds)
{
if (c.getId() == e.getId())
{
end = e;
break;
}
}
if (start != null && end != null)
{
// Extract content between the start and end nodes.
// Code example how to extract content between nodes is here
// https://github.com/aspose-words/Aspose.Words-for-Java/blob/master/Examples/src/main/java/com/aspose/words/examples/programming_documents/document/ExtractContentBetweenCommentRange.java
}
else
{
System.out.println(String.format("Comment %d Does not have comment range"));
}
}
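As a rough illustration of what could go inside that if (start != null && end != null) block, here is a hedged sketch (not the full GitHub example): it walks the document in pre-order from the start marker to the end marker and collects run text, assuming both markers are reached in document order and ignoring partially covered runs.
// Sketch only: gather the plain text of the runs lying between the two range markers.
StringBuilder commented = new StringBuilder();
Node current = start.nextPreOrder(doc);
while (current != null && current != end)
{
    if (current.getNodeType() == NodeType.RUN)
    {
        commented.append(current.getText());
    }
    current = current.nextPreOrder(doc);
}
System.out.println("Commented text: " + commented);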
I have this path for a MongoDB field, main.inner.leaf, and any field along the path might be absent.
In Java, to avoid NullPointerExceptions, I have to write:
String leaf = "";
if (document.get("main") != null &&
document.get("main", Document.class).get("inner") != null) {
leaf = document.get("main", Document.class)
.get("inner", Document.class).getString("leaf");
}
In this simple example there are only 3 levels (main, inner and leaf), but my documents are deeper.
So is there a way to avoid writing all these null checks?
Like this:
String leaf = document.getString("main.inner.leaf", "");
// "" is the deafult value if one of the levels doesn't exist
Or using a third party library:
String leaf = DocumentUtils.getNullCheck("main.inner.leaf", "", document);
Many thanks.
Since the intermediate attributes are optional, you really have to access the leaf value in a null-safe manner.
You could do this yourself using an approach like ...
if (document.containsKey("main")) {
Document _main = document.get("main", Document.class);
if (_main.containsKey("inner")) {
Document _inner = _main.get("inner", Document.class);
if (_inner.containsKey("leaf")) {
leafValue = _inner.getString("leaf");
}
}
}
Note: this could be wrapped up in a utility to make it more user friendly.
Or use a third-party library such as Commons BeanUtils.
But you cannot avoid null-safe checks, since the document structure is such that the intermediate levels might be null. All you can do is ease the burden of handling the null safety.
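For example, a small hand-rolled utility along those lines might look like this (a sketch only; the method name and the choice to return a default value are mine, not part of the MongoDB driver):
// Sketch: walk a dotted path through nested Documents, falling back to a default
// value when any intermediate level is missing or is not a Document.
private static String getString(Document document, String dottedPath, String defaultValue) {
    String[] parts = dottedPath.split("\\.");
    Document current = document;
    for (int i = 0; i < parts.length - 1; i++) {
        Object next = current.get(parts[i]);
        if (!(next instanceof Document)) {
            return defaultValue;
        }
        current = (Document) next;
    }
    Object leaf = current.get(parts[parts.length - 1]);
    return leaf instanceof String ? (String) leaf : defaultValue;
}
With this, getString(document, "main.inner.leaf", "") returns "" whenever any level along the path is absent.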
Here's an example test case showing both approaches:
@Test
public void readNestedDocumentsWithNullSafety() throws IllegalAccessException, NoSuchMethodException, InvocationTargetException {
Document inner = new Document("leaf", "leafValue");
Document main = new Document("inner", inner);
Document fullyPopulatedDoc = new Document("main", main);
assertThat(extractLeafValueManually(fullyPopulatedDoc), is("leafValue"));
assertThat(extractLeafValueUsingThirdPartyLibrary(fullyPopulatedDoc, "main.inner.leaf", ""), is("leafValue"));
Document emptyPopulatedDoc = new Document();
assertThat(extractLeafValueManually(emptyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(emptyPopulatedDoc, "main.inner.leaf", ""), is(""));
Document emptyInner = new Document();
Document partiallyPopulatedMain = new Document("inner", emptyInner);
Document partiallyPopulatedDoc = new Document("main", partiallyPopulatedMain);
assertThat(extractLeafValueManually(partiallyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(partiallyPopulatedDoc, "main.inner.leaf", ""), is(""));
}
private String extractLeafValueUsingThirdPartyLibrary(Document document, String path, String defaultValue) {
try {
Object value = PropertyUtils.getNestedProperty(document, path);
return value == null ? defaultValue : value.toString();
} catch (Exception ex) {
return defaultValue;
}
}
private String extractLeafValueManually(Document document) {
Document inner = getOrDefault(getOrDefault(document, "main"), "inner");
return inner.get("leaf", "");
}
private Document getOrDefault(Document document, String key) {
if (document.containsKey(key)) {
return document.get(key, Document.class);
} else {
return new Document();
}
}
I am currently working on creating an IDE for the custom, very Lua-like scripting language MobTalkerScript (MTS), which provides me with an ANTLR4 lexer. Since the language specification for MTS puts comments into the HIDDEN_CHANNEL channel, I need to tell the lexer to actually read from that channel. This is how I tried to do that:
Mts3Lexer lexer = new Mts3Lexer(new ANTLRInputStream("<replace this with the input>"));
lexer.setTokenFactory(new CommonTokenFactory(false));
lexer.setChannel(Token.HIDDEN_CHANNEL);
Token token = lexer.emit();
int type = token.getType();
do {
switch(type) {
case Mts3Lexer.LINE_COMMENT:
case Mts3Lexer.COMMENT:
System.out.println("token "+token.getText()+" is a comment");
default:
System.out.println("token "+token.getText()+" is not a comment");
}
} while((token = lexer.nextToken()) != null && (type = token.getType()) != Token.EOF);
Now, if I use this code on the following input, nothing but "token ... is not a comment" gets printed to the console.
function foo()
-- this should be a single-line comment
something = "blah"
--[[ this should
be a multi-line
comment ]]--
end
The tokens containing the comments never show up, though. So I searched for the source of this problem and found the following method in the ANTLR4 Lexer class:
/** Return a token from this source; i.e., match a token on the char
* stream.
*/
@Override
public Token nextToken() {
if (_input == null) {
throw new IllegalStateException("nextToken requires a non-null input stream.");
}
// Mark start location in char stream so unbuffered streams are
// guaranteed at least have text of current token
int tokenStartMarker = _input.mark();
try{
outer:
while (true) {
if (_hitEOF) {
emitEOF();
return _token;
}
_token = null;
_channel = Token.DEFAULT_CHANNEL;
_tokenStartCharIndex = _input.index();
_tokenStartCharPositionInLine = getInterpreter().getCharPositionInLine();
_tokenStartLine = getInterpreter().getLine();
_text = null;
do {
_type = Token.INVALID_TYPE;
// System.out.println("nextToken line "+tokenStartLine+" at "+((char)input.LA(1))+
// " in mode "+mode+
// " at index "+input.index());
int ttype;
try {
ttype = getInterpreter().match(_input, _mode);
}
catch (LexerNoViableAltException e) {
notifyListeners(e); // report error
recover(e);
ttype = SKIP;
}
if ( _input.LA(1)==IntStream.EOF ) {
_hitEOF = true;
}
if ( _type == Token.INVALID_TYPE ) _type = ttype;
if ( _type ==SKIP ) {
continue outer;
}
} while ( _type ==MORE );
if ( _token == null ) emit();
return _token;
}
}
finally {
// make sure we release marker after match or
// unbuffered char stream will keep buffering
_input.release(tokenStartMarker);
}
}
The line that caught my eye was the following.
_channel = Token.DEFAULT_CHANNEL;
I don't know much about ANTLR, but apparently this line keeps the lexer in the DEFAULT_CHANNEL channel.
Is the way I tried to read from the HIDDEN_CHANNEL channel right or can't I use nextToken() with the hidden channel?
I found out why the lexer didn't give me any tokens containing the comments: I seem to have missed that the grammar file skips comments instead of putting them into the hidden channel. I contacted the author, changed the grammar file, and now it works.
Note to myself: pay more attention to what you read.
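For anyone landing here later: once the comment rules use channel(HIDDEN) instead of skip, the comments can be read from a buffered token stream. A minimal sketch, assuming the generated Mts3Lexer and the standard ANTLR4 runtime:
// Sketch: buffer every token and inspect the ones parked on the hidden channel.
Mts3Lexer lexer = new Mts3Lexer(new ANTLRInputStream("something = \"blah\" -- a comment"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill(); // runs the lexer to EOF and buffers all tokens, regardless of channel
for (Token token : tokens.getTokens()) {
    if (token.getChannel() == Token.HIDDEN_CHANNEL) {
        System.out.println("comment token: " + token.getText());
    }
}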
For Go (golang) this snippet works for me:
import (
"github.com/antlr/antlr4/runtime/Go/antlr"
)
type antlrparser interface {
GetParser() antlr.Parser
}
func fullText(prc antlr.ParserRuleContext) string {
p := prc.(antlrparser).GetParser()
ts := p.GetTokenStream()
tx := ts.GetTextFromTokens(prc.GetStart(), prc.GetStop())
return tx
}
Just pass your ctx.GetSomething() into fullText. Of course, as shown above, whitespace has to go to the hidden channel in the *.g4 file:
WS: [ \t\r\n] -> channel(HIDDEN);
I've been using the ANTLR-supplied ECMAScript grammar with the objective of identifying JavaScript global variables. An AST is produced, and I'm now wondering what the best way of filtering out the global variable declarations is.
I'm interested in finding all of the outermost "variableDeclaration" tokens in my AST; the actual how-to is eluding me, though. Here's my setup code so far:
String input = "var a, b; var c;";
CharStream cs = new ANTLRStringStream(input);
JavaScriptLexer lexer = new JavaScriptLexer(cs);
CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
JavaScriptParser parser = new JavaScriptParser(tokens);
program_return programReturn = parser.program();
Being new to ANTLR, can anyone offer any pointers?
I guess you're using this grammar.
Although that grammar suggests a proper AST is created, this is not the case. It uses some inline operators to exclude certain tokens from the parse-tree, but it never creates any roots for the tree, resulting in a completely flat parse tree. From this, you can't get all global vars in a reasonable way.
You'll need to adjust the grammar slightly:
Add the following under the options { ... } at the top of the grammar file:
tokens
{
VARIABLE;
FUNCTION;
}
Now replace the following rules: functionDeclaration, functionExpression and variableDeclaration with these:
functionDeclaration
: 'function' LT* Identifier LT* formalParameterList LT* functionBody
-> ^(FUNCTION Identifier formalParameterList functionBody)
;
functionExpression
: 'function' LT* Identifier? LT* formalParameterList LT* functionBody
-> ^(FUNCTION Identifier? formalParameterList functionBody)
;
variableDeclaration
: Identifier LT* initialiser?
-> ^(VARIABLE Identifier initialiser?)
;
Now a more suitable tree is generated. If you now parse the source:
var a = 1; function foo() { var b = 2; } var c = 3;
the following tree is generated (shown as an image in the original answer): the root's direct children are a VARIABLE node for a, a FUNCTION node for foo (with b's VARIABLE node nested underneath it), and a VARIABLE node for c.
All you now have to do is iterate over the children of the root of your tree and when you stumble upon a VARIABLE token, you know it's a "global" since all other variables will be under FUNCTION nodes.
Here's how to do that:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "var a = 1; function foo() { var b = 2; } var c = 3;";
ANTLRStringStream in = new ANTLRStringStream(source);
JavaScriptLexer lexer = new JavaScriptLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaScriptParser parser = new JavaScriptParser(tokens);
JavaScriptParser.program_return returnValue = parser.program();
CommonTree tree = (CommonTree)returnValue.getTree();
for(Object o : tree.getChildren()) {
CommonTree child = (CommonTree)o;
if(child.getType() == JavaScriptParser.VARIABLE) {
System.out.println("Found a global var: "+child.getChild(0));
}
}
}
}
which produces the following output:
Found a global var: a
Found a global var: c