Find specific node in parse tree depending on children it has


To automate a security review of C# code, I want to retrieve all methods from controllers that do have a [HttpPost] attribute, but do not have a [ValidateAntiForgeryToken] attribute. I am using ANTLR to get a ParseTree of the C# code. When I have that, what is the best way to obtain the nodes that have a HttpPost child but not a ValidateAntiForgeryToken child?
I have tried XPath, but it seems ANTLR only supports a subset of XPath. I am considering converting the parse tree to XML and use real XPath on it. Is there an easier way?
I am using the following code to parse the C# file:
import java.io.*;
import java.util.*;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
import org.antlr.v4.runtime.tree.xpath.*;
public class MyParser {
public static void main(String[] args) throws IOException {
CharStream input = CharStreams.fromFileName(args[0]);
Lexer lexer = new CSharpLexer(input);
TokenStream stream = new CommonTokenStream(lexer);
CSharpParser parser = new CSharpParser(stream);
ParseTree tree = parser.compilation_unit();
String xpath = "//class_member_declaration";
Collection<ParseTree> matches = XPath.findAll(tree, xpath, parser);
System.out.println(matches);
}
}
The tree looks like this:

Antlr4 does not support fancy matching on the ParseTree besides a subset of XPath. However, that is also probably the wrong way to solve this problem.
For most use cases, you should walk through the parse tree and collect the information you want. This can be done using a listener or a visitor. For example, the following code collects methods and attributes and prints methods that have certain attributes:
import java.util.*;
public class MyListener extends CSharpParserBaseListener {
String currentClass = null;
String currentMethod = null;
List<String> attributes;
boolean inClassMember = false;
@Override public void enterClass_definition(CSharpParser.Class_definitionContext ctx) {
this.currentClass = ctx.identifier().getText();
}
// Class member declaration. This thing holds both the attributes and the method declaration.
@Override public void enterClass_member_declaration(CSharpParser.Class_member_declarationContext ctx) {
this.attributes = new ArrayList<String>();
this.inClassMember = true;
}
@Override public void enterAttribute(CSharpParser.AttributeContext ctx) {
if (this.inClassMember) {
String attrName = ctx.namespace_or_type_name().identifier().get(0).getText();
this.attributes.add(attrName);
}
}
@Override public void enterMethod_declaration(CSharpParser.Method_declarationContext ctx) {
this.currentMethod = ctx.method_member_name().identifier().get(0).getText();
}
// In the exit we have collected our method name and attributes.
@Override public void exitClass_member_declaration(CSharpParser.Class_member_declarationContext ctx) {
if (this.attributes.contains("HttpPost") && !this.attributes.contains("ValidateAntiForgeryToken")) {
System.out.println(this.currentClass + "." + this.currentMethod);
}
this.attributes = null;
this.currentMethod = null;
this.inClassMember = false;
}
}
To make this more versatile, a better approach would be to convert the parse tree to another tree (e.g. an abstract syntax tree) and search that tree for the information you want.
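As a rough illustration of searching such a simplified tree, here is a minimal sketch. It assumes a listener or visitor has already reduced the parse tree to one small node per method; the MethodNode and AstQuery names are hypothetical and not part of ANTLR or the C# grammar.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified AST node: one entry per method,
// assumed to be filled in by a listener or visitor over the parse tree.
final class MethodNode {
    final String className;
    final String methodName;
    final Set<String> attributes;

    MethodNode(String className, String methodName, Set<String> attributes) {
        this.className = className;
        this.methodName = methodName;
        this.attributes = attributes;
    }

    String qualifiedName() {
        return className + "." + methodName;
    }
}

final class AstQuery {
    // Returns the qualified names of methods that carry `required` but lack `forbidden`.
    static List<String> withButWithout(List<MethodNode> methods, String required, String forbidden) {
        return methods.stream()
                .filter(m -> m.attributes.contains(required))
                .filter(m -> !m.attributes.contains(forbidden))
                .map(MethodNode::qualifiedName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the result of a tree walk.
        List<MethodNode> methods = List.of(
                new MethodNode("AccountController", "Login", Set.of("HttpPost", "ValidateAntiForgeryToken")),
                new MethodNode("AccountController", "Delete", Set.of("HttpPost")),
                new MethodNode("HomeController", "Index", Set.of("HttpGet")));
        System.out.println(withButWithout(methods, "HttpPost", "ValidateAntiForgeryToken"));
    }
}
```

Once the data is in this shape, other checks (for example, a different pair of attributes) become one-line queries instead of new listener code.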

I am considering converting the parse tree to XML and use real XPath on it. Is there an easier way?
There is no need to convert a tree to actual XML in order to use XPath queries. The Apache Commons library JXPath supports XPath queries on in-memory trees of Java objects.

Related

Sharing Antlr visitor code between visitors

I have two grammar files/visitors, Simple and Complex, that parse JSON objects into strings. The Complex objects that I'm parsing can essentially contain a number of Simple objects (along with additional things). For simplicity's sake let's say that when I parse a base Simple object (not a Simple object contained within the Complex object) I want to start the string with something like "Simple start: ", but when I reach a Simple object within a Complex object I want to start it with something else, say "Simple within Complex: ".
So currently I have two different visitor classes: the Simple visitor's visitSimpleObject method returns the String starting with "Simple Start: ", whereas the Complex visitor's visitSimpleObject method returns the String starting with "Simple within Complex: ". Besides this difference everything else should be the same; everything else within a Simple object can be parsed the same way whether it is on its own or inside a Complex object.
My question is, how can I share code between these two visitors? Obviously I could copy and paste all the applicable SimpleVisitor code into the ComplexVisitor but then I'll have to keep them in sync for any changes.
Note: The two visitor classes already extend a BaseVisitor class so I can't use typical inheritance

You can use inheritance to do this, see for example:
https://docs.oracle.com/javase/tutorial/java/IandI/subclasses.html
A short example would be something like:
abstract class Visitor{
public void sharedMethod() {
//Do something
}
public abstract void visitSimpleObject();
}
class SimpleVisitor extends Visitor{
@Override
public void visitSimpleObject() {
System.out.println("Simple Start:");
}
}
class ComplexVisitor extends Visitor{
@Override
public void visitSimpleObject() {
System.out.println("Simple within Complex:");
}
}
In this case both your visitors would extend from the same superclass (Visitor) where the shared code is. Both subclasses can define their own behavior. The superclass can itself also extend another kind of visitor (or implement an interface).
EDIT
After the comments maybe more like this:
Sample Ex.g4:
grammar Ex;
START_COMPLEX : 'complex';
START_SIMPLE : 'simple';
SEPERATOR : ':';
TEXT : [A-Za-z]+;
simple : START_SIMPLE ' ' SEPERATOR ' ' TEXT;
complex : START_COMPLEX ' ' SEPERATOR ' ' TEXT;
And a code sample:
public class Example{
abstract class Visitor extends ExBaseVisitor<String>{
@Override
public String visitComplex(ExParser.ComplexContext ctx) {
System.out.println("Visiting complex");
return "";
}
}
class SimpleVisitor extends Visitor{
@Override
public String visitSimple(ExParser.SimpleContext ctx) {
System.out.println("Visiting Simple! " + ctx.TEXT());
return "";
}
}
class ComplexVisitor extends Visitor{
@Override
public String visitSimple(ExParser.SimpleContext ctx) {
System.out.println("Visiting Simple, from within complex! " + ctx.TEXT());
return "";
}
}
public static void main(String[] args) {
String text = "simple : hi";
CharStream charStream = new ANTLRInputStream(text);
ExLexer exLexer = new ExLexer(charStream);
TokenStream tokenStream = new CommonTokenStream(exLexer);
ExParser exParser = new ExParser(tokenStream);
ComplexVisitor complexVisitor = new Example().new ComplexVisitor();
complexVisitor.visit(exParser.simple());
String text2 = "simple : hmmmmm";
CharStream charStream2 = new ANTLRInputStream(text2);
ExLexer exLexer2 = new ExLexer(charStream2);
TokenStream tokenStream2 = new CommonTokenStream(exLexer2);
ExParser exParser2 = new ExParser(tokenStream2);
SimpleVisitor simpleVisitor = new Example().new SimpleVisitor();
simpleVisitor.visit(exParser2.simple());
}
}
For me this prints:
Visiting Simple, from within complex! hi
Visiting Simple! hmmmmm
The shared code is now visitComplex, which is a bit contrived but good enough for the example.
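If the only difference between the two visitors is the prefix, the same idea can be taken one step further by putting the whole visit method in the shared base class and leaving only the prefix to the subclasses. The following is a minimal sketch of that variant; the class names and the String parameter are hypothetical stand-ins, and in real code the base class would extend the generated ANTLR base visitor and take a context object.

```java
// Hypothetical stand-in for the shared base class.
abstract class SharedSimpleVisitor {
    // Hook: the only part that differs between the two visitors.
    protected abstract String prefix();

    // Shared logic, written once and inherited by both visitors.
    public String visitSimpleObject(String body) {
        return prefix() + body.trim();
    }
}

class StandaloneSimpleVisitor extends SharedSimpleVisitor {
    @Override
    protected String prefix() {
        return "Simple start: ";
    }
}

class NestedSimpleVisitor extends SharedSimpleVisitor {
    @Override
    protected String prefix() {
        return "Simple within Complex: ";
    }
}

class SharedVisitorDemo {
    public static void main(String[] args) {
        System.out.println(new StandaloneSimpleVisitor().visitSimpleObject(" hi "));
        System.out.println(new NestedSimpleVisitor().visitSimpleObject(" hi "));
    }
}
```

With this layout, any change to how a Simple object is processed is made in one place only.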

java parse tweet corpus json

I have a problem: I need to parse a JSON file in Java where each line represents a tweet and follows Twitter's standard JSON format. I do not need all the information; I attach two photos to show which fields I need. I would like to do it without using any support library. Thank you!
This is what I have done so far. I do not think it's the best way to do it, and going forward I will run into trouble because the names of many fields repeat.
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class TweetCorpus implements Iterable<Tweet>
{
private List<Tweet> tweets;
public static TweetCorpus parseFile(File file)
{
List<Tweet> tweets = new ArrayList<>();
try(BufferedReader br = Files.newBufferedReader(file.toPath()))
{
while(br.ready())
{
String tweet = br.readLine();
//System.out.println(tweet);
if(!tweet.isEmpty())
{
long l = Long.parseLong(tweet.substring(tweet.indexOf("\"id\":") + 5, tweet.indexOf(",\"id_str\":")));
String t = tweet.substring(tweet.indexOf(",\"text\":\"") + 9, tweet.indexOf(",\"source\":"));
tweets.add(new Tweet(l, t));
}
}
}
catch(IOException e)
{
e.printStackTrace();
}
return new TweetCorpus(tweets);
}
public int getTweetCount() { return tweets.size(); }
public TweetCorpus(List<Tweet> tweets)
{
this.tweets = tweets;
}
@Override
public Iterator<Tweet> iterator()
{
return tweets.iterator();
}
public static void main(String[] args)
{
TweetCorpus t = parseFile(new File("C:\\Users\\acer\\Desktop\\Moroder\\Uni\\1 Anno - 2 Semestre\\Metodologie Di Programmazione\\Progetto\\HM4Test\\tweetsCorpus.js"));
t.getTweetCount();
}
}
json media/retweet tweet
json "normal" tweet

You can use the Gson or Jackson Java library to parse the JSON into a Tweet object. There are online tools that generate a POJO from JSON, which you can then use with Jackson to parse your JSON string into an object.
Once you have the JSON values in an object, you can use getters and setters to extract or modify the values you are interested in.
Writing your own parser would be reinventing the wheel. But if you do need to write your own parser, refer to the Jackson project on GitHub for inspiration on design and maintenance.
This would help you in making a generic application.
Quick reference for the Jackson parser:
https://dzone.com/articles/processing-json-with-jackson

Re-inventing a JSON parser using only readLine() is a really bad idea. If you don't have experience writing parsers by hand, you will end up with a lot of bad code that is really hard to understand. Just use a library. There are tons of good JSON libraries for Java.
Jackson
GSON
Boon
Example code:
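// objectMapper is a Jackson com.fasterxml.jackson.databind.ObjectMapper.
// Real tweets contain many more fields than these classes declare, so unknown properties are ignored.
static final ObjectMapper objectMapper = new ObjectMapper()
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
static class User {
public String id, name; // public so that Jackson's default field detection picks them up
}
static class MyTweet {
public String id, text;
public User user;
}
// if the entire file is a JSON array:
List<MyTweet> parse(Reader r) throws IOException {
return objectMapper.readValue(r, new TypeReference<List<MyTweet>>(){});
}
// if each line is a single JSON object:
List<MyTweet> parse(BufferedReader r) throws IOException {
List<MyTweet> tweets = new ArrayList<>();
String line;
while ((line = r.readLine()) != null) {
if (!line.isEmpty()) {
tweets.add(objectMapper.readValue(line, MyTweet.class));
}
}
return tweets;
}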

Strategy Pattern too many if statements

A user enters a code, and the type of that code is determined by regular expressions. There are many different types of codes, such as EAN, ISBN, ISSN and so on. After the type is detected, a custom query has to be created for the code. I thought it might be a good idea to create a strategy for each type, but over time it has started to feel wrong.
public interface SearchQueryStrategie {
SearchQuery createSearchQuery(String code);
}
-
public class IssnSearchQueryStrategie implements SearchQueryStrategie {
@Override
public SearchQuery createSearchQuery(final String code) {
// Create search query for issn number
}
}
-
public class IsbnSearchQueryStrategie implements SearchQueryStrategie {
@Override
public SearchQuery createSearchQuery(final String code) {
// Create search query for ISBN number
}
}
-
public class EanShortNumberSearchQueryStrategie implements SearchQueryStrategie {
@Override
public SearchQuery createSearchQuery(final String code) {
// Create search query for ean short number
}
}
-
public class TestApplication {
public static void main(final String... args) {
final String code = "1144875X";
SearchQueryStrategie searchQueryStrategie = null;
if (isIssn(code)) {
searchQueryStrategie = new IssnSearchQueryStrategie();
} else if (isIsbn(code)) {
searchQueryStrategie = new IsbnSearchQueryStrategie();
} else if (isEan(code)) {
searchQueryStrategie = new EanShortNumberSearchQueryStrategie();
}
if (searchQueryStrategie != null) {
performSearch(searchQueryStrategie.createSearchQuery(code));
}
}
private SearchResult performSearch(final SearchQuery searchQuery) {
// perform search
}
// ...
}
I have to say that there are many more strategies. How should I dispatch the code to the right strategy?
My second approach was to put a boolean method into every strategy to decide if the code is correct for that strategy.
public class TestApplication {
static final SearchQueryStrategie[] searchQueryStrategies = {new IssnSearchQueryStrategie(), new IsbnSearchQueryStrategie(),
new EanShortNumberSearchQueryStrategie()};
public static void main(final String... args) {
final String code = "1144875X";
for (final SearchQueryStrategie searchQueryStrategie : searchQueryStrategies) {
if (searchQueryStrategie.isRightCode(code)) {
searchQueryStrategie.createSearchQuery(code);
break;
}
}
}
private SearchResult performSearch(final SearchQuery searchQuery) {
// perform search
}
// ...
}
How would you solve this problem? Is the strategy pattern the right one for my purposes?

If you are using Java 8 and you can profit from the functional features I think one Enum will be sufficient.
You can avoid using if/else statements by mapping each type of code with a Function that will return the query that needs to be executed:
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
public enum CodeType
{
EAN("1|2|3"),
ISBN("4|5|6"),
ISSN("7|8|9");
String regex;
Pattern pattern;
CodeType(String regex)
{
this.regex = regex;
this.pattern = Pattern.compile(regex);
}
private static Map<CodeType, Function<String, String>> QUERIES =
new HashMap<>();
static
{
QUERIES.put(EAN, (String code) -> String.format("Select %s from EAN", code));
QUERIES.put(ISBN, (String code) -> String.format("Select %s from ISBB", code));
QUERIES.put(ISSN, (String code) -> String.format("Select %s from ISSN", code));
}
private static CodeType evalType(String code)
{
for(CodeType codeType : CodeType.values())
{
if (codeType.pattern.matcher(code).matches())
return codeType;
}
// TODO DON'T FORGET ABOUT THIS NULL HERE
return null;
}
public static String getSelect(String code)
{
Function<String, String> function = QUERIES.get(evalType(code));
return function.apply(code);
}
}
And in the main you can test your query:
public class Main
{
public static void main(String... args)
{
System.out.println(CodeType.getSelect("1"));
// System.out: Select 1 from EAN
System.out.println(CodeType.getSelect("4"));
// System.out: Select 4 from ISBB
System.out.println(CodeType.getSelect("9"));
// System.out: Select 9 from ISSN
}
}
I usually tend to keep the code as compact as possible.
Some people dislike enums, so I believe you can use a normal class instead.
You can engineer further the way you obtain the QUERIES (selects), so instead of having String templates you can have a Runnable there.
If you don't want to use the functional aspects of Java 8, you can use Strategy objects that are associated with each type of code:
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
public enum CodeType2
{
EAN("1|2|3", new StrategyEAN()),
ISBN("4|5|6", new StrategyISBN()),
ISSN("7|8|9", new StrategyISSN());
String regex;
Pattern pattern;
Strategy strategy;
CodeType2(String regex, Strategy strategy)
{
this.regex = regex;
this.pattern = Pattern.compile(regex);
this.strategy = strategy;
}
private static CodeType2 evalType(String code)
{
for(CodeType2 codeType2 : CodeType2.values())
{
if (codeType2.pattern.matcher(code).matches())
return codeType2;
}
// TODO DON'T FORGET ABOUT THIS NULL HERE
return null;
}
public static void doQuery(String code)
{
evalType(code).strategy.doQuery(code);
}
}
interface Strategy { void doQuery(String code); }
class StrategyEAN implements Strategy {
@Override
public void doQuery(String code)
{
System.out.println("EAN-" + code);
}
}
class StrategyISBN implements Strategy
{
@Override
public void doQuery(String code)
{
System.out.println("ISBN-" + code);
}
}
class StrategyISSN implements Strategy
{
@Override
public void doQuery(String code)
{
System.out.println("ISSN-" + code);
}
}
And the main method will look like this:
public class Main
{
public static void main(String... args)
{
CodeType2.doQuery("1");
CodeType2.doQuery("4");
CodeType2.doQuery("9");
}
}

So, the strategy pattern is indeed the right choice here, but strategy by itself is not enough. You have several options:
Use a Factory with simple if/else or switch. It's ugly and error-prone to extend with new strategies, but it is simple and quick to implement.
Use a registry. During the application initialization phase you can register each SearchQueryStrategyFactory in a registry with the right code. For instance, if you use a simple Map you can just do:
strategyRegistry.put("isbn", new IsbnSearchStrategyFactory());
strategyRegistry.put("ean", new EanSearchStrategyFactory());
.... and so on
Then, when you need the right strategy, you just get() the strategy factory from the map using the code id. This approach is better if you have a lot of strategies, but it requires an additional initialization step during application startup.
Use a service locator. ServiceLocator is a pattern that enables the dynamic lookup of implementations. Java comes with an implementation of the ServiceLocator pattern -> the infamous ServiceLoader class. This is my favourite approach because it allows for complete decoupling of the consumer and implementation. Also using the service locator you can easily add new strategies without having to modify the existing code. I won't explain how to use the ServiceLoader - there is plenty of information online. I'll just mention that using the service locator you'll need to implement a "can process such codes ?" logic in each strategy factory. For instance if the factory cannot create a strategy for "isbn" then return null and try with the next factory.
Also note that in all cases you work with factories that produce the strategy implementations.
PS: It's strategy not strategie :)
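The registry option can be sketched roughly as follows. This is only an illustration under stated assumptions: QueryRegistry is a hypothetical class, the query is represented as a plain String instead of a SearchQuery, entries are matched by regular expression rather than looked up by a code id, and the patterns are placeholder shape checks rather than real ISSN or EAN validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical registry: pairs a pattern that recognizes a code type
// with a function that builds the query for that type.
final class QueryRegistry {
    private static final class Entry {
        final Pattern pattern;
        final Function<String, String> queryFactory;

        Entry(Pattern pattern, Function<String, String> queryFactory) {
            this.pattern = pattern;
            this.queryFactory = queryFactory;
        }
    }

    // Registration order is preserved, so earlier entries win if patterns overlap.
    private final List<Entry> entries = new ArrayList<>();

    void register(String regex, Function<String, String> queryFactory) {
        entries.add(new Entry(Pattern.compile(regex), queryFactory));
    }

    // Returns the query of the first matching entry, or empty if no type matches.
    Optional<String> createQuery(String code) {
        for (Entry entry : entries) {
            if (entry.pattern.matcher(code).matches()) {
                return Optional.of(entry.queryFactory.apply(code));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        QueryRegistry registry = new QueryRegistry();
        // Placeholder patterns, checked against shape only (no checksum).
        registry.register("\\d{7}[\\dX]", code -> "issn:" + code);
        registry.register("\\d{13}", code -> "ean:" + code);
        System.out.println(registry.createQuery("1144875X").orElse("no matching code type"));
    }
}
```

Adding a new code type then means one more register call at startup, and the dispatch loop stays unchanged.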

Your approach is not the Strategy Pattern. The Strategy Pattern is all about customizing the behavior of an object (the Context, in terms of this pattern) by passing an alternative Strategy object to it. This way, we don't need to modify the source code of the Context class but can still customize the behavior of objects instantiated from it.
Your problem is somewhat related to the Chain of Responsibility (CoR) Pattern where you have a request (your code) and need to figure out which SearchQueryStrategie in a predefined list should handle the request.
The second approach you mentioned (using an array) is fine. However, to make it usable in production code, you need another object, say a Manager, that manages the array and is responsible for finding the relevant element for each request. So your client code has to depend on two objects: the Manager and the resulting SearchQueryStrategie. As you can see, the source code of the Manager class tends to change frequently because new implementations of SearchQueryStrategie may be added. This might annoy your client.
That's why the CoR pattern uses a linked list instead of an array. Each SearchQueryStrategie object A holds a reference to the next SearchQueryStrategie B. If A cannot handle the request, it delegates to B (it can even decorate the request before delegating). Of course, some part of the code must still know all the strategies and build the linked list of SearchQueryStrategie objects, but your client then depends only on a single SearchQueryStrategie object (the head of the list).
Here is the code example:
class SearchQueryConsumer {
public void consume(SearchQuery sq) {
// ...
}
}
abstract class SearchQueryHandler {
protected SearchQueryHandler next = null;
public void setNext(SearchQueryHandler next) { this.next = next; }
public abstract void handle(String code, SearchQueryConsumer consumer);
}
class IssnSearchQueryHandler extends SearchQueryHandler {
@Override
public void handle(String code, SearchQueryConsumer consumer) {
if (issn(code)) {
consumer.consume(/* create a SearchQuery */);
} else if (next != null) {
next.handle(code, consumer);
}
}
private boolean issn(String code) { ... }
}

What I recommend is using the Factory pattern. It describes and handles your scenario better.
Factory Pattern

You can design in the following way (using concepts of factory DP and polymorphism):
1. Code as an interface.
2. ISSNCode, ISBNCode and EANCode as concrete classes implementing the Code interface, each with a single-argument constructor taking the text as a String.
3. Code has a method getInstanceOfCodeType(String text) that returns an instance of a subclass of Code (decided by checking the type of the text passed to it). Call the returned value code.
4. A class SearchQueryStrategieFactory with a getSearchQueryStrategie(code) method. It consumes the returned value from step 3, creates an instance of the matching SearchQueryStrategie subclass with the new operator, and returns it.
So you only need to call two methods, getInstanceOfCodeType(text) and getSearchQueryStrategie(code), from anywhere.
Instead of implicitly implementing the factory inside main, keep the whole factory code separate so that it is easy to maintain and extend.
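The design described above might look roughly like the sketch below. Several details are assumptions made for illustration: the AbstractCode helper class, the use of a static interface method for getInstanceOfCodeType, the String return type standing in for SearchQuery, and the regular expressions, which only check the shape of a code and not its checksum.

```java
// The code type as an interface, with a static factory method.
interface Code {
    String text();

    // Placeholder shape checks only; real validation would also verify checksums.
    static Code getInstanceOfCodeType(String text) {
        if (text.matches("\\d{7}[\\dX]")) return new ISSNCode(text);
        if (text.matches("\\d{9}[\\dX]")) return new ISBNCode(text);
        if (text.matches("\\d{13}")) return new EANCode(text);
        throw new IllegalArgumentException("Unknown code type: " + text);
    }
}

// Hypothetical helper that holds the text for all concrete code classes.
abstract class AbstractCode implements Code {
    private final String text;

    AbstractCode(String text) {
        this.text = text;
    }

    @Override
    public String text() {
        return text;
    }
}

final class ISSNCode extends AbstractCode { ISSNCode(String text) { super(text); } }
final class ISBNCode extends AbstractCode { ISBNCode(String text) { super(text); } }
final class EANCode extends AbstractCode { EANCode(String text) { super(text); } }

// Returns a String here as a stand-in for the SearchQuery type of the question.
interface SearchQueryStrategie {
    String createSearchQuery(String code);
}

final class SearchQueryStrategieFactory {
    // Picks the strategy that belongs to the concrete type of the code.
    static SearchQueryStrategie getSearchQueryStrategie(Code code) {
        if (code instanceof ISSNCode) return c -> "issn=" + c;
        if (code instanceof ISBNCode) return c -> "isbn=" + c;
        if (code instanceof EANCode) return c -> "ean=" + c;
        throw new IllegalArgumentException("No strategy for " + code.getClass().getSimpleName());
    }
}

class CodeFactoryDemo {
    public static void main(String[] args) {
        Code code = Code.getInstanceOfCodeType("1144875X");
        SearchQueryStrategie strategie = SearchQueryStrategieFactory.getSearchQueryStrategie(code);
        System.out.println(strategie.createSearchQuery(code.text()));
    }
}
```

The caller touches only the two factory methods, so new code types are added inside the factories without changing client code.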

How to parse a method or any other valid expression using JavaParser

JavaParser is a java source code parsing tool. I read the documents but found it only able to parse source of a full java class, like:
public class X {
public void show(String id) {
Question q = Quesiton.findById(id);
aaa.BBB.render(q);
}
}
But I want to parse only part of it, e.g. the method declaration:
public void show(String id) {
Question q = Quesiton.findById(id);
aaa.BBB.render(q);
}
How can I do this, or is it even possible? If not, is there another tool that can do this?
Update
Actually I want to parse any valid expression in java, so I can do this easily with JavaParser:
CompilationUnit unit = JavaParser.parse(codeInputStream);
addField(unit, parseField("public String name")); // not possible now

I see you can include the method in a dummy class and parse it as usual. For the example you provided, enclose it inside:
public class DUMMY_CLASS_FOO {
// The string you're trying to parse (method or whatever)
}
Then you parse it as usual and ignore the dummy class when processing the result.
UPDATE:
For the update you provided, you can use try/catch: if parsing with the previous enclosure fails, enclose the snippet like this instead:
public class DUMMY_CLASS_FOO {
public void DUMMY_METHOD_FOO() {
// Anything here
}
}
You might also search for Access Level Modifiers (public, private, ...etc), and if found, do the 1st solution. If not found, do the other one.
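The modifier check described above could be sketched like this. It only builds the wrapped source text and does not call JavaParser itself; SnippetWrapper is a hypothetical helper, and the leading-modifier test is a rough heuristic that will misclassify members declared without an access modifier.

```java
import java.util.regex.Pattern;

// Hypothetical helper that wraps a code fragment so a whole-class parser accepts it.
final class SnippetWrapper {
    // Rough heuristic: a leading access modifier suggests a member declaration.
    private static final Pattern LEADING_MODIFIER =
            Pattern.compile("^\\s*(public|protected|private)\\b");

    static String wrap(String snippet) {
        if (LEADING_MODIFIER.matcher(snippet).find()) {
            // Field or method: a dummy class is enough.
            return "public class DUMMY_CLASS_FOO {\n" + snippet + "\n}";
        }
        // Statement: it also needs a dummy method around it.
        return "public class DUMMY_CLASS_FOO {\n"
                + "public void DUMMY_METHOD_FOO() {\n" + snippet + "\n}\n}";
    }

    public static void main(String[] args) {
        System.out.println(wrap("public String name;"));
        System.out.println(wrap("aaa.BBB.render(q);"));
    }
}
```

The wrapped string can then be handed to the parser as a complete compilation unit, and the dummy class and method are skipped when reading the result.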

how to remove attribute from html tags in java

I wanted to remove the particular attribute from anchor tag:
<a id="nav-askquestion" style="cursor:default" href="/questions">
output:-
<a href="/questions">
through java program

We use htmlparser for this kind of job.
You can parse and modify nodes with this untested snippet:
NodeVisitor visitor = new NodeVisitor() {
public void visitTag(Tag tag) {
tag.removeAttribute("id");
tag.removeAttribute("style");
}
};
Parser parser = new Parser(...);
parser.visitAllNodesWith(visitor);

This little snippet will do the trick.
Ask if you have any questions about the regex.
public class test {
public static void main(String[] args) {
String htmlFragment ="<a id=\"nav-askquestion\" style=\"cursor:default\" href=\"/questions\">";
String attributesToRemove = "id|style";
System.out.println(htmlFragment);
System.out.println(cleanHtmlFragment(htmlFragment, attributesToRemove));
}
private static String cleanHtmlFragment(String htmlFragment, String attributesToRemove) {
return htmlFragment.replaceAll("\\s+(?:" + attributesToRemove + ")\\s*=\\s*\"[^\"]*\"","");
}
}

People might suggest using a regex, but be careful with that; you can use an XML parser instead.
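A minimal sketch of the XML parser route, using only the JDK's built-in DOM and transformer classes, is shown below. It assumes the fragment is well-formed XML, which means the anchor tag must be closed (the fragment in the question is not); AttributeStripper is a hypothetical name, and only the root element is processed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: removes the named attributes from the root element.
final class AttributeStripper {
    static String strip(String xml, String... attributeNames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            for (String name : attributeNames) {
                root.removeAttribute(name); // has no effect if the attribute is absent
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Could not process fragment", e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<a id=\"nav-askquestion\" style=\"cursor:default\" href=\"/questions\">Ask</a>";
        System.out.println(strip(fragment, "id", "style"));
    }
}
```

For real-world HTML, which is often not well-formed XML, a dedicated HTML parser is likely a better fit than this approach.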
