Java - Abstract Syntax Tree with grammar


I am building a simple grammar parser with regular expressions. It works, but now I want to add an Abstract Syntax Tree (AST), and I still don't understand how to set it up. I have included the parser below.
The parser takes a string and tokenizes it with the lexer.
Each token holds a value and a type.
Any idea how to set up nodes to build an AST?
import java.util.Hashtable;

// lexer and Token are the asker's own classes and are not shown.
public class Parser {
    lexer lex;
    Hashtable<String, Integer> data = new Hashtable<String, Integer>();

    public Parser(String str) {
        String[] strpatt = {
            "[0-9]*\\.[0-9]+",         // 0: decimal number
            "[a-zA-Z_][a-zA-Z0-9_]*",  // 1: identifier
            "[0-9]+",                  // 2: integer
            "\\+",                     // 3
            "\\-",                     // 4
            "\\*",                     // 5
            "\\/",                     // 6
            "\\=",                     // 7
            "\\)",                     // 8
            "\\("                      // 9
        };
        // The second pattern presumably matches the whitespace to skip between tokens.
        lex = new lexer(strpatt, "[\\ \t\r\n]+");
        lex.set_data(str);
    }

    public int peek() {
        return lex.peek().type;
    }

    public boolean peek(String[] regex) {
        return lex.peek(regex);
    }

    public void set_data(String s) {
        lex.set_data(s);
    }

    public Token scan() {
        return lex.scan();
    }

    // Parses until peek() returns -1 (presumably end of input).
    // expr() and the other grammar methods are not shown in the question.
    public int goal() {
        int ret = 0;
        while (peek() != -1) {
            ret = expr();
        }
        return ret;
    }
}

Currently, you are simply evaluating as you parse:
ret = ret * term()
The easiest way to think of an AST is that it is just a different kind of evaluation. Instead of producing a numeric result from numeric sub-computations, as above, you produce a description of the computation from descriptions of the sub-computations. The description is represented as a small structure which contains the essential information:
ret = BuildProductNode(ret, term());
Or, perhaps
ret = BuildBinaryNode(Product, ret, term());
It's a tree because the Node objects which are being passed around refer to other Node objects without there ever being a cycle or a node with two different parents.
Clearly there are a lot of details missing from the above, particularly the precise nature of the Node object. But it's a rough outline.
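To make that outline concrete, here is a minimal sketch of what the Node objects might look like. All names (Node, Num, Var, Binary, AstSketch) are hypothetical, and the tiny character-by-character scanner stands in for the asker's lexer, which is not shown. The key change is that expr() and term() return nodes, so a line like ret = ret * term() becomes ret = new Binary(op, ret, factor()).

```java
import java.util.HashMap;
import java.util.Map;

public class AstSketch {
    // Every AST node can evaluate itself against a variable environment.
    interface Node {
        int eval(Map<String, Integer> env);
    }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
        public String toString() { return Integer.toString(value); }
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> env) { return env.get(name); }
        public String toString() { return name; }
    }

    // One node type for all binary operators; the operator is stored as data.
    static final class Binary implements Node {
        final char op;
        final Node left, right;
        Binary(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public int eval(Map<String, Integer> env) {
            int l = left.eval(env), r = right.eval(env);
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                default:  return l / r;
            }
        }
        public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // Hypothetical stand-in for the real lexer: scans the input one character at a time.
    private final String src;
    private int pos;

    AstSketch(String src) { this.src = src.replace(" ", ""); }

    // expr := term (('+' | '-') term)*
    Node expr() {
        Node ret = term();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            char op = src.charAt(pos++);
            ret = new Binary(op, ret, term());   // build a node instead of computing ret + term()
        }
        return ret;
    }

    // term := factor (('*' | '/') factor)*
    Node term() {
        Node ret = factor();
        while (pos < src.length() && (peek() == '*' || peek() == '/')) {
            char op = src.charAt(pos++);
            ret = new Binary(op, ret, factor());
        }
        return ret;
    }

    // factor := number | identifier | '(' expr ')'
    Node factor() {
        char c = peek();
        if (c == '(') {
            pos++;              // skip '('
            Node inner = expr();
            pos++;              // skip ')'
            return inner;
        }
        int start = pos;
        if (Character.isDigit(c)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new Num(Integer.parseInt(src.substring(start, pos)));
        }
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        return new Var(src.substring(start, pos));
    }

    private char peek() { return src.charAt(pos); }

    public static void main(String[] args) {
        Node tree = new AstSketch("2 * (x + 3) - 4").expr();
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 5);
        System.out.println(tree);           // ((2 * (x + 3)) - 4)
        System.out.println(tree.eval(env)); // 12
    }
}
```

Evaluation is now just one possible walk over the tree; printing, type checking, or code generation would be other walks over the same Node objects.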

Related

Sonarqube: How to get the expression string when writing custom java rules?

The target class is:
class Example{
public void m(){
System.out.println("Hello" + 1);
}
}
I want to get the full source text of the MethodInvocation, "System.out.println("Hello" + 1)", so that I can run a regex check on it. How can I do this?
public class Rule extends BaseTreeVisitor implements JavaFileScanner {
    @Override
    public void visitMethodInvocation(MethodInvocationTree tree) {
        //get the string of MethodInvocation
        //some regex check
        super.visitMethodInvocation(tree);
    }
}
I have written some code inspection rules with Eclipse JDT and IntelliJ IDEA PSI, whose expression tree nodes expose their source range or text directly. I wonder why SonarQube's trees only provide the first and last tokens instead.
Thanks!

An old question, but I have a solution.
This works for any sort of tree.
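@Override
public void visitMethodInvocation(MethodInvocationTree tree) {
    int firstLine = tree.firstToken().line();
    int lastLine = tree.lastToken().line();
    String rawText = getRelevantLines(firstLine, lastLine);
    // do your thing here with rawText
    super.visitMethodInvocation(tree);
}

// context is the JavaFileScannerContext saved in scanFile().
private String getRelevantLines(int startLine, int endLine) {
    // Token line numbers are 1-based, list indices are 0-based and subList's end is exclusive,
    // so lines startLine..endLine (inclusive) are subList(startLine - 1, endLine).
    StringBuilder builder = new StringBuilder();
    context.getFileLines().subList(startLine - 1, endLine)
            .forEach(line -> builder.append(line).append('\n'));
    return builder.toString();
}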
If you want to refine this further, you can also use firstToken().column(), or perhaps use the method name in your regex.
If you want more lines or a bigger scope, just use the parent of the tree: tree.parent().
This will also handle cases where the expression/params/etc span multiple lines.
There might be a better way... but I don't know of any other way. May update if I figure out something better.
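If you want to narrow the whole lines down to just the expression using columns, the slicing itself is plain Java. The sketch below is a hypothetical helper (slice is not part of the SonarQube API); it assumes 1-based line numbers, 0-based columns and an exclusive end column. In a rule you might feed it firstToken().line()/column() for the start and lastToken().line() with lastToken().column() + lastToken().text().length() for the end, but check those conventions against the sonar-java version you use.

```java
import java.util.Arrays;
import java.util.List;

public class SourceSlice {
    // Returns the text from (startLine, startCol) inclusive to (endLine, endCol) exclusive.
    // Assumption: lines are 1-based, columns are 0-based.
    static String slice(List<String> lines, int startLine, int startCol, int endLine, int endCol) {
        StringBuilder sb = new StringBuilder();
        for (int line = startLine; line <= endLine; line++) {
            String text = lines.get(line - 1);               // 1-based line -> 0-based index
            int from = (line == startLine) ? startCol : 0;   // only the first line is cut on the left
            int to = (line == endLine) ? endCol : text.length(); // only the last line is cut on the right
            sb.append(text, from, to);
            if (line != endLine) {
                sb.append('\n');                             // keep the original line breaks
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "class Example{",
                "    public void m(){",
                "        System.out.println(\"Hello\" + 1);",
                "    }",
                "}");
        // The invocation starts at column 8 of line 3; column 39 is the ';' that follows it.
        System.out.println(slice(lines, 3, 8, 3, 39)); // System.out.println("Hello" + 1)
    }
}
```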

Most efficient way to convert Enum values into comma-separated String

I have a Java class in which I store an enum (shown at the bottom of this question). In this enum, I have a method named toCommaSeperatedString() that returns a comma-separated String of the enum's values. I am using a StringBuilder after reading some information on performance in this question here.
Is the way I am converting this enum's values into a comma-separated String the most efficient way of doing so? If so, what would be the most efficient way to remove the extra comma at the end of the String?
For example, my method returns "123, 456, " (with a trailing comma), but I would prefer "123, 456". If I wanted to return "PROPERTY1, PROPERTY2", I could easily use the Apache Commons StringUtils.join(); however, I need to go one level deeper by calling getValue() on each constant while iterating over the values.
public class TypeEnum {
    public enum validTypes {
        PROPERTY1("123"),
        PROPERTY2("456");

        private String value;

        validTypes(String value) {
            this.value = value;
        }

        public String getValue() {
            return value;
        }

        public static boolean contains(String type) {
            for (validTypes msgType : validTypes.values()) {
                if (msgType.value.equals(type)) {
                    return true;
                }
            }
            return false;
        }

        public static String toCommaSeperatedString() {
            StringBuilder commaSeperatedValidMsgTypes = new StringBuilder();
            for (validTypes msgType : validTypes.values()) {
                commaSeperatedValidMsgTypes.append(msgType.getValue() + ", ");
            }
            return commaSeperatedValidMsgTypes.toString();
        }
    }
}

I wouldn't worry much about efficiency. It's simple enough to do this that it will be fast, provided you don't do it in a crazy way. If this is the most significant performance bottleneck in your code, I would be amazed.
I'd do it something like this:
return Arrays.stream(validTypes.values())
        .map(t -> t.value)
        .collect(Collectors.joining(", ")); // joining takes a CharSequence, not a char
Cache it if you want, but that's probably not going to make a huge difference.
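A self-contained version of that stream approach, assuming the enum keeps its value field as in the question (EnumJoinDemo is just an illustrative wrapper name). Collectors.joining puts the separator only between elements, so there is no trailing comma to strip.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class EnumJoinDemo {
    enum ValidTypes {
        PROPERTY1("123"),
        PROPERTY2("456");

        private final String value;

        ValidTypes(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }

        // The separator is inserted between elements only, never after the last one.
        static String toCommaSeparatedString() {
            return Arrays.stream(values())
                    .map(ValidTypes::getValue)
                    .collect(Collectors.joining(", "));
        }
    }

    public static void main(String[] args) {
        System.out.println(ValidTypes.toCommaSeparatedString()); // 123, 456
    }
}
```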

A common pattern I see for the trailing-comma problem is something like:
String[] values = {"A", "B", "C"};
boolean is_first = true;
StringBuilder commaSeperatedValidMsgTypes = new StringBuilder();
for (String value : values) {
    if (is_first) {
        is_first = false;
    } else {
        commaSeperatedValidMsgTypes.append(',');
    }
    commaSeperatedValidMsgTypes.append(value);
}
System.out.println(commaSeperatedValidMsgTypes.toString());
which results in
A,B,C
Combining this with the answers about using a static block to initialize a static final field will probably give the best performance.
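A sketch of that combination, assuming the question's enum values: the first-element flag runs once in a static initializer and the result is stored in a static final field. The class and field names are illustrative, and ", " is used as the separator to match the output the question asks for.

```java
public class FirstFlagEnumDemo {
    enum ValidTypes {
        PROPERTY1("123"),
        PROPERTY2("456");

        private final String value;
        private static final String COMMA_SEPARATED;

        // Runs once, after all enum constants have been constructed.
        static {
            StringBuilder sb = new StringBuilder();
            boolean isFirst = true;
            for (ValidTypes t : values()) {
                if (isFirst) {
                    isFirst = false;      // no separator before the first value
                } else {
                    sb.append(", ");
                }
                sb.append(t.value);
            }
            COMMA_SEPARATED = sb.toString();
        }

        ValidTypes(String value) {
            this.value = value;
        }

        static String toCommaSeparatedString() {
            return COMMA_SEPARATED;       // same cached instance on every call
        }
    }

    public static void main(String[] args) {
        System.out.println(ValidTypes.toCommaSeparatedString()); // 123, 456
    }
}
```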

The most efficient code is code that doesn't run. This result can never change, so run the code you have once, when the enum is initialized. Take the hit once, and return the calculated answer every other time somebody asks for it. The savings from doing that will be far greater in the long term than worrying about exactly how to construct the string, so use whatever is clearest to you (write code for humans to read).
For example:
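import java.util.Arrays;
import java.util.stream.Collectors;

public enum ValidTypes {
    PROPERTY1("123"),
    PROPERTY2("345");

    // Computed once, after the enum constants have been created.
    private final static String asString = calculateString();
    private final String value;

    private static String calculateString() {
        // Do your work here; for example:
        return Arrays.stream(values())
                .map(v -> v.value)
                .collect(Collectors.joining(", "));
    }

    ValidTypes(final String value) {
        this.value = value;
    }

    public static String toCommaSeparatedString() {
        return asString;
    }
}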

If you have to call this static method thousands and thousands of times in a short period, you may worry about performance, but you should first check that it actually has a measurable cost.
The JVM performs many optimizations at runtime.
So you could end up writing more complex code with no added value.
Anyway, what you should actually do is store the String returned by toCommaSeperatedString and return the same instance every time.
Enum values are constants, so caching the result is not a problem.
You could use a static initializer that assigns a static String field.
As for the trailing , character, just remove it after the loop.
public enum validTypes {
    PROPERTY1("123"), PROPERTY2("456");

    private final String value;
    private static final String valueSeparatedByComma;

    static {
        StringBuilder commaSeperatedValidMsgTypes = new StringBuilder();
        for (validTypes msgType : validTypes.values()) {
            commaSeperatedValidMsgTypes.append(msgType.getValue());
            commaSeperatedValidMsgTypes.append(",");
        }
        // Remove the trailing comma (safe because the enum has at least one constant).
        commaSeperatedValidMsgTypes.deleteCharAt(commaSeperatedValidMsgTypes.length() - 1);
        valueSeparatedByComma = commaSeperatedValidMsgTypes.toString();
    }

    validTypes(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    public static String getvalueSeparatedByComma() {
        return valueSeparatedByComma;
    }
}

I usually add a static method on the enum class itself:
public enum Animal {
    CAT, DOG, LION;

    public static String possibleValues() {
        return Arrays.stream(Animal.values())
                .map(Enum::toString)
                .collect(Collectors.joining(","));
    }
}
So I can use it like String possibleValues = Animal.possibleValues();

Collect HashSet / Java 8 / Regex Pattern / Stream API

I recently switched my project from JDK 7 to JDK 8, and now I am rewriting some code snippets using the new features that came with Java 8.
final Matcher mtr = Pattern.compile(regex).matcher(input);
HashSet<String> set = new HashSet<String>() {{
while (mtr.find()) add(mtr.group().toLowerCase());
}};
How can I write this code using the Stream API?

A Matcher-based spliterator implementation can be quite simple if you reuse the JDK-provided Spliterators.AbstractSpliterator:
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.Consumer;
import java.util.regex.Matcher;

public class MatcherSpliterator extends AbstractSpliterator<String[]>
{
    private final Matcher m;

    public MatcherSpliterator(Matcher m) {
        super(Long.MAX_VALUE, ORDERED | NONNULL | IMMUTABLE);
        this.m = m;
    }

    @Override public boolean tryAdvance(Consumer<? super String[]> action) {
        if (!m.find()) return false;
        final String[] groups = new String[m.groupCount()+1];
        for (int i = 0; i <= m.groupCount(); i++) groups[i] = m.group(i);
        action.accept(groups);
        return true;
    }
}
Note that the spliterator provides all matcher groups, not just the full match. Also note that this spliterator supports parallelism because AbstractSpliterator implements a splitting policy.
Typically you will use a convenience stream factory:
public static Stream<String[]> matcherStream(Matcher m) {
return StreamSupport.stream(new MatcherSpliterator(m), false);
}
This gives you a powerful basis to concisely write all kinds of complex regex-oriented logic, for example:
private static final Pattern emailRegex = Pattern.compile("([^,]+?)@([^,]+)");

public static void main(String[] args) {
    final String emails = "kid@gmail.com, stray@yahoo.com, miks@tijuana.com";
    System.out.println("User has e-mail accounts on these domains: " +
            matcherStream(emailRegex.matcher(emails))
                    .map(gs->gs[2])
                    .collect(joining(", ")));
}
Which prints
User has e-mail accounts on these domains: gmail.com, yahoo.com, tijuana.com
For completeness, your code will be rewritten as
Set<String> set = matcherStream(mtr).map(gs->gs[0].toLowerCase()).collect(toSet());

Marko's answer demonstrates how to get matches into a stream using a Spliterator. Well done, give that man a big +1! Seriously, make sure you upvote his answer before you even consider upvoting this one, since this one is entirely derivative of his.
I have only a small bit to add to Marko's answer, which is that instead of representing the matches as an array of strings (with each array element representing a match group), the matches are better represented as a MatchResult which is a type invented for this purpose. Thus the result would be a Stream<MatchResult> instead of Stream<String[]>. The code gets a little simpler, too. The tryAdvance code would be
if (m.find()) {
    action.accept(m.toMatchResult());
    return true;
} else {
    return false;
}
The map call in his email-matching example would change to
.map(mr -> mr.group(2))
and the OP's example would be rewritten as
Set<String> set = matcherStream(mtr)
.map(mr -> mr.group(0).toLowerCase())
.collect(toSet());
Using MatchResult gives a bit more flexibility in that it also provides offsets of match groups within the string, which could be useful for certain applications.
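Since Java 9 (newer than the Java 8 the question targets), Matcher.results() returns a Stream<MatchResult> directly, which is essentially the stream built by hand above. A minimal sketch, assuming Java 9 or later; the class and method names are illustrative:

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MatcherResultsDemo {
    // Collects every match of regex in input, lower-cased, into a Set.
    static Set<String> lowerCaseMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input)
                .results()                            // Java 9+: Stream<MatchResult>
                .map(mr -> mr.group().toLowerCase())  // mr.start()/mr.end() give offsets too
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Set iteration order is unspecified, so the printed order may vary.
        System.out.println(lowerCaseMatches("[A-Za-z]+", "Foo bar FOO baz"));
    }
}
```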

I don't think you can turn this into a Stream without writing your own Spliterator, but I don't know why you would want to.
Matcher.find() is a state-changing operation on the Matcher object, so running each find() in a parallel stream would produce inconsistent results. Running the stream serially wouldn't perform better than the Java 7 equivalent and would be harder to understand.

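What about Pattern.splitAsStream?
Stream<String> stream = Pattern.compile(regex).splitAsStream(input);
and then a collector to get a set (note that splitAsStream yields the text between matches, not the matches themselves, so here regex must describe the separators rather than the words you want):
Set<String> set = stream.map(String::toLowerCase).collect(Collectors.toSet());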

What about
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;

public class MakeItSimple {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner s = new Scanner(new File("C:\\Users\\Admin\\Desktop\\TextFiles\\Emails.txt"));
        HashSet<String> set = new HashSet<>();
        while (s.hasNext()) {
            String r = s.next();
            if (r.matches("([^,]+?)@([^,]+)")) {
                set.add(r);
            }
        }
        set.stream().map(x -> x.toUpperCase()).forEach(System.out::println);
        s.close();
    }
}

Here is the implementation using Spliterator interface.
// To get the required set
Set<String> result = (StreamSupport.stream(new MatcherGroupIterator(pattern,input ),false))
.map( s -> s.toLowerCase() )
.collect(Collectors.toSet());
...
private static class MatcherGroupIterator implements Spliterator<String> {
    private final Matcher matcher;

    public MatcherGroupIterator(Pattern p, String s) {
        matcher = p.matcher(s);
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (!matcher.find()) {
            return false;
        }
        action.accept(matcher.group());
        return true;
    }

    @Override
    public Spliterator<String> trySplit() {
        return null;
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE;
    }

    @Override
    public int characteristics() {
        return Spliterator.NONNULL;
    }
}

Cyclic string replacements

Say I had the string "foo1bar2" and I wanted to perform the following replacements in parallel, with an expected output of "bar1foo2":
foo => bar
bar => foo
The string cannot be tokenized as the substrings might occur anywhere, any number of times.
A naive approach would be to replace like this; however, it fails because the second replacement undoes the first.
String output = input.replace("foo", "bar").replace("bar", "foo");
=> foo1foo2
or
String output = input.replace("bar", "foo").replace("foo", "bar");
=> bar1bar2
I'm not sure regex can help me here either. This isn't homework, by the way, just geeky interest. I've tried googling this, but I'm unsure how to describe the problem.

Try first replacing "foo" with something else that won't occur anywhere else in the String. Then replace "bar" with "foo" then replace the temporary replacement from step 1 with "bar".
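A minimal sketch of that three-step swap. The placeholder is an assumption: it uses the private-use character U+E000 on the basis that it won't appear in ordinary text, and the method refuses input that already contains it. The class and method names are illustrative.

```java
public class PlaceholderSwap {
    // Hypothetical placeholder: a Unicode private-use character assumed absent from the input.
    private static final String PLACEHOLDER = "\uE000";

    static String swap(String input, String a, String b) {
        if (input.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("input already contains the placeholder");
        }
        return input.replace(a, PLACEHOLDER)   // step 1: park every occurrence of a
                    .replace(b, a)             // step 2: b -> a
                    .replace(PLACEHOLDER, b);  // step 3: parked a -> b
    }

    public static void main(String[] args) {
        System.out.println(swap("foo1bar2", "foo", "bar")); // bar1foo2
    }
}
```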

I actually like Code-Guru's answer better, but since you said it's just a curiosity, here's a recursive solution. The idea is to isolate just the piece of the string that you are replacing and recurse on the rest so we don't accidentally replace something that we already did. Now if two of your rules have a common prefix, you may have to do some ordering of your rules to get the desired results, but here goes:
public class ParallelReplace
{
public String replace(String s, Rule... rules)
{
return runRule(s, 0, rules);
}
private String runRule(String s, int curRule, Rule... rules)
{
if (curRule == rules.length)
{
return s;
}
else
{
Rule r = rules[curRule];
int index = s.indexOf(r.lhs);
if (index != -1)
{
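// The left part contains no r.lhs, so only later rules apply to it. The right part may
// contain more occurrences of r.lhs, so it is processed with the same rule again.
// Skip r.lhs.length() characters (the matched text), not r.rhs.length().
return runRule(s.substring(0, index), curRule + 1, rules) + r.rhs
+ runRule(s.substring(index + r.lhs.length()), curRule, rules);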
}
else
{
return runRule(s, curRule + 1, rules);
}
}
}
public static class Rule
{
public String lhs;
public String rhs;
public Rule(String lhs, String rhs)
{
this.lhs = lhs;
this.rhs = rhs;
}
}
public static void main(String[] args)
{
String s = "foo1bar2";
ParallelReplace pr = new ParallelReplace();
System.out.println(pr.replace(s, new Rule("foo", "bar"), new Rule("bar", "foo")));
}
}
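As for whether regex can help: one single-pass option is to build an alternation of all the search strings and replace each match via a lookup, using Matcher.appendReplacement/appendTail. Because the scan never revisits replaced text, foo and bar swap cleanly. This is a sketch; the class and method names are illustrative, and it assumes a non-empty map with non-empty keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexSwap {
    // Assumes rules is non-empty and no key is the empty string.
    static String replaceAll(String input, Map<String, String> rules) {
        // Longer keys first, so e.g. "foobar" wins over "foo" at the same position.
        String alternation = rules.keySet().stream()
                .sorted((x, y) -> y.length() - x.length())
                .map(Pattern::quote)                  // treat keys as literal text
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement stops '$' or '\' in a replacement being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(rules.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("foo", "bar");
        rules.put("bar", "foo");
        System.out.println(replaceAll("foo1bar2", rules)); // bar1foo2
    }
}
```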

Looking to associate strings to ints in a cleaner/more efficient way

How can I improve this?
The relationship is one-to-one and continuous on [-1,5], so I was thinking of using an enum, but I'm not sure how to compare a string value to an enum value.
If there is any better way to do this, please suggest.
Thanks!
private int evaluateWord(String sval) {
    if (sval.equals("program"))
        return 1;
    else if (sval.equals("begin"))
        return 2;
    else if (sval.equals("end"))
        return 3;
    else if (sval.equals("int"))
        return 4;
    else if (sval.equals("if"))
        return 5;
    else
        System.exit(0);
    return -1; // not reached, but the compiler requires a return statement
}

Have you considered stuffing the mapping into a HashMap once, and then just querying the map?
For example, something like this:
private static final Map<String,Integer> m_map = new HashMap<String,Integer>();
static {
    m_map.put( "program", 1 );
    m_map.put( "begin", 2 );
    m_map.put( "end", 3 );
    m_map.put( "int", 4 );
    m_map.put( "if", 5 );
}

private int evaluateWord(String sval) {
    Integer value = m_map.get( sval );
    if ( null != value ) {
        return value;
    }
    System.exit(0);
    return -1; // not reached, but the compiler requires a return statement
}
By the way, it looks as if you're writing a parser. It can be reasonable to write a parser by hand. Another option to consider, unless you have a good reason to write it by hand, is a parser generator like ANTLR.

Using an enumeration:
enum Word {
PROGRAM(1,"program"),
BEGIN(2,"begin"),
END(3,"end"),
INT(4,"int"),
IF(5,"if");
private final int value;
private final String representation;
Word(int value, String representation)
{
this.value = value;
this.representation = representation;
}
public int value()
{ return value; }
private static final Map<String, Word> fromRep =
new HashMap<String, Word>();
public static Word fromRepresentation(String rep) {
if (!validRep(rep)) {
throw new IllegalArgumentException("No rep: "+rep);
}
return fromRep.get(rep);
}
public static boolean validRep(String rep)
{ return fromRep.get(rep) != null; }
static {
for (Word word : Word.values()) {
fromRep.put(word.representation, word);
}
}
}
Then your logic is:
private int evaluateWord(String sval) {
if (!Word.validRep(sval)) {
System.exit(0);
}
return Word.fromRepresentation(sval).value();
}

A hashmap could work:
private static HashMap<String, Integer> lookup = new HashMap<String, Integer>();
static {
    lookup.put("program", 1);
    lookup.put("begin", 2);
    lookup.put("end", 3);
    lookup.put("int", 4);
    lookup.put("if", 5);
}

private int evaluateWord(String sval) {
    if ( lookup.containsKey(sval) ) {
        return lookup.get(sval);
    }
    System.exit(0);
    return -1; // not reached, but the compiler requires a return statement
}

This is what a map is for:
Create a HashMap and add keys and values to it, like
wordMap.put("program", Integer.valueOf(1));
...
then, to get the value, do
Integer val = wordMap.get(sval);

Honestly, I wouldn't worry about keeping something like this ultra efficient, but there is a change you could make. If the word you pass is the last word you check for then your program ends up performing all of the checks in your function. This shouldn't be a problem in this case, but generally you don't want to flood your program with if statements, especially if you have a lot of cases.
Use a hash table and just insert the pairs. This way, all of your evaluateWord calls will return in (expected) constant time. :)
Good luck!

Why do you need a (very subjective) "cleaner" way?
You could get more efficiency from using a hash lookup but you'd want to be certain it's called quite a bit to make the extra coding effort worthwhile. If it's something that happens infrequently (and, by that, I mean something like less than once a second), it's not worth doing (YAGNI).
One thing you might want to do for better looking code (if that's important) is to ditch the else bits, they're totally unnecessary:
private int evaluateWord(String sval) {
    if (sval.equals("program")) return 1;
    if (sval.equals("begin")) return 2;
    if (sval.equals("end")) return 3;
    if (sval.equals("int")) return 4;
    if (sval.equals("if")) return 5;
    System.exit(0);
    return -1; // not reached, but the compiler requires a return statement
}

You could just use an array or hashmap to map the enum values to the string values.
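A sketch of that idea, with illustrative names: an array indexed by ordinal() maps each enum value to its spelling, and a reverse HashMap built once from that array maps a spelling back to the enum. It throws for unknown words instead of calling System.exit, which is a design choice of this sketch, not of the question.

```java
import java.util.HashMap;
import java.util.Map;

public class WordLookup {
    enum Word { PROGRAM, BEGIN, END, INT, IF }

    // Enum value -> source spelling, indexed by Word.ordinal().
    private static final String[] SPELLING = {"program", "begin", "end", "int", "if"};

    // Source spelling -> enum value, built once from the array above.
    private static final Map<String, Word> BY_SPELLING = new HashMap<>();
    static {
        for (Word w : Word.values()) {
            BY_SPELLING.put(SPELLING[w.ordinal()], w);
        }
    }

    // Returns 1..5 for a known word (ordinal + 1); throws for anything else.
    static int evaluateWord(String sval) {
        Word w = BY_SPELLING.get(sval);
        if (w == null) {
            throw new IllegalArgumentException("Unknown word: " + sval);
        }
        return w.ordinal() + 1;
    }

    public static void main(String[] args) {
        System.out.println(evaluateWord("begin")); // 2
    }
}
```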

Inspired by your enum comment, I present the following. It's a bit hackish, but:
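import java.util.Locale;

enum Word
{
    PROGRAM (1), BEGIN (2), END (3), INT (4), IF (5);
    public int value;
    Word (int value)   // enum constructors cannot be declared public
    {
        this.value = value;
    }
};
int evaluateWord (String word)
{
    // Locale.ROOT avoids locale-specific case mapping (e.g. the Turkish dotted I in "if" and "int").
    return Word.valueOf(word.toUpperCase(Locale.ROOT)).value;
}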
I love Java enums because you can do things like this. This is especially useful if you later want to (for example) add a unique behaviour for each word, or to maintain a long list of words. Note, though, that it is case-insensitive, and that Word.valueOf throws an IllegalArgumentException for a word that isn't one of the constants.
Or, alternately:
enum Word
{
    PROGRAM, BEGIN, END, INT, IF;
};
int evaluateWord (String word)
{
    // Requires import java.util.Locale; see the note on Locale.ROOT above.
    return Word.valueOf(word.toUpperCase(Locale.ROOT)).ordinal( ) + 1;
}
