grammar compiler compiler for Java

grammar compiler compiler for Java - java

My company is trying to write some software for Android. We would like to work with Java, and there is a component of the company's software that is c++ and so needs to be ported (or at least porting needs to be tried before trying NDK stuff). This code was created using Accent, and it defines a grammar grammar. As near as I can tell, the original writer (now gone) wrote a grammar to specify how to specify a grammar, then compiled a compiler-compiler with that grammar and Accent. The compiler-compiler takes a grammar of the specified format and produces a binary code to parse strings conforming to that grammar. Here's an example snippet of the grammar:
//include rules from from this file (such as <alpha>)
include "alphabet.bnf"
<<topSymbol>> = <alpha> <alpha> <alpha>? .//two letters with an optional third
//square brackets enclose an XML statement clarifying semantics of the rule
[
<topSymbol>
<letter>
<command val="doSomethingToLetter"/>
</letter>
<!--etc.-->
</topSymbol>
]
My question is how to do this with Java, using Antlr or some other tool. A compiler-compiler-compiler seems rather complicated to me. Alternatively, I would like to know how to easily compile/parse this type of grammar, which contains a grammatical and semantic XML information.

If the original designer knew what he was doing, and it is warranted, then you want to preserve that concept. Going with another parser generator (or at least a parsing scheme of some kind) is the right approach. Either JavaCC or ANTLR would be fine as parser generators; you'll have to hand-translate the grammar. You might hand code a recursive descent parser if the grammar is simple enough.
If the original designer was simply over the top, then you can probably replace the grammar-driven aspect, but you won't be able to do that without understanding what he was achieving. The fact that this "seems rather complicated to me" suggests you don't really understand parsing/parser generator technology, and you are driven by a desire to do something you understand than preserve something you don't. But its a bad idea to tear apart something that is well designed/implemented just because you don't understand it. I strongly suggest you learn more about these kinds of technologies, and ask why was it implemented this way? Ultimately you may be right and should replace his approach by something else, but make that choice based on knowledge, not fear.

My question is how to do this with Java, using Antlr or some other tool. A compiler-compiler-compiler seems rather complicated to me.
It sounds complicated to me too!
Alternatively, I would like to know how to easily compile/parse this type of grammar, which contains a grammatical and semantic XML information.
No ... there is no easy answer to this. It sounds like your ex-colleague has gone over the top on the complexity front. You are going to have to:
either get your head around what his code does, and how it does it, learn how Antlr works, and hand translate,
or ditch his code AND design and find a simpler way to do what it is doing.
Good luck!
(Actually, there is a good chance that the code is not as complicated as it seems ... once you get your head around it, and compiler-compiler technology.)

Your best bet is to translate the grammar you have into ANTLR or Java CC or some other tool.
Another possibility is to call your C++ code using JNI, but that's fraught with peril.
I'm not aware of anything that can help. You'll just have to get a shovel and start digging.

Related

PHP to Java (using PtoJ)

I would like to transition our codebase from poorly written PHP code to poorly written Java, since I believe Java code is easier to tidy up. What are the pros and cons, and for those who have done it yourselves, would you recommend PtoJ for a project of about 300k ugly lines of code? Tips and tricks are most welcome; thanks!

Poorly written PHP is likely to be very hard to convert because a lot of the bad stuff in PHP just doesn't exist in Java (the same is true vice versa though, so don't take that as me saying Java is better - I'm going to keep well clear of that flame-war).
If you're talking about a legacy PHP app, then its highly likely that your code contains a lot of procedural code and inline HTML, neither of which are going to convert well to Java.
If you're really unlucky, you'll have things like eval() statements, dynamic variable names (using $$ syntax), looped include() statements, reliance on the 'register_globals' flag, and worse. That kind of stuff will completely thwart any conversion attempt.
Your other major problem is that debugging the result after the conversion is going to be hell, even if you have beautiful code to start with. If you want to avoid regressions, you will basically need to go through the entire code base on both sides with a fine comb.
The only time you're going to get a satisfactory result from an automated conversion of this type is if you start with a reasonably tide code base, written at least mainly in up-to-date OOP code.
In my opinion, you'd be better off doing the refacting excersise before the conversion. But of course, given your question, that would rather defeat the point. Therefore my recommendation is to stick it in PHP. PHP code can be very good, and even bad PHP can be polished up with a bit of refactoring.
[EDIT]
In answer to #Jonas's question in the comments, 'what is the best way to refactor horrible PHP code?'
It really depends on the nature of the code. A large monolithic block of code (which describes a lot of the bad PHP I've seen) can be very hard (if not imposible) to implementunit tests for. You may find that functional tests are the only kind of tests you can write on the old code base. These would use Selenium or similar tools to run the code through the browser as if it were a user. If you can get a set of reliable functional tests written, it is good for helping you remain confident that you aren't introducing regressions.
The good news is that it can be very easy - and satisfying - to rip apart bad code and rebuild it.
The way I've approached it in the past is to take a two-stage approach.
Stage one rewrites the monolithic code into decent quality procedural code. This is relatively easy, and the new code can be dropped into place as you go. This is where the bulk of the work happens, but you'll still end up with procedural code. Just better procedural code.
Stage two: once you've got a critical mass of reasonable quality procedural code, you can then refactor it again into an OOP model. This has to wait until later, because it is typically quite hard to convert old bad quality PHP straight into a set of objects. It also has to be done in fairly large chunks because you'll be moving large amounts of code into objects all at once. But if you did a good job in stage one, then stage two should be fairly straightforward.
When you've got it into objects, then you can start seriously thinking about unit tests.

I would say that automatic conversion from PHP to Java have the following:
pros:
quick and dirty, possibly making happy some project manager concerned with short-time delivery (assuming that you're lucky and the automatically generated code works without too much debugging, which I doubt)
cons:
ugly code: I doubt that automatic conversion from ugly PHP will generate anything but ugly Java
unmaintainable code: the automatically generate code is likely to be unmaintainable, or, at least, very difficult to maintain
bad approach: I assume you have a PHP Web application; in this case, I think that the automatic translation is unlikely to use Java best practices for Web application, or available frameworks
In summary
I would avoid automatic translation from PHP to Java, and I woudl at least consider rewriting the application from the ground up using Java. Especially if you have a Web application, choose a good Java framework for webapps, do a careful design, and proceed with an incremental implementation (one feature of your original PHP webapp at a time). With this approach, you'll end up with cleaner code that is easier to maintain and evolve ... and you may find out that the required time is not that bigger that what you'd need to clean/debug automatically generated code :)

P2J appears to be offline now, but I've written a proof-of-concept that converts a subset of PHP into Java. It uses the transpiler library for SWI-Prolog:
:- use_module(library(transpiler)).
:- set_prolog_flag(double_quotes,chars).
:- initialization(main).
main :-
Input = "function add($a,$b){ print $a.$b; return $a.$b;} function squared($a){ return $a*$a; } function add_exclamation_point($parameter){return $parameter.\"!\";}",
translate(Input,'php','java',X),
atom_chars(Y,X),
writeln(Y).
This is the program's output:
public static String add(String a,String b){
System.out.println(a+b);
return a+b;
}
public static int squared(int a){
return a*a;
}
public static String add_exclamation_point(String parameter){
return parameter+"!";
}

In contrast to other answers here, I would agree with your strategy to convert "PHP code to poorly written Java, since I believe Java code is easier to tidy up", but you need to make sure the tool that you are using doesn't introduce more bugs than you can handle.
An optimum stategy would be:
1) Do automated conversion
2) Get an MVP running with some basic tests
3) Start using the amazing Eclipse/IntelliJ refractoring tool to make the code more readable.
A modern Java IDE can refactor code with zero bugs when done properly. It can also tell you which functions are never called and a lot of other inspections.
I don't know how "PtoJ" was, since their website has vanished, but you ideally want something that doesn't just translate the syntax, but the logic. I used php2java.com recently and it worked very well. I've also used various "syntax" converters (not just for PHP to Java, but also ObjC -> Swift, Java -> Swift), and even they work just fine if you put in the time to make things work after.
Also, found this interesting blog entry about what might have happened to numiton PtoJ (http://www.runtimeconverter.com/single-post/2017/11/14/What-happened-to-numition).

http://www.numiton.com/products/ntile-ptoj/translation-samples/web-and-db-access/mysql.html
Would you rather not use Hibernate ?

Evaluating creation of GUI via file vs coding

I'm working on a utility that will be used to test the project I'm currently working on. What the utility will do is allow user to provide various inputs and it will sends out requests and provide the response as output.
However, at this point the exact format (which input is required and what is optional) has yet to be fleshed out. In addition, coding in Swing is somewhat repetitive since the overall work is simple though this should be the safest route to go as I have more or less full control and every component can be tweaked as I want. I'm considering using a configuration file that's in XML to describe the GUI (at least one part of it) and then coding the event handling part (in addition to validation, etc). The GUI itself shouldn't be too complicated. For each type of request to make there's a tab for the request and within each tab are various inputs.
There seems to be quite a few questions about this already but I'm not asking for a 3rd party library to do this. I'm looking to do this myself, since I don't think it'll be too overly complicated (hopefully). My main consideration for using this is re-usability (later on, for other projects) and for simplifying the GUI work. My question is: are there other pros/cons that I'm overlooking? Is it worth the (unknown) time to do this?
I've built GUI in VB.NET and with Flex3 before.

XML is so 2000. It's code, put it in real source files. If it really is so simple that it could be XML, all you are doing is removing the XML handling step and using a clearer syntax. If it turns out to be a little more complicated than you first expected, then you have the full power of your favourite programming language to hand.

In my experience, if your layout really is simple, something like the non-visual builders in FormLayout can lead to really concise code with a minimum of repetition.

If you have to specify the precise location of every control you might look at a declarative swing helper toolkit that can minimize boilerplate and simplify layout. Groovy supports this as does JavaFX, and both are simple library extensions to Java (give or take).
If the form is laid out in a pattern, using a definition file in a format like XML or YAML will work. I've done that and have even set up data bindings in that file so that you don't even have to deal with listeners or initial values...
If you are sure you want XML, I'd seriously consider YAML though, it's really close but instead of:
<outer>
<inner a=1> abc </inner>
</outer>
I think it's a lot more like:
outer
inner a=1
abc
(I may have that a bit wrong, but that's close I think. Anyway, you should never force anyone to type XML--if you are set on XML, provide a GUI with which to edit it!)

xml.etree.ElementTree equivalent in Java

I've been doing quite a bit of simple XML-processing in python and grown to like the ElementTree way of doing things.
Is there something similar and as easy to use in Java? I find the DOM model a bit cumbersome and find myself writing much more code than I would like to do simple things.
Or am I asking the wrong thing?
Maybe my question is: Is there a better option than the "XMLUtils" classes I see people implementing in some places to simplify their code when dealing with DOM?
Adding a litte bit here about why I like ElementTree since the question was asked.
Simplicity (I guess anything seems simple after working with DOM though)
Feels like a natural fit in python
Requires very little code on my part.
I'm trying to come up with a simple code example to illustrate, but it's sort of hard to give a good example.
Here's an attempt though. This just adds a tag with a value and an attribute to an existing xml string.
from xml.etree.ElementTree import *
xml_string = '<top><sub a="x"></sub></top>'
parsed = fromstring(xmlstring)
se = SubElement(parsed, "tag")
se.text = "value"
se.attrib["a"] = "x"
new_xml_string = tostring(parsed)
After that, the new_xml_string is
<top><sub a="x" /><tag a="x">value</tag></top>
Not an example that really covers everything, but still. There's also the fairly simple looping over tags when you want to do stuff, easy testing for presence of tags and attributes and other things.

To be honest, all XML APIs in Java suck, you just can vary the level of suckage you push yourself into which may turn horrible/slow to manageable/decent to even suprisingly OK at times.
This all mostly stems from the fact that Java APIs try to be as W3C DOM compliant as possible, in fact Xerces (Java's current native XML solution) prides itself on being compliant to a whole bunch of XML related W3C specifications as you can see from their front page.
The actual Xerces API is very unpleasant to work with, though, and because of that multiple other Java XML libraries have popped out over the years. Currently most popular ones are
JDOM, simplifies DOM operations a lot and do I dare to say even pleasant at times, works like a charm when mixed with Jaxen - well, unless you hit this problem with namespaces.
XOM which has a wonderful presentation about what's wrong with Java's XML right now and how they propose their way of doing things as a solution. In part it is actually better than JDOM, but it's not widespread enough yet so can't really say how it behaves in the real world out there. Definitely worth a check though.
dom4j, well-rounded library, supports all kinds of important features and plays out as a down-to-earth solution for XML. dom4j is basically the "old, proven and reliable" option of the popular ones.
Last but definitely not least I just have to mention StAX just because it's different, it's actually event-driven streaming API for XML. Definitely worth a look just out of curiosity.
PS. I'm currently actually writing my own XML parser/navigator as an exercise but haven't decided on what kind of API it will have. I'm really aiming for ease of use which seems to be quite rare in Java XML APIs so far, but I'm not entirely sure what kind of API I am going to provide. Python's ElementTree seems interesting, but since I'm not entirely familiar with it, would you like to maybe give a short summary on what exactly in it you find enjoyable?

You might look into the following alternatives:
dom4j
xom
jdom
Since I never used ElementTree I don't know wich one is the closest.
If you can use Groovy inside your project, it offers a set of classes that helps a lot when processing XML.

We find XOM (http://www.xom.nu) to provide simple subclassable Element functionality.

It is true the Java XML APIs are not the greatest in terms of usability. My prefered options would be XOM, JDOM then the built in JAXP in that order. There were some rumbling about native XML in the language (Begin Product Tab Sub Links
Integrating XML into the Java Programming Language) as a new data-type but that seems to have stalled.

Syntax analysis question

In school we were assigned to design a language and then to implement it, (I'm having so much fun implementing it =)). My teacher told us to use yacc/lex, but i decided to go with java + regex API, here is how the the language I designed looks:
Program "my program"
var yourName = read()
if { equals("guy1" to yourName) }
print("hello my friend")
else
print("hello extranger")
end
Program End
Well, as you can see, its a pretty basic language =).
I thought I could implement it in a very OOP fashion, like make an abstract class Sentence and then have subclasses like VariableAssignment, IfSentence etc. and have a class Program which is only a bunch of sentences right? And then call an abstract method eval on all Sentences, so my initial approach to complie the language consisted only of two phases:
Identify syntax of seach line
Create the correspondig class for each line
of course, if something goes wrong on any phase Ii could raise an error.
My question is, am I doing it wrong? Should I go over all phases like the theory says (lexical, syntactical, semantical)? Should I continue with my naive two-phase compiler?

I won't ask the obvious question of why you're not following the advice of your instructor and using yacc/lex because I know the answer. You wanted to go off and do something that you thought was cool and would help you learn. Unfortunately, that approach was recommended by your professor because as another posted stated, a lot of very smart people before you have explored multiple approaches and spent vast quantities of time trying to find a good solution.
You can make a two-phase compiler work, but you will need to accept that it will never be as good as going through the full process because it's harder to detect errors. A lot harder in fact. In some cases, you won't even be able to tell that there's an error until it's too late. ie: already compiled and attempting to run.
If you want to learn a lot more about it, go with the two phase approach and you will run into the same problems that the people before you ran into. Just be sure to understand that it will take you a lot longer to get to a final solution, you might be docked points on your project, and it might not work right.
That said, you're going to learn more about it than anyone else in the class. If you have the time to spare, I'd do it the way you are now. The knowledge might come in handy down the road. I would also talk to your professor and tell him that you're going to do it another way against his recommendations because you want to have a more thorough understanding. Perhaps he won't knock points off from your project for being ambitious, even if it turns out wrong.
After all, the point of doing projects in college is to learn.

A lot of smart people thought about this, and from your post I take, they came to the conclusion that all the phases are needed.
So if you want your compiler to work, go the way the theory dictates.
If you want to understand, why it dictates the phases, try the short cut. It will probably take a lot longer.
Disclaimer: I have no idea about compiler theory
Another note: You have a problem; You decide to solve it using regexps; Now you have two problems

If you use regexes to parse each line your language would have a very limited syntax.
You would not be able to parse each line using just a regular expression API if your syntax becomes more complex. Even the if { equals("guy1" to yourName) } would become impossible to parse with regexes if you start adding AND and OR operators, and what would happen if you start supporting escape characters like \n in your string literals?
The Java Regex API would be able to help you with the lexical analyzer, but you would have to write the parser from there. You could take one of several approaches:
If you're using Java, you could look at Antlr (which negates the need for writing a lexicall analyzer with Java's regex library), or
You could write a recursive descent parser by hand
among others
(also, "Statement" is a synonym for "Sentence" that is more common in compiler texts)

If you want to use only regular expressions to parse your language, your language can only be regular. This is a big constriction, for example, arbitrarily deep nesting would be impossible, as you would have to teach your parser each nesting combination separately. I am not sure if building a Turing-complete regular language is even possible.

If u really want to dirty ur hands code a recursive descent parser. If you want to understand compiler theory use antlr and concentrate on the principles leaving the implementation for the parser generator.
BTW, why would wnat to complicate your life with regex?!

replacement project for existing school assignment

I have a school assignment which consists of programming a scanner/lexical analyzer for a specified simple language. The scanner has to be programmed in C++.
This type of assignment has been used since the 90's and, although still a valid excersise, I consider it to be a little antiquated and a little boring.
I have gotten permission to come up with a new programming assignment.
It has to be of equal difficulty and it can be in C++, Objective C or Java.
What direction should I go that has the same level of difficulty but is a little bit more modern and applicable to modern CS/life.
Thanks

This type of assignment... is considered to be a little antiquated and a little boring.
I'm curious: who considers this antiquated? Your professor? Somebody notable in the parsing community? Or you?
Scanners and parsers are still relevant to professional software development and, more importantly, relevant to the science of computation. If you wish to understand computers, then you should understand scanners and parsers.
Still, if you are convinced that you should do some other assignment, why not write a tool to generate a scanner in C++? You could supply, as input, a set of regular expressions that define the tokens of the grammar, and it would produce a C++ program that would recognize the input tokens. Then, you will never need to write a scanner ever again!

Why do you think that Lexers / Parsers are not relevant anymore? I find that I write something along those lines at least once a year.

As a software engineer, I would say whatever code you write during the CS courses would be the best ones that you may probably write in your life. Once you come into the industry, you will probably write only modules and not the entire thing.
Believe me. Once you come into the industry and has spend some time here, you will just want to write those compilers, assemblers, lexical analyzers. So please don't miss the chance.
I believe the challenges in writing this "boring" stuffs are just worth it and you will find them truly interesting once you start designing the stuff.

Writing a scanner/lexical analyzer was one of my favorite assignments. I would argue that it was also one of the most relevant. It is a real world problem.
My guess is that it does not feel modern because of the simple programming language you are scanning. I personally would change out the simple programming language for something like Markdown or Textile. Both of these are used in modern programming, and will teach you similar concepts.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.