I need to develop a C parser in order to extract function names, macros and their definitions. My approach was not to start from scratch, but to hook into a C editor such as Geany, which already parses functions and macros.
Maybe a simple API to such an editor would cover all my requirements. I have googled it, but most of the solutions are to use JavaCC or some other parser ...
Since editors already do this job, that would be easier than going through the pain of building a grammar.
This approach would be simple, but I am unable to find any editor with an API that can be accessed from Java.
What you are looking for is an existing parser generator.
You could look at:
ANTLR
Lex
Yacc
JavaCC
I've already used lex, flex, yacc, bison etc. But nothing can beat Perl for doing it. Moreover Perl regular expressions can be used in Java, PHP.
At the very least, use Perl-like regular expressions to get it done rather than writing it in yacc, which is very difficult to maintain; the same job can easily be done in a few lines of Perl or PHP.
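As a rough illustration of the regular-expression route in Java (java.util.regex accepts most Perl-style syntax), the sketch below pulls macro names and function-definition names out of a C source string. The class name CScan and both patterns are made up for this example. They assume conventionally formatted code and will miss or misreport anything produced by macro expansion, hidden in comments, or written in an unusual layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CScan {
    // "#define NAME ..." at the start of a line; group 1 is NAME.
    private static final Pattern MACRO =
        Pattern.compile("(?m)^\\s*#\\s*define\\s+(\\w+)");

    // Heuristic only: return-type words, a name, a parameter list, then "{".
    // Prototypes ending in ";" are deliberately not matched.
    private static final Pattern FUNCTION =
        Pattern.compile("(?m)^[A-Za-z_][\\w\\s\\*]*?\\b(\\w+)\\s*\\([^;{}()]*\\)\\s*\\{");

    static List<String> macros(String src) { return firstGroups(MACRO, src); }

    static List<String> functions(String src) { return firstGroups(FUNCTION, src); }

    // Collects capture group 1 of every match, in source order.
    private static List<String> firstGroups(Pattern p, String src) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(src);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "#define MAX_LEN 64\n"
                   + "static int add(int a, int b) {\n"
                   + "    return a + b;\n"
                   + "}\n"
                   + "void run(void);\n";
        System.out.println(macros(src));    // [MAX_LEN]
        System.out.println(functions(src)); // [add]
    }
}
```

This is fine for a quick inventory of a tidy code base, but it is pattern matching, not parsing.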
Another possibility could be to develop a GCC plugin or a MELT extension to customize the GCC compiler for your needs. (MELT is a domain specific language that I developed to easily extend GCC).
The advantage of customizing GCC for your purposes is that you'll work on the exact internal representations of GCC. However, GCC being complex, extending it requires some work (in particular, partly understanding the complex GCC internal representations and passes).
(This is possible for functions, variables and classes, but perhaps not for macros with GCC 4.7 today, since the GCC preprocessor doesn't yet have any plugin hooks.)
And I am not sure you are right in believing that Geany has a complete C parser. I believe it has some regexp-based thing, which e.g. ignores any preprocessor tricks. I don't think that Geany is aware of e.g. functions or variables created by expanding complex macros (like some GTK implementation macros, for instance).
There are several IDEs and programmers' editors with C parsers, written in Java. So getting at them shouldn't be too horrific (famous last words :-)
Eclipse CDT which has several books on how to write and use plugins/extensions
NetBeans
to mention just two. They both have active user communities who might be able to help too.
Their C editors have a pretty good grasp of C syntax, because they can fold functions. Eclipse's C editor keeps track of definitions, and I think NetBeans does too.
Personally, if I needed to parse C to get function bodies, and the code were syntactically correct, I don't think it would be too hard to use parser-development tools. IIRC ANTLR might have a C grammar already.
Related
I need to implement two tools for a single DSL: a UI editor in Java and an interpreter in C/C++. My first idea was to use ANTLR, since it can generate parsers for both Java and C/C++. But all the ANTLR examples that I've seen contain some language-specific code or settings.
Is there any way to generate two parsers for a single DSL?
Does it even make sense to generate two parsers from a single grammar?
Are there any commonly used approaches to this problem?
bison can produce C++ and Java parsers, at least according to the documentation (I've never used the Java interface, and only used the C++ interface once, but I'm told that they work). The grammar will not be a problem, but the actions will be, particularly since you're presumably doing different things in the two parsers, and not just using different languages. But you should be able to make every action a simple $$ = method($1, $2, ...); statement.
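On the Java side, keeping every action down to a single method call could look roughly like the sketch below. The Actions interface and its two implementations are hypothetical names invented for this illustration; with them, a grammar action would be something like `$$ = actions.add($1, $3);` and nothing else. The reduceSample method only stands in for what a generated parser would call while reducing "1 + 2 * 3".

```java
class ThinActions {
    // Hypothetical action interface: one method per grammar rule.
    interface Actions<T> {
        T number(String text);
        T add(T left, T right);
        T mul(T left, T right);
    }

    // Interpreter-style actions: compute a value directly.
    static final class Eval implements Actions<Integer> {
        public Integer number(String text) { return Integer.parseInt(text); }
        public Integer add(Integer left, Integer right) { return left + right; }
        public Integer mul(Integer left, Integer right) { return left * right; }
    }

    // Editor-style actions: build a printable tree instead.
    static final class Show implements Actions<String> {
        public String number(String text) { return text; }
        public String add(String left, String right) { return "(add " + left + " " + right + ")"; }
        public String mul(String left, String right) { return "(mul " + left + " " + right + ")"; }
    }

    // Stand-in for the calls a generated parser would make for "1 + 2 * 3".
    static <T> T reduceSample(Actions<T> a) {
        return a.add(a.number("1"), a.mul(a.number("2"), a.number("3")));
    }

    public static void main(String[] args) {
        System.out.println(reduceSample(new Eval())); // 7
        System.out.println(reduceSample(new Show())); // (add 1 (mul 2 3))
    }
}
```

The C++ parser would call an equivalent class with the same method names, so only the one-line actions differ between the two grammar files.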
bison doesn't use the C(++) preprocessor (and it couldn't really, because it's common to put preprocessor directives into the bison input files), but you could use some other macro system -- I hesitate to recommend m4 but it would work if you know how to use it -- or a shell script to assemble the different input files.
The other possibility would be to just create an AST in the parser. You could use any parser generator, including Antlr or bison, to build the AST parser in C or C++, and then wrap the result for use with JNI for Java. If you use Antlr, you can produce an AST generator with very little language-specific code, so with a simple macro processor, you could build native AST parsers in both C++ and Java, I think. But that depends on your language being fairly simple.
I don't know if there is a "commonly used approach" for this problem, but it's certainly a problem that comes up pretty regularly; lots of grammars get shared between different projects, but from what I've seen, the most common approach is to cut-and-paste the grammar, and rewrite the actions. I've done the macro approach a couple of times, and it can be made to work but it's never as elegant as you'd like it to be.
You can try yacc and jacc.
http://web.cecs.pdx.edu/~mpj/jacc/
http://dinosaur.compilertools.net/#yacc
They have very similar syntax; maybe, with the help of some
hand-made preprocessing tool, you can use one source file.
PS
But why not write the parser once in C++ and use it via JNI?
You certainly can use ANTLR. The language specific parts are actions or predicates. If you don't need them then you won't have any language specific stuff in the grammar. Btw. regardless of the parser generator you use (including yacc, bison etc.) you would always have language specific stuff in the grammar if you need that.
I work with a language similar to JavaScript that is used for point-of-sale device programming. This language really s*cks and I'm trying to build some kind of framework in Java that "converts" Java code into this language.
I did this using some regexes and parsed the Java files directly. Now I have found that this may not be the right or best way, and I'm searching for alternatives. Are there any tools that would help me do this?
I thought I should use some advanced reflection utilities like ASM (http://asm.ow2.org/index.html). Performance is not crucial, so that may be the way.
What do you think?
ANTLR is a terrific parser-generator. I'd look into it. It has a Java grammar already available; I'm not sure if it's Java 5, 6, or 7 (I'm guessing it's 5).
Once you have the AST, your problem will be walking the tree and generating the target code. Good luck.
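To give an idea of what the walking and generating step involves, here is a minimal sketch with a hand-made tree. The node classes (Num, Var, Call, Assign) and the JavaScript-like output syntax are assumptions made up for this example. A real ANTLR-generated tree would have its own node types, and the actual target language will differ.

```java
import java.util.Arrays;
import java.util.List;

class TinyGen {
    // Every node knows how to emit itself as target-language text.
    interface Node { String emit(); }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public String emit() { return Integer.toString(value); }
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        public String emit() { return name; }
    }

    static final class Call implements Node {
        final String function;
        final List<Node> args;
        Call(String function, Node... args) {
            this.function = function;
            this.args = Arrays.asList(args);
        }
        public String emit() {
            StringBuilder sb = new StringBuilder(function).append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(args.get(i).emit()); // recurse into children
            }
            return sb.append(')').toString();
        }
    }

    static final class Assign implements Node {
        final String target;
        final Node value;
        Assign(String target, Node value) { this.target = target; this.value = value; }
        public String emit() { return "var " + target + " = " + value.emit() + ";"; }
    }

    public static void main(String[] args) {
        Node tree = new Assign("total", new Call("add", new Var("price"), new Num(2)));
        System.out.println(tree.emit()); // var total = add(price, 2);
    }
}
```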
I suggest parsing the Java syntax with JavaCC or a similar tool; a Java grammar description for it was written a long time ago. It can be used to write a compiler, so it can probably also be used to write a converter. Regular expressions are not very good at parsing programming languages.
I've never done anything with it myself, but you could take a look at the frameworks listed at altjs.org, specifically under the Java Ports section, pick one of them and modify it to your specific needs.
There are at least three ways:
a) Interpret the bytecode. There are some existing interpreters in JS, e.g. DoppioVM. They can be very slow.
b) Compile bytecode to JS. I've seen at least one such attempt and the resulting JS was ugly and not very fast. But this approach can have a good performance (well, it may result in using HashMap instead of JS object and so on). The biggest issue is IMHO while/if reconstruction.
EDIT: OK, it is possibly not so slow, but it is ugly and contains garbage like j2js.invokeStatic("j2js.client.Engine", "getEngine()j2js.client.Engine", null);. The compiler I saw was https://github.com/decatur/j2js-compiler .
c) Compile Java to JS. You can try Google Web Toolkit or http://j2s.sourceforge.net/ .
I am making some kind of sandbox engine in which the end users can write scripts to make their world dynamic, etc. Currently I am only looking at LuaJava, because I have quite some experience with Lua and find it a very readable, easy language. But I also understand that it might be a bad idea to choose based only on personal preference; after all, Lua is meant to be embedded into C, so performance won't be the best, I imagine.
But after a look at some of alternatives (Groovy, Clojure) I find the syntax just unreadable/too abstract, Lua was my first programming experience and even that was quite hard to 'get' at first, I'm afraid these languages would just have scared the crap out of me and I would never have looked at scripting again.
Are there scripting languages that can be embedded in Java that compete with Lua on simplicity?
Edit
My problem with JavaScript and JPython is all the braces etc.; to a starting user, symbols tend to look 'hard'. Also, for Python there is the concept of objects that the user needs to comprehend, which isn't that useful in this case.
func = function(arg)
    print(arg)
end
Is so simple...
I think JavaScript is very simple, and it can be embedded in Java quite easily via Rhino. Scripts can be both pre-compiled, or compiled on-the-fly via the javax.script classes, which are used to connect to script engines for the Java platform (of which Rhino is one).
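A minimal sketch of the javax.script route follows. The class name ScriptHost is made up for the example, and whether a JavaScript engine is present depends on the JVM: older JDKs bundle Rhino or Nashorn, while newer ones may need an engine added to the classpath. For that reason the helper returns null instead of failing when no engine is found.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class ScriptHost {
    /**
     * Evaluates a script with the named engine and returns its last value,
     * or null when this JVM has no engine registered under that name.
     */
    static Object eval(String engineName, String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null; // no such engine installed
        }
        try {
            return engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalStateException("script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Object result = eval("JavaScript",
            "var twice = function(arg) { return arg * 2; }; twice(21);");
        System.out.println(result == null ? "no JavaScript engine on this JVM" : result);
    }
}
```

The same ScriptEngineManager lookup works for any other engine that registers itself with javax.script, so the host code does not have to change if the scripting language does.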
If you like Lua, though, there's a Lua for Java project called — *cough* — Kahlua. They list "Fast runtime of the most common operations" as one of their goals.
Edit: Re your edit, I'm not immediately seeing why this:
func = function(arg)
    print(arg)
end
is substantially easier to understand than this:
func = function(arg) {
    print(arg);
};
...which is the literal translation from Lua to JavaScript, pre-supposing a function called print exists on your platform. I would normally write that like this instead:
function func(arg) {
    print(arg);
}
...but the other way is fine for most purposes.
But you should use what you're comfortable with.
You should try Jython and JRuby
I have been asked to develop software that can create a flow chart (control flow) of input Java source code. So I started researching it and arrived at the following:
To create the flow chart/control flow, I have to recognize the control statements and function calls made in the given source code. Now I have two ways of recognizing:
Parse the Source code by writing my own grammars (A complex solution I think). I am thinking to use Antlr for this.
Read input source code files as text and search for the specific patterns (May become inefficient)
Am I right here? Or am I missing something very fundamental and simple? Which approach would take less time and do the work efficiently? Any other suggestions in this regard are welcome too. Any other efficient approach would help, because the input source code may span multiple files and can be fairly complex.
I am good at .NET languages, but this is my first big project in Java. I have basic knowledge of compiler design, so writing grammars should not be impossible for me.
Sorry if I am being unclear. Please ask for any clarifications.
I'd go with Antlr and use an existing Java grammar: https://github.com/antlr/grammars-v4
All tools handling Java code usually decide first whether they want to process Java source or Java bytecode files. That is a strategic decision and depends on your use case. I could imagine both for flow chart generation. Once you have decided that question, there are already several frameworks or libraries which could help you. For bytecode engineering there are ASM, Javassist, Soot, and BCEL, which seems to be dead. For parsing and analyzing the Java language there are Polyglot, the Eclipse compiler, and javac. All of these include a complete compiler front end for Java and are open source.
I would try to avoid writing my own parser for Java. I did that once. Java has a rather complex grammar, though one that can be found elsewhere. The real work begins with name and type resolution. And you would need both if you want to generate graphs which cover more than one method body.
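To make the javac option concrete, here is a sketch that uses the compiler tree API shipped with the JDK (com.sun.source) to list control statements and calls in source order. The class name FlowScan and the output labels are invented for this example. It only parses, so it does none of the name and type resolution mentioned above, and it needs a JDK rather than a bare JRE.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IfTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class FlowScan {
    /** Parses one source string and lists if/while statements and calls in source order. */
    static List<String> scan(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system compiler: run on a JDK");
        }
        // In-memory source file, so nothing has to be written to disk.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));

        List<String> found = new ArrayList<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitIf(IfTree node, Void p) {
                found.add("if " + node.getCondition());
                return super.visitIf(node, p); // keep descending into the branches
            }
            @Override
            public Void visitWhileLoop(WhileLoopTree node, Void p) {
                found.add("while " + node.getCondition());
                return super.visitWhileLoop(node, p);
            }
            @Override
            public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                found.add("call " + node.getMethodSelect());
                return super.visitMethodInvocation(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "class Demo { void run(int x) { if (x > 0) { go(); } "
                   + "while (x < 3) { x++; } } void go() {} }";
        for (String line : scan("Demo", src)) {
            System.out.println(line);
        }
    }
}
```

Turning this list into an actual graph still means connecting the nodes (branches, loop back edges, call targets), which is where the resolution work comes in.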
Eclipse has a library for parsing the source code and creating Abstract Syntax Tree from it which would let you extract what you want.
See here for a tutorial
http://www.vogella.de/articles/EclipseJDT/article.html
See here for api
http://help.eclipse.org/indigo/topic/org.eclipse.jdt.doc.isv/reference/api/org/eclipse/jdt/core/dom/package-summary.html#package_description
Now I have two ways of recognizing:
You have many more ways than that. JavaCC ships with a Java 1.5 grammar already built. I'm sure other parser generators ditto. There is no reason for you to either have to write your own grammar or construct your own parser.
And specifically 'read[ing] input source code files as text and search for the specific patterns' isn't a viable choice at all, as it isn't parsing, and therefore cannot possibly recognize Java programs correctly.
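A small demonstration of why the text-search approach misfires: the pattern below (an assumption chosen for this example, as is the class name NaiveSearch) looks for the keyword "if" followed by a parenthesis and happily counts occurrences inside a comment and a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveSearch {
    // Counts textual occurrences of "if (" without any notion of Java syntax.
    static int countIfs(String source) {
        Matcher m = Pattern.compile("\\bif\\s*\\(").matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String source =
              "class Demo {\n"
            + "  // if (legacy) was removed\n"
            + "  String help = \"type if (x) to test\";\n"
            + "  void run(int x) { if (x > 0) { x--; } }\n"
            + "}\n";
        // Three textual hits, but the class contains only one real if statement.
        System.out.println(countIfs(source)); // 3
    }
}
```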
Your input files are written in Java, and the software should be written in Java, but this is your first project in Java? First of all, I'd suggest learning the language with smaller projects. Also you need to learn how to use graphics in Java (there are various libraries). Then, you should focus on what you want to show on your graphs. Or is text sufficient?
The way I would do it is to analyse the compiled code. This would allow you to read jars without source and avoid parsing the code yourself. I would use ObjectWeb's ASM to read the class files.
A smarter solution is to use Eclipse's Java parser. Read more here: http://www.ibm.com/developerworks/opensource/library/os-ast/
Or even easier: use reflection. You should be able to compile the sources, load the classes with a Java class loader and analyse them from there. I think this is far easier than any parsing.
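For reference, a reflection pass looks roughly like the sketch below (ReflectList is a made-up class name). Note what it can and cannot see: java.lang.reflect exposes classes, methods and signatures, but not the statements inside a method body, so on its own it yields the list of methods rather than the control flow within them.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReflectList {
    /** Returns "name/parameterCount" for each public method declared by the type, sorted. */
    static List<String> publicMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())) {
                names.add(m.getName() + "/" + m.getParameterCount());
            }
        }
        Collections.sort(names); // getDeclaredMethods() has no guaranteed order
        return names;
    }

    public static void main(String[] args) {
        System.out.println(publicMethodNames(Runnable.class)); // [run/0]
    }
}
```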
Our DMS Software Reengineering Toolkit is general purpose program analysis and transformation machinery, with built in capability for parsing, building ASTs, constructing symbol tables, extracting control and data flow, transforming the ASTs, prettyprinting ASTs back to text, etc.
DMS is parameterized by an explicit language definition, and has a large set of preexisting definitions.
DMS's Java Front End already computes control and data flow graphs, so your problem would be reduced to exporting them.
EDIT 7/19/2014: Now handles Java 8.
I am looking for a Perl implementation in Java. Something like what Jython is for Python.
I found PLJava but it needs both JVM and Perl compiler installed. I need something which does not need a Perl compiler.
I need to run some Perl code in a Java class.
UPDATES:
I figured out that PLJava is what I need. Does anybody know of a tutorial?
Has anybody played with the Inline::Java module?
I also could not install Inline::Java.
Jython isn't fully compatible with CPython (or whatever you would rather call the original C Python interpreter), but wherever either differs from the language spec, that is a bug. Unfortunately, Perl 5 is much more complex and lacks any formal language specification at all -- the language effectively being defined as "what does the perl executable do" -- so there exists no other implementation of the Perl 5 language aside from the Perl 5 interpreter. Unfortunate, but that's history. Perl 6 does have a language spec and multiple (incomplete) implementations, but that's not likely to be useful to you.
PLJava was an attempt to do exactly what you want, call Perl from Java. It does so via JNI (stuffing native code into Java) linking to libperl. However, it's not been updated since 2004 and I don't know how well it works.
Edit
I hadn't seen Inline::Java::PerlInterpreter before -- unfortunately it doesn't seem to work with my system Perl.
Update: There is another option that could be viable: Jerl. Jerl translates a micro-Perl interpreter into Java bytecode using NestedVM. In this sense Jerl is almost a Java implementation of Perl. I haven't tested it, though, and it is reasonable to expect a loss in performance. Nonetheless it's a solution worth investigating.
Jerl is hosted here: https://code.google.com/p/jerl/.
Sadly not, at least not a complete and usable one. Perl is a difficult language to port to other VMs, mainly because of its highly dynamic nature and for historical reasons related to how the language has been developed over the years; theoretical issues about Perl parsability are, in my humble opinion, of secondary importance. Perl does not have a formal specification nor an official grammar: the Perl implementation is Perl's own formal specification. This means that to write an alternative implementation of Perl one has to know intimately the internals of the current one, and this is obviously a big barrier to the development of such a project. Here lies the real difficulty in porting Perl to other VMs. Additionally, the dynamic nature of Perl poses other technical problems related to an efficient implementation on the Java virtual machine, which is engineered to support statically typed languages. There have been some efforts, like this one for example: http://www.ebb.org/perljvm/. A newer one is cited here: http://use.perl.org/~Ovid/journal/38837. Both were abandoned at one point or another, not because of infeasibility but only because the effort required was too big for a research/hobby project. A new interesting alternative that is proceeding steadily is Language-P by Mattia Barbon: http://search.cpan.org/dist/Language-P/. It is an implementation of Perl on the .NET CLR. The implementation is still incomplete, but I know that the man behind the project is a very persistent one and that the project has been going forward slowly but steadily. Maybe Perl on the CLR will come first. :D
If you are not going to use a Perl compiler, exactly what are you looking for?
What do you mean by a Perl implementation for Java? If you want to embed Perl in your Java programs, you are going to need a Perl compiler.
It sounds to me like the problem you are having is that you do not have a Perl compiler/interpreter available, yet you need to execute some Perl code. Unfortunately, I don't think that there exists anything like Jython for Perl. The only projects that I know of that can do what you are asking are PLJava and JPL. Unfortunately, it looks like both projects are abandoned.
It would be a cool project though, as I believe there is a need for something like this.
Look at Sleep
is a multi-paradigm scripting language for the Java Platform
easy to learn with Perl and Objective-C inspired syntax
executes scripts fast with a small package size (~250KB)
excels at data manipulation, component integration, and distributed communication
seamlessly uses Java objects and 3rd party libraries
Rakudo is an implementation of Perl 6 (which is quite different from standard Perl) that can run on the JVM. Now there is also Jerl (which runs in the JVM but is compiled from microperl). Both have their limitations but are solid contenders for most uses.
You can use PAR to bundle the modules you need (or even to build an executable) if you don't have perl installed on the target platform: http://metacpan.org/pod/PAR