I have tried to develop a regex that captures a method and its body (the modifier is not important), but I could not come up with a solid solution. The regex I have so far is this: \\b\\w*\\s*\\w*\\s*\\(.*?\\)\\s*\\{([^}]+)\\}
It does not capture methods correctly because it does not match balanced curly braces, so it sometimes captures only part of a method. What am I doing wrong, and how can I improve the regex so that it captures the whole method?
You can't do this. It's impossible.
The 'regular' in 'Regular Expression' refers to a certain subset of grammars; the so-called 'Regular Grammars'.
Here's the thing:
Non-Regular Grammars cannot be parsed with regular expressions.
Java (the language) is Non-Regular.
Thus, you can't use regular expressions for this, QED.
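To make the failure concrete, here is a minimal sketch that runs the regex from the question against a method containing a nested block. The class name, the helper method and the sample source string are made up for illustration; only the pattern comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceRegexDemo {
    // The regex from the question, unchanged.
    private static final Pattern METHOD =
            Pattern.compile("\\b\\w*\\s*\\w*\\s*\\(.*?\\)\\s*\\{([^}]+)\\}");

    /** Returns what the regex captures as the "body" of the first match, or null if nothing matches. */
    public static String firstCapturedBody(String source) {
        Matcher m = METHOD.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical one-line method with a nested block.
        String source = "void test() { if (x) { y(); } z(); }";
        // [^}]+ stops at the first closing brace, so the call to z() never makes it into the capture.
        System.out.println(firstCapturedBody(source));
    }
}
```

The capture ends at the brace that closes the inner if block, which is exactly the "part of the method and not all" symptom described in the question.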
So, how do you parse java?
There are many ways; so far, java is still so-called LL(k) parseable, which means that just about every 'parser/grammar' library out there will be capable of parsing java code, and many such libraries ship with a java grammar as an example. These usually aren't quite perfect, but pretty good.
A basic web search gets you many options. Alternatively, javac is free (but GPL, you'd have to GPL anything you build with it), and ecj (the parser that powers eclipse, amongst other things) is open source with a more permissive license. It's also faster. It's also far harder to use, so there's that.
These are fairly complex tools. However, java is a very complex language (most programming languages are). Parsing them is decidedly non-trivial.
Before you think: Geez, surely it can't be this hard, consider:
public void test() {
    {}
    String x = "{";
}
Which is legal java.
Or:
public void test() {
    // method body
\u007D
That really is legal java, that \u007D thing closes it. Of course...
public void test() {
    //{} \u007D
}
Here the \u thing doesn't. It is a real closing brace, but, that is in a comment.
Another one to consider:
public void test() {
    class Foo {
        String y = """
            }
            """;
    }
}
Hopefully, considering the above, you realize you stand absolutely no chance whatsoever unless you use a parser that knows about the entire language spec.
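As a rough sketch of the parser route, the JDK's own compiler can be driven through the javax.tools and com.sun.source APIs to parse a source string and report where each method starts and ends. This assumes the code runs on a full JDK (the jdk.compiler module must be available); the class names MethodExtractor and StringSource are made up for illustration, and this is one possible approach, not the only one.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class MethodExtractor {

    /** In-memory source file, so no temporary files are needed. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns the source text of every method declaration found in the given source. */
    public static List<String> extractMethods(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (running on a JRE?)");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, source)));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<String> methods = new ArrayList<>();
        try {
            // parse() only builds syntax trees; nothing is type-checked or compiled.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        int start = (int) positions.getStartPosition(unit, method);
                        int end = (int) positions.getEndPosition(unit, method);
                        methods.add(source.substring(start, end));
                        return super.visitMethod(method, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return methods;
    }
}
```

Because the positions come from a real parser, braces inside string literals, comments and nested classes do not confuse it the way they confuse a regex.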
I am trying to develop an advanced math expression calculator in Java. My goal is for the calculator to evaluate expressions like these:
2 * 6^( log(123) - sin(7)^cos(2) )
lim( (5x^2) / x , x -> 1)
.
.
.
Is there anything (default function, external library) for this in Java or C++?
You could do this...
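import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Test {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager mgr = new ScriptEngineManager();
        // Returns null if no JavaScript engine is installed (Nashorn was removed from the JDK in Java 15).
        ScriptEngine engine = mgr.getEngineByName("JavaScript");
        // In JavaScript ^ is bitwise XOR and log/sin/cos live on Math, so the expression is rewritten with Math.*.
        String foo = "2 * Math.pow(6, Math.log(123) - Math.pow(Math.sin(7), Math.cos(2)))";
        System.out.println(engine.eval(foo));
    }
}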
In addition, if you want to solve limits, you could always use an API. You can send a request in the right syntax for that API, and it will return a result you can parse. This is harder to program, however, and APIs are often not free. Another option is calculating limits yourself; I'm not aware of any APIs or standard Java functions for that.
If you really want to start from scratch and re-invent the wheel, I would suggest you build your own eval function. This is a good starting point for that: Java/c++ example.
If your expressions are not very complex, I think this answer will guide you in the right way:
Built-in method for evaluating math expressions in Java
If your expressions are rather sophisticated, using limits as in your example, I advise you to take a look at the Dragon Book. It should be easier to implement a simple expression parser and use a strong math library underneath.
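For the "simple expression parser" route, a hand-written recursive-descent evaluator is one common shape. The sketch below is illustrative only: the class name ExprParser, the grammar, and the choice to treat log as the natural logarithm are all assumptions, and it handles numbers, + - * / ^, unary minus, parentheses and three functions. Limits are out of scope.

```java
public class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) {
        this.src = src;
    }

    /** Evaluates an infix expression such as "2 * 6^( log(123) - sin(7)^cos(2) )". */
    public static double eval(String expression) {
        ExprParser parser = new ExprParser(expression);
        double value = parser.parseSum();
        parser.skipSpaces();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + parser.pos);
        }
        return value;
    }

    // sum := product (('+' | '-') product)*
    private double parseSum() {
        double value = parseProduct();
        while (true) {
            if (accept('+')) value += parseProduct();
            else if (accept('-')) value -= parseProduct();
            else return value;
        }
    }

    // product := unary (('*' | '/') unary)*
    private double parseProduct() {
        double value = parseUnary();
        while (true) {
            if (accept('*')) value *= parseUnary();
            else if (accept('/')) value /= parseUnary();
            else return value;
        }
    }

    // unary := '-' unary | power      (so -2^2 is read as -(2^2))
    private double parseUnary() {
        return accept('-') ? -parseUnary() : parsePower();
    }

    // power := atom ('^' unary)?      (right-associative: 2^3^2 is 2^(3^2))
    private double parsePower() {
        double base = parseAtom();
        return accept('^') ? Math.pow(base, parseUnary()) : base;
    }

    // atom := '(' sum ')' | name '(' sum ')' | number
    private double parseAtom() {
        if (accept('(')) {
            double value = parseSum();
            expect(')');
            return value;
        }
        int start = pos; // accept() has already skipped leading spaces
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
            String name = src.substring(start, pos);
            expect('(');
            double arg = parseSum();
            expect(')');
            return applyFunction(name, arg);
        }
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at index " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    private static double applyFunction(String name, double arg) {
        switch (name) {
            case "sin": return Math.sin(arg);
            case "cos": return Math.cos(arg);
            case "log": return Math.log(arg); // assumption: natural logarithm
            default: throw new IllegalArgumentException("Unknown function: " + name);
        }
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    /** Consumes the given character (after optional spaces) if it is next in the input. */
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
    }
}
```

Calling ExprParser.eval("2 * 6^( log(123) - sin(7)^cos(2) )") evaluates the first expression from the question. Each grammar rule maps to one method, which keeps operator precedence easy to reason about.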
This is less of a Java domain question and more of a language parsing problem. The traditional way to approach this is to build a lexer and parser, sometimes in another language, that generates the code in the language you want. This way you separate the parsing concerns from the actual "client" program concerns.
This can be easier, in the long run, than trying to write what is going to be a pretty complicated regular expression state machine that will be hard to prove is correct in all cases. An infix math parser/processor is interesting enough that having a "little language" version of the rules and definitions makes it a lot easier to prove your program is correct.
In Java you might want to consider ANTLR to generate the parser, though I admit I have never had to use it. But my understanding is that ANTLR is familiar enough if you have used Lex & Yacc.
[UPDATE]
I don't know if infix is a hard requirement, but parsing complex math using a stack and postfix operations can be much easier to implement if you don't want to generate a parser. As an added bonus, this allows you to do math like Yoda.
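To show how little code the stack-and-postfix approach needs, here is a minimal sketch of a postfix evaluator. The class name PostfixCalculator and the space-separated token format are assumptions made for the example; it supports only numbers and the binary operators + - * / ^.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixCalculator {

    /** Evaluates a space-separated postfix expression, e.g. "2 3 * 4 +" for the infix 2 * 3 + 4. */
    public static double eval(String postfix) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": case "^": {
                    // The operand pushed last is the right-hand side.
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    stack.push(Double.parseDouble(token));
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed postfix expression: " + postfix);
        }
        return stack.pop();
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;
            case "^": return Math.pow(left, right);
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
}
```

There is no precedence or parenthesis handling at all, because postfix order already encodes both.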
In general, what you need is called a computer algebra system. I don't know what requirements you have but there are at least 2 general ways to go about it.
(1) link your program to a library to do the algebraic stuff. For C++ you can try Yacas and for Python you can try Sympy. Dunno about Java.
(2) write your program separately and talk to a CAS via a socket. In that case the CAS could be anything, e.g., Maxima, Sage, etc etc. A socket interface is maybe less work than you might think -- it is certainly much, much less work than reimplementing CAS functions.
My advice, without knowing your requirements, is to write your program in Python and use Sympy.
I am currently trying to create my own check in Checkstyle.
It's supposed to throw a warning for commented Code inside a class.
Now, as far as the recognition of comments goes, I got it all figured out, but now I'm facing the problem of how to make it recognize Java Code.
Are there any collections which provide these features already? Just checking for certain keywords like modifiers, types, scopes, etc. would be too vague in some situations.
tl;dr: Looking for a way to find out if a string is java code or not (pattern matching)
It would be very hard to determine if a line is Java code or not, as a line can be as little as a single }. That said, if you want to check if a FILE is java, there are some good Regex options for you, mostly because you can look at the context of a certain line.
Even if you use those you could craft a specific file that will be detected as if it were Java, while it actually isn't. That said, it would work for most if not all "normal" files.
If the Regex is what you're looking for, you might want to look for similar threads on StackOverflow, because there should be a few around (I used one myself a while ago). If you really want to do this in Checkstyle however, you might be out of luck...
A good heuristic method to determine large blocks of commented code is to check for preceding spaces. A "valid" comment will usually be indented with the actual code:
public class A {
    public void a() {
        // valid comment
        ...
    }
}
Whereas a code block that has been commented with ctrl-7 will directly start with the // characters:
public class A {
//    public void a() {
//        // valid comment
//        ...
//    }
}
Thus, your regular expression would look something like this
^//.*
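A minimal sketch of applying that pattern to a whole file in plain Java follows. The class and method names are hypothetical, and wiring this into an actual Checkstyle check is not shown. The MULTILINE flag is what makes ^ match at the start of every line rather than only at the start of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedCodeHeuristic {
    // Lines whose very first characters are "//", i.e. comments with no indentation at all.
    private static final Pattern COLUMN_ZERO_COMMENT =
            Pattern.compile("^//.*", Pattern.MULTILINE);

    /** Counts the lines that start with "//" in column zero. */
    public static int countSuspiciousLines(String source) {
        Matcher m = COLUMN_ZERO_COMMENT.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

As with any heuristic, this only works when real comments are indented along with the code; a file-header comment in column zero would be counted too.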
I am reading about Scala here and there, but I could not understand what a Java developer would gain from jumping to Scala.
I think it has something to do with functional programming.
Could someone please give me a concrete example of something I cannot do in Java where moving to Scala would save me?
This is not intended as a critique of Java or anything similar. I only need to understand what Scala is used for.
I also come from the Java world. The first thing I think will help you is that Scala has a lot of compiler magic that helps keep your code simple and clean.
For example, here is how Scala's case classes help. We can define a class using this:
case class Student(name: String, height: Double, weight: Double)
instead of this:
class Student {
    public final String name;
    public final double height;
    public final double weight;

    public Student(String name, double height, double weight) {
        this.name = name;
        this.height = height;
        this.weight = weight;
    }
}
Yes, that is all. You don't need to write the constructor yourself, and you get the equals, hashCode and toString methods for free.
It may not look like a big deal in this simple case, but Scala really lets you model things a lot more easily and quickly in OO, even if you are not using functional programming constructs.
Also, higher-order functions and other functional programming constructs give you powerful tools to solve your problems in Scala.
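To see how much the one-line case class actually replaces, here is a sketch of a Java class with roughly the same behaviour, including the methods a case class provides for free. It is hand-written for illustration; the toString format imitates what Scala prints for a case class, and details such as how NaN values compare may differ from Scala's generated code.

```java
import java.util.Objects;

public final class Student {
    public final String name;
    public final double height;
    public final double weight;

    public Student(String name, double height, double weight) {
        this.name = name;
        this.height = height;
        this.weight = weight;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        Student that = (Student) other;
        return Objects.equals(name, that.name)
                && Double.compare(height, that.height) == 0
                && Double.compare(weight, that.weight) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, height, weight);
    }

    @Override
    public String toString() {
        // Imitates the Scala case class format, e.g. Student(Ann,1.7,60.0)
        return "Student(" + name + "," + height + "," + weight + ")";
    }
}
```

Every field has to be repeated in four places here, which is the kind of boilerplate the case class removes.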
Update
OK, the following is an example of how functional programming can make your code cleaner and easier to understand.
Functional programming is a large topic, but I have found that even though I come from the Java world and do not understand what a monad or a typeclass is, Scala's functional programming constructs still help me solve problems more easily and more expressively.
Many times we need to iterate over a collection and do something to its elements, deciding what to do based on a condition.
As a simple example, suppose we want to iterate over a List in Java and delete all files whose size is zero. In Java, it might look like the following:
List<File> files = getFiles();
for (File file : files) {
    // java.io.File reports its size through length(), not size()
    if (file.length() == 0) {
        file.delete();
    }
}
It's very easy and concise, isn't it? But with functional programming construct in Scala, we could do the following:
val files = getFiles()
val emptyFiles = files.filter(_.length() == 0)
emptyFiles.foreach(_.delete())
As you can see, it takes less code than the Java version, and our intention is even clearer: we want to filter out all files whose size is 0 and call File.delete() on each of them.
It may look weird at first, but once you get used to it and use it in the right way, it will make your code a lot easier to read.
The technique is possible in Java (Functional Java), but in the end it will look like the following code:
list.filter(new Predicate<File>() {
    public boolean predicate(File f) {
        return f.length() == 0;
    }
})
In that case I would just stick to the original for-loop version, IMHO.
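For comparison, since Java 8 the same filter-then-act style can be written with lambdas and the Streams API, without an anonymous class. This is a sketch assuming the files are java.io.File objects; the class and method names are made up for the example.

```java
import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

public class EmptyFileCleaner {

    /** Deletes every zero-length file in the list and returns the files that were selected. */
    public static List<File> deleteEmptyFiles(List<File> files) {
        List<File> emptyFiles = files.stream()
                .filter(file -> file.length() == 0)
                .collect(Collectors.toList());
        // File::delete returns a boolean, which is simply ignored here.
        emptyFiles.forEach(File::delete);
        return emptyFiles;
    }
}
```

Note that File.length() also returns 0 for a path that does not exist, so a production version would probably want to check isFile() as well.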
To get you started, this article by Graham Lea lists many ways that Scala can boost your productivity:
A New Java Library for Amazing Productivity.
It begins with:
a broad and powerful collections framework
collection methods that greatly reduce boilerplate
immutable collections that don’t have mutation methods (unlike java.util classes where e.g. List.add() throws an exception if the list is immutable)
an awesome switch-like function that doesn’t just match numbers, enums, chars and strings, but can succinctly match all kinds of patterns in lots of different classes, even in your own classes
an annotation that automatically writes meaningful equals, hashCode and toString methods for classes whose fields don’t change (without using reflection)
...
and the list goes on.
To be specific on your question:
in scala you can pattern match
scala has higher order functions (which is not the same as java8 lambdas)
anonymous function literals
strong type inference
lazy evaluation
variance annotation
higher kinded types
mixin behavior with traits (stackable composition)
implicit definition and conversion
xml literals
REPL
I could go on but I urge you to read at least some documentation.
For instance you can start here
"Functional programming" is the briefest accurate answer. That, together with the shorter, more expressive code that Scala promotes, translates to higher productivity.
Scala is also more concise.
There are also several other examples of this.
Being a Java programmer, I don't really have a Groovy background, but I use Groovy a lot lately to extend Maven (using GMaven). So far, I could use all the Java code I need in Groovy with the added Groovy sugar (metaclass methods, more operators, closures). My knowledge of Groovy is far from complete, but I like it, especially for Scripting purposes (I'm a bit careful about using a non-static typed language in an enterprise scenario, but that's not the topic here).
Anyway, the question is:
Is every bit of valid Java code automatically valid Groovy code? (I am talking about Source code, not compiled classes, I know Groovy can interact with Java classes.) Or are there Java constructs that are illegal in Groovy? Perhaps a reserved Groovy keyword that could be used as an identifier in Java, or something else? Or has Groovy deliberately been designed to be 100%-source compatible with Java?
Nope. The following are keywords in groovy, but not Java:
any as def in with
Additionally, while not keywords, delegate and owner have special meaning in closures and can trip you up if you're not careful.
Additionally, there are some minor differences in the language syntax. For one thing, Java is more flexible about where array braces occur in declarations:
public static void main(String args[]) // valid java, error in groovy
Groovy is parsed differently, too. Here's an example:
public class Test {
    public static void main(String[] args) {
        int i = 0;
        i = 5
        +1;
        System.out.println(i);
    }
}
Java will print 6, groovy will print 5.
While groovy is mostly source compatible with java, there are lots of corner cases that aren't the same. That said, it is very compatible with the code people actually write.
It isn't.
My favorite incompatibility: literal arrays:
String[] s = new String[] {"a", "b", "c"};
In Groovy, curly braces in this context would be expected to contain a closure, not a literal array.
There's a page on the Groovy site which documents some of the differences, and another page which lists gotchas (such as the newline thing)
There are other things as well, one example being that Groovy doesn't support the do...while looping construct
Others have already given examples of Java syntax that is illegal in Groovy (e.g. literal arrays). It is also worth remembering that some syntax which is legal in both, does not mean the same thing in both languages. For example in Java:
foo == bar
tests for identity, i.e. do foo and bar both refer to the same object? In Groovy, this tests for object equality, i.e. it returns the result of foo.equals(bar)
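The Java half of that difference can be seen in a small sketch (the class and helper names are made up for illustration). Two distinct String objects with the same contents are equal under equals but not identical under ==:

```java
public class IdentityVsEquality {

    /** Java's == on references: true only if both refer to the very same object. */
    public static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    /** Value equality, which per the answer above is what Groovy's == gives you. */
    public static boolean sameValue(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String foo = new String("groovy");
        String bar = new String("groovy");
        System.out.println(sameObject(foo, bar)); // prints false
        System.out.println(sameValue(foo, bar));  // prints true
    }
}
```

So Java source that relies on == for an identity check compiles as Groovy but silently changes meaning.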
I'm a java programmer, but now entering the "realm of python" for some stuff for which Python works better. I'm quite sure a good portion of my code would look weird to a Python programmer (e.g. using parentheses on every if).
I know each language has its own conventions and set of "habits". So, from a readability standpoint, what conventions and practices are "the way to go" in Java but are not really the "pythonic way" to do things?
There's no simple answer to that question. It takes time for your code to be "Pythonic". Don't try and recreate Java idioms in Python. It will just take time to learn Python idioms.
Take a look at Code Like a Pythonista: Idiomatic Python, Style Guide for Python Code and Python for Java Programmers (archived).
Jacob Hallén once observed that the best Python style follows Tufte's rejection of decoration (though Tufte's field is not programming languages, but visual display of information): don't waste "ink" (pixels) or "paper" (space) for mere decoration.
A lot follows from this principle: no redundant parentheses, no semicolons, no silly "ascii boxes" in comments and docstrings, no wasted space to "align" things on different rows, single quotes unless you specifically need double quotes, no \ to continue lines except when mandatory, no comments that merely remind the reader of the language's rules (if the reader does not know the language you're in trouble anyway;-), and so forth.
I should point out that some of these consequences of the "Tufte spirit of Python" are more controversial than others, within the Python community. But the language sure respects "Tufte's Spirit" pretty well...
Moving to "more controversial" (but sanctioned by the Zen of Python -- import this at an interpreter prompt): "flat is better than nested", so "get out as soon as sensible" rather than nesting. Let me explain:
if foo:
    return bar
else:
    baz = fie(fum)
    return baz + blab
this isn't terrible, but neither is it optimal: since "return" "gets out", you can save the nesting:
if foo:
    return bar
baz = fie(fum)
return baz + blab
A sharper example:
for item in container:
    if interesting(item):
        dothis(item)
        dothat(item)
        theother(item)
that large block being double-nested is not neat... consider the flatter style:
for item in container:
    if not interesting(item):
        continue
    dothis(item)
    dothat(item)
    theother(item)
BTW, and an aside that's not specifically of Python-exclusive style -- one of my pet peeves (in any language, but in Python Tufte's Spirit supports me;-):
if not something:
    this()
    that()
    theother()
else:
    blih()
    bluh()
    blah()
"if not ... else" is contorted! Swap the two halves and lose the not:
if something:
    blih()
    bluh()
    blah()
else:
    this()
    that()
    theother()
The best place to start is probably PEP-8, which is the official Python style guide. It covers a lot of the basics for what is considered standard.
In addition, some previous stackoverflow questions:
What are the important language features idioms of python to learn early on?
What does pythonic mean?
What defines “pythonian” or “pythonic”?
Python: Am I missing something?
Zen of python
"Everything is a class" is a Java idiom that's specifically not a Python idiom. (Almost) everything can be a class in Python, and if that's more comfortable for you then go for it, but Python doesn't require such a thing. Python is not a purely object-oriented language, and in my (limited) experience it's good to take that to heart.
Syntax is only the tip of the iceberg. There are a number of different language constructs that Java programmers should be aware of, e.g. Python does not need interfaces:
Creating an interface and swappable implementations in python - Stack Overflow
The other really useful idiom is that everything can be converted to a boolean value with an intuitive meaning in Python. For example, to check for an empty array, you simply do
if not my_array:
    return
...process my_array...
The first condition is equivalent to Java's
if ((my_array == null) || (my_array.length == 0)) {
    return;
}
This is a godsend in Python. Not only is it more concise, it also avoids a Java pitfall where many people do not check for both conditions consistently. Countless NullPointerExceptions are averted as a result.
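On the Java side, one way to make the two-part check consistent is to put it in a single helper. This is a minimal sketch with a made-up class name; it is not a standard library method.

```java
public class ArrayChecks {

    /** True when the array reference is null or the array has no elements. */
    public static <T> boolean isNullOrEmpty(T[] array) {
        // The null check must come first; || short-circuits, so length is never read on null.
        return array == null || array.length == 0;
    }
}
```

A call such as if (ArrayChecks.isNullOrEmpty(myArray)) return; then covers both conditions in one place.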