Wanted: a very simple Java RegExp API - java

I'm tired of writing
Pattern p = Pattern.compile(...
Matcher m = p.matcher(str);
if (m.find()) {
...
over and over again in my code. I was going to write a helper class to make it neater, but then I wondered: is there a library that tries to provide a simpler facade for regular expressions in Java?
I'm thinking something in the style of commons-lang and Guava.
CLARIFICATION: I am actually hoping for some general library that would make working with regular expressions a more streamlined experience, kind of like how Perl does it. The code above was just an example.
I was thinking of something I could use like this:
for (int question : RegEx.findAllInts("SO question #(\\d+)", str)) {
// do something with int
}
Again, this is just an example of one of the many things I'd like to have. Probably not even a good example. APIs are hard.
UPDATE: I guess the answer is "No". Thanks for all the answers, have an upvote.

Why not just write your own wrapper method? Sure, you should not reinvent the wheel but another library also means another dependency.
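A minimal sketch of such a wrapper, using only java.util.regex. The class name RegexHelper is hypothetical; findAllInts is the name from the question, and the sketch assumes the regex has one capturing group holding the digits. It creates a fresh Matcher on every call, so a static method like this is safe to call from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of any existing library.
final class RegexHelper {
    private RegexHelper() {}

    /** Returns capture group 1 of every match in input, parsed as an int. */
    static List<Integer> findAllInts(String regex, String input) {
        // Compiled on every call for brevity; cache the Pattern if this is a hot path.
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(Integer.parseInt(m.group(1)));
        }
        return result;
    }
}
```

With that in place, the loop from the question becomes for (int question : RegexHelper.findAllInts("SO question #(\\d+)", str)).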

A Pattern should only be compiled once; save it in a static final field. This at least saves you from repeating this step, both at coding time and at runtime. That is to say, for performance reasons this step should not always go hand-in-hand with creating a Matcher.
In your example, it seems RegEx plays the role of a Matcher object anyway. I hope it's not supposed to be a class with a static method since this would not work in a multithreaded environment -- the find and getInt calls are not connected then. So you need a Matcher of some sort anyway.
And so, once design considerations are factored in, you're back to precisely the Java API. No, I don't think there's a shorter way to do this correctly and efficiently.

There is a Java library that extends the features of the built-in Java regex library. Have a look at RegExPlus. I haven't tried it personally, but I hope this helps.

Yeah, it's always bugged me, too, having to write so much boilerplate to perform such common tasks. I think it would help a lot if String had a pair of methods like
public String findFirst(String regex)
public String[] findAll(String regex)
These represent the two most commonly performed regex operations that aren't already supported by String methods. If we had those, plus a dynamic replacement facility like Rewriter, we could almost forget about Pattern and Matcher. We would only need them when we're writing something really complicated, like a findAllInts() method. :D
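Since String cannot be extended, the closest approximation is a pair of static helpers. The following is only a sketch: the class name Regexes is made up, and returning null when nothing matches is an assumption about what findFirst should do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical utility class standing in for the wished-for String methods.
final class Regexes {
    private Regexes() {}

    /** First match of regex in input, or null if there is none (assumed behaviour). */
    static String findFirst(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Every non-overlapping match of regex in input, in order of appearance. */
    static String[] findAll(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches.toArray(new String[0]);
    }
}
```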

There is Jakarta Regexp (see the RE class). Have a look at this old thread for advantages of Jakarta's RegExp package over the Java built-in RegEx.

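Since Java 1.4, you can also use String.matches(String regex), which is a facade for Pattern.matches(regex, str). Note that it is not quite the same as the code above: matches() succeeds only if the entire string matches the regex, whereas find() looks for a match anywhere in the string.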
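A small sketch of how String.matches differs from Matcher.find: matches only succeeds when the whole string matches the regex, while find looks for a match anywhere in the input. The class and method names here are illustrative only.

```java
import java.util.regex.Pattern;

// Illustrative names; only String.matches and Matcher.find are real API.
final class MatchesVsFind {
    private MatchesVsFind() {}

    /** True only if the entire input matches the regex. */
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    /** True if the regex matches anywhere inside the input. */
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }
}
```

For the input "SO question #42" and the regex "\\d+", wholeMatch returns false and containsMatch returns true.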

For the specific example you give, you might be able to improvise something using Guava's splitter:
for (String number : Splitter.onPattern("[^\\d]+").omitEmptyStrings().split(input)) {
// Do something with the number
}
or more specifically, if you had input like
SO question #1234, SO Question #3456, SO Question #5678
you might do
for (String number : Splitter.onPattern("(, )?SO [Qq]uestion #").omitEmptyStrings().split(input)) {
// Do something
}
It's a bit hacky, but in specific cases it may do what you're after.

Related

RegEx that captures a method and its body [duplicate]

I have tried to develop a regex that captures a method and its body (The modifier is not important), but I could not develop a solid solution. The regex that I came up with so far is this: \\b\\w*\\s*\\w*\\s*\\(.*?\\)\\s*\\{([^}]+)\\}
It does not capture the methods correctly because it does not match balanced curly braces. Thus, it sometimes captures only part of the method and not all of it. What am I doing wrong, or what could I do to improve the solution so that it captures the whole method?
You can't do this. It's impossible.
The 'regular' in 'Regular Expression' refers to a certain subset of grammars; the so-called 'Regular Grammars'.
Here's the thing:
Non-Regular Grammars cannot be parsed with regular expressions.
Java (the language) is Non-Regular.
Thus, you can't use regular expressions for this, QED.
So, how do you parse java?
There are many ways; so far, java is still so-called LL(k) parseable, which means that just about every 'parser/grammar' library out there will be capable of parsing java code, and many such libraries ship with a java grammar as an example. These usually aren't quite perfect, but pretty good.
A basic web search gets you many options. Alternatively, javac is free (but GPL, you'd have to GPL anything you build with it), and ecj (the parser that powers eclipse, amongst other things) is open source with a more permissive license. It's also faster. It's also far harder to use, so there's that.
These are fairly complex tools. However, java is a very complex language (most programming languages are). Parsing them is decidedly non-trivial.
Before you think: Geez, surely it can't be this hard, consider:
public void test() {
{}
String x = "{";
}
Which is legal java.
Or:
public void test() {
// method body
\u007D
That really is legal java, that \u007D thing closes it. Of course...
public void test() {
//{} \u007D
}
Here the \u thing doesn't. It is a real closing brace, but, that is in a comment.
Another one to consider:
public void test() {
class Foo {
String y = """
}
""";
}
}
Hopefully, considering the above, you realize you stand absolutely no chance whatsoever unless you use a parser that knows about the entire language spec.
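As a rough sketch of the parser route, the JDK's own compiler can be asked to parse source text through the javax.tools and com.sun.source (Compiler Tree API) packages. This assumes the code runs on a full JDK rather than a bare JRE; the class name MethodLister, the method name methodNames, and the in-memory file name Demo.java are made up for illustration. The sketch only collects method names, but MethodTree also exposes the body via getBody().

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: lists the methods found by javac's parser.
final class MethodLister {
    private MethodLister() {}

    static List<String> methodNames(String source) {
        // Returns null on a JRE without the compiler module; a JDK is assumed here.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Wrap the source text in an in-memory file object.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(file));
        List<String> names = new ArrayList<>();
        try {
            // parse() only builds syntax trees; it does not type-check or generate code.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        names.add(method.getName().toString());
                        return super.visitMethod(method, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Because the parser tokenizes string literals and comments properly, a brace inside a string such as String x = "{"; does not confuse it the way it confuses a regex.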

Equivalent of Ruby #map or #collect in Java?

Let's say I have an array movies = get_movies()
In Ruby I often do
movies.map {|movie| movie.poster_image_url } or somesuch.
What can I do that is similar in Java? By similar I mean similarly elegant, terse and readable. I know there are a bazillion ways I can do this, but if there's a nice way that will make me not want to use Groovy or something, let me know. I'm sure Java has some awesome ways to do things like this.
This is my Java code so far using TheMovieDB API Java wrapper from https://github.com/holgerbrandl/themoviedbapi/.
TmdbMovies movies = new TmdbApi(BuildConfig.MOVIEDB_API_KEY).getMovies();
MovieResultsPage results = movies.getPopularMovieList("en", 1);
// The following line is RubyJava and needs your help!
results.getResults().map {|e| e.getPosterPath() };
// or ... more RubyJava results.getResults().map(&:getPosterPath());
A little more about #map/#collect in Ruby in case you know a lot of Java, but aren't familiar with ruby. http://ruby-doc.org/core-1.9.3/Array.html#method-i-collect
Closest thing I've seen to answering this from some quick browsing so far... https://planet.jboss.org/post/java_developers_should_learn_ruby
These look close, too. http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html
So many options: Functional Programming in Java
This is Android as well... Are there any good things that are available to Android devs out of the box and make this kind of programming easy? This is a functional programming style, right?
--
After getting replies with really good insights like: 'there is nothing wrong with a for loop' and (basically) 'syntax isn't everything', I am deciding that I will not try to make all my Java look like Ruby! I read this and then imagined an alternate future where 'future me' made a whole bunch of bad style decisions: https://github.com/google/guava/wiki/FunctionalExplained. <-- (A good read. TL;DR 'when you go to preposterous lengths to make your code "a one-liner," the Guava team weeps')
There's the map method on streams which takes a method argument.
collection.stream()
.map(obj -> obj.someMethod())
.collect(Collectors.toList());
map returns another stream, so in order to retrieve the list you have to call the collect method.
Too much to explain in a post, but you can visit this link which helped me out a lot:
http://winterbe.com/posts/2014/03/16/java-8-tutorial/
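Applied to the poster-path case from the question, a stream-based version might look like the sketch below. The Movie class here is a hypothetical stand-in for whatever type the API wrapper returns; only getPosterPath is taken from the question.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PosterPaths {
    private PosterPaths() {}

    /** Hypothetical stand-in for the API wrapper's movie type. */
    static final class Movie {
        private final String posterPath;

        Movie(String posterPath) {
            this.posterPath = posterPath;
        }

        String getPosterPath() {
            return posterPath;
        }
    }

    /** Java counterpart of Ruby's movies.map { |m| m.poster_path }. */
    static List<String> posterPaths(List<Movie> movies) {
        return movies.stream()
                .map(Movie::getPosterPath) // method reference, similar in spirit to Ruby's &:sym
                .collect(Collectors.toList());
    }
}
```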
I think (a collections class).forEach() is what you want. It requires Java 8, and is often used with a lambda expression that implements the functional interface that forEach() expects.
http://www.mkyong.com/java8/java-8-foreach-examples/
To address your edit saying that this is Android: no, you will not get Java 8 class changes such as .forEach(). You can get lambda expressions by using the Retrolambda Android variant, but this only gives you some Java 8 syntax, not the classes, so you won't get access to the Streams classes either.

Pattern Matching On Java Syntax (Checkstyle)

I am currently trying to create my own check in Checkstyle.
It's supposed to raise a warning for commented-out code inside a class.
Now, as far as the recognition of comments goes, I got it all figured out, but now I'm facing the problem of how to make it recognize Java Code.
Are there any collections which provide these features already? Just checking for certain keywords like modifiers, types, scopes, etc. would be too vague in some situations.
tl;dr: Looking for a way to find out if a string is java code or not (pattern matching)
It would be very hard to determine if a line is Java code or not, as a line can be as little as a single }. That said, if you want to check if a FILE is java, there are some good Regex options for you, mostly because you can look at the context of a certain line.
Even if you use those you could craft a specific file that will be detected as if it were Java, while it actually isn't. That said, it would work for most if not all "normal" files.
If a regex is what you're looking for, you might want to look for similar threads on StackOverflow, because there should be a few around (I used one myself a while ago). If you really want to do this in Checkstyle, however, you might be out of luck...
A good heuristic method to determine large blocks of commented code is to check for preceding spaces. A "valid" comment will usually be indented with the actual code:
public class A {
    public void a() {
        // valid comment
        ...
    }
}
Whereas a code block that has been commented with ctrl-7 will directly start with the // characters:
public class A {
//    public void a() {
//        // valid comment
//        ...
//    }
}
Thus, your regular expression would look something like this
^//.*
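A minimal sketch of applying that expression with java.util.regex, outside of Checkstyle. The class and method names are made up; Pattern.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the column-zero heuristic.
final class CommentedCodeFinder {
    // MULTILINE makes ^ match at the beginning of each line.
    private static final Pattern COLUMN_ZERO_COMMENT =
            Pattern.compile("^//.*", Pattern.MULTILINE);

    private CommentedCodeFinder() {}

    /** Returns every line that starts with // in the very first column. */
    static List<String> find(String source) {
        Matcher m = COLUMN_ZERO_COMMENT.matcher(source);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }
}
```

Indented comments are skipped because they do not begin at the start of the line, which is exactly the distinction the heuristic relies on.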

Standard Java Class for common URL/URI manipulation

This question has almost certainly been asked before, but I ask it anyway because I couldn't find an answer.
Generally, is there a utility class of some sort that assists in common String manipulations associated with URL/URIs?
I'm thinking something like Java SE's URL Class, but maybe a little beefier. I'm looking for something that will let you do simple things, like:
Get a List of query string parameters
An "addParameter" method to add a query string parameter, and it will take care of adding "&", "?", and "=" where necessary
Also, encoding parameter values would be ideal...
Let me know, thanks!
There isn't really (oddly enough) any standard that does it all. There are some bits and pieces, usually buried in various util packages:
I've used http://java.net/projects/urlencodedquerystring/pages/Home to decent effect (for extraction of parameters).
Atlassian's JIRA has http://docs.atlassian.com/jira/4.2/index.html?com/atlassian/jira/util/UrlBuilder.html, which I've actually extracted from the jar and used.
On Android, http://developer.android.com/reference/android/net/Uri.Builder.html is a Uri builder that works pretty well as far as building a url with ease.
And finally, in a classic case of history repeating itself: A good library to do URL Query String manipulation in Java.
I'd really just rip out the android.net.Uri.Builder class and pair that with the urlencodedquerystring class and then carry those around with you, but this does seem like a good candidate for an Apache commons package.
I personally like UriBuilder from JAX-RS.
This does not answer OP's question directly (i.e. it's not a generic, all-around library for URL manipulation), but: if you're going to be using Spring anyway, you might as well consider the ServletUriComponentsBuilder and UriComponentsBuilder classes (see here and here for javadocs).
I believe they are bundled with the spring-web dependency. IMHO, these offer quite a few convenient utility methods for working with URIs, URLs and query parameters.
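If pulling in a library is not an option, the "addParameter" behaviour from the question can be roughly approximated with the JDK alone. This is only a sketch under stated assumptions: the class and method names are hypothetical, URL fragments (#...) are not handled, and URLEncoder applies HTML form encoding, so a space becomes "+" rather than "%20". The Charset overload of URLEncoder.encode used here needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not a replacement for the builders mentioned above.
final class QueryStrings {
    private QueryStrings() {}

    /** Appends name=value to url, choosing "?" or "&" and encoding both parts. */
    static String addParameter(String url, String name, String value) {
        String separator;
        if (!url.contains("?")) {
            separator = "?";                       // first parameter
        } else if (url.endsWith("?") || url.endsWith("&")) {
            separator = "";                        // separator already present
        } else {
            separator = "&";                       // subsequent parameter
        }
        return url + separator
                + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```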

Shouldn't "static" patterns always be static?

I just found a bug in some code I didn't write and I'm a bit surprised:
Pattern pattern = Pattern.compile("\\d{1,2}.\\d{1,2}.\\d{4}");
Matcher matcher = pattern.matcher(s);
Despite the fact that this code fails badly on the input data we get (it tries to find dates in the 17.01.2011 format, gets back things like 10396/2011, and then crashes because it can't parse the date, but that really isn't the point of this question ;), I wonder:
isn't one of the points of Pattern.compile to be a speed optimization (by pre-compiling regexps)?
shouldn't all "static" patterns always be compiled into static Pattern fields?
There are so many examples, all around the web, where the same pattern is always recompiled using Pattern.compile that I begin to wonder if I'm seeing things or not.
Isn't (assuming that the string is static and hence not dynamically constructed):
static Pattern pattern = Pattern.compile("\\d{1,2}.\\d{1,2}.\\d{4}");
always preferable over a non-static pattern reference?
Yes, the whole point of pre-compiling a Pattern is to only do it once.
It really depends on how you're going to use it, but in general, pre-compiled patterns stored in static fields should be fine. (Unlike Matchers, which aren't threadsafe and therefore shouldn't really be stored in fields at all, static or not.)
The only caveat with compiling patterns in static initializers is that if the pattern doesn't compile and the static initializer throws an exception, the source of the error can be quite annoying to track down. It's a minor maintainability problem but it might be worth mentioning.
First, the bug in the pattern is there because an unescaped dot (.) matches any character. If you want to match a literal dot, you have to escape it in the regex:
Pattern pattern = Pattern.compile("\\d{1,2}\\.\\d{1,2}\\.\\d{4}");
Second, Pattern.compile() is a heavy method. It is always recommended to initialize static patterns (I mean patterns that do not change and are not generated on the fly) only once. One of the popular ways to achieve this is to put the Pattern.compile() call into a static initializer.
You can use other approaches as well, for example the singleton pattern or a framework that creates singleton objects (like Spring).
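Both points can be sketched together: the pattern is compiled once into a static final field, and the dots are escaped so that only literal dots separate the date parts. The class name, field names and method names are illustrative; the unescaped pattern is kept only to show the difference.

```java
import java.util.regex.Pattern;

// Illustrative class; the two regexes are the ones from the question and this answer.
final class DateFinder {
    // Compiled once when the class is initialized and shared by all callers.
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}\\.\\d{1,2}\\.\\d{4}");

    // Original pattern from the question: the unescaped dot matches any character.
    private static final Pattern LOOSE =
            Pattern.compile("\\d{1,2}.\\d{1,2}.\\d{4}");

    private DateFinder() {}

    static boolean containsDate(String s) {
        // A new Matcher per call: Pattern is thread-safe, Matcher is not.
        return DATE.matcher(s).find();
    }

    static boolean looseContainsDate(String s) {
        return LOOSE.matcher(s).find();
    }
}
```

With the input "10396/2011", looseContainsDate returns true (10, then any character, then 96, then any character, then 2011), while containsDate returns false.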
Yes, compiling the Pattern on each use is wasteful, and defining it statically would result in better performance. See this SO thread for a similar discussion.
Static Patterns would remain in memory as long as the class is loaded.
If you are worried about memory and want a throw-away Pattern that you use once in a while and that can get garbage collected when you are finished with it, then you can use a non-static Pattern.
It is a classical time vs. memory trade-off.
If you are compiling a Pattern only once, don't stick it in a static field.
If you measured that compiling Patterns is slow, pre-compile it and put it in a static field.
