Regex not capturing matching in expected groups - java

I have been working on a requirement and need to create a regex for the following string:
startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]
There can be many variations of this string as follows:
startDate:[*;2016-10-12T12:23:23Z]
startDate:[2016-10-12T12:23:23Z;*]
startDate:[*;*]
startDate in the above expression is a key name, which can be anything (endDate, updateDate, etc.), so we can't hardcode it in the expression. The key name can be accepted as any word, though: [a-zA-Z_0-9]*
I am using the following compiled pattern
Pattern.compile("([[a-zA-Z_0-9]*):(\\[[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]];[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]]\\]])");
The pattern matches, but the groups created are not what I expect. I want the groups marked by the parentheses below:
(startDate):([*;2016-10-12T12:23:23Z])
group1 = "startDate"
group2 = "[*;2016-10-12T12:23:23Z]"
Could you please help me with correct expression in Java and groups?

You are using [ rather than ( to wrap options (i.e. using |).
For example, the following code works for me:
Pattern pattern = Pattern.compile("(\\w+):(\\[(\\*|\\d{4}):\\*\\])");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
for (int i = 0; i < matcher.groupCount() + 1; i++) {
System.out.println(i + ":" + matcher.group(i));
}
} else {
System.out.println("no match");
}
To simplify things I just use the year but I'm sure it'll work with the full timestamp string.
This expression captures more than you need in groups but you can make them 'non-capturing' using the (?: ) construct.
Notice in this that I simplified some of your regexp using the predefined character classes. See http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for more details.
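As a minimal sketch of the two points above, the pattern below extends the year-only example to the full timestamp and wraps the alternatives in non-capturing groups, so that only the key and the bracketed range are captured. The class name RangeMatcher, the helper parse, and the choice of ; as the separator (taken from the variations listed in the question) are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeMatcher {
    // One bound: a literal '*' or a timestamp such as 2016-10-12T12:23:23Z.
    // (?: ... ) groups the alternatives without creating a capture group.
    private static final String BOUND =
            "(?:\\*|\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z)";

    // Group 1: key name; group 2: the range including its square brackets.
    private static final Pattern RANGE =
            Pattern.compile("(\\w+):(\\[" + BOUND + ";" + BOUND + "\\])");

    /** Returns {key, range} when the whole input matches, otherwise null. */
    public static String[] parse(String text) {
        Matcher m = RANGE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("startDate:[*;2016-10-12T12:23:23Z]");
        System.out.println("group1 = " + parts[0]);
        System.out.println("group2 = " + parts[1]);
    }
}
```

If the separator in your data is : rather than ;, only the single separator character in RANGE needs to change.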

Here is a solution which uses your original regex, modified so that it actually returns the groups you want:
String content = "startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]";
Pattern pattern = Pattern.compile("([a-zA-Z_0-9]*):(\\[(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*):(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*)\\])");
Matcher matcher = pattern.matcher(content);
// remember to call find() at least once before trying to access groups
matcher.find();
System.out.println("group1 = " + matcher.group(1));
System.out.println("group2 = " + matcher.group(2));
Output:
group1 = startDate
group2 = [2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]
This code has been tested on IntelliJ and appears to be working correctly.

Related

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me a value along with colons. ie I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex captured group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
UPDATE
Looking at your update, you only need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it and, once one is found, tries to match 1+ ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts after the regex engine finds a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode colons in the replacement pattern since they are hardcoded in the regex pattern.
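To make the comparison concrete, here is a small sketch that runs both variants on the sample string from the question. The class and method names are hypothetical; the two patterns are the ones discussed above.

```java
public class ColonReplace {
    // Lookarounds assert the surrounding colons without consuming them,
    // so only the letters are part of the match and get replaced.
    public static String withLookaround(String s) {
        return s.replaceFirst("(?<=:)[a-zA-Z]+(?=:)", "*");
    }

    // Here the colons are part of the match, so the replacement
    // string has to put them back.
    public static String withLiteralColons(String s) {
        return s.replaceFirst(":[a-zA-Z]+:", ":*:");
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(withLookaround(s));    // something:*:/some/path/
        System.out.println(withLiteralColons(s)); // something:*:/some/path/
    }
}
```

Both calls produce the same string; they differ only in whether the colons belong to the match.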
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

regex for removing zeros in decimal string

I need to remove zeros from decimal string
eg: 007.004(100.007) should be transformed to 7.4(100.7)
I tried using a matcher based on the pattern "0+(\d)":
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(version);
while (m.find()) {
System.out.println("Group : " + m.group());
System.out.println("Group 1 : " + m.group(1));
// replaces the first occurrence of the matched text anywhere in version
version = version.replaceFirst(m.group(), m.group(1));
System.out.println("Version: " + version);
}
but this results in 7.4(10.7). Any thoughts on this ?
You need to do a replacement with this pattern:
(\\([^)]+\\))|0+
and this replacement string (Java refers to group 1 as $1 in a replacement, not \\1):
$1
In other words, you need to capture everything between parentheses first and then look for zeros. Use the replaceAll method.
There is no need to perform a replacement in another string while matching another:
while (m.find()) {
version = version.replaceFirst(m.group(), m.group(1));
You can instead use this replacement:
version = version.replaceAll("(^|\\.)0+", "$1");
If you are trying to remove leading zeroes before a nonzero digit, then you can match such runs with this pattern: "(?<!\\d)0+(?=[1-9])". That even uses a zero-length lookahead, as your tags suggest you might have wanted to do. It would be simpler to use than yours, too, because it doesn't match anything you want to keep:
Pattern p = Pattern.compile("(?<!\\d)0+(?=[1-9])");
Matcher m = p.matcher(version);
version = m.replaceAll("");
If you're only going to do this once, then you can simplify to a one-liner:
version = version.replaceAll("(?<!\\d)0+(?=[1-9])", "");
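As a quick check of the lookaround pattern above, the following sketch wraps the one-liner in a helper and runs it on the input from the question. The class name ZeroStripper and the method name stripLeadingZeros are made up for this example.

```java
public class ZeroStripper {
    // Removes every run of zeros that is not preceded by a digit
    // and is followed by a non-zero digit.
    public static String stripLeadingZeros(String version) {
        return version.replaceAll("(?<!\\d)0+(?=[1-9])", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("007.004(100.007)")); // 7.4(100.7)
    }
}
```

The zeros in 100 are kept because each of them has a digit in front of it. A component that is just 0, as in 0.5, is also left alone because the lookahead requires a non-zero digit after the zeros.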

How to ignore characters before and after my pattern?

I need some help creating a regex (in java) to find a pattern like:
(30.6284, -27.3493)
It's roughly a latitude longitude pair. Building from smaller pieces, I've come up with this:
String def = "\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)";
which works ok if I don't have any characters before or after the parenthesis. So this fails:
"hello (30.6284, -27.3493) "
but it'll work if I remove the "hello " before and the trailing whitespace. How can I ignore any other sequence of characters before and after the expression?
Thanks
You can use the following piece of code to find and extract multiple instances of the pattern in your text.
String def = "\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)";
String text = "hello (30.6284, -27.3493) (30.6284, -27.3493) ";
Pattern p = Pattern.compile(def);
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(text.substring(m.start(), m.end()));
}
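The reason this works is that find() scans the input for the next place where the pattern matches, whereas matches() only succeeds when the entire input matches. A sketch along the same lines, which also reads the two capture groups as numbers, might look like this. The class name CoordinateFinder and the method findPairs are hypothetical, and the second pair in the sample text is added only to show the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoordinateFinder {
    // Same pattern as in the question: group 1 is the first number,
    // group 2 the second.
    private static final Pattern PAIR =
            Pattern.compile("\\((-?\\d+\\.\\d+),\\s*(-?\\d+\\.\\d+)\\)");

    /** Collects every (a, b) pair found anywhere in the text. */
    public static List<double[]> findPairs(String text) {
        List<double[]> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) { // find() tolerates text before and after each pair
            pairs.add(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))
            });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (double[] p : findPairs("hello (30.6284, -27.3493) and (1.5, 2.25) ")) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```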
I came up with this using this website: http://regexpal.com/ and http://www.regextester.com/
\(-?\d+\.?\d+, -?\d+\.?\d+\)
This will match, but not capture, and probably isn't in your language-specific format (but should be easily modifiable). To support capturing you could use this one:
\((-?\d+\.?\d+), (-?\d+\.?\d+)\)
String s = "hello (30.6284, -27.3493) ";
System.out.println(s.replaceAll(".*(\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)).*","$1"));
output:
(30.6284, -27.3493)
Note that if you're going to be looping through to find things, I would use something like this:
Matcher m = Pattern.compile(".*(\\((\\-?\\d+\\.\\d+),\\s*(\\-?\\d+\\.\\d+)\\)).*").matcher(s);
while(m.find()){
System.out.println(m.start()+ " " + m.group(1));
}

How to use multiple different patterns?

How can I check a string against multiple regex patterns rather than a single one? I have it working for one pattern, but my attempt at combining several patterns doesn't work.
When I run the code with just one of them (the time or the price) I get the match from the string, but when I combine them I don't get any output.
Thanks for your help.
Here is my code:
String line = "This order was places for QT 30.00$ ! OK? and time is 2:45";
String pattern = "\\d+[.,]\\d+.[$]"+"\\d:\\d\\d";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
The "+" operator does not separate patterns - it concatenates strings.
What you can do is provide a pattern that accepts characters in between the two groups.
String pattern = "(\\d+[.,]\\d+.[$]).*(\\d:\\d\\d)";
The parentheses above are optional. If you include them, you can get the matched price and time as separate strings:
if (m.find( )) {
System.out.println("Found value: " + m.group(1) + " with time: " + m.group(2));
}
EDIT:
Just noticed your comment that you're looking for OR, not AND.
You can do that with an expression of the form X | Y:
String pattern = "\\d+[.,]\\d+.[$]|\\d:\\d\\d";
This will match either a price or a time, whichever occurs first. You can get the match with m.group(0).
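If you want every price and time in the line, and want to know which alternative matched, one option is to loop with find() and give each alternative a named group. The sketch below assumes a slightly tightened version of the patterns (no stray . before the $, and one or two hour digits); the class name PriceOrTime and the group names price and time are chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceOrTime {
    // Two alternatives, each in its own named group. Only the group of
    // the alternative that matched is non-null for a given match.
    private static final Pattern EITHER =
            Pattern.compile("(?<price>\\d+[.,]\\d+\\$)|(?<time>\\d{1,2}:\\d\\d)");

    /** Returns the matches in order of appearance, labelled by kind. */
    public static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = EITHER.matcher(line);
        while (m.find()) {
            if (m.group("price") != null) {
                found.add("price=" + m.group("price"));
            } else {
                found.add("time=" + m.group("time"));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "This order was places for QT 30.00$ ! OK? and time is 2:45";
        System.out.println(findAll(line)); // [price=30.00$, time=2:45]
    }
}
```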

Retrieving Regex matched pattern

I need to retrieve a regex pattern matched strings from the given input.
Lets say, the pattern I need to get is like,
"http://mysite.com/<somerandomvalues>/images/<againsomerandomvalues>.jpg"
Now I created the following regex pattern for this,
http:\/\/.*\.mysite\.com\/.*\/images\/.*\.jpg
Can anybody illustrate how to retrieve all the matched pattern with this regx expression using Java?
You don't need to escape slashes, only literal dots:
String regex = "http://(.*)\\.mysite\\.com/(.*)/images/(.*)\\.jpg";
String url = "http://www.mysite.com/work/images/cat.jpg";
Pattern pattern = Pattern.compile (regex);
Matcher matcher = pattern.matcher (url);
if (matcher.matches ())
{
int n = matcher.groupCount ();
for (int i = 0; i <= n; ++i)
System.out.println (matcher.group (i));
}
Result (the loop starts at group 0, which is the whole match):
http://www.mysite.com/work/images/cat.jpg
www
work
cat
Some simple Java example:
String my_regex = "http://.*\\.mysite\\.com/.*/images/.*\\.jpg";
Pattern pattern = Pattern.compile(my_regex);
Matcher matcher = pattern.matcher(string_to_be_matched);
// Check all occurrences
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
In fact, it is not clear if you want the whole matching string or only the groups.
Bogdan Emil Mariesan's answer can be reduced to
if ( matcher.matches () ) System.out.println(string_to_be_matched);
because you know it matched and there are no groups.
IMHO, user unknown's answer is correct if you want to get matched groups.
I just want to add (for others) that if you need a matched group you can use the replaceFirst() method too. The pattern has to cover the whole string so that only the group remains:
String firstGroup = string.replaceFirst( "http://mysite\\.com/(.*)/images/.*", "$1" );
But the performance of the Pattern.compile approach is better if there are two or more groups or if you need to do that multiple times (on the other hand, in programming contests, for example, it is faster to write replaceFirst()).
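If the input is a longer text containing several such URLs, a find() loop can collect all of them. The following is only a sketch: it assumes the random parts contain no slashes or whitespace, so it uses [^/\s]+ instead of .* to keep one match from running into the next, and the names ImageUrlFinder and findAll are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageUrlFinder {
    // Literal dots are escaped. [^/\s]+ stands for one path segment
    // (assumption: the random values contain no '/' or whitespace).
    private static final Pattern IMAGE_URL =
            Pattern.compile("http://mysite\\.com/([^/\\s]+)/images/([^/\\s]+)\\.jpg");

    /** Returns every matching URL found in the input, in order. */
    public static List<String> findAll(String input) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMAGE_URL.matcher(input);
        while (m.find()) {
            urls.add(m.group()); // group() is the whole match
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "see http://mysite.com/abc/images/cat.jpg and "
                + "http://mysite.com/xyz/images/dog.jpg";
        for (String url : findAll(text)) {
            System.out.println(url);
        }
    }
}
```

Inside the loop, m.group(1) and m.group(2) would give the two random parts if you need them instead of the full URL.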
