Getting specific portion of string from a Matcher - java

I'd like to get a portion of a matched string coming from a Matcher, like this:
Pattern pat = Pattern.compile("a.*l.*z");
Matcher match = pat.matcher("abcdlmnoz"); // I'd want to get bcd AND mno
ArrayList<String> values = match.magic(); //here is where your magic happens =)
ArrayList<String> is only for this example; I'd be happy to receive either a List or individual String items. The best would be what .htaccess files and RewriteRules do:
RewriteRule (.*)/path?(.*) $1/$2/modified-path/
Well, putting those (.*) groups into $arguments would be as cool as an ArrayList or accessing the Strings separately. I've been looking for something in the Java Matcher API, but I didn't happen to see anything useful there.
Thanks in advance, guys.

You can capture groups in a regexp match using parentheses:
Pattern pat = Pattern.compile("a(.*)l(.*)z");
Matcher match = pat.matcher("abcdlmnoz");
boolean b = match.matches(); // don't forget to attempt the match
Then call match.group(n) to get the text captured by the nth pair of parentheses (group 0 is the whole match); here group(1) is "bcd" and group(2) is "mno". The groups are stored in the Matcher object.
Reference: Capturing Groups (Oracle)
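If you want something close to the magic() call from the question, here is a minimal sketch, not a library API: the class GroupExtractor and its groups method are hypothetical names. It runs matches() and copies every capturing group into a List; it also shows that String.replaceAll accepts $1/$2 in the replacement string, much like a RewriteRule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtractor {
    // Returns groups 1..groupCount() when the whole input matches, else an empty list.
    // An optional group that did not take part in the match is added as null.
    static List<String> groups(Pattern pattern, String input) {
        List<String> values = new ArrayList<>();
        Matcher match = pattern.matcher(input);
        if (match.matches()) {
            for (int i = 1; i <= match.groupCount(); i++) {
                values.add(match.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern pat = Pattern.compile("a(.*)l(.*)z");
        System.out.println(groups(pat, "abcdlmnoz")); // [bcd, mno]

        // In a replacement string, $1 and $2 work much like RewriteRule's $1/$2.
        System.out.println("abcdlmnoz".replaceAll("a(.*)l(.*)z", "$1/$2/")); // bcd/mno/
    }
}
```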

Look at the matcher's "group" method and peruse the doc you linked to for references to groups, which is what the parentheses in the regex do :)

...
String testStr = "abcdlmnoz";
String myRE = "a(.*)l(.*)z";
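Pattern myRECompiled = Pattern.compile (myRE,
Pattern.DOTALL); // DOTALL lets . also match line terminators
Matcher myMatcher = myRECompiled.matcher (testStr);
if (myMatcher.find ()) { // only read groups after a successful match
System.out.println (myMatcher.group (1));
System.out.println (myMatcher.group (2));
}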
...

Related

Java: Need to extract a number from a string

I have a string containing a number. Something like "Incident #492 - The Title Description".
I need to extract the number from this string.
I tried:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(theString);
String substring =m.group();
but I get an error:
java.lang.IllegalStateException: No match found
What am I doing wrong?
What is the correct expression?
I'm sorry for such a simple question, but I searched a lot and still haven't found how to do this (maybe because it's too late here...)
You are getting this exception because you need to call find() on the matcher before accessing groups:
Matcher m = p.matcher(theString);
while (m.find()) {
String substring =m.group();
System.out.println(substring);
}
There are two things wrong here:
The pattern you're using is not ideal for your scenario: if you run it with matches(), it only succeeds when the entire string consists of digits. Also, since it doesn't contain a capturing group, a call to group() is equivalent to calling group(0), which returns the entire matched text.
You need to be certain that the matcher has a match before you go calling a group.
Let's start with the regex. Here's what it looks like now.
With matches(), that will only ever match a string that consists entirely of digits. What you care about is specifically the number in that string, so you want an expression that:
Doesn't care about what's in front of it
Doesn't care about what's after it
Matches a single run of digits and captures it in a group
For that, you'd use this expression:
.*?(\\d+).*
The last part is to ensure that the matcher can find a match, and that it gets the correct group. That's accomplished by this:
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}
All together now:
Pattern p = Pattern.compile(".*?(\\d+).*");
final String theString = "Incident #492 - The Title Description";
Matcher m = p.matcher(theString);
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}
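To see why the lazy .*? matters, here is a small comparison (the class name LazyVsGreedy is just for illustration) that runs the same input through a lazy and a greedy prefix:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns group 1 if the whole string matches, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String theString = "Incident #492 - The Title Description";
        // Lazy prefix: .*? gives up characters as early as possible,
        // so (\d+) starts at the first digit.
        System.out.println(firstGroup(".*?(\\d+).*", theString)); // 492
        // Greedy prefix: .* swallows as much as it can and only backs off
        // far enough for (\d+) to match a single digit.
        System.out.println(firstGroup(".*(\\d+).*", theString));  // 2
    }
}
```

The greedy version still "works", it just captures only the last digit, which is an easy bug to miss.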
You need to invoke one of the Matcher methods, such as find(), matches() or lookingAt(), to actually run the match.
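A quick sketch contrasting the three methods on the string from the question (MatchModes and firstNumber are illustrative names). Only find() locates 492, because matches() needs the whole input to match and lookingAt() needs a match starting at the first character. Calling group() before any successful match is what throws the IllegalStateException from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Uses find(), which scans for the next matching subsequence.
    // Returns null when the text contains no digits.
    static String firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String theString = "Incident #492 - The Title Description";
        // matches(): the entire input must match the pattern
        System.out.println(DIGITS.matcher(theString).matches());   // false
        // lookingAt(): the pattern must match starting at the first character
        System.out.println(DIGITS.matcher(theString).lookingAt()); // false
        // find(): the match may be anywhere in the input
        System.out.println(firstNumber(theString));                // 492
    }
}
```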

regular expression: text between two # signs

I have a text in which I want to replace variables with their proper values, and my variables are located between two # signs. When I use [/(?m)#.*?#/] to get these texts, it also returns the text before the first # and after the last #. How can I get only the text between each pair of # signs? Thanks in advance.
I'm using the String.split() method in Java.
For example, I want to use it on the following String:
this is #the best# possible way #t#o do result!!!
and I want to get these two results:
the best
t
In Java you can use this regex to grab the value between the first and second #:
String repl = input.replaceFirst("(?m)^[^#]*#([^#]*)#.*$", "$1");
To grab the value between the first and last #:
String repl = input.replaceFirst("(?m)^[^#]*#(.*?)#[^#]*$", "$1");
To find multiple matches use Pattern, Matcher:
Pattern p = Pattern.compile("#([^#]*)#");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
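Here are the two replaceFirst variants applied to the example string, wrapped in a hypothetical HashDelimited class so the results can be checked. Note the difference: the second one spans from the first # to the last #, so it includes the inner # characters.

```java
class HashDelimited {
    // Text between the first and second '#'
    static String firstPair(String input) {
        return input.replaceFirst("(?m)^[^#]*#([^#]*)#.*$", "$1");
    }

    // Text between the first and last '#'
    static String firstToLast(String input) {
        return input.replaceFirst("(?m)^[^#]*#(.*?)#[^#]*$", "$1");
    }

    public static void main(String[] args) {
        String input = "this is #the best# possible way #t#o do result!!!";
        System.out.println(firstPair(input));   // the best
        System.out.println(firstToLast(input)); // the best# possible way #t
    }
}
```

If the input contains no #, replaceFirst finds no match and returns the input unchanged.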
split() is the wrong tool to use here; use a Pattern with a Matcher and its find() method instead.
String s = "this is #the best# possible way #t#o do result!!!";
Pattern p = Pattern.compile("#([^#]*)#");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
Output
the best
t
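Since the goal in the question is to replace the variables with values, not just list them, a sketch using Matcher.appendReplacement and appendTail could look like this. The TemplateFiller class, the map, and the variable names and values are all hypothetical; leaving unknown variables untouched is an assumption about the desired behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFiller {
    private static final Pattern VAR = Pattern.compile("#([^#]*)#");

    // Replaces each #name# with values.get(name); unknown names are left as-is.
    static String fill(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement stops '$' or '\' in the value from being
            // interpreted as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "Alice"); // hypothetical variables
        values.put("total", "$42");
        System.out.println(fill("Dear #name#, you owe #total#.", values));
        // Dear Alice, you owe $42.
    }
}
```

StringBuffer is used because the StringBuilder overload of appendReplacement only exists from Java 9 onwards.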

java regular expression word without ending with dot

I need to print the simple bind variable names in the SQL query.
I need to print the words starting with : character But NOT ending with dot . character.
In this sample I need to print pOrg and pBusinessId, but NOT parameter.
The regular expression "(:)(\\w+)^\\." is not working.
Could you help me correct the regular expression?
Thanks
Peddi
public void testMethod(){
String regEx="(:)(\\w+)([^\\.])";
String input= "(origin_table like 'I%' or (origin_table like 'S%' and process_status =5))and header_id = NVL( :parameter.number1:NULL, header_id) and (orginization = :pOrg) and (businsess_unit = :pBusinessId";
Pattern pattern;
Matcher matcher;
pattern = Pattern.compile(regEx);
matcher = pattern.matcher(input);
String grp = null;
while(matcher.find()){
grp = matcher.group(2);
System.out.println(grp);
}
}
You can try with something like
String regEx = "(:)(\\w+)\\b(?![.])";
(:)(\\w+)\\b will make sure that you are matching only entire words starting with :
(?![.]) is a negative lookahead, which makes sure that the found word is not followed by a .
This regex will also match :NULL, so if there is some reason it shouldn't be matched, share it with us.
Anyway, to exclude NULL from the results you can add a negative lookbehind:
String regEx = "(:)(\\w+)\\b(?![.])(?<!:NULL)";
To make the regex case-insensitive, so that null is excluded as well as NULL, compile this pattern with the Pattern.CASE_INSENSITIVE flag, like
Pattern pattern = Pattern.compile(regEx,Pattern.CASE_INSENSITIVE);
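A small harness (BindVariables and names are made-up names) that runs this answer's patterns against the SQL from the question, with and without the NULL exclusion:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BindVariables {
    // The SQL fragment from the question.
    static final String SQL = "(origin_table like 'I%' or (origin_table like 'S%' and process_status =5))and header_id = NVL( :parameter.number1:NULL, header_id) and (orginization = :pOrg) and (businsess_unit = :pBusinessId";

    // Collects group 2 (the name after ':') for every match.
    static List<String> names(String regEx, int flags, String sql) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regEx, flags).matcher(sql);
        while (matcher.find()) {
            found.add(matcher.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Negative lookahead only: :NULL is still reported
        System.out.println(names("(:)(\\w+)\\b(?![.])", 0, SQL));
        // [NULL, pOrg, pBusinessId]

        // Adding the negative lookbehind (and ignoring case) drops :NULL
        System.out.println(names("(:)(\\w+)\\b(?![.])(?<!:NULL)", Pattern.CASE_INSENSITIVE, SQL));
        // [pOrg, pBusinessId]
    }
}
```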
Since it looks like you're using camelCase, you can actually simplify things a bit when it comes to excluding :NULL:
:([a-z][\\w]+)\\b(?!\\.)
And group 1 ($1) will contain your variable names.
Alternative that doesn't rely on negative lookahead:
:([a-z][\\w]+)\\b(?:[^\\.]|$)
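The two camelCase patterns can be compared the same way; CamelCaseBindVariables is a hypothetical name, and the SQL is the relevant part of the query from the question. Both patterns return pOrg and pBusinessId there, but because the non-lookahead version consumes the character after the name, it can skip a variable that immediately follows another one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelCaseBindVariables {
    // Name must start with a lowercase letter, so :NULL is skipped.
    static final String LOOKAHEAD = ":([a-z][\\w]+)\\b(?!\\.)";
    // Same idea without lookahead: consumes the next character (or needs end of input).
    static final String NO_LOOKAHEAD = ":([a-z][\\w]+)\\b(?:[^\\.]|$)";

    static final String SQL = "header_id = NVL( :parameter.number1:NULL, header_id) and (orginization = :pOrg) and (businsess_unit = :pBusinessId";

    // Collects group 1 (the variable name) for every match.
    static List<String> names(String regex, String sql) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(sql);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(names(LOOKAHEAD, SQL));      // [pOrg, pBusinessId]
        System.out.println(names(NO_LOOKAHEAD, SQL));   // [pOrg, pBusinessId]
        // Back-to-back variables: the consumed ':' hides the second name
        System.out.println(names(LOOKAHEAD, ":ab:cd"));    // [ab, cd]
        System.out.println(names(NO_LOOKAHEAD, ":ab:cd")); // [ab]
    }
}
```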
You can try:
Pattern regex = Pattern.compile("^:.*?[^.]$");

Java replaceAll use found pattern in replacement

I want to replace all links on an HTML page with a defined URL and the original link in the query string.
Here is an example:
"http://www.ex.com abc http://www.anotherex.com"
Should be replaced by:
"http://www.newex.com?old=http://www.ex.com abc http://www.newex.com?old=http://www.anotherex.com"
I thought about using replaceAll, but I don't know exactly how to reuse the regex pattern in the replacement.
something like
String processed = yourString.replaceAll([ugly url regexp],"http://www.newex.com?old=$0")
$0 refers to the entire match (group 0) of the regexp; see the documentation for Matcher.appendReplacement for the full replacement syntax.
For a worthy URL regexp, there are plenty of published ones to pick from.
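As a concrete sketch, assuming a deliberately naive URL pattern (https?:// followed by non-whitespace, which is an assumption and not robust enough for real HTML), the $0 replacement produces exactly the output asked for. LinkRewriter is a hypothetical class name.

```java
class LinkRewriter {
    // Naive URL pattern (assumption): "http://" or "https://" followed by non-whitespace.
    static final String URL = "https?://\\S+";

    static String rewrite(String text) {
        // $0 is the whole match, so the original URL ends up in the query string
        return text.replaceAll(URL, "http://www.newex.com?old=$0");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("http://www.ex.com abc http://www.anotherex.com"));
        // http://www.newex.com?old=http://www.ex.com abc http://www.newex.com?old=http://www.anotherex.com
    }
}
```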
I would go about this by doing something like:
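Set<String> allMatches = new LinkedHashSet<String>(); // a Set, so a link that appears twice is not prefixed twice
Matcher m = Pattern.compile("regex here")
.matcher(StringHere);
while (m.find()) {
allMatches.add(m.group());
}
String finalString = OriginalString;
for(String myMatch : allMatches)
{
// replace in the accumulated result; replacing in OriginalString each time would keep only the last change
finalString = finalString.replace(myMatch, myNewString+myMatch);
}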
I didn't test any of this, but it should give you an idea of how to approach it
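An alternative sketch that does everything in one pass with Matcher.appendReplacement, so each link is rewritten exactly once even when the same URL appears twice or one URL is a prefix of another (both cases can trip up a loop of String.replace calls). The class name LinkRewriterOnePass and the naive https?://\S+ pattern are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkRewriterOnePass {
    // Naive URL pattern (assumption), not a robust HTML link matcher.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    // Prefixes every matched URL with the given prefix in a single scan.
    static String rewrite(String text, String prefix) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' or '\' in the URL from being treated as syntax
            m.appendReplacement(out, Matcher.quoteReplacement(prefix + m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("http://www.ex.com http://www.ex.com/a", "http://www.newex.com?old="));
        // http://www.newex.com?old=http://www.ex.com http://www.newex.com?old=http://www.ex.com/a
    }
}
```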

Replace string with part of the matching regex

I have a long string. I want to replace all the matches with part of the matching regex (group).
For example:
String = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>".
I want to replace all the words "is" by, let's say, "<h1>is</h1>". The case should remain the same as original. So the final string I want is:
This <h1>is</h1> a great day, <h1>is</h1> it not? If there <h1>is</h1> something, THIS <h1>IS</h1> it. <b><h1>is</h1></b>.
The regex I was trying:
Pattern pattern = Pattern.compile("[.>, ](is)[.<, ]", Pattern.CASE_INSENSITIVE);
The Matcher class is commonly used in conjunction with Pattern. Use the Matcher.replaceAll() method to replace all matches in the string:
String str = "This is a great day...";
Pattern p = Pattern.compile("\\bis\\b", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
String result = m.replaceAll("<h1>$0</h1>"); // $0 re-inserts the matched text, preserving its case
Note: \b matches at a word boundary, the zero-width position between a word character and a non-word character (such as whitespace or punctuation). This ensures that only the word "is" is matched and not words that contain the letters "i" and "s" (like "island").
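A minimal check of this approach on the string from the question (WordWrapper and wrapIs are hypothetical names). Using $0 rather than a literal "is" in the replacement keeps the original case, so THIS IS becomes THIS <h1>IS</h1>.

```java
import java.util.regex.Pattern;

class WordWrapper {
    private static final Pattern IS_WORD = Pattern.compile("\\bis\\b", Pattern.CASE_INSENSITIVE);

    // Wraps every standalone "is" (any case) in <h1> tags; $0 is the matched text.
    static String wrapIs(String str) {
        return IS_WORD.matcher(str).replaceAll("<h1>$0</h1>");
    }

    public static void main(String[] args) {
        String input = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
        System.out.println(wrapIs(input));
        // This <h1>is</h1> a great day, <h1>is</h1> it not? If there <h1>is</h1> something, THIS <h1>IS</h1> it. <b><h1>is</h1></b>
        System.out.println(wrapIs("island")); // island
    }
}
```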
Like this:
str = str.replaceAll(yourRegex, "<h1>$1</h1>");
The $1 refers to the text captured by group #1 in your regex.
Michael's answer is better, but if you happen to specifically only want [.>, ] and [.<, ] as boundaries, you can do it like this:
String input = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
Pattern p = Pattern.compile("(?<=[.>, ])(is)(?=[.<, ])", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(input);
String result = m.replaceAll("<h1>$1</h1>");
yourStr.replaceAll("(?i)([.>, ])(is)([.<, ])","$1<h1>$2</h1>$3")
(?i) makes the match case-insensitive; wrap everything you want to reuse in parentheses, reuse the pieces with $1, $2 and $3, and concatenate them into what you want.
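Simply use a backreference for that.
"This is a great day, is it not? If there is something, THIS IS it. <b>is</b>".replaceAll("(?i)([.>, ])(is)([.<, ])", "$1<h1>$2</h1>$3"); should do. With only one group in the pattern, a reference to $2 would throw an IndexOutOfBoundsException; capturing the delimiters as $1 and $3 also keeps them in the output.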
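It may be a late addition, but in case anyone is searching for a term such as 'thing' and also wants it found inside words such as 'Something', you can require any non-space character around the term instead of specific delimiters (shown here with "is"):
Pattern p = Pattern.compile("([^ ])is([^ \\.])");
Matcher m = p.matcher(input);
String result = m.replaceAll("<h1>$1is$2</h1>");
This also matches the term inside longer words, but note that $1 and $2 put the neighboring characters inside the <h1> tag as well, and the pattern is case-sensitive.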
