Replace string with part of the matching regex


I have a long string. I want to replace all the matches with part of the matching regex (group).
For example:
String = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>".
I want to replace all the words "is" by, let's say, "<h1>is</h1>". The case should remain the same as original. So the final string I want is:
This <h1>is</h1> a great day, <h1>is</h1> it not? If there <h1>is</h1> something,
THIS <h1>IS</h1> it. <b><h1>is</h1></b>.
The regex I was trying:
Pattern pattern = Pattern.compile("[.>, ](is)[.<, ]", Pattern.CASE_INSENSITIVE);

The Matcher class is commonly used in conjunction with Pattern. Use the Matcher.replaceAll() method to replace all matches in the string:
String str = "This is a great day...";
Pattern p = Pattern.compile("\\bis\\b", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
String result = m.replaceAll("<h1>$0</h1>"); // $0 is the matched text, so its original case is kept
Note: \b matches at a word boundary, i.e. the zero-width position between a word character and a non-word character (or the start or end of the input). This ensures that only the word "is" is matched and not words that merely contain the letters "i" and "s" (like "island").

Like this:
str = str.replaceAll(yourRegex, "<h1>$1</h1>");
The $1 refers to the text captured by group #1 in your regex.
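As a minimal sketch of how $1 keeps the original case, the following applies a group reference to the string from the question. The class and method names are made up for illustration, and the \b-based pattern is one possible choice of regex, not the only one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupReplaceSketch {
    // Hypothetical helper: wraps every whole-word "is" (in any case) in <h1> tags.
    static String wrapIs(String text) {
        Pattern p = Pattern.compile("\\b(is)\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        // $1 is replaced by whatever group 1 captured, so "IS" stays "IS".
        return m.replaceAll("<h1>$1</h1>");
    }

    public static void main(String[] args) {
        String input = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
        System.out.println(wrapIs(input));
    }
}
```

The "is" inside "This" and "THIS" is left alone because there is no word boundary in front of it.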

Michael's answer is better, but if you happen to specifically only want [.>, ] and [.<, ] as boundaries, you can do it like this:
String input = "This is a great day, is it not? If there is something, THIS IS it. <b>is</b>";
Pattern p = Pattern.compile("(?<=[.>, ])(is)(?=[.<, ])", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(input);
String result = m.replaceAll("<h1>$1</h1>");
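One practical difference between lookaround and capturing the boundary characters shows up when two occurrences share a single separator. The sketch below compares the two approaches; the class and method names are illustrative and the input is made up.

```java
import java.util.regex.Pattern;

final class LookaroundSketch {
    // Boundaries are captured, and therefore consumed, by each match.
    private static final Pattern CONSUMING =
            Pattern.compile("([.>, ])(is)([.<, ])", Pattern.CASE_INSENSITIVE);
    // Boundaries are only asserted by lookbehind/lookahead and are not consumed.
    private static final Pattern LOOKAROUND =
            Pattern.compile("(?<=[.>, ])(is)(?=[.<, ])", Pattern.CASE_INSENSITIVE);

    static String withConsumedBoundaries(String text) {
        return CONSUMING.matcher(text).replaceAll("$1<h1>$2</h1>$3");
    }

    static String withLookaround(String text) {
        return LOOKAROUND.matcher(text).replaceAll("<h1>$1</h1>");
    }

    public static void main(String[] args) {
        String input = "it is is so"; // two adjacent occurrences sharing one space
        System.out.println(withConsumedBoundaries(input)); // it <h1>is</h1> is so
        System.out.println(withLookaround(input));         // it <h1>is</h1> <h1>is</h1> so
    }
}
```

The consuming pattern misses the second "is" because the space in front of it was already used up as the trailing boundary of the first match.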

yourStr.replaceAll("(?i)([.>, ])(is)([.<, ])","$1<h1>$2</h1>$3")
(?i) makes the match case-insensitive. Wrap everything you want to reuse in parentheses, refer to those groups as $1, $2 and $3, and concatenate them into the replacement you want.
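Because $n refers to the n-th capturing group, a replacement can only use the numbers of groups that actually exist in the pattern. As a small sketch (class and method names are hypothetical), Java throws an IndexOutOfBoundsException when the replacement names a group the pattern does not have:

```java
final class MissingGroupSketch {
    // Returns true if replaceAll rejects the replacement because it
    // references a capturing group that the regex does not define.
    static boolean referencesMissingGroup(String regex, String replacement) {
        try {
            "this is it".replaceAll(regex, replacement);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Only one group exists, so $2 cannot be resolved.
        System.out.println(referencesMissingGroup("[.>, ](is)[.<, ]", "<h1>$2</h1>"));
        // Three groups exist, so $1, $2 and $3 are all valid.
        System.out.println(referencesMissingGroup("([.>, ])(is)([.<, ])", "$1<h1>$2</h1>$3"));
    }
}
```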

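Simply use a backreference for that. Each part you want to keep needs its own group, and the replacement may only reference groups that exist in the pattern:
"This is a great day, is it not? If there is something, THIS IS it. <b>is</b>".replaceAll("(?i)([.>, ])(is)([.<, ])", "$1<h1>$2</h1>$3"); should do.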

It may be a late addition, but in case anyone is looking for this: when searching for "thing", you may also want a longer word that contains it, such as "Something", to be taken as a result:
Pattern p = Pattern.compile("(\\w*)thing(\\w*)");
Matcher m = p.matcher(str);
String result = m.replaceAll("<h1>$1thing$2</h1>");
This turns Something into <h1>Something</h1> too.

Related

Java Pattern matcher not matching for HTTP response code [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for (String s : words) {
    if (s.matches("[a-z]")) {
        System.out.println(s);
    }
}
Supposed to print
dkoe
but it prints nothing!!

Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find()) {
    // match
}
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
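The difference between the two methods can be sketched with the words from the question. The class and helper names below are invented for illustration.

```java
import java.util.regex.Pattern;

final class MatchesVsFindSketch {
    // String.matches() succeeds only if the regex covers the entire input.
    static boolean wholeInputMatches(String regex, String s) {
        return s.matches(regex);
    }

    // Matcher.find() succeeds if the regex matches anywhere in the input.
    static boolean containsMatch(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f"};
        for (String s : words) {
            System.out.println(s
                    + " matches [a-z]: " + wholeInputMatches("[a-z]", s)
                    + ", matches [a-z]+: " + wholeInputMatches("[a-z]+", s)
                    + ", find [a-z]: " + containsMatch("[a-z]", s));
        }
    }
}
```

Only "dkoe" passes the [a-z]+ check with matches(), while every word in the array contains at least one lowercase letter and therefore passes the find() check.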

[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.

String.matches returns whether the whole string matches the regex, not just any substring.

Java's String.matches() (like Matcher.matches()) tries to match the whole string.
That's different from Perl-style matching, which looks for a matching part; in Java, Matcher.find() does that.
If you want to check for a string with nothing but lowercase characters, use the pattern [a-z]+ with matches().
If you want to check for a string containing at least one lowercase character, use the pattern .*[a-z].* with matches().

Used
String[] words = {"{apf","hum_","dkoe","12f"};
for (String s : words) {
    if (s.matches("[a-z]+")) {
        System.out.println(s);
    }
}

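I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with the pattern wrapped in ( and ). Note, however, that a capturing group alone does not change which strings a pattern matches, so the difference probably came from somewhere else, such as the input or the way the matcher was invoked.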

Your regular expression [a-z] doesn't match dkoe since it only matches strings of length 1. Use something like [a-z]+.

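The pattern needs a quantifier so that it can match more than one character. The capturing group () and the anchors ^ and $ shown here are optional, because matches() already tests the whole string:
String[] words = {"{apf","hum_","dkoe","12f"};
for (String s : words) {
    if (s.matches("(^[a-z]+$)")) {
        System.out.println(s);
    }
}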

You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

how to exclude "<" in regex match

I have a String which looks like "<name><address> and <Phone_1>". I need to get a result like
1) <name>
2) <address>
3) <Phone_1>
I have tried using regex "<(.*)>" but it returns just one result.

The regex you want is
<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>
Which will then spit out the stuff you want in the 3 capture groups. The full code would then look something like this:
Matcher m = Pattern.compile("<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>").matcher(string);
if (m.find()) {
String name = m.group(1);
String address = m.group(2);
String phone = m.group(3);
}

The pattern .* in a regex is greedy. It will match as many characters as possible between the first < it finds and the last possible > it can find. In the case of your string it finds the first <, then looks for as much text as possible until a >, which it will find at the very end of the string.
You want a non-greedy or "lazy" pattern, which will match as few characters as possible. Simply <(.+?)>. The question mark is the syntax for non-greedy. See also this question.
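To make the greedy versus lazy behaviour concrete, here is a rough sketch that collects group 1 for every match found in the string from the question. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLazySketch {
    // Collects the text captured by group 1 for every match in the input.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<name><address> and <Phone_1>";
        // Greedy: one match running from the first < to the last >.
        System.out.println(captures("<(.*)>", input));
        // Lazy: each match stops at the nearest >.
        System.out.println(captures("<(.+?)>", input));
    }
}
```

The greedy pattern yields the single capture name><address> and <Phone_1, while the lazy pattern yields name, address and Phone_1 separately.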

This will work if you have dynamic number of groups.
Pattern p = Pattern.compile("(<\\w+>)");
Matcher m = p.matcher("<name><address> and <Phone_1>");
while (m.find()) {
System.out.println(m.group());
}

Java: Need to extract a number from a string

I have a string containing a number. Something like "Incident #492 - The Title Description".
I need to extract the number from this string.
Tried
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(theString);
String substring = m.group();
But I am getting an error:
java.lang.IllegalStateException: No match found
What am I doing wrong?
What is the correct expression?
I'm sorry for such a simple question, but I searched a lot and still have not found how to do this (maybe because it's too late here...)

You are getting this exception because you need to call find() on the matcher before accessing groups:
Matcher m = p.matcher(theString);
while (m.find()) {
String substring =m.group();
System.out.println(substring);
}

There are two things wrong here:
The pattern on its own is fine for find(), but with matches() it only succeeds when the whole string consists of digits. Also, since it doesn't contain a group expression, a call to group() is equivalent to calling group(0), which returns the entire match.
You need to be certain that the matcher has a match before you go calling a group.
Let's start with the regex. Used with matches(), the pattern \\d+ will only ever match a string that consists entirely of digits. What you care about is specifically the number in that string, so you want an expression that:
Doesn't care about what's in front of it
Doesn't care about what's after it
Only matches on one occurrence of numbers, and captures it in a group
To do that, you'd use this expression:
.*?(\\d+).*
The last part is to ensure that the matcher can find a match, and that it gets the correct group. That's accomplished by this:
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}
All together now:
Pattern p = Pattern.compile(".*?(\\d+).*");
final String theString = "Incident #492 - The Title Description";
Matcher m = p.matcher(theString);
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}

You need to invoke one of the Matcher methods, like find, matches or lookingAt to actually run the match.
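A short sketch of both points, using find() to run the match and showing what happens when group() is called without it. The class and method names are hypothetical, and returning an Optional is just one way to represent "no number found".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractNumberSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the text, if there is one.
    static Optional<String> firstNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        // find() actually runs the match; group() is only valid after it succeeds.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Demonstrates that group() without a prior match attempt throws.
    static boolean groupBeforeFindThrows(String text) {
        Matcher m = DIGITS.matcher(text);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String theString = "Incident #492 - The Title Description";
        System.out.println(firstNumber(theString).orElse("no number")); // 492
        System.out.println(groupBeforeFindThrows(theString));           // true
    }
}
```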

java regular expression word without ending with dot

I need to print the simple bind variable names in the SQL query.
I need to print the words starting with : character But NOT ending with dot . character.
in this sample I need to print pOrg, pBusinessId but NOT the parameter.
The regular expression ="(:)(\\w+)^\\." is not working.
Could you help in correcting the regular expression.
Thanks
Peddi
public void testMethod() {
    String regEx = "(:)(\\w+)([^\\.])";
    String input = "(origin_table like 'I%' or (origin_table like 'S%' and process_status =5))and header_id = NVL( :parameter.number1:NULL, header_id) and (orginization = :pOrg) and (businsess_unit = :pBusinessId";
    Pattern pattern;
    Matcher matcher;
    pattern = Pattern.compile(regEx);
    matcher = pattern.matcher(input);
    String grp = null;
    while (matcher.find()) {
        grp = matcher.group(2);
        System.out.println(grp);
    }
}

You can try with something like
String regEx = "(:)(\\w+)\\b(?![.])";
(:)(\\w+)\\b will make sure that you are matching only entire words starting with :
(?![.]) is a negative lookahead, which makes sure that the matched word is not followed by a .
This regex will also allow :NULL so if there is some reason why it shouldn't be matched share it with us.
Anyway to exclude NULL from results you can use
String regEx = "(:)(\\w+)\\b(?![.])(?<!:NULL)";
To make regex case insensitive so NULL could also match null compile this pattern with Pattern.CASE_INSENSITIVE flag like
Pattern pattern = Pattern.compile(regEx,Pattern.CASE_INSENSITIVE);
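Put together, the lookahead and lookbehind can be tried out on a shortened version of the query. This is only a sketch: the class and method names are invented, the SQL text is abbreviated from the question, and the pattern is used case-sensitively here, so only an upper-case :NULL is excluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BindVariableSketch {
    // ":" + word, not followed by ".", and the matched text must not be ":NULL".
    private static final Pattern BIND =
            Pattern.compile(":(\\w+)\\b(?![.])(?<!:NULL)");

    // Returns the names of simple bind variables, without the leading colon.
    static List<String> bindVariables(String sql) {
        List<String> names = new ArrayList<>();
        Matcher m = BIND.matcher(sql);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String sql = "header_id = NVL(:parameter.number1:NULL, header_id)"
                + " and (org = :pOrg) and (unit = :pBusinessId";
        System.out.println(bindVariables(sql)); // [pOrg, pBusinessId]
    }
}
```

:parameter is rejected by the lookahead because a dot follows it, and :NULL is rejected by the lookbehind.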

Since it looks like you're using camelcase, you can actually simplify things a bit when it comes to excluding :NULL:
:([a-z][\\w]+)\\b(?!\\.)
And $1 will return your variable names.
Alternative that doesn't rely on negative lookahead:
:([a-z][\\w]+)\\b(?:[^\\.]|$)

You can try:
Pattern regex = Pattern.compile("^:.*?[^.]$");

Getting specific portion of string from a Matcher

I'd like to get a portion of a matched string coming from a Matcher, like this:
Pattern pat = Pattern.compile("a.*l.*z");
Matcher match = pat.matcher("abcdlmnoz"); // I'd want to get bcd AND mno
ArrayList<String> values = match.magic(); //here is where your magic happens =)
ArrayList<String> is only for this example; I would be happy to receive either a List or individual String items. The best would be what .htaccess files and RewriteRules do:
RewriteRule (.*)/path?(.*) $1/$2/modified-path/
Well, putting those (.*) into $arguments would be as cool as an ArrayList or accessing String separately. I've been looking for something at Java Matcher API, but I didn't happen to see anything useful inside.
Thanks in advance, guys.

You can capture groups in a regexp match using parentheses ( ):
Pattern pat = Pattern.compile("a(.*)l(.*)z");
Matcher match = pat.matcher("abcdlmnoz");
boolean b = match.matches(); // don't forget to attempt the match
Then use match.group(n) to get the text captured by the n-th group. The captured groups are stored in the matcher object.
See: Capturing Groups (Oracle)

Look at the matcher's "group" method and peruse the doc you linked to for references to groups, which is what the parentheses in the regex do :)
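If a List is what you are after, the groups can be gathered in a loop over groupCount(). A minimal sketch, with made-up class and method names, that returns an empty list when the input does not match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureGroupsSketch {
    // Returns the text of groups 1..n if the whole input matches, else an empty list.
    static List<String> groups(String regex, String input) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            // Group 0 is the whole match, so the captured parts start at 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(groups("a(.*)l(.*)z", "abcdlmnoz")); // [bcd, mno]
    }
}
```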

...
String testStr = "abcdlmnoz";
String myRE = "a(.*)l(.*)z";
Pattern myRECompiled = Pattern.compile(myRE, Pattern.DOTALL);
Matcher myMatcher = myRECompiled.matcher(testStr);
if (myMatcher.find()) {
    System.out.println(myMatcher.group(1)); // bcd
    System.out.println(myMatcher.group(2)); // mno
}
...
