One common use of regex is replacing the matches with something that is based on the matches themselves.
For example, in a commit message with ticket numbers, ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (the <> stands in for any surrounding markup).
This is very simple in Java:
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
results in <ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if the input String contains several different matches, this no longer works. I tried replacing if (matcher.find()) { with while (matcher.find()) {, but the result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?
You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").
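As a small sketch of the same idea with numbered groups: $1 and $2 refer to the first and second parenthesised groups, while $0 is still the whole match. The class name, method name and output format below are made up for illustration only.

```java
class TicketLinks {
    // Wraps each ticket id and also shows its project key and number.
    static String annotate(String message) {
        // $0 = whole match, $1 = project key, $2 = ticket number
        return message.replaceAll("([A-Z]+)-(\\d+)", "<$0 key=$1 no=$2>");
    }

    public static void main(String[] args) {
        System.out.println(annotate("ABC-1234: some text (ABC-1234)"));
    }
}
```

If the replacement text itself may contain a literal $ or \, it has to be escaped, for example with Matcher.quoteReplacement.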
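The problem is that replace() rewrites every occurrence of the ticket in the whole string each time it is called, so when find() reaches the second occurrence of the same ticket, the already wrapped text gets wrapped again.
A better way is to replace one match at a time. The Matcher class has an appendReplacement method for this: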
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);
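If Java 9 or later is available, the same one-match-at-a-time replacement can probably be written more compactly with the Matcher.replaceAll overload that takes a function. This is only a sketch; the class and method names are assumptions, not part of the question.

```java
import java.util.regex.Pattern;

class TicketWrapper {
    private static final Pattern TICKET = Pattern.compile("ABC-\\d+");

    // Requires Java 9+: the lambda is called once per match and returns
    // the replacement text for that match only.
    static String wrap(String message) {
        return TICKET.matcher(message)
                .replaceAll(match -> "<" + match.group() + ">");
    }

    public static void main(String[] args) {
        System.out.println(wrap("ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)"));
    }
}
```

The string returned by the lambda is still interpreted as a replacement string, so a literal $ or \ in it would need Matcher.quoteReplacement.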
I'm trying to extract CANseIqFMnf from the URL https://www.instagram.com/p/CANseIqFMnf/ using a regex in Android Studio. Please help me find a regex that works in Android Studio.
Here is the code for my method:
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String REGEX = "/p\//";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(url);
boolean match = matcher.matches();
if (match){
Log.e("success", "start = " + matcher.start() + " end = " + matcher.end() );
}else{
Log.e("failed", "failed");
}
But it gives me failed in return!
Method 1
You just need the replace method of String; there is no need to compile a pattern and complicate things:
String input = "https://www.instagram.com/p/CANseIqFMnf/";
String output = input.replace("https://www.instagram.com/p/", "").replace("/", "");
Log.v(TAG, output);
Note that the first replace removes the URL prefix and the second removes any remaining slashes. replace is used rather than replaceAll because both arguments are literal text; with replaceAll the dots in the URL would be treated as regex wildcards.
Method 2
Pattern pattern = Pattern.compile("https://www.instagram.com/p/(.*?)/");
Matcher matcher = pattern.matcher("https://www.instagram.com/p/CANseIqFMnf/");
while(matcher.find()) {
System.out.println(matcher.group(1));
}
Note that when matcher.find() returns true, group(1) holds the text captured by the first parenthesised group in your regex, here (.*?), and group(0) holds the entire match, which in your case is the whole URL.
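The code in the question most likely prints failed because matches() only succeeds when the pattern covers the entire input, whereas find() looks for the pattern anywhere in it. A minimal sketch of the difference follows; the class name, the method name and the choice of [^/]+ for the id segment are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InstagramId {
    private static final Pattern POST = Pattern.compile("/p/([^/]+)");

    // Returns the post id, or null when the URL has no /p/<id> segment.
    static String extract(String url) {
        Matcher m = POST.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "https://www.instagram.com/p/CANseIqFMnf/";
        // false: the pattern does not cover the whole URL
        System.out.println(POST.matcher(url).matches());
        // CANseIqFMnf
        System.out.println(extract(url));
    }
}
```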
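An alternative without regex is the java.nio.file.Paths API, which is simpler to write. Note that this treats the URL as a file-system path: it works on Unix-like systems, but on Windows Paths.get is likely to throw an InvalidPathException because of the colon in https:.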
public class Url {
public static void main(String[] args) {
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String name = java.nio.file.Paths.get(url).getFileName().toString();
System.out.println(name);
}
}
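A platform-independent variant of the same no-regex idea could parse the string as a URI instead of a file path. This is a sketch only; the class and method names are hypothetical, and it assumes the input is a syntactically valid URI.

```java
import java.net.URI;

class UrlLastSegment {
    // Returns the last non-empty path segment, or "" if there is none.
    static String lastSegment(String url) {
        String path = URI.create(url).getPath(); // e.g. "/p/CANseIqFMnf/"
        if (path == null) {
            return ""; // opaque URIs such as mailto: have no path
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1); // drop trailing slash
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("https://www.instagram.com/p/CANseIqFMnf/"));
    }
}
```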
I would like to be able to find the first occurrence of m² and then numbers in front of it, could be integers or decimal numbers.
E.g.
"some text" 38 m² "some text" ,
"some text" 48,8 m² "some text",
"some text" 48 m² "some text", etc..
What I have so far is:
\d\d,\d\s*(\m\u00B2)|\d\d\s*(\m\u00B2)
This right now finds all occurrences, although I guess it could be fixed with findFirst(). Any ideas how to improve the Regex part?
To get the first match, you just need to use Matcher#find() inside an if block:
String rx = "\\d+(?:,\\d+)?\\s*m\\u00B2";
Pattern p = Pattern.compile(rx);
Matcher matcher = p.matcher("E.g. : 4668,68 m² some text, some text 48 m² etc");
if (matcher.find()){
System.out.println(matcher.group());
}
Note that you can get rid of the alternation by using an optional non-capturing group, (?:...)?
Pattern breakdown:
\d+ - 1+ digits
(?:,\d+)? - an optional (0 or 1) sequence of a comma followed by 1+ digits
\s* - 0+ whitespace characters
m\u00B2 - the literal text m².
This is what I came up with thanks to your help :) (work in progress, later it should return a BigDecimal value). For now it seems to work:
public static String findArea(String description) {
String tempString = "";
Pattern p = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");
Matcher m = p.matcher(description);
if(m.find()) {
tempString = m.group();
}
// keep only digits and the comma (drops the whitespace, the m and the ²) so it can be parsed to BigDecimal later
tempString = tempString.replaceAll("[^0-9,]","");
System.out.println(tempString);
return tempString;
}
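The planned BigDecimal step could look roughly like the sketch below. It assumes the comma is always a decimal separator (never a thousands separator) and puts the number in a capturing group so no clean-up replaceAll is needed. The class name and the null return for "no area found" are choices made for this example.

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AreaParser {
    // Group 1 captures only the number in front of m²
    private static final Pattern AREA =
            Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    // Returns the first area in the text, or null if there is none.
    static BigDecimal findArea(String description) {
        Matcher m = AREA.matcher(description);
        if (!m.find()) {
            return null;
        }
        // BigDecimal expects a dot as the decimal separator
        return new BigDecimal(m.group(1).replace(',', '.'));
    }

    public static void main(String[] args) {
        System.out.println(findArea("some text 48,8 m\u00B2 some text 38 m\u00B2"));
    }
}
```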
One simple way of doing it:
description.replaceFirst(@NotNull String regex,
@NotNull String replacement)
Javadoc: Replaces the first substring of this string that matches the given regular expression with the given replacement.
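Note that replaceFirst changes the first match rather than returning it, so it fits when the goal is to rewrite or strip the first occurrence. A small sketch, reusing the pattern from the accepted answer; the class name, method name and the [area] placeholder are invented for the example.

```java
class ReplaceFirstDemo {
    // Replaces only the first "<number> m²" occurrence with a placeholder.
    static String maskFirstArea(String description) {
        return description.replaceFirst("\\d+(?:,\\d+)?\\s*m\\u00B2", "[area]");
    }

    public static void main(String[] args) {
        System.out.println(maskFirstArea("a 38 m\u00B2 b 48 m\u00B2"));
    }
}
```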
To find only the first or only the last one:
@Test
public void testFindFirstRegExp() {
String pattern = ".* (\\d+,\\d+) .*";
Pattern r = Pattern.compile(pattern);
String line = "some text 44,66 m² some 33,11 m² text 11,22 m² some text";
Matcher m = r.matcher(new StringBuilder(line).reverse().toString());
String expected = "44,66";
String actual = null;
if (m.find()) {
actual = new StringBuilder(m.group(1)).reverse().toString();
}
System.out.println("got first:" + actual);
Assert.assertEquals(expected, actual);
m = r.matcher(line);
expected = "11,22";
actual = null;
if (m.find()) {
actual = m.group(1);
}
System.out.println("got last:" + actual);
Assert.assertEquals(expected, actual);
}
prints:
got first:44,66
got last:11,22
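Note: with the reversed-input trick, the pattern itself must also be written in reverse wherever the order matters, for example:
pattern = ".* (\\d+,\\d+-?) .*"; //reverse for (-?\\d+,\\d+)
Without the surrounding .*, this works as expected on the original (unreversed) line:
pattern = " (\\-?\\d+,\\d+) ";
and you get all of them in a loop: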
while (m.find()) {
actual = m.group(1);
System.out.println("got last:" + actual);
}
Will print:
got last:44,66
got last:33,11
got last:11,22
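Since the loop visits every match in order, the last match can also be obtained without reversing anything, simply by remembering the most recent one. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastMatch {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+,\\d+");

    // Returns the last decimal number in the line, or null if there is none.
    static String last(String line) {
        Matcher m = NUMBER.matcher(line);
        String last = null;
        while (m.find()) {
            last = m.group(); // overwritten until the final match remains
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(last("some text 44,66 some 33,11 text 11,22 some text"));
    }
}
```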
How can I replace this?
String str = "KMMH12DE1433";
String pattern = "^[a-z]{2}([0-9]{2})[a-z]{1,2}([0-9]{4})$";
String str2 = str.replaceAll(pattern, "repl");
Log.e("Founded_words2",str2);
What I got: KMMH12DE1433
What I want: MH12DE1433
Try it like this using a proper java.util.regex.Pattern and a java.util.regex.Matcher:
String str = "KMMH12DE1433";
//Make the pattern, case-insensitive using (?i)
Pattern pattern = Pattern.compile("(?i)[a-z]{2}([0-9]{2})[a-z]{1,2}([0-9]{4})");
//Create the Matcher
Matcher m = pattern.matcher(str);
//Check if we find anything
if(m.find()) {
//Use what you found - with proper capturing groups you
//gain access to parts of your pattern as needed
System.out.println("Found this: " + m.group());
}
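For context, the pattern in the question probably fails for two reasons: [a-z] does not match upper-case letters, and the ^ anchor forces the two-letter prefix to sit at the very start of a string that actually begins with four letters. The sketch below uses the Pattern.CASE_INSENSITIVE flag instead of (?i) and keeps only the $ anchor; the class and method names are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlateExtractor {
    private static final Pattern PLATE = Pattern.compile(
            "[a-z]{2}([0-9]{2})[a-z]{1,2}([0-9]{4})$",
            Pattern.CASE_INSENSITIVE);

    // Returns the matching suffix of the input, or null if nothing matches.
    static String extract(String input) {
        Matcher m = PLATE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("KMMH12DE1433"));
    }
}
```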
If you just want to remove the first two characters and if the first two characters will always be uppercase letters:
String str = "KMMH12DE1433";
String pattern = "^[A-Z]{2}";
String str2 = str.replaceAll(pattern, "");
Log.e("Output string: ", str2);
try this :
String a = "KMMH12DE1433";
String pattern = "^[A-Z]{2}";
String rs = a.replaceAll(pattern,"");
Alternatively, without any regex, drop the first two characters with substring:
String ans = str.substring(2);
I am trying to use regex to remove &nbsp; from my string. The following is the program.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MyTest {
private static final StringBuffer testRegex =
new StringBuffer("<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font>" +
"<BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>Test</p><strong>" +
"<FONT color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
"<br><p>TestTest</p><br><BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ffcc66\">TestTestTestTestTest</font><br>" +
"<p>TestTestTestTest</p></blockquote><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003333\">TestTestTest</font></p><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003399\">TestTest</font></p><p> </p>");
//"This is test<P>Tag Tag</P>";
public static void main(String[] args) {
System.out.println("***Testing***");
String temp = checkRegex(testRegex);
System.out.println("***FINAL = "+temp);
}
private static String checkRegex(StringBuffer sample){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(sample);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
String group = matcher.group();
System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);
String substring = sample.substring(start, end);
System.out.println(" Substring = "+substring);
String replacedSubString = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubString);
sample.replace(start, end, replacedSubString);
System.out.println(" NEW SAMPLE = "+sample);
}
System.out.println("********WHILE OVER ********");
return sample.toString();
}
}
I am getting a java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am currently using the Java Pattern and Matcher classes to find &nbsp; and then replace it with " ". Does anyone know what causes this? What should I do to remove the extra &nbsp; from my string?
Thanks
Use matcher.reset(); after sample.replace(start, end, replacedSubString);
This is because replacing part of sample changes its length, so the matcher's idea of where the input ends no longer matches the buffer. You therefore need to call matcher.reset(); after every replace.
For example, if start is 0 and end is 5 and the replacement is shorter than the text it replaces, the matcher's end position lies beyond the new length, and find() throws a StringIndexOutOfBoundsException.
If the string is huge, reset() can become a major performance bottleneck because matching starts again from the beginning. You can instead use
matcher.region(start,sample.length());
This also resets the matcher, but restricts it to the part of the buffer from the last match onwards.
You need to create a new StringBuffer to hold the replaced string, then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. There is probably a way to do this in place, but the approach above is the most straightforward.
This is your checkRegex method re-written:
private static String checkRegex(String inputString){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(inputString);
// Create a new StringBuffer to hold the string after replacement
StringBuffer replacedString = new StringBuffer();
while (matcher.find()) {
// matcher.group() returns the substring that matches the whole regex
String substring = matcher.group();
System.out.println(" Substring = "+substring);
String replacedSubstring = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubstring);
// appendReplacement is a clean approach to append the text which comes
// before a match, and append the replacement text for the matched text
// Note that appendReplacement will interpret $ in the replacement string
// with special meaning (for referring to text matched by capturing group).
// Matcher.quoteReplacement is necessary to provide a literal string as
// replacement
matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));
System.out.println(" NEW SAMPLE = "+replacedString);
}
// appendTail is used to append the text after the last match to the
// replaced string.
matcher.appendTail(replacedString);
System.out.println("********WHILE OVER ********");
return replacedString.toString();
}
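To illustrate why the comments above insist on Matcher.quoteReplacement: a replacement such as $5 would otherwise be read as a reference to capturing group 5 and, with no such group in the pattern, appendReplacement throws an exception. The following is a sketch with made-up names (the PRICE token and the class are not from the question).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Replaces every literal PRICE token with the given text, taken verbatim.
    static String replaceToken(String text, String replacement) {
        Matcher m = Pattern.compile("PRICE").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement escapes $ and \ so they are inserted literally
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceToken("cost: PRICE", "$5"));
    }
}
```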
// change the group and its source string is automatically updated
Strings in Java are immutable, so there is no way to change one in place; what you're asking for is impossible as stated.
To remove or replace a pattern with a string can be achieved with a call like
someString = someString.replaceAll(toReplace, replacement);
To transform the matched substring, as seems to be indicated by your line
m.group().replaceAll("something","");
the best solution is probably to use a StringBuffer for the result, together with Matcher.appendReplacement and Matcher.appendTail.
Example:
String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// For example: transform match to upper case
String replacement = m.group().toUpperCase();
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
sourceString = sb.toString();
System.out.println(sourceString); // "lorem IPSUM dolor sit"
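On Java 9 and later, appendReplacement and appendTail also accept a StringBuilder, which avoids the synchronised StringBuffer. The sketch below wraps the example above in a helper; the class and method names are assumptions, and quoteReplacement is added in case a match contains $ or \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpperCaseMatches {
    // Upper-cases every match of regex in source (requires Java 9+).
    static String upperCase(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String replacement = m.group().toUpperCase();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCase("lorem ipsum dolor sit", "ipsum"));
    }
}
```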