Find the first occurrence with Regex - java

I would like to find the first occurrence of m² together with the number in front of it, which could be an integer or a decimal number.
E.g.
"some text" 38 m² "some text",
"some text" 48,8 m² "some text",
"some text" 48 m² "some text", etc.
What I have so far is:
\d\d,\d\s*(\m\u00B2)|\d\d\s*(\m\u00B2)
Right now this finds all occurrences, although I guess that could be fixed with findFirst(). Any ideas on how to improve the regex part?

To get the first match, you just need to use Matcher#find() inside an if block:
String rx = "\\d+(?:,\\d+)?\\s*m\\u00B2";
Pattern p = Pattern.compile(rx);
Matcher matcher = p.matcher("E.g. : 4668,68 m² some text, some text 48 m² etc");
if (matcher.find()){
System.out.println(matcher.group());
}
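Since the question mentions findFirst(): on Java 9 or later, Matcher.results() returns a Stream<MatchResult>, so the first match can also be taken with the short-circuiting Stream.findFirst(). A minimal sketch; the class and method names are illustrative, not from the original answer:

```java
import java.util.Optional;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class FirstArea {
    // Same pattern as in the answer: a number, optional comma decimals, optional spaces, then m².
    private static final Pattern AREA = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");

    // Requires Java 9+: Matcher.results() streams the matches,
    // and findFirst() only needs the first one.
    public static Optional<String> firstArea(String text) {
        return AREA.matcher(text).results()
                .map(MatchResult::group)
                .findFirst();
    }

    public static void main(String[] args) {
        // A Unicode escape is used for the superscript two to avoid source-encoding issues.
        System.out.println(firstArea("E.g. : 4668,68 m\u00B2 some text, some text 48 m\u00B2 etc"));
    }
}
```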
Note that you can get rid of the alternation by making the decimal part an optional non-capturing group, (?:...)?
Pattern breakdown:
\d+ - 1+ digits
(?:,\d+)? - an optional (0 or 1) sequence of a comma followed by 1+ digits
\s* - 0+ whitespace symbols
m\u00B2 - m².
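If only the number is wanted, one option is to wrap the numeric part of the same pattern in a capturing group and read matcher.group(1) instead of the whole match. A sketch run over the three sample inputs from the question (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AreaNumber {
    // Group 1 captures the integer or comma-decimal number in front of m².
    private static final Pattern AREA = Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    // Returns the number of the first match, or null when nothing matches.
    public static String firstNumber(String text) {
        Matcher m = AREA.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "some text 38 m\u00B2 some text",
            "some text 48,8 m\u00B2 some text",
            "some text 48 m\u00B2 some text"
        };
        for (String s : samples) {
            System.out.println(firstNumber(s)); // prints 38, then 48,8, then 48
        }
    }
}
```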

This is what I came up with your help :) (work in progress, later it should return a BigDecimal value); for now it seems to work:
public static String findArea(String description) {
String tempString = "";
Pattern p = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");
Matcher m = p.matcher(description);
if (m.find()) {
tempString = m.group();
}
// strip "m" and the superscript two (U+00B2) so the number can be parsed to BigDecimal later;
// inside [...] a '|' would be kept literally, so [^0-9,] keeps only digits and commas
tempString = tempString.replaceAll("[^0-9,]", "");
System.out.println(tempString);
return tempString;
}
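For the planned BigDecimal return value, a possible sketch (not the author's final version) captures the number in group 1 and swaps the comma for a dot. It assumes the comma is always the decimal separator and that no thousands separators occur; the class name and the Optional return type are choices made for this illustration:

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AreaParser {
    // Group 1 captures only the number (integer or comma decimal) before m².
    private static final Pattern AREA = Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    // Returns the first area found, or Optional.empty() if there is none.
    // Assumption: ',' is the decimal separator and no thousands separators are used.
    public static Optional<BigDecimal> findArea(String description) {
        Matcher m = AREA.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        // BigDecimal(String) expects '.' as the decimal point.
        return Optional.of(new BigDecimal(m.group(1).replace(',', '.')));
    }

    public static void main(String[] args) {
        System.out.println(findArea("some text 48,8 m\u00B2 some text 38 m\u00B2")); // Optional[48.8]
        System.out.println(findArea("no area here")); // Optional.empty
    }
}
```

Returning Optional instead of an empty string lets the caller distinguish "no area in the description" from a parsed value.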

One simple way of doing it:
description.replaceFirst(String regex, String replacement)
Javadoc: "Replaces the first substring of this string that matches the given regular expression with the given replacement."
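Because replaceFirst returns a new string rather than the match, using it for extraction means writing a regex that consumes the whole input and keeping only a capturing group via $1. A hedged sketch of that idea (the helper name is hypothetical); since replaceFirst returns the input unchanged when nothing matches, the match is checked first with String.matches:

```java
public class ReplaceFirstArea {
    // (?s) lets '.' match line breaks; the lazy .*? stops before the first number followed by m²,
    // and the trailing .* consumes the rest so that only group 1 survives the replacement.
    private static final String REGEX = "(?s).*?(\\d+(?:,\\d+)?)\\s*m\\u00B2.*";

    // Hypothetical helper: returns the number of the first area, or null if none is found.
    public static String firstAreaNumber(String description) {
        // replaceFirst would return the input unchanged on no match, so check first.
        if (!description.matches(REGEX)) {
            return null;
        }
        return description.replaceFirst(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstAreaNumber("some text 44,66 m\u00B2 some 33,11 m\u00B2 text")); // 44,66
        System.out.println(firstAreaNumber("no area here")); // null
    }
}
```

This compiles the regex twice per call; a Pattern/Matcher with find() avoids that if performance matters.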

To find only the last one (and, by reversing the string, the first one):
@Test
public void testFindFirstRegExp() {
String pattern = ".* (\\d+,\\d+) .*";
Pattern r = Pattern.compile(pattern);
String line = "some text 44,66 m² some 33,11 m² text 11,22 m² some text";
Matcher m = r.matcher(new StringBuilder(line).reverse().toString());
String expected = "44,66";
String actual = null;
if (m.find()) {
actual = new StringBuilder(m.group(1)).reverse().toString();
}
System.out.println("got first:" + actual);
Assert.assertEquals(expected, actual);
m = r.matcher(line);
expected = "11,22";
actual = null;
if (m.find()) {
actual = m.group(1);
}
System.out.println("got last:" + actual);
Assert.assertEquals(expected, actual);
}
prints:
got first:44,66
got last:11,22
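Note: when the pattern is not symmetric, you also need to reverse the pattern itself for the reversed string, for example:
pattern = ".* (\\d+,\\d+-?) .*"; // reversed form of (-?\\d+,\\d+)
But this pattern (without the surrounding .*) works as expected:
pattern = " (\\-?\\d+,\\d+) ";
and you get all of the matches in a loop: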
while (m.find()) {
actual = m.group(1);
System.out.println("got last:" + actual);
}
Will print:
got last:44,66
got last:33,11
got last:11,22
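As an alternative to reversing the string, a sketch that gets the first match with a single find() and the last match by looping over find() and keeping the most recent result. The pattern is a simplified comma-decimal pattern chosen for the sample line, and the names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstAndLast {
    // Comma-decimal numbers such as 44,66, optionally negative.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+,\\d+");

    // The first match is simply the result of the first successful find().
    public static String first(String line) {
        Matcher m = NUMBER.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Walks through all matches and keeps the last one; no string reversal needed.
    public static String last(String line) {
        Matcher m = NUMBER.matcher(line);
        String result = null;
        while (m.find()) {
            result = m.group();
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "some text 44,66 m\u00B2 some 33,11 m\u00B2 text 11,22 m\u00B2 some text";
        System.out.println("first: " + first(line)); // first: 44,66
        System.out.println("last: " + last(line));   // last: 11,22
    }
}
```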

Related

Alternative to positive lookbehind when there is an unknown number of spaces

My replacerRegex is
("schedulingCancelModal": \{\s*? "title": ")(.+?)(?=")
The right value is getting picked up, i.e. valueToBePicked:
But how do I keep ("schedulingCancelModal": \{\s*? "title": ") out of the result, the way a positive lookbehind would?
My Java code so far:
Pattern replacerPattern = Pattern.compile(replacerRegex);
Matcher matcher = replacerPattern.matcher(value);
while (matcher.find()) {
String valueToBePicked = matcher.group();
}
You can simply select matcher.group(2) which will give you the contents of the second capture group. For example:
String replacerRegex = "(\"schedulingCancelModal\": \\{\\s*? \"title\": \")(.+?)(?=\")";
String value = "\"valueToBePicked\": \"schedulingCancelModal\": {\n \"title\": \"Are you sure you want to leave scheduling?\", ... }";
Pattern replacerPattern = Pattern.compile(replacerRegex);
Matcher matcher = replacerPattern.matcher(value);
while (matcher.find()) {
String valueToBePicked = matcher.group(2);
System.out.println(valueToBePicked);
}
Output:
Are you sure you want to leave scheduling?
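If group numbers feel fragile, a named capturing group (supported since Java 7) makes the intent explicit. A minimal sketch; the helper name is hypothetical, the fixed single space is relaxed to \s*, and the pattern assumes the title contains no escaped quotes (for real JSON, a JSON parser is the more robust choice):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // The named group (?<title>...) holds only the value; the prefix is matched but not returned.
    // Assumption: the title contains no escaped quotes.
    private static final Pattern TITLE = Pattern.compile(
            "\"schedulingCancelModal\":\\s*\\{\\s*\"title\":\\s*\"(?<title>[^\"]*)\"");

    // Hypothetical helper: returns the title value, or null when the key is absent.
    public static String extractTitle(String json) {
        Matcher m = TITLE.matcher(json);
        return m.find() ? m.group("title") : null;
    }

    public static void main(String[] args) {
        String value = "\"schedulingCancelModal\": {\n  \"title\": \"Are you sure you want to leave scheduling?\" }";
        System.out.println(extractTitle(value)); // Are you sure you want to leave scheduling?
    }
}
```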

Java Regex jumps to next match with if clause [duplicate]

How to find match for exact word using pattern matcher in java

I have shared my sample code here. I am trying to find the word "engine" in different strings, using word boundaries to match whole words.
It also matches when the word is prefixed with #, as in #engine.
It should only match the exact word.
private void checkMatch() {
String source1 = "search engines has ";
String source2 = "search engine exact word";
String source3 = "enginecheck";
String source4 = "has hashtag #engine";
String key = "engine";
System.out.println(isContain(source1, key));
System.out.println(isContain(source2, key));
System.out.println(isContain(source3, key));
System.out.println(isContain(source4, key));
}
private boolean isContain(String source, String subItem) {
String pattern = "\\b" + subItem + "\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
**Expected output**
false
true
false
false
**Actual output**
false
true
false
true
In this case you have to use an alternation (regex OR) instead of a word boundary. \b matches between a word character and a non-word character (or vice versa), so your regex finds a match in #engine, since # is a non-word character.
private boolean isContain(String source, String subItem) {
String pattern = "(?m)(^|\\s)" + subItem + "(\\s|$)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
or
String pattern = "(?<!\\S)" + subItem + "(?!\\S)";
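A runnable sketch of the lookaround variant, checked against the four strings from the question. Pattern.quote is added here (it is not in the answer above) so that a key containing regex metacharacters is matched literally:

```java
import java.util.regex.Pattern;

public class ExactWord {
    // (?<!\S) and (?!\S): the key must not be preceded or followed by a non-whitespace
    // character, i.e. it is bounded by whitespace or by the start/end of the string.
    // Pattern.quote escapes any regex metacharacters in the key.
    public static boolean isContain(String source, String subItem) {
        String pattern = "(?<!\\S)" + Pattern.quote(subItem) + "(?!\\S)";
        return Pattern.compile(pattern).matcher(source).find();
    }

    public static void main(String[] args) {
        String key = "engine";
        System.out.println(isContain("search engines has ", key));      // false
        System.out.println(isContain("search engine exact word", key)); // true
        System.out.println(isContain("enginecheck", key));              // false
        System.out.println(isContain("has hashtag #engine", key));      // false
    }
}
```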
Change your pattern as below.
String pattern = "\\s" + subItem + "\\b";
If you are looking for a literal text enclosed with spaces or start/end of the string, you can split the string with a mere whitespace pattern like \s+ and check if any of the chunks equals the search text.
Java demo:
import java.util.Arrays;
String s = "Can't start the #engine here, but this engine works";
String searchText = "engine";
boolean found = Arrays.stream(s.split("\\s+"))
.anyMatch(word -> word.equals(searchText));
System.out.println(found); // => true
Change the regexp to
String pattern = "\\s"+subItem + "\\s";
This uses \s, which matches a whitespace character: [ \t\n\x0B\f\r].
For more information, see the java.util.regex.Pattern Javadoc.
Also if you want to support strings like these:
"has hashtag engine"
"engine"
you can improve it by also allowing the start and end anchors (^ and $),
using this pattern:
String pattern = "(^|\\s)"+subItem + "(\\s|$)";

Replace different Regex-Matches with Match-based results in Java

One common usage for regex is the replacement of the matches with something that is based on the matches.
For example a commit-text with ticket numbers ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (<> as example for some surroundings.)
For a single ticket this is very simple in Java:
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
results in <ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if there are different matches in the input String, this no longer works. I tried slightly different code, replacing if (matcher.find()) { with while (matcher.find()) {. The result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?
You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").
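The problem is that replace() rewrites every occurrence in the whole string on each loop iteration, so a ticket that appears twice gets wrapped again on its second match (<<ABC-9913>>).
A better way is to replace one match at a time. The Matcher class has an appendReplacement method for this: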
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);
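On Java 9 or later, the same loop can be collapsed into Matcher.replaceAll(Function<MatchResult, String>), which computes each replacement from the current match. A sketch with a hypothetical helper name; Matcher.quoteReplacement is used because the function's result is interpreted like an appendReplacement replacement string, where $ and \ are special:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TicketWrapper {
    private static final Pattern TICKET = Pattern.compile("ABC-\\d+");

    // Java 9+: each match is replaced exactly once, so no doubled <<...>> can appear.
    // quoteReplacement makes the computed text literal ('$' and '\' lose their special meaning).
    public static String wrapTickets(String message) {
        Matcher matcher = TICKET.matcher(message);
        return matcher.replaceAll(mr -> Matcher.quoteReplacement("<" + mr.group() + ">"));
    }

    public static void main(String[] args) {
        String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
        System.out.println(wrapTickets(message));
        // <ABC-9913>, <ABC-9915> - Bugfix: Some text. (<ABC-9913>,<ABC-9915>)
    }
}
```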

Java URL regex not matching

I am trying to count the number of URLs in a Java string:
String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
String urlRegex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>";
int numUrls = 0;
Pattern pattern = Pattern.compile(urlRegex);
Matcher matcher = pattern.matcher(test);
while(matcher.find())
numUrls++;
System.err.println("numUrls = " + numUrls);
When I run this it tells me I have zero (not 2) URLs in the string. Any ideas as to why? Thanks in advance!
The < and > characters in urlRegex are causing a mismatch between your pattern and your input test String. Removing them will yield a numUrls value of 2 as intended.
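A sketch of the corrected count, assuming the "##" in the posted character classes is a garbled "@#" (the usual form of this URL regex). It also swaps the manual counter for Matcher.results() from Java 9:

```java
import java.util.regex.Pattern;

public class UrlCounter {
    // The question's regex with the surrounding < and > removed.
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    public static long countUrls(String text) {
        // Matcher.results() (Java 9+) streams every match; count() tallies them.
        return URL.matcher(text).results().count();
    }

    public static void main(String[] args) {
        String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
        System.out.println("numUrls = " + countUrls(test)); // numUrls = 2
    }
}
```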
Try this code:
String data = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
Pattern pattern = Pattern.compile("[hH][tT]{2}[Pp][sS]?://(\\w+(\\.\\w+?)?)+");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
System.out.println(matcher.group());
}
Hopefully it will work.
