Get the index of a pattern in a string using regex - java

I want to search a string for a specific pattern.
Do the regular expression classes provide the positions (indexes within the string) of the pattern within the string?
There can be more than one occurrence of the pattern.
Is there a practical example?

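Use Matcher:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void printMatches(String text, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(text);
    // Check all occurrences: each successful find() moves on to the next match
    while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        // end() is exclusive: the index just after the last matched character
        System.out.print(" End index: " + matcher.end());
        System.out.println(" Found: " + matcher.group());
    }
}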
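If you need the positions afterwards rather than just printing them, you can collect them in the same find() loop. A minimal sketch; the class name MatchIndexes and the helper findAll are hypothetical, and each entry is a {start, end} pair where end is exclusive, exactly as Matcher.end() reports it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchIndexes {

    // Hypothetical helper: returns one {start, end} pair per match, in order of occurrence
    static List<int[]> findAll(String text, String regex) {
        List<int[]> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            spans.add(new int[] {matcher.start(), matcher.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        // "cat" occurs twice: at [0, 3) and at [8, 11)
        for (int[] span : findAll("cat dog cat", "cat")) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

On Java 9 and later, Matcher.results() returns a Stream<MatchResult> whose elements expose start() and end(), which can replace the explicit loop.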

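A special edition of Jean Logeart's answer, which returns {start, end} of the first match (searching from fromIndex if one is given), or {-1, -1} if there is none:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static int[] regExIndex(String pattern, String text, Integer fromIndex) {
    Matcher matcher = Pattern.compile(pattern).matcher(text);
    // find(int) resets the matcher, so don't fall back to find() when it fails:
    // that would restart at index 0 and could return a match before fromIndex
    boolean found = (fromIndex != null) ? matcher.find(fromIndex) : matcher.find();
    if (found) {
        return new int[]{matcher.start(), matcher.end()};
    }
    return new int[]{-1, -1};
}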
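A usage sketch that walks through every occurrence by passing the previous end index back in as fromIndex. It carries its own copy of the helper, written so that a failed find(fromIndex) never falls back to a search from index 0 (find(int) resets the matcher, so a later find() would start over). The class name RegExIndexDemo is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExIndexDemo {

    // {start, end} of the first match at or after fromIndex (or from 0 when fromIndex is null), else {-1, -1}
    public static int[] regExIndex(String pattern, String text, Integer fromIndex) {
        Matcher matcher = Pattern.compile(pattern).matcher(text);
        boolean found = (fromIndex != null) ? matcher.find(fromIndex) : matcher.find();
        return found ? new int[] {matcher.start(), matcher.end()} : new int[] {-1, -1};
    }

    public static void main(String[] args) {
        String text = "a1b22c333";
        int from = 0;
        // find(int) throws if the start index is past the end, hence the bound check
        while (from <= text.length()) {
            int[] span = regExIndex("\\d+", text, from);
            if (span[0] < 0) {
                break; // no further occurrence
            }
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
            // Resume after this match; step one further on an empty match to avoid looping forever
            from = (span[1] > span[0]) ? span[1] : span[1] + 1;
        }
    }
}
```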

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
public static void main( String args[] ){
// String to be scanned to find the pattern.
String line = "This order was places for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
} else {
System.out.println("NO MATCH");
}
}
}
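Result (the greedy (.*) in group 1 consumes as much as it can, including "QT300", so group 2 captures only the last digit, 0):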
Found value: This order was places for QT3000! OK?
Found value: This order was places for QT300
Found value: 0
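To get positions (which is what the question asks for) rather than only the matched text, Matcher.start(int) and Matcher.end(int) report where a given group matched. In this sketch (class name GroupIndexes and helper numberSpan are hypothetical) group 1 is made reluctant with (.*?), so group 2 captures the whole number 3000 instead of only its last digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexes {

    // {start, end} of capturing group 2 in the first match, or null if there is no match
    static int[] numberSpan(String line) {
        // (.*?) is reluctant, so (\d+) receives every digit of the first number
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] {m.start(2), m.end(2)};
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        int[] span = numberSpan(line);
        if (span == null) {
            System.out.println("NO MATCH");
        } else {
            // prints: Group 2 "3000" at 28-32
            System.out.println("Group 2 \"" + line.substring(span[0], span[1])
                    + "\" at " + span[0] + "-" + span[1]);
        }
    }
}
```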

Related

Check a string contains at least one Unicode letter using regex

I want a validation that requires my String to contain at least one Unicode letter, i.e. a character for which Character.isLetter() evaluates to true.
For example, I want:
~!##$%^&*(()_+<>?:"{}|\][;'./-=` : false
~~1_~ : true
~~k_~ : true
~~汉_~ : true
I know I can use a for loop with Character.isLetter(), but I would rather not.
This is not the same as the linked question, which only checks for English letters; my case is about any single Unicode letter.
You can try this regex: "\\p{L}|[0-9]". \p{L} matches any Unicode letter, and [0-9] is added because your example expects ~~1_~ to be true.
To better understand Unicode in Regex read this.
Usage code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        // One Unicode letter (\p{L}) or one ASCII digit
        Pattern r = Pattern.compile("\\p{L}|[0-9]");
        // Strings to be scanned for the pattern
        String[] lines = {"~!##$%^&*(()_+<>?:\"{}|\\][;'./-=`", "~~1_~", "~~k_~", "~~汉_~"};
        for (String line : lines) {
            Matcher m = r.matcher(line);
            System.out.print("String \"" + line + "\" results to ");
            if (m.find()) {
                System.out.println("TRUE -> Found value: " + m.group(0));
            } else {
                System.out.println("FALSE");
            }
        }
    }
}
Result:
String "~!##$%^&*(()_+<>?:"{}|\][;'./-=`" results to FALSE
String "~~1_~" results to TRUE -> Found value: 1
String "~~k_~" results to TRUE -> Found value: k
String "~~汉_~" results to TRUE -> Found value: 汉
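If you need exactly the Character.isLetter() semantics described in the question (so a digit alone does not count), java.util.regex.Pattern also exposes the java.lang.Character predicates as \p{javaXxx} properties, and \p{javaLetter} corresponds to Character.isLetter(). A sketch with a hypothetical class name, plus a stream-based check for comparison; note that under this stricter definition ~~1_~ is false.

```java
import java.util.regex.Pattern;

public class LetterCheck {

    // \p{javaLetter} matches characters for which Character.isLetter() is true
    private static final Pattern LETTER = Pattern.compile("\\p{javaLetter}");

    static boolean containsLetterRegex(String s) {
        return LETTER.matcher(s).find();
    }

    // Same check without regex; iterating code points also handles supplementary characters
    static boolean containsLetterStream(String s) {
        return s.codePoints().anyMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        String[] samples = {"~!##$%^&*(()_+<>?:\"{}|\\][;'./-=`", "~~1_~", "~~k_~", "~~\u6C49_~"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsLetterRegex(s) + " / " + containsLetterStream(s));
        }
    }
}
```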

parse filename using string [java]

What regex pattern do I need to parse a filename like this: "Ab12_Cd9023-2000-12-04-No234.nekiRtt3434GGG", where the parsed elements are: "Ab12_Cd9023"(name), "2000"(year), "12"(month), "04"(day), "234"(number), "nekiRtt3434GGG"(suffix). The sequence is always the same: name-yyyy-MM-dd-NoNN.suffix.
I want to use the pattern + matcher objects to solve that.
This is the nicest-looking solution that I found:
private static final Pattern PATTERN = Pattern.compile("^(?<name>\\w+)-"
        + "(?<year>\\d{4})-"
        + "(?<month>\\d{2})-"
        + "(?<day>\\d{2})-"
        + "No(?<number>\\d+)\\."   // escaped: a bare . would match any character
        + "(?<suffix>\\w+)$");

Matcher m = PATTERN.matcher(file.getName());
if (!m.matches()) {
    // some code if the pattern doesn't match
} else {
    // this is how you access the parsed strings:
    String year = m.group("year");
}
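A self-contained version of the named-group approach, as a sketch: the class name FilenameParser, the parse helper and the printed format are illustrative. The dot before the suffix is escaped so it only matches a literal '.', and the groups are read only after matches() has succeeded (calling group() after a failed match throws IllegalStateException).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilenameParser {

    // name-yyyy-MM-dd-NoNN.suffix; matches() must cover the whole input, so ^ and $ are not needed
    private static final Pattern PATTERN = Pattern.compile(
            "(?<name>\\w+)-(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})-No(?<number>\\d+)\\.(?<suffix>\\w+)");

    // Returns {name, year, month, day, number, suffix}, or null if the file name does not fit the scheme
    static String[] parse(String fileName) {
        Matcher m = PATTERN.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("name"), m.group("year"), m.group("month"),
            m.group("day"), m.group("number"), m.group("suffix")
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("Ab12_Cd9023-2000-12-04-No234.nekiRtt3434GGG");
        // prints: Ab12_Cd9023 | 2000 | 12 | 04 | 234 | nekiRtt3434GGG
        System.out.println(parts == null ? "no match" : String.join(" | ", parts));
    }
}
```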
This regex should do the trick:
([a-zA-Z0-9_]+)-([0-9]{4})-([0-9]{2})-([0-9]{2})-No(.+)\.(.+)$
If you use this as the pattern, each pair of parentheses is a capturing group for one part of the string you want to extract. In a Java string literal, write \. as \\.
This could work:
private static final Pattern PATTERN = Pattern.compile("^(.+)-([0-9]{4})-([0-9]{2})-([0-9]{2})-No(.+)\\.(.+)$");
...
Matcher matcher = PATTERN.matcher(string);
if (matcher.matches()) {
String name = matcher.group(1);
int year = Integer.parseInt(matcher.group(2));
int month = Integer.parseInt(matcher.group(3));
int day = Integer.parseInt(matcher.group(4));
String number = matcher.group(5);
String suffix = matcher.group(6);
System.out.println("name: " + name);
System.out.println("year: " + year);
System.out.println("month: " + month);
System.out.println("day: " + day);
System.out.println("number: " + number);
System.out.println("suffix: " + suffix);
} else {
// error: does not match
}
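If the sequence is always the same, why not simply split it on - or . like this:
String filename = "Ab12_Cd9023-2000-12-04-No234.nekiRtt3434GGG";
String[] parts = filename.split("-|\\.");
for (String p : parts) {
    System.out.println(p);
}
This yields Ab12_Cd9023, 2000, 12, 04, No234 and nekiRtt3434GGG. The fifth part still carries the "No" prefix, and the split breaks if the name itself contains - or . characters.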

How to extract a number from string using regex in java

I have tried using
title.substring(title.lastIndexOf("(") + 1, title.indexOf(")"));
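I only want to extract the year, e.g. 1899.
It works for a string like "hadoop (1899)" but throws an error for "hadoop(yarn)(1980)".
Your substring call fails there because lastIndexOf("(") finds the parenthesis before 1980 while indexOf(")") finds the one after yarn, so the begin index is past the end index. Instead, simply replace everything except the digits within the parentheses using a regex: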
String foo = "hadoop (1899)"; // or "hadoop(yarn)(1980)"
System.out.println(foo.replaceAll(".*\\((\\d+)\\).*", "$1"));
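One caveat: when the string contains no parenthesized number, replaceAll leaves it unchanged, so you silently get the whole input back. A sketch with a hypothetical extractYear helper that applies the same regex through a Matcher and returns null in that case:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearExtractor {

    // The greedy .* before \( means the LAST "(digits)" occurrence is captured
    private static final Pattern YEAR = Pattern.compile(".*\\((\\d+)\\).*");

    // Digits of the last "(digits)" occurrence, or null if there is none
    static String extractYear(String title) {
        Matcher m = YEAR.matcher(title);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractYear("hadoop (1899)"));      // 1899
        System.out.println(extractYear("hadoop(yarn)(1980)")); // 1980
        System.out.println(extractYear("hadoop"));             // null
    }
}
```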
Check this example: the regex extracts numbers surrounded by parentheses, using a lookbehind (?<=\() and a lookahead (?=\)) so the parentheses themselves are not part of the match.
Here is code you can use:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(?<=\\()\\d+(?=\\))";
final String string = "\"hadoop (1899)\" \"hadoop(yarn)(1980)\"";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // "Full match: 1899" on the first iteration, "Full match: 1980" on the second
    System.out.println("Full match: " + matcher.group(0));
    // Lookarounds don't capture, so groupCount() is 0 here and this loop does not run
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println("Group " + i + ": " + matcher.group(i));
    }
}
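To apply the lookaround regex to one title at a time and get the year back as a value instead of printing it, a sketch with a hypothetical years helper. It narrows \d+ to \d{4}, on the assumption that only four-digit years are wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenNumbers {

    // Lookbehind/lookahead require "(" and ")" around the digits without including them in the match;
    // \d{4} assumes only four-digit years are of interest
    private static final Pattern YEAR_IN_PARENS = Pattern.compile("(?<=\\()\\d{4}(?=\\))");

    static List<Integer> years(String title) {
        List<Integer> result = new ArrayList<>();
        Matcher m = YEAR_IN_PARENS.matcher(title);
        while (m.find()) {
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(years("hadoop (1899)"));      // [1899]
        System.out.println(years("hadoop(yarn)(1980)")); // [1980]
        System.out.println(years("hadoop (12)"));        // []
    }
}
```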

A regular expression to match zero or one occurrence of a word in Java

I have written a regex to match the following pattern:
Any characters, followed by a hyphen, a number, a space, an optional case-insensitive keyword followed by a space, and then any characters.
E.g.,
TXT-234 #comment anychars
TXT-234 anychars
The regular expression I have written is as follows:
(?<issueKey>^((\\s*[a-zA-Z]+-\\d+)\\s+)+)((?i)?<keyWord>#comment)?\\s+(?<comment>.*)
But it doesn't handle the case where '#comment' is absent, even though I made it optional with '?'. Case 2 above always fails, while case 1 succeeds.
What am I doing wrong?
#comment won't match #keyword; that is why you don't get a match. Try this one, it should work:
([a-zA-Z]*-\\d*\\s(((?i)#comment|#transition|#keyword)+\\s)?[a-zA-Z]*)
This may help:
String str = "1. TXT-234 #comment anychars";
String str2 = "2. TXT-234 anychars";
String str3 = "3. TXT-2a34 anychars";
String str4 = "4. TXT.234 anychars";
Pattern pattern = Pattern.compile("([a-zA-Z]*-\\d*\\s(#[a-zA-Z]+\\s)?[a-zA-Z]*)");
Matcher m = pattern.matcher(str);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
}
m = pattern.matcher(str2);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
}
m = pattern.matcher(str3);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
} else {
System.out.println("str3 not match");
}
m = pattern.matcher(str4);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
} else {
System.out.println("str4 not match");
}
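One likely cause of the case 2 failure is whitespace: the \\s+ at the end of issueKey already consumes the space after TXT-234, so the mandatory \\s+ before comment has nothing left to match when #comment is absent. A sketch that instead makes the keyword and the whitespace after it optional as one unit, and writes the named group as (?<keyWord>(?i:#comment)) so the case-insensitive flag sits inside it. The class name IssueKeyParser is hypothetical; like the original, it allows several space-separated issue keys.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IssueKeyParser {

    // issueKey: one or more KEY-123 tokens separated by whitespace
    // keyWord:  optional, case-insensitive #comment, consumed together with the whitespace after it
    // comment:  whatever remains on the line
    private static final Pattern LINE = Pattern.compile(
            "\\s*(?<issueKey>[a-zA-Z]+-\\d+(?:\\s+[a-zA-Z]+-\\d+)*)\\s+"
            + "(?:(?<keyWord>(?i:#comment))\\s+)?"
            + "(?<comment>.*)");

    // Returns {issueKey, keyWord or null, comment}, or null if the whole line does not match
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("issueKey"), m.group("keyWord"), m.group("comment")};
    }

    public static void main(String[] args) {
        String[] lines = {"TXT-234 #comment anychars", "TXT-234 anychars", "TXT-234 #COMMENT anychars"};
        for (String line : lines) {
            System.out.println(line + " -> " + Arrays.toString(parse(line)));
        }
    }
}
```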

RegEx in java to replace a String

I've been trying to transform the mathematical expression x^2*sqrt(x^3) into pow(x,2)*Math.sqrt(pow(x,3)).
This is the regex:
/([0-9a-zA-Z\.\(\)]*)^([0-9a-zA-Z\.\(\)]*)/ pow(\1,\2)
It works in Ruby, but I can't find a way to do it in Java. I tried this:
String function= "x^2*sqrt(x^3)";
Pattern p = Pattern.compile("([a-z0-9]*)^([a-z0-9]*)");
Matcher m = p.matcher(function);
String out = function;
if(m.find())
{
System.out.println("Group 0:" + m.group(0));
System.out.println("Group 1:" + m.group(1));
out = m.replaceFirst("pow(" + m.group(0) + ", " + m.group(1) + ')');
}
String funcformat = out;
funcformat = funcformat.replaceAll("sqrt\\(([^)]*)\\)", "Math.sqrt($1)");
System.out.println("Return Value :"+ function );
System.out.print("Return Value :"+ funcformat );
But it still doesn't work: the output is pow(x, )^2*Math.sqrt(x^3), while, as I said, it should be pow(x,2)*Math.sqrt(pow(x,3)).
Thank you!!
As others have commented, regex is not the way to go; you should use a parser. But if you want something quick and dirty:
From Matcher:
Capturing groups are indexed from left to right, starting at one.
Group zero denotes the entire pattern, so the expression m.group(0)
is equivalent to m.group().
So you need to use m.group(1) and m.group(2), and escape the caret ^ in your regex (unescaped, it is the start-of-input anchor).
import java.util.regex.*;
public class Replace {
public static void main(String[] args) {
String function= "x^2*sqrt(x^3)";
Pattern p = Pattern.compile("([a-z0-9]*)\\^([0-9]*)");
Matcher m = p.matcher(function);
String out = function;
if (m.find()) {
System.out.println("Group 1:" + m.group(1));
System.out.println("Group 2:" + m.group(2));
out = m.replaceFirst("pow(" + m.group(1) + "," + m.group(2) + ')');
}
String funcformat = out;
funcformat = funcformat.replaceAll("sqrt\\(([a-z0-9]*)\\^([0-9]*)\\)", "Math.sqrt(pow($1,$2))");
System.out.println("Return Value :"+ function );
System.out.print("Return Value :"+ funcformat );
}
}
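Because replaceFirst only rewrites the first power, a quick-and-dirty sketch that rewrites every base^exponent with replaceAll and then renames sqrt( to Math.sqrt( might look like this. It is still not a parser: it assumes bases are single identifiers or numbers ([a-z0-9]+) and exponents are integers, so something like (x+1)^2 is left untouched. The class and method names are hypothetical.

```java
public class PowRewrite {

    static String toJava(String expr) {
        // x^2 -> pow(x,2), applied to every occurrence, including ones nested inside sqrt(...)
        String withPow = expr.replaceAll("([a-z0-9]+)\\^([0-9]+)", "pow($1,$2)");
        // Only the opening "sqrt(" token is renamed, so nested parentheses need no matching
        return withPow.replaceAll("\\bsqrt\\(", "Math.sqrt(");
    }

    public static void main(String[] args) {
        // prints: pow(x,2)*Math.sqrt(pow(x,3))
        System.out.println(toJava("x^2*sqrt(x^3)"));
    }
}
```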
