Index of each matcher group of a pattern in Java - java

I am matching certain contents of a file against a regex and getting groups out of it. How can I get the start and the end positions of each matched group?
I need the positions so that I can replace those parts.
Any suggestions, please?

You're looking for the methods m.start(int group) and m.end(int group).
Java docs:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#start(int)
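In this case I would consider using named capture groups (?<groupName>YOUR_REGEX) and the methods m.start("groupName") and m.end("groupName"), which are available from Java 8. Group names must start with a letter and may contain only letters and digits, so a name with a hyphen will not compile. This way, when you change your input text or add/remove some groups, your group names stay the same. :)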
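As a rough sketch of the named-group approach, the snippet below looks up the span of a named group in the first match. The class name, the key=value pattern and the sample text are made up for illustration; only Pattern, Matcher.find, Matcher.start(String) and Matcher.end(String) come from the JDK (Java 8 or later is assumed).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupPositions {
    // Hypothetical pattern: a key and a value separated by '='.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(?<key>\\w+)=(?<value>\\w+)");

    // Returns "start-end" of the named group in the first match, or null if nothing matches.
    // The end index is exclusive, as with String.substring.
    static String span(String text, String groupName) {
        Matcher m = KEY_VALUE.matcher(text);
        if (!m.find()) {
            return null;
        }
        return m.start(groupName) + "-" + m.end(groupName);
    }

    public static void main(String[] args) {
        String text = "set color=blue now";
        System.out.println("key:   " + span(text, "key"));
        System.out.println("value: " + span(text, "value"));
    }
}
```

Because the lookup goes by name, adding another group in front of these two would not change the calling code.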

The following code prints the text matching the regular expression and the start and end position within the text:
String text = "a long text regex to match";
Matcher m = Pattern.compile("regex").matcher(text);
while (m.find()) {
    String found = m.group();
    System.out.println(found + " " + m.start() + " " + m.end());
}

You can replace the desired content directly with the String.replaceAll method:
This method replaces each substring of the string that matches the given regular expression with the given replacement.
Then, you can use it like:
String result = text.replaceAll("[0-9]", "X");
Hope it helps you!
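If only one group inside each match should be replaced, the group positions from start(int) and end(int) can drive the replacement. The following is a minimal sketch under that assumption; the helper name replaceGroup1 and the sample pattern are hypothetical, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceGroupByPosition {
    // Replaces only the text captured by group 1 in every match,
    // copying everything else (including the rest of each match) unchanged.
    static String replaceGroup1(String text, Pattern pattern, String replacement) {
        StringBuilder out = new StringBuilder();
        Matcher m = pattern.matcher(text);
        int last = 0; // end of the previously copied region
        while (m.find()) {
            if (m.start(1) < 0) {
                continue; // group 1 did not take part in this match
            }
            out.append(text, last, m.start(1)); // text before the group
            out.append(replacement);
            last = m.end(1);
        }
        out.append(text, last, text.length()); // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("id=(\\d+)");
        System.out.println(replaceGroup1("id=12; id=345", p, "X"));
    }
}
```

Building the result left to right avoids the index shifting that happens when the original string is modified in place.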

Related

java regex- get specific index where not have a specific word before

I'm trying to add double quotes to an XML string, but only in specific places.
Here is an example of the XML content:
<opr:sec name=display>
<opr:fld name=fieldName>Value1</opr:fld>
<opr:fld name=someName>value2</opr:fld>
I need to add double quotes, as in name="fieldName"; the field names are different on each line.
The opening double quote is simple: it goes right after name=.
For the closing double quote I thought of using the > sign, but I need to avoid the > at the end of the closing fld tag.
How do I match a character that does not have a specific text before it?
Here is a simpler way to do what you want.
Use this regex:
name=([^>]*)>
And replace it with:
name="$1">
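In Java, that search and replace could look roughly like the sketch below. It assumes, as in the sample XML, that name is the last attribute before the closing >; the class and method names are invented for the example.

```java
class QuoteAttributes {
    // Wraps the value after name= in double quotes.
    // Assumes the value runs up to the next '>' (i.e. name is the last attribute).
    static String quoteNames(String xml) {
        return xml.replaceAll("name=([^>]*)>", "name=\"$1\">");
    }

    public static void main(String[] args) {
        System.out.println(quoteNames("<opr:sec name=display>"));
        System.out.println(quoteNames("<opr:fld name=fieldName>Value1</opr:fld>"));
    }
}
```

The closing tag is left alone because it contains no name=, so there is no need to look behind the > at all.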
You can use capturing blocks, split your line into 3 blocks and reconstruct it from the pieces:
String line = "<opr:fld name=fieldName>Value1</opr:fld>";
String regex = "(.*name=)(.*)(>.*>)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(line);
// the groups are only available if the whole line matched
if (matcher.matches()) {
    String result = matcher.group(1) + "\"" + matcher.group(2) + "\"" + matcher.group(3);
    System.out.println(result);
}

split string based on text qualifier regex java

I want to split a string based on a text qualifier, for example:
"1","10411721","MikeTison","08/11/2009","21/11/2009","2800.00","002934538","051","New York","10411720-002",".\Images\b.jpg",".\RTF\b.rtf"
Qualifier = "
Splitter = ,
I want to split the string on the splitter, but if the splitter appears inside the qualifier ", it should be ignored and the returned string should include it.
The regular expression I am using is (?:|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
but this regular expression only returns commas. Please help me with this, as I am new to regular expressions.
Please note that if the string contains newline characters, i.e. \r\n, the newline characters should be ignored.
"1","10411","Muis","a","21/11/2009","2800.06","0029683778","03005136851","Awan","10411720-001",".\Images\a.jpg",".\RTF\a.rtf"
"2","08/10/2009","07:32","Call","On-Net","030092343242342376543","Monk","00:00","1.500","0.000","10.000","0.200"
"2","08/10/2009","02:50","Call","Off-Net","030092343242342376543","Une","08:00","1.500","2.000","20.000","3.500"
"2","09/10/2009","03:55","SMS","On-Net","030092343242342376543","Mink","00:00","1.500","0.000","5.000","100.500"
"2","09/10/2009","12:30","Call","Off-Net","030092343242342376543","Zog","01:01","3.500","3.000","70.000","6.500"
"2","09/10/2009","09:11","Call","On-Net","030092343242342376543","Monk","02:30","2.00","2.000","90.000","4.000"
Probably the easiest solution is not to search for the places to split, but to find the elements you want to return. In your case these elements
start with "
end with "
have no " inside.
So you can try something like
String data = "\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
Pattern p = Pattern.compile("\"([^\"]+)\"");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group(1));
}
Output:
1
10411721
MikeTison
08/11/2009
21/11/2009
2800.00
002934538
051
New York
10411720-002
.\Images\b.jpg
.\RTF\b.rtf
You can split using this regex:
String[] arr = input.split( "(?=(([^\"]*\"){2})*[^\"]*$),+" );
This regex splits on commas only when they are outside double quotes: the lookahead makes sure there is an even number of quotes between the comma and the end of the string.
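To see the even-number-of-quotes idea at work on a field that actually contains a comma, here is a small sketch. The sample line is made up, and stripping the surrounding quotes afterwards is an extra step that the answer above does not include.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplit {
    // A comma that is followed by an even number of quotes up to the end of the line,
    // which means the comma itself is not inside a quoted field.
    private static final String OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        // limit -1 keeps trailing empty fields
        for (String part : line.split(OUTSIDE_QUOTES, -1)) {
            // strip the surrounding qualifier if the field is quoted
            if (part.length() >= 2 && part.startsWith("\"") && part.endsWith("\"")) {
                part = part.substring(1, part.length() - 1);
            }
            out.add(part);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("\"1\",\"New York, NY\",\"2800.00\""));
    }
}
```

This sketch does not handle doubled quotes ("") inside a field; for real CSV data a dedicated parser is usually the safer choice.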
Remove the first and the last character of the whole string. Then split with ","
String test = "\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
if (test.length() > 0)
    test = test.substring(1, test.length() - 1);
System.out.println(Arrays.toString(test.split("\",\"")));
This works even if the string contains newline characters. Try it out:
String str="\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
System.out.println(Arrays.toString(str.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")));

Java - Regex Match Multiple Words

Let's say that you want to match a string with the following regex:
".when is (\w+)." - I am trying to get the event after 'when is'.
I can get the event with matcher.group(index), but this doesn't work if the event is something like Veteran's Day, since it is two words. I only get the first word after 'when is'.
What regex should I use to get all of the words after 'when is'?
Also, let's say I want to capture someone's birthday, like
'when is * birthday'
How do I capture all of the text between is and birthday with a regex?
You could try this:
^when is (.*)$
This will find a string that starts with when is and capture everything else to the end of the line.
The regex has one capturing group. Group 0 is always the whole match, and group 1 is the captured text. You can access them like so:
String line = "when is Veteran's Day.";
Pattern pattern = Pattern.compile("^when is (.*)$");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println("group 0: " + matcher.group(0));
    System.out.println("group 1: " + matcher.group(1));
}
And the output should be:
group 0: when is Veteran's Day.
group 1: Veteran's Day.
If you want to allow whitespace to be matched, you should explicitly allow whitespace.
([\w\s]+)
However, roydukkey's solution will work if you want to capture everything after when is.
Don't use regular expressions when you don't need to! Regular expressions are elegant in that a single string does the matching work for you, but for simple cases like this one they add compilation and matching overhead that plain string methods avoid.
If you are trying to get the word after "when is" ending by a space, you could do something like this:
String start = "when is ";
String end = " ";
int startLocation = fullString.indexOf(start) + start.length();
String afterStart = fullString.substring(startLocation, fullString.length());
String word = afterStart.substring(0, afterStart.indexOf(end));
If you know the last word is Day, you can set end = "Day" and add the length of that string to the end index of the second substring. Note that indexOf returns -1 when end is not found, so check for that before calling substring.
You can express this as a character class and include spaces in it: when is ([\w ]+).
\w only includes word characters, which doesn't include spaces. Use [\w ]+ instead.
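For the second part of the question (the text between is and birthday), one possible approach is a reluctant group bounded by the literal word. This is only a sketch: the class name, method name and sample sentence are invented, and it assumes the phrase appears in lower case exactly as written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenWords {
    // (.+?) is reluctant, so it stops at the first " birthday" that follows.
    private static final Pattern BIRTHDAY =
            Pattern.compile("when is (.+?) birthday");

    // Returns the text between "when is " and " birthday", or null if the phrase is absent.
    static String subject(String sentence) {
        Matcher m = BIRTHDAY.matcher(sentence);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(subject("when is Aunt May's birthday?"));
    }
}
```

Unlike [\w ]+, the dot also accepts the apostrophe, which matters for names such as "Aunt May's".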

Regular expression help in java

I am lost when it comes to building regex strings. I need a regular expression that does the following.
I have the following strings:
[~class:obj]
[~class|class2|more classes:obj]
[!class:obj]
[!class|class2|more classes:obj]
[?method:class]
[text]
A string can contain several of the above. An example string would be "[if] [!class:obj]".
I want to know what is in between the [] and have it broken into match groups. For example, the first match group would be the symbol, if present (~|!|?); next, what is before the :, which could be class or class|class2|etc.; then what is on the right of the :, stopping before the ]. There may be no : and nothing before it, just something between the [].
So, how would I go about writing this regex? And is it possible to give the match groups names so I know what was matched?
This is for a java project.
If you're sure enough of your inputs, you can probably use something like /\[(\~|\!|\?)?(?:((?:[^:\]]*?)+):)?([^\]]+?)\]/. (to translate that into Java, you'll want to escape the backslashes and use quotation marks instead of forward slashes)
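Since the question also asks about naming the groups, here is a hedged Java sketch of a similar pattern with named groups. The group names symbol, left and right, the class name and the output format are all made up for the example, and the pattern is a simplified variant rather than an exact translation of the one above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokens {
    // [ optional symbol, optional "left:" part, mandatory right part ]
    private static final Pattern TOKEN = Pattern.compile(
            "\\[(?<symbol>[~!?])?(?:(?<left>[^:\\]]*):)?(?<right>[^\\]]+)\\]");

    // Returns one "symbol;left;right" string per bracketed token found in the input.
    static List<String> parse(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(orEmpty(m.group("symbol")) + ";"
                    + orEmpty(m.group("left")) + ";"
                    + m.group("right"));
        }
        return out;
    }

    // Optional groups that did not take part in the match are reported as null.
    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        System.out.println(parse("[if] [!class|class2:obj]"));
    }
}
```

The left part can then be split on "|" with String.split("\\|") if the individual class names are needed.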
Here are some web sites that might be helpful:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
http://txt2re.com/index.php3?s=Test+test+june+2011+test&submit=Show+Matches
http://www.regexplanet.com/simple/
I believe that this should work:
/\[(.*?)(?:\|(.*?))*\]/
Also:
[a-z]*
Try this code
// Needs: import static java.util.Arrays.asList;
//        import java.util.regex.Matcher; import java.util.regex.Pattern;
final Pattern
    outerP = Pattern.compile("\\[.*?\\]"), // one [...] token
    innerP = Pattern.compile("\\[([~!?]?)([^:]*):?(.*)\\]"); // symbol, text up to ':', text after ':'
for (String s : asList(
        "[~class:obj]",
        "[if][~class:obj]",
        "[~class|class2|more classes:obj]",
        "[!class:obj]",
        "[!class|class2|more classes:obj]",
        "[?method:class]",
        "[text]"))
{
    final Matcher outerM = outerP.matcher(s);
    System.out.println("Input: " + s);
    // split each token into its parts
    while (outerM.find()) {
        final Matcher m = innerP.matcher(outerM.group());
        if (m.matches()) System.out.println(
                m.group(1) + ";" + m.group(2) + ";" + m.group(3));
        else System.out.println("No match");
    }
}

Help in writing a Regular expression for a string

Hi, please help me write a regular expression for the
following requirement.
I have strings such as:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
String sStr = "Every 1 form(s) - Earth: (Air,Fire) ";
From these strings, after applying the regex, I need to get the values "Air,Earth,Water sea,Fire" and "Air,Fire".
That is, afterwards:
String vStrRegex ="Air,Earth,Water sea,Fire";
String sStrRegex ="Air,Fire";
All input strings will be separated by ":" and the values needed are always inside the brackets.
Thanks
The regular expression would be something like this:
: \((.*?)\)
Spelt out:
Pattern p = Pattern.compile(": \\((.*?)\\)");
Matcher m = p.matcher(vStr);
// find() must succeed before group() may be called
if (m.find()) {
    String result = m.group(1);
}
This will capture the content of the parentheses as the first capture group.
Try the following:
\(([^()]*)\)\s*$
The ending $ is important, and so is keeping parentheses out of the group; otherwise the match would start at the "(s)".
If you have each string separately, try this expression: \(([^\(]*)\)\s*$
This would get you the content of the last pair of brackets, as group 1.
If the strings are concatenated by : try to split them first.
Ask yourself if you really need a regex. Does the text you need always appear within the last pair of parentheses? If so, you can keep it simple and use substring instead:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
int lastOpeningParens = vStr.lastIndexOf('(');
int lastClosingParens = vStr.lastIndexOf(')');
String text = vStr.substring(lastOpeningParens + 1, lastClosingParens);
This is much more readable than a regex.
I assume that there are only whitespace characters between the : and the opening parenthesis:
Pattern regex = Pattern.compile(":\\s+\\((.+)\\)");
You'll find your results in capturing group 1.
Try this regex:
.*\((.*)\)
$1 will contain the required string
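Putting the suggestions above together, a complete runnable version might look like the sketch below. It assumes the values always sit in the last pair of parentheses, right after the colon, with nothing but whitespace after them; the class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastParens {
    // A colon, optional whitespace, then a parenthesised group with no nested
    // parentheses, followed only by whitespace up to the end of the string.
    private static final Pattern VALUES =
            Pattern.compile(":\\s*\\(([^()]*)\\)\\s*$");

    // Returns the text inside the trailing parentheses, or null if the shape does not match.
    static String values(String s) {
        Matcher m = VALUES.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(values("Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)"));
        System.out.println(values("Every 1 form(s) - Earth: (Air,Fire) "));
    }
}
```

Requiring the colon before the parenthesis keeps the "(s)" earlier in the string from ever being considered.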
