Multiple group match in regex - java

I have an input string
invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend
I want to fetch only the subdata part of it, I tried,
Pattern p = Pattern.compile("(?<=sufixpart).*?(subdata.)+.*?(?=end)", Pattern.DOTALL);
Matcher m = p.matcher(inputString);
while(m.find()){
System.out.println(m.group(1));
}
But I get only the first match. How can I get all the subdata values, such as [subdata1,subdata2,subdata3]?

I'd go for a simpler approach: get the blocks first with a regex like start(.*?)end, and then extract all the matches from Group 1 with a simple subdata\S*-like regex.
See the Java demo:
String rx = "(?sm)^sufixpart$(.*?)^end$";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern_outer = Pattern.compile(rx);
Pattern pattern_token = Pattern.compile("(?m)^subdata\\S*$");
Matcher matcher = pattern_outer.matcher(s);
List<List<String>> res = new ArrayList<>();
while (matcher.find()){
List<String> lst = new ArrayList<>();
if (!matcher.group(1).isEmpty()) { // If Group 1 is not empty
Matcher m = pattern_token.matcher(matcher.group(1)); // Init the second matcher
while (m.find()) { // If a token is found
lst.add(m.group(0)); // add it to the list
}
}
res.add(lst); // Add the list to the result list
}
System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
Another approach is to use a \G based regex:
(?sm)(?:\G(?!\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\S*)(?=.*?^end$)
Explanation:
(?sm) - enables DOTALL and MULTILINE modes
(?:\G(?!\A)|^sufixpart$) - matches either the end of the previous successful match (\G(?!\A)) or a whole line with sufixpart text on it (^sufixpart$)
(?:(?!^(?:sufixpart|end)$).)*? - matches any 0+ chars, as few as possible, none of which is the starting point of a sufixpart or end that is a whole line
(subdata\S*) - Group 1 matching subdata and 0+ non-whitespace chars
(?=.*?^end$) - there must be an end line after any 0+ chars.
Java demo (here the first group is capturing, so the subdata value ends up in Group 2):
String rx = "(?sm)(\\G(?!\\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\\S*)(?=.*?^end$)";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
List<List<String>> res = new ArrayList<>();
List<String> lst = null;
while (matcher.find()){
if (!matcher.group(1).isEmpty()) {
if (lst != null) res.add(lst);
lst = new ArrayList<>();
lst.add(matcher.group(2));
} else lst.add(matcher.group(2));
}
if (lst != null) res.add(lst);
System.out.println(res);

Related

Match everything after and before something regex Java

Here is my code:
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern p = Pattern.compile("(?<=.com\\/excludethis).*\\/"); //search for this pattern
Matcher m = p.matcher(stringToSearch); //match pattern in StringToSearch
String store= "";
// print match and store match in String Store
if (m.find())
{
String theGroup = m.group(0);
System.out.format("'%s'\n", theGroup);
store = theGroup;
}
//repeat the process
Pattern p1 = Pattern.compile("(.*)[^\\/]");
Matcher m1 = p1.matcher(store);
if (m1.find())
{
String theGroup = m1.group(0);
System.out.format("'%s'\n", theGroup);
}
I want to match everything that is after excludethis and before the / that follows it.
With the "(?<=.com\\/excludethis).*\\/" regex I match 123456/ and store that in String store. After that, with "(.*)[^\\/]", I exclude the / and get 123456.
Can I do this in one line, i.e., combine these two regexes? I can't figure out how to combine them.
Just as you used a positive lookbehind, you can use a positive lookahead and change your regex to this:
(?<=.com/excludethis).*(?=/)
Also, in Java you don't need to escape /.
Your modified code:
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern p = Pattern.compile("(?<=.com/excludethis).*(?=/)"); // search for this pattern
Matcher m = p.matcher(stringToSearch); // match pattern in StringToSearch
String store = "";
// print match and store match in String Store
if (m.find()) {
String theGroup = m.group(0);
System.out.format("'%s'\n", theGroup);
store = theGroup;
}
System.out.println("Store: " + store);
Prints:
'123456'
Store: 123456
which is the value you wanted to capture.
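One caveat with the lookaround pattern: the unescaped . matches any character, and the greedy .* runs to the last / in the input, so a URL with more path segments would return more than the ID. A minimal sketch of a stricter variant, assuming the wanted value never contains a slash (the class and method names are illustrative, not from the original post):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExcludeThisExtractor {
    // "\\." matches a literal dot; "[^/]*" stops at the first slash
    // instead of running greedily to the last one.
    private static final Pattern VALUE =
            Pattern.compile("(?<=\\.com/excludethis)[^/]*(?=/)");

    // Returns the text between "excludethis" and the next "/", if any.
    static Optional<String> extract(String url) {
        Matcher m = VALUE.matcher(url);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints Optional[123456]
        System.out.println(extract("https://example.com/excludethis123456/moretext"));
        // also prints Optional[123456], even with extra path segments
        System.out.println(extract("https://example.com/excludethis123456/more/text/"));
    }
}
```

Returning an Optional keeps the "no match" case explicit rather than falling back to an empty string.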
This may be useful for you :)
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern pattern = Pattern.compile("excludethis([\\d\\D]+?)/");
Matcher matcher = pattern.matcher(stringToSearch);
if (matcher.find()) {
String result = matcher.group(1);
System.out.println(result);
}
If you don't want to use regex, you could just try with String::substring*
String stringToSearch = "https://example.com/excludethis123456/moretext";
String exclusion = "excludethis";
System.out.println(stringToSearch.substring(stringToSearch.indexOf(exclusion)).substring(exclusion.length(), stringToSearch.substring(stringToSearch.indexOf(exclusion)).indexOf("/")));
Output:
123456
* Definitely don't actually use this
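If a regex-free version is still wanted, the same idea reads better with indexOf and explicit bounds. This is only a sketch; the helper name between and its null-on-miss behaviour are assumptions made for illustration:

```java
public class SubstringExtractor {
    // Returns the text between the marker and the next "/",
    // or null if the marker or the following slash is missing.
    static String between(String text, String marker) {
        int start = text.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();            // skip past the marker itself
        int end = text.indexOf('/', start);  // first slash after the marker
        return end < 0 ? null : text.substring(start, end);
    }

    public static void main(String[] args) {
        String url = "https://example.com/excludethis123456/moretext";
        System.out.println(between(url, "excludethis")); // prints 123456
    }
}
```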

How to extract id from url ? Google sheet

I have the following URLs.
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
For each URL, I need to extract the sheet ID (e.g. 1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY) into a Java String.
I am thinking of using split, but it doesn't work for all test cases:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("/");
String res = parts[parts.length-2];
Log.d("hello res",res );
How can I make that work?
You can use the regex \/d\/(.*?)(\/|$) to solve your problem. If you look closely, the ID always sits between d/ and either the next / or the end of the line, so you can capture everything in between:
String[] urls = new String[]{
"https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
"https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
"https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
};
String regex = "\\/d\\/(.*?)(\\/|$)";
Pattern pattern = Pattern.compile(regex);
for (String url : urls) {
Matcher matcher = pattern.matcher(url);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Output:
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
It looks like the ID you are looking for always follows "/spreadsheets/d/". If that is the case, you can update your code like this:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("spreadsheets/d/");
String result;
if(parts[1].contains("/")){
String[] parts2 = parts[1].split("/");
result = parts2[0];
}
else{
result=parts[1];
}
System.out.println("hello "+ result);
Using regex
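Pattern pattern = Pattern.compile("(?<=\\/d\\/)[^\\/]*");
Matcher matcher = pattern.matcher(url);
if (matcher.find()) { // find() must succeed before group() can be called
System.out.println(matcher.group()); // whole match; the pattern has no capturing group
}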
Using Java
String result = url.substring(url.indexOf("/d/") + 3);
int slash = result.indexOf("/");
result = slash == -1 ? result
: result.substring(0, slash);
System.out.println(result);
In your examples the IDs are 44 characters long and use only alphanumerics, - and _. Assuming that holds for all your URLs, you can match on length alone (Python shown here):
import re
regex = r"[\w-]{44}"
match = re.search(regex, url)
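Since the question asks for a Java String, the fixed-length idea could be sketched in Java as follows. The 44-character length is taken from the sample URLs only and is an assumption, not a documented guarantee; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SheetIdByLength {
    // Assumption from the sample URLs: the ID is exactly 44 chars of [A-Za-z0-9_-].
    // The lookarounds stop the pattern from matching inside a longer run.
    private static final Pattern ID =
            Pattern.compile("(?<![\\w-])[\\w-]{44}(?![\\w-])");

    // Returns the first 44-character token in the URL, or null if there is none.
    static String sheetId(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String url = "https://docs.google.com/a/example.com/spreadsheets/d/"
                + "1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258";
        // prints 1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY
        System.out.println(sheetId(url));
    }
}
```

Matching on the /d/ prefix, as in the other answers, is likely the safer choice if the ID length ever changes.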

how get multiple value Regex expressions in Java

How can I get every word that contains _I? I'm using "\S*_I+\S*".
I have this string:
the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP to_B-VP use_I-VP ,_O in_B-PP fact_B-NP on_B-PP a_B-NP recent_I-NP trip_I-NP this_B-NP past_I-NP week_I-NP i_I-NP was_B-VP asked_I-VP to_B-VP take_I-VP a_B-NP picture_I-NP of_B-PP a_B-NP vacationing_I-NP elderly_I-NP group_I-NP ._O
This is my code:
Pattern p = Pattern.compile("\\S*_I+\\S*");
Matcher m = p.matcher(input);
while(m.find()){
hasilReg = m.group();
}
After running it I get only one value: group_I-NP
but I would like every word that contains _I.
Thanks.
group_I-NP is the last match, and you only get that one because hasilReg is overwritten on every iteration. Add the results to a List<String> instead:
String str = "the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP to_B-VP use_I-VP ,_O in_B-PP fact_B-NP on_B-PP a_B-NP recent_I-NP trip_I-NP this_B-NP past_I-NP week_I-NP i_I-NP was_B-VP asked_I-VP to_B-VP take_I-VP a_B-NP picture_I-NP of_B-PP a_B-NP vacationing_I-NP elderly_I-NP group_I-NP ._O ";
Pattern ptrn = Pattern.compile("\\S*_I+\\S*");
Matcher matcher = ptrn.matcher(str);
List<String> lst = new ArrayList<>();
while (matcher.find()) {
lst.add(matcher.group());
}
System.out.println(lst);
// => [camera_I-NP, easy_I-ADJP, use_I-VP, recent_I-NP, trip_I-NP, past_I-NP, week_I-NP, i_I-NP, asked_I-VP, take_I-VP, picture_I-NP, vacationing_I-NP, elderly_I-NP, group_I-NP]

Matcher. How to get index of found group?

I have a sentence and I want to count the words, semiPunctuation and endPunctuation in it.
Calling m.group() returns the matched String, but how do I know which group was matched?
I can check each group for null, but that doesn't feel right.
String input = "Some text! Some example text.";
int wordCount=0;
int semiPunctuation=0;
int endPunctuation=0;
Pattern pattern = Pattern.compile( "([\\w]+) | ([,;:\\-\"\']) | ([!\\?\\.]+)" );
Matcher m = pattern.matcher(input);
while (m.find()) {
// need more correct method
if(m.group(1)!=null) wordCount++;
if(m.group(2)!=null) semiPunctuation++;
if(m.group(3)!=null) endPunctuation++;
}
You could use named groups to capture the expressions (note that the spaces around | are removed, since they would otherwise be matched literally):
Pattern pattern = Pattern.compile( "(?<words>\\w+)|(?<semi>[,;:\\-\"'])|(?<end>[!?.])" );
Matcher m = pattern.matcher(input);
while (m.find()) {
if (m.group("words") != null) {
wordCount++;
}
...
}
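A complete version of the named-group idea might look like the sketch below. It also shows one way to get the numeric index of the group that matched, by scanning groupCount(). The class name, method names and the int[] return shape are assumptions made for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenCounter {
    private static final Pattern TOKEN = Pattern.compile(
            "(?<words>\\w+)|(?<semi>[,;:\\-\"'])|(?<end>[!?.])");

    // Returns {wordCount, semiPunctuation, endPunctuation}.
    static int[] count(String input) {
        int[] counts = new int[3];
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Only one alternative can match per find(), so else-if is enough.
            if (m.group("words") != null) counts[0]++;
            else if (m.group("semi") != null) counts[1]++;
            else if (m.group("end") != null) counts[2]++;
        }
        return counts;
    }

    // Index of the first capturing group that took part in the current match,
    // or 0 if none did. Named groups are numbered like any other group.
    static int matchedGroupIndex(Matcher m) {
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) return i;
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] c = count("Some text! Some example text.");
        // prints 5 words, 0 semi, 2 end
        System.out.println(c[0] + " words, " + c[1] + " semi, " + c[2] + " end");
    }
}
```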

Getting an array of every string that matches a Regular expression

How would I parse a file like this:
Item costs $15 and is made up of --Metal--
Item costs $64 and is made up of --Plastic--
I can do
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
String result = m.group();
But how would I get EVERY result?
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
List<String> matches = new ArrayList<String>();
while(m.find()){
matches.add(m.group());
}
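On Java 9 and later, the same loop can be written with Matcher.results(), which streams one MatchResult per match. The question does not show its regex, so the pattern below (capturing the text between the double dashes) is purely hypothetical, as are the class and method names:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AllMatches {
    // Hypothetical pattern: the original question does not show its regex.
    private static final Pattern MATERIAL = Pattern.compile("--(\\w+)--");

    // Collects Group 1 of every match in the input.
    static List<String> materials(String input) {
        return MATERIAL.matcher(input)
                .results()               // Java 9+: Stream<MatchResult>
                .map(r -> r.group(1))    // text between the dashes
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "Item costs $15 and is made up of --Metal--\n"
                + "Item costs $64 and is made up of --Plastic--";
        System.out.println(materials(input)); // prints [Metal, Plastic]
    }
}
```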
