Java regular expression: capture strings - java

I have the following string:
"(1)name1:content1(2)name2:content2(3)name3:content3...(n)namen:contentn"
What I want to do is capture each name_i and content_i. How can I do this? I should mention that name_i is unknown: for example, name1 could be "abc" and name2 could be "xyz".
What I have tried:
String regex = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
    System.out.println(matcher.group(0));
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
But the results are not very good. I also tried matcher.matches(), but nothing is returned.

You may use
String s = "(1)name1:content1(2)name2:content2(3)name3:content3...(4)namen:content4";
Pattern pattern = Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}
Details
\\(\\d+\\) - matches a (x) substring, where x is 1 or more digits
([^:]+) - Group 1: one or more chars other than :
: - a colon
([^(]*(?:\\((?!\\d+\\))[^(]*)*) - Group 2:
[^(]* - zero or more chars other than (
(?:\\((?!\\d+\\))[^(]*)* - zero or more sequences of:
\\((?!\\d+\\)) - a ( that is not followed by 1+ digits and )
[^(]* - 0+ chars other than (
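A minimal sketch of how the pattern above might be wrapped into a reusable method, assuming you want the pairs in a map keyed by name (names are assumed to be unique; a repeated name would overwrite the earlier entry). The class NameContentParser and the method parsePairs are hypothetical names for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameContentParser {
    // Same pattern as in the answer: "(digits)name:content", where the content
    // may contain "(" as long as it is not followed by digits and ")".
    private static final Pattern PAIR = Pattern.compile(
            "\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");

    // Hypothetical helper: collects every name -> content pair in input order.
    static Map<String, String> parsePairs(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints {abc=first, xyz=second (with parens), q=last}
        System.out.println(parsePairs("(1)abc:first(2)xyz:second (with parens)(3)q:last"));
    }
}
```

A LinkedHashMap keeps the pairs in the order they appear in the input; if names can repeat, a list of pairs would be a safer container.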

This will work if your names and contents consist only of word characters (letters, digits and _):
public static void test(String input) {
    String regexpp = "\\(\\d+\\)(\\w+):(\\w+)";
    Pattern p = Pattern.compile(regexpp);
    Matcher m = p.matcher(input);
    while (m.find()) {
        System.out.println("Name: " + m.group(1));
        System.out.println("Content: " + m.group(2));
    }
}
Output:
Name: name1
Content: content1
Name: name2
Content: content2
Name: name3
Content: content3
Name: name99
Content: content99
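To illustrate the word-character limitation, here is a small sketch (the class WordPairDemo and the method pairs are hypothetical names) showing what the \w+ pattern does with content that contains a space, or a name that contains a hyphen:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordPairDemo {
    // The \w+ pattern from the answer: name and content may only contain
    // letters, digits and underscores.
    private static final Pattern WORD_PAIR = Pattern.compile("\\(\\d+\\)(\\w+):(\\w+)");

    // Returns "name=content" strings for every match.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD_PAIR.matcher(input);
        while (m.find()) {
            result.add(m.group(1) + "=" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("(1)abc:x1(2)xyz:y2"));         // [abc=x1, xyz=y2]
        System.out.println(pairs("(1)abc:hello world(2)x-y:z")); // [abc=hello]
    }
}
```

The first content is cut at the space, and the second pair is skipped entirely because x-y is not made of word characters.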

Your expression matches greedily: the first group (.*) grabs as much as it can and runs past the first colon, so a single match spans several name/content pairs and the groups capture the wrong text. You can use non-greedy (reluctant) matching, with a question mark as in *?, so that each group stops as early as possible:
String regex = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";
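Note that this lookahead still requires another (digit) after each content, so the last pair in the string is never matched, and \\d only allows single-digit indexes. A sketch assuming you also want the final pair and indexes such as (10): it adds |$ to the lookahead, uses \\d+, and loops with while instead of a single if (LazyPairDemo and pairs are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyPairDemo {
    // Reluctant groups; \d+ allows multi-digit indexes like (10), and the
    // lookahead also accepts the end of input so the last pair is kept.
    private static final Pattern LAZY_PAIR =
            Pattern.compile("\\(\\d+\\)(.*?):(.*?)(?=\\(\\d+\\)|$)");

    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = LAZY_PAIR.matcher(input);
        while (m.find()) { // loop over all matches, not just the first
            result.add(m.group(1) + "=" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [name1=content1, name2=content2, name3=content3]
        System.out.println(pairs("(1)name1:content1(2)name2:content2(3)name3:content3"));
    }
}
```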

Related

Regular Expression in Java. Splitting a string using pattern and matcher

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a regex Pattern and Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" needs to be split into ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
    System.out.println("Found value: " + m.group(0));
} else {
    System.out.println("NO MATCH");
}
But it always says "NO MATCH". What am I doing wrong, and how can I fix it?
Thanks in advance
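Your code prints NO MATCH because matches() only succeeds when the entire input matches the pattern, and your pattern matches just a single /. To get the parts without using split, you might use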
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by zero or more chars other than ' or /
)* Close group and optionally repeat it
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If the parts themselves may contain single quotes, you might instead use a pattern that matches any character except /, and allows a / only when it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
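A quick sketch comparing the two patterns on an input where one part contains a single quote of its own (the input "It's/One/Three'/'3", the class QuotedSlashDemo and the helper parts are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSlashDemo {
    // First pattern: a lone ' cannot appear inside a part.
    static final Pattern NO_QUOTES = Pattern.compile("[^'/]+(?:'/'[^'/]*)*");
    // Edit pattern: any char except /, plus a / only when surrounded by quotes.
    static final Pattern QUOTES_OK = Pattern.compile("[^/]+(?:(?<=')/(?=')[^/]*)*");

    // Collects every match of p in input.
    static List<String> parts(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "It's/One/Three'/'3";
        System.out.println(parts(NO_QUOTES, line)); // [It, s, One, Three'/'3]
        System.out.println(parts(QUOTES_OK, line)); // [It's, One, Three'/'3]
    }
}
```

The first pattern cannot step over a lone ', so "It's" is broken into two pieces; the second one keeps it whole.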
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find()) {
    System.out.println(m.group());
}
Output:
One
Two
Three'/'3
Four
Here is a simple pattern that matches every / you want to split on, so you can split by it:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows. There are 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You only want to exclude case 1, so we need a regex for the other three cases. I have written three similar regexes and combined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that the preceding character is different from ' (negated character class [^'])
\/ - match / literally
(?=') - positive lookahead, asserts that what follows is '
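A minimal sketch of using this pattern with String.split, assuming split is acceptable in your case (the question asks to avoid it). In a Java string literal the \/ escape is not needed, so it is written as a plain /; SplitDemo and split are hypothetical names:

```java
import java.util.Arrays;

class SplitDemo {
    // The three-case pattern from the answer, with \/ written as a plain /.
    static final String SPLIT_SLASH =
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])";

    // Splits on every / except one surrounded by single quotes.
    static String[] split(String line) {
        return line.split(SPLIT_SLASH);
    }

    public static void main(String[] args) {
        // prints [One, Two, Three'/'3, Four]
        System.out.println(Arrays.toString(split("One/Two/Three'/'3/Four")));
    }
}
```

Because of the lookbehinds and lookaheads, a / at the very start or end of the input is not matched, since there is no character on that side to test.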
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while (m.find()) {
    System.out.println("Found value: " + m.group());
    found = true;
}
if (!found) {
    System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four

Return a substring using a regExp [Java]

I need to implement a function that, given a filename as input, returns a substring according to a regular expression.
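Filenames are composed this way, and I need to get the part between the first underscore and the trailing _digits_digits.jpg (for example, fotocontargasx in the first name):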
Doc20191001119049_fotocontargasx_3962122_943000.jpg
Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg
Doc201910011214020_fotoesterna_ant_396024_947112.jpg
Doc201710071149010_foto_TargaMid_4007396_95010.jpg
I have currently implemented this:
Pattern rexExp = Pattern.compile("_[a-zA-Z0-9]+_");
But it does not work properly.
Solution 1: Matching/extracting
You may capture a \w+ pattern between underscores that is followed by [digits][_][digits][.][extension]:
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Details
_ - an underscore
(\w+) - 1+ letters/digits/_
_ - an underscore
\d+ - 1+ digits
_\d+ - _ and 1+ digits
\. - a dot
[^.]* - 0+ chars other than .
$ - end of string.
Java example:
String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Matcher matcher = rexExp.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
} // => fotoAssicurazioneCartaceo
Solution 2: Trimming out unnecessary prefix/suffix
You may remove everything from the start up to and including the first _, as well as the [digits][_][digits][.][extension] suffix at the end:
.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "")
Details
^[^_]*_ - start of string, 0+ chars other than _ and then _
| - or
_\d+_\d+\.[^.]*$ - _, 1+ digits, _, 1+ digits, . and then 0+ chars other than . to the end of the string.
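A runnable sketch of Solution 2, assuming every filename follows the format shown in the question (TrimDemo and middlePart are hypothetical names):

```java
class TrimDemo {
    // Removes the prefix up to and including the first "_" and the
    // "_digits_digits.extension" suffix, keeping only the middle part.
    static String middlePart(String fileName) {
        return fileName.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "");
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(middlePart(name));
        }
    }
}
```

If a filename does not end in _digits_digits.extension, replaceAll simply leaves that part in place, so you may want to validate the name first.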
To complement Wiktor's precise answer, here's a "quick-and-dirty" way of doing it that makes the following hacky assumption about your input: "Required string is only non-numbers, surrounded by numbers, and the input is always a valid filepath".
public static void main(String[] args) {
    String[] strs = {"Doc20191001119049_fotocontargasx_3962122_943000.jpg", "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg", "Doc201910011214020_fotoesterna_ant_396024_947112.jpg", "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"};
    var p = Pattern.compile("_([\\D_]+)_");
    for (var str : strs) {
        var m = p.matcher(str);
        if (m.find()) {
            System.out.println("found: " + m.group(1));
        }
    }
}
Output:
found: fotocontargasx
found: fotoAssicurazioneCartaceo
found: fotoesterna_ant
found: foto_TargaMid
Pattern: (?<=_).+(?=(_\d+){2}\.)
final String s = "Doc20191001119049_fotocontargasx_3962122_943000.jpg\n"
+ "\n"
+ "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg\n"
+ "\n"
+ "Doc201910011214020_fotoesterna_ant_396024_947112.jpg\n"
+ "\n"
+ "Doc201710071149010_foto_TargaMid_4007396_95010.jpg";
Pattern pattern = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");
Matcher matcher = pattern.matcher(s);
List<String> allMatches = new ArrayList<>();
while (matcher.find()) {
    allMatches.add(matcher.group());
}
System.out.println(allMatches);
Output: [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]

Matcher returning not match

I've tested my regex on Regex101, and all the groups were captured and matched my string. But now when I try to use it in Java, it throws a
java.lang.IllegalStateException: No match found on line 9:
String subjectCode = "02 credits between ----";
String regex1 = "^(\\d+).*credits between --+.*?$";
Pattern p1 = Pattern.compile(regex1);
Matcher m;
if (subjectCode.matches(regex1)) {
    m = p1.matcher(regex1);
    m.find();
    [LINE 9] Integer subjectCredits = Integer.valueOf(m.group(1));
    System.out.println("Subject Credits: " + subjectCredits);
}
How is that possible, and what's the problem?
Here is a fix and some optimizations (thanks go to @cricket_007):
String subjectCode = "02 credits between ----";
String regex1 = "(\\d+).*credits between --+.*";
Pattern p1 = Pattern.compile(regex1);
Matcher m = p1.matcher(subjectCode);
if (m.matches()) {
    Integer subjectCredits = Integer.valueOf(m.group(1));
    System.out.println("Subject Credits: " + subjectCredits);
}
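You need to pass the input string to the matcher, not the regex: p1.matcher(regex1) tries to match the pattern against its own source text, which fails, so m.find() returns false and m.group(1) throws the IllegalStateException. As a minor enhancement, you can call Matcher#matches just once and then access the captured group if there is a match. The regex does not need ^ and $, since matches() requires the whole input to match the pattern.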
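A slightly more defensive sketch, assuming you want non-matching input handled without an exception: it checks the result of matches() before reading the group and returns an OptionalInt (CreditsDemo and parseCredits are hypothetical names):

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CreditsDemo {
    private static final Pattern CREDITS = Pattern.compile("(\\d+).*credits between --+.*");

    // Returns the credits only when the whole input matches, so group(1)
    // is never called on a failed match.
    static OptionalInt parseCredits(String subjectCode) {
        Matcher m = CREDITS.matcher(subjectCode); // pass the input, not the regex
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(parseCredits("02 credits between ----")); // OptionalInt[2]
        System.out.println(parseCredits("no credits here"));        // OptionalInt.empty
    }
}
```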

Java Regex group matches spaces

I have this regex, and my output seems to match at every single space, even though the pattern only contains alphabetic characters. I must be missing something.
String regexstring = new String("1234567 Mike Peloso ");
Pattern pattern = Pattern.compile("[A-Za-z]*");
Matcher matcher = pattern.matcher(regexstring);
while(matcher.find())
{
System.out.println(Integer.toString(matcher.start()));
String someNumberStr = matcher.group();
System.out.println(someNumberStr);
}
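There is no capturing group in your pattern, but the real problem is the quantifier: you need + (one or more times) instead of *. The * quantifier matches the preceding element zero or more times, so [A-Za-z]* also produces an empty match at every position where no letter starts, such as each digit and space, which creates the messy output: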
Pattern pattern = Pattern.compile("[A-Za-z]+");
And then print the match result:
while (matcher.find()) {
System.out.println(matcher.start());
System.out.println(matcher.group());
}
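A small sketch that collects the matches into a list, so the difference between the two quantifiers is easy to see (QuantifierDemo and allMatches are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every substring that find() returns for the given regex.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "1234567 Mike Peloso ";
        System.out.println(allMatches("[A-Za-z]+", s)); // [Mike, Peloso]
        // With *, the list also contains empty strings.
        System.out.println(allMatches("[A-Za-z]*", s));
    }
}
```

With [A-Za-z]* the list contains an empty string for each position where zero letters match, which is the noisy output from the question.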

Finding numbers between two parentheses using a regular expression

In a line I may have (123,456).
I want to find it using a Pattern in Java. What I did is:
Pattern pattern = Pattern.compile("\\W");
Matcher matcher = pattern.matcher("(");
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
}
Input: This is test (123,456)
Output:Start index: 0 End index: 1 (
Why??
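I am not sure how \W is supposed to match it: \W matches a single non-word character. Also note that you are passing the literal "(" to matcher() instead of your input line, so the only match reported is that single character at index 0.
In a Java string literal you will also have to escape the backslashes.
Round brackets need to be escaped, as by default they are used for grouping.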
Maybe the regex you meant was
Pattern pattern = Pattern.compile("\\([,\\d]+\\)");
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
String matched = matcher.group();
//Do something with it
}
Explanation:
\\( # Match (
[,\\d]+ # Match 1+ digits/commas. Don't be surprised if it matches (,,,,,,)
\\) # Match )
To do it in one line:
String num = str.replaceAll(".*\\(([\\d,]+)\\).*", "$1");
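If you need the two numbers themselves rather than the whole (123,456) text, a stricter variant with one capturing group per number is possible. This is a sketch under the assumption that the parentheses always hold exactly two comma-separated integers; PairOfNumbersDemo and findPair are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairOfNumbersDemo {
    // Stricter variant (an assumption, not the answer's pattern): exactly two
    // digit runs separated by one comma, each captured in its own group.
    private static final Pattern NUMBER_PAIR = Pattern.compile("\\((\\d+),(\\d+)\\)");

    // Hypothetical helper: returns {first, second}, or null if nothing matches.
    static int[] findPair(String line) {
        Matcher m = NUMBER_PAIR.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] pair = findPair("This is test (123,456)");
        System.out.println(pair[0] + " and " + pair[1]); // 123 and 456
        System.out.println(findPair("(,,,,,,)") == null);  // true
    }
}
```

Unlike [,\\d]+, this pattern does not match (,,,,,,). Also keep in mind that the replaceAll one-liner returns the line unchanged when there is no match.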
