I have the following string:
"(1)name1:content1(2)name2:content2(3)name3:content3...(n)namen:contentn"
I want to capture each name_i and content_i. How can I do this? Note that the names are not known in advance: for example, name1 could be "abc" and name2 could be "xyz".
What I have tried:
String regex = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
But the results are not what I expected. I also tried matcher.matches(), but nothing was returned.
You may use
String s = "(1)name1:content1(2)name2:content2(3)name3:content3...(4)namen:content4";
Pattern pattern = Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Details
\\(\\d+\\) - matches (x) substring where x is 1 or more digits
([^:]+) - Group 1: one or more chars other than :
: - a colon
([^(]*(?:\\((?!\\d+\\))[^(]*)*) - Group 2:
[^(]* - zero or more chars other than (
(?:\\((?!\\d+\\))[^(]*)* - zero or more sequences of:
\\((?!\\d+\\)) - a ( that is not followed with 1+ digits and )
[^(]* - 0+ chars other than (
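To see why Group 2 is built this way, here is a minimal sketch that runs the same pattern on an input whose content itself contains parentheses. The class name UnrolledGroupDemo, the helper parse, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledGroupDemo {
    // Same pattern as in the answer above.
    private static final Pattern ENTRY =
            Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");

    // Hypothetical helper: returns each entry as "name=content".
    static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            result.add(m.group(1) + "=" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // "(x)" is not "(digits)", so it stays inside the content of the first entry.
        System.out.println(parse("(1)abc:f(x)(2)xyz:plain"));
    }
}
```

Because only a ( followed by digits and ) ends Group 2, the first entry keeps f(x) as its content.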
This will work if your names and contents consist only of word characters (\w: letters, digits, and underscores):
public static void test(String input){
String regexpp = "\\(\\d+\\)(\\w+):(\\w+)";
Pattern p = Pattern.compile(regexpp);
Matcher m = p.matcher(input);
while(m.find()){
System.out.println("Name: " + m.group(1));
System.out.println("Content: " + m.group(2));
}
}
Output:
Name: name1
Content: content1
Name: name2
Content: content2
Name: name3
Content: content3
Name: name99
Content: content99
Your expression matches greedily: the first group (.*) consumes as much as it can and backtracks only as far as necessary, so it runs past the first colon and swallows several entries. You can use non-greedy matching (a question mark after the quantifier, as in .*?) so that each group stops at the earliest possible position.
String regex = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";
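One caveat, offered as an observation rather than as part of the answer: the lookahead (?=\\(\\d\\)) requires another (digit) marker after the content, so the last entry, which is followed by the end of the string, is not matched. A minimal sketch of one possible adjustment adds an end-of-string alternative and allows multi-digit indices. It assumes the content never contains a (digits) sequence itself; LazyGroupDemo and parse are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyGroupDemo {
    // Lazy groups, with a lookahead that accepts either the next "(digits)" marker
    // or the end of the input.
    private static final Pattern ENTRY =
            Pattern.compile("\\(\\d+\\)(.*?):(.*?)(?=\\(\\d+\\)|$)");

    // Hypothetical helper: returns each entry as "name=content".
    static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            result.add(m.group(1) + "=" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1)name1:content1(2)name2:content2(10)name10:content10"));
    }
}
```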
I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using Pattern and Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" should be split into ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always prints "NO MATCH". What am I doing wrong, and how can I fix it?
Thanks in advance
To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If you do not want to split, you might also use a pattern that matches everything except /, allowing a / only when it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
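As a rough illustration of this lookaround variant, the sketch below collects its matches with Matcher.find(). The class name QuotedSlashDemo and the helper tokens are hypothetical; the input is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSlashDemo {
    // A run of non-slash characters, optionally continued across a slash
    // that has a single quote on both sides.
    private static final Pattern TOKEN =
            Pattern.compile("[^/]+(?:(?<=')/(?=')[^/]*)*");

    // Hypothetical helper: returns every match in order.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("One/Two/Three'/'3/Four"));
    }
}
```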
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four
Here is a simple pattern that matches every / you want to split on, so you can split by it:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows. There are 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You want to exclude only case 1, so we need a regex for the other three cases. I have written three similar regexes and combined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is different from ' (negated character class [^'])
\/ - matches / literally
(?=') - positive lookahead, asserts that what follows is '
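A minimal sketch of how this alternation could be used as a delimiter follows. It relies on String.split, which the question wanted to avoid, so treat it only as a way to check the pattern; SplitOnSlashDemo and splitOutsideQuotes are illustrative names. The escaped \/ is written as a plain / because a slash needs no escaping in a Java regex.

```java
import java.util.Arrays;
import java.util.List;

class SplitOnSlashDemo {
    // Three alternatives: slash followed by a quote only, slash preceded by a quote only,
    // and slash with no quote on either side. A slash quoted on both sides never matches.
    private static final String DELIMITER =
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])";

    // Hypothetical helper: splits on every slash except those inside '/'.
    static List<String> splitOutsideQuotes(String input) {
        return Arrays.asList(input.split(DELIMITER));
    }

    public static void main(String[] args) {
        System.out.println(splitOutsideQuotes("One/Two/Three'/'3/Four"));
    }
}
```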
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four
I need to implement a function that, given as input a filename, returns a substring according to the specifications of a regular expression
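Filenames are composed as shown below. I need to get the part between the leading Doc... prefix and the two trailing numeric fields (for example, fotocontargasx in the first filename):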
Doc20191001119049_fotocontargasx_3962122_943000.jpg
Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg
Doc201910011214020_fotoesterna_ant_396024_947112.jpg
Doc201710071149010_foto_TargaMid_4007396_95010.jpg
I have currently implemented this:
Pattern rexExp = Pattern.compile("_[a-zA-Z0-9]+_");
But it does not work properly.
Solution 1: Matching/extracting
You may capture the \w+ pattern between underscores that is followed by [digits][_][digits][.][extension]:
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Details
_ - an underscore
(\w+) - 1+ letters/digits/_
_ - an underscore
\d+ - 1+ digits
_\d+ - _ and 1+ digits
\. - a dot
[^.]* - 0+ chars other than .
$ - end of string.
Online Java demo:
String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Matcher matcher = rexExp.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
} // => fotoAssicurazioneCartaceo
Solution 2: Trimming out unnecessary prefix/suffix
You may remove everything from the start up to and including the first _, as well as the trailing [digits][_][digits][.][extension] part:
.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "")
Details
^[^_]*_ - start of string, 0+ chars other than _ and then _
| - or
_\d+_\d+\.[^.]*$ - _, 1+ digits, _, 1+ digits, . and then 0+ chars other than . to the end of the string.
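Solution 2 is shown only as a fragment, so here is a small sketch that applies it to the four filenames from the question. The class name TrimFilenameDemo and the helper middlePart are made up for this example.

```java
class TrimFilenameDemo {
    // Removes the prefix up to and including the first underscore, and the
    // trailing "_digits_digits.extension" part (hypothetical helper).
    static String middlePart(String filename) {
        return filename.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "");
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(middlePart(name));
        }
    }
}
```

Underscores inside the wanted part survive, because neither alternative can match in the middle of the name: the first is anchored at the start and the second at the end.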
To complement Wiktor's precise answer, here's a "quick-and-dirty" way of doing it that makes the following hacky assumption about your input: "Required string is only non-numbers, surrounded by numbers, and the input is always a valid filepath".
public static void main(String[] args) {
String[] strs = {"Doc20191001119049_fotocontargasx_3962122_943000.jpg", "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg", "Doc201910011214020_fotoesterna_ant_396024_947112.jpg", "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"};
var p = Pattern.compile("_([\\D_]+)_");
for(var str : strs) {
var m = p.matcher(str);
if(m.find()) {
System.out.println("found: "+m.group(1));
}
}
}
Output:
found: fotocontargasx
found: fotoAssicurazioneCartaceo
found: fotoesterna_ant
found: foto_TargaMid
Pattern: (?<=_).+(?=(_\d+){2}\.)
final String s = "Doc20191001119049_fotocontargasx_3962122_943000.jpg\n"
+ "\n"
+ "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg\n"
+ "\n"
+ "Doc201910011214020_fotoesterna_ant_396024_947112.jpg\n"
+ "\n"
+ "Doc201710071149010_foto_TargaMid_4007396_95010.jpg";
Pattern pattern = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");
Matcher matcher = pattern.matcher(s);
List<String> allMatches = new ArrayList<>();
while (matcher.find()) {
allMatches.add(matcher.group());
}
Output: [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]
I've tested my regex on Regex101, and all the groups were captured and matched my string. But now, when I try to use it in Java, it throws
java.lang.IllegalStateException: No match found on line 9
String subjectCode = "02 credits between ----";
String regex1 = "^(\\d+).*credits between --+.*?$";
Pattern p1 = Pattern.compile(regex1);
Matcher m;
if(subjectCode.matches(regex1)){
m = p1.matcher(regex1);
m.find();
Integer subjectCredits = Integer.valueOf(m.group(1)); // line 9 (the exception is thrown here)
System.out.println("Subject Credits: " + subjectCredits);
}
How's that possible and what's the problem?
Here is a fix with some optimizations (thanks go to @cricket_007):
String subjectCode = "02 credits between ----";
String regex1 = "(\\d+).*credits between --+.*";
Pattern p1 = Pattern.compile(regex1);
Matcher m = p1.matcher(subjectCode);
if (m.matches()) {
Integer subjectCredits = Integer.valueOf(m.group(1));
System.out.println("Subject Credits: " + subjectCredits);
}
You need to pass the input string to the matcher. As a minor enhancement, you can use just 1 Matcher#matches and then access the captured group if there is a match. The regex does not need ^ and $ since with matches() the whole input should match the pattern.
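The sketch below illustrates both points under simple assumptions: matches() must cover the whole input, and calling group() after a failed find() raises IllegalStateException, which is what happens when the matcher is created over the regex text instead of the input. MatchesDemo, credits, and groupAfterFailedFindThrows are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesDemo {
    private static final Pattern CREDITS =
            Pattern.compile("(\\d+).*credits between --+.*");

    // Returns the leading number, or null when the whole input does not match.
    static Integer credits(String subjectCode) {
        Matcher m = CREDITS.matcher(subjectCode);
        return m.matches() ? Integer.valueOf(m.group(1)) : null;
    }

    // Reproduces the original mistake: the matcher scans the regex text itself,
    // find() fails, and group(1) then throws IllegalStateException.
    static boolean groupAfterFailedFindThrows() {
        Matcher m = CREDITS.matcher(CREDITS.pattern());
        m.find(); // result deliberately ignored, as in the question
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(credits("02 credits between ----"));
        System.out.println(credits("code 02 credits between ----")); // does not start with digits
        System.out.println(groupAfterFailedFindThrows());
    }
}
```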
I have this regex and my output seems to be matching each single space but the capturing group is only alpha chars. I must be missing something.
String regexstring = new String("1234567 Mike Peloso ");
Pattern pattern = Pattern.compile("[A-Za-z]*");
Matcher matcher = pattern.matcher(regexstring);
while(matcher.find())
{
System.out.println(Integer.toString(matcher.start()));
String someNumberStr = matcher.group();
System.out.println(someNumberStr);
}
There is no capturing group in your pattern, but you need to use the + quantifier (meaning 1 or more times). The * quantifier matches the preceding element zero or more times, so it also produces an empty match at every position where there is no letter, which clutters the output.
Pattern pattern = Pattern.compile("[A-Za-z]+");
And then print the match result:
while (matcher.find()) {
System.out.println(matcher.start());
System.out.println(matcher.group());
}
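To make the difference between the two quantifiers concrete, here is a minimal sketch that collects the matches of both patterns on the string from the question. The class name QuantifierDemo and the helper allMatches are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns every match of the regex, including empty ones.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "1234567 Mike Peloso ";
        // With *, digits and spaces each yield an empty match.
        System.out.println(allMatches("[A-Za-z]*", input));
        // With +, only the two words are matched.
        System.out.println(allMatches("[A-Za-z]+", input));
    }
}
```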
In a line I may have (123,456)
I want to find it using pattern in java. What I did is:
Pattern pattern = Pattern.compile("\\W");
Matcher matcher = pattern.matcher("(");
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
}
Input: This is test (123,456)
Output:Start index: 0 End index: 1 (
Why??
I am not sure how \W is going to match it. \W matches a single non-word character, not the whole (123,456).
You will also have to escape those backslashes.
Round brackets need to be escaped, as by default they are used for grouping.
Maybe the regex you meant was
Pattern pattern = Pattern.compile("\\([,\\d]+\\)");
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
String matched = matcher.group();
//Do something with it
}
Explanation:
\\( # Match (
[,\\d]+ # Match 1+ digits/commas. Don't be surprised if it matches (,,,,,,)
\\) # Match )
To do it in one line:
String num = str.replaceAll(".*\\(([\\d,]+)\\).*", "$1");
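A short sketch of the one-liner in context, using the input from the question. Note that replaceAll returns the input unchanged when the pattern does not match, so a caller may want to check for that; BracketExtractDemo and extract are made-up names.

```java
class BracketExtractDemo {
    // Replaces the whole line with the content of the first capturing group.
    // If the line contains no "(digits,commas)" part, the line is returned as-is.
    static String extract(String line) {
        return line.replaceAll(".*\\(([\\d,]+)\\).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("This is test (123,456)"));
        System.out.println(extract("no brackets here"));
    }
}
```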