Return a substring using a regExp [Java] - java

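I need to implement a function that, given a filename as input, returns a substring according to a regular expression.
Filenames are composed as follows; I need to extract the label between the first underscore and the two trailing numeric fields (fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant and foto_TargaMid in the examples below):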
Doc20191001119049_fotocontargasx_3962122_943000.jpg
Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg
Doc201910011214020_fotoesterna_ant_396024_947112.jpg
Doc201710071149010_foto_TargaMid_4007396_95010.jpg
I have currently implemented this:
Pattern rexExp = Pattern.compile("_[a-zA-Z0-9]+_");
But it does not work properly.

Solution 1: Matching/extracting
You may capture a \w+ match between underscores when it is followed by [digits][_][digits][.][extension]:
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
See the regex demo
Details
_ - an underscore
(\w+) - 1+ letters/digits/_
_ - an underscore
\d+ - 1+ digits
_\d+ - _ and 1+ digits
\. - a dot
[^.]* - 0+ chars other than .
$ - end of string.
Online Java demo:
String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Matcher matcher = rexExp.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
} // => fotoAssicurazioneCartaceo
Solution 2: Trimming out unnecessary prefix/suffix
You may remove all from the start up to the first _ including it, and [digits][_][digits][.][extension] at the end:
.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "")
See this regex demo
Details
^[^_]*_ - start of string, 0+ chars other than _ and then _
| - or
_\d+_\d+\.[^.]*$ - _, 1+ digits, _, 1+ digits, . and then 0+ chars other than . to the end of the string.
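Since the question asks for a function, Solution 1 could be wrapped roughly as follows. This is only a sketch: the class name FilenameLabel, the method name extractLabel and the choice of Optional as the return type are illustrative assumptions, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilenameLabel {
    // Same pattern as Solution 1: capture the text between the first "_" and
    // the trailing "_<digits>_<digits>.<extension>".
    private static final Pattern LABEL =
            Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

    // Hypothetical helper: returns the captured label, or an empty Optional
    // when the filename does not follow the expected layout.
    static Optional<String> extractLabel(String filename) {
        Matcher m = LABEL.matcher(filename);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(extractLabel(name).orElse("<no match>"));
        }
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, and returning an Optional makes the "filename does not fit" case explicit for the caller.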

To complement Wiktor's precise answer, here's a "quick-and-dirty" way of doing it that makes the following hacky assumption about your input: "Required string is only non-numbers, surrounded by numbers, and the input is always a valid filepath".
public static void main(String[] args) {
String[] strs = {"Doc20191001119049_fotocontargasx_3962122_943000.jpg", "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg", "Doc201910011214020_fotoesterna_ant_396024_947112.jpg", "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"};
var p = Pattern.compile("_([\\D_]+)_");
for(var str : strs) {
var m = p.matcher(str);
if(m.find()) {
System.out.println("found: "+m.group(1));
}
}
}
Output:
found: fotocontargasx
found: fotoAssicurazioneCartaceo
found: fotoesterna_ant
found: foto_TargaMid

Pattern: (?<=_).+(?=(_\d+){2}\.)
final String s = "Doc20191001119049_fotocontargasx_3962122_943000.jpg\n"
+ "\n"
+ "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg\n"
+ "\n"
+ "Doc201910011214020_fotoesterna_ant_396024_947112.jpg\n"
+ "\n"
+ "Doc201710071149010_foto_TargaMid_4007396_95010.jpg";
Pattern pattern = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");
Matcher matcher = pattern.matcher(s);
List<String> allMatches = new ArrayList<>();
while (matcher.find()) {
allMatches.add(matcher.group());
}
Output: [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]

Related

Regular Expression in Java. Splitting a string using pattern and matcher

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a regular expression with Pattern and Matcher. The string needs to be split on /, but a '/' (a slash surrounded by single quotes) must be skipped. For example, "One/Two/Three'/'3/Four" needs to be split into ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always says "NO MATCH". What am I doing wrong, and how can I fix it?
Thanks in advance.
To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
Regex demo | Java demo
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If you do not want to split, you might also use a pattern that does not match a / unless it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
Regex demo
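As for why the code in the question always prints "NO MATCH": Matcher.matches() succeeds only when the whole input matches the pattern, whereas the separator pattern matches a single /. Matcher.find() scans for the next occurrence instead. A small sketch of the difference (the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Separator pattern from the question.
    private static final Pattern SLASH = Pattern.compile("(?<!')/|/(?!')");

    // matches() is true only if the ENTIRE input is one match.
    static boolean wholeInputMatches(String line) {
        return SLASH.matcher(line).matches();
    }

    // find() moves from one match to the next inside the input.
    static int countSeparators(String line) {
        Matcher m = SLASH.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("Test1/Test2/Tt")); // false
        System.out.println(wholeInputMatches("/"));               // true
        System.out.println(countSeparators("Test1/Test2/Tt"));    // 2
    }
}
```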
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four
Here is a simple pattern matching all the desired /, so you can split on them:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows: we have 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You want to exclude only the first case, so we need a regex for the other three cases; I have written three similar regexes and joined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is different from ' (negated character class [^'])
\/ - match / literally
(?=') - positive lookahead, asserts that what follows is '
Demo with some more edge cases
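Because the question rules out .split, the same separator pattern can also drive a Matcher loop that cuts the input at each match. The following is a sketch under that assumption; UnquotedSlashSplitter and splitFields are illustrative names, and the \/ escapes are dropped because / needs no escaping in a Java regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnquotedSlashSplitter {
    // The three-case separator pattern, written without the optional "\/" escapes.
    private static final Pattern SEPARATOR = Pattern.compile(
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])");

    // Collects the text between consecutive separator matches.
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(line);
        int start = 0;
        while (m.find()) {
            fields.add(line.substring(start, m.start()));
            start = m.end();
        }
        fields.add(line.substring(start)); // text after the last separator
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitFields("One/Two/Three'/'3/Four"));
        // prints [One, Two, Three'/'3, Four]
    }
}
```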
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four

Java reg expression capture string

I have the following string:
"(1)name1:content1(2)name2:content2(3)name3:content3...(n)namen:contentn"
What I want to do is capture each name_i and content_i. How can I do this? I should mention that name_i is unknown. For example, name1 could be "abc" and name2 could be "xyz".
What I have tried:
String regex = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
But the result is not very good. I also tried matcher.matches(), but nothing is returned.
You may use
String s = "(1)name1:content1(2)name2:content2(3)name3:content3...(4)namen:content4";
Pattern pattern = Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
See the Java demo
Details
\\(\\d+\\) - matches (x) substring where x is 1 or more digits
([^:]+) - Group 1: one or more chars other than :
: - a colon
([^(]*(?:\\((?!\\d+\\))[^(]*)*) - Group 2:
[^(]* - zero or more chars other than (
(?:\\((?!\\d+\\))[^(]*)* - zero or more sequences of:
\\((?!\\d+\\)) - a ( that is not followed with 1+ digits and )
[^(]* - 0+ chars other than (
See the regex demo.
This will work if your names and contents consist only of word characters (letters, digits and underscores, i.e. \w):
public static void test(String input){
String regexpp = "\\(\\d+\\)(\\w+):(\\w+)";
Pattern p = Pattern.compile(regexpp);
Matcher m = p.matcher(input);
while(m.find()){
System.out.println("Name: " + m.group(1));
System.out.println("Content: " + m.group(2));
}
}
Output:
Name: name1
Content: content1
Name: name2
Content: content2
Name: name3
Content: content3
Name: name99
Content: content99
Your expression matches greedily: the first group (.*) consumes as much as it can, including earlier colons, so the groups capture far more than a single name and content. You can use non-greedy matching (a question mark after the quantifier, as in .*?) so that each group stops as early as possible.
String regex = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";
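One caveat with the lazy pattern is that its lookahead requires another "(n)" after the content, so the last entry of the string is not matched. A possible variation, offered as a sketch rather than as the answerer's method, lets the lookahead also accept the end of input and uses \d+ so that multi-digit indexes work. The class name NameContentParser and the Map return type are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameContentParser {
    // Lazy groups; the content ends right before the next "(n)" or at end of input.
    private static final Pattern ENTRY =
            Pattern.compile("\\(\\d+\\)(.*?):(.*?)(?=\\(\\d+\\)|$)");

    // Returns name -> content pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> entries = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            entries.put(m.group(1), m.group(2));
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1)abc:content1(2)xyz:content2(3)name3:content3"));
        // prints {abc=content1, xyz=content2, name3=content3}
    }
}
```

As with the original lazy pattern, a content value that itself contains something like "(5)" would be cut short at that point.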

Java Find Substring Inbetween Characters

I am very stuck. I use this format to read a player's name in a string, like so:
"[PLAYER_yourname]"
I have tried for a few hours and can't figure out how to read only the part after the '_' and before the ']' to get their name.
Could I have some help? I played around with sub strings, splitting, some regex and no luck. Thanks! :)
BTW: This question is different, if I split by _ I don't know how to stop at the second bracket, as I have other string lines past the second bracket. Thanks!
You can do:
String s = "[PLAYER_yourname]";
String name = s.substring(s.indexOf("_") + 1, s.lastIndexOf("]"));
You can use a substring. int x = str.indexOf('_') gives you the index where the '_' is found and int y = str.lastIndexOf(']') gives you the index where the last ']' is found. Then str.substring(x + 1, y) gives you the text after the underscore up to, but not including, the closing bracket.
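Since the question mentions that more text follows the closing bracket, it may be safer to look for the first ']' after the underscore rather than the last one in the line. A minimal sketch under that assumption (PlayerName and extractName are hypothetical names, and returning Optional is a design choice made here for the missing-delimiter case):

```java
import java.util.Optional;

class PlayerName {
    // Returns the text between the first '_' and the first ']' that follows it,
    // or an empty Optional if either delimiter is missing.
    static Optional<String> extractName(String line) {
        int underscore = line.indexOf('_');
        if (underscore < 0) {
            return Optional.empty();
        }
        int close = line.indexOf(']', underscore + 1);
        if (close < 0) {
            return Optional.empty();
        }
        return Optional.of(line.substring(underscore + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(extractName("[PLAYER_yourname] joined [lobby]").orElse("<none>"));
        // prints yourname
    }
}
```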
Using the regex matcher functions you could do:
String s = "[PLAYER_yourname]";
String p = "\\[[A-Z]+_(.+)\\]";
Pattern r = Pattern.compile(p);
Matcher m = r.matcher(s);
if (m.find( ))
System.out.println(m.group(1));
Result:
yourname
Explanation:
\[ matches the character [ literally
[A-Z]+ matches an uppercase letter from A to Z (case sensitive), between one and unlimited times
_ matches the character _ literally
1st Capturing group (.+) matches any character (except newline), between one and unlimited times
\] matches the character ] literally
This solution uses Java regex
String player = "[PLAYER_yourname]";
Pattern PLAYER_PATTERN = Pattern.compile("^\\[PLAYER_(.*?)]$");
Matcher matcher = PLAYER_PATTERN.matcher(player);
if (matcher.matches()) {
System.out.println( matcher.group(1) );
}
// prints yourname
see DEMO
You can do like this -
public static void main(String[] args) throws InterruptedException {
String s = "[PLAYER_yourname]";
System.out.println(s.split("[_\\]]")[1]);
}
output: yourname
Try:
Pattern pattern = Pattern.compile(".*?_([^\\]]+)");
Matcher m = pattern.matcher("[PLAYER_yourname]");
// find() rather than matches(): the pattern does not consume the closing ]
if (m.find()) {
String name = m.group(1);
// name = "yourname"
}

regex to remove round brackets from a string

I have a string
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried following code
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern is not matching..
Please Help
Thanks
Use
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity
You may use
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
See a Java demo.
Details
The "\\[{2}(.*)\\|(.*)]]" pattern used with matches() behaves like ^\[{2}(.*)\|(.*)]]\z: it matches a string that starts with [[, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 1, then matches a |, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and then matches ]]. See the regex demo.
The contents of Group 2 can be trimmed of whitespace and used as is, but Group 1 should be preprocessed by replacing all chunks of 1+ non-word characters with a space (.replaceAll("\\W+", " ")), then trimming the result (.trim()) and replacing all spaces with _ (.replace(" ", "_")) as the final touch.

Punctuation Regex in Java

First, I have read the following documentation:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
I want to find any punctuation character EXCEPT #',& but I don't quite understand how.
Here is my code:
public static void main( String[] args )
{
// String to be scanned to find the pattern.
String value = "#`~!#$%^";
String pattern = "\\p{Punct}[^#',&]";
// Create a Pattern object
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
// Now create matcher object.
Matcher m = r.matcher(value);
if (m.find()) {
System.out.println("Found value: " + m.groupCount());
} else {
System.out.println("NO MATCH");
}
}
Result is NO MATCH.
Is there any mismatch ?
Thanks
MRizq
You're matching two characters, not one. Using a (negative) lookahead should solve the task:
(?![#',&])\\p{Punct}
You may use character subtraction here:
String pat = "[\\p{Punct}&&[^#',&]]";
The whole pattern represents a character class, [...], that contains a \p{Punct} POSIX character class, the && intersection operator and [^...] negated character class.
A Unicode modifier might be necessary if you plan to also match all Unicode punctuation:
String pat = "(?U)[\\p{Punct}&&[^#',&]]";
              ^^^^
The pattern matches any punctuation (with \p{Punct}) except #, ', a comma, and &.
If you need to exclude more characters, add them to the negated character class. Just remember to always escape -, \, ^, [ and ] inside a Java regex character class/set. E.g. adding a backslash and - might look like "[\\p{Punct}&&[^#',&\\\\-]]" or "[\\p{Punct}&&[^#',&\\-\\\\]]".
Java demo:
String value = "#`~!#$%^,";
String pattern = "(?U)[\\p{Punct}&&[^#',&]]";
Pattern r = Pattern.compile(pattern); // Create a Pattern object
Matcher m = r.matcher(value); // Now create matcher object.
while (m.find()) {
System.out.println("Found value: " + m.group());
}
Output:
Found value: !
Found value: %
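The (?U) flag changes what \p{Punct} means: without it, \p{Punct} is the ASCII punctuation set; with it, \p{Punct} follows the Unicode punctuation category, in which characters such as `, ~, $ and ^ count as symbols rather than punctuation. The sketch below compares both variants and the lookahead version on the same input; PunctDemo and collectMatches are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctDemo {
    // Concatenates every match of regex found in input.
    static String collectMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder found = new StringBuilder();
        while (m.find()) {
            found.append(m.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String value = "#`~!#$%^,";
        // ASCII \p{Punct} minus # ' , &
        System.out.println(collectMatches("[\\p{Punct}&&[^#',&]]", value));     // `~!$%^
        // Unicode punctuation category minus # ' , &
        System.out.println(collectMatches("(?U)[\\p{Punct}&&[^#',&]]", value)); // !%
        // Negative lookahead variant, ASCII \p{Punct}
        System.out.println(collectMatches("(?![#',&])\\p{Punct}", value));      // `~!$%^
    }
}
```

If the goal is to catch ASCII characters such as ` or ^, the variant without (?U) is probably the one to use.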
