Regex that allows only single separators between words - java

I need to construct a regular expression such that it should not allow / at the start or end, and there should not be more than one / in sequence.
Valid Expression: AB/CD
Valid Expression: AB
Invalid Expression: //AB//CD//
Invalid Expression: ///////
Invalid Expression: AB////////
The / character is just a separator between two words. Its length should not be more than one between words.

Assuming you only want to allow alphanumerics (including underscore) between slashes, it's pretty trivial:
boolean foundMatch = subject.matches("\\w+(?:/\\w+)*");
Explanation:
\w+ # Match one or more alnum characters
(?: # Start a non-capturing group
/ # Match a single slash
\w+ # Match one or more alnum characters
)* # Match that group any number of times
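As a quick check, the pattern can be run against the examples from the question. This is only a sketch: the class name SlashWords and the helper isValid are made up for illustration, and it assumes that a "word" consists of \w characters.

```java
import java.util.regex.Pattern;

class SlashWords {
    // Hypothetical helper pattern: words of \w characters separated by single slashes.
    private static final Pattern WORDS = Pattern.compile("\\w+(?:/\\w+)*");

    static boolean isValid(String subject) {
        // matches() has to consume the whole input, so no ^ or $ is needed.
        return WORDS.matcher(subject).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AB/CD", "AB", "//AB//CD//", "///////", "AB////////"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

Because the group is repeated with *, a single word with no slash at all is accepted too.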

This regex does it:
^(?!/)(?!.*//).*[^/]$
So in java:
if (str.matches("(?!/)(?!.*//).*[^/]"))
Note that ^ and $ are implied by matches(), because matches must match the whole string to be true.
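To illustrate the difference the anchors make, here is a small sketch comparing matches() with an unanchored and an anchored find(). The class and method names are invented for this example.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final String REGEX = "(?!/)(?!.*//).*[^/]";

    // String.matches() only succeeds if the whole input matches.
    static boolean wholeMatch(String s) {
        return s.matches(REGEX);
    }

    // find() without anchors is satisfied by any matching substring.
    static boolean partialMatch(String s) {
        return Pattern.compile(REGEX).matcher(s).find();
    }

    // find() with explicit ^ and $ behaves like matches() for single-line input.
    static boolean anchoredFind(String s) {
        return Pattern.compile("^" + REGEX + "$").matcher(s).find();
    }

    public static void main(String[] args) {
        String input = "AB/"; // trailing slash, so it should be rejected
        System.out.println("matches():       " + wholeMatch(input));
        System.out.println("find():          " + partialMatch(input));
        System.out.println("find() with ^/$: " + anchoredFind(input));
    }
}
```

So if you switch from matches() to find(), keep the ^ and $ in the pattern.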

[a-zA-Z]+(/[a-zA-Z]+)+
It matches
a/b
a/b/c
aa/vv/cc
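It doesn't match the following. Note that the final + requires at least one slash, so a single word is rejected even though the question treats AB as valid; replace that + with * to accept it: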
a
/a/b
a//b
a/b/
Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Reg {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("[a-zA-Z]+(/[a-zA-Z]+)+");
Matcher matcher = pattern.matcher("a/b/c");
System.out.println(matcher.matches());
}
}

Related

Java regex repeating capture groups

Considering the following string: "${test.one}${test.two}" I would like my regex to return two matches, namely "test.one" and "test.two". To do that I have the following snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final Pattern pattern = Pattern.compile("\\$\\{((?:(?:[A-z]+(?:\\.[A-z0-9()\\[\\]\"]+)*)+|(?:\"[\\w/?.&=_\\-]*\")+)+)}+$");
public static void main(String[] args) {
String testString = "${test.one}${test.two}";
Matcher matcher = pattern.matcher(testString);
while (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
I have some other stuff in there as well, because I want this to also be a valid match ${test.one}${"hello"}.
So, basically, I just want it to match on anything inside of ${} as long as it either follows the format: something.somethingelse (alphanumeric only there) or something.somethingElse() or "something inside of quotations" (alphanumeric plus some other characters). I have the main regex working, or so I think, but when I run the code, it finds two groups,
${test.two}
test.two
I want the output to be
test.one
test.two
Basically, your regex's main problem is that it matches only at the end of the string, and [A-z] matches many more chars than just letters. Your grouping also seems off.
If you load your regex at regex101, you will see it matches
\$\{
( - start of a capturing group
(?: - start of a non-capturing group
(?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)
(?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
)+ - repeat the non-capturing group 1 or more times
| - or
(?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
)+ - repeat the group pattern 1+ times
) - end of the capturing group
}+ - 1+ } chars
$ - end of string.
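The point about [A-z] is easy to verify: in ASCII, the characters [, \, ], ^, _ and the backtick sit between Z and a, so the range includes them. A minimal sketch (class and method names are made up here):

```java
class AsciiRangeDemo {
    // True when the single character c falls inside the range [A-z].
    static boolean inWideRange(char c) {
        return String.valueOf(c).matches("[A-z]");
    }

    // True only for ASCII letters.
    static boolean isLetter(char c) {
        return String.valueOf(c).matches("[A-Za-z]");
    }

    public static void main(String[] args) {
        // The six ASCII characters between 'Z' and 'a'.
        for (char c : "[\\]^_`".toCharArray()) {
            System.out.println(c + "  [A-z]: " + inWideRange(c) + "  [A-Za-z]: " + isLetter(c));
        }
    }
}
```

If only letters are intended, [A-Za-z] is the class to use.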
To match any occurrence of your pattern inside a string, you need to use
\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}
See the regex demo, get Group 1 value after a match is found. Details:
\$\{ - a ${ substring
(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:
\"[^\"]*\" - ", 0+ chars other than " and then a "
| - or
\w+(?:\(\))? - 1+ word chars and an optional () substring
(?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
} - a } char.
See the Java demo:
String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
Output:
test.one
test.two
test.one
test.two()
test.one
"hello"
You could use the regular expression
(?<=\$\{")[a-z]+(?="\})|(?<=\$\{)[a-z]+\.[a-z]+(?:\(\))?(?=\})
which has no capture groups. The character classes [a-z] can be modified as required, provided they do not include a double-quote, period or right brace.
Java's regex engine performs the following operations.
(?<=\$\{") # match '${"' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
(?="\}) # match '"}' in a positive lookahead
| # or
(?<=\$\{) # match '${' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
\.[a-z]+ # match '.' followed by 1+ lowercase letters
(?:\(\))? # optionally match `()`
(?=\}) # match '}' in a positive lookahead
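In Java this could be used roughly as follows. The class name and the extract helper are assumptions made for the sketch; since the pattern has no capture groups, the whole match (group()) is the value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile(
            "(?<=\\$\\{\")[a-z]+(?=\"\\})"                          // ${"word"}
            + "|(?<=\\$\\{)[a-z]+\\.[a-z]+(?:\\(\\))?(?=\\})");     // ${a.b} or ${a.b()}

    // Hypothetical helper: collects every match of the pattern in order.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group()); // whole match, the lookarounds consume nothing
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("${test.one}${test.two}"));
        System.out.println(extract("${test.one}${test.two()}"));
        System.out.println(extract("${test.one}${\"hello\"}"));
    }
}
```

Note that with this variant the quoted value comes back without its surrounding quotes, because they are matched inside the lookarounds.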

Java regex (java.util.regex). Search for dollar sign

I have a search string.
When it contains a dollar symbol, I want to capture all characters thereafter, but not include the dot or a subsequent dollar symbol. The latter would constitute a subsequent match.
So for either of these search strings:
"/bla/$V_N.$XYZ.bla";
"/bla/$V_N.$XYZ";
I would want to return:
V_N
XYZ
If the search string contains percent symbols, I also want to return what's between the pair of % symbols.
The following regex seems to do the trick for that.
"%([^%]*?)%";
Inferring:
Start and end with a %,
Have a capture group - the ()
have a character class containing anything except a % symbol (the caret negates the class)
repeated - but not greedily *?
Where some languages allow %1, %2 for capture groups, Java uses backslash-number syntax (\1, \2) instead. So, this string compiles and generates output.
I suspect the dollar symbol and dot need escaping, as they are special symbols:
$ is usually end of string
. is a meta sequence for any character.
I have tried using double backslashes (\\),
both in character classes, e.g. [^\\.\\$%],
and using OR'd notation, e.g. %|\\$,
in attempts to combine this logic and can't seem to get anything to play ball.
I wonder if another pair of eyes can see how to solve this conundrum!
My attempts so far:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String[] args) {
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "([%\\$])([^%\\.\\$]*?)\\1?";
/* Either % or $ in first capture group ([%\\$])
* Second capture group - anything except %, dot or dollar sign
* non greedy group ( *?)
* then a backreference to an optional first capture group \\1?
* Have to use two \, since you escape \ in a Java string.
*/
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(search);
List<String> results = new ArrayList<String>();
while (m.find())
{
for (int i = 0; i<= m.groupCount(); i++) {
results.add(m.group(i));
}
}
for (String result : results) {
System.out.println(result);
}
}
}
You may use
String search = "/bla/$V_N.$XYZ.bla";
String pattern = "[%$]([^%.$]*)";
Matcher matcher = Pattern.compile(pattern).matcher(search);
while (matcher.find()){
System.out.println(matcher.group(1));
} // => V_N, XYZ
See the Java demo and the regex demo.
NOTE
You do not need the optional \1? at the end of the pattern. As it is optional, it does not restrict the match context and is redundant (the negated character class cannot match $ or % anyway).
[%$]([^%.$]*) matches % or $, then captures into Group 1 zero or more chars other than %, . and $. You only need the Group 1 value, hence matcher.group(1) is used.
In a character class, neither . nor $ is special, so they do not need escaping in [^%.$] or [%$].
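The last point can be checked with a tiny sketch (the class and method names are hypothetical): inside brackets, . and $ only match themselves, so unescaped they neither act as a wildcard nor as an end anchor.

```java
class CharClassDemo {
    // Replaces every literal '.' or '$' with '-'; nothing is escaped inside the brackets.
    static String mask(String input) {
        return input.replaceAll("[.$]", "-");
    }

    public static void main(String[] args) {
        System.out.println(mask("/bla/$V_N.$XYZ.bla"));
        System.out.println(mask("abc")); // unchanged: '.' is not a wildcard here
    }
}
```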

Regular Expression - Starting with and ending with string

I would like to write a regular expression to match files that start with "AMDF" or "SB700" and do not end with ".tmp". This will be used in Java.
Code
See regex in use here
^(?:AMDF|SB700).*\.(?!tmp$)[^.]+$
Usage
See code in use here
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
final String regex = "^(?:AMDF|SB700).*\\.(?!tmp$)[^.]+$";
final String[] files = {
"AMDF123978sudjfadfs.ext",
"SB700afddasjfkadsfs.ext",
"AMDE41312312089fsas.ext",
"SB701fs98dfjasdjfsd.ext",
"AMDF123120381203113.tmp"
};
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
for (String file:files) {
final Matcher matcher = pattern.matcher(file);
if(matcher.matches()) {
System.out.println(matcher.group(0));
}
}
}
}
Results
Input
AMDF123978sudjfadfs.ext
SB700afddasjfkadsfs.ext
AMDE41312312089fsas.ext
SB701fs98dfjasdjfsd.ext
AMDF123120381203113.tmp
Output
Below shows only matches.
AMDF123978sudjfadfs.ext
SB700afddasjfkadsfs.ext
Explanation
^ Assert position at the start of the line
(?:AMDF|SB700) Match either AMDF or SB700 literally
.* Match any character any number of times
\. Match a literal dot . character
(?!tmp$) Negative lookahead ensuring what follows doesn't match tmp literally (asserting the end of the line afterwards so as not to match .tmpx where x can be anything)
[^.]+ Match any character except . one or more times
$ Assert position at the end of the line
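A few edge cases are worth trying with this pattern. The sketch below uses invented file names and a hypothetical accept helper; it shows that .tmpx is accepted (the lookahead only rejects an exact tmp extension) and that a name without any dot is rejected, because the pattern requires an extension.

```java
import java.util.regex.Pattern;

class FileNameFilter {
    private static final Pattern NAME =
            Pattern.compile("^(?:AMDF|SB700).*\\.(?!tmp$)[^.]+$");

    // Hypothetical helper: true when the file name passes the filter.
    static boolean accept(String fileName) {
        return NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "AMDF123978sudjfadfs.ext", // accepted
            "AMDF123120381203113.tmp", // rejected: ends with .tmp
            "AMDF1.tmpx",              // accepted: extension is tmpx, not tmp
            "AMDFnoextension",         // rejected: the pattern needs a dot
            "SB701x.ext"               // rejected: wrong prefix
        };
        for (String name : names) {
            System.out.println(name + " -> " + accept(name));
        }
    }
}
```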
Here is another example that works:
^(SB700|AMDF).*(?!\.tmp).{4}$
An approach could be to try a regex using a negative lookahead to assert that the file name does not end with .tmp and use an anchor ^ to make sure that the file name starts with AMDF or SB700 like:
^(?!.*\.tmp$)(?:AMDF|SB700)\w*\.\w+$
Explanation
The beginning of the string ^
A negative lookahead (?!
Whose body matches a string ending in .tmp, so such names are rejected .*\.tmp$
Close the lookahead )
A non capturing group which matches AMDF or SB700 (?:AMDF|SB700)
Match a word character zero or more times \w*
Match a dot \.
Match a word character one or more times \w+
The end of the string $
In Java it would look like:
^(?!.*\\.tmp$)(?:AMDF|SB700)\\w*\\.\\w+$
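One thing to be aware of with this variant: because the name part is \w* and the extension is \w+, only a single dot and only word characters are allowed. A small sketch with made-up file names and a hypothetical accept helper:

```java
import java.util.regex.Pattern;

class SingleDotFilter {
    private static final Pattern NAME =
            Pattern.compile("^(?!.*\\.tmp$)(?:AMDF|SB700)\\w*\\.\\w+$");

    // Hypothetical helper: true when the file name passes the filter.
    static boolean accept(String fileName) {
        return NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "AMDF123.ext",        // accepted
            "AMDF123.tmp",        // rejected by the lookahead
            "AMDF_report.v2.txt", // rejected: more than one dot
            "AMDF-1.ext"          // rejected: '-' is not a word character
        };
        for (String name : names) {
            System.out.println(name + " -> " + accept(name));
        }
    }
}
```

If file names may contain several dots or hyphens, the \w parts would need to be loosened accordingly.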

How to match key/value groups with regular expressions

Provided the following string:
#NAMEONE=any#character#OTHERNAME=any # character#THIRDNAME=even new lines
are possible
How can we match the full name/value pairs like #NAMEONE=any#character?
I am stuck with this regex (#(?:NAMEONE|OTHERNAME|THIRDNAME)=.+?)+ as it only matches #NAMEONE=a, #OTHERNAME=a etc. Using Java.
This matches any character other than #, and also matches a # when the characters on both sides of it are non-word characters (so it keeps the # in any # character, but stops at the # in any#character):
"#(?:NAMEONE|OTHERNAME|THIRDNAME)=(?:\\B#\\B|[^#])*"
or
"(?s)#(?:NAMEONE|OTHERNAME|THIRDNAME)=.*?(?=#(?:NAMEONE|OTHERNAME|THIRDNAME)=|$)"
Here is a slightly shorter version that relies on the variable names being uppercase:
(#[A-Z]+=.+?)(?=#[A-Z]+=|$)
Explanation:
#[A-Z]+= matches the variable name and the = sign
.+? lazily matches any character
(?=#[A-Z]+=|$) positive look-ahead for variable name or end of string
Java code:
public static void test()
{
String str = "#NAMEONE=any # character#OTHERNAME=any # character#THIRDNAME=even";
Matcher matcher = Pattern.compile("(#[A-Z]+=.+?)(?=#[A-Z]+=|$)").matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
}
prints
#NAMEONE=any # character
#OTHERNAME=any # character
#THIRDNAME=even
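Since the question says values may contain new lines, the same pattern probably needs the DOTALL flag, otherwise . stops at the line break and the last pair is lost. A sketch under that assumption (class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValuePairs {
    // Same pattern as above, compiled with DOTALL so that . also matches line breaks.
    private static final Pattern PAIR =
            Pattern.compile("(#[A-Z]+=.+?)(?=#[A-Z]+=|$)", Pattern.DOTALL);

    // Hypothetical helper: returns each #NAME=value pair found in the input.
    static List<String> split(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            pairs.add(matcher.group(1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "#NAMEONE=any#character#OTHERNAME=any # character"
                + "#THIRDNAME=even new lines\nare possible";
        for (String pair : split(input)) {
            System.out.println("[" + pair + "]");
        }
    }
}
```

The lookahead only stops at a # that is followed by an uppercase name and =, which is why the # inside any#character stays part of the value.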

Regex to extract strings within delimiter

I am trying to extract string occurrences within delimiters (parentheses in this case) but not the ones which are within quotes (single or double). Here is what I have tried - this regex fetches all occurrences within parentheses, including the ones which are within quotes (I don't want the ones within quotes)
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMain {
static final String PATTERN = "\\(([^)]+)\\)";
static final Pattern CONTENT = Pattern.compile(PATTERN);
/**
* @param args
*/
public static void main(String[] args) {
String testString = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
Matcher match = CONTENT.matcher(testString);
while(match.find()) {
System.out.println(match.group()); // prints (Jack), (Jill) and (Peter's)
}
}
}
You could try
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMain {
static final String PATTERN = "\\(([^)]+)\\)|\"[^\"]*\"";
static final Pattern CONTENT = Pattern.compile(PATTERN);
/**
* @param args
*/
public static void main(String[] args) {
String testString = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
Matcher match = CONTENT.matcher(testString);
while(match.find()) {
if(match.group(1) != null) {
System.out.println(match.group(1)); // prints Jack, Jill
}
}
}
}
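This pattern matches quoted strings as well as parenthesized ones, but only the parenthesized ones put something in group(1). Because the regex engine scans from left to right, it reaches the opening quote of "(Peter's)" before the parenthesis inside it, so the quoted alternative consumes the whole quoted string and (Peter's) is never matched on its own.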
This is a case where you can make elegant use of look-behind and look-ahead operators to achieve what you want. Here is a solution in Python (I always use it for trying out stuff quickly on the command line), but the regular expression should be the same in Java code.
This regex matches content that is preceded by an opening parenthesis using positive look-behind and succeeded by a closing parenthesis using positive look-ahead. But it avoids these matches when the opening parenthesis is preceded by a single or double quote using negative look-behind and when the closing parenthesis is succeeded by a single or double quote using negative look-ahead.
In [1]: import re
In [2]: s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request."
In [3]: re.findall(r"""
...: (?<= # start of positive look-behind
...: (?<! # start of negative look-behind
...: [\"\'] # avoids matching opening parenthesis preceded by single or double quote
...: ) # end of negative look-behind
...: \( # matches opening parenthesis
...: ) # end of positive look-behind
...: \w+ (?: \'\w* )? # matches whatever your content looks like (configure this yourself)
...: (?= # start of positive look-ahead
...: \) # matches closing parenthesis
...: (?! # start of negative look-ahead
...: [\"\'] # avoids matching closing parenthesis succeeded by single or double quote
...: ) # end of negative look-ahead
...: ) # end of positive look-ahead
...: """,
...: s,
...: flags=re.X)
Out[3]: ['Jack', 'Jill']
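For reference, a rough Java port of the same idea might look like this. It is a sketch, not a tested drop-in: the class and method names are made up, and the two look-behinds are written side by side rather than nested, which should be equivalent here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenLookaround {
    private static final Pattern CONTENT = Pattern.compile(
            "(?<=\\()(?<![\"']\\()"     // preceded by "(" that does not follow a quote
            + "\\w+(?:'\\w*)?"          // the content itself (adjust as needed)
            + "(?=\\)(?![\"']))");      // followed by ")" that is not followed by a quote

    // Hypothetical helper: returns the unquoted parenthesized words.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = CONTENT.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        System.out.println(extract(s));
    }
}
```

As with the Python version, this only looks at the characters directly next to the parentheses, so a quote further away from them is not detected.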
Note: This is not a complete answer, because I'm not familiar with Java, but I believe it can be converted to Java.
The easiest approach, as far as I'm concerned, is to replace the quoted parts in the string with an empty string, then look for the matches. Hoping you're somewhat familiar with PHP, here's the idea.
$str = "Rhyme (Jack) and (Jill) went up the hill on \" (Peter's)\" request.";
preg_match_all(
$pat = '~(?<=\().*?(?=\))~',
// anything inside parentheses
preg_replace('~([\'"]).*?\1~','',$str),
// this replaces quoted strings with ''
$matches
// and assigns the result into this variable
);
print_r($matches[0]);
// $matches[0] returns the matches in preg_match_all
// [0] => Jack
// [1] => Jill
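The same two-step idea could be sketched in Java along these lines (class and method names are assumptions for the example): first strip the quoted parts with replaceAll, then collect what is left inside parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripQuotesThenMatch {
    // Anything between "(" and the next ")".
    private static final Pattern INSIDE_PARENS = Pattern.compile("(?<=\\().*?(?=\\))");

    // Hypothetical helper: removes quoted strings, then extracts parenthesized content.
    static List<String> extract(String input) {
        // A quote character, then as little as possible up to the same quote character.
        String unquoted = input.replaceAll("(['\"]).*?\\1", "");
        List<String> found = new ArrayList<>();
        Matcher matcher = INSIDE_PARENS.matcher(unquoted);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Rhyme (Jack) and (Jill) went up the hill on \"(Peter's)\" request.";
        System.out.println(extract(s));
    }
}
```

One caveat: a stray apostrophe outside of quotes (for example in a word like it's) would be treated as an opening quote, so this approach only suits input where quotes are balanced.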
