Find duplicate char sequences in String by regex in Java - java

I have an input string and I want to use regex to check if this string has = and $, e.g:
Input:
name=alice$name=peter$name=angelina
Output: true
Input:
name=alicename=peter$name=angelina
Output: false
My regex doesn't work:
Pattern pattern = Pattern.compile("([a-z]*=[0-9]*$])*");
Matcher matcher = pattern.matcher("name=rob$name=bob");

With .matches(), you may use
Pattern pattern = Pattern.compile("\\p{Lower}+=\\p{Lower}+(?:\\$\\p{Lower}+=\\p{Lower}+)*"); // With `matches()` to ensure whole string match
Details
\p{Lower}+ - 1+ lowercase ASCII letters (use \p{L} to match any Unicode letter and \p{Alpha} to match only ASCII letters)
= - a = char
\p{Lower}+ - 1+ lowercase letters
(?:\$\p{Lower}+=\p{Lower}+)* - 0 or more occurrences of:
\$ - a $ char
\p{Lower}+=\p{Lower}+ - 1+ lowercase letters, = and 1+ lowercase letters.
See the Java demo:
List<String> strs = Arrays.asList("name=alice$name=peter$name=angelina", "name=alicename=peter$name=angelina");
Pattern pattern = Pattern.compile("\\p{Lower}+=\\p{Lower}+(?:\\$\\p{Lower}+=\\p{Lower}+)*");
for (String str : strs)
System.out.println("\"" + str + "\" => " + pattern.matcher(str).matches());
Output:
"name=alice$name=peter$name=angelina" => true
"name=alicename=peter$name=angelina" => false

You have an extra ] and you need to escape $ to match it as a literal character. You also need to match the last parameter, which is not followed by $, so use
([a-z]*=[a-z0-9]*(\$|$))*
• [a-z]*= : match a-z zero or more times, match = character
• [a-z0-9]*(\$|$): match a-z and 0-9, zero or more times, followed by either $ character or end of match.
• ([a-z]*=[a-z0-9]*(\$|$))*: match zero or more occurrences of pairs.
Note: use + (one or more matches) instead of * for strict matching as:
([a-z]+=[a-z0-9]+(\$|$))*
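As a rough sketch of how this second pattern behaves with matches(), the snippet below runs the strict + variant against the inputs from the question. The class and method names are invented for illustration. Because the outer group is still quantified with *, the empty string passes, and a trailing $ is accepted as well; whether that is acceptable depends on your input.

```java
import java.util.regex.Pattern;

public class PairListCheck {
    // Strict variant from the answer above: "+" inside each pair, "*" on the outer group.
    private static final Pattern PAIRS = Pattern.compile("([a-z]+=[a-z0-9]+(\\$|$))*");

    // Hypothetical helper: true when the whole input is a run of key=value pairs,
    // each followed by a literal '$' or by the end of the input.
    public static boolean isPairList(String input) {
        return PAIRS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "name=alice$name=peter$name=angelina",
            "name=alicename=peter$name=angelina",
            "name=alice$", // trailing '$' is accepted
            ""             // zero repetitions also count as a match
        };
        for (String input : inputs) {
            System.out.println("\"" + input + "\" => " + isPairList(input));
        }
    }
}
```

If empty input or a trailing $ must be rejected, the first answer's pattern, which requires one pair and then repeats "$ plus pair", is the safer shape.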

Related

Java regular expression to match the pattern for equals sign between two or more words that has an exclamation mark as a delimiter between the ranges

I'm trying to write a regular expression that matches all the following given pattern examples -
1) Name=John!Age=25!Gender=M
2) Name=John!Name2=Sam!Name3=Josh
3) Name=John!Name2=Sam!Name3=Josh!
Basically, there has to be an equals sign between two words or numbers, followed by an exclamation mark, and then the pattern repeats. The pattern can only end with an exclamation mark, letters, numbers or spaces, but not with the equals sign or any other special character.
So these examples should not be matched -
1) Name!John=Name2+Sam=
2) Name=John=
3) Name=John!!
4) Name=John-
I'm very new to regular expressions and I just learnt a few basic things and I have this following regular expression written so far which doesn't fully satisfy my requirement ((?:\w+|=)*)!
I'm still trying to modify my regular expression to match my requirement, any help/guidance will be very helpful.
You can use
^(?:\w+=\w+!)*\w+=\w+!?$
In Java
String regex = "^(?:\\w+=\\w+!)*\\w+=\\w+!?$";
The pattern matches:
^ Start of string
(?:\w+=\w+!)* Optionally repeat 1+ word chars = 1+ word chars and !
\w+=\w+!? Match 1+ word chars = 1+ word chars and optional !
$ End of string
Regex demo
String[] strings = {
"Name=John!Age=25!Gender=M",
"Name=John!Name2=Sam!Name3=Josh",
"Name=John!Name2=Sam!Name3=Josh!",
"N=J!A=25!",
"a=ba=b",
"Name!John=Name2+Sam=",
"Name=John=",
"Name=John!!",
"Name=John-"
};
for (String s : strings) {
if (s.matches("(?:\\w+=\\w+!)*\\w+=\\w+!?")) {
System.out.println("Match: " + s);
} else {
System.out.println("No match: " + s);
}
}
Output
Match: Name=John!Age=25!Gender=M
Match: Name=John!Name2=Sam!Name3=Josh
Match: Name=John!Name2=Sam!Name3=Josh!
Match: N=J!A=25!
No match: a=ba=b
No match: Name!John=Name2+Sam=
No match: Name=John=
No match: Name=John!!
No match: Name=John-

Java regex repeating capture groups

Considering the following string: "${test.one}${test.two}" I would like my regex to return two matches, namely "test.one" and "test.two". To do that I have the following snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final Pattern pattern = Pattern.compile("\\$\\{((?:(?:[A-z]+(?:\\.[A-z0-9()\\[\\]\"]+)*)+|(?:\"[\\w/?.&=_\\-]*\")+)+)}+$");
public static void main(String[] args) {
String testString = "${test.one}${test.two}";
Matcher matcher = pattern.matcher(testString);
while (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
I have some other stuff in there as well, because I want this to also be a valid match ${test.one}${"hello"}.
So, basically, I just want it to match on anything inside of ${} as long as it either follows the format: something.somethingelse (alphanumeric only there) or something.somethingElse() or "something inside of quotations" (alphanumeric plus some other characters). I have the main regex working, or so I think, but when I run the code, it finds two groups,
${test.two}
test.two
I want the output to be
test.one
test.two
Basically, the main problem with your regex is that it matches only at the end of the string, and that [A-z] matches many more chars than just letters. Your grouping also seems off.
If you load your regex at regex101, you will see it matches
\$\{
( - start of a capturing group
(?: - start of a non-capturing group
(?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)
(?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
)+ - repeat the non-capturing group 1 or more times
| - or
(?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
)+ - repeat the group pattern 1+ times
) - end of the capturing group
}+ - 1+ } chars
$ - end of string.
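The [A-z] point can be checked directly. The following is a minimal sketch (class and method names are made up for illustration) that compares [A-z] with [A-Za-z] on the six ASCII characters that sit between Z and a.

```java
import java.util.regex.Pattern;

public class RangeCheck {
    // [A-z] spans code points 65 ('A') through 122 ('z'), which includes six non-letters.
    private static final Pattern WIDE = Pattern.compile("[A-z]");
    private static final Pattern LETTERS = Pattern.compile("[A-Za-z]");

    public static boolean wideMatches(char c) {
        return WIDE.matcher(String.valueOf(c)).matches();
    }

    public static boolean lettersMatch(char c) {
        return LETTERS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // The first six chars are the non-letters between 'Z' and 'a'.
        for (char c : new char[] {'[', '\\', ']', '^', '_', '`', 'Q', 'q'}) {
            System.out.println(c + " -> [A-z]: " + wideMatches(c)
                + ", [A-Za-z]: " + lettersMatch(c));
        }
    }
}
```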
To match any occurrence of your pattern inside a string, you need to use
\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}
See the regex demo, get Group 1 value after a match is found. Details:
\$\{ - a ${ substring
(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:
\"[^\"]*\" - ", 0+ chars other than " and then a "
| - or
\w+(?:\(\))? - 1+ word chars and an optional () substring
(?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
} - a } char.
See the Java demo:
String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
Output:
test.one
test.two
test.one
test.two()
test.one
"hello"
You could use the regular expression
(?<=\$\{")[a-z]+(?="\})|(?<=\$\{)[a-z]+\.[a-z]+(?:\(\))?(?=\})
which has no capture groups. The character classes [a-z] can be modified as required, provided they do not include a double quote, period or right brace.
Demo
Java's regex engine performs the following operations.
(?<=\$\{") # match '${"' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
(?="\}) # match '"}' in a positive lookahead
| # or
(?<=\$\{) # match '${' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
\.[a-z]+ # match '.' followed by 1+ lowercase letters
(?:\(\))? # optionally match `()`
(?=\}) # match '}' in a positive lookahead
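A possible way to use this lookaround pattern from Java is sketched below; the class name and the extract helper are assumptions made for the example. Since there are no capture groups, each result is read with group(). Note that, unlike the capture-group answer, the quoted form is returned without its surrounding quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    private static final Pattern TOKEN = Pattern.compile(
        "(?<=\\$\\{\")[a-z]+(?=\"\\})"                       // quoted form: ${"hello"}
        + "|(?<=\\$\\{)[a-z]+\\.[a-z]+(?:\\(\\))?(?=\\})");  // dotted form: ${test.one}, ${test.two()}

    // Hypothetical helper: collects every match found in the input, in order.
    public static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            found.add(m.group()); // whole match; the pattern has no capture groups
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [test.one, test.two(), hello]
        System.out.println(extract("${test.one}${test.two()}${\"hello\"}"));
    }
}
```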

Regular expression for finding words that start with # and end with whitespace or a new line

I am looking for a Java regex to find usernames in a text.
Usernames always start with #. No whitespace is acceptable after #. Usernames are combinations of upper- or lowercase letters, digits, ., _, -.
So, the regex should match words that start with # and end with whitespace or newline.
For example, the text hi #anyOne.2 I'm looking for #Name_13
#name14 #_n.a.m.e-15 but not # name16 contains the following matches: anyOne.2, Name_13, name14, _n.a.m.e-15.
I am using
String pattern = "#[^\\s]*(\\w+)";
You may use
\B#(\S+)
See the regex demo.
Details
\B - a non-word boundary, the char that is right before the current location must be a non-word char (or start of string)
# - a # char
(\S+) - Capturing group 1: one or more non-whitespace characters.
See Java sample usage:
String text = "hi #anyOne.2 I'm looking for #Name_13 #name14 #_n.a.m.e-15 but not # name16";
String pattern = "\\B#(\\S+)";
// Java 9+ (Matcher.results() returns a Stream<MatchResult>)
String[] results = Pattern.compile(pattern).matcher(text).results().map(mr -> mr.group(1)).toArray(String[]::new);
System.out.println(Arrays.toString(results)); // => [anyOne.2, Name_13, name14, _n.a.m.e-15]
// Java 8 (needs java.util.ArrayList and java.util.List in addition to java.util.regex.*)
Matcher m = Pattern.compile(pattern).matcher(text);
List<String> strs = new ArrayList<>();
while(m.find()) {
strs.add(m.group(1));
}
System.out.println(strs); // => [anyOne.2, Name_13, name14, _n.a.m.e-15]

java regex minimum character not working

^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
This is the regex that should enforce the following conditions:
should start only with letters or digits,
may contain letters, digits, dot and hyphen,
should not end with a hyphen.
It works for all conditions, except when I try a three-character input such as
vu6
111
aaa
With four or more characters the validation works properly. Did I miss anything?
Why your regex doesn't work:
Breaking it into smaller pieces should help:
^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
[a-zA-Z1-9]: matches exactly one letter or digit from 1 to 9 (0 and _ are not included)
[a-zA-Z1-9_\\.-]{2,64}: matches 2 to 64 characters, each a letter, a digit from 1 to 9, _, . or -
[^\\.-]: expects exactly one more character, which must not be . or -
Together these pieces require at least 1 + 2 + 1 = 4 characters, which is why three-character inputs such as vu6 are rejected.
Solution:
You can use two simple regexes:
This answer assumes that the length of the string you want to match lies between [3-65] (both inclusive)
First, one that validates the characters and the length of the string
[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}
Second, one that checks that the string doesn't end with . or -
[^\\.-]$
In Java
Pattern pattern1 = Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");
Pattern pattern2 = Pattern.compile("[^\\.-]$");
Matcher m1 = pattern1.matcher(input);
Matcher m2 = pattern2.matcher(input);
if(m1.find() && m2.find()) {
System.out.println("found");
}
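To make the length issue concrete, here is a small sketch that runs the question's single pattern and the two-pattern check side by side on short inputs. The class name and the originalAccepts / isValid helpers are hypothetical, introduced only for this example.

```java
import java.util.regex.Pattern;

public class UsernameLengthCheck {
    // Pattern from the question: 1 + at least 2 + 1 characters, so 4 is the minimum length.
    private static final Pattern ORIGINAL =
        Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$");
    // The two patterns from the answer above.
    private static final Pattern SHAPE =
        Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");
    private static final Pattern LAST_CHAR = Pattern.compile("[^\\.-]$");

    public static boolean originalAccepts(String input) {
        return ORIGINAL.matcher(input).find();
    }

    // Valid when the overall shape is right AND the last char is neither '.' nor '-'.
    public static boolean isValid(String input) {
        return SHAPE.matcher(input).find() && LAST_CHAR.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"vu6", "111", "aaa", "vu6a", "abc-", "ab"}) {
            System.out.println(s + " -> original: " + originalAccepts(s)
                + ", two-step: " + isValid(s));
        }
    }
}
```

With this setup the three-character inputs from the question pass the two-step check, while the original pattern only starts accepting at four characters.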

Regex java a regular expression to extract string except the last number

How can I extract all characters from a string except the last number (if it exists) in Java? I found how to extract the last number in a string using the regex [0-9.]+$, but I want the opposite.
Examples :
abd_12df1231 => abd_12df
abcd => abcd
abcd12a => abcd12a
abcd12a1 => abcd12a
What you might do is match from the start of the string ^ one or more word characters \w+ followed by not a digit using \D
^\w+\D
As suggested in the comments, you could expand the characters you want to match using a character class ^[\w-]+\D or if you want to match any character you could use a dot ^.+\D
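A minimal sketch of this matching approach in Java, using the question's examples; the class and helper names are invented for illustration. Note that ^\w+\D needs at least two characters and a non-digit somewhere after the first character, so an all-digit input (or a single character) produces no match, which the helper reports as null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimTrailingDigits {
    private static final Pattern PREFIX = Pattern.compile("^\\w+\\D");

    // Hypothetical helper: returns the matched prefix, or null when nothing matches.
    public static String prefixWithoutLastNumber(String input) {
        Matcher m = PREFIX.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abd_12df1231", "abcd", "abcd12a", "abcd12a1", "123"}) {
            System.out.println(s + " => " + prefixWithoutLastNumber(s));
        }
    }
}
```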
If you want to remove one or more digits at the end of the string, you may use
s = s.replaceFirst("[0-9]+$", "");
See the regex demo
To also remove floats, use
s = s.replaceFirst("[0-9]*\\.?[0-9]+$", "");
See another regex demo
Details (for the matching pattern (?s)^(.*?)\d*\.?\d+$ given further below)
(?s) - a Pattern.DOTALL inline modifier (makes . match line break chars, too)
^ - start of string
(.*?) - Capturing group #1: any 0+ chars, as few as possible
\d*\.?\d+ - an integer or float value
$ - end of string.
Java demo:
List<String> strs = Arrays.asList("abd_12df1231", "abcd", "abcd12a", "abcd12a1", "abcd12a1.34567");
for (String str : strs)
System.out.println(str + " => \"" + str.replaceFirst("[0-9]*\\.?[0-9]+$", "") + "\"");
Output:
abd_12df1231 => "abd_12df"
abcd => "abcd"
abcd12a => "abcd12a"
abcd12a1 => "abcd12a"
abcd12a1.34567 => "abcd12a"
To actually match a substring from start till the last number, you may use
(?s)^(.*?)\d*\.?\d+$
See the regex demo
Java code:
String s = "abc234 def1.566";
Pattern pattern = Pattern.compile("(?s)^(.*?)\\d*\\.?\\d+$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
With this regex you could capture the last digit(s)
\d+$
You could save those digits and do a string.replace(lastDigits, ""). Note that String.replace removes every occurrence of that text, not only the trailing one.
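A minimal sketch of that capture-then-replace approach, next to an anchored replaceFirst, on an input where the trailing digit also occurs earlier in the string. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastDigitsRemoval {
    private static final Pattern TRAILING_DIGITS = Pattern.compile("\\d+$");

    // Capture the trailing digits, then remove them with String.replace.
    // String.replace removes ALL occurrences of the captured text.
    public static String viaReplace(String input) {
        Matcher m = TRAILING_DIGITS.matcher(input);
        return m.find() ? input.replace(m.group(), "") : input;
    }

    // Anchored alternative: only the digits at the very end are removed.
    public static String viaReplaceFirst(String input) {
        return input.replaceFirst("\\d+$", "");
    }

    public static void main(String[] args) {
        String s = "abcd12a1";
        System.out.println(viaReplace(s));      // abcd2a  (the earlier "1" is removed too)
        System.out.println(viaReplaceFirst(s)); // abcd12a
    }
}
```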
