This question already has answers here:
Remove repeating character
(2 answers)
Closed 7 years ago.
What is the regex pattern to determine if a string solely consists of a single repeating character?
e.g.
"aaaaaaa" = true
"aaabbbb" = false
"$$$$$$$" = true
This question checks whether a string contains only repeating characters (e.g. "aabb"); however, I need to determine whether it consists of a single repeating character.
You can try a backreference
^(.)\1{1,}$
Pattern Explanation:
^ the beginning of the string
( group and capture to \1:
. any character except \n
) end of \1
\1{1,} what was matched by capture \1 (at least 1 time)
$ the end of the string
A backreference matches the same text that was previously matched by a capturing group. Here \1 (backslash one) refers to the first capturing group, so it matches exactly the character that (.) captured.
In Java you can try
"aaaaaaaa".matches("(.)\\1+") // true
There is no need for ^ and $ because String.matches() only succeeds if the whole string matches.
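As a minimal sketch of the check above, the snippet below wraps it in a helper. The names RepeatCheck and isSingleRepeatedChar are made up for illustration. Note that the pattern needs at least two characters, so a one-character string is rejected.

```java
class RepeatCheck {
    // Hypothetical helper: true when the string is one character repeated two or more times.
    static boolean isSingleRepeatedChar(String s) {
        // String.matches() must match the whole string, so ^ and $ are not needed.
        return s.matches("(.)\\1+");
    }

    public static void main(String[] args) {
        System.out.println(isSingleRepeatedChar("aaaaaaa")); // true
        System.out.println(isSingleRepeatedChar("aaabbbb")); // false
        System.out.println(isSingleRepeatedChar("$$$$$$$")); // true
        System.out.println(isSingleRepeatedChar("a"));       // false: \1+ needs a second character
    }
}
```

If a single character should also count, \1* in place of \1+ would allow it.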
This really depends on your language, but in general this matches a line made up entirely of the same character:
^(.)\1+$
^ assert position at start of a line
1st Capturing group (.)
\1+ matches the same text as most recently matched by the 1st capturing group
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
$ assert position at end of a line
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex and I am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know: \w is supposed to match all word characters, including 0-9 and underscore, and $ matches stuff at the end of the string.
In the replaceAll method, the first parameter is a regex. The method finds every substring that matches the regex and replaces it with the second parameter.
In simple cases replaceAll works like this:
str = "I,am,a,person"
str.replaceAll(",", " ") // I am a person
It matched all the commas and replaced them with a space.
In your case, each match is a single word character (\w) followed by zero or more further word characters (\w*).
The () around \w and \w* group them. So you have two groups: the first character and the remaining part. If you use regex101 or a similar website you can see a visualization of this.
Your replacement is $2 (the second group, i.e. the remaining part), followed by $1 (the first character), followed by ay.
Hope this clears it up for you.
Enclosing part of a regex in parentheses () makes it a capturing group.
Here you have 2 capturing groups: (\w) captures a single word character, and (\w*) captures zero or more.
$1 and $2 are used to refer to the captured groups, first and second respectively.
Also, replaceAll handles each match (here, each word) individually.
So in this example, in 'Hello', 'H' is the first captured group and 'ello' is the second. The match is replaced by a reordered version, $2$1, which swaps the captured groups, followed by the literal 'ay'.
So '$2$1ay' turns 'Hello' into 'elloHay'.
The same happens for the next word.
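To see the group swap in action, here is a small sketch built around the replaceAll call from the question. The class and method names (PigLatinDemo, toPigLatin) are invented for this example.

```java
class PigLatinDemo {
    // Hypothetical helper wrapping the replaceAll call from the question.
    static String toPigLatin(String text) {
        // Group 1 = first word character, group 2 = the rest of the word (possibly empty).
        // The replacement emits group 2, then group 1, then the literal "ay".
        return text.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        System.out.println(toPigLatin("Hello World !")); // elloHay orldWay !
        System.out.println(toPigLatin("a"));             // aay (group 2 is empty)
    }
}
```

The space and the "!" are not word characters, so they are never part of a match and pass through unchanged.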
This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Can someone help me with the following Java regex expression? I've done some research but I'm having a hard time putting everything together.
The regex:
"^-?\\d+$"
My understanding of what each symbol does:
" = matches the beginning of the line
- = indicates a range
? = does not occur or occurs once
\\d = matches the digits
+ = matches one or more of the previous thing.
$ = matches end of the line
Is the regex saying it only wants matches that start or end with digits? But where do - and ? come in?
- only indicates a range if it's within a character class (i.e. square brackets []). Otherwise, it's a normal character like any other. With that in mind, this regex matches the following examples:
"-2"
"3"
"-700"
"436"
That is, a positive or negative integer: at least one digit, optionally preceded by a minus sign.
A regex is read piece by piece. The correct way to read yours is:
^ start of input
-? optional minus character
\\d+ one or more digits
$ end of input
This regex matches any positive or negative integer, like 0, -15, 558, -19663, ...
For details, check this good post: Reference - What does this regex mean?
"^-?\\d+$" is not a regex, it's a Java string literal.
Once the compiler has parsed the string literal, the string value is ^-?\d+$, which is a regex matching like this:
^ Matches beginning of input
- Matches a minus sign
? Makes previous match (minus sign) optional
\d Matches a digit (0-9)
+ Makes previous match (digit) match repeatedly (1 or more times)
$ Matches end of input
All-in-all, the regex matches a positive or negative integer number of unlimited length.
Note: A - only denotes a range when inside a [] character class, e.g. [4-7] is the range of characters between '4' and '7', while [3-] and [-3] are not ranges since the start/end value is missing, so they both just match a 3 or - character.
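The breakdown above can be checked with a short sketch. IntegerCheck and isInteger are illustrative names, not part of the question.

```java
import java.util.regex.Pattern;

class IntegerCheck {
    // The Java string literal "^-?\\d+$" compiles to the regex ^-?\d+$
    private static final Pattern INTEGER = Pattern.compile("^-?\\d+$");

    // Hypothetical helper: true for an optional minus sign followed by one or more digits.
    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInteger("-2"));   // true
        System.out.println(isInteger("436"));  // true
        System.out.println(isInteger("-"));    // false: at least one digit is required
        System.out.println(isInteger("3-4"));  // false: - is only allowed at the start
        System.out.println(isInteger("1.5"));  // false: . is not a digit
    }
}
```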
This question already has answers here:
Java: splitting a comma-separated string but ignoring commas in quotes
(12 answers)
Closed 7 years ago.
I have a string like
String str = "1000,\"1123\",aabb,\"aa,bb\",test,\"abcd,\"";
I want to split the string on the comma delimiter ',', except for commas that are inside quotation marks. For the above example, the array I want to get is
1000 1123 aabb aa,bb test abcd,
I tried to use a regular expression, but I could not find one that works. Please tell me how I should extract these values. Many thanks.
The regex (,\"|\",)+ comes close to what you want. It matches each run of ," or ", so you can replace those runs with a space:
str.replaceAll("(,\\\"|\\\",)+", " ");
Be aware that it cannot tell an opening quote from a closing one: in the example, the final ," (the end of "abcd,") is also replaced, so the last field comes out as abcd rather than abcd,.
Regex description:
1st Capturing group (,\"|\",)+
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
1st Alternative: ,\"
, matches the character , literally
\" matches the character " literally
2nd Alternative: \",
\" matches the character " literally
, matches the character , literally
g modifier: global. All matches (don't return on first match)
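For comparison, here is a sketch of a different technique from the replaceAll answer above: splitting on commas that are followed by an even number of quotes, the widely used lookahead approach for this kind of input. It assumes quotes are balanced and never escaped inside a field. The names QuotedCsvSplit and splitOutsideQuotes are hypothetical.

```java
class QuotedCsvSplit {
    // Hypothetical helper: split on commas that are outside double quotes.
    static String[] splitOutsideQuotes(String line) {
        // The lookahead requires an even number of quotes between the comma and
        // the end of the string, which means the comma is not inside a quoted field.
        String[] fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
        for (int i = 0; i < fields.length; i++) {
            // Drop the quote characters, keeping any comma that was inside them.
            fields[i] = fields[i].replace("\"", "");
        }
        return fields;
    }

    public static void main(String[] args) {
        String str = "1000,\"1123\",aabb,\"aa,bb\",test,\"abcd,\"";
        // Prints the six fields separated by " | "
        System.out.println(String.join(" | ", splitOutsideQuotes(str)));
        // 1000 | 1123 | aabb | aa,bb | test | abcd,
    }
}
```

For real CSV data with escaped quotes or embedded newlines, a dedicated CSV parser is usually the safer choice.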
Is it possible to exclude the use of certain characters if another character has been found already?
For example, in a phone number field 123-456-7890 and 123.456.7890 are valid, but 123-456.7890 is not.
At the moment I have:
static String pattern = "\\d{3}[-.]\\d{3}[-.]\\d{4}";
How can this be improved to fulfill the above requirement?
To clarify, it will be used in a string, which will be compiled to a Pattern object:
Pattern p = Pattern.compile(pattern);
Then used in a Matcher:
Matcher m = p.matcher(phoneNumber);
if(m.find()){
//do stuff
}
You can use a backreference, which matches the same text as previously matched by a capturing group.
Put - and . in a capturing group using (...); you can then refer to whatever it matched later in the pattern with \index_of_group:
\d{3}([-.])\d{3}\1\d{4}
     ^^^^^^     ^^
     |          +-- backreference to the first captured group
     +-- captured group 1
Sample code:
System.out.print("123-456-7890".matches("^\\d{3}([-.])\\d{3}\\1\\d{4}$"));//true
System.out.print("123.456.7890".matches("^\\d{3}([-.])\\d{3}\\1\\d{4}$"));//true
System.out.print("123-456.7890".matches("^\\d{3}([-.])\\d{3}\\1\\d{4}$"));//false
Pattern explanation:
\d{3} digits (0-9) (3 times)
( group and capture to \1:
[-.] any character of: '-', '.'
) end of \1
\d{3} digits (0-9) (3 times)
\1 what was matched by capture \1
\d{4} digits (0-9) (4 times)
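The question compiles the pattern and calls Matcher.find(), which succeeds if the pattern occurs anywhere in the input. The sketch below assumes the whole input should be a phone number, so it calls matches() instead. PhoneCheck and isValidPhone are illustrative names.

```java
import java.util.regex.Pattern;

class PhoneCheck {
    // Group 1 captures the first separator; \1 forces the second separator to be the same.
    private static final Pattern PHONE = Pattern.compile("\\d{3}([-.])\\d{3}\\1\\d{4}");

    // Hypothetical helper: true only when the entire input is a phone number
    // that uses one kind of separator throughout.
    static boolean isValidPhone(String phoneNumber) {
        return PHONE.matcher(phoneNumber).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("123-456-7890")); // true
        System.out.println(isValidPhone("123.456.7890")); // true
        System.out.println(isValidPhone("123-456.7890")); // false: mixed separators
    }
}
```

With find() and no anchors, an input such as "x123-456-7890" would also be accepted, because the pattern occurs inside it.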
I've already gone through:
Regex to match four repeated letters in a string using a Java pattern
and
Regular expression to match any character being repeated more than 10 times
But they aren't useful in my case. They are fine if I just want to check whether a string contains repeated characters (like 1111, abccccd, 12aaaa3b, etc.). What I want is to check whether a string consists entirely of repeated characters, e.g. aabb111, 1111222, 11222aaa, etc.
Can anyone help me out with this?
Use ((.)\2+)+ as pattern:
String pattern = "((.)\\2+)+";
System.out.println("a".matches(pattern)); // false
System.out.println("1aaa".matches(pattern)); // false
System.out.println("aa".matches(pattern)); // true
System.out.println("aabb111".matches(pattern)); // true
System.out.println("1111222".matches(pattern)); // true
System.out.println("11222aaa".matches(pattern)); // true
System.out.println("etc.".matches(pattern)); // false
About the pattern:
(...): captures the matched part as a group. Groups are numbered from 1, in the order of their opening parentheses.
((.)\2+)+
^^
|+----- group 2
+----- group 1
(.): matches any character (except newline) and captures it as group 2 (because its opening parenthesis comes second, inside the enclosing group).
\2: backreference to the matched group. If (.) matched a character x, \2 matches another x (not any character, but only x).
PATTERN+: matches one or more repetitions of PATTERN.
(.)\2+: greedily matches a run of two or more of the same character.
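To make the group numbering concrete, this sketch inspects both groups after a successful match. Because group 1 is repeated by the outer +, it holds only the last run it matched. GroupNumberingDemo and lastRun are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    private static final Pattern RUNS = Pattern.compile("((.)\\2+)+");

    // Hypothetical helper: returns {group 1, group 2} if the whole string is made
    // of runs of repeated characters, or null if it is not.
    static String[] lastRun(String s) {
        Matcher m = RUNS.matcher(s);
        if (!m.matches()) {
            return null;
        }
        // A repeated capturing group keeps only its final iteration.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = lastRun("aabb111");
        System.out.println(groups[0]);         // 111 (group 1: the last run)
        System.out.println(groups[1]);         // 1   (group 2: the character of that run)
        System.out.println(lastRun("1aaa"));   // null: the leading 1 is not repeated
    }
}
```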