Regular Expression for finding Repeated Words [duplicate] - java

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
**(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+**
I understand what each symbol means, but when the symbols are combined I can't figure it out. The confusing part is (?:\s+\1\b)+. What does it mean? Can you explain it to me? Thanks for your time!

Individual parts of (?:\s+\1\b)+ have the following meaning:
(?:...) - a non-capturing group. It contains:
\s+ - one or more whitespace characters.
\1 - a backreference to capturing group #1 (the ([a-z]+) part).
It requires exactly the same characters (the repeated word)
that group #1 has just captured.
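\b - a word boundary; here, the transition from the last letter of the repeated word to a non-word character or the end of the input, so "the theory" does not count as a repetition.
The + after the whole group means the group must match at least once and, being greedy,
as many times as possible, so a run such as "the the the" is matched in one go.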
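To see the whole pattern at work, here is a minimal Java sketch. The class name RepeatedWords, the helper collapseRepeats and the sample sentence are made up for illustration. It prints each run of repeats together with the word captured by group #1, then uses $1 in replaceAll to keep a single copy of that word. Because of (?i), Java also compares the backreference case-insensitively, so "The the" counts as a repeat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedWords {
    // The regex from the question as a Java string literal (backslashes doubled).
    static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Hypothetical helper: replaces every run of a repeated word with one copy (group 1).
    static String collapseRepeats(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "This is is a test test test of The the regex";
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            // group() is the whole run, group(1) is the word being repeated
            System.out.println("run \"" + m.group() + "\", word \"" + m.group(1) + "\"");
        }
        System.out.println(collapseRepeats(text));
    }
}
```

For the sample sentence this reports the runs "is is", "test test test" and "The the", then prints "This is a test of The regex". The trailing \b is what keeps "the theory" from being reported.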

Related

What is the functionality of this regex? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex, and I am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know, \w matches any word character, including 0-9 and the underscore, and $ matches the end of the string.
In the replaceAll method, the first parameter is a regex. Every substring that matches it is replaced with the second parameter.
In simple cases replaceAll works like this:
String str = "I,am,a,person";
str.replaceAll(",", " "); // "I am a person"
It matched all the commas and replaced them with a space.
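In your case, each match is one word character (\w) followed by zero or more word characters (\w*), i.e. one whole word. Note that \w also matches digits and _, not only letters.
The () around each part make it a capturing group, so you have two groups: the first character and the rest of the word. If you use regex101 or a similar website, you can see a visualization of this.
Your replacement is $2 (the second group, the rest of the word), followed by $1 (the first group, the first character), followed by the literal text ay. In a replacement string, $n refers to group n; it is not the end-of-string anchor.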
Hope this clears it up for you.
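Here is a minimal runnable version of the snippet from the question; the class name PigLatin and the method toPigLatin are only for illustration. The ! is not a word character, so it is never matched and stays where it is.

```java
public class PigLatin {
    // (\w) captures the first word character, (\w*) captures the rest of the word.
    // In the replacement, $2 and $1 insert those groups in swapped order, then the literal "ay".
    static String toPigLatin(String text) {
        return text.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        System.out.println(toPigLatin("Hello World !")); // elloHay orldWay !
    }
}
```

A one-letter word like "I" becomes "Iay", because group 2 is then empty.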
Enclosing part of a regex in parentheses () makes it a capturing group.
Here you have two capturing groups: (\w) captures a single word character, and (\w*) captures zero or more word characters.
$1 and $2 are used to refer to the captured groups, first and second respectively.
replaceAll handles each match separately, and here each match is one word.
So in 'Hello', 'H' is the first captured group and 'ello' is the second. The match is replaced by $2$1, which swaps the two groups, followed by ay.
So '$2$1ay' turns 'Hello' into 'elloHay'.
The same happens for the next word.
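To make the two groups visible, the sketch below (hypothetical class WordGroups and method describe) runs the same pattern with a Matcher and records group(1) and group(2) for every match. This is the per-match view that replaceAll works from when it substitutes $1 and $2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordGroups {
    static final Pattern WORD = Pattern.compile("(\\w)(\\w*)");

    // Returns "group1|group2" for each match, so both captured groups can be inspected.
    static List<String> describe(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            parts.add(m.group(1) + "|" + m.group(2));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String p : describe("Hello World !")) {
            System.out.println(p); // H|ello, then W|orld
        }
    }
}
```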

Can someone explain this regex expression [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
Hi there, I'm new to Java and was reading about regex, and I couldn't understand the following expression:
"^[a-zA-Z\-]+$"
Could someone be kind enough to explain each and every character in this expression?
Thank you.
^ $ # Check if the entire string matches,
[ ]+ # with one or more of the following characters:
a-z # Any lowercase (ASCII) letter
A-Z # Any uppercase (ASCII) letter
\- # Or a "-" (the `\` is used to escape it)
Or in short: this regex checks if a given string consists solely of (ASCII) letters and/or -, and is non-empty.
[a-zA-Z] means any character from a through z or from A through Z, inclusive.
The "\" inside the square brackets is an escape character: it makes - a literal hyphen instead of a range operator.
The "+" after the closing bracket means the character class must match one or more times.
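A minimal sketch of using this check in Java, assuming the goal is to validate whole strings; the class and method names and the sample inputs are hypothetical. Written as a Java string literal, the backslash must itself be escaped, so \- becomes \\- (a lone \- in Java source is an illegal escape and does not compile). String.matches already requires the entire string to match, so ^ and $ are redundant there, though harmless.

```java
public class LettersAndHyphens {
    // The regex ^[a-zA-Z\-]+$ as a Java string literal (backslash doubled).
    static final String REGEX = "^[a-zA-Z\\-]+$";

    // True if s is non-empty and contains only ASCII letters and/or hyphens.
    static boolean isLettersAndHyphens(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"Jean-Luc", "abc", "", "abc1", "co op"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isLettersAndHyphens(s));
        }
    }
}
```

Only "Jean-Luc" and "abc" pass: the empty string fails the +, and the digit and the space are not in the class.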

What does regular expression (?<=[\\S])[\\S]*\\s* do? [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 3 years ago.
I saw a regex in another Stack Overflow question, but I didn't understand the meaning of each part.
String[] split = s.split("(?<=[\\S])[\\S]*\\s*");
The result of this is the acronym of a sentence.
To understand a chained regex, should I read it from left to right or vice versa? How can I identify (or delimit) each part?
Thank you for your answers.
(?<=[\\S]) is a lookbehind: the match must be preceded by a \\S character, that is, anything except whitespace. That preceding character is not itself part of the match.
[\\S]* states that the regex should match zero or more non-space characters
\\s* matches zero or more whitespace characters.
In essence, each match starts right after the first character of a word, takes the rest of that word, and also takes the whitespace that follows it.
The regex matches ohandas<space><space> and aramchand<space> from Mohandas Karamchand G
Thus, after using these matches to split the string, you end up with {"M", "K", "G"}
Note that the regex matches both spaces after Mohandas, because \\s* matches zero or more whitespace characters, not just one.
To understand a suspicious regular expression, you can use the websites https://regexr.com/ or https://regex101.com/
Both highlight each part in color and explain what it does. Note that you have to replace the double backslashes (needed in Java string literals) with single backslashes.
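On the question of how to delimit each part: Java's Pattern.COMMENTS flag lets you write the same regex one part per line, read left to right, with a comment on each. In this mode whitespace in the pattern is ignored and # starts a comment. The sketch below uses that to build the acronym splitter; the class name Acronym and the method initials are hypothetical. Pattern.split(input) behaves like String.split(regex), including dropping trailing empty strings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class Acronym {
    // Same regex as (?<=[\S])[\S]*\s*, spread over lines with Pattern.COMMENTS.
    static final Pattern SPLITTER = Pattern.compile(
            "(?<=[\\S])   # lookbehind: position is right after a non-whitespace char\n"
          + "[\\S]*       # the rest of the word\n"
          + "\\s*         # any whitespace that follows it\n",
            Pattern.COMMENTS);

    // Splitting on those matches leaves only the first character of each word.
    static String[] initials(String sentence) {
        return SPLITTER.split(sentence);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(initials("Mohandas  Karamchand G"))); // [M, K, G]
    }
}
```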

Regex for arithmetic expression [duplicate]

This question already has answers here:
In a java regex, how can I get a character class e.g. [a-z] to match a - minus sign?
(5 answers)
Closed 5 years ago.
The regex -?\d+ [+|-|*|/] -?\d+ matches the expression 1 + 3 without any problems, and also 1 + -2, but I don't know why it does not match 1 - 2. Could you explain why it does not match the - character?
By my regex I wanted to achieve:
optional - at the beginning
string of digits
whitespace then operator then whitespace
optional - before the second string of digits
An unescaped - in the middle of a character class creates a range. You can escape it or move it to the start or end of the character class. You also probably don't want the |s: inside a character class, | is just a literal character, not alternation.
As written, |-| is a range from | to |, so the class contains +, |, * and /, but not -. Alternatively, you could use a group with alternation instead of a character class:
(\+|-|\*|/)
With this approach the + and * need to be escaped because they are quantifiers when outside a character class.
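A small Java sketch comparing the question's regex with the two fixes described in the answer (the class and constant names are hypothetical). It checks a few inputs with String.matches, which requires the whole string to match.

```java
public class ArithmeticRegex {
    // Question's regex: inside [...], "|-|" is a range from '|' to '|', so '-' is not in the class.
    static final String BROKEN = "-?\\d+ [+|-|*|/] -?\\d+";
    // Fix 1: '-' placed first in the class is literal; the '|' characters are removed.
    static final String FIXED_CLASS = "-?\\d+ [-+*/] -?\\d+";
    // Fix 2: alternation in a group; + and * escaped because they are quantifiers here.
    static final String FIXED_GROUP = "-?\\d+ (\\+|-|\\*|/) -?\\d+";

    public static void main(String[] args) {
        String[] inputs = {"1 + 3", "1 + -2", "1 - 2", "1 | 2"};
        for (String in : inputs) {
            System.out.println("\"" + in + "\": broken=" + in.matches(BROKEN)
                    + ", class=" + in.matches(FIXED_CLASS)
                    + ", group=" + in.matches(FIXED_GROUP));
        }
    }
}
```

"1 - 2" is rejected only by the broken version, while "1 | 2" is accepted only by the broken version, which shows that | is a literal member of the original class.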

Difference between "\\d+" and "\\d++" in java regex [duplicate]

This question already has answers here:
What is the difference between [0-9]+ and [0-9]++?
(2 answers)
Closed 2 years ago.
In java, what's the difference between "\\d+" and "\\d++"?
I know ++ is a possessive quantifier, but what's the difference in matching the numeric string?
What string can match "\\d+" but can't with "\\d++"?
Possessive quantifiers seem to be significant only with ".*". Is that true?
Possessive quantifiers will not back off, even if some backing off is required for the overall match.
So, for example, the regex \d++0 can never match any input, because \d++ will match all digits, including the 0 needed to match the last symbol of the regex.
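A minimal Java sketch of the \d++0 example (the class and method names are hypothetical). When nothing follows the quantifier, \d+ and \d++ accept the same strings; the difference appears only when a later part of the pattern needs characters the quantifier already consumed.

```java
public class PossessiveDemo {
    // Greedy: \d+ can give back the final 0 so that the literal 0 can match.
    static boolean greedy(String s) {
        return s.matches("\\d+0");
    }

    // Possessive: \d++ keeps every digit, so the literal 0 never has anything left to match.
    static boolean possessive(String s) {
        return s.matches("\\d++0");
    }

    public static void main(String[] args) {
        // On their own, both forms match a string of digits.
        System.out.println("123".matches("\\d+") + " " + "123".matches("\\d++")); // true true
        // Followed by 0, only the greedy form succeeds.
        System.out.println(greedy("1230") + " " + possessive("1230")); // true false
    }
}
```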
\d+ Means:
\d means a digit (Character in the range 0-9), and + means 1 or more times. So, \d+ is 1 or more digits.
\d++ means (quoting the Quantifiers documentation):
This is called the possessive quantifiers and they always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.
