Java RegEx back slash Number [duplicate] - java

This question already has answers here:
What's the meaning of a number after a backslash in a regular expression?
(2 answers)
Closed 8 years ago.
What does it mean to have a \number in a regex in Java?
Let's say I have something like \1 or \2. What does this mean and how is it used?
An example would be really helpful.
Thanks

Backreferences match the same text as previously matched by a
capturing group. Suppose you want to match a pair of opening and
closing HTML tags, and the text in between. By putting the name of the
opening tag into a capturing group, we can reuse it for the closing
tag through a backreference. Here's how:
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
This regex contains only one pair of parentheses, which capture the string
matched by
[A-Z][A-Z0-9]*
The backreference \1 (backslash one)
references the first capturing group. \1 matches the exact same text
that was matched by the first capturing group. The / before it is a
literal character. It is simply the forward slash in the closing HTML
tag that we are trying to match.
For more details and examples check:
http://www.regular-expressions.info/backref.html
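To tie this back to Java, here is a minimal sketch that uses the same pattern with java.util.regex. The class name BackreferenceDemo and the helper firstTagPair are made up for illustration; note that every backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Same pattern as above; \b and \1 become \\b and \\1 in a Java string literal.
    static final Pattern TAG_PAIR =
            Pattern.compile("<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\1>");

    // Hypothetical helper: returns the first matching tag pair, or null if there is none.
    static String firstTagPair(String input) {
        Matcher m = TAG_PAIR.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The closing tag must repeat whatever group 1 captured ("B" here).
        System.out.println(firstTagPair("<B>bold</B> and <I>italic</I>")); // <B>bold</B>
        // </I> does not repeat the captured "B", so nothing matches.
        System.out.println(firstTagPair("<B>mismatched</I>"));             // null
    }
}
```

The pattern only accepts upper-case tag names as written; compiling it with Pattern.CASE_INSENSITIVE would be one way to relax that.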

A backslash usually introduces a regex construct, such as \d, \b, or a backreference like \1.
It also serves as the escape character that makes a metacharacter literal, as in \. for a literal dot.

Related

What is the functionality of this regex? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex and I am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know: \w is supposed to match all word characters, including 0-9 and underscore, and $ matches at the end of a string.
In the replaceAll method, the first parameter can be a regex. It matches all words in the string with the regex and changes them to the second parameter.
In simple cases replaceAll works like this:
String str = "I,am,a,person";
str = str.replaceAll(",", " "); // "I am a person"
It matched all the commas and replaced them with a space.
In your case, the pattern matches a single word character (\w: a letter, digit, or underscore), followed by zero or more further word characters (\w*).
The () around \w and \w* group them. So you have two groups: the first character and the remaining part. If you use regex101 or a similar website you can see a visualization of this.
Your replacement is $2 (the second group, i.e. the remaining part), followed by $1 (the first character), followed by the literal ay.
Hope this clears it up for you.
Enclosing part of a regular expression in parentheses () makes it a capturing group.
Here you have 2 capturing groups: (\w) captures a single word character, and (\w*) captures zero or more.
$1 and $2 are used to refer to the captured groups, first and second respectively.
Also, replaceAll applies the replacement to each match, so each word is handled individually.
So in this example, in 'Hello', 'H' is the first captured group and 'ello' is the second. The match is replaced by a reordered version, $2$1, which swaps the captured groups, followed by the suffix ay.
So '$2$1ay' gives you 'elloHay'.
The same for the next word also.
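As a quick check of the explanation, the call from the question can be wrapped in a small runnable sketch. The class name PigLatinDemo and the method toPigLatin are illustrative names only.

```java
class PigLatinDemo {
    // Group 1 = first word character, group 2 = the rest of the word.
    // In the replacement, $2 and $1 refer to those groups and "ay" is literal text.
    static String toPigLatin(String input) {
        return input.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        // "!" is not a word character, so it is never matched and stays as it is.
        System.out.println(toPigLatin("Hello World !")); // elloHay orldWay !
    }
}
```

Because \w* may match nothing, a one-letter word still matches: group 2 is then empty and only group 1 plus "ay" is emitted.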

What does regular expression (?<=[\\S])[\\S]*\\s* do? [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 3 years ago.
I saw a regex expression in this other stackoverflow question but I didn't understand the meaning of each part.
String[] split = s.split("(?<=[\\S])[\\S]*\\s*");
The result of this is the Acronym of a sentence.
In order to understand a chained regex expression, should I start reading it from left to right or vice versa? How can I identify (or delimit) each part?
Thank you for your answers.
(?<=[\\S]) is a lookbehind: it states that the match must be preceded by \\S, that is, any non-whitespace character.
[\\S]* states that the regex should match zero or more non-whitespace characters.
\\s* matches zero or more whitespace characters.
In essence, the regex finds a non-whitespace character and matches all the non-whitespace characters that follow it, along with the whitespace after them.
The regex matches ohandas<space><space> and aramchand<space> from Mohandas<space><space>Karamchand<space>G.
Thus, after using these matches to split the string, you end up with {"M", "K", "G"}
Note the two spaces that the regex matches after Mohandas, because the \\s* part matches zero or more spaces
To make sense of a suspicious regular expression, you can use the websites https://regexr.com/ or https://regex101.com/
Both mark parts with colors and explain what they do. But you have to replace the double backslashes by single backslashes.
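The split from the question can also be tried directly in Java. This is only a small sketch; the class name AcronymDemo and the method initials are made up, and the input has two spaces after the first word, as in the example above.

```java
import java.util.Arrays;

class AcronymDemo {
    // Everything after the first character of each word (plus trailing whitespace)
    // is treated as the delimiter, so only the first characters are left over.
    static String[] initials(String sentence) {
        return sentence.split("(?<=[\\S])[\\S]*\\s*");
    }

    public static void main(String[] args) {
        String[] parts = initials("Mohandas  Karamchand G");
        System.out.println(Arrays.toString(parts)); // [M, K, G]
    }
}
```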

Difficulties finding a Java regex equivalent to a JavaScript regex

So, what I am trying to do is:
I have a string:
Special Skills:
someText
could range
through multiple lines
Special Abilities:
another
someText
Background:
multiline
text
I've already managed to come up with the following regex. It works perfectly in JavaScript according to regexr.com, but not in Java, according to IntelliJ's built-in Check-Regex and freeformatter.com.
Special Abilities:\n(.*\n)+?(Special Skills:|Background:)
The expression should, first off, extract
Special Skills:
someText
could range
through multiple lines
Mind that both the sections "Special Abilities" and "Background" are optional.
Since I am kind of stuck here, any help would be greatly appreciated!
You may add the end-of-string (or end-of-line) anchor $ as an extra alternative in the alternation group at the end of the pattern, make sure the . matches carriage returns by using the (?d) embedded flag (Pattern.UNIX_LINES), and wrap (.*\n)+? with a capturing group to capture all the text it matches into Group 1 (the inner (.*\n) can then become a non-capturing group):
(?d)Special Abilities:\r?\n((?:.*\n)*?)(Special Skills:|Background:|$)
See this regex demo.
Details
(?d) - UNIX_LINES mode: only \n is treated as a line terminator, so . now matches any char but a line feed (including \r)
Special Abilities: - a literal text
\r?\n - a CRLF or LF line ending
((?:.*\n)*?) - Group 1: zero or more, but as few as possible, repetitions of 0+ chars other than an LF symbol followed by an LF symbol
(Special Skills:|Background:|$) - either of the three alternatives: Special Skills:, Background: or end of string ($).
An alternative expression:
(?ms)Special Abilities:\r?\n(.*?)(^Special Skills:|^Background:|\Z)
See this regex demo
Here, (?ms) defines the multiline and dotall modes (^ will match start of a line here and . will match all symbols). Instead of $, we need to use \Z - end of string anchor.
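A minimal sketch of how the first pattern could be used from Java follows. The class name SectionDemo and the method abilities are hypothetical, and the sample text is the one from the question joined with \n line endings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionDemo {
    // The (?d) pattern from above, with backslashes doubled for the Java string literal.
    static final Pattern ABILITIES = Pattern.compile(
            "(?d)Special Abilities:\\r?\\n((?:.*\\n)*?)(Special Skills:|Background:|$)");

    // Hypothetical helper: returns the body of the "Special Abilities" section,
    // or null when the section is not found.
    static String abilities(String text) {
        Matcher m = ABILITIES.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Special Skills:\nsomeText\ncould range\nthrough multiple lines\n"
                + "Special Abilities:\nanother\nsomeText\n"
                + "Background:\nmultiline\ntext";
        // Group 1 holds the lines between the heading and the next heading.
        System.out.print(abilities(text)); // prints the two lines "another" and "someText"
    }
}
```

Swapping the literal heading at the start of the pattern for another section name would be one way to pull out the other sections.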

Escape Sequence vs. Whitespace Character (\s) [duplicate]

This question already has answers here:
Version difference? Regex Escape in Java
(2 answers)
Closed 1 year ago.
Are escape sequences and whitespace characters the same thing? I'm not sure what else to write here but Stackoverflow said the first sentence is not enough so I'm typing this second sentence for no reason at all but that so this post will go through.
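Java defines a few escape sequences for string and character literals. Before Java 15, \s was not one of them; since Java 15, \s in a literal stands for a single space. In regular expressions, \s is a predefined character class that matches any whitespace character, and in a Java string literal it has to be written as "\\s".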
Check the following sections from the Java Tutorial:
Escape Sequences
Predefined Character Classes

Regex to determine if string is a single repeating character [duplicate]

This question already has answers here:
Remove repeating character
(2 answers)
Closed 7 years ago.
What is the regex pattern to determine if a string solely consists of a single repeating character?
e.g.
"aaaaaaa" = true
"aaabbbb" = false
"$$$$$$$" = true
This question checks if a string only contains repeating characters (e.g. "aabb") however I need to determine if it is a single repeating character.
You can try a backreference
^(.)\1{1,}$
Demo
Pattern Explanation:
^ the beginning of the string
( group and capture to \1:
. any character except \n
) end of \1
\1{1,} what was matched by capture \1 (at least 1 time)
$ the end of the string
Backreferences match the same text as previously matched by a capturing group. The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group.
In Java you can try
"aaaaaaaa".matches("(.)\\1+") // true
There is no need for ^ and $ because String.matches() looks for whole string match.
This really depends on your language, but in general this would match a line consisting of one character repeated at least twice.
^(.)\1+$
Regex101 Example
^ assert position at start of a line
1st Capturing group (.)
\1+ matches the same text as most recently matched by the 1st capturing group
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
$ assert position at end of a line
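Putting the Java variant into a small method makes the three cases from the question easy to check. This is a sketch only; the class name RepeatedCharDemo and the method isSingleRepeatedChar are made up for illustration.

```java
class RepeatedCharDemo {
    // String.matches() must match the whole string, so ^ and $ are not needed.
    // (.) captures one character and \\1+ requires one or more repeats of it.
    static boolean isSingleRepeatedChar(String s) {
        return s.matches("(.)\\1+");
    }

    public static void main(String[] args) {
        System.out.println(isSingleRepeatedChar("aaaaaaa")); // true
        System.out.println(isSingleRepeatedChar("aaabbbb")); // false
        System.out.println(isSingleRepeatedChar("$$$$$$$")); // true
    }
}
```

With this pattern a one-character string does not count as repeating, because \1+ needs at least one repeat; using \\1* instead would accept it.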
