What does regular expression (?<=[\\S])[\\S]*\\s* do? [duplicate] - java

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 3 years ago.
I saw a regex in another Stack Overflow question, but I didn't understand what each part means.
String[] split = s.split("(?<=[\\S])[\\S]*\\s*");
The result is the acronym of a sentence (the first letter of each word).
To understand a chained regex, should I read it from left to right or vice versa? How can I identify (or delimit) each part?
Thank you for your answers.

(?<=[\\S]) is a lookbehind: the match must be immediately preceded by \\S, that is, any character that is not whitespace.
[\\S]* matches zero or more non-whitespace characters.
\\s* matches zero or more whitespace characters.
In essence, the regex starts right after a non-whitespace character and matches the rest of that word, along with any whitespace that follows it.
For the input "Mohandas  Karamchand G" (two spaces after Mohandas), the regex matches ohandas<space><space> and aramchand<space>.
Thus, after using these matches to split the string, you end up with {"M", "K", "G"}.
Note that the match after Mohandas includes both spaces, because the \\s* part matches zero or more whitespace characters.
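A quick way to check this explanation is to run the split yourself. The sketch below is a minimal, self-contained Java program; the class name AcronymSplit, the helper initials and the second sample sentence are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

class AcronymSplit {
    // The regex from the question, written as a Java string literal.
    static final String REGEX = "(?<=[\\S])[\\S]*\\s*";

    // Each match is "rest of a word + trailing whitespace", so splitting
    // on the matches leaves only the first character of every word.
    static String[] initials(String sentence) {
        return sentence.split(REGEX);
    }

    public static void main(String[] args) {
        // Two spaces after "Mohandas", as in the example above.
        String[] parts = initials("Mohandas  Karamchand G");
        System.out.println(Arrays.toString(parts)); // [M, K, G]
        System.out.println(String.join("", initials("Portable Network Graphics"))); // PNG
    }
}
```

split drops trailing empty strings, so the zero-length match the lookbehind allows at the very end of the input does not show up in the result.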

To decipher an unfamiliar regular expression, you can use the websites https://regexr.com/ or https://regex101.com/
Both highlight each part in a different color and explain what it does. Note that you have to replace the double backslashes with single backslashes, because the doubling is only needed inside a Java string literal.

Related

Find strings with regex expression [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I want to search for Java package names using the following expression:
com.company.*
Test example: https://regex101.com/r/tHTQd9/2
But when I use it in Java code, it doesn't find anything. Do I need to escape the .?
The following expression would work:
\bcom\.company\.\w[\w\.]*\b
Match between word-boundaries
Use literal dot characters by escaping
One word character (letter, digit or underscore), followed by zero or more word characters or dots
Pattern regex = Pattern.compile("\\bcom\\.company\\.\\w[\\w\\.]*\\b");
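To see what the pattern picks out of ordinary text, here is a minimal sketch that runs it with Matcher.find(). The class name, the helper findPackages and the sample text are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PackageFinder {
    // Pattern from the answer: word boundaries, escaped literal dots,
    // then one word character followed by word characters or dots.
    static final Pattern PACKAGE = Pattern.compile("\\bcom\\.company\\.\\w[\\w\\.]*\\b");

    // Collects every occurrence of a com.company.* name in the text.
    static List<String> findPackages(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PACKAGE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "import com.company.util.Strings; import org.other.Lib; see com.company.model.";
        System.out.println(findPackages(text)); // [com.company.util.Strings, com.company.model]
    }
}
```

find() is used rather than matches(), because matches() requires the whole input to match the pattern, while find() looks for it anywhere in the text. The final \b is also why the sentence-ending dot after com.company.model is not included.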
If you are only looking for one or more word characters after the last dot, you can try:
com\\.company\\.\\w+
Or, even more generically (one or more of any character):
com\\.company\\..+
Please remember that this is quite generic and prone to errors.
If you provide a more detailed explanation or constraints, we can help build a better regex.
Why double backslash in Java?
We know that the backslash character is an escape character in Java String literals as well. Therefore, we need to double the backslash character when using it to precede any character (including the \ character itself).
Source
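In Java, to escape a dot (.) you need to precede it with a double backslash (\\), so your regex will look like this:
com\\.company\\..*
Only the dots after com and company are escaped; the final .* stays unescaped so that it still means "any characters". Writing com\\.company\\.* instead would mean com.company followed by zero or more literal dots.
Why the double backslash is needed:
The dot (.) is a special symbol in regex, so you need to escape it with a backslash (\). However, the backslash is also an escape character in Java string literals, so Java removes it when it processes the string. To preserve it, you need to add another backslash (\).
The regex string as you write it in the source code:
com\\.company\\..*
The string after Java has processed it, which is what the regex engine receives:
com\.company\..*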
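The effect of each escape can be checked with Pattern.matches, which requires the whole input to match. This is a hedged sketch with hypothetical constant names and sample inputs. It compares the unescaped pattern from the question, a version with escaped dots and an unescaped trailing .*, and the com\\.company\\.* form, where \\.* means zero or more literal dots.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Unescaped: every '.' means "any character".
    static final String LOOSE = "com.company.*";
    // Escaped dots after com and company; the final ".*" still means "anything".
    static final String STRICT = "com\\.company\\..*";
    // Escaping the last dot as well: "\\.*" means "zero or more literal dots".
    static final String DOTS_ONLY = "com\\.company\\.*";

    public static void main(String[] args) {
        System.out.println(STRICT); // com\.company\..*  (what the regex engine receives)
        System.out.println(Pattern.matches(LOOSE, "comXcompanyYutil"));     // true
        System.out.println(Pattern.matches(STRICT, "comXcompanyYutil"));    // false
        System.out.println(Pattern.matches(STRICT, "com.company.util"));    // true
        System.out.println(Pattern.matches(DOTS_ONLY, "com.company.util")); // false
        System.out.println(Pattern.matches(DOTS_ONLY, "com.company..."));   // true
    }
}
```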

What is the functionality of this regex? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex, and I am not sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know, \w is supposed to match all word characters, including 0-9 and underscore, and $ matches the end of the string.
In the replaceAll method, the first parameter is a regex. The method finds every substring that matches the regex and replaces each one with the second parameter.
In simple cases, replaceAll works like this:
String str = "I,am,a,person";
str.replaceAll(",", " "); // "I am a person"
It matched all the commas and replaced them with a space.
In your case, each match is one word character (\w) followed by zero or more word characters (\w*).
The parentheses around \w and \w* create groups. So you have two groups: the first letter and the remaining part. If you use regex101 or a similar website, you can see a visualization of this.
Your replacement is $2 (the second group, i.e. the remaining part), followed by $1 (the first letter), followed by ay.
Hope this clears it up for you.
Enclosing part of a regex in parentheses () makes it a capturing group.
Here you have 2 capturing groups: (\w) captures a single word character, and (\w*) captures zero or more.
$1 and $2 are used to refer to the first and second captured groups, respectively.
Also, replaceAll replaces each match separately, and with this regex each match is one whole word.
So in this example, for 'Hello', 'H' is the first captured group and 'ello' is the second. The match is replaced by the reordered version $2$1, which swaps the captured groups.
So '$2$1ay' becomes 'elloHay'.
The same happens for the next word.
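Both replaceAll examples from this thread can be run as-is. A minimal sketch, where the class name PigLatinDemo and the helper toPigLatin are hypothetical:

```java
class PigLatinDemo {
    // Group 1: the first word character; group 2: the rest of the word.
    // "$2$1ay" puts the rest first, then the first letter, then "ay".
    static String toPigLatin(String s) {
        return s.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        System.out.println(toPigLatin("Hello World !"));           // elloHay orldWay !
        System.out.println("I,am,a,person".replaceAll(",", " ")); // I am a person
    }
}
```

A one-letter word leaves group 2 empty, so "I" simply becomes "Iay". The "!" is not a word character, so it is never part of a match and stays as it is.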

Regular Expression for finding Repeated Words [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+
I understand what each symbol means, but when the symbols are combined, I can't figure it out. The confusing part is (?:\s+\1\b)+. What does it mean? Can you explain it to me? Thanks for your time!
Individual parts of (?:\s+\1\b)+ have the following meaning:
(?:...) - a non-capturing group. It contains:
\s+ - a non-empty sequence of whitespace chars.
\1 - a backreference to capturing group #1 ([a-z]+).
It means that this position must contain exactly the same chars (the repeated word) that have just been captured.
\b - a word boundary, in this case the transition from the repeated word to the non-word character (or end of input) after it.
After the whole group above there is a + sign, meaning that you want to match as many repeated words as possible.
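Here is a small sketch of the regex in use, assuming the common goal of collapsing each run of repeated words to a single occurrence (the class name, the helper collapse and the sample sentences are hypothetical). Because (?i) is set at the start, the backreference \1 is also compared case-insensitively, so "The the" counts as a repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWords {
    // The regex from the question; (?i) turns on case-insensitive matching.
    static final Pattern REPEATED = Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Replaces each run like "test test test" with its first word ("$1").
    static String collapse(String s) {
        return REPEATED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        Matcher m = REPEATED.matcher("This is is a test test test.");
        while (m.find()) {
            // group() is the whole run, group(1) is the word that repeats
            System.out.println(m.group() + " -> " + m.group(1));
        }
        System.out.println(collapse("This is is a test test test.")); // This is a test.
        System.out.println(collapse("The the end"));                 // The end
    }
}
```

The trailing \b is what stops "is island" from being treated as a repetition: \1 would match the "is" at the start of "island", but no word boundary follows it.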

Escape Sequence vs. Whitespace Character (\s) [duplicate]

This question already has answers here:
Version difference? Regex Escape in Java
(2 answers)
Closed 1 year ago.
Are escape sequences and whitespace characters the same thing? I'm not sure what else to write here, but Stack Overflow said the first sentence was not enough, so I'm typing this second sentence for no reason other than to get the post through.
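Java specifies a small set of escape sequences for string and character literals (\t, \n, \", \\ and so on). Before Java 15, \s was not one of them; since Java 15 it is, but there it stands for a single space (U+0020), not for whitespace in general. The \s that means whitespace belongs to regular expressions, where it is a predefined character class, so in a Java string literal you write it as "\\s".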
Check the following sections from the Java Tutorial:
Escape Sequences
Predefined Character Classes
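The difference between the two kinds of backslash can be seen in a few lines of Java. In this sketch (class and method names are hypothetical), \t and \n in the test string are string-literal escape sequences that the compiler turns into a tab and a newline, while "\\s+" reaches the regex engine as \s+ and matches any run of whitespace. A plain space in a pattern matches only U+0020.

```java
import java.util.Arrays;

class WhitespaceDemo {
    // "\\s+" in source code becomes the regex \s+ (any run of whitespace).
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    // A literal space in the pattern matches only U+0020.
    static String[] splitOnSpace(String s) {
        return s.split(" +");
    }

    public static void main(String[] args) {
        // \t and \n are string-literal escape sequences: the compiler turns
        // them into a real tab and a real newline before any regex runs.
        String text = "one\ttwo\nthree four";
        System.out.println(Arrays.toString(splitOnWhitespace(text))); // [one, two, three, four]
        System.out.println(splitOnSpace(text).length);               // 2
    }
}
```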

This regular expression is for which type of strings [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
How can I tell which kind of input a particular regular expression matches? For example, I want to know about \$\{([\w]+)\}. Which strings will this regular expression match?
Pattern placeholder = Pattern.compile("\\$\\{([\\w]+)\\}");
Matcher mat = placeholder.matcher("input");
while (mat.find()) {
    // mat.group() is the whole ${...} match, mat.group(1) is the name inside the braces
}
It matches EL (Expression Language) style variable references, such as:
${somethingHere}
As commented above, you can check that Reference for more info.
This finds one or more word characters enclosed between ${ and }.
The \w metacharacter is used to find a word character.
A word character is a character from a-z, A-Z, 0-9, including the _ (underscore) character.
The other characters are escaped with \: \$ looks for a literal $, \{ looks for { and \} looks for }.
The + token means to repeat the preceding [\w] (which is the same as \w) between one and unlimited times, as many times as possible.
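A common use for this pattern is filling in placeholders from a map of values. The sketch below does it with Matcher.appendReplacement and appendTail; the class name, the fill helper and the rule of leaving unknown names untouched are assumptions made for the example, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderDemo {
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([\\w]+)\\}");

    // Replaces each ${name} with values.get(name); unknown names are left as-is.
    static String fill(String template, Map<String, String> values) {
        Matcher mat = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (mat.find()) {
            String value = values.get(mat.group(1));
            String replacement = (value != null) ? value : mat.group();
            // quoteReplacement stops '$' or '\' in the value from being read as group references
            mat.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        mat.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("user", "Ada");
        values.put("total_2", "$5");
        System.out.println(fill("Hi ${user}, you owe ${total_2}. ${missing} ${not valid}", values));
        // Hi Ada, you owe $5. ${missing} ${not valid}
    }
}
```

${not valid} is never matched, because a space is not a word character, so [\w]+ cannot reach the closing brace.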
