other regex patterns [duplicate] - java

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Is there a regex pattern that supports (part1|part2|...) and [part], where:
(part1|part2) will match either part 1, or part 2, e.g. leav(e|ing) matches leave and leaving
[part] is an optional word, e.g. cat[s] will match cat and cats
I also want fixed words that must appear in every match, e.g. give cat[s] will match give cat and give cats

\bcats?\b will match both cat and cats but will not match cat in cater
\bleav(?:e|ing)\b will match both leave and leaving
\bpart(?:1|2|3)?\b will match part1,part2,part3 or part but not part in apart or partner
Explanation
\b // Asserts a word boundary, so the pattern does not match in the middle of a word (like part in apart)
(?: // Non-capturing group, so the match carries no extra groups; using it is a matter of choice
| // OR (alternation)
? // Makes the preceding element optional: the s in cats?, or the whole group in (?:1|2|3)?
Note that in a Java string literal you need to escape the backslash itself, so \b is written as "\\b" when initializing the regex string.
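As a minimal sketch of the patterns above in Java (the class name, the helper method contains, and the test sentences are made up for illustration; the "give cats?" pattern is simply the question's give cat[s] example translated to the same syntax):

```java
import java.util.regex.Pattern;

class OptionalAndAlternation {
    // In a Java string literal the backslash must be doubled, so \b becomes "\\b".
    static final Pattern CATS = Pattern.compile("\\bcats?\\b");
    static final Pattern LEAVE = Pattern.compile("\\bleav(?:e|ing)\\b");
    static final Pattern GIVE_CATS = Pattern.compile("\\bgive cats?\\b");

    // Hypothetical helper: true if the pattern occurs anywhere in the text.
    static boolean contains(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(contains(CATS, "two cats"));           // true
        System.out.println(contains(CATS, "a cater"));            // false: no word boundary after "cat"
        System.out.println(contains(LEAVE, "leaving now"));       // true
        System.out.println(contains(LEAVE, "leaves"));            // false
        System.out.println(contains(GIVE_CATS, "give cat"));      // true
        System.out.println(contains(GIVE_CATS, "forgive cats"));  // false: "give" starts mid-word
    }
}
```

Use find() when the phrase may appear anywhere in a longer text; String.matches() would require the whole input to match the pattern.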

Related

What is the functionality of this regex? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have recently started learning regex and I am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know: \w is supposed to match all word characters including 0-9 and underscore and $ matches stuff at the end of string.
In the replaceAll method, the first parameter can be a regex. It matches all words in the string with the regex and changes them to the second parameter.
In simple cases replaceAll works like this:
str = "I,am,a,person"
str.replaceAll(",", " ") // I am a person
It matched all the commas and replaced them with a space.
In your case, each match is a single word character (\w: a letter, digit, or underscore), followed by a run of zero or more word characters (\w*).
The () around each part groups it. So you have two groups, the first character and the remaining part. If you use regex101 or some similar website you can see a visualization of this.
Your replacement is $2 (the second group, i.e. the remaining part), followed by $1 (the first group, i.e. the first character), followed by ay.
Hope this clears it up for you.
Enclosing part of a regex in parentheses () makes it a capturing group.
Here you have 2 capturing groups: (\w) captures a single word character, and (\w*) captures zero or more word characters.
$1 and $2 are used to refer to the captured groups, first and second respectively.
Also, replaceAll applies the replacement to each match (here, each word) individually.
So in this example, in 'Hello', 'H' is the first captured group and 'ello' is the second. The match is replaced by a reordered version, $2$1, which simply swaps the captured groups.
So for '$2$1ay' you get 'elloHay'.
The same for the next word also.
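A small runnable sketch of the replacement discussed above (the class and method names are invented for illustration; the regex and replacement string are the ones from the question):

```java
class PigLatinDemo {
    static String swapFirstLetter(String input) {
        // Group 1 (\w): first word character of each word.
        // Group 2 (\w*): the rest of the word, possibly empty.
        // In the replacement, $2 and $1 refer back to those groups.
        return input.replaceAll("(\\w)(\\w*)", "$2$1ay");
    }

    public static void main(String[] args) {
        System.out.println(swapFirstLetter("Hello World !")); // elloHay orldWay !
    }
}
```

The "!" is left untouched because it is not a word character, so it is never part of a match.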

Regular Expression for finding Repeated Words [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+
I understand what each symbol means but when the symbols are combined...I can't figure it out. The confusing part is (?:\s+\1\b)+. What does it mean??? Can you explain to me?? Thanks for your time!
Individual parts of (?:\s+\1\b)+ have the following meaning:
(?:...) - a non-capturing group. It contains:
\s+ - a non-empty sequence of white space chars.
\1 - a backreference to capturing group #1 (\b([a-z]+)\b).
It means that you want to have here just the same chars (the repeated word)
which has been just captured.
\b - a word boundary, in this case transition from word area to space area.
After the whole group above there is a + sign, meaning that the group must occur
one or more times, so the regex matches as many consecutive repetitions of the word as possible.
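To see the pattern at work, here is a short sketch in Java (the class name, method name, and sample sentence are assumptions made for the example; the regex is the one from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWords {
    private static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Collects every run of a word that is immediately repeated one or more times.
    static List<String> findRepeats(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            runs.add(m.group()); // the whole run, e.g. "test test test"
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRepeats("This is is a test test test."));
        // prints [is is, test test test]
    }
}
```

Note that the "is" inside "This" does not start a match, because there is no word boundary between "h" and "i".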

How to match two strings before a specific string

I want to match just two strings before a matched string
e.g Rohan pillai J.
Currently I am using:
pattern= (?=\w+ J[.])\w+
Answer - pillai
desired answer - Rohan pillai
An alternative to take the first two names:
\w*\s\w*(?=\sJ\.)
Regex live here.
Explaining:
\w*\s # the first word (name) followed by space
\w* # the second word (name)
(?=\sJ\.) # must end with space and "J." - without taking it
Tip: Generally to escape regex metacharacters (like dot .) we use back-slash. Use character class like [.] if you want to put emphasis on that character (if you want to make it more visible when you will read this regex).
You need to put the lookahead at the end of the pattern:
(\w+) (\w+)(?= J\.)
See demo https://regex101.com/r/wH0oU8/1
Or, more generally, you can use \s to match any whitespace character instead of a literal space:
(\w+)\s(\w+)(?=\sJ\.)
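A possible way to use the last pattern from Java, as a sketch (the class name and the method firstTwoNames are hypothetical; returning an Optional is a design choice for the example, not something the answers prescribe):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoNamesBeforeInitial {
    // Two words separated by whitespace, followed (but not consumed) by whitespace and "J."
    private static final Pattern NAMES = Pattern.compile("(\\w+)\\s(\\w+)(?=\\sJ\\.)");

    static Optional<String> firstTwoNames(String text) {
        Matcher m = NAMES.matcher(text);
        // group() is the whole match; group(1) and group(2) hold the individual names.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstTwoNames("Rohan pillai J.").orElse("no match")); // Rohan pillai
    }
}
```

Because the lookahead does not consume anything, " J." is checked but is not part of the returned text.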

Regex to determine if string is a single repeating character [duplicate]

This question already has answers here:
Remove repeating character
(2 answers)
Closed 7 years ago.
What is the regex pattern to determine if a string solely consists of a single repeating character?
e.g.
"aaaaaaa" = true
"aaabbbb" = false
"$$$$$$$" = true
This question checks if a string only contains repeating characters (e.g. "aabb") however I need to determine if it is a single repeating character.
You can try a backreference
^(.)\1{1,}$
Demo
Pattern Explanation:
^ the beginning of the string
( group and capture to \1:
. any character except \n
) end of \1
\1{1,} what was matched by capture \1 (at least 1 time)
$ the end of the string
Backreferences match the same text as previously matched by a capturing group. The backreference \1 (backslash one) references the first capturing group. \1 matches the exact same text that was matched by the first capturing group.
In Java you can try
"aaaaaaaa".matches("(.)\\1+") // true
There is no need for ^ and $ because String.matches() only succeeds if the whole string matches.
This really depends on your language, but in general this would match a line consisting of a single repeated character.
^(.)\1+$
Regex101 Example
^ assert position at start of a line
1st Capturing group (.)
\1+ matches the same text as most recently matched by the 1st capturing group
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
$ assert position at end of a line
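The patterns above can be wrapped in a small Java check, sketched here with invented class and method names. One thing worth noting: with \1+ (or \1{1,}) the string needs at least two characters, so a one-character string is rejected; the second method, using \1*, is an assumed variant for callers who want to accept a single character too.

```java
class SingleRepeatedChar {
    // True if the string is one character repeated at least twice.
    static boolean isRepeated(String s) {
        return s.matches("(.)\\1+");
    }

    // Assumed variant: also accepts a one-character string.
    static boolean isRepeatedOrSingle(String s) {
        return s.matches("(.)\\1*");
    }

    public static void main(String[] args) {
        System.out.println(isRepeated("aaaaaaa"));   // true
        System.out.println(isRepeated("aaabbbb"));   // false
        System.out.println(isRepeated("$$$$$$$"));   // true
        System.out.println(isRepeated("a"));         // false
        System.out.println(isRepeatedOrSingle("a")); // true
    }
}
```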

How do I capture the text that is before and after a multiple regex matches in java?

Given a test string of:
I have a 1234 and a 2345 and maybe a 3456 id.
I would like to match all the IDs (the four digit numbers) AND at the same time get 12 characters of their surrounding text (before and after) (if any!)
So the matches should be:
BEFORE MATCH AFTER
Match #1: I have a- 1234 -and a 2345-
Match #2: -1234 and a- 2345 -and maybe a
Match #3: and maybe a- 3456 -id.
This (-) is a space character
Note:
The BEFORE match of Match #1 is not 12 characters long (not many characters at the beginning of the string). Same with the AFTER match of Match #3 (not many characters after the last match)
Can I achieve these matches with a single regex in java?
My best attempt so far is to use a positive look behind and an atomic group (to get the surrounding text) but it fails in the beginning and the end of the string when there are not enough characters (like my note above)
(?<=(.{12}))(\d{4})(?>(.{12}))
This matches only 2345. If I use a small enough value for the quantifiers (2 instead of 12, for example) then I correctly match all IDs.
Here is a link to my regex playground where I was trying my regex's:
http://regex101.com/r/cZ6wG4
If you look at the MatchResult interface (http://docs.oracle.com/javase/7/docs/api/java/util/regex/MatchResult.html), which is implemented by the Matcher class (http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html), you will find the methods start() and end(). start() gives you the index of the first character of the match within the input string, and end() gives you the index just past its last character. Once you have the indices, you can use some simple arithmetic and the substring method to extract the parts you want.
I hope this helps you, because I won't write the entire code for you.
There might be a way to do what you want purely with regex. But I think using the indices and substring is easier (and probably more reliable).
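The start()/end() approach described above could look roughly like this. It is only a sketch: the class name, the method findWithContext, and the choice of returning String[] triples are assumptions for illustration, and the ID pattern \d{4} is taken from the question's own attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextAroundMatches {
    private static final Pattern ID = Pattern.compile("\\d{4}");

    // Returns {before, match, after} for each ID, with up to `width` chars of context.
    static List<String[]> findWithContext(String text, int width) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ID.matcher(text);
        while (m.find()) {
            // Clamp the context window to the bounds of the string.
            int from = Math.max(0, m.start() - width);
            int to = Math.min(text.length(), m.end() + width);
            rows.add(new String[] {
                text.substring(from, m.start()), // text before the match
                m.group(),                       // the ID itself
                text.substring(m.end(), to)      // text after the match
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "I have a 1234 and a 2345 and maybe a 3456 id.";
        for (String[] row : findWithContext(text, 12)) {
            System.out.println("[" + row[0] + "] " + row[1] + " [" + row[2] + "]");
        }
    }
}
```

Since the context is cut out of the original string rather than consumed by the regex, the before and after windows of neighbouring IDs can overlap freely, which is what the question asks for.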
You can do it in a single regex:
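import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern regex = Pattern.compile("(?<=^.{0,10000}?(.{0,12}))(\\d+)(?=(.{0,12}))");
Matcher regexMatcher = regex.matcher(subjectString); // subjectString is the input text
while (regexMatcher.find()) {
    String before = regexMatcher.group(1); // up to 12 chars preceding the number
    String match = regexMatcher.group(2);  // the number itself
    String after = regexMatcher.group(3);  // up to 12 chars following the number
}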
Explanation:
(?<= # Assert that the following can be matched before current position
^.{0,10000}? # Match as few characters as possible from the start of the string
(.{0,12}) # Match and capture up to 12 chars in group 1
) # End of lookbehind
(\d+) # Match and capture in group 2: Any number
(?= # Assert that the following can be matched here:
(.{0,12}) # Match and capture up to 12 chars in group 3
) # End of lookahead
You don't need a lookbehind or an atomic group for this, but you do need a lookahead:
(.{0,12}?)\b(\d+)\b(?=(.{0,12}))
I'm assuming your IDs are not enclosed in longer words (thus the \b). I used a reluctant quantifier in the leading portion ({0,12}?) to prevent it consuming more than one ID when they're spaced close to each other, as in:
I have a 1234, 2345 and 1456 id.
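A sketch of how that last regex behaves when run from Java (the class and method names are invented for the example). One consequence of matching the leading text with an ordinary group is that it is consumed, so the before part of a match cannot reach back into the previous match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingContextDemo {
    private static final Pattern ID_WITH_CONTEXT =
            Pattern.compile("(.{0,12}?)\\b(\\d+)\\b(?=(.{0,12}))");

    // Returns {before, id, after} for each match of the pattern.
    static List<String[]> extract(String text) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ID_WITH_CONTEXT.matcher(text);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "I have a 1234 and a 2345 and maybe a 3456 id.";
        for (String[] row : extract(text)) {
            System.out.println("[" + row[0] + "] " + row[1] + " [" + row[2] + "]");
        }
        // prints:
        // [I have a ] 1234 [ and a 2345 ]
        // [ and a ] 2345 [ and maybe a]
        // [and maybe a ] 3456 [ id.]
    }
}
```

The second row's before text is " and a " rather than " 1234 and a ", because the search for the second match resumes after "1234". The after text is unaffected, since the lookahead does not consume anything.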
