How to match two strings before a specific string - java

I want to match just the two words before a matched string,
e.g. Rohan pillai J.
Currently I am using:
pattern= (?=\w+ J[.])\w+
Answer - pillai
desired answer - Rohan pillai

An alternative to take the first two names:
\w*\s\w*(?=\sJ\.)
Explaining:
\w*\s # the first word (name) followed by space
\w* # the second word (name)
(?=\sJ\.) # must end with space and "J." - without taking it
Tip: Regex metacharacters (such as the dot .) are generally escaped with a backslash. Use a character class like [.] instead if you want to emphasise the character and make it more visible when reading the regex.

You need to put the lookahead at the end of the pattern:
(\w+) (\w+)(?= J\.)
See demo https://regex101.com/r/wH0oU8/1
Or, more generally, you can use \s to match any whitespace instead of a literal space:
(\w+)\s(\w+)(?=\sJ\.)
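The trailing-lookahead pattern above can be tried from Java roughly as follows. This is a minimal sketch: the class and method names are made up for illustration, and the backslashes are doubled because the pattern is written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamesBeforeInitial {
    // Two words, then a lookahead that requires whitespace and "J." without consuming them.
    private static final Pattern TWO_NAMES =
            Pattern.compile("(\\w+)\\s(\\w+)(?=\\sJ\\.)");

    /** Returns the two words preceding " J.", or null when nothing matches (hypothetical helper). */
    static String twoNamesBefore(String input) {
        Matcher m = TWO_NAMES.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(twoNamesBefore("Rohan pillai J.")); // prints "Rohan pillai"
    }
}
```

Because the lookahead does not consume anything, m.group() contains only the two names; m.group(1) and m.group(2) give them individually.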


RegEx matching $1 tokens [duplicate]

Imagine you are trying to pattern match "stackoverflow".
You want the following:
this is stackoverflow and it rocks [MATCH]
stackoverflow is the best [MATCH]
i love stackoverflow [MATCH]
typostackoverflow rules [NO MATCH]
i love stackoverflowtypo [NO MATCH]
I know how to parse out stackoverflow if it has spaces on both sides using:
/\s(stackoverflow)\s/
The same goes if it's at the start or end of a string:
/^(stackoverflow)\s/
/\s(stackoverflow)$/
But how do you specify "space or end of string" and "space or start of string" using a regular expression?
You can use any of the following:
\b #A word boundary; works for both spaces and the start or end of a line.
(^|\s) #the | means or. () is a capturing group.
/\b(stackoverflow)\b/
Also, if you don't want to include the space in your match, you can use lookbehinds and lookaheads.
(?<=\s|^) #to look behind the match
(stackoverflow) #the string you want. () optional
(?=\s|$) #to look ahead.
(^|\s) would match space or start of string and ($|\s) for space or end of string. Together it's:
(^|\s)stackoverflow($|\s)
Here's what I would use:
(?<!\S)stackoverflow(?!\S)
In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.
This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
\b matches at word boundaries (without actually matching any characters), so the following should do what you want:
\bstackoverflow\b
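To compare the suggestions against the five sample lines from the question, a small Java check along these lines may help. It is only a sketch using the (?<!\S)...(?!\S) variant; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class WholeTokenMatch {
    // Match "stackoverflow" only when it is not glued to any non-whitespace character.
    private static final Pattern TOKEN =
            Pattern.compile("(?<!\\S)stackoverflow(?!\\S)");

    /** True when the line contains "stackoverflow" as a whitespace-delimited token. */
    static boolean containsToken(String line) {
        return TOKEN.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "this is stackoverflow and it rocks",
            "stackoverflow is the best",
            "i love stackoverflow",
            "typostackoverflow rules",
            "i love stackoverflowtypo"
        };
        for (String s : samples) {
            System.out.println(containsToken(s) + "  " + s);
        }
    }
}
```

One difference worth knowing: \bstackoverflow\b would also match inside "stackoverflow!", because the exclamation mark creates a word boundary, whereas the lookaround version rejects it.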

Is it possible to change last character in a regex repetition?

I need to include a space between each string repetition. That works fine using )-- )+ at the end of the regex, but I need to drop the space after the last string entered.
With the following regex the space is mandatory for all of the strings, including the last one, where it is not necessary.
How can I avoid making the space mandatory for the last repeated string?
This is my regex:
"(?:sv(?:32i|32e)(?:a|m|c|)-(?:32f|32e)(?:a|b|)-- )+"
You could match the first part sv32[ie][amc]-32[ef][ab]-- without the trailing space, then repeat the same pattern preceded by a space in a non-capturing group, (?: ..)*, or use + to require one or more repetitions.
The pattern might be simplified a bit using character classes instead of the non capturing groups.
sv32[ie][amc]-?32[ef][ab]--(?: sv32[ie][amc]-?32[ef][ab]--)*
Note that if you want to match sv32ia-32ea-- sv32ia32ea-- sv32ia-32ea-- sv32ia32ea--, you could make the hyphen optional using -?32
You could also make the space optional using -- ?)+ if you would allow a trailing space.
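As a rough illustration of the "first item, then (space + item)*" idea, the pattern can be assembled in Java from a single unit so the two halves cannot drift apart. The names below are hypothetical. Note also that the question's (?:a|m|c|) and (?:a|b|) allow the letter to be absent, while [amc] and [ab] require one; writing [amc]? and [ab]? would keep the original behaviour.

```java
import java.util.regex.Pattern;

final class SpaceSeparatedRepeat {
    // One item, exactly as in the simplified pattern from the answer.
    private static final String UNIT = "sv32[ie][amc]-?32[ef][ab]--";

    // First item without a space, then zero or more items each preceded by one space.
    private static final Pattern LIST = Pattern.compile(UNIT + "(?: " + UNIT + ")*");

    /** True when the whole input is a space-separated list of items with no trailing space. */
    static boolean isValid(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("sv32ia-32ea-- sv32ia32ea--")); // prints "true"
        System.out.println(isValid("sv32ia-32ea-- "));             // prints "false"
    }
}
```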

How to restrict more than one whitespace in Java Regex?

A validation is to be developed for a JavaFX text field where a single whitespace is allowed but more than one whitespace is not allowed.
For example, "Apple Juice" -- correct, "Apple  Juice" -- incorrect (should be restricted).
if (title.matches("[a-zA-Z0-9 ]+"))
I found a couple of links, but they do not meet my requirement. I believe that it is more of a logical tweak.
Whitespace Matching Regex - Java
Regex allowing a space character in Java
You can do:
if (title.matches("[a-zA-Z]+[ ][a-zA-Z]+"))
The first [a-zA-Z]+ checks for one or more letters before the space.
The [ ] checks for exactly one space.
The second [a-zA-Z]+ checks for one or more letters after the space.
Note: This will match only if the space is present in between the string. If you want to match strings like Abcd<space> or <space>Abcd (I used <space> because SO does not allow two spaces to be shown next to each other), then you can replace the +s with *s, i.e.,
if (title.matches("[a-zA-Z]*[ ][a-zA-Z]*"))
It is better to search for more than one whitespace character in a row and negate the result:
if (!Pattern.compile("\\s{2,}").matcher(title.trim()).find())
See the java.util.regex.Pattern javadoc for the syntax. find() is used rather than matches() because matches() only succeeds when the whole string matches the pattern. The string is trimmed first, so leading and trailing whitespace is ignored. If you don't do the trim() operation, leading and trailing whitespace will also be considered.
You could do
if ("Apple Juice".matches("\\w+ \\w+")) {
.......
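Putting the question's character check together with the "no two whitespace characters in a row" idea gives something like the sketch below. The class and method names are invented, and it assumes a single leading or trailing space is acceptable; call trim() first if it is not.

```java
import java.util.regex.Pattern;

final class TitleValidator {
    // Letters, digits and spaces only, as in the question's original check.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9 ]+");
    // Two or more whitespace characters in a row.
    private static final Pattern REPEATED_WHITESPACE = Pattern.compile("\\s{2,}");

    /** True when the title uses only allowed characters and never has two spaces together. */
    static boolean isValidTitle(String title) {
        return ALLOWED.matcher(title).matches()
                && !REPEATED_WHITESPACE.matcher(title).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidTitle("Apple Juice"));   // prints "true"
        System.out.println(isValidTitle("Apple  Juice"));  // prints "false"
    }
}
```

Unlike the \\w+ \\w+ check, this accepts any number of words, for example "Apple Juice Box".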

Replace multiple capture groups using regexp with java

I have this requirement - for an input string such as the one shown below
8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs
I would like to strip the matched word boundaries (where the matching pair is 8 or & or % etc), which will result in the following
This is reallly a test of repl%acing %mul%tiple matched 9pairs
This list of characters that is used for the pairs can vary e.g. 8,9,%,# etc and only the words matching the start and end with each type will be stripped of those characters, with the same character embedded in the word remaining where it is.
Using Java I can do a pattern as \\b8([^\\s]*)8\\b and replacement as $1, to capture and replace all occurrences of 8...8, but how do I do this for all the types of pairs?
I can provide a pattern such as \\b8([^\\s]*)8\\b|\\b9([^\\s]*)9\\b .. and so on that will match all types of matching pairs (8,9,..), but how do I specify a 'variable' replacement group -
e.g. if the match is 9...9, then the replacement should be $2.
I can of course run it through multiple of these, each replacing a specific type of pair, but I am wondering if there is a more elegant way.
Or is there a completely different way of approaching this problem?
Thanks.
You could use the below regex and then replace the matched characters by the characters present inside the group index 2.
(?<!\S)(\S)(\S+)\1(?=\s|$)
OR
(?<!\S)(\S)(\S*)\1(?=\s|$)
Java regex would be,
(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)
String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2"));
Output:
This is reallly a test of repl%acing %mul%tiple matched 9pairs
Explanation:
(?<!\\S) Negative lookbehind, asserts that the match wouldn't be preceded by a non-space character.
(\\S) Captures the first non-space character and stores it into group index 1.
(\\S+) Captures one or more non-space characters.
\\1 Refers to the character inside first captured group.
(?=\\s|$) And the match must be followed by a space or end of the line anchor.
This makes sure that the first character and last character of the string must be the same. If so, then it replaces the whole match by the characters which are present inside the group index 2.
For this specific case, you could modify the above regex as,
String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)", "$2"));
(?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z])
Try this. Replace with $1 or \1. See demo.
https://regex101.com/r/qB0jV1/15
(?<![a-zA-Z])[^a-zA-Z](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[^a-zA-Z](?![a-zA-Z])
Use this if you have many delimiters.
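Since the question says the list of pair characters can vary, one possible way to keep the backreference approach while making the delimiter set configurable is sketched below. The class, method, and parameter names are hypothetical; Pattern.quote is used so that characters such as % or # are always treated literally.

```java
import java.util.regex.Pattern;

final class PairStripper {
    /**
     * Strips a matching leading/trailing delimiter from whitespace-separated words.
     * The delimiters string lists the allowed pair characters, for example "89&#%".
     */
    static String stripPairs(String input, String delimiters) {
        // Build an alternation of quoted single characters: \Q8\E|\Q9\E|...
        StringBuilder alternation = new StringBuilder();
        for (char c : delimiters.toCharArray()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(String.valueOf(c)));
        }
        // Group 1 is the opening delimiter, group 2 the word body, \1 the same delimiter again.
        Pattern pairs = Pattern.compile(
                "(?<!\\S)(" + alternation + ")(\\S+)\\1(?=\\s|$)");
        return pairs.matcher(input).replaceAll("$2");
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(stripPairs(s1, "89&#%"));
    }
}
```

Because \1 must repeat whatever group 1 captured, a single $2 replacement covers every delimiter type, so no per-delimiter group number is needed.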

How to match ^(d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
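[] defines a character class: it matches a single character out of those listed inside the brackets, so ^[\(\d+\)] matches one character that is a parenthesis, a digit, or a plus sign, not the sequence "(digits)".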
The second regexp will work: ^\(\d+\), so check your code.
Also check that there is no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, Java can be tricky with escapes, depending on whether the regexp you type is used directly as a regexp or is first a string literal. You may need to double your backslashes.
In Java you have to escape parentheses, so "\\(\\d+\\)" should match (1) in case one and two. Adding ^ as you did, "^\\(\\d+\\)", will match only case1.
You have to use double backslashes within a Java string. Consider this:
"\n" gives you [line break]
"\\n" gives you [backslash][n]
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $
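The escaping advice in the answers above can be checked with a short Java sketch like the following; the class and method names are only illustrative. It uses Matcher.find(), because String.matches() succeeds only when the entire string matches the pattern, which is a common reason for a correct-looking regex to appear not to work.

```java
import java.util.regex.Pattern;

final class LeadingNumberTag {
    // "(digits)" at the very start of the input; backslashes doubled for the string literal.
    private static final Pattern AT_START = Pattern.compile("^\\(\\d+\\)");
    // "(digits)" anywhere in the input.
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");

    static boolean startsWithTag(String text) {
        return AT_START.matcher(text).find();
    }

    static boolean containsTag(String text) {
        return ANYWHERE.matcher(text).find();
    }

    public static void main(String[] args) {
        String case1 = "(1) Hello, how are you?";
        String case2 = "Hi. (1) How're you doing?";
        System.out.println(startsWithTag(case1)); // prints "true"
        System.out.println(startsWithTag(case2)); // prints "false"
        System.out.println(containsTag(case2));   // prints "true"
        // matches() needs the whole string to fit the pattern, so this prints "false".
        System.out.println(case1.matches("^\\(\\d+\\)"));
    }
}
```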
