Java regex for "String, String."

Given the string "Neil, Gogte., Satyam, B.: Introduction to Java",
I need to extract only "Neil, Gogte." and "Satyam, B." using a regex. How can I do it?

You can use a Matcher and loop over its matches with find():
String str = "Neil, Gogte., Satyam, B.: Introduction to Java";
Pattern pattern = Pattern.compile("([a-zA-Z]+, [a-zA-Z]+\\.)");
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
String result = matcher.group();
System.out.println(result);
}
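
The loop above can be wrapped into a small self-contained program. This is a minimal sketch, assuming the names should be collected into a list; the class name NameExtractor and the method extractNames are illustrative and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameExtractor {
    // "Word, Word." where each word is one or more ASCII letters.
    static final Pattern NAME = Pattern.compile("[a-zA-Z]+, [a-zA-Z]+\\.");

    // Returns every match of NAME in order of appearance.
    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = NAME.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }

    public static void main(String[] args) {
        String str = "Neil, Gogte., Satyam, B.: Introduction to Java";
        // Prints "Neil, Gogte." and "Satyam, B." on separate lines.
        extractNames(str).forEach(System.out::println);
    }
}
```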

You can use the following regex to split the string. It matches every comma that directly follows a dot (the ., sequence), plus any whitespace after it:
(?<=\.),\s*
(?<=\.) Positive lookbehind ensuring what precedes is a literal dot character .
,\s* Matches , followed by any number of whitespace characters
Code:
import java.util.*;
import java.util.regex.Pattern;
class Main {
public static void main(String[] args) {
final String s = "Neil, Gogte., Satyam, B.: Introduction to Java";
final Pattern r = Pattern.compile("(?<=\\.),\\s*");
String[] result = r.split(s);
Arrays.stream(result).forEach(System.out::println);
}
}
Result:
Neil, Gogte.
Satyam, B.: Introduction to Java
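
Note that the second piece still carries the title. One possible way to get only the names is to cut the string at the first colon before splitting. This is a sketch under the assumption that the first colon always separates the authors from the title; AuthorSplitter and splitAuthors are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class AuthorSplitter {
    // A comma (plus trailing whitespace) that directly follows a literal dot.
    static final Pattern SEPARATOR = Pattern.compile("(?<=\\.),\\s*");

    // Assumption: everything before the first ':' is the author list.
    static List<String> splitAuthors(String citation) {
        int colon = citation.indexOf(':');
        String authors = colon >= 0 ? citation.substring(0, colon) : citation;
        return Arrays.asList(SEPARATOR.split(authors));
    }

    public static void main(String[] args) {
        String s = "Neil, Gogte., Satyam, B.: Introduction to Java";
        // Prints "Neil, Gogte." and "Satyam, B." on separate lines.
        splitAuthors(s).forEach(System.out::println);
    }
}
```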

You might use this regex to match your names:
[A-Z][a-z]+, [A-Z][a-z]*\.
In Java:
[A-Z][a-z]+, [A-Z][a-z]*\\.
That would match
[A-Z] Match an uppercase character
[a-z]+ Match one or more lowercase characters
,  Match a comma followed by a space
[A-Z] Match an uppercase character
[a-z]* Match zero or more lowercase characters
\. Match a dot

Related

Java regex for preserving currency symbols along with comma and dot if they are surrounded by numbers

This is my input string:
String inputString = "fff.fre def $fff$ £45112,662 $0.33445533 abc,def 12,34";
I tried the regex below to split it:
String[] tokens = inputString.split("(?![$£](?=(\\d)*[.,]?(\\d)*))[\\p{Punct}\\s]");
but it does not preserve the comma and dot when they are surrounded by numbers. Basically, I don't want to split on a comma or dot if it is part of a price value.
Output I get is
token==>fff
token==>fre
token==>def
token==>$fff$
token==>£45112
token==>662
token==>$0
token==>33445533
token==>abc
token==>def
token==>12
token==>34
Expected output
token==>fff
token==>fre
token==>def
token==>$fff$
token==>£45112,662
token==>$0.33445533
token==>abc
token==>def
token==>12
token==>34
Instead of split, you may use this simpler regex to get all the desired matches:
[$£]\w+(?:[.,]\d+)*[$£]?|[^\p{Punct}\h]+
RegEx Breakup:
[$£]: Match $ or £
\w+: Match 1+ word chars
(?:[.,]\d+)*: Match 0+ groups of a comma or dot followed by 1+ digits, so separators inside a price are kept
[$£]?: Match optional $ or £
|: OR
[^\p{Punct}\h]+: Match 1+ of any char that is not horizontal whitespace or punctuation
Code:
final String regex = "[$£]\\w+(?:[.,]\\d+)*[$£]?|[^\\p{Punct}\\h]+";
final String string = "fff.fre def $fff$ £45112,662 $0.33445533 abc,def 12,34";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("token==>" + matcher.group());
}
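
A self-contained sketch of the find-based tokenizer follows. It assumes a price is a currency symbol, then word chars, then any number of groups made of a comma or dot followed by digits; PriceTokenizer and tokenize are illustrative names. The pound sign is written as the escape for U+00A3 so the source does not depend on file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceTokenizer {
    // First alternative: currency-prefixed token, keeping [.,]digits groups.
    // Second alternative: any run of chars that is not ASCII punctuation
    // or horizontal whitespace.
    static final Pattern TOKEN =
            Pattern.compile("[$\u00A3]\\w+(?:[.,]\\d+)*[$\u00A3]?|[^\\p{Punct}\\h]+");

    // Collects all tokens in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "fff.fre def $fff$ \u00A345112,662 $0.33445533 abc,def 12,34";
        for (String token : tokenize(input)) {
            System.out.println("token==>" + token);
        }
    }
}
```

Plain numbers such as 12,34 are still split, because only tokens that start with a currency symbol go through the first alternative.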

RegEx for capturing special chars

I am trying to rewrite a string using a regular expression. Basically, I need to convert a compound assignment like:
k*=i
into
k=k+i
In my example:
jregex.Pattern p=new jregex.Pattern("([a-z]|[A-Z])([a-z]|[A-Z]|\\d)*[\\+|\\*|\\-|\\/][=]([a-z]|[A-Z])*([a-z]|[A-Z]|\\d)");
Replacer r= new Replacer(p,"1=$1,2=$2,3=$3,4=$4,5=$5,6=$6,7=$7,8=$8");
String result=r.replace("k*=i");
The regex seems to not extract the special chars.
(in this example: +, -, *, /, =)
So what I get as result is:
1=k,2=,3=,4=i,5=,6=,7=,8=
(I can extract only the k & i)
How do I solve this problem?
Here, we can design an expression similar to:
(.+)[*+/-]=(.+)
where the two capturing groups at the start and end capture our k and i:
(.+)
The hyphen is placed last in the character class so that it is a literal; written as [*+-/], the +-/ part is a range that also matches , and .
We can add more boundaries, if we wish, such as start and end anchors:
^(.+)[*+/-]=(.+)$
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
    public static void main(String[] args) {
        // Hyphen last in the class, so it is a literal and not a range
        final String regex = "(.+)[*+/-]=(.+)";
        final String string = "k*=i\n"
                + "apple*=orange";
        // $1 is the left-hand name, $2 the right-hand operand
        final String subst = "$1=$1+$2";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);
        System.out.println("Substitution result: " + result);
    }
}
You could use 3 capturing groups and capture the operator (* + / -) with a character class:
([a-zA-Z])([*+/-])=([a-zA-Z])
That will match:
([a-zA-Z]) Capture group 1, match a-z A-Z
([*+/-]) Capture group 2, match * + / -
= Match literally
([a-zA-Z]) Capture group 3, match a-z A-Z
And replace with:
$1=$1$2$3
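
A runnable sketch of this replacement with java.util.regex follows. As an assumption beyond the answer, the single-letter groups are widened to [a-zA-Z]\w* so that multi-character identifiers also work; CompoundAssignment and expand are hypothetical names.

```java
import java.util.regex.Pattern;

class CompoundAssignment {
    // Group 1: left-hand identifier, group 2: operator, group 3: right-hand identifier.
    // Assumption: identifiers start with a letter and continue with word chars.
    static final Pattern COMPOUND =
            Pattern.compile("([a-zA-Z]\\w*)([*+/-])=([a-zA-Z]\\w*)");

    // Rewrites "a op= b" as "a=a op b", keeping the captured operator.
    static String expand(String code) {
        return COMPOUND.matcher(code).replaceAll("$1=$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(expand("k*=i"));      // k=k*i
        System.out.println(expand("total-=x1")); // total=total-x1
    }
}
```

Because the operator is captured in group 2, the same replacement string handles all four operators.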

Regex to find a word between $$ sign

I want a regular expression that finds a word between $ signs only. The match must start and end with a $ sign. I have tried the expression below:
final String regex = "\\$\\w+\\$";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
This should give me a count of 1, but my regular expression also matches inside the second occurrence (cde$efg$hij), which it should not, because that word does not start and end with a $ sign.
You may use non-word boundaries:
final String regex = "\\B\\$\\w+\\$\\B";
The pattern only matches if $abc$ is neither preceded nor followed by a word char.
Java demo:
String regex = "\\B\\$\\w+\\$\\B";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
while (matcher.find()){
System.out.println(matcher.group(0));
} // => $abc$
Besides non-word boundaries, you may use whitespace boundaries if you only want to match in between whitespace chars or start/end of string:
String regex = "(?<!\\S)\\$\\w+\\$(?!\\S)";
Or, use unambiguous word boundaries (as I call them):
String regex = "(?<!\\w)\\$\\w+\\$(?!\\w)";
The (?<!\\w) negative lookbehind fails the match if a word char is found immediately to the left of the current location, and the (?!\\w) negative lookahead fails the match if a word char is found immediately to the right of the current location.
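
To compare the three boundary styles side by side, here is a small sketch. The input string, including the parenthesized ($xyz$), is made up for illustration, and BoundaryDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    static final String NON_WORD = "\\B\\$\\w+\\$\\B";
    static final String WHITESPACE = "(?<!\\S)\\$\\w+\\$(?!\\S)";
    static final String LOOKAROUND = "(?<!\\w)\\$\\w+\\$(?!\\w)";

    // Returns all matches of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "$abc$ cde$efg$hij ($xyz$) pqr";
        System.out.println(findAll(NON_WORD, input));   // [$abc$, $xyz$]
        System.out.println(findAll(WHITESPACE, input)); // [$abc$]
        System.out.println(findAll(LOOKAROUND, input)); // [$abc$, $xyz$]
    }
}
```

The whitespace version rejects ($xyz$) because the parentheses are non-whitespace chars, while the other two accept it because parentheses are not word chars.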
For me, the problem was extracting the field names between dollar signs:
List<String> getFieldNames(@NotNull String str) {
final String regex = "\\$(\\w+)\\$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
List<String> fields = new ArrayList<>();
while (matcher.find()) {
fields.add(matcher.group(1));
}
return fields;
}
This returns the list of words found between dollar signs.

Find the string after the last dot and remove the last word starting with a capital letter, using RegEx

String myString = "abc.kiransinh.bapu.abc.events.KiranSinhBorasibBapu";
Pattern regex = Pattern.compile("[^.]+(?=[^.]*$)[^Bapu]");
System.out.println(myString);
Matcher regexMatcher = regex.matcher(myString);
if (regexMatcher.find()) {
String ResultString = regexMatcher.group();
ResultString=ResultString.replaceAll("(.)(\\p{Lu})", "$1_$2").toUpperCase();
System.out.println(ResultString);
}
Desired output: KIRAN_SINH_BORASIB
I tried the code above, which relies on the replaceAll method, but I would like to use only a regex.
The desired output might be possible using only a regex.
I am new to regex. Any help would be much appreciated.
Thanks in advance :-)
You can use this regex to match the string you desire:
String re = "(?<=\\.)(?=[^.]*$)\\p{Lu}\\p{L}*?(?=\\p{Lu}(?=[^\\p{Lu}]*$))";
Pattern pattern = Pattern.compile(re);
Use Matcher#find to get your matched text.
Matcher regexMatcher = pattern.matcher(myString);
if (regexMatcher.find()) {
System.out.println( regexMatcher.group() );
}
RegEx Breakup:
(?<=\.) # lookbehind to assert preceding char is DOT
(?=[^.]*$) # lookahead to assert there is no further DOT in text
\p{Lu} # match a unicode uppercase letter
\p{L}*? # match 0 or more of unicode letters (non-greedy)
(?=\p{Lu}(?=[^\p{Lu}]*$)) # make sure the next char is an uppercase letter;
# the inner (?=[^\p{Lu}]*$) makes sure no uppercase letter follows it
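
The match itself is KiranSinhBorasib; producing the upper-case, underscore-separated form still needs a second step, since a single find cannot insert characters or change case. A sketch of the full pipeline, assuming an empty string is an acceptable result when nothing matches (LastSegment and toUpperSnake are hypothetical names):

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastSegment {
    // Text after the last dot, up to (not including) the last capitalized word.
    static final Pattern PREFIX = Pattern.compile(
            "(?<=\\.)(?=[^.]*$)\\p{Lu}\\p{L}*?(?=\\p{Lu}(?=[^\\p{Lu}]*$))");

    static String toUpperSnake(String input) {
        Matcher matcher = PREFIX.matcher(input);
        if (!matcher.find()) {
            return ""; // assumption: no match yields an empty result
        }
        // Insert "_" at every lowercase-to-uppercase transition, then upper-case.
        return matcher.group()
                .replaceAll("(?<=\\p{Ll})(?=\\p{Lu})", "_")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String myString = "abc.kiransinh.bapu.abc.events.KiranSinhBorasibBapu";
        System.out.println(toUpperSnake(myString)); // KIRAN_SINH_BORASIB
    }
}
```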

pattern matching to detect special characters in a word

I am trying to identify any special characters ('?', '.', ',') at the end of a string in Java. Here is what I wrote:
public static void main(String[] args) {
Pattern pattern = Pattern.compile("{.,?}$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - "+matcher.matches());
}
This returns false when I expect true. Please suggest a fix.
Use "Sure?".matches(".*[.,?]").
String#matches(...) auto-anchors the regex with ^ and $, so there is no need to add them manually.
This is your code:
Pattern pattern = Pattern.compile("{.,?}$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - "+matcher.matches());
You have 2 problems:
You're using { and } instead of the character class delimiters [ and ].
You're using Matcher#matches() instead of Matcher#find(). The matches method must match the entire input, while find searches anywhere in the string.
Change your code to:
Pattern pattern = Pattern.compile("[.,?]$");
Matcher matcher = pattern.matcher("Sure?");
System.out.println("Input String matches regex - " + matcher.find());
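
The difference between the two methods can be seen in a short sketch; TrailingPunctuation and endsWithPunctuation are illustrative names, not part of the answers.

```java
import java.util.regex.Pattern;

class TrailingPunctuation {
    // One of . , ? at the end of the input.
    static final Pattern TRAILING = Pattern.compile("[.,?]$");

    // find() searches for the pattern anywhere, so the $ anchor does the work.
    static boolean endsWithPunctuation(String s) {
        return TRAILING.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithPunctuation("Sure?"));             // true
        System.out.println(endsWithPunctuation("Sure"));              // false
        // matches() requires the whole input to match, so "[.,?]$" alone fails.
        System.out.println(TRAILING.matcher("Sure?").matches());      // false
        // With a leading .* the whole input can match.
        System.out.println("Sure?".matches(".*[.,?]"));               // true
    }
}
```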
Try this
Pattern pattern = Pattern.compile(".*[.,?]");
...
