regex delete leading and trailing punctuation - java

I am trying to write a regex in Java that removes all leading and trailing punctuation characters except "-" from a String, while keeping the punctuation within words intact.
I tried replacing the punctuation with "" using String regex = "[\\p{Punct}+&&[^-]]";, but that also deletes the punctuation inside the word.
I also tried matching a pattern, String regex = "[(\\w+\\p{Punct}+\\w+)]";, with Matcher.matches() to capture a group, but it gives me null for the input String word = "#(*&wor(&d#)("
I am wondering what the right way is to handle regex group matching in this case.
Examples:
Input: #)($&word#)($& Output: word
Input: #)($)word#google.com#)(*$&$ Output: word#google.com

Pattern p = Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");
Matcher m = p.matcher("#)($)word#google.com#)(*$&$");
if (m.matches()) {
System.out.println(m.group(1));
}
To give some more info: the key is to anchor the regex at the beginning and end of the string (^ and $) and to make the middle part match non-greedily (using *? instead of just *). Note that this pattern strips every leading and trailing punctuation character, including "-".
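The question also asked to leave "-" alone. A minimal sketch of one way to do that, assuming Java's character-class intersection (&&) is used to subtract the hyphen from \p{Punct}; the class and method names here are illustrative, not part of the original answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PunctuationTrimmer {
    // \p{Punct} minus '-', built with character-class intersection.
    private static final String PUNCT_EXCEPT_HYPHEN = "[\\p{Punct}&&[^-]]";

    // Anchored at both ends; the middle group is lazy so the trailing
    // punctuation run is left for the final character class to consume.
    private static final Pattern TRIM = Pattern.compile(
            "^" + PUNCT_EXCEPT_HYPHEN + "*(.*?)" + PUNCT_EXCEPT_HYPHEN + "*$");

    static String trim(String input) {
        Matcher m = TRIM.matcher(input);
        // '.' does not match line terminators by default, so multi-line
        // input may not match; in that case the input is returned as-is.
        return m.matches() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        System.out.println(trim("#)($&word#)($&"));              // word
        System.out.println(trim("#)($)word#google.com#)(*$&$")); // word#google.com
        System.out.println(trim("#-wor(&d-#"));                  // -wor(&d-
    }
}
```

A hyphen at either end stops the stripping, so anything between the outermost hyphens is kept.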

Related

How to write a regex capture group which matches a character 3 or 4 times before a delimiter?

I'm trying to write a regex that splits elements out according to a delimiter. The regex also needs to ensure there are ideally 4, but at least 3 colons : in each match.
Here's an example string:
"Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, Chess, misc:White:Coke:Florida:A, :::U"
From this, there should be 4 matches:
Checkers, etc:Blue::C
Backgammon, I say:Green::Pepsi:P
Chess, misc:White:Coke:Florida:A
:::U
Here's what I've tried so far:
([^:]*:[^:]*){3,4}(?:, )
Regex 101 at: https://regex101.com/r/O8iacP/8
I tried setting up a non-capturing group for ,
Then I tried matching a group of any character that's not a :, a :, and any character that's not a : 3 or 4 times.
The code I'm using to iterate over these groups is:
String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "([^:]*:[^:]*){3,4}(?:, )";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Any help is appreciated!
Edit
Using @Casimir's regex, it's working. I had to change the above code to use group(0) like this:
String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Now prints:
Checkers, etc:Blue::C
Backgammon, I say::Pepsi:P
Chess:White:Coke:Florida:A
:::U
Thanks again!
I suggest this pattern:
(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])
The negative lookaheads prevent the match from including leading or trailing delimiters. The second one in particular forces the match to be followed by the delimiter or the end of the string (that is, not followed by a character that isn't a comma).
Note that the pattern doesn't have capture groups, so the result is the whole match (or group 0).
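As a rough sketch of how this pattern could be wrapped for reuse, the helper below collects every whole match into a list. The class and method names are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordSplitter {
    // Pattern from the answer: no capture groups, so each record is the whole match.
    private static final Pattern RECORD =
            Pattern.compile("(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])");

    static List<String> split(String line) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(line);
        while (m.find()) {
            records.add(m.group()); // group() is equivalent to group(0)
        }
        return records;
    }

    public static void main(String[] args) {
        String line = "Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, "
                + "Chess, misc:White:Coke:Florida:A, :::U";
        for (String record : split(line)) {
            System.out.println(record);
        }
    }
}
```

Because the tail of each record is matched with \S*, this sketch assumes that whatever follows the third colon contains no whitespace.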
You might use
(?:[^,:]+, )?[^:,]*(?::+[^:,]+)+
(?:[^,:]+, )? Optionally match 1+ any char except a , or : followed by , and space
[^:,]* Match 0+ any char except : or ,
(?: Non Capturing group
:+[^:,]+ Match 1+ : and 1+ times any char except : and ,
)+ Close group and repeat 1+ times
Regex demo
You seem to be making it harder than it needs to be with the lookahead (which won't be satisfied at end-of-line anyway).
([^:]*:){3}[^:,]*:?[^:,]*
Find the first 3 :'s, then start including , in the negative groupings, with an optional 4th :.

regex find string between 2 characters, separated by comma

I am new to regular expressions and I want to find a string between two characters.
I tried the code below, but it always returns false. May I know what's wrong with it?
public static void main(String[] args) {
String input = "myFunction(hello ,world, test)";
String patternString = "\\(([^]]+)\\)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Input:
myFunction(hello,world,test), where myFunction can be any characters; any characters may appear before the opening (.
Output:
hello
world
test
You could make use of the \G anchor, which asserts the position at the end of the previous match, and capture your values in a group:
(?:\bmyFunction\(|\G(?!^))([^,]+)(?:\h*,\h*)?(?=[^)]*\))
In Java:
String regex = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
Explanation
(?: Non capturing group
\bmyFunction\( Word boundary to prevent the match being part of a larger word, match myFunction and an opening parenthesis (
| Or
\G(?!^) Assert position at the end of previous match, not at the start of the string
) Close non capturing group
([^,]+) Capture in a group matching 1+ times not a comma
(?:\h*,\h*)? Optionally match a comma surrounded by 0+ horizontal whitespace chars
(?=[^)]*\)) Positive lookahead, assert what is on the right is a closing parenthesis )
For example:
String patternString = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
String input = "myFunction(hello ,world, test)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result
hello
world
test
I'd suggest doing this in a two-step process:
Step 1: Capture all the content between ( and )
Use the regex: ^\S+\((.*)\)$
Demo
The first and the only capturing group will contain the required text.
Step 2: Split the captured string above on ,, thus yielding all the comma-separated parameters independently.
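The two steps above could be sketched as follows. The class and method names are hypothetical, and splitting on \s*,\s* (rather than on a bare comma) is an assumption added here so that the spaces around the commas in the sample input are dropped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ArgumentExtractor {
    // Step 1: capture everything between the first "(" and the final ")".
    private static final Pattern CALL = Pattern.compile("^\\S+\\((.*)\\)$");

    static List<String> arguments(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.matches()) {
            return Collections.emptyList();
        }
        String inner = m.group(1).trim();
        if (inner.isEmpty()) {
            return Collections.emptyList();
        }
        // Step 2: split on commas, ignoring whitespace around each comma.
        return Arrays.asList(inner.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(arguments("myFunction(hello ,world, test)")); // [hello, world, test]
    }
}
```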
See this; it may give you an idea:
([\w]+),([\w]+),([\w]+)
DEMO: https://rubular.com/r/9HDIwBTacxTy2O

Regex to match the beginning and the end of a string in Java

I want to extract a certain part of a string using a regex in Java. I currently have this pattern:
pattern = "^\\a.+\\sed$\n";
Supposed to match on a string that starts with "a" and ends with "sed". This is not working. Did I miss something ?
Removed the \n line at the end of the pattern and replaced it with a "$":
Still doesn't get a match. The regex looks legit from my side.
What I want to extract is the "a sed" from the temp string.
String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
pattern = "(?s)^a.*sed$";
pr = Pattern.compile(pattern);
math = pr.matcher(temp);
UPDATE
You want to match a sed, so you can use a\\s+sed if there is only whitespace between a and sed:
String s = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
Pattern pattern = Pattern.compile("a\\s+sed");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(0));
}
See IDEONE demo
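Now, if there can be anything between a and sed, use a tempered greedy token, (?:(?!a|sed).)*, which consumes one character at a time as long as that character does not start another a or sed:
Pattern pattern = Pattern.compile("(?s)a(?:(?!a|sed).)*sed");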
See another IDEONE demo.
ORIGINAL ANSWER
The main problem with your regex is the \n at the end. $ is the end of string, and you try to match one more character after a string end, which is impossible. Also, \\s matches a whitespace symbol, but you need a literal s.
You need to remove \\s and \n and make . match a newline. It is also advisable to use the * quantifier to allow 0 symbols in between:
pattern = "(?s)^a.*sed$";
See the regex demo
The regex matches:
^ - start of string
a - a literal a
.* - 0 or more any characters (since (?s) modifier makes a . match any character including a newline)
sed - a literal letter sequence sed
$ - end of string
Your temp string cannot match the pattern (?s)^a.*sed$, because this pattern says that your temp string must begin with the character a and end with the sequence sed, which is not the case. Your string has trailing characters after the "sed" sequence.
If you only want to extract that a...sed portion of the whole string, try using the unanchored pattern "a.*sed" and use the find() method of the Matcher class:
Pattern pattern = Pattern.compile("a.*sed");
Matcher m = pattern.matcher(temp);
if (m.find())
{
System.out.println("Found string "+m.group());
System.out.println("From "+m.start()+" to "+m.end());
}
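To make the difference between the anchored and unanchored patterns concrete, here is a small sketch that runs the three patterns discussed above against the temp string from the question. The helper name firstMatch is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindVersusAnchors {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";

        // Anchored: the whole string would have to start with "a" and end with "sed".
        System.out.println(firstMatch("(?s)^a.*sed$", temp)); // null

        // Unanchored and greedy: starts at the first "a" and runs to the last "sed".
        System.out.println(firstMatch("a.*sed", temp));       // afsgdhgd gfgshfdgadh a sed

        // Unanchored with only whitespace allowed in between.
        System.out.println(firstMatch("a\\s+sed", temp));     // a sed
    }
}
```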

Regular expression java to extract the balance from a string

I have a String which contains " Dear user BAL= 1,234/ ".
I want to extract 1,234 from the String using the regular expression. It can be 1,23, 1,2345, 5,213 or 500
final Pattern p=Pattern.compile("((BAL)=*(\\s{1}\\w+))");
final Matcher m = p.matcher(text);
if(m.find())
return m.group(3);
else
return "";
This returns 3.
What regular expression should I make? I am new to regular expressions.
You search in your regex for word characters \w+ but you should search for digits with \d+.
Additionally there is the comma, so you need to match that as well.
I'd use
/.*BAL=\s([\d,]+(?=/)).*/
as pattern and get only the number in the resulting group.
Explanation:
.* match anything before
BAL= match the string "BAL="
\s match a whitespace
( start matching group
[\d,]+ matches every digit or comma one or more times
(?=/) match the former only if followed by a slash
) end matching group
.* matches anything thereafter
This is untested, but it should work like this:
final Pattern p=Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");
final Matcher m = p.matcher(text);
if(m.find())
return m.group(1);
else
return "";
According to an online tester, the pattern above matches the text:
BAL= 1,234/
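For a self-contained check, the pattern can be wrapped in a small helper like the one below. The class and method names are illustrative, and the sketch assumes the balance is always preceded by "BAL=" plus one whitespace character and followed by a slash, as in the sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BalanceExtractor {
    // Same pattern as above: group 1 holds the digits and commas before the slash.
    private static final Pattern BALANCE =
            Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");

    static String extract(String text) {
        Matcher m = BALANCE.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract(" Dear user BAL= 1,234/ ")); // 1,234
        System.out.println(extract(" Dear user BAL= 500/ "));   // 500
    }
}
```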
If it didn't have to be extracted by the regular expression you could simply do:
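// trim first (the sample text has a leading space), then split on any
// whitespace into a 4-element array: "Dear", "user", "BAL=", "1,234/"
String[] foo = text.trim().split("\\s+");
// the last token still carries the trailing slash, so strip it
return foo[3].replace("/", "");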

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere in this forum, as Java regex supposedly does not recognize \s).
The problem I am facing with this code is that the search stops at the first match, and escapes the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
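Put together as a runnable sketch (the class and method names are assumptions for illustration; the regex is the one explained above):

```java
import java.util.regex.Pattern;

final class SubjectCleaner {
    // Case-insensitive, anchored at the start, one or more "re:" prefixes.
    private static final Pattern LEADING_RE =
            Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    static String stripLeadingRe(String title) {
        return LEADING_RE.matcher(title).replaceAll("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(stripLeadingRe(title)); // This is a Re: sample string.
    }
}
```

Because of the ^ anchor, the "Re:" in the middle of the sentence is left untouched, and a title such as "Return: policy" is not affected since "re" must be followed (after optional whitespace) by a colon.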
In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
the * applies only to the e, the colon is optional, and because of the ^ anchor only a single occurrence at the very start of the string can match. So it matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}
which makes no sense for what you are trying to do.
You'd be better off using a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or, to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is as the doc says the exact same thing as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )
