Regex for numeric portion of Java string - java

I'm trying to write a Java method that will take a string as a parameter and return another string if it matches a pattern, and null otherwise. The pattern:
Starts with a number (1+ digits); then followed by
A colon (":"); then followed by
A single whitespace (" "); then followed by
Any Java string of 1+ characters
Hence, some valid strings that match this pattern:
50: hello
1: d
10938484: 394958558
And some strings that do not match this pattern:
korfed49
: e4949
6
6:
6:sdjjd4
The general skeleton of the method is this:
public String extractNumber(String toMatch) {
// If toMatch matches the pattern, extract the first number
// (everything prior to the colon).
// Else, return null.
}
Here's my best attempt so far, but I know I'm wrong:
public String extractNumber(String toMatch) {
// If toMatch matches the pattern, extract the first number
// (everything prior to the colon).
String regex = "???";
if(toMatch.matches(regex))
return toMatch.substring(0, toMatch.indexOf(":"));
// Else, return null.
return null;
}
Thanks in advance.

Your description is spot on, now it just needs to be translated to a regex:
^ # Starts
\d+ # with a number (1+ digits); then followed by
: # A colon (":"); then followed by
# A single whitespace (" "); then followed by
\w+ # Any word character, one or more times
$ # (followed by the end of input)
Giving, in a Java string:
"^\\d+: \\w+$"
You also want to capture the numbers: put parentheses around \d+, use a Matcher, and capture group 1 if there is a match:
private static final Pattern PATTERN = Pattern.compile("^(\\d+): \\w+$");
// ...
public String extractNumber(String toMatch) {
Matcher m = PATTERN.matcher(toMatch);
return m.find() ? m.group(1) : null;
}
Note: in Java, \w by default only matches ASCII letters and digits (this is not the case for .NET languages, for instance), and it will also match an underscore. If you don't want the underscore, you can use (Java-specific syntax):
[\w&&[^_]]
instead of \w for the last part of the regex, giving:
"^(\\d+): [\\w&&[^_]]+$"
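The patterns above use \w+ for the text after the space, which is narrower than the question's "any Java string of 1+ characters" (for example, "50: hello!" would be rejected). A minimal sketch, assuming the broader reading is wanted, swaps in .+ with the DOTALL flag. The class name NumberPrefix is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberPrefix {
    // (?s) lets '.' match line terminators too, so "any character" really means any.
    private static final Pattern PATTERN = Pattern.compile("(?s)(\\d+): .+");

    /** Returns the digits before the colon, or null if the input does not match. */
    static String extractNumber(String toMatch) {
        Matcher m = PATTERN.matcher(toMatch);
        // matches() requires the whole input to match, so no ^ or $ is needed.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("50: hello"));  // 50
        System.out.println(extractNumber("6:sdjjd4"));   // null
    }
}
```

Using matches() instead of find() makes the anchors implicit, which is one less thing to get wrong if the pattern is edited later.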

Try using the following: \d+: \w+


Java regex matches but String.replaceAll() doesn't replace matching substrings

public class test {
public static void main(String[]args) {
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = new String("^&\\S*;$");
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
}
}
This gives me following Output:
true
N&oslash;rrebro, Denmark
How is that possible ? Why does replaceAll() not register a match?
Your regex includes ^, which makes the regex match only from the very start of the string.
If you try
test1.matches(regex)
you will get false.
You need to understand what ^ and $ mean.
You probably put them in there because you want to say:
At the start of each match, I want a &, then 0 or more non-whitespace characters, then a ; at the end of the match.
However, ^ and $ don't mean the start and end of each match. They mean the start and end of the string.
So you should remove the ^ and $ from your regex:
String regex = "&\\S*;";
Now it outputs:
true
Nrrebro, Denmark
"What character specifies the start and end of the match then?" you might ask. Well, since your regex is basically the pattern you are matching, the start of the regex is the start of the match (unless you have lookbehinds)!
It is possible because the ^&\S*;$ pattern matches the entire &oslash; string, but it does not match the entire N&oslash;rrebro, Denmark string. The ^ requires the start of the string to be right before &, and $ requires the ; to appear right at the end of the string.
Just removing the ^ and $ anchors may not work, because \S* is a greedy pattern and it may overmatch, e.g. in N&oslash;rrebro;.
You may use &\w+; or &\S+?; pattern, e.g.:
String test1 = "N&oslash;rrebro, Denmark";
String regex = "&\\w+;";
String value = test1.replaceAll(regex,"");
System.out.println(value); // => Nrrebro, Denmark
The &\w+; pattern matches a &, then any 1+ word chars, and then ;, anywhere inside the string. The alternative &\S+?; matches 1+ chars other than whitespace, as few as possible, between the & and the ;.
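To make the greedy overmatch concrete, here is a small sketch that runs both unanchored patterns on an input with a second semicolon. It assumes the input contains the HTML entity &oslash; as literal text; the class and method names are hypothetical:

```java
final class EntityStripper {
    /** Removes entity-like tokens such as "&oslash;" anywhere in the input. */
    static String stripEntities(String input) {
        // No ^ or $: replaceAll searches for the pattern at every position.
        return input.replaceAll("&\\w+;", "");
    }

    /** Same idea with the greedy \S*; kept only to show the overmatch. */
    static String stripGreedy(String input) {
        return input.replaceAll("&\\S*;", "");
    }

    public static void main(String[] args) {
        System.out.println(stripEntities("N&oslash;rrebro; Denmark")); // Nrrebro; Denmark
        System.out.println(stripGreedy("N&oslash;rrebro; Denmark"));   // N Denmark
    }
}
```

The greedy version runs on to the last semicolon in the non-whitespace run, so it swallows "rrebro;" as well.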
You can use this regex : &(.*?);
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = new String("&(.*?);");
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
output :
true
Nrrebro, Denmark

Regex to find a string containing more than a single whitespace with no leading/trailing whitespace

Currently I have
Pattern p = Pattern.compile("\\s");
boolean invalidChar = p.matcher(text).find();
I want it to return true only when I have more than a single whitespace.
Also, there should not be any whitespace at the beginning or end of the string.
So some valid/invalid text would be
12 34 56 = valid
ab-34 56 = valid
ab 34 = invalid
12 34 53 = invalid
Without regex..
public class Answ {
public static boolean isValid(String s) {
return !s.contains("  "); // two consecutive spaces
}
public static void main(String[] args) {
String st1 = "12 34 56";
System.out.println(isValid(st1));
}
}
Try this:
(^\s{1,}|\s{2,}|\s$)
Final:
Pattern p = Pattern.compile("(^\\s{1,}|\\s{2,}|\\s$)");
Since there can't be whitespace at the start and end of the string, and there cannot be two or more consecutive whitespaces inside, you may use
boolean isValid = s.matches("\\S+(?:\\s\\S+)*");
This expression will match the following:
^ (implicit in matches that anchors the match by default, i.e. the whole string must match the regex pattern) - the start of the string
\S+ - 1 or more chars other than whitespaces
(?:\s\S+)* - zero or more sequences of:
\s - a single whitespace
\S+ - 1 or more chars other than whitespaces
$ (implicit in matches) - the end of the string.
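As a quick check of the breakdown above, the following sketch wraps the expression in a helper and runs it on a few inputs. The class name SpacingValidator is hypothetical, and the invalid samples are assumptions about what the question intended (doubled, leading, and trailing spaces):

```java
final class SpacingValidator {
    /** True when there is no leading, trailing, or doubled whitespace. */
    static boolean isValid(String s) {
        // String.matches anchors at both ends, so ^ and $ are implicit.
        return s.matches("\\S+(?:\\s\\S+)*");
    }

    public static void main(String[] args) {
        System.out.println(isValid("12 34 56"));  // true
        System.out.println(isValid("ab  34"));    // false (two spaces in a row)
        System.out.println(isValid(" 12 34"));    // false (leading space)
    }
}
```

Because \s and \S cannot match the same character, the nested repetition here does not cause heavy backtracking.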
You can use this pattern:
Pattern p = Pattern.compile("(?<!\\S)(?!\\S)");
Matcher m = p.matcher(text);
boolean invalidChar = m.find();
or boolean isValid = !m.find(), as you want.
Where (?<!\\S) means "not preceded by a non-whitespace" (that includes a preceding whitespace or the start of the string) and (?!\\S) "not followed by a non-whitespace" (that includes a following whitespace or the end of the string).
These two lookarounds describe all possible cases:
successive white-spaces (matches the position between the first two white-spaces)
white-space at the beginning or at the end
empty string
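The three cases listed above can be exercised with a short sketch. The class and method names are hypothetical, and the sample strings are chosen only to hit each case:

```java
import java.util.regex.Pattern;

final class SpacingFinder {
    // Matches an empty position that has no non-whitespace char on either side.
    private static final Pattern BAD_SPOT = Pattern.compile("(?<!\\S)(?!\\S)");

    /** True if the text is empty, starts or ends with whitespace, or has doubled whitespace. */
    static boolean hasInvalidWhitespace(String text) {
        return BAD_SPOT.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasInvalidWhitespace("12 34 56")); // false
        System.out.println(hasInvalidWhitespace("ab  34"));   // true
        System.out.println(hasInvalidWhitespace(""));         // true
    }
}
```

Note that this version uses find() on purpose: the pattern describes a single bad position, not the whole string.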
Try this:
boolean isValid = text.matches("\\S(?!.*\\s\\s).*\\S");
Explanation:
\\S - the match begins with a non-whitespace character
(?!.*\\s\\s) - negative lookahead assertion to ensure there are no instances of two whitespace characters next to each other
.* - matches 0 or more of any character
\\S - the match ends with a non-whitespace character
Note: the matches("regex") method returns true only if the regex matches the entire text string.

Regex to identify strings containing a particular symbol?

I have a set of inputs: ++++, ----, +-+-. Out of these inputs, I want the strings containing only + symbols.
If you want to see if a String contains nothing but + characters, write a loop to check it:
private static boolean containsOnly(String input, char ch) {
if (input.isEmpty())
return false;
for (int i = 0; i < input.length(); i++)
if (input.charAt(i) != ch)
return false;
return true;
}
Then call it to check:
System.out.println(containsOnly("++++", '+')); // prints: true
System.out.println(containsOnly("----", '+')); // prints: false
System.out.println(containsOnly("+-+-", '+')); // prints: false
UPDATE
If you must do it using regex (worse performance), then you can do any of these:
// escape special character '+'
input.matches("\\++")
// '+' not special in a character class
input.matches("[+]+")
// if "+" is dynamic value at runtime, use quote() to escape for you,
// then use a repeating non-capturing group around that
input.matches("(?:" + Pattern.quote("+") + ")+")
Replace final + with * in each of these, if an empty string should return true.
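For the dynamic case, a minimal sketch might wrap the Pattern.quote() variant in a helper. The class name SymbolCheck is hypothetical, and the symbol is assumed to be a non-empty string:

```java
import java.util.regex.Pattern;

final class SymbolCheck {
    /**
     * True if input consists of one or more repetitions of symbol.
     * Assumes symbol is non-empty; it may contain regex metacharacters.
     */
    static boolean containsOnly(String input, String symbol) {
        // quote() turns the symbol into a literal, so "+" or "." is safe here.
        String regex = "(?:" + Pattern.quote(symbol) + ")+";
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsOnly("++++", "+")); // true
        System.out.println(containsOnly("+-+-", "+")); // false
    }
}
```

If the same symbol is checked many times, compiling the Pattern once and reusing it would avoid recompiling on every call.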
The regular expression for checking if a string is composed of only one repeated symbol is
^(.)\1*$
If you only want lines composed by '+', then it's
^\++$, or ^++*$ if your regex implementation does not support + (meaning "one or more").
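In Java, the backreference form could be tried like this. It is only a sketch: the class name RepeatedSymbol is hypothetical, and the anchors are dropped because String.matches already requires the whole input to match:

```java
final class RepeatedSymbol {
    /** True if input is one character repeated one or more times. */
    static boolean isSingleRepeatedSymbol(String input) {
        // (.) captures the first char; \1* allows only further copies of it.
        return input.matches("(.)\\1*");
    }

    public static void main(String[] args) {
        System.out.println(isSingleRepeatedSymbol("++++")); // true
        System.out.println(isSingleRepeatedSymbol("----")); // true
        System.out.println(isSingleRepeatedSymbol("+-+-")); // false
    }
}
```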
For a sequence of the same symbol, use
(.)\1+
as the regular expression. For example, this will match +++, and --- but not +--.
Regex pattern: ^[^\+]*?\+[^\+]*$
This will only permit one plus sign per string.
Explanation:
^ #From start of string
[^\+]* #Match 0 or more non plus characters
\+ #Match 1 plus character
[^\+]* #Match 0 or more non plus characters
$ #End of string
Edit: I just read the comments under the question. I didn't actually steal the commented regex (it just happens to be intellectual convergence).
Whoops, when using matches, disregard the ^ and $ anchors:
input.matches("[^\\+]*?\\+[^\\+]*")

price formatting fails to match commas

I have this test:
@Test
public void succeedsWhenFormatWithTwoCommas(){
String input = "#,###,###.##";
PriceFormatValidator priceFormatValidator = new PriceFormatValidator();
boolean answer = priceFormatValidator.validate(input);
assertTrue(answer);
}
and it fails when it runs this code:
public boolean validate(String input) {
Pattern pattern = Pattern.compile("^#{1,3}(,?#{3})?(\\.#{0,3})?$");
Matcher matcher = pattern.matcher(input);
boolean isValid = matcher.matches();
return isValid;
}
Why is that?
Your ^#{1,3}(,?#{3})?(\\.#{0,3})?$ regex only allows 1 or zero ,### inside because (,?#{3})? matches an optional sequence of one or zero , followed with exactly 3 # symbols.
You need to turn the (,?#{3})? part into (,#{3})* to allow zero or more sequences of , + three # symbols.
Use
"^#{1,3}(,#{3})*(\\.#{0,3})?$"
The whole pattern will now match the following:
^ - start of string
#{1,3} - one to three #
(,#{3})* - zero or more ,+3 # symbols sequences
(\\.#{0,3})? - an optional . + 0 to 3 # symbols
$ - end of string.
NOTE: The (\\.#{0,3})? at the end allows a trailing .. If you do not want that, change it to (\\.#{1,3})?.
NOTE 2: If you are not using the captured values (those matched with (...) patterns), it is a good idea to change capturing groups into non-capturing ones (i.e. (...) with (?:...)).
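Putting the corrected pattern and NOTE 2 together, a sketch of the validator might look like the following. The class name PriceFormat is hypothetical, and the anchors are omitted because Matcher.matches() already covers the whole input:

```java
import java.util.regex.Pattern;

final class PriceFormat {
    // Non-capturing groups, since the matched pieces are never read back.
    private static final Pattern FORMAT =
            Pattern.compile("#{1,3}(?:,#{3})*(?:\\.#{0,3})?");

    /** True if input looks like "#,###,###.##" style price formatting. */
    static boolean validate(String input) {
        return FORMAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(validate("#,###,###.##")); // true
        System.out.println(validate("####"));         // false
    }
}
```

Compiling the pattern once as a constant also avoids recompiling it on every validate() call, as the original method does.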
You can replace your pattern with:
Pattern pattern = Pattern.compile("^#{1,3}(,?#{3}){1,2}(\\.#{0,3})?$");

Java Regex Word Extract exclude with special char

Below are the string values:
"method" <in> abs
("method") <in> abs
method <in> abs
I want to extract only the word method. I tried the regex below:
"(^[^\\<]*)", but it also includes the special characters.
Output for the above regex:
"method"
("method")
method
My expected output:
method
method
method
^\\W*(\\w+)
You can use this and grab group 1 (capture 1). See demo.
https://regex101.com/r/sS2dM8/20
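In Java, grabbing group 1 could look like the sketch below. The class and method names are hypothetical; the inputs are the three strings from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstWord {
    // Skip any leading non-word chars, then capture the first run of word chars.
    private static final Pattern LEADING_WORD = Pattern.compile("^\\W*(\\w+)");

    /** Returns the first word in input, or null if there is none. */
    static String extract(String input) {
        Matcher m = LEADING_WORD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("\"method\" <in> abs"));   // method
        System.out.println(extract("(\"method\") <in> abs")); // method
        System.out.println(extract("method <in> abs"));       // method
    }
}
```

Here find() is the right call rather than matches(), since the pattern only describes the start of the string.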
A couple of words on your "(^[^<]*)" regex: it does not match because it has the beginning-of-string anchor ^ after ", which is never the case. However, even if you remove it, "([^<]*)" will not match the last case, where " and ( are missing. You need to make them optional. Also note that the brackets must be escaped, and the order of quotes and brackets is different from that in your input.
So, your regex could be fixed as
^\(?"?(\b[^<]*)\b"?\)?(?=\s+<)
However, I'd suggest using a replaceAll approach:
String rx = "(?s)\\(?\"?(.*?)\"?\\)?\\s+<.*";
System.out.println("\"My method\" <in> abs".replaceAll(rx, "$1"));
If the strings start with ("My method, you can also add ^ to the beginning of the pattern: String rx = "(?s)^\\(?\"?(.*?)\"?\\)?\\s+<.*";.
The regex (?s)^\\(?\"?(.*?)\"?\\)?\\s+<.* matches:
(?s) makes . match a newline symbol (may not be necessary)
^ - matches the beginning of a string
\\(? - matches an optional (
\"? - matches an optional "
(.*?) - matches and captures into Group 1 any characters as few as possible
\"? - matches an optional "
\\)? - matches an optional )
\\s+ - matches 1 or more whitespace
< - matches a <
.* - matches 0 or more characters to the end of string.
With $1, we restore the group 1 text in the resulting string.
In fact it is not too complicated.
Here is my answer:
Pattern pattern = Pattern.compile("([a-zA-Z]+)");
String[] myStrs = {
"\"method\"",
"(\"method\")",
"method"
};
for(String s:myStrs) {
Matcher matcher = pattern.matcher(s);
if(matcher.find()) {
System.out.println( matcher.group(0) );
}
}
The output is:
method
method
method
You just need to use:
[a-zA-Z]+
