How to use Pattern and Matcher? [duplicate] - java

I have two simple questions about Pattern.
First one is reading the given name(s) and surname. I need to tell whether they contain numbers or punctuation characters. If not, it's a valid name. Whatever I input, the output is
This is not a valid name.
What am I doing wrong?
Scanner input = new Scanner(System.in);
System.out.print("Enter: ");
String name = input.next();
Pattern p = Pattern.compile("[A-Za-z]");
Matcher m = p.matcher(name);
boolean n = m.matches();
if (n == true) {
System.out.println(name);
}
else {
System.out.println("This is not a valid name.");
}
The second question: I read a list of salary amounts, each starting with a dollar sign $ followed by a non-negative number, and save the valid salaries into an array. My program can output an array, but it can't recognize the $.
Scanner sc = new Scanner(System.in);
System.out.print("Enter Salary: ");
String salary = sc.nextLine();
Pattern pattern = Pattern.compile("($+)(\\d)");
Matcher matcher = pattern.matcher(salary);
String[] slArray=pattern.split(salary);
System.out.print(Arrays.toString(slArray));

I wouldn't even use a formal matcher for these simple use cases. Java's String#matches() method can just as easily handle this. To check for a valid name using your rules, you could try this:
String name = input.next();
if (name.matches("[A-Za-z]+")) {
System.out.println(name);
}
else {
System.out.println("This is not a valid name.");
}
And to check salary amounts, you could use:
String salary = sc.nextLine();
if (salary.matches("\\$\\d+(?:\\.\\d+)?")) {
System.out.println("Salary is valid.");
}
A note on the second pattern, \$\d+(?:\.\d+)?: the dollar sign must be escaped because it is a regex metacharacter. Also, I did not use the ^ and $ anchors in either pattern, because String#matches() always applies the pattern to the entire string.
Edit:
If you have multiple currency amounts in a given line, then split by whitespace to get an array of currency strings:
String input = "$23 $24.50 $25.10";
String[] currencies = input.split("\\s+");
Then, use the above matching logic to check each entry.
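The split-then-validate idea can be sketched as follows. The class name SalaryFilter, the method name validSalaries, and the sample line are made up for illustration; the regex is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;

class SalaryFilter {
    // Same pattern as above: a dollar sign, digits, and an optional fractional part.
    private static final String SALARY_REGEX = "\\$\\d+(?:\\.\\d+)?";

    // Splits the line on whitespace and keeps only the tokens that are valid salaries.
    static List<String> validSalaries(String line) {
        List<String> valid = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.matches(SALARY_REGEX)) {
                valid.add(token);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // prints [$23, $24.50, $25.10]
        System.out.println(validSalaries("$23 $24.50 abc 25 $-3 $25.10"));
    }
}
```

A List is used here because the number of valid entries is not known up front; call toArray(new String[0]) on the result if an array is required.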

Explanation
Your regex pattern is wrong. You are missing the symbol to repeat the pattern.
Currently you have [A-Za-z] which matches only one letter. You can repeat using
* - 0 to infinite repetitions
? - 0 to 1 repetitions
+ - 1 to infinite repetitions
{x,y} - x to y repetitions (written without a space in Java)
So you probably wanted a pattern like [A-Za-z]+. You can use sites like regex101.com to test your regex patterns (it also explains the pattern in detail). See regex101/n6OZGp for an example of your pattern.
Here is a tutorial on the regex repetition symbols.
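As a rough illustration of these quantifiers, String#matches can be used to compare them side by side. The class name QuantifierDemo, the helper check, and the sample inputs are made up for this sketch.

```java
class QuantifierDemo {
    // Returns true when the whole input is the letter class repeated as the quantifier allows.
    static boolean check(String quantifier, String input) {
        return input.matches("[A-Za-z]" + quantifier);
    }

    public static void main(String[] args) {
        System.out.println(check("", "John"));        // false: exactly one letter required
        System.out.println(check("+", "John"));       // true: one or more letters
        System.out.println(check("+", ""));           // false: + needs at least one letter
        System.out.println(check("*", ""));           // true: * also accepts zero letters
        System.out.println(check("?", "J"));          // true: zero or one letter
        System.out.println(check("{2,4}", "Johnny")); // false: six letters is more than four
    }
}
```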
For the second problem you need to know that $ is a special symbol in regex. It stands for the end of the input (or the end of a line in multiline mode). If you want to match the $ symbol itself, you need to escape it by adding a backslash:
"\\$\\d+"
Note that you need to add two backslashes because the backslash itself has a special meaning in Java. So you first need to escape the backslash using a backslash so that the string itself contains a backslash:
\$\d+
which is then passed to the regex engine. The same applies if you want to match a literal + sign: you need to escape it.
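If counting backslashes is error-prone, one alternative is to let Pattern.quote do the escaping. This is only a sketch; the class name EscapeDemo and the method isDollarAmount are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Pattern.quote wraps the text so that every character in it is matched literally.
    static final String DOLLAR_AMOUNT = Pattern.quote("$") + "\\d+";

    static boolean isDollarAmount(String s) {
        return s.matches(DOLLAR_AMOUNT);
    }

    public static void main(String[] args) {
        System.out.println(isDollarAmount("$100")); // true
        System.out.println(isDollarAmount("100"));  // false: the dollar sign is missing
        System.out.println(isDollarAmount("$"));    // false: at least one digit is required
    }
}
```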
Notes
If you just want to check a given String against a pattern you can use the String#matches method:
String name = "John";
if (name.matches("[A-Za-z]+")) {
// Do something
}
Also note that there are shorthand character classes like \w (word character) which is short for [A-Za-z0-9_].
Code like
if (n == true) { ... }
can be simplified to just
if (n) { ... }
Because n already is a boolean, you don't need to test it against true anymore.
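To parse currency values you should consider using existing methods like
// Without an argument the default locale is used; parse throws ParseException on invalid input
NumberFormat format = NumberFormat.getCurrencyInstance(Locale.US);
Number num = format.parse("$5.34");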
See the documentation of the class for examples.
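A slightly fuller sketch of the NumberFormat approach, assuming US dollar input. The class name CurrencyParseDemo and the helper parseUsd are invented for this example; returning an OptionalDouble is just one way to deal with the checked ParseException.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.OptionalDouble;

class CurrencyParseDemo {
    // Parses a US currency string such as "$5.34"; returns empty if it cannot be parsed.
    static OptionalDouble parseUsd(String text) {
        NumberFormat format = NumberFormat.getCurrencyInstance(Locale.US);
        try {
            return OptionalDouble.of(format.parse(text).doubleValue());
        } catch (ParseException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUsd("$5.34")); // OptionalDouble[5.34]
        System.out.println(parseUsd("abc"));   // OptionalDouble.empty
    }
}
```

Note that NumberFormat.parse(String) only needs to parse the beginning of the text, so trailing characters after a valid amount are not necessarily rejected; combine it with a regex check if strict validation matters.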

Related

How do I specify input from a user?

I'm writing a program and I would like to call for a specially-formed string that consists of 3 letters (can be upper-case or lower-case), followed by a dash and then followed by 4 numbers.
For example, "abc-1234".
The value must follow this pattern, otherwise, they are invalid.
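You can use a regex with String's matches method:
String pattern = "[A-Za-z]{3}-\\d{4}";
This is a template for 'ccc-dddd', where d is a digit from [0-9] and c is an upper- or lower-case ASCII letter. (A class such as [\w^\d] would not work here: inside a character class, ^ only negates when it comes first, so that class would also accept digits, underscores, and a literal ^.)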
Then, you can see if your input string matches the template:
if(input.matches(pattern)) { //input is input string
...
} else {
System.out.println("Input does not match template ccc-dddd");
}
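A quick way to sanity-check such a template is to run it against a few inputs. This sketch uses an explicit letter class and assumes "letters" means ASCII letters; the class name CodeFormatCheck and the method isValidCode are made up.

```java
class CodeFormatCheck {
    // Three ASCII letters, a dash, then four digits.
    private static final String TEMPLATE = "[A-Za-z]{3}-\\d{4}";

    static boolean isValidCode(String input) {
        return input.matches(TEMPLATE);
    }

    public static void main(String[] args) {
        System.out.println(isValidCode("abc-1234")); // true
        System.out.println(isValidCode("ABC-0000")); // true
        System.out.println(isValidCode("ab1-1234")); // false: digit among the letters
        System.out.println(isValidCode("abc-123"));  // false: only three digits
    }
}
```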

Figuring out regex for the mentioned condition

I came across the concept of regex recently and wanted to solve a problem using only a regex inside the matches() method and the length() method of the String class. The problem is about password validation. Here are the three conditions that need to be checked:
A password must have at least eight characters.
A password consists of only letters and digits.
A password must contain at least two digits.
I was able to solve this problem using various other String and Character class methods, but I need to do it with regex only. What I have tried passes most of the test cases, but some of them still fail. Since I am still learning regex, please help me see what I am missing or doing wrong. Below is what I tried:
public class CheckPassword {
public static void main(String[]args){
Scanner sc = new Scanner(System.in);
System.out.println("Enter your password:\n");
String str1 = sc.next();
//String dig2 = "\\d{2}";
//String letter = ".*[A-Z].*";
//String letter1 = ".*[a-z].*";
//if(str1.length() >= 8 && str1.matches(dig2) &&(str1.matches(letter) || str1.matches(letter1)) )
if(str1.length() >= 8 && str1.matches("^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d{2,})(?=.*[0-9])[A-Z0-9a-z]+$"))
System.out.println("Valid Password");
else
System.out.println("Invalid Password");
}
}
EDIT
Okay, so I figured out the first and second conditions. I am only having trouble combining the third one with them, i.e. that the password contains at least 2 digits.
if(str1.length() >= 8 && str1.matches("[a-zA-Z0-9]*"))
//works exclusive of the third criterion
You may actually use a single regex inside matches() to validate all 3 conditions:
A password must have at least eight characters and
A password consists of only letters and digits - use \p{Alnum}{8,} in the consuming part
A password must contain at least two digits - use the (?=(?:[a-zA-Z]*\d){2}) positive lookahead anchored at the start
Combining all three:
.matches("(?=(?:[a-zA-Z]*\\d){2})\\p{Alnum}{8,}")
Since matches() method anchors the pattern by default (i.e. it requires a full string match) no ^ and $ anchors are necessary.
Details
^ - implicit in matches() - start of string
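(?=(?:[a-zA-Z]*\d){2}) - a positive lookahead ((?=...)) that requires two consecutive occurrences of the following sequence at the start of the string, which means at least two digits with only letters before and between them: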
[a-zA-Z]* - zero or more ASCII letters
\d - an ASCII digit
\p{Alnum}{8,} - 8 or more alphanumeric chars (ASCII only)
$ - implicit in matches() - end of string.
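To see the combined pattern at work, here is a small sketch that runs it against a few sample passwords. The class name PasswordCheck, the method isValid, and the sample inputs are invented for illustration.

```java
class PasswordCheck {
    // Lookahead: at least two digits; consuming part: 8 or more ASCII letters or digits.
    private static final String PASSWORD_REGEX = "(?=(?:[a-zA-Z]*\\d){2})\\p{Alnum}{8,}";

    static boolean isValid(String password) {
        return password.matches(PASSWORD_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcdef12"));  // true
        System.out.println(isValid("abcdefg1"));  // false: only one digit
        System.out.println(isValid("abc12"));     // false: shorter than eight characters
        System.out.println(isValid("abcdef1!2")); // false: contains a non-alphanumeric character
    }
}
```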
Okay, thank you @TDG and M.Aroosi for your time. I have figured out a solution that satisfies all cases:
// answer edited based on OP's working comment.
String dig2 = "^(?=.*?\\d.*\\d)[a-zA-Z0-9]{8,}$";
if(str1.matches(dig2))
{
//body
}

How to check if a string contains only digits in Java

Java's String class has a method called matches. How can I use it to check whether my string contains only digits, using a regular expression? I tried the examples below, but both of them returned false.
String regex = "[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
String regex = "^[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
Try
String regex = "[0-9]+";
or
String regex = "\\d+";
As per Java regular expressions, the + means "one or more times" and \d means "a digit".
Note: the "double backslash" is an escape sequence to get a single backslash - therefore, \\d in a java String gives you the actual result: \d
References:
Java Regular Expressions
Java Character Escape Sequences
Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.
Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:
String regex = "\\d+";
// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));
// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));
Question 1:
Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb" ?
No. In java, the matches method (which was specified in the question) matches a complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.
Please note that if you use an online "regex checker" then this may behave differently. To match fragments of a string in Java, you can use the find method instead, described in detail here:
Difference between matches() and find() in Java Regex
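The difference between the two methods can be illustrated with a short sketch. The class name MatchesVsFind and its helper methods are hypothetical; matches(), find(), and group() are the standard java.util.regex.Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches() succeeds only if the entire input is digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // find() looks for a run of digits anywhere in the input; returns null if there is none.
    static String firstDigits(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("aa123bb")); // false
        System.out.println(firstDigits("aa123bb")); // 123
    }
}
```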
Question 2:
Won't this regex also match the empty string, ""?
No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.
Question 3
Isn't it faster to compile a regex Pattern?
Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());
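You can also use NumberUtils.isNumber(String str) from Apache Commons Lang. Note that it accepts more than plain digit strings, for example hexadecimal and scientific notation.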
Using regular expressions is costly in terms of performance. Trying to parse the string as a long value is inefficient and unreliable, and may not be what you need.
What I suggest is to simply check whether each character is a digit, which can be done efficiently with Java 8 streams and a lambda expression:
boolean isNumeric = someString.chars().allMatch(x -> Character.isDigit(x));
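Two details of the stream approach are worth knowing: allMatch returns true for an empty stream, and Character.isDigit accepts any Unicode decimal digit, not just 0-9. The sketch below contrasts it with a stricter ASCII-only variant; the class name DigitCheck and both method names are made up.

```java
class DigitCheck {
    // Accepts any Unicode decimal digit, and also the empty string (allMatch on an empty stream is true).
    static boolean allUnicodeDigits(String s) {
        return s.chars().allMatch(Character::isDigit);
    }

    // Stricter variant: non-empty and ASCII 0-9 only, like the regex [0-9]+.
    static boolean allAsciiDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        String arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three
        System.out.println(allUnicodeDigits(""));          // true
        System.out.println(allAsciiDigits(""));            // false
        System.out.println(allUnicodeDigits(arabicIndic)); // true
        System.out.println(allAsciiDigits(arabicIndic));   // false
        System.out.println(allAsciiDigits("23343453"));    // true
    }
}
```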
One more solution, that hasn't been posted, yet:
String regex = "\\p{Digit}+"; // uses POSIX character class
You must allow for more than a digit (the + sign) as in:
String regex = "[0-9]+";
String data = "23343453";
System.out.println(data.matches(regex));
Alternatively, call Long.parseLong(data) and catch the NumberFormatException; this also handles a leading minus sign.
Although the number of digits is limited to what fits in a long, this gives you a numeric value that can be used directly, which is, I would imagine, the most common use case.
We can use either Pattern.compile("[0-9]+(\\.[0-9]+)?") or Pattern.compile("\\d+(\\.\\d+)?"). They have the same meaning.
The pattern [0-9] means a digit, the same as '\d'.
'+' means it appears one or more times.
'\.' matches a literal dot (an unescaped '.' would match any character), and the optional group '(\.[0-9]+)?' allows either an integer or a decimal number.
Try the following code:
import java.util.regex.Pattern;
public class PatternSample {
public boolean containNumbersOnly(String source){
boolean result = false;
// Digits, optionally followed by a dot and more digits: accepts integers and decimals.
Pattern pattern = Pattern.compile("\\d+(\\.\\d+)?");
result = pattern.matcher(source).matches();
if(result){
System.out.println("\"" + source + "\"" + " is a number");
}else
System.out.println("\"" + source + "\"" + " is a String");
return result;
}
public static void main(String[] args){
PatternSample obj = new PatternSample();
obj.containNumbersOnly("123456.a");
obj.containNumbersOnly("123456 ");
obj.containNumbersOnly("123456");
obj.containNumbersOnly("0123456.0");
obj.containNumbersOnly("0123456a.0");
}
}
Output:
"123456.a" is a String
"123456 " is a String
"123456" is a number
"0123456.0" is a number
"0123456a.0" is a String
According to Oracle's Java Documentation:
private static final Pattern NUMBER_PATTERN = Pattern.compile(
"[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)" +
"([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|" +
"(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))" +
"[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");
boolean isNumber(String s){
return NUMBER_PATTERN.matcher(s).matches();
}
Refer to org.apache.commons.lang3.StringUtils
public static boolean isNumeric(CharSequence cs) {
if (cs == null || cs.length() == 0) {
return false;
} else {
int sz = cs.length();
for(int i = 0; i < sz; ++i) {
if (!Character.isDigit(cs.charAt(i))) {
return false;
}
}
return true;
}
}
Java's String class has a method called matches(). With it you can check a string against a regular expression.
String regex = "^[\\d]{4}$";
String value = "1234";
System.out.println(value.matches(regex));
The explanation of the above regular expression:
^ - Matches the start of the input.
[] - A character class; inside it you list the characters that are allowed.
\\d - Only allows digits. You can use '\\d' or 0-9 inside the brackets; both are the same.
{4} - This condition allows exactly 4 digits. You can change the number according to your need.
$ - Matches the end of the input.
Note: You can remove the {4} and specify + which means one or more times, or * which means zero or more times, or ? which means once or none.
For more reference please go through this website: https://www.rexegg.com/regex-quickstart.html
Official regex way
I would use this regex for integers:
^-?(0|[1-9]\d*)$
Because it is explicitly anchored, this will also work in other programming languages and tools whose regex functions search for a match anywhere in the input. It accepts an optional minus sign and rejects leading zeros.
Also works in Java
\\d+
Questions regarding ^ and $
As @vikingsteve has pointed out, in Java the matches method matches a complete string, not parts of a string. In other words, it is unnecessary to use ^\d+$ (even though that is the portable way of writing it).
Online regex checkers usually search for a match anywhere in the input, so without the anchors they will behave differently from Java's matches.
Try this code (note that it also accepts a leading sign and rejects values outside the int range):
void containsOnlyNumbers(String str)
{
try {
Integer num = Integer.valueOf(str);
System.out.println("is a number");
} catch (NumberFormatException e) {
// TODO: handle exception
System.out.println("is not a number");
}
}

regex needed which matches for two sample string

I have two input strings :
this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==?Id=113690_2&Index=0&Referrer=IC
this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==
What I want is only the %7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ== from both of the sample strings.
I tried by using the regex [a-zA-Z-]+-(.*) which works fine for the second input string.
String inputString = "this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==";
String regexString = "[a-zA-Z-]+-(.*)";
Pattern pattern = Pattern.compile(regexString);
Matcher matcher = pattern.matcher(inputString);
if(matcher.matches()) {
System.out.println("--->" + matcher.group(1) + "<---");
} else {
System.out.println("nope");
}
The following patterns match the desired group with the limited information and examples provided:
-([^-?]*)(?:\?|$)
.*-(.*?)(?:\?|$)
The first will match a hyphen then group all the characters up to either the ? or the end of the string.
The second matches as many characters and hyphens as possible followed by the smallest string to either the next question mark or the end of the string.
There are dozens of ways of writing something that will match this text though so I'm kinda just guessing if this is what you wanted. If this is not what you're after please elaborate on what exactly you're trying to accomplish.
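In Java, the first pattern has to be used with find() rather than matches(), because it only describes part of the input. A minimal sketch, with the class name TokenExtractor and the method extract invented for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenExtractor {
    // A hyphen, then a group without hyphens or question marks, ending at '?' or the end of input.
    private static final Pattern TOKEN = Pattern.compile("-([^-?]*)(?:\\?|$)");

    static Optional<String> extract(String input) {
        Matcher m = TOKEN.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String base = "this-is-a-sample-string-%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==";
        // Both lines print Optional[%7b3DES%7dFPvKTjGHUA3lD9Us70rfjQ==]
        System.out.println(extract(base));
        System.out.println(extract(base + "?Id=113690_2&Index=0&Referrer=IC"));
    }
}
```

This relies on the wanted part containing no hyphen, which holds for the two samples but is an assumption about the real data.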

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it takes every single number separately (100,2001,10,...)
Thank you.
/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds
EDIT:
this Regex will match the first group of numbers, but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string
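For readers working in Java, the same idea could look roughly like this. The class name FirstNumber and the method firstNumber are hypothetical; \D is the standard shorthand for [^\d].

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // Skip any non-digits from the start, then capture the first run of digits.
    private static final Pattern FIRST = Pattern.compile("^\\D*(\\d+)");

    static Optional<String> firstNumber(String input) {
        Matcher m = FIRST.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("100 2011-10-20 14:28:55")); // Optional[100]
        System.out.println(firstNumber("abc 42 def 7"));            // Optional[42]
        System.out.println(firstNumber("no digits"));               // Optional.empty
    }
}
```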
Try this to match for first number in string (which can be not at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}
Just
([0-9]+) .*
If you always have the space after the first number, this will work
Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexes are comparatively expensive and are best avoided when a simple string operation will do.
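The code below would do the trick, provided the number is followed by a space (Integer.parseInt throws a NumberFormatException if the whole string is passed in):
Integer num = Integer.parseInt("100 2011-10-20 14:28:55".split(" ")[0]);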
[0-9] means the digits 0-9 can be used; the + means one or more times. If you use [0-9]{3}, you will get exactly 3 digits.
Try ^(?'num'[0-9]+).*$ which forces it to start at the beginning, read a number, store it in the group 'num' and consume the remainder without capturing it. (This is .NET named-group syntax; in Java, write (?<num>[0-9]+) instead.)
This string extension method works even when the string does not start with a number.
It returns 1234 in each of these cases: "1234asdfwewf", "%sdfsr1234", "## # 1234"
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
//replace non digit chars from string start
source = source.Replace(notNumber, string.Empty);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}
NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get the number that appears at the beginning of a string you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g, .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
See this regex demo.
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
See this regex demo.
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d)
or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or . and a digit) on the right.
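As a sketch of how the signed float-or-integer pattern above might be applied in Java (the class name LeadingNumber and the method leadingNumber are made up; note the doubled backslash in the string literal):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNumber {
    // Optional sign, optional integer part, optional dot, then at least one digit, anchored at the start.
    private static final Pattern NUMBER = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");

    static Optional<String> leadingNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("-3.5 apples")); // Optional[-3.5]
        System.out.println(leadingNumber(".35x"));        // Optional[.35]
        System.out.println(leadingNumber("x 5"));         // Optional.empty: no number at the start
    }
}
```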
public static void main(String []args){
Scanner s=new Scanner(System.in);
String str=s.nextLine();
Pattern p=Pattern.compile("[0-9]+");
Matcher m=p.matcher(str);
// find() stops at the first run of digits; use a while loop instead to print every number
if(m.find()){
System.out.println(m.group());
}
}
\d+
\d stands for any decimal digit, while + extends the match to any further digits that follow directly, until a non-digit character such as a space or a letter is reached.
