Consider:
String str = "XYhaku(ABH1235-123548)";
From the above string, I need only "ABH1235-123548" and so far I created a regular expression:
Pattern.compile("ABH\\d+")
But it returns false. So what is the correct regular expression for it?
I would just grab whatever is in the parentheses:
Pattern p = Pattern.compile("\\((?<data>[A-Z\\d]+\\-\\d+)\\)");
Or, if you want to be even more open (anything inside parentheses):
Pattern p = Pattern.compile("\\((?<data>.+)\\)");
Then just nab it:
String s = /* some input */;
Matcher m = p.matcher(s);
if (m.find()) { //just find first
String tag = m.group("data"); //ABH1235-123548
}
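A runnable sketch of the named-group approach follows. The helper name firstData and the second input "a(b)c(d)" are made up for illustration. It also shows that, for an open pattern, the choice between a greedy (.+) and a reluctant (.+?) quantifier matters once the input has more than one parenthesized part.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenExtractDemo {
    // Returns the named group "data" of the first match, or null if nothing matches.
    // (Hypothetical helper, not part of the original answer.)
    static String firstData(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("data") : null;
    }

    public static void main(String[] args) {
        String strict = "\\((?<data>[A-Z\\d]+\\-\\d+)\\)";
        System.out.println(firstData(strict, "XYhaku(ABH1235-123548)")); // ABH1235-123548

        // Greedy: runs to the last closing parenthesis in the input.
        System.out.println(firstData("\\((?<data>.+)\\)", "a(b)c(d)"));  // b)c(d
        // Reluctant: stops at the first closing parenthesis.
        System.out.println(firstData("\\((?<data>.+?)\\)", "a(b)c(d)")); // b
    }
}
```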
\d only matches digits. To include other characters, use a character class:
Pattern.compile("ABH[\\d-]+")
Note that the - must be placed first or last in the character class, because otherwise it will be treated as a range indicator ([A-Z] matching every letter between A and Z, for example). Another way to avoid that would be to escape it, but that adds two more backslashes to your string...
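The question does not show how the pattern was applied. One plausible cause of the false result (an assumption) is a call to matches(), which requires the whole input to match, whereas find() looks for a matching substring. A minimal sketch, with extractTag as a made-up helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractDemo {
    // Returns the first "ABH..." tag found in the input, or null if there is none.
    static String extractTag(String input) {
        Matcher m = Pattern.compile("ABH[\\d-]+").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String str = "XYhaku(ABH1235-123548)";
        // matches() must consume the entire input, so it is false here.
        System.out.println(Pattern.compile("ABH[\\d-]+").matcher(str).matches()); // false
        // find() scans for a matching substring.
        System.out.println(extractTag(str)); // ABH1235-123548
    }
}
```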
I want to write a regular expression in Java that accepts a string containing letters, digits, - and spaces, any number of times and in any position.
The string should contain only the characters mentioned above and no other special characters. How do I write this regular expression in Java?
I tried the following. It works when I run it as a Java application.
But when I run the same code in a web application and accept the values through XML, it accepts '/'.
String test1 = null;
Scanner scan = new Scanner(System.in);
test1 = scan.nextLine();
String alphaExp = "^[a-zA-Z0-9-]*$";
Pattern r = Pattern.compile(alphaExp);
Matcher m = r.matcher(test1);
boolean flag = m.lookingAt();
System.out.println(flag);
Can anyone help me on this please?
You can try using POSIX character classes (see the java.util.regex.Pattern documentation):
Pattern p = Pattern.compile("^[\\p{Alnum}\\p{Space}-]*$");
Matcher m = p.matcher("asfsdf 1212sdfsd-gf121sdg5 4s");
boolean b = m.lookingAt();
With this regular expression, a string containing anything other than alphanumeric characters, whitespace, or - will not match.
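As a sketch of the same check wrapped in a reusable method: the class and method names below are invented for illustration, and matches() is used here instead of lookingAt() so that the anchors are not needed (a swapped-in choice, not what the answer above wrote).

```java
import java.util.regex.Pattern;

class AlnumSpaceCheck {
    // \p{Alnum} = ASCII letters and digits, \p{Space} = whitespace, trailing - is a literal hyphen.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{Alnum}\\p{Space}-]*");

    // True only if every character of s is in the allowed set.
    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("asfsdf 1212sdfsd-gf121sdg5 4s")); // true
        System.out.println(isAllowed("abc/def"));                       // false
    }
}
```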
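I think you're just missing a space in the character class, since you mentioned it in your text: ^[a-zA-Z0-9 -]*$
You can also add the Pattern.MULTILINE flag, which makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole input. Be aware that lookingAt() then only needs the first line to match: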
String alphaExp = "^[a-zA-Z0-9 -]*$";
Pattern r = Pattern.compile(alphaExp, Pattern.MULTILINE);
Matcher m = r.matcher(test1);
boolean flag = m.lookingAt();
Pay attention to the fact that the * quantifier matches zero or more times, so the pattern also accepts an empty string (empty lines or blank tokens such as "").
If you use + instead, as in "[\w\d\s-]+", it will match one or more characters (remember to write \\ for each \ in a Java string literal: "[\\w\\d\\s-]+"). Note that \w also matches the underscore.
Put differently, * is a quantifier equivalent to {0,} and + is equivalent to {1,}.
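The difference between the two quantifiers, and the effect of Pattern.MULTILINE on lookingAt(), can be checked with a small sketch. The method names and the sample input "abc\n/" are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Zero or more allowed characters: the empty string passes.
    static boolean star(String s) {
        return Pattern.matches("[a-zA-Z0-9 -]*", s);
    }

    // One or more allowed characters: the empty string is rejected.
    static boolean plus(String s) {
        return Pattern.matches("[a-zA-Z0-9 -]+", s);
    }

    // lookingAt() with the anchored pattern, with or without MULTILINE.
    static boolean lookingAt(String s, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        return Pattern.compile("^[a-zA-Z0-9 -]*$", flags).matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(star(""));  // true
        System.out.println(plus(""));  // false
        // With MULTILINE, $ matches before the line break, so only the first line is checked.
        System.out.println(lookingAt("abc\n/", true));  // true
        System.out.println(lookingAt("abc\n/", false)); // false
    }
}
```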
Can you help with this code?
It seems easy, but always fails.
@Test
public void normalizeString(){
StringBuilder ret = new StringBuilder();
//Matcher matches = Pattern.compile( "([A-Z0-9])" ).matcher("P-12345678-P");
Matcher matches = Pattern.compile( "([\\w])" ).matcher("P-12345678-P");
for (int i = 1; i < matches.groupCount(); i++)
ret.append(matches.group(i));
assertEquals("P12345678P", ret.toString());
}
Constructing a Matcher does not automatically perform any matching. That's in part because Matcher supports two distinct matching behaviors, differing in whether the match is implicitly anchored to the beginning of the Matcher's region. It appears that you could achieve your desired result like so:
@Test
public void normalizeString(){
StringBuilder ret = new StringBuilder();
Matcher matches = Pattern.compile( "[A-Z0-9]+" ).matcher("P-12345678-P");
while (matches.find()) {
ret.append(matches.group());
}
assertEquals("P12345678P", ret.toString());
}
Note in particular the invocation of Matcher.find(), which was a key omission from your version. Also, the nullary Matcher.group() returns the substring matched by the last find().
Furthermore, although your use of Matcher.groupCount() isn't exactly wrong, it does lead me to suspect that you have the wrong idea about what it does. In particular, in your code it will always return 1 -- it inquires about the pattern, not about matches to it.
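Both points can be verified with a short sketch (class and method names are invented for illustration): groupCount() reports the number of capturing groups in the pattern even before any matching has happened, and asking for a group before find() or matches() throws IllegalStateException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups in the pattern; no match attempt is made.
    static int groupsInPattern(String regex, String input) {
        return Pattern.compile(regex).matcher(input).groupCount();
    }

    // Returns true if calling group(1) before any match attempt throws.
    static boolean groupBeforeFindThrows() {
        Matcher m = Pattern.compile("([\\w])").matcher("P-12345678-P");
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(groupsInPattern("([\\w])", "P-12345678-P")); // 1
        System.out.println(groupsInPattern("[A-Z0-9]+", "P-12345678-P")); // 0
        System.out.println(groupBeforeFindThrows()); // true
    }
}
```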
First of all, you don't need to add any group, because the entire match can always be accessed as group 0. So instead of
(regex) and group(1)
you can use
regex and group(0)
The next thing is that \\w is already a character class, so you don't need to surround it with another [ ]; that would be like [[a-z]], which is the same as [a-z].
Now in your
for (int i = 1; i < matches.groupCount(); i++)
ret.append(matches.group(i));
you iterate over the groups starting from 1 but exclude the last one: groups are indexed from 1 to n, so i < n never reaches n. You would need to use i <= matches.groupCount() instead.
It also looks like you are confusing two things. This loop does not find all matches of the regex in the input. Such a loop is used to iterate over the groups of the regex after a match has been found.
So if the regex were something like (\w(\w))c and the match were abc, then
for (int i = 1; i <= matches.groupCount(); i++)
System.out.println(matches.group(i));
would print
ab
b
because
the first group contains the two characters (\w(\w)) before c
the second group is the one nested inside the first, right after the first character.
But to print them you first need to let the regex engine iterate over your input and find() a match, or check whether the entire input matches() the regex. Otherwise you get an IllegalStateException, because the regex engine cannot know which match you want the groups from (there can be many matches of the regex in the input).
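The nested-group example can be run as a minimal sketch; the helper name groups is made up for illustration. It calls find() first and then walks the groups from 1 through groupCount() inclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupsDemo {
    // Collects groups 1..groupCount() of the first match; empty list if there is no match.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) { // a match must be found before groups can be read
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("(\\w(\\w))c", "abc")); // [ab, b]
    }
}
```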
So what you may want to use is something like
StringBuilder ret = new StringBuilder();
Matcher matches = Pattern.compile( "[A-Z0-9]" ).matcher("P-12345678-P");
while (matches.find()){//find next match
ret.append(matches.group(0));
}
assertEquals("P12345678P", ret.toString());
The other (and probably simpler) solution is to remove all the characters you don't want from your input. You could just use replaceAll with a negated character class [^...], like
String input = "P-12345678-P";
String result = input.replaceAll("[^A-Z0-9]+", "");
which produces a new string in which all characters other than A-Z and 0-9 are removed (replaced with "").
I'm trying to validate an input string whose tokens are separated by single spaces.
The first numeric string (one, two, three, ...) is mandatory and may be followed by an optional second numeric string; then comes an operator, followed by the same numeric-string pattern.
For example,
ONE TWO ADD TWO FIVE // which is valid
ONE ADD TWO // which is valid
TWO SUB FIVE // also is valid
SUB TWO // is not valid
How can I approach this using a regex? I have only just started using Java's Pattern and Matcher classes.
public boolean validate(String inputStr) {
// pattern regex
/* (zero|one|two|three|four|five|six|seven|eight|nine)\\s(zero|one|two|three|four|five|six|seven|eight|nine)?\\s(add|sub) */
Pattern p = Pattern.compile("(zero|one|two|three|four|five|six|seven|eight|nine)\\s[(zero|one|two|three|four|five|six|seven|eight|nine)]?\\s(add|sub|divide|multiply)\\s(zero|one|two|three|four|five|six|seven|eight|nine)", Pattern.CASE_INSENSITIVE);
// input string
Matcher m = p.matcher(inputStr);
return m.matches();
}
It returns false.
boolean isValidate = validate("One add two ");
System.out.println(isValidate);
Can anyone help me with this? Thanks.
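This happens for two reasons. First, [(zero|one|...)]? is a character class, which matches a single character rather than an optional word; use a group ( ) instead. Second, both \\s are mandatory, so whenever the optional numeric string is absent the pattern still expects two whitespace characters after the first word, which are not there, and you get false. Move the whitespace inside the optional group. Also note that your test input "One add two " ends with a space, which matches() will reject unless you trim it first.
Try this: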
Pattern p = Pattern.compile("((zero|one|two|three|four|five|six|seven|eight|nine)\\s){1,2}(add|sub|divide|multiply)(\\s(zero|one|two|three|four|five|six|seven|eight|nine)){1,2}", Pattern.CASE_INSENSITIVE);
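A self-contained sketch of a validator built on that pattern, checked against the inputs from the question. The class name, the NUM constant, the non-capturing groups and the trim() call are assumptions added here, not part of the answer above.

```java
import java.util.regex.Pattern;

class NumberPhraseValidator {
    // Alternation of the number words, as a non-capturing group.
    private static final String NUM =
            "(?:zero|one|two|three|four|five|six|seven|eight|nine)";

    // One or two number words, an operator, then one or two number words.
    private static final Pattern PHRASE = Pattern.compile(
            "(?:" + NUM + "\\s){1,2}(?:add|sub|divide|multiply)(?:\\s" + NUM + "){1,2}",
            Pattern.CASE_INSENSITIVE);

    static boolean validate(String inputStr) {
        // trim() so that leading or trailing spaces do not defeat matches().
        return PHRASE.matcher(inputStr.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(validate("ONE TWO ADD TWO FIVE")); // true
        System.out.println(validate("ONE ADD TWO"));          // true
        System.out.println(validate("TWO SUB FIVE"));         // true
        System.out.println(validate("SUB TWO"));              // false
        System.out.println(validate("One add two "));         // true
    }
}
```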
I'm new to regular expressions...
I have a problem writing a regular expression that matches a string containing only:
0-9, a-z, A-Z, space, comma, and single quote.
If the string contains any character other than those above, it is invalid.
Is that something like:
Pattern p = Pattern.compile("\\s[a-zA-Z0-9,']");
Matcher m = p.matcher("to be or not");
boolean b = m.lookingAt();
Thank you!
Fix your expression adding bounds:
Pattern p = Pattern.compile("^\\s[a-zA-Z0-9,']+$");
Now you can call m.find() and be sure that it returns true only if your string consists solely of the listed characters.
BTW, is it a mistake that you put \\s at the beginning? It means that the string must start with a single whitespace character. If that is not a requirement, just remove it.
You need to include the space inside the character class and allow more than one character:
Pattern p = Pattern.compile("[\\sa-zA-Z0-9,']*");
Matcher m = p.matcher("to be or not");
boolean b = m.matches();
Note that \s will match any whitespace character (including newlines, tabs, carriage returns, etc.) and not only the space character.
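To see the difference between \s and a literal space in the character class, here is a short sketch; the method names and the tab-containing sample input are made up for illustration.

```java
import java.util.regex.Pattern;

class WhitespaceClassDemo {
    // Accepts any whitespace (space, tab, newline, ...) plus the listed characters.
    static boolean withBackslashS(String s) {
        return Pattern.matches("[\\sa-zA-Z0-9,']*", s);
    }

    // Accepts only the plain space character plus the listed characters.
    static boolean withLiteralSpace(String s) {
        return Pattern.matches("[ a-zA-Z0-9,']*", s);
    }

    public static void main(String[] args) {
        System.out.println(withBackslashS("to be or not"));   // true
        System.out.println(withLiteralSpace("to be or not")); // true
        System.out.println(withBackslashS("to\tbe"));         // true
        System.out.println(withLiteralSpace("to\tbe"));       // false
    }
}
```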
You probably want something like this:
"^[a-zA-Z0-9,' ]+$"
I want to search for a given string pattern in an input string.
For example:
String URL = "https://localhost:8080/sbs/01.00/sip/dreamworks/v/01.00/cui/print/$fwVer/{$fwVer}/$lang/en/$model/{$model}/$region/us/$imageBg/{$imageBg}/$imageH/{$imageH}/$imageSz/{$imageSz}/$imageW/{$imageW}/movie/Kung_Fu_Panda_two/categories/3D_Pix/item/{item}/_back/2?$uniqueID={$uniqueID}"
Now I need to search whether the string URL contains "/{item}/". Please help me.
This is just an example. What I actually need is to check whether the URL contains a string matching "/{a-zA-Z0-9}/".
You can use the Pattern class for this. If you want to match only word characters inside the {} then you can use the following regex. \w is a shorthand for [a-zA-Z0-9_]. If you are ok with _ then use \w or else use [a-zA-Z0-9].
String URL = "https://localhost:8080/sbs/01.00/sip/dreamworks/v/01.00/cui/print/$fwVer/{$fwVer}/$lang/en/$model/{$model}/$region/us/$imageBg/{$imageBg}/$imageH/{$imageH}/$imageSz/{$imageSz}/$imageW/{$imageW}/movie/Kung_Fu_Panda_two/categories/3D_Pix/item/{item}/_back/2?$uniqueID={$uniqueID}";
Pattern pattern = Pattern.compile("/\\{\\w+\\}/");
Matcher matcher = pattern.matcher(URL);
if (matcher.find()) {
System.out.println(matcher.group(0)); //prints /{item}/
} else {
System.out.println("Match not found");
}
That's just a matter of String.contains:
if (input.contains("{item}"))
If you need to know where it occurs, you can use indexOf:
int index = input.indexOf("{item}");
if (index != -1) // -1 means "not found"
{
...
}
That's fine for matching exact strings - if you need real patterns (e.g. "three digits followed by at most 2 letters A-C") then you should look into regular expressions.
EDIT: Okay, it sounds like you do want regular expressions. You might want something like this:
private static final Pattern URL_PATTERN =
Pattern.compile("/\\{[a-zA-Z0-9]+\\}/");
...
if (URL_PATTERN.matcher(input).find())
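The practical difference between \w+ and [a-zA-Z0-9]+ inside the braces shows up with underscores and with segments such as {$fwVer}. A sketch using shortened, made-up paths (not the full URL from the question) and a hypothetical helper firstMatch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceSegmentDemo {
    // Returns the first substring matching the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String path = "/print/$fwVer/{$fwVer}/item/{item}/_back/2";
        // {$fwVer} is skipped because $ is not a word character.
        System.out.println(firstMatch("/\\{\\w+\\}/", path)); // /{item}/

        String withUnderscore = "/x/{my_item}/y";
        System.out.println(firstMatch("/\\{\\w+\\}/", withUnderscore));         // /{my_item}/
        System.out.println(firstMatch("/\\{[a-zA-Z0-9]+\\}/", withUnderscore)); // null
    }
}
```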
If you want to check whether some string is present in another string, use something like String.contains.
If you want to check whether some pattern is present in a string, prepend and append '.*' to the pattern. The result will accept strings that contain the pattern.
Example: Suppose you have some regex a(b|c) that checks whether a string matches ab or ac.
.*(a(b|c)).* will check whether a string contains ab or ac.
A disadvantage of this method is that it will not give you the location of the match; you can use java.util.regex.Matcher.find() if you need the position of the match.
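Both variants side by side, as a rough sketch with invented method names. One caveat worth knowing: by default . does not match line terminators, so the wrapped pattern can miss a match in multi-line input unless Pattern.DOTALL is set, while find() is unaffected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContainsPatternDemo {
    // Wraps the regex in .* on both sides and requires the whole input to match.
    static boolean containsViaMatches(String regex, String input) {
        return Pattern.matches(".*(" + regex + ").*", input);
    }

    // Index of the first match, or -1 if the regex does not occur in the input.
    static int firstIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(containsViaMatches("a(b|c)", "xxacxx")); // true
        System.out.println(firstIndex("a(b|c)", "xxacxx"));         // 2
        // The line break stops .* from reaching the match; find() still sees it.
        System.out.println(containsViaMatches("a(b|c)", "x\nac"));  // false
        System.out.println(firstIndex("a(b|c)", "x\nac"));          // 2
    }
}
```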
You can do it using string.indexOf("{item}"). If the result is greater than -1, {item} is in the string.