Java/Regex - finding characters anywhere in a String


I have a series of strings in which I am searching for a particular combination of characters. I am looking for a digit, followed by the letter m or M, followed by a digit, then followed by the letter f or F.
An example string is "Class (4) 1m5f Good". The "1m5f" part is what I want to extract from the string.
Here is the code I have, which doesn't work.
Pattern distancePattern = Pattern.compile("\\^[0-9]{1}[m|M]{1}[0-9]{1}[f|F]{1}$\\");
Matcher distanceMatcher = distancePattern.matcher(raceDetails.toString());
while (distanceMatcher.find()) {
String word= distanceMatcher.group(0);
System.out.println(word);
}
Can anyone suggest what I am doing wrong?

The ^ and $ characters at the start and end of your regex are anchors - they're limiting you to strings that only consist of the pattern you're looking for. The first step is to remove those.
You can then either use word boundaries (\b) to limit the pattern you're looking for to be an entire word, like this:
Pattern distancePattern = Pattern.compile("\\b\\d[mM]\\d[fF]\\b");
...or, if you don't mind your pattern appearing in the middle of a word, e.g., "Class (4) a1m5f Good", you can drop the word boundaries:
Pattern distancePattern = Pattern.compile("\\d[mM]\\d[fF]");
Quick notes:
You don't really need the {1}s everywhere - the default assumption is that a character or character class occurs once.
You can replace the [0-9] character class with \d (it means the same thing).
Both links are to regular-expressions.info, a great resource for learning about regexes that I highly recommend you check out :)
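To see the word-boundary version in action, here is a minimal sketch. The class name DistanceFinder and the helper findDistances are made up for illustration; only the pattern comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DistanceFinder {
    // Digit, m or M, digit, f or F, as a whole word (no anchors).
    private static final Pattern DISTANCE = Pattern.compile("\\b\\d[mM]\\d[fF]\\b");

    // Hypothetical helper: collects every match found anywhere in the input.
    public static List<String> findDistances(String raceDetails) {
        List<String> found = new ArrayList<>();
        Matcher m = DISTANCE.matcher(raceDetails);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDistances("Class (4) 1m5f Good"));  // prints [1m5f]
        System.out.println(findDistances("Class (4) a1m5f Good")); // prints []
    }
}
```

The second call finds nothing because there is no word boundary between "a" and "1"; dropping the \\b pair would make it match.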

I'd use word boundaries \b:
\b\d[mM]\d[fF]\b
for java, backslashes are to be escaped:
\\b\\d[mM]\\d[fF]\\b
{1} is superfluous
[m|M] means m or | or M
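A quick way to convince yourself that the pipe is literal inside a character class (the class and method names here are only for illustration):

```java
public class PipeInClass {
    // Inside [...] the | is an ordinary character, not alternation.
    public static boolean matchesLoose(String s) {
        return s.matches("\\d[m|M]\\d[f|F]");
    }

    // Same pattern without the stray pipes.
    public static boolean matchesStrict(String s) {
        return s.matches("\\d[mM]\\d[fF]");
    }

    public static void main(String[] args) {
        System.out.println(matchesLoose("1|5f"));  // prints true
        System.out.println(matchesStrict("1|5f")); // prints false
        System.out.println(matchesStrict("1m5F")); // prints true
    }
}
```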

For the requirement of a digit, followed by the letter m or M, followed by a digit, then followed by the letter f or F, the regex can be simplified to:
Pattern distancePattern = Pattern.compile("(?i)\\dm\\df");
Where:
(?i) - For ignore case
\\d - For digits [0-9]
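A small sketch of the (?i) variant, assuming you only need the first occurrence; firstDistance is a hypothetical helper name and returns null when nothing is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveDistance {
    // (?i) turns on case-insensitive matching for the whole pattern.
    private static final Pattern DISTANCE = Pattern.compile("(?i)\\dm\\df");

    // Hypothetical helper: first match in the text, or null if there is none.
    public static String firstDistance(String text) {
        Matcher m = DISTANCE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDistance("Class (4) 1M5F Good")); // prints 1M5F
        System.out.println(firstDistance("Class (4) 1m5f Good")); // prints 1m5f
        System.out.println(firstDistance("Class (4) Good"));      // prints null
    }
}
```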

Related

Exclude a letter in Regex Pattern

I am trying to create a Regex pattern for <String>-<String>. This is my current pattern:
(\w+\-\w+).
The first String is not allowed to be "W". However, it can still contain "W"s if it's more than one letter long.
For example:
W-80 -> invalid
W42-80 -> valid
How can this be achieved?

So your first string can be either: one character but not W or 2+ characters. Simple pattern to achieve that is:
([^W]|\w{2,})-\w+
But this pattern is not entirely correct, because now it allows any character for first part, but originally only \w characters were expected to be allowed. So correct pattern is:
([\w&&[^W]]|\w{2,})-\w+
Pattern [\w&&[^W]] means any character from \w character class except W character.
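Here is a minimal sketch of that pattern checked against the examples from the question. The class name and the isValid helper are invented for the example; matches() is used so the whole input has to fit the pattern.

```java
import java.util.regex.Pattern;

public class FirstTermNotW {
    // First term: one word character other than W, or two or more word characters.
    private static final Pattern TERM_PAIR =
            Pattern.compile("([\\w&&[^W]]|\\w{2,})-\\w+");

    public static boolean isValid(String s) {
        return TERM_PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("W-80"));   // prints false
        System.out.println(isValid("W42-80")); // prints true
        System.out.println(isValid("A-80"));   // prints true
        System.out.println(isValid("?-80"));   // prints false
    }
}
```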

Just restrict the last char to "any word char except 'W'".
There are a couple of ways to do this:
Negative look-behind (easy to read):
^\w+(?<!W)-\w+$
Negated intersection (trainwreck to read):
^\w*[\w&&[^W]]-\w+$
——
The question has shifted. Here’s a new take:
^.+(?<!^W)-\w+
This allows anything as the first term except just "W".

Regex first character not matching

I am having some Java Pattern problems. This is my pattern:
"^[\\p{L}\\p{Digit}~._-]+$"
It matches any letter of the US-ASCII, numerals, some special characters, basically anything that wouldn't scramble a URL.
What I would like is to find the first letter in a word that does not match this pattern. Basically the user sends a text as an input and I have to validate it and to throw an exception if I find an illegal character.
I tried negating this pattern, but it wouldn't compile properly. Also find() didn't help out much.
A legal input would be hello while ?hello should not be, and my exception should point out that ? is not proper.
I would prefer a suggestion using Java's Matcher, Pattern or something using util.regex. It's not a necessity, but checking each character in the string individually is not a solution.
Edit: I came up with a better regex to match unreserved URI characters

Try this:
^[\\p{L}\\p{Digit}.'-.'_]*([^\\p{L}\\p{Digit}.'-.'_]).*$
The first non-matching character is captured in group 1.
I made a few tries here: http://fiddle.re/gkkzm61
Explanation:
I negated your pattern, which gives:
[^\\p{L}\\p{Digit}.'-.'_]
[^...] means every character except the ones listed inside the brackets, which here are the contents of your pattern.
The pattern has 3 parts:
^[\\p{L}\\p{Digit}.'-.'_]*
Matches from the first character until it meets a non-matching character.
([^\\p{L}\\p{Digit}.'-.'_])
The non-matching character (negation) inside a capturing group.
.*$
Any character until the end of the string.
Hope it helps.
EDIT:
The correct regex should be:
^[\\p{L}\\p{Digit}~._-]*([^\\p{L}\\p{Digit}~._-]).*$
It is the same method; I only changed the contents of the first and second parts.
I tried it and it seems to work.
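For completeness, a rough sketch of how the corrected regex could be wired into a check. The class name and the firstIllegal helper are assumptions for the example; it returns null when every character is allowed, and the caller can throw whatever exception suits them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstIllegalChar {
    // Allowed prefix, then one captured disallowed character, then the rest.
    private static final Pattern FIRST_BAD = Pattern.compile(
            "^[\\p{L}\\p{Digit}~._-]*([^\\p{L}\\p{Digit}~._-]).*$");

    // Hypothetical helper: the first disallowed character, or null if none.
    public static Character firstIllegal(String input) {
        Matcher m = FIRST_BAD.matcher(input);
        return m.matches() ? m.group(1).charAt(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegal("hello"));  // prints null
        System.out.println(firstIllegal("?hello")); // prints ?
        System.out.println(firstIllegal("hel?lo")); // prints ?
    }
}
```

Note that . does not match line terminators by default, so input containing newlines would need the DOTALL flag.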

The "^[\\p{L}\\p{Digit}.'-.'_]+$" pattern matches any string containing 1+ characters defined inside the character class. Note that double ' and . are suspicious and you might be unaware of the fact that '-. creates a range and matches '()*+,-.. If it is not on purpose, I think you meant to use .'_-.
To check if a string starts with a character other than the ones defined in the character class, you can negate the character class and check only the first character in the string:
if (str.matches("[^\\p{L}\\p{Digit}.'_-].*")) {
/* String starts with the disallowed character */
}
I also think you can shorten the regex to "(?U)[^\\w.'-].*". At any rate, \\p{Digit} can be replaced with \\d.

Try out this one to find the first non valid char:
Pattern negPattern = Pattern.compile(".*?([^\\p{L}^\\p{Digit}^.^'-.'^_]+).*");
Matcher matcher = negPattern.matcher("hel?lo");
if (matcher.matches())
{
System.out.println("'" + matcher.group(1).charAt(0) + "'");
}

How to subString based on the special character?

I have a String like the one below, and I want to extract the substring that follows each special character.
String myString="Regular $express&ions are <patterns <that can# be %matched against *strings";
I want output like this:
express
ions
patterns
that
matched
strings
Can anyone help me? Thanks in advance.

Note: as @MaxZoom pointed out, it seems that I didn't understand the OP's problem properly. The OP apparently does not want to split the string on special characters, but rather keep the words starting with a special character. The former is addressed by my answer, the latter by @MaxZoom's answer.
You should take a look at the String.split() method.
Give it a regexp matching all the characters you want, and you'll get an array of all the strings you want. For instance:
String myString = "Regular $express&ions are <patterns <that can# be %matched against *strings";
String[] words = myString.split("[$&<#%*]");
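As a runnable sketch of what split() gives you here (the wrapper class and method names are just for illustration):

```java
import java.util.Arrays;

public class SplitOnSpecials {
    // Splits on any single one of the listed special characters.
    public static String[] splitOnSpecials(String s) {
        return s.split("[$&<#%*]");
    }

    public static void main(String[] args) {
        String myString =
                "Regular $express&ions are <patterns <that can# be %matched against *strings";
        System.out.println(Arrays.toString(splitOnSpecials(myString)));
    }
}
```

This produces eight pieces, and the text between two special characters comes back whole ("ions are ", "that can", and so on), which is why it does not give the word list the OP asked for.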

This regex will select words that start with a special character:
[$&<%*](\w*)
explanation:
[$&<%*] match a single character present in the list below
$&<%* a single character in the list $&<%* literally (case sensitive)
1st Capturing group (\w*)
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
g modifier: global. All matches (don't return on first match)
DEMO
MATCH 1 [9-16] express
MATCH 2 [17-21] ions
MATCH 3 [27-35] patterns
MATCH 4 [37-41] that
MATCH 5 [51-58] matched
MATCH 6 [68-75] strings
Solution in Java code:
String str = "Regular $express&ions are <patterns <that can# be %matched against *strings";
Matcher matcher = Pattern.compile("[$&<%*](\\w*)").matcher(str);
List<String> words = new ArrayList<>();
while (matcher.find()) {
words.add(matcher.group(1));
}
System.out.println(words.toString());
// prints [express, ions, patterns, that, matched, strings]

Match word in String in Java

I'm trying to match Strings that contain the word "#SP" (sans quotes, case insensitive) in Java. However, I'm finding using Regexes very difficult!
Strings I need to match:
"This is a sample #sp string",
"#SP string text...",
"String text #Sp"
Strings I do not want to match:
"Anything with #Spider",
"#Spin #Spoon #SPORK"
Here's what I have so far: http://ideone.com/B7hHkR. Could someone guide me through building my regexp?
I've also tried: "\\w*\\s*#sp\\w*\\s*" to no avail.
Edit: Here's the code from IDEone:
java.util.regex.Pattern p =
java.util.regex.Pattern.compile("\\b#SP\\b",
java.util.regex.Pattern.CASE_INSENSITIVE);
java.util.regex.Matcher m = p.matcher("s #SP s");
if (m.find()) {
System.out.println("Match!");
}

(edit: positive lookbehind not needed, only matching is done, not replacement)
You are yet another victim of Java's misnamed regex matching methods.
.matches(), quite unfortunately, tries to match the whole input, which clearly violates the usual definition of "regex matching" (a regex can match anywhere in the input). The method you need to use is .find().
This is a braindead API, and unfortunately Java is not the only language having such misguided method names. Python also pleads guilty.
Also, you have the problem that \\b will detect on word boundaries and # is not part of a word. You need to use an alternation detecting either the beginning of input or a space.
Your code would need to look like this (non fully qualified classes):
Pattern p = Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("s #SP s");
if (m.find()) {
System.out.println("Match!");
}
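Here is that pattern run against all the sample strings from the question, as a minimal sketch. The class name HashTagSp and the containsTag helper are invented for the example.

```java
import java.util.regex.Pattern;

public class HashTagSp {
    // Start of input or whitespace, then #SP, then a word boundary.
    private static final Pattern TAG =
            Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

    public static boolean containsTag(String text) {
        return TAG.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsTag("This is a sample #sp string")); // prints true
        System.out.println(containsTag("#SP string text..."));          // prints true
        System.out.println(containsTag("String text #Sp"));             // prints true
        System.out.println(containsTag("Anything with #Spider"));       // prints false
        System.out.println(containsTag("#Spin #Spoon #SPORK"));         // prints false
    }
}
```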

You're doing fine, but the \b in front of the # is misleading. \b is a word boundary, but # is not a word character (i.e. it isn't in the set [0-9A-Za-z_]). Therefore, the position between the space and the # isn't considered a word boundary. Change to:
java.util.regex.Pattern p =
java.util.regex.Pattern.compile("(^|\\s)#SP\\b",
java.util.regex.Pattern.CASE_INSENSITIVE);
The (^|\s) means: match either ^ OR \s, where ^ means the beginning of your string (e.g. "#SP String"), and \s means a whitespace character.

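The regular expression "\\w*\\s*#sp\\w*\\s*" will match 0 or more word characters, followed by 0 or more whitespace characters, followed by #sp, followed by 0 or more word characters, followed by 0 or more whitespace characters. My suggestion is to not use \s* to break words up in your expression; use \b after the tag instead. A \b directly in front of # cannot match after a space, because # is not a word character, so anchor the front on the start of input or whitespace:
"(^|\\s)#sp\\b"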

Weird behaviour on regexp Matcher

My regexp below is supposed to filter out capital words with a length of 8-10, where 0-2 numbers may appear. It has been working for all of my tests, but for some reason it got stuck on the string below. And n.group(0) only contains an empty string instead of the matched "word".
static final Pattern PATTERN =
Pattern.compile("\\b(?=[A-Z\\d]{9,10}\\b)(?:[A-Z]*\\d){0,2}[A-Z]*\\b");
Matcher n = PATTERN.matcher("foo ID:636152727 bar");
while (n.find()) {
String s = n.group(0);
resultArrayList.add(s);
}
Why does my pattern match ID:636152727?
Some examples that I want to filter out (which is working):
AAAAAAAAAA
1AAAAAAAAA
1AAAAAAAA1
etc...

I don't have a better solution to offer than the one in Ωmega's answer, but I think I can explain what's happening. What it boils down to is that the first \b and the last \b are matching the same spot: right after the colon.
That's the first place where the lookahead can match, since it's followed by nine digits and a word boundary. Then the next part of the regex tries to match two digits (interspersed with any number of uppercase letters) followed by a word boundary, and fails. So it tries to match just one digit (ditto), and fails again. Then it tries matching zero digits (interspersed with zero letters), and it succeeds, without advancing the match position. That position is still a word boundary, so the final \b succeeds as well.
A word boundary is just another zero-width assertion, like lookaheads and lookbehinds. There's no reason why two or more can't be applied at the same spot; you did that on purpose with the first word boundary and the lookahead. Some regex flavors treat it as an error if you apply a quantifier to an assertion (like \b+), but I don't think any of them would catch this problem. This is one of those rare instances where separate start-of-word and end-of-word assertions, like GNU's \< and \> or Tcl's \m and \M, would make a difference.
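If you want to watch the empty match happen, here is a small sketch using the pattern from the question. The class name and the findAll helper are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // The pattern from the question, unchanged.
    private static final Pattern PATTERN = Pattern.compile(
            "\\b(?=[A-Z\\d]{9,10}\\b)(?:[A-Z]*\\d){0,2}[A-Z]*\\b");

    // Hypothetical helper: every match the pattern finds, including empty ones.
    public static List<String> findAll(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            results.add(m.group(0));
        }
        return results;
    }

    public static void main(String[] args) {
        // One zero-length match, found right after the colon.
        System.out.println(findAll("foo ID:636152727 bar").size()); // prints 1
        System.out.println(findAll("AAAAAAAAAA"));                  // prints [AAAAAAAAAA]
    }
}
```

The single result for the first input is the empty string: both \b assertions and the lookahead all succeed at the same position without consuming anything.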

You need to use the anchors ^ and $:
Pattern.compile("^(?=[A-Z\\d]{9,10}$)(?:[A-Z]*\\d){0,2}[A-Z]*$");
To match such words inside a longer string, use this pattern instead:
"(?:^|(?<=\\s))(?=[A-Z\\d]{9,10}(?:\\s|$))(?:[A-Z]*\\d){0,2}[A-Z]*(?=\\s|$)"

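A short sketch of the whitespace-delimited pattern from the answer above, assuming words are separated by whitespace. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceDelimited {
    // Candidate must start at input start or after whitespace,
    // and end at whitespace or input end.
    private static final Pattern PATTERN = Pattern.compile(
            "(?:^|(?<=\\s))(?=[A-Z\\d]{9,10}(?:\\s|$))(?:[A-Z]*\\d){0,2}[A-Z]*(?=\\s|$)");

    public static List<String> findAll(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            results.add(m.group(0));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo ID:636152727 bar")); // prints []
        System.out.println(findAll("foo 1AAAAAAAA1 bar"));   // prints [1AAAAAAAA1]
        System.out.println(findAll("foo 636152727 bar"));    // prints []
    }
}
```

Because the closing lookahead has to sit at real whitespace or the end of input, the zero-width match after the colon is no longer possible.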