How to check if substring is contained within word Java


I want to write a regex pattern that looks at a string to see if there is a "." followed by letters or numbers or both with no space in between.
Currently I have:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[a-z]+", testStr)
But this doesn't work if there are numbers or symbols after the ".". Can someone help me find a regex string that will return true for the string:
asdad-asdd/asdcs.pd(210)fsd
Just to reiterate, the criterion for a successful match is that the string contains any possible combination of letters, numbers, and/or symbols before and after a ".".

You can match these strings by replacing [a-z]+ with [a-z\d\p{Punct}]+:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[a-z\\d\\p{Punct}]+", testStr)
The [a-z\d\p{Punct}]+ pattern matches one or more lowercase ASCII letters, digits or punctuation characters. Add A-Z into the brackets if you plan to allow uppercase ASCII letters.
However, you might also match any non-whitespace chars with \S+:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.\\S+", testStr)
If you do not want to allow another dot after the one matched by \\.:
Pattern.matches(".*(\\W+|\\d+|[a-z]+)\\.[^\\s.]+", testStr)
Here, [^\\s.]+ matches one or more chars other than whitespace and . chars.
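
As a rough way to compare the variants above, here is a minimal sketch that runs each of them against the test string from the question. The class and method names (DotSuffixCheck, lettersOnly and so on) are made up for this illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for comparing the patterns discussed above.
final class DotSuffixCheck {
    // Part shared by all variants: anything, then at least one char, then a literal dot.
    private static final String PREFIX = ".*(\\W+|\\d+|[a-z]+)\\.";

    // The asker's original attempt: only lowercase letters after the dot.
    static boolean lettersOnly(String s) {
        return Pattern.matches(PREFIX + "[a-z]+", s);
    }

    // Lowercase letters, digits or punctuation after the dot.
    static boolean lettersDigitsPunct(String s) {
        return Pattern.matches(PREFIX + "[a-z\\d\\p{Punct}]+", s);
    }

    // Any non-whitespace chars after the dot.
    static boolean nonWhitespace(String s) {
        return Pattern.matches(PREFIX + "\\S+", s);
    }

    // Any chars except whitespace and dots after the dot.
    static boolean noFurtherDot(String s) {
        return Pattern.matches(PREFIX + "[^\\s.]+", s);
    }

    public static void main(String[] args) {
        String testStr = "asdad-asdd/asdcs.pd(210)fsd";
        System.out.println(lettersOnly(testStr));        // false
        System.out.println(lettersDigitsPunct(testStr)); // true
        System.out.println(nonWhitespace(testStr));      // true
        System.out.println(noFurtherDot(testStr));       // true
        // A trailing dot separates the last two variants:
        System.out.println(nonWhitespace("ab.c."));      // true
        System.out.println(noFurtherDot("ab.c."));       // false
    }
}
```

Note that Pattern.matches requires the whole input to match, which is why every variant starts with .* instead of searching for the dot somewhere in the string.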

Related

Regex to check if string contains only one uppercase

What is the regex to make sure that a given string contains exactly one uppercase letter? If the string contains more than one, I don't want it to be matched.
just 1 Uppercase character
I know the patterns for individual sets namely [a-z], [A-Z], \d and _|[^\w] (I got them correct, didn't I?).
But how do I make them match only strings (in Java) that contain exactly 1 uppercase letter?

You may use this regex with 2 negated character classes:
^[^A-Z]*[A-Z][^A-Z]*$
The above regex supports ASCII uppercase letters only. If you want to match Unicode letters, use:
^\P{Lu}*\p{Lu}\P{Lu}*$
Here:
\P{Lu}*: Match 0 or more characters that are not uppercase Unicode letters
\p{Lu}: Match a single uppercase Unicode letter
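
To see how the ASCII and Unicode variants differ, the following sketch applies both with String.matches. The class and method names are invented for the example.

```java
// Hypothetical helper class illustrating the two patterns above.
final class SingleUppercase {
    // Exactly one ASCII uppercase letter, any other chars around it.
    static boolean asciiOnly(String s) {
        return s.matches("^[^A-Z]*[A-Z][^A-Z]*$");
    }

    // Exactly one uppercase letter in the Unicode sense (category Lu).
    static boolean unicode(String s) {
        return s.matches("^\\P{Lu}*\\p{Lu}\\P{Lu}*$");
    }

    public static void main(String[] args) {
        System.out.println(asciiOnly("helloWorld")); // true
        System.out.println(asciiOnly("HelloWorld")); // false, two uppercase letters
        System.out.println(asciiOnly("hello"));      // false, no uppercase letter
        // "\u00e9t\u00c9" is "étÉ": one uppercase letter, but not an ASCII one.
        System.out.println(asciiOnly("\u00e9t\u00c9")); // false
        System.out.println(unicode("\u00e9t\u00c9"));   // true
    }
}
```

String.matches already requires the whole string to match, so the ^ and $ anchors are redundant here. They are kept so the patterns read the same as in the answer.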

Regex expression to match on hyphens in words within sentence based on occurrences of hyphen

I am trying to match hyphens in a word, but only if the hyphen occurs in that word more than once.
So in the phrase "Step-By-Step" the hyphens would be matched whereas in the phrase "Coca-Cola", the hyphens would not be matched.
In a full sentence combining phrases "Step-By-Step and Coca-Cola" only the hyphens within "Step-By-Step" would be expected to match.
I have the following expression currently, but it matches all hyphens separated by non-digit characters regardless of the number of occurrences:
((?=\D)-(?<=\D))
I can't seem to get the quantifiers to work with this expression, any ideas?

Java Regex Solution:
(?<=-[^\s-]{0,999})-|-(?=[^\s-]*-)
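
A small sketch of how the Java pattern above might be used: it replaces every matched hyphen with an underscore so the matches become visible. The class name, method name and the choice of underscore are assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class demonstrating the Java pattern above.
final class RepeatedHyphens {
    // A hyphen that has another hyphen earlier or later in the same
    // whitespace-delimited word. Java needs a bounded quantifier in the
    // lookbehind, hence {0,999}.
    private static final Pattern REPEATED_HYPHEN =
            Pattern.compile("(?<=-[^\\s-]{0,999})-|-(?=[^\\s-]*-)");

    // Replaces each matched hyphen with "_" to make the matches visible.
    static String markHyphens(String text) {
        return REPEATED_HYPHEN.matcher(text).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(markHyphens("Step-By-Step and Coca-Cola"));
        // prints: Step_By_Step and Coca-Cola
    }
}
```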
PCRE Regex Solution:
Here is a way to match all hyphens in a line with more than one hyphen in PCRE:
(?:(?:^|\s)(?=(?:[^\s-]*-){2})|(?!^)\G)[^\s-]*\K-
Explanation:
[^\s-]* matches zero or more characters that are neither whitespace nor a hyphen
(?=(?:[^\s-]*-){2}) is a lookahead to make sure there are at least 2 hyphens in a non-whitespace substring
\G asserts position at the end of the previous match or the start of the string for the first match
\K resets the start of the reported match, so only the hyphen that follows is returned

This matches at least two words each followed by a hyphen, followed by another word (I'm assuming you don't want to allow a hyphen at the very beginning or end, only between words). Note that it matches the whole hyphenated word rather than the individual hyphens.
(\w+-){2,}\w+

Regex not matching against ampersand

I'm trying to match the following regex:
\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|&|and)\.?\b
In other words, a word boundary followed by any of the strings above (optionally followed by a period character) followed by a word boundary.
I'm trying to match this in Java, but the ampersand will not match. For example:
Pattern p = Pattern.compile(
"\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|&|and)\\.?\\b",
Pattern.CASE_INSENSITIVE);
String result = p.matcher("mr one and mrs.two and three & four").replaceAll(" ");
System.out.println("["+result+"]");
The output of this is: [ one two three & four]
I've also tried this at regex101, and again the ampersand does not match: https://regex101.com/r/klkmwl/1
Escaping the ampersand does not make a difference, and I've tried using the hex escape sequence \x26 instead of ampersand (as suggested in this question). Why is this not matching?

Your regex will match an ampersand if it is located between word chars, e.g. three&four. This happens because \b before a non-word char requires a word char to appear immediately before it. Also, as there is a \b after an optional dot, both the dot and the ampersand will only match if there is a word char immediately after them.
You need to re-write the pattern so that the word boundaries are applied to the words rather than symbols:
Pattern p = Pattern.compile(
"(?:\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|and)\\b|&)\\.?",
Pattern.CASE_INSENSITIVE);
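
The difference between the two patterns can be checked with a sketch like the one below, which runs the question's input through both. The class, constant and method names are hypothetical and chosen only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class comparing the question's pattern with the rewritten one.
final class TitleStripper {
    // Pattern from the question: \b surrounds everything, including "&".
    static final Pattern ORIGINAL = Pattern.compile(
            "\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|&|and)\\.?\\b",
            Pattern.CASE_INSENSITIVE);

    // Rewritten pattern: \b applies to the words only, "&" stands on its own.
    static final Pattern FIXED = Pattern.compile(
            "(?:\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|and)\\b|&)\\.?",
            Pattern.CASE_INSENSITIVE);

    // Replaces every match with a single space, as in the question.
    static String strip(Pattern p, String text) {
        return p.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "mr one and mrs.two and three & four";
        System.out.println("[" + strip(ORIGINAL, input) + "]"); // the "&" survives
        System.out.println("[" + strip(FIXED, input) + "]");    // the "&" is removed
        System.out.println(strip(ORIGINAL, "three&four"));      // prints: three four
    }
}
```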

The problem is due to the use of word boundaries. There is no word boundary between a non-word character like & and an adjacent space, because both are non-word characters.
In place of word boundary you can use lookarounds:
(?<!\w)(?:[jsdm]r|mr?s|miss|messrs|mmes|prof|rev|&|and)\.?(?!\w)
(?<!\w): Make sure that previous character is not a word character
(?!\w): Make sure that next character is not a word character
Note some tweaks in your regex to make it shorter.

RegEx for combining multiple sequences

Like many people, I am struggling with what seems to be a "trivial" regex issue.
In a given text, whenever I encounter a word within {} brackets I need to extract it. At first I used
"\\{-?(\\w{3,})\\}"
and it worked OK,
as long as the word didn't have any whitespace or special character like '.
For example, {Project} returns Project. But {Project Test} or {Project D'arce} don't return anything.
I know that for whitespace characters I need to use \s, but it is not at all clear to me how to add it to the above. I tried:
"%\\{-?(\\w(\\s{3,})\\)\\}"))
but it is not working. Also, what if I want to add words containing special characters like '? It's really frustrating.

How about matching any character inside {..} which is not }?
To do so you can use negated character class [^..] like [^}]. So your regex can look like
"\\{[^}]{3,}\\}"
But if you want to limit your regex only to some specific alphabet you can also use character class to combine many characters and even predefined shorthand character classes like \w \s \d and so on.
So if you want to accept any word character \w or whitespace \s or ' your regex can look like
"\\{[\\w\\s']{3,}\\}"

You could use a character class [\w\s'] and add to it whatever else you want to allow:
\{-?([\w\s']{3,})}
In Java
String regex = "\\{-?([\\w\\s']{3,})}";
If you want to prevent a match that consists only of whitespace chars (3 spaces, for example), you could use a repeating group:
\{-?\h*([\w']{3,}(?:\h+[\w']+)*)\h*}
About the pattern
\{ Match { char
-? Optional hyphen
\h* Match 0+ times a horizontal whitespace char
( Open capture group 1
[\w']{3,} Match 3 or more times either a word char or '
(?:\h+[\w']+)* Repeat 0+ times matching 1+ horizontal whitespace chars followed by 1+ chars listed in the character class
) Close capture group 1
\h* Match 0+ times a horizontal whitespace char
} Match }
In Java
String regex = "\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*}";
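
To try both patterns, a sketch along these lines collects group 1 of every match. The class name, constant names and the extract helper are assumptions for this example. The closing brace is escaped here, which Java accepts just like the unescaped form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for trying out the two patterns above.
final class BraceExtractor {
    // Word chars, whitespace and ' inside the braces, at least 3 chars in total.
    static final Pattern LOOSE = Pattern.compile("\\{-?([\\w\\s']{3,})\\}");

    // Words separated by horizontal whitespace, so whitespace alone is rejected.
    static final Pattern STRICT =
            Pattern.compile("\\{-?\\h*([\\w']{3,}(?:\\h+[\\w']+)*)\\h*\\}");

    // Returns the content of capture group 1 for every match in the text.
    static List<String> extract(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "{Project} and {Project Test} and {Project D'arce}";
        System.out.println(extract(LOOSE, text));   // [Project, Project Test, Project D'arce]
        System.out.println(extract(STRICT, text));  // [Project, Project Test, Project D'arce]
        System.out.println(extract(LOOSE, "{   }").size());  // 1, whitespace only is accepted
        System.out.println(extract(STRICT, "{   }").size()); // 0
    }
}
```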

regular expression to validate 2 alphanumerics

I have the following pattern to validate a string. It has to validate 4 letters, 6 numbers, 6 letters and 2 alphanumerics, but with my current pattern I can't get a valid test:
Pattern.compile("[A-Za-z]{4}\\d{6}\\w{6}\\[A-ZÑa-zñ0-9\\- ]{2}");
I think my pattern is wrong, because I'm not sure about this part: [A-ZÑa-zñ0-9\\- ]{2}
Can you please help me?

You can use pattern:
^[a-zA-Z]{4}[0-9]{6}[a-zA-Z]{6}[a-zA-Z0-9]{2}$
In your expression you are using \w{6}, which matches not only alphabetic characters but also digits and underscores (_).
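
As a quick check of the pattern above, here is a minimal sketch with a few sample inputs. The class name, method names and sample strings are invented for this illustration. The isValidLoose variant exists only to show what \w{6} would let through.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for checking the 4-6-6-2 format described above.
final class CodeValidator {
    // 4 letters, 6 digits, 6 letters, 2 alphanumerics.
    private static final Pattern STRICT =
            Pattern.compile("^[a-zA-Z]{4}[0-9]{6}[a-zA-Z]{6}[a-zA-Z0-9]{2}$");

    // Same layout, but with \w{6} for the third part, as in the question.
    private static final Pattern LOOSE =
            Pattern.compile("^[a-zA-Z]{4}[0-9]{6}\\w{6}[a-zA-Z0-9]{2}$");

    static boolean isValid(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean isValidLoose(String s) {
        return LOOSE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABCD123456EFGHIJ9Z"));      // true
        System.out.println(isValid("ABCD12345EFGHIJ9Z"));       // false, only 5 digits
        System.out.println(isValid("ABCD123456EFGH_J9Z"));      // false, underscore among the letters
        System.out.println(isValidLoose("ABCD123456EFGH_J9Z")); // true, \w accepts the underscore
    }
}
```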

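A few things are off in your regex.
The doubled backslashes in \\d and \\w are fine: inside a Java string literal, \\d is how you write the regex \d.
The \\ in front of the last [ is not needed. It escapes the bracket, so the regex looks for a literal [ instead of opening a character class.
The "\\- " bit inside the last class additionally allows a hyphen and a space. Remove it if only alphanumerics are allowed.
Do not slim down [A-Za-z] to \w, since \w also matches digits and underscores. For the same reason, use [A-Za-z]{6} for the 6 letters. So, your new regex should look like:
"[A-Za-z]{4}\\d{6}[A-Za-z]{6}[A-ZÑa-zñ0-9]{2}"
That is if you're okay with the only non-ASCII characters being Ñ and ñ in your last two alphanumerics.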
