An example of how the Strings may look:
TADE000177
TADE007,daFG
TADE0277 DFDFG
It's a little unclear what you want.
If you mean four capital letters from A to Z, followed by at least one digit from 0-9, you could try this:
"^[A-Z]{4}[0-9]+"
If instead of capital letters you want to allow any character except newline, change [A-Z] to . (a dot).
If you want to also allow zero digits change the + to a *.
Exactly four upper-case letters followed by 1 or more digits: [A-Z]{4}\d+
Remember to escape the backslash if you put it in a Java string literal: "[A-Z]{4}\\d+".
Breakdown:
[A-Z]…: an upper-case letter, equivalent to \p{Upper}
To also include lower-case letters, you could instead use [A-Za-z] or \p{Alpha}
…{4}…: exactly 4 times
…\d…: a digit
…+: 1 or more times
To allow 0 digits, you could change the + to *.
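A minimal Java sketch of the pattern above applied to the sample strings (the class and method names are made up for illustration). In Java the anchoring also depends on which Matcher method you call: lookingAt() behaves like a leading ^, while matches() needs the whole string to fit. That matters here because two of the samples have trailing text.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for the sample strings from the question.
class FourLettersThenDigits {
    // Four upper-case ASCII letters followed by one or more digits.
    // The Java literal "\\d" becomes the regex token \d.
    static final Pattern CODE = Pattern.compile("[A-Z]{4}\\d+");

    // lookingAt() only requires the pattern to match at the start of the
    // input (like a leading ^), so trailing text such as ",daFG" is allowed.
    static boolean startsWithCode(String s) {
        return CODE.matcher(s).lookingAt();
    }

    // matches() requires the whole input to match (like ^...$).
    static boolean isExactlyCode(String s) {
        return CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"TADE000177", "TADE007,daFG", "TADE0277 DFDFG", "TAD0001"};
        for (String s : samples) {
            System.out.println(s + " -> startsWithCode=" + startsWithCode(s)
                    + ", isExactlyCode=" + isExactlyCode(s));
        }
    }
}
```

With + changed to *, "TADE" on its own would also be accepted.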
If I understood correctly what you're asking for, you can try: .{4}\d*
^\w{4}.*$
Matches a string starting with 4 word characters (letters, digits, or underscore) followed by any number of other characters.
Your examples include spaces and punctuation. If you know exactly which characters are allowed, you might want to use this pattern:
^\w{4}[A-Za-z\d<other known characters go here>]*$
Remember to remove the < and > too :)
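A small Java sketch of this idea. The extra characters (comma and space) are an assumption based only on the examples in the question, and the class name is hypothetical:

```java
import java.util.regex.Pattern;

class KnownCharacters {
    // Assumption for illustration: after the first four word characters,
    // only letters, digits, commas and spaces are allowed.
    static final Pattern ALLOWED = Pattern.compile("^\\w{4}[A-Za-z\\d, ]*$");

    static boolean isValid(String s) {
        // matches() already requires the whole string to match,
        // so ^ and $ are redundant here but harmless.
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"TADE000177", "TADE007,daFG", "TADE0277 DFDFG", "TADE007;x"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The last sample is rejected because ; is not in the allowed set.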
Related
I have the following pattern to validate a string. It has to validate 4 letters, 6 numbers, 6 letters and 2 alphanumerics, but with my current pattern I can't get a valid test:
Pattern.compile("[A-Za-z]{4}\\d{6}\\w{6}\\[A-ZÑa-zñ0-9\\- ]{2}");
I think my pattern is wrong, because I'm not sure about this part: [A-ZÑa-zñ0-9\\- ]{2}
Can you please help me?
You can use pattern:
^[a-zA-Z]{4}[0-9]{6}[a-zA-Z]{6}[a-zA-Z0-9]{2}$
In your expression you are using \w, which matches not only letters and digits but also the underscore _.
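To see the \w issue concretely, here is a hedged Java sketch (hypothetical class name, made-up test strings) that compares the pattern above with a variant that uses \w for the six letters:

```java
import java.util.regex.Pattern;

class LetterDigitCode {
    // 4 letters, 6 digits, 6 letters, 2 letters or digits (ASCII only).
    static final Pattern STRICT =
            Pattern.compile("^[a-zA-Z]{4}[0-9]{6}[a-zA-Z]{6}[a-zA-Z0-9]{2}$");

    // Same shape, but with \w for the 6 letters, to show what \w lets through.
    static final Pattern WITH_WORD_CHARS =
            Pattern.compile("^[a-zA-Z]{4}[0-9]{6}\\w{6}[a-zA-Z0-9]{2}$");

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean loose(String s) {
        return WITH_WORD_CHARS.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ABCD123456abcdefX9", "ABCD123456abc_efX9", "ABCD123456abc1efX9"}) {
            System.out.println(s + " -> strict=" + strict(s) + ", with \\w=" + loose(s));
        }
    }
}
```

Only the first string passes the strict pattern; the \w variant also accepts the underscore and the stray digit.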
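A few things are off in your regex.
Inside a Java string literal, \\d and \\w are correct: the compiler turns them into the regex tokens \d and \w. Only use a single backslash when the regex is not written as a Java string literal.
The \\ before [ is not needed. \\[ escapes the bracket, so the regex looks for a literal [ character instead of starting a character class.
Your last part also allows a hyphen and a space. If you only want alphanumerics there, just remove the "\\- " bit.
Don't slim the letter parts down to \w: \w also matches digits and the underscore, so keep [A-Za-z] for the 4 letters and use it for the 6 letters too. So, your new regex should look like:
"[A-Za-z]{4}\\d{6}[A-Za-z]{6}[A-ZÑa-zñ0-9]{2}"
That is if you're okay with the only non-ASCII characters being Ñ and ñ in your last two alphanumerics.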
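A Java sketch comparing the question's pattern with a version that drops the stray \\ before [ and the "\\- " part and keeps [A-Za-z] for the letter groups. The class name and test strings are made up, and Ñ/ñ are written as \u00D1/\u00F1 escapes in the source so the file compiles regardless of its encoding.

```java
import java.util.regex.Pattern;

class BracketEscape {
    // The question's pattern. In the regex, \[ is an escaped bracket, so it
    // matches a literal '[' character and the character class never starts.
    static final Pattern ORIGINAL = Pattern.compile(
            "[A-Za-z]{4}\\d{6}\\w{6}\\[A-Z\u00D1a-z\u00F10-9\\- ]{2}");

    // Without the stray backslash and the hyphen/space, and with [A-Za-z]
    // instead of the word-character class for the 6 letters.
    static final Pattern FIXED = Pattern.compile(
            "[A-Za-z]{4}\\d{6}[A-Za-z]{6}[A-Z\u00D1a-z\u00F10-9]{2}");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean matchesFixed(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ABCD123456abcdef\u00D19", "ABCD123456abcdef_9"}) {
            System.out.println(s + " -> original=" + matchesOriginal(s) + ", fixed=" + matchesFixed(s));
        }
    }
}
```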
I want to know if there is a way to check if a given string contains only a combination of letters and digits and nothing else.
For letters only, I can use http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html
For numbers only, I can use Integer.parseInt or some regular expression.
But is there a library that does the combined check? I googled but didn't come across anything. Everywhere it's done separately.
Alphanumeric means "Only letters and/or digits"
StringUtils.isAlphanumeric(String str) does what you want
someString.matches("[A-Za-z0-9]+")
[a-zA-Z0-9]+
This will check for letters and digits.
You can use a regular expression to match your String:
someString.matches("[a-zA-Z0-9]+");
This matches one or more characters (+; if the empty string should be valid too, use * instead), each of which is either a digit from 0 to 9 or an upper- or lower-case letter (ASCII A-Z only, no other Unicode letters).
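For comparison, a minimal Java sketch (hypothetical class and method names) of the ASCII-only regex next to a Unicode-aware check built on the JDK's Character.isLetterOrDigit:

```java
class AlphanumericCheck {
    // One or more ASCII letters or digits; with * instead of +, "" would pass too.
    static boolean isAsciiAlphanumeric(String s) {
        return s.matches("[a-zA-Z0-9]+");
    }

    // Unicode-aware alternative: every code point must be a letter or digit
    // according to Character.isLetterOrDigit; the empty string is rejected.
    static boolean isUnicodeAlphanumeric(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        String[] samples = {"abc123", "abc 123", "", "caf\u00E9"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> ascii=" + isAsciiAlphanumeric(s)
                    + ", unicode=" + isUnicodeAlphanumeric(s));
        }
    }
}
```

The accented word is where the two differ: the regex rejects it, the Character-based check accepts it.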
I am trying to use a regex to replace the last word of a string if it is 2 characters long. I used [a-zA-Z]{2}$, but it finds the last 2 characters of the string. I don't want to replace the last word if it is not exactly 2 characters long. How can I do it?
You need to match a word boundary (\b) before the two letters:
\b[a-zA-Z]{2}$
This will match any two Latin letters that appear at the end of a string, as long as they are not preceded by a 'word' character (which is a Latin letter, digit, or underscore).
In case you want to replace the word even if it is preceded by a digit or underscore, you might want to use a lookbehind assertion, like this:
(?<![a-zA-Z])[a-zA-Z]{2}$
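A minimal Java sketch of both variants with String.replaceAll (the class name, method names and test strings are illustrative). Matcher.quoteReplacement is used so that a $ or \ in the replacement text is taken literally:

```java
import java.util.regex.Matcher;

class LastTwoLetterWord {
    // Word boundary: the two letters must not be preceded by a word
    // character (letter, digit or underscore).
    static String replaceWithBoundary(String s, String replacement) {
        return s.replaceAll("\\b[a-zA-Z]{2}$", Matcher.quoteReplacement(replacement));
    }

    // Negative lookbehind: only a preceding letter blocks the match.
    static String replaceWithLookbehind(String s, String replacement) {
        return s.replaceAll("(?<![a-zA-Z])[a-zA-Z]{2}$", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello my", "hello you", "abc1de"}) {
            System.out.println(s + " -> boundary: " + replaceWithBoundary(s, "XX")
                    + ", lookbehind: " + replaceWithLookbehind(s, "XX"));
        }
    }
}
```

"hello you" is left alone by both, while "abc1de" shows the difference: the digit before "de" blocks \b but not the lookbehind.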
\\b\\w\\w\\b$ (written as a Java string literal)
should work as well.
Edit: in fact \\b\\w\\w$ should be enough (or \b\w\w$ outside a Java string literal). Note that \w also matches digits and the underscore, not just letters.
You could also use:
[^\p{Alpha}]\p{Alpha}{2}$
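Use \p{Alnum} instead if digits count as part of words. This does, however, fail if the entire string is only two characters long. Also, the preceding non-letter is part of the match, so a replacement has to put it back.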
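Because the character before the two letters is consumed by this pattern, a replacement has to write it back. A hedged Java sketch (hypothetical names) that captures it in a group:

```java
import java.util.regex.Matcher;

class LastTwoLetterWordCapture {
    // The leading [^\p{Alpha}] is part of the match, so capture it in
    // group 1 and write it back with $1; quoteReplacement protects any
    // '$' or '\' in the caller's replacement text.
    static String replace(String s, String replacement) {
        return s.replaceAll("([^\\p{Alpha}])\\p{Alpha}{2}$",
                "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello my", "hello you", "ab"}) {
            System.out.println(s + " -> " + replace(s, "XX"));
        }
    }
}
```

"ab" stays unchanged, which is the two-character limitation mentioned above.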
I tried using this pattern
^[A-z]*[A-z,-, ]*[A-z]*
To match a string that starts with multiple alpha characters (a-z), followed by multiple hyphens or spaces, and ends with alpha characters, e.g.:
Azasdas- - sa-as
But it does not work.
Try ^[A-Za-z][A-Za-z -]*[A-Za-z]$
^ anchors the start: the string must begin with a letter (A-Z or a-z), followed by any number of letters, spaces, or hyphens, and must end with a letter, with $ anchoring the end.
Also, you should not use A-z, because that range also includes the unintended characters from ASCII 91 to 96: [ \ ] ^ _ and `.
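A quick Java check of the suggested pattern on the example from the question, plus a demonstration of the A-z problem (illustrative class and method names):

```java
class LettersSpacesHyphens {
    // Starts and ends with a letter; letters, spaces and hyphens in between.
    static boolean accepted(String s) {
        return s.matches("^[A-Za-z][A-Za-z -]*[A-Za-z]$");
    }

    // [A-z] spans code points 65..122, which includes [ \ ] ^ _ and the backquote.
    static boolean inAtoZRange(char c) {
        return String.valueOf(c).matches("[A-z]");
    }

    public static void main(String[] args) {
        System.out.println(accepted("Azasdas- - sa-as"));
        System.out.println(inAtoZRange('_') + " " + inAtoZRange('^') + " " + inAtoZRange('-'));
    }
}
```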
Don't use ',' (comma)
^[A-z]*[A-z- ]*[A-z]*
You don't want the commas. In the character class you also need to specify [A-Za-z\- ], because A-Z and a-z are not contiguous in ASCII. You're missing some allowable spaces, and your last expression needs to account for the hyphen.
You need something closer to this:
^([A-Za-z]*)-\s*([A-Za-z][A-Za-z -]*)([A-Za-z-]*)$
It depends on how you actually want to break things up. Without knowing the context behind the "chunks", it may be easier to just split the string on hyphens.
Edit
Actually, it's more like:
^([A-Za-z]*)([- ]*)([A-Za-z-]*)$
This is a word, followed by arbitrary spaces and hyphens, followed by a word that may contain a hyphen.
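A Java sketch (hypothetical names) showing the three groups this pattern produces for the example string from the question; it returns null when the string does not fit the pattern:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitIntoChunks {
    // Group 1: a word; group 2: spaces and hyphens; group 3: a word that may contain hyphens.
    static final Pattern CHUNKS = Pattern.compile("^([A-Za-z]*)([- ]*)([A-Za-z-]*)$");

    // Returns the three groups, or null if the whole string does not match.
    static String[] chunks(String s) {
        Matcher m = CHUNKS.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(chunks("Azasdas- - sa-as")));
        System.out.println(Arrays.toString(chunks("ab cd ef")));
    }
}
```

A second run of spaces, as in "ab cd ef", does not fit this shape, which is where splitting on hyphens may be simpler.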
The currently accepted answer (^[A-Za-z][A-Za-z -]*[A-Za-z]$) will only match strings that are at least two characters long: for example, it will match the string "AB", but not just "A" or "B". Compare that to this regex:
^[A-Za-z]+([ -]+[A-Za-z]+)*$
By grouping the [ -]+ and the second [A-Za-z]+ together I'm saying, if there are any spaces and/or hyphens, they must be followed by more letters. The * quantifier on the group makes it optional, so "A" will match, while still meeting the requirement that the string start and end with a letter.
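A short Java sketch (illustrative names) comparing a few inputs against this regex with String.matches, which requires the whole string to match:

```java
class WordsJoinedBySeparators {
    // Letters, then any number of groups of (spaces/hyphens + letters).
    static boolean accepted(String s) {
        return s.matches("^[A-Za-z]+([ -]+[A-Za-z]+)*$");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A", "AB", "Azasdas- - sa-as", "A-", "-A"}) {
            System.out.println(s + " -> " + accepted(s));
        }
    }
}
```

"A" is accepted here, while "A-" and "-A" are still rejected because the string must start and end with a letter.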
How can I get this pattern to work:
Pattern pattern = Pattern.compile("[\\p{P}\\p{Z}]");
Basically, this splits my sentence into a String[] at any punctuation character (\p{P}) or any separator character such as a space (\p{Z}). But I want to exclude hyphenated words like "aaa-bb", "aaa-bb-cc", "aaa-bb-c-dd", which I can match with this pattern:
(?<![A-Za-z-])[A-Za-z]+(?:-[A-Za-z]+){1,}(?![A-Za-z-])
So, how can I do that?
Unfortunately it seems like you can't merge both expressions, at least as far as I know.
However, maybe you can reformulate your problem.
If, for example, you want to split between words (which can contain hyphens), try this expression:
(?>[^\p{L}-]+|-[^\p{L}]+|^-|-$)
This should match any run of characters that are neither letters nor a minus, any minus that is followed by one or more non-letter characters, or a minus that is the first or last character of the input.
Using this expression for a split should result in this:
input="aaa-bb, aaa-bb-cc, aaa-bb-c-dd,no--match,--foo"
output={"aaa-bb","aaa-bb-cc","aaa-bb-c-dd","no","match","","foo"}
The regex might need some additional optimization but it is a start.
Edit: This expression should get rid of the empty string in the split:
(?>[^\p{L}-][^\p{L}]*|-[^\p{L}]+|^-|-$)
The first alternative now reads as "any character that is neither a letter nor a minus, followed by any number of non-letter characters", so a run like .-- or ,-- is matched as a single separator as well.
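Both split expressions can be tried with String.split; a minimal Java sketch using the input from above (the class and method names are made up):

```java
import java.util.Arrays;

class SplitKeepingHyphenatedWords {
    // First version: a run of non-letters without '-', a '-' followed by
    // non-letters, or a '-' at the very start or end of the input.
    static final String FIRST = "(?>[^\\p{L}-]+|-[^\\p{L}]+|^-|-$)";
    // Edited version: the first alternative may continue into '-' characters,
    // so ",--" is consumed as one separator.
    static final String SECOND = "(?>[^\\p{L}-][^\\p{L}]*|-[^\\p{L}]+|^-|-$)";

    static String[] splitFirst(String s) {
        return s.split(FIRST);
    }

    static String[] splitSecond(String s) {
        return s.split(SECOND);
    }

    public static void main(String[] args) {
        String input = "aaa-bb, aaa-bb-cc, aaa-bb-c-dd,no--match,--foo";
        // [aaa-bb, aaa-bb-cc, aaa-bb-c-dd, no, match, , foo]
        System.out.println(Arrays.toString(splitFirst(input)));
        // [aaa-bb, aaa-bb-cc, aaa-bb-c-dd, no, match, foo]
        System.out.println(Arrays.toString(splitSecond(input)));
    }
}
```

The empty string in the first result comes from "," and "--" being matched as two adjacent separators.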
Edit: in case you want to match words that could potentially contain hyphens, try this expression:
(?>(?<=[^-\p{L}])|^)\p{L}+(?:-\p{L}+)*(?>(?=[^-\p{L}])|$)
This means "any sequence of letters (\p{L}+) followed by any number of sequences consisting of one minus and at least one more letter ((?:-\p{L}+)*). That sequence must be preceded either by the start of the input or by anything that is not a letter or minus ((?>(?<=[^-\p{L}])|^)), and be followed either by anything that is not a letter or minus or by the end of the input ((?>(?=[^-\p{L}])|$))".
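To collect the words instead of splitting, the pattern can be used with Matcher.find(). A hedged Java sketch with a hypothetical class name, run on the same input as the split example above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenatedWords {
    // Letters, optionally joined by single minus signs, not touching another
    // letter or minus on either side.
    static final Pattern WORD = Pattern.compile(
            "(?>(?<=[^-\\p{L}])|^)\\p{L}+(?:-\\p{L}+)*(?>(?=[^-\\p{L}])|$)");

    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("aaa-bb, aaa-bb-cc, aaa-bb-c-dd,no--match,--foo"));
    }
}
```

Only the three hyphenated words are found: "no" is followed by a minus, and "match" and "foo" are preceded by one, so the surrounding-character conditions reject them.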