Regex matching capital characters, numbers and period - java

I'm trying to check whether an input contains only capital letters, numbers and periods using a regex. What would the regex pattern be for this in Java?
Are there any guides on how I can build this regex, or any online tools?
Also, is it possible to check that the length of the string is no more than 50 using regex?

This is the Unicode answer:
^[\p{Lu}\p{Nd}.]{0,50}$
From regular-expressions.info
\p{Lu} or \p{Uppercase_Letter}: an uppercase letter that has a lowercase variant.
\p{Nd} or \p{Decimal_Digit_Number}: a digit zero through nine in any script except ideographic scripts.
^ and $ are the start and the end of the string
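As a rough sketch of how the Unicode pattern above could be used from Java (the class and method names here are made up for illustration), note that each backslash has to be doubled inside a Java string literal:

```java
import java.util.regex.Pattern;

// Hypothetical helper; only the pattern itself comes from the answer above.
final class UnicodeUpperCheck {
    // Backslashes are doubled because the regex is written as a Java string literal.
    private static final Pattern ALLOWED =
            Pattern.compile("^[\\p{Lu}\\p{Nd}.]{0,50}$");

    static boolean isValid(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\u00C9COLE.42")); // true: E-acute is an uppercase letter
        System.out.println(isValid("Ecole.42"));      // false: contains lowercase letters
        System.out.println(isValid("A\u0661"));       // true: Arabic-Indic digit one is in \p{Nd}
    }
}
```

If only ASCII input should be accepted, the [A-Z0-9.] form is the stricter choice.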

Regex pattern:
Pattern.compile("^[A-Z\\d.]*$")
To check the length of a string:
Pattern.compile("^.{0,50}$")
Both combined:
Pattern.compile("^[A-Z\\d.]{0,50}$")
That said, I wouldn't use regular expressions to check the length if I were you; just call .length() on the string.
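A minimal sketch of that suggestion, assuming a hypothetical helper class and a limit of 50 taken from the question: the regex checks only the allowed characters, and the length is checked with length().

```java
import java.util.regex.Pattern;

// Hypothetical helper; names and structure are illustrative only.
final class CapsDigitsPeriod {
    // No anchors needed: Matcher.matches() must consume the whole input.
    // By default \d matches only the ASCII digits 0-9 in Java.
    private static final Pattern ALLOWED_CHARS = Pattern.compile("[A-Z\\d.]*");
    private static final int MAX_LENGTH = 50;

    static boolean isValid(String input) {
        return input.length() <= MAX_LENGTH
                && ALLOWED_CHARS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABC.123")); // true
        System.out.println(isValid("abc"));     // false
    }
}
```

Keeping the two checks separate also makes it easier to report which rule the input broke.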

This website is really handy for building and testing regular expressions

Regular expressions in Java have a lot in common with other languages when it comes to the simple syntax, with some predefined character classes that add more than you'd find in Perl for example. The Java API docs on Pattern show the various patterns that are supported. A friendlier introduction to regexes in Java is http://www.regular-expressions.info/java.html.
Some very quick Googling shows there are many tools online for testing Java regular expressions against input strings. Here is one.
To check for the type of input you are interested in, the following regex should work:
^[A-Z0-9.]{0,50}$
Broken down, this is saying:
^: start matching from the start of the input; do not allow the first character(s) to be skipped
[]: match one of the characters listed in this character class
A-Z: within a character class, - means to accept all values between the first and last character inclusive, so in this case all characters from A to Z.
0-9: add all digits to the class
.: periods are special in regexes, but most special characters, the period included, become literal again within a character class ([])
{0,50}: allow between 0 and 50 repetitions of the character class just defined. Java requires the lower bound to be written out; {,50} is rejected as an illegal repetition.
$: the match must reach the end of the input; do not allow the last character(s) to be skipped
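To see what the ^ and $ anchors contribute, here is a small sketch (the class and method names are invented for the example) that runs the pattern with and without anchors through Matcher.find(). Java's Pattern expects an explicit lower bound in a counted quantifier, so the sketch writes {0,50}.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing anchored and unanchored searches.
final class AnchorDemo {
    static final Pattern ANCHORED = Pattern.compile("^[A-Z0-9.]{0,50}$");
    static final Pattern UNANCHORED = Pattern.compile("[A-Z0-9.]{0,50}");

    static boolean anchoredFind(String s) {
        return ANCHORED.matcher(s).find();
    }

    static boolean unanchoredFind(String s) {
        return UNANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(anchoredFind("ABC.1"));  // true
        System.out.println(anchoredFind("abc"));    // false: anchors force the whole input to qualify
        System.out.println(unanchoredFind("abc"));  // true: an empty match is found at position 0
    }
}
```

With Matcher.matches() or String.matches() the whole input must match anyway, so the anchors matter mainly when find() is used.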

This returns true for strings of at most 50 characters that contain only digits, capital letters or periods.
string.matches("[0-9A-Z\\.]{0,50}")

In response to what tools you can use, I prefer Regex Coach

Related

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may have a comma ("." or ",")!
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.matches(regex, String), but I am not able to find the correct regex.
Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
You can see this regular expression here
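Here is a hedged sketch of how the whitelisted-operator pattern might be used in Java. The class and method names are made up; group 3 is the operator group in the pattern as written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the operator-whitelist pattern from this answer.
final class SimpleEquationMatcher {
    private static final Pattern EQUATION = Pattern.compile(
            "((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)");

    static boolean isEquation(String input) {
        return EQUATION.matcher(input).matches();
    }

    // Returns the operator text without surrounding whitespace, or null if there is no match.
    static String operator(String input) {
        Matcher m = EQUATION.matcher(input);
        return m.matches() ? m.group(3).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(isEquation("400,021+213.443")); // true
        System.out.println(operator("300 - -200"));        // -
        System.out.println(isEquation("500 and 400"));     // false
    }
}
```

Note that [\d,.]+ accepts any mix of digits, commas and periods, so strings such as "1..2" also count as numbers here; check carefully, as mentioned above.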
In your regex you have to escape the dot as \. to match it literally, and escape the plus as \+, or else it would make the preceding ? a possessive quantifier. To match 1+ digits you have to use a quantifier: [0-9]+
For your example data, you could match 1+ digits followed by an optional part which matches either a dot or a comma at the start and at the end. If you want to match 1 time any character you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
Regex demo
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any character (use .+ to repeat 1+ times)
Same as the first pattern
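A small sketch comparing the any-character version with the character-class version mentioned above (the class and method names are only for illustration). It uses matches(), so the whole input has to fit the pattern:

```java
import java.util.regex.Pattern;

// Hypothetical demo of the two variants discussed in this answer.
final class NumberOperatorNumber {
    // A dot between the numbers: any single character is accepted as the separator.
    static final Pattern ANY_SEPARATOR =
            Pattern.compile("\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?");
    // A character class between the numbers: only the listed operators are accepted.
    static final Pattern LISTED_OPERATORS =
            Pattern.compile("\\d+(?:[.,]\\d+)?[-+*]\\d+(?:[.,]\\d+)?");

    static boolean matchesAny(String s) {
        return ANY_SEPARATOR.matcher(s).matches();
    }

    static boolean matchesListed(String s) {
        return LISTED_OPERATORS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("400,021+213.443")); // true
        System.out.println(matchesAny("500x400"));         // true: the dot accepts 'x'
        System.out.println(matchesListed("500x400"));      // false: 'x' is not a listed operator
    }
}
```

Because the dot also matches a digit, the first variant accepts a plain number such as 500400 as well, which is one more reason to prefer listing the operators.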

Regex - Disallow certain characters to appear consecutively

I'm not sure if this is possible or not:
Writing a program to convert an infix notation to postfix notation. All is working well so far but trying to implement validation is proving difficult.
I'm trying to use a regex to validate an infix notation, conforming to the following rules:
String must only start with a number or ( (program does not allow negative numbers)
String must only end with number or )
String must only contain 0-9*/()-+
String must not allow following characters to appear together +*/-
I have a regex which conforms to the first 3 rules:
(^[0-9(])([0-9+()*]+)([0-9)]+$)
Is it possible to use regex to implement the last rule?
I will answer only the fourth rule, as that is the only one you have a problem with.
Yes, it is possible, but I think regex is not the appropriate tool for checking that...
This pattern ^(?(?=.*\+)(?!.*[\*\/-])).+$ will match a string that contains + only if it contains none of the other characters: /, *, -. For a single character it is already lengthy and hard to read. See demo.
It uses a conditional expression (?...) to check whether the lookahead for + succeeded; if it did, the negative lookahead ensures that you won't have any of the *, /, - characters. Note that conditionals are not supported by java.util.regex, so this pattern needs an engine that has them, such as PCRE or .NET.
For all characters, the regex will become very big and hard to maintain.
That's why I don't recommend it for this task.
I agree with Michał Turczyn that regex is not the tool for this, but not with the reason. It is easy to implement your restrictions. However, your restrictions also allow expressions like (0+3, 2(*4), ((((1, and other things you likely don't want - so regex validation is kind of pointless. If you were writing this with a regex engine with some significant power like PCRE (Perl, PHP) or Onigmo (Ruby), you could fake a parser in regex; but in Java, regex is quite restricted in what it can do. It is enough for the requirements in the question, though:
^[0-9(](?:(?![+*/-][+*/-])[0-9+*/()-])*[0-9)]$
starts with digit or paren
any number of repetitions of any allowed character, such that that character and the next character aren't both operators
ends with digit or paren.
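A quick sketch of that pattern in Java, with a made-up class name, showing both what it rejects and what it still lets through:

```java
import java.util.regex.Pattern;

// Hypothetical wrapper; only the pattern comes from the answer above.
final class InfixShapeCheck {
    private static final Pattern INFIX = Pattern.compile(
            "^[0-9(](?:(?![+*/-][+*/-])[0-9+*/()-])*[0-9)]$");

    static boolean looksValid(String expr) {
        return INFIX.matcher(expr).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("(1+2)*3")); // true
        System.out.println(looksValid("1++2"));    // false: two operators in a row
        System.out.println(looksValid("1+"));      // false: must end with a digit or )
        System.out.println(looksValid("(0+3"));    // true: parentheses are not checked for balance
    }
}
```

The last case is the point made above: the four rules are satisfied, but the expression is still not well formed, so a real parser is needed for full validation.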

Java regex lookbehind issue with quantifiers

I'm using a Java regex pattern in an application that only allows access to the whole match value (that is, I cannot use capturing groups).
I am trying to extract values from my sample text:
C02 SURVEY : 2010 F10446P BONAPARTE 2D
In the above example I need to check for the keyword SURVEY and extract the value after the : that follows it. I want my output to be:
2010 F10446P BONAPARTE 2D
I used the pattern (?<=(?i)survey\s{2}[:])(?:(?![\n]).)*
In this pattern, I have hardcoded the number of spaces as 2 (\s{2}), but that number may vary.
I need to use quantifiers with lookbehind operation.
If any other option is there please let me know.
You may leverage a feature in a Java regex engine that is called "constrained width lookbehind":
Java accepts quantifiers within lookbehind, as long as the length of the matching strings falls within a pre-determined range. For instance, (?<=cats?) is valid because it can only match strings of three or four characters. Likewise, (?<=A{1,10}) is valid.
That means, you may replace the {2} limiting quantifier with a limiting quantifier with both minimum and maximum values, e.g. {0,100} to allow zero to a hundred whitespace symbols. Adjust them as you see fit.
Besides, you needn't use a tempered greedy token (?:(?![\n]).)* as the dot in Java regex does not match a newline by default. Just replace it with .* to match any zero or more chars other than newline. So, your pattern might look as simple as (?i)(?<=survey\s{0,100}:).*.
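A minimal sketch of the bounded-lookbehind pattern in use, assuming a hypothetical helper class. The whole match starts right after the colon, so it includes the space that follows it; the sketch trims that in Java, which may not be an option in an application that exposes only the raw match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the upper bound of 100 is the arbitrary limit suggested above.
final class SurveyValueExtractor {
    private static final Pattern AFTER_SURVEY =
            Pattern.compile("(?i)(?<=survey\\s{0,100}:).*");

    // Returns the text after "SURVEY :" with surrounding whitespace removed, or null if absent.
    static String extract(String line) {
        Matcher m = AFTER_SURVEY.matcher(line);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("C02 SURVEY : 2010 F10446P BONAPARTE 2D"));
        System.out.println(extract("C02 survey      : 2010 F10446P BONAPARTE 2D"));
    }
}
```

Both calls should print 2010 F10446P BONAPARTE 2D, since the number of whitespace characters before the colon no longer matters.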

Constructing a specific regex

I want to make a regular expression in Java that meets the following criteria:
Length: 10 characters exactly. Not more, not less.
Can accept any character between A-Z (only uppercase letters) and between digits 0-9.
Can accept only one dash character '-' in any position. It cannot accept any other characters, strictly only one dash.
EXAMPLES:
ABCD-12345
F-01234GHK
09-PL89GG5
LJ8U9N3-Y2
PLN86D4V-1
I have tried several regexes of my own; some come close to the result I want, but none works.
Do I have to combine two regular expressions?
Please help me solve this issue. Thanks in advance!
I think you need lookahead (which is a way of combining two regular expressions, sort of).
^(?!.*-.*-)[A-Z0-9-]{10}$
The second part will match 10 characters that are A-Z, 0-9, or dash; the first part is negative lookahead that will reject a pattern that has two dashes in it.
You can use this:
^(?![^-]*+-[^-]*+-)[A-Z0-9-]{10}$
Note: If you use the matches method you can remove anchors.
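One thing worth checking against the requirements: the negative lookahead rejects two or more dashes, but it does not require a dash to be present. The sketch below (class and method names are hypothetical) shows the pattern from the answers next to a variant that insists on exactly one dash. The second pattern is my own assumption about what "only one dash" means, not something taken from the answers.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the 10-character code check.
final class TenCharCode {
    // From the answers above: 10 allowed characters, and never two dashes.
    static final Pattern AT_MOST_ONE_DASH =
            Pattern.compile("^(?!.*-.*-)[A-Z0-9-]{10}$");
    // Assumed stricter variant: the positive lookahead demands exactly one dash.
    static final Pattern EXACTLY_ONE_DASH =
            Pattern.compile("^(?=[^-]*-[^-]*$)[A-Z0-9-]{10}$");

    static boolean atMostOneDash(String s) {
        return AT_MOST_ONE_DASH.matcher(s).matches();
    }

    static boolean exactlyOneDash(String s) {
        return EXACTLY_ONE_DASH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(atMostOneDash("ABCD-12345"));  // true
        System.out.println(atMostOneDash("AB-D-12345"));  // false: two dashes
        System.out.println(atMostOneDash("ABCDE12345"));  // true: no dash is also accepted
        System.out.println(exactlyOneDash("ABCDE12345")); // false: a dash is required
    }
}
```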

Java Regular Expression for number of exactly 5 digits anywhere in the string

I'm trying to create a regular expression to parse a 5 digit number out of a string no matter where it is but I can't seem to figure out how to get the beginning and end cases.
I've used the pattern as follows \\d{5} but this will grab a subset of a larger number...however when I try to do something like \\D\\d{5}\\D it doesn't work for the end cases. I would appreciate any help here! Thanks!
For a few examples (55555 is what should be extracted):
At the beginning of the string
"55555blahblahblah123456677788"
In the middle of the string
"2345blahblah:55555blahblah"
At the end of the string
"1234567890blahblahblah55555"
Since you are using a language that supports them, use negative lookarounds:
"(?<!\\d)\\d{5}(?!\\d)"
These will assert that your \\d{5} is neither preceded nor followed by a digit. Whether that is due to the edge of the string or a non-digit character does not matter.
Note that these assertions themselves are zero-width matches. So those characters will not actually be included in the match. That is why they are called lookbehind and lookahead. They just check what is there, without actually making it part of the match. This is another disadvantage of using \\D, which would include the non-digit character in your match (or require you to use capturing groups).
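A short sketch that runs the lookaround pattern over the three examples from the question (the class and method names are invented for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that returns the first standalone 5-digit number in a string.
final class FiveDigitFinder {
    private static final Pattern FIVE_DIGITS =
            Pattern.compile("(?<!\\d)\\d{5}(?!\\d)");

    static String firstFiveDigitNumber(String text) {
        Matcher m = FIVE_DIGITS.matcher(text);
        // group() is the whole match; the lookarounds consume nothing.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstFiveDigitNumber("55555blahblahblah123456677788")); // 55555
        System.out.println(firstFiveDigitNumber("2345blahblah:55555blahblah"));    // 55555
        System.out.println(firstFiveDigitNumber("1234567890blahblahblah55555"));   // 55555
    }
}
```

Since the whole match is already just the five digits, no capturing group is needed.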
