How to build a regular expression for a string?


How can I write this as a regular expression?
"blocka#123#456"
I have used the # symbol to separate the parameters in the data.
The parameters are the block name, the start X coordinate, and the start Y coordinate.
This is the data embedded in my QR code. When I scan the QR code, I want to check that it is the right one. For that I need a regular expression for the above syntax.
My method body:
public void Store_QR(String qr){
if( qr.matches(regular Expression here)) {
CurrentLocation = qr;
}
else // Break the operation
}

The information you specified does not justify using a regular expression at all.
Try to phrase it in a more general way.
If you really need to scan for "blocka#123#456", then use qr.contains("blocka#123#456");
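Note that contains also accepts input with extra text around the expected value. If the QR payload has to be exactly that string, equals is the stricter check. A minimal sketch of the difference (the class name ExactCheck and the constant EXPECTED are invented for this example):

```java
// Hypothetical helper, not part of the original code.
final class ExactCheck {
    static final String EXPECTED = "blocka#123#456";

    // True if the expected payload occurs anywhere in the scanned text.
    static boolean containsExpected(String qr) {
        return qr.contains(EXPECTED);
    }

    // True only if the scanned text is exactly the expected payload.
    static boolean isExactly(String qr) {
        return EXPECTED.equals(qr);
    }
}
```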

It depends on what you want to match.
Here are some regex propositions:
^blocka#[0-9]{3}#[0-9]{3}$
^blocka#[0-9]+#[0-9]+$
^blocka(#[0-9]{3}){2}$
^blocka(#[0-9]+){2}$
^blocka(#[0-9]{3})+$
^blocka(#[0-9]+)+$
Otherwise, just use contains() or similar.
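To see how these candidates differ, it can help to run two of them against a few inputs. This is only a sketch: the class name PropositionCheck and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper comparing two of the propositions above.
final class PropositionCheck {
    // Exactly two groups of exactly three digits.
    static final Pattern FIXED = Pattern.compile("^blocka(#[0-9]{3}){2}$");
    // One or more groups of one or more digits.
    static final Pattern LOOSE = Pattern.compile("^blocka(#[0-9]+)+$");

    static boolean fixed(String s) {
        return FIXED.matcher(s).matches();
    }

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }
}
```

With these, "blocka#123#456" passes both checks, while "blocka#12#456" and "blocka#1" pass only the loose one.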

myregexp.com is handy for testing.
The official Java regex tutorial is a decent way to learn and covers most of what one needs to know.
The Pattern documentation also lists predefined character classes that are missing from the tutorial.
You did not specify anything that has to be regular in the example you gave. Regular expressions only make sense if there are rules to validate the input against.
If it has to be exactly "blocka#123#456", then "blocka#123#456" or "^blocka#123#456$" will work as the regex. Putting a regex between ^ and $ means it must span the input from beginning to end. This is sometimes required and usually a good idea.
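One Java detail worth knowing: String.matches and Matcher.matches always require the whole input to match, so the anchors are redundant there, while Matcher.find looks for a match anywhere and does need them. A minimal sketch (the class name AnchorDemo is invented for this example):

```java
import java.util.regex.Pattern;

// Hypothetical demo of matches() versus find().
final class AnchorDemo {
    static final Pattern UNANCHORED = Pattern.compile("blocka#123#456");
    static final Pattern ANCHORED = Pattern.compile("^blocka#123#456$");

    // matches(): the entire input must match, anchors or not.
    static boolean whole(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // find() without anchors: a match anywhere in the input is enough.
    static boolean anywhere(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // find() with anchors: surrounding text is rejected again.
    static boolean anchoredFind(String s) {
        return ANCHORED.matcher(s).find();
    }
}
```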
If blocka is dynamic, replace it with [a-z]+ to match any sequence of one or more lowercase letters a through z. block[a-z] would match blocka, blockb, etc.
[a-z]{6} would match any sequence of exactly 6 lowercase letters. [a-zA-Z] also includes uppercase letters, and \p{L} matches any letter, including non-ASCII Unicode letters (e.g. Blüc本).
# matches #. Any character without a special regex meaning matches itself; the special characters are \ ^ $ . | ? * + ( ) [ ] { }. [^#] matches every character except #.
Regarding the numbers: [0-9]+ or \d+ is a generic pattern for one or more digits. [0-9]{1,4} would match anything consisting of 1 to 4 digits, like 007, 5, or 9999. (?:0|[1-9][0-9]{0,3}), for example, only matches numbers between 0 and 9999 and does not allow leading zeros. (?:STUFF) is a non-capturing group: it does not affect the groups you can extract via Matcher#group(1..?), which makes it useful for logical grouping with |. The meaning of (?:0|[1-9][0-9]{0,3}) is: either a single 0, or one digit 1-9 followed by 0 to 3 digits 0-9.
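The difference between the two digit patterns is easiest to see side by side. A small sketch, with the class name RangeDemo and the method names chosen only for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo of the two number patterns discussed above.
final class RangeDemo {
    // 0, or a non-zero digit followed by up to three more digits: 0..9999.
    static final Pattern NO_LEADING_ZERO = Pattern.compile("(?:0|[1-9][0-9]{0,3})");
    // Any one to four digits, leading zeros allowed.
    static final Pattern UP_TO_FOUR = Pattern.compile("[0-9]{1,4}");

    static boolean strict(String s) {
        return NO_LEADING_ZERO.matcher(s).matches();
    }

    static boolean lenient(String s) {
        return UP_TO_FOUR.matcher(s).matches();
    }
}
```

Here "007" is accepted by the lenient pattern but rejected by the strict one, and both reject a five-digit input.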
[0-9] is so common that there is a predefined class for it: \d (digit). Inside a Java string literal it is written \\d, since the \ itself has to be escaped.
So some of your options are
".*" which matches absolutely everything
"^[^#]+(?:#[^#]+)+$" which matches anything separated by # like "hello #world!1# -12.f #本#foo#bar"
"^blocka(#\\d+)+$" which matches blocka followed by at least one group of numbers separated by # e.g. blocka#1#12#0007#949432149#3
"^blocka#(?:[0-9]|[1-9][0-9]|[1-3][0-9]{2})#[4-9][0-9]{2}$" which will match only if it finds blocka# followed by numbers 0 - 399, followed by a # and finally numbers 400-999
"^blocka#123#456$" which matches only exactly that string.
All of these are regular expressions that match the example you gave.
But it's probably as simple as
public void Store_QR(String qr) {
    if (qr.matches("^blocka#\\d+#\\d+$")) {
        CurrentLocation = qr;
    } else {
        // Break the operation
    }
}
or
// Needs java.util.regex.Matcher and java.util.regex.Pattern
private static final Pattern QR_PATTERN = Pattern.compile("^blocka#(\\d+)#(\\d+)$");

public void Store_QR(String qr) {
    Matcher matcher = QR_PATTERN.matcher(qr);
    if (matcher.matches()) {
        // parseInt throws NumberFormatException if the digits exceed the int range
        int number1 = Integer.parseInt(matcher.group(1));
        int number2 = Integer.parseInt(matcher.group(2));
        CurrentLocation = qr;
    } else {
        // Break the operation
    }
}
BlockName#start_X#start_Y: any block name starting with the string "block", followed by two integers
I guess a good regex for that would be "^block\\w+#\\d+#\\d+$": starting with "block", then any combination of a-z, A-Z, 0-9 and _ (that's the \w), followed by #, digits, #, digits.
It would match block_#0#0, blockZ#9#9, and block_a_Unicorn666#0000#1234, but not block#1#2, because there is no name at all, and not blockName#123#abc, because it has letters instead of a number. It would also not match Block_a#123#456, because of the uppercase B.
If the name part (\\w+) is too liberal (___ and _123 would be legal names), use e.g. "^block_?[a-zA-Z]+#\\d+#\\d+$". This does not allow digits in the name; the name may start with a single optional _, and letters have to follow it. It would allow _a, a, and _ABc, but not _, _a_b, or _a9. If you want to allow digits in names, [a-zA-Z0-9] would be the character class to use.
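The examples listed above can be checked mechanically. A short sketch, where the class name BlockNameDemo and its method names are invented for this illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo of the liberal and the stricter block-name pattern.
final class BlockNameDemo {
    // "block" plus any word characters as the name.
    static final Pattern LIBERAL = Pattern.compile("^block\\w+#\\d+#\\d+$");
    // Optional single underscore, then letters only.
    static final Pattern STRICT = Pattern.compile("^block_?[a-zA-Z]+#\\d+#\\d+$");

    static boolean liberal(String s) {
        return LIBERAL.matcher(s).matches();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }
}
```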

I suggest:
[a-z]+#\d+#\d+
And if you want to capture the 3 parts:
([a-z]+)#(\d+)#(\d+)
Matcher.group(1), group(2), and group(3) return the parts.

Related

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
The numbers may contain a decimal separator ("." or ",").
For instance, the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.matches(regex, String), but I am not able to find the correct regex.

Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue with the above. Consider 300 - -200: the foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
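As a rough illustration of how the whitelisted version could be used to pull an equation apart, here is a sketch. The class name EquationSplitter and the method split are made up for this example; the group numbers follow from the parentheses in the regex (1 = left number, 3 = operator with surrounding whitespace, 4 = right number).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the operator-whitelist regex above.
final class EquationSplitter {
    static final Pattern EQUATION = Pattern.compile(
            "((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)");

    // Returns {left, operator, right}, or null if the input does not match.
    static String[] split(String input) {
        Matcher m = EQUATION.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(3).trim(), m.group(4) };
    }
}
```

Note that [\\d,.]+ is permissive: it accepts any mix of digits, dots and commas, so the number parts still need proper parsing afterwards.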

In your regex you have to escape the dot as \. to match it literally, and escape the plus as \+, or else ?+ is read as a possessive quantifier. To match 1+ digits you have to add a quantifier: [0-9]+
For your example data, you could match 1+ digits followed by an optional part consisting of a dot or a comma and 1+ more digits, once at the start and once at the end. To match any single character in between, you could use a dot.
Instead of a dot, you could also use a character class such as [-+*] to list the operators, or whatever else you want to allow. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
That would match:
\d+(?:[.,]\d+)? - 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. - any single character (use .+ to repeat it 1+ times)
\d+(?:[.,]\d+)? - same as the first pattern
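One consequence of using the dot is that it also matches a digit, so a plain number can slip through. The sketch below compares the dot version with a variant that uses the [-+*] class mentioned above; the class name NumberOpNumber and its method names are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of "any character" versus an operator class.
final class NumberOpNumber {
    static final Pattern ANY_SEPARATOR =
            Pattern.compile("\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?");
    static final Pattern OPERATOR_ONLY =
            Pattern.compile("\\d+(?:[.,]\\d+)?[-+*]\\d+(?:[.,]\\d+)?");

    static boolean anySeparator(String s) {
        return ANY_SEPARATOR.matcher(s).matches();
    }

    static boolean operatorOnly(String s) {
        return OPERATOR_ONLY.matcher(s).matches();
    }
}
```

With the dot, "12345" is accepted because one of the middle digits is consumed as the separator; the operator class rejects it.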

Regular expression special characters not working at the starting of the string in java

After trying other variations, I use this regular expression in Java to validate a password:
PatternCompiler compiler = new Perl5Compiler();
PatternMatcher matcher = new Perl5Matcher();
pattern = compiler.compile("^(?=.*?[a-zA-Z])(?![\\\\\\\\_\-])(?=.*?[0-9])([A-Za-z0-9-/-~][^\\\\\\\\_\-]*)$");
But it still doesn't match my test cases as expected:
Apr#2017 match
$$Apr#2017 no match, but it should match
!!Apr#2017 no match, but it should match
!#ap#2017 no match, but it should match
-Apr#2017 it should not match
_Apr#2017 it should not match
\Apr#2017 it should not match
All special characters except these three ( - _ \ ) should be allowed at the start of the string.
Rules:
It should accept all special characters any number of times except above three symbols.
It must and should contain one number and Capital letter at any place in the string.

You have two rules, why not create more than one regular expression?
It should accept all special characters any number of times except above three symbols.
For this one, make sure it does not match [-\\_] (note that the - must be the first character in the character class, or it will be interpreted as a range).
It must and should contain one number and Capital letter at any place in the string.
For this one, make sure it matches [A-Z] and [0-9]
To make it easy to modify and extend, do some abstraction:
import java.util.regex.Pattern;

class PasswordRule
{
    private final Pattern pattern;
    // If true, the string must contain a match; if false, it must not
    private final boolean shouldMatch;

    PasswordRule(String patternString, boolean shouldMatch)
    {
        this.shouldMatch = shouldMatch;
        this.pattern = Pattern.compile(patternString);
    }

    boolean match(String passwordString)
    {
        // find() looks for the pattern anywhere in the string
        return pattern.matcher(passwordString).find() == shouldMatch;
    }
}
The above uses java.util.regex rather than the Perl5 matcher from the question, but you should get the idea. Then your rules go in an array:
PasswordRule[] rules =
{
    new PasswordRule("[-\\\\_]", false),
    new PasswordRule("[A-Z]", true),
    new PasswordRule("[0-9]", true)
};

boolean passwordIsOk(String password)
{
    for (PasswordRule rule : rules)
    {
        if (!rule.match(password))
        {
            return false;
        }
    }
    return true;
}
Using the above, your rules are far more flexible and modifiable than one monstrous regular expression.

Here's an alternative solution - reverse the condition. This regex
^(?:[^0-9]*|[^A-Z]*|[_\\-].*)$
matches non-conforming passwords, which makes it much simpler to understand.
It matches either
a string free from digits
a string free from capital letters
a string starting with one of _, \ or -
There are some unclear issues remaining in your question though, so it may have to be adjusted (in particular the restriction on the starting character, which I mentioned in a comment).
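In Java, the inverted check might be wired up as in the sketch below. It uses java.util.regex instead of the Perl5 classes from the question, and the names InvertedPasswordCheck and isRejected are invented here. Note the extra backslashes needed inside the string literal.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the "non-conforming password" regex.
final class InvertedPasswordCheck {
    // Java literal for the regex ^(?:[^0-9]*|[^A-Z]*|[_\\-].*)$
    static final Pattern REJECT =
            Pattern.compile("^(?:[^0-9]*|[^A-Z]*|[_\\\\-].*)$");

    // True if the password breaks at least one of the rules.
    static boolean isRejected(String password) {
        return REJECT.matcher(password).matches();
    }
}
```

A password is then acceptable when isRejected returns false.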

You seem to need
"^(?=[^a-zA-Z]*[a-zA-Z])(?=[^0-9]*[0-9])[^\\\\_-]*$"
^ - start of string
(?=[^a-zA-Z]*[a-zA-Z]) - a positive lookahead that requires at least 1 ASCII letter ([a-zA-Z]) to appear after 0+ chars other than letters ([^a-zA-Z]*)
(?=[^0-9]*[0-9])- at least 1 ASCII digit (same principle of contrast as above is used here)
[^\\\\_-]* - 0+ chars other than \ (inside a Java string literal, \ should be doubled to denote 1 literal backslash, and to match a single backslash with a regex, we need double literal backslash), _, -
$ - end of string (\\z might be better though, as it matches only at the very end of the string, whereas $ also matches just before a final line break).
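A quick way to try this pattern with java.util.regex (rather than the Perl5 classes from the question) is sketched below; the class name LookaheadPasswordCheck and the method isValid are assumptions for the example. Keep in mind that this pattern rejects \, _ and - anywhere in the string, not only at the start.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the lookahead-based regex above.
final class LookaheadPasswordCheck {
    static final Pattern VALID = Pattern.compile(
            "^(?=[^a-zA-Z]*[a-zA-Z])(?=[^0-9]*[0-9])[^\\\\_-]*$");

    // True if the password has a letter, a digit, and none of \ _ -
    static boolean isValid(String password) {
        return VALID.matcher(password).matches();
    }
}
```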

Regex to ignore variable initializations that have already been declared

I need a regex that would ignore a token if a part of the token has already been captured before.
Example
var bold, det, bold=6, sum, k
Here, bold=6 should be ignored because bold has already been captured.
Also, var must be present before any matching can take place, and the last token k should not be followed by a comma. Only the variables between var and the last token k should be followed by a comma.
Another Example
var bold=6, det, bold, sum, k
Here, bold which follows det should be ignored because bold=6 has already been captured.
I tried using this pattern (?:\\bvar\\b|\\G)\\s*(\\w+)(?:,|$), but it doesn't ignore what has been repeated.

Depending on what information you need, you can try the following.
A solution that works only in Java and gives you the variable name plus its start and end indices:
(?<=var.{0,999})(?<!=)(?!var\b)\b(?<var>\w+)\b(?<!var.{1,999}(?=\k<var>).{1,999}(?=\k<var>).{1,999})
It uses an ugly but quite effective feature of Java regex: intervals (x{min,max}) in lookbehind. As long as the interval has an explicit minimum and maximum length, you can use it in a Java lookbehind. So instead of .* you can use, for example, .{0,999}. It will fail if more than 999 characters are needed; you can use a bigger number, but I think it is not necessary in this case. The named group <var> is optional here; you can replace it in code with a normal group.
Implementation in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String test = "var bold, det, bold=6, sum, k\n" +
                      "var foo=6, abc, foo, xyz, k";
        Matcher matcher = Pattern.compile("(?<=var.{0,999})(?<!=)(?!var\\b)\\b(?<var>\\w+)\\b(?<!var.{1,999}(?=\\k<var>).{1,999}(?=\\k<var>).{1,999})").matcher(test);
        while (matcher.find()) {
            // Print name, start index, end index of each first occurrence
            System.out.println(matcher.group("var") + "," + matcher.start("var") + "," + matcher.end("var"));
        }
    }
}
with output (variable name, start index, end index):
bold,4,8
det,10,13
sum,23,26
k,28,29
foo,34,37
abc,41,44
xyz,51,54
k,56,57
Explanation of regex:
(?<=var.{0,999}) - must be preceded by the text var followed by any number of characters (up to 999), but no newline,
(?<!=) - must not be preceded by an equals sign, to avoid matching a variable name and its value as separate matches,
(?!var\b) - the text at this position cannot be the word var, to avoid matching the keyword itself,
\b(?<var>\w+)\b - a separate word, captured into the <var> group,
(?<!var.{1,999}(?=\k<var>).{1,999}(?=\k<var>).{1,999}) - the matched word cannot be preceded by the word var followed by some characters, including the captured word, followed by some more characters, including the captured word again.
But as I wrote, it will work only in Java.
If you need just variable names, you can use:
(?<=var\s|\G,\s)(?<var>\w+)(?=,|$)|(?<=var\s|\G,\s)(?<initialized>[^,\n]+)
to get variable names without duplicates. But if you want start/end indices, it will capture the second occurrence of a duplicated variable name into a group.

You can tweak your regex to this with a negative lookahead:
(?:\bvar\b|\G)\s*(?:(\w+)(?!.*\b\1\b)(?:=\w+)?|\S+)(?:,|\bk\b)
Rather than keeping track of what it has already matched, it skips a word if the same word appears again later in the string.
Here (?!.*\b\1\b) is a negative lookahead that avoids matching a word if the same word is found to its right in the input. \1 is a back-reference to the matched word.
RegEx Breakup:
(?:\bvar\b|\G) # match text var or \G
\s* # match 0 or more whitespace characters
(?: # start non-capturing group
(\w+)(?!.*\b\1\b) # match a word only if the same word is not found in the rest of the input
(?:=\w+)? # followed by optional = and some value
| # regex alternation
\S+ # OR match 1 or more non-space character
) # close non-capturing group
(?:,|\bk\b) # match a comma or k

validate string in java

I have a string with data separated by commas like this:
$d4kjvdf,78953626,10.0,103007,0,132103.8945F,
I tried the following regex but it doesn't match the strings I want:
[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,

The $ at the beginning of your data string is not matching the regex. Change the first character class to [$a-zA-Z0-9]. And a couple of the comma separated values contain a literal dot. [$.a-zA-Z0-9] would cover both cases. Also, it's probably a good idea to anchor the regex at the start and end by adding ^ and $ to the beginning and end of the regex respectively. How about this for the full regex:
^[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,$
Update:
You said the number of commas is your primary matching criterion. If there should be 6 commas, this would work:
^([^,]+,){6}$
That means: match at least 1 character that is anything but a comma, followed by a comma. And perform the aforementioned match 6 times consecutively. Note: your data must end with a trailing comma as is consistent with your sample data.

Well, your regular expression is certainly garbled: there are clearly characters (like $ and .) that your expression won't match, and you don't need to escape commas with \\. Let's first describe our requirements. You seem to be saying a valid string is defined as:
A string containing 6 commas, with one or more characters before each one
We can represent that with the following pattern:
(?:[^,]+,){6}
This says: match one or more non-commas followed by a comma ([^,]+,), six times ({6}). The (?:...) notation is a non-capturing group, which lets us say "match the whole sub-expression six times"; without it, the {6} would only apply to the preceding character.
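As a validation-only check, this pattern could be used roughly as follows. The class name SixFields and the method isValid are invented for the sketch; Matcher.matches is used so that the whole line must consist of exactly six comma-terminated fields.

```java
import java.util.regex.Pattern;

// Hypothetical validator for "six non-empty fields, each followed by a comma".
final class SixFields {
    static final Pattern SIX_FIELDS = Pattern.compile("(?:[^,]+,){6}");

    // matches() requires the entire input to fit the pattern.
    static boolean isValid(String line) {
        return SIX_FIELDS.matcher(line).matches();
    }
}
```

The sample line from the question passes, while lines with empty fields, a missing trailing comma, or a different number of fields do not.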
Alternately, we could use normal, capturing groups to let us select each individual section of the matching string:
([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?
Now we can not only match the string, but extract its contents at the same time, e.g.:
String str = "$d4kjvdf,78953626,10.0,103007,0,132103.8945F,";
Pattern regex = Pattern.compile(
    "([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?");
Matcher m = regex.matcher(str);
if (m.matches()) {
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}
This prints:
$d4kjvdf
78953626
10.0
103007
0
132103.8945F

How to make a regular expression that matches tokens with delimiters and separators?

I want to be able to write a regular expression in java that will ensure the following pattern is matched.
<D-05-hello-87->
The letter D can be either 'D' or 'E', in capitals, and only one of these letters, once.
The two numbers you see must always be 2-digit decimal numbers, not 1 or 3 digits.
The string must start with '<', end with '>', and contain '-' to separate the parts within.
The message in the middle ('hello') can consist of any characters but must not be more than 99 characters long. It can contain whitespace.
This pattern will also be repeated, so the expression needs to recognise the individual patterns within a long string of these patterns and ensure they follow this structure.
So far I have tried this:
([<](D|E)[-]([0-9]{2})[-](.*)[-]([0-9]{2})[>]\z)+
But the problem is (.*), which treats everything after it as part of the any-character match and ignores the rest of the pattern.
How might this be done? (Using Java reg ex syntax)

Try making it non-greedy, or use a negated character class:
(<([DE])-([0-9]{2})-(.*?)-([0-9]{2})>)
Live Demo: http://ideone.com/nOi9V3

Update: tested and working
<([DE])-(\d{2})-(.{1,99}?)-(\d{2})>
See it working: http://rubular.com/r/6Ozf0SR8Cd
You should not wrap -, < and > in [ ]

Assuming that you want to stop at the first dash, you could use [^-]* instead of .*. This will match all non-dash characters.
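Combining the negated class with the 99-character limit and scanning a longer string with find might look like the sketch below. The class name TokenScanner and the method messages are invented for this example, and it assumes tokens of the form <D-05-hello-87> with no dash inside the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner that extracts the message part of each token.
final class TokenScanner {
    // Message part: 1 to 99 characters, none of them a dash.
    static final Pattern TOKEN =
            Pattern.compile("<([DE])-(\\d{2})-([^-]{1,99})-(\\d{2})>");

    // Collects the message of every well-formed token found in the input.
    static List<String> messages(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group(3));
        }
        return result;
    }
}
```

Because find skips over text that does not match, malformed tokens are silently ignored here; to reject a string that contains anything else, the matched spans would need to be checked for gaps.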
