I've found a lot of variations on this subject on both SO and the web, but most (if not all) ask for at least one letter and one digit. I need at least one letter.
I've tried but haven't gotten it right. What I need is for the string to contain only letters, or letters and digits in any order; dashes and spaces are allowed, but not at the beginning or the end of the string. Here is how it looks right now:
import java.util.regex.Pattern;

public class Main { // enclosing class; the name is assumed, the original declaration was cut off
    protected static final String PATTERN = "[\u00C0-\u017Fa-zA-Z0-9']+([- ][\u00C0-\u017Fa-zA-Z0-9']+)*";

    public static void main(String[] args) {
        String name;
        //name = "Street"; // allowed
        //name = "Some-Street"; // allowed
        //name = "Street "; // not allowed
        //name = " Street"; // not allowed
        //name = "Street-"; // not allowed
        //name = "-Street"; // not allowed
        //name = "Street"; // allowed
        //name = "1 st street"; // allowed
        //name = "street 5"; // allowed
        name = "111"; // NOT allowed
        if (!Pattern.matches(PATTERN, name)) {
            System.out.println("ERROR!");
        } else {
            System.out.println("OK!");
        }
    }
}
How do I add a check that there is at least one letter?
It doesn't matter whether it is at the beginning or the end, or whether there is a space or dash between it and the digits. There just has to be at least one letter.
You can use this regex for your problem:
^(?=.*\pL)[\pL\pN]+(?:[ -]+[\pL\pN]+)*$
For Java use:
final String regex = "^(?=.*\\pL)[\\pL\\pN]+(?:[ -]+[\\pL\\pN]+)*$";
RegEx Breakup:
^: Start
(?=.*\pL): Using a lookahead make sure we have at least one unicode letter somewhere
[\pL\pN]+: Match one or more unicode letter or unicode digit
(?:: Non-capturing group start
[ -]+: Match one or more space or hyphen
[\pL\pN]+: Match one or more unicode letter or unicode digit
)*: Non-capturing group end. * means zero or more of this group.
$: End
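As a rough check, the Java form of the pattern can be run against the sample names from the question. This is only a sketch: the class name NameCheck and the helper isValid are made up for illustration.

```java
import java.util.regex.Pattern;

final class NameCheck {
    // Pattern from the answer above: \pL is any Unicode letter, \pN any Unicode number.
    private static final Pattern NAME =
            Pattern.compile("^(?=.*\\pL)[\\pL\\pN]+(?:[ -]+[\\pL\\pN]+)*$");

    // Hypothetical helper: true when the whole string is an acceptable name.
    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Street", "Some-Street", "Street ", " Street",
                            "Street-", "-Street", "1 st street", "street 5", "111"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + (isValid(s) ? "OK!" : "ERROR!"));
        }
    }
}
```

Unlike the pattern in the question, this one does not accept an apostrophe; if names like O'Neil should pass, it would have to be added to both character classes.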
If I understand correctly, and according to what you've presented, you have the following conditions:
At least 1 letter
Can contain digits (but only if the previous condition is met)
Dashes and spaces are allowed only if they are not at the beginning or end of the string
Based on these conditions, the following regex will work:
^(?![ -]|\d+$)[[:alnum:] -]+(?<![ -])$
To see this regex in use, click this link.
This regex works as follows:
Ensure the string doesn't begin with hyphen - or space
Ensure the string isn't composed of only digits
Ensure the string consists of one or more alphanumeric characters, spaces, or hyphens
Ensure the string doesn't end with hyphen - or space
This will give you the following matches
Street
Some-Street
Street
1 st street
street 5
The regex will fail to match the following strings (as per your examples)
"Street " (trailing space)
" Street" (leading space)
Street-
-Street
111
Edit
Negative lookbehinds can sometimes cause issues in certain languages (like Java).
Below is an adapted version of my previous regex that uses a negative lookahead instead of a negative lookbehind to ensure that the string doesn't end with a hyphen or a space.
^(?![ -]|\d+$)(?:(?![ -]$)[\pL\pN -])+$
You can see this regex in use here
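A minimal Java sketch of the lookahead-only version, assuming a hypothetical class LookaheadNameCheck with a helper isValid (neither appears in the answer):

```java
import java.util.regex.Pattern;

final class LookaheadNameCheck {
    // (?![ -]|\d+$)  : must not start with space/hyphen, must not be digits only
    // (?![ -]$)      : a space or hyphen may not be the last character
    private static final Pattern NAME =
            Pattern.compile("^(?![ -]|\\d+$)(?:(?![ -]$)[\\pL\\pN -])+$");

    // Hypothetical helper: true when the whole string is an acceptable name.
    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Street", "1 st street", "Street-", " Street", "111"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Because separators are only constrained at the two ends, this version also accepts runs such as "a--b".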
The following regex does the job:
(?=.*[[:alpha:]])[[:alnum:]]{1}[[:alnum:] -]*[[:alnum:]]{1}
(?=.*[[:alpha:]]) guarantees that an alphabetic character [A-Za-z] exists somewhere in the string.
[[:alnum:]]{1} guarantees that the string starts with an alphanumeric character [A-Za-z0-9]. The {1} is redundant and can be dropped.
[[:alnum:] -]* allows any number of alphanumeric characters, spaces, and dashes in the middle.
[[:alnum:]]{1} guarantees that the string ends with an alphanumeric character [A-Za-z0-9].
Note that, as written, the pattern needs at least two characters, so a single letter such as "a" is rejected.
To see it live https://regex101.com/r/V0lesF/1
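[[:alpha:]] and [[:alnum:]] are POSIX bracket expressions as understood by PCRE-style engines. As far as I know, java.util.regex does not treat that syntax as a POSIX class; its own spelling is \p{Alpha} and \p{Alnum}. A sketch of the same idea in Java, with a hypothetical class AsciiNameCheck and the redundant {1} dropped:

```java
import java.util.regex.Pattern;

final class AsciiNameCheck {
    // \p{Alpha} = [A-Za-z], \p{Alnum} = [A-Za-z0-9] in java.util.regex.
    private static final Pattern NAME =
            Pattern.compile("(?=.*\\p{Alpha})\\p{Alnum}[\\p{Alnum} -]*\\p{Alnum}");

    // Hypothetical helper; matches() requires the whole string to fit the pattern.
    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Street", "1 st street", "111", "Street-", "a"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Since the pattern has a separate first and last character, one-character names are rejected here too.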
Let me start by disclaiming that I am horrible with regular expressions. I want to find every instance of a Social Security number in a string and mask everything except the dashes (-) and the last 4 digits of the SSN.
Example
String someStrWithSSN = "This is an SSN,123-31-4321, and here is another 987-65-8765";
Pattern formattedPattern = Pattern.compile("^\\d{9}|^\\d{3}-\\d{2}-\\d{4}$");
Matcher formattedMatcher = formattedPattern.matcher(someStrWithSSN);
while (formattedMatcher.find()) {
// Here is my first issue: the pattern is not found
}
// My next issue is that the string should end up looking like this:
// "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765"
The expected result is to find each SSN and replace it. The code above should produce the string "This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765".
You can simplify this, by doing something like the following:
String initial = "This is an SSN,123-31-4321, and here is another 987-65-8765";
String processed = initial.replaceAll("\\d{3}\\-\\d{2}(?=\\-\\d{4})","XXX-XX");
System.out.println(initial);
System.out.println(processed);
Output:
This is an SSN,123-31-4321, and here is another 987-65-8765
This is an SSN,XXX-XX-4321, and here is another XXX-XX-8765
The regex \d{3}\-\d{2}(?=\-\d{4}) matches three digits, a dash, and two digits, but only when they are followed by a dash and four digits. That trailing part sits in a lookahead, so it is checked without being consumed and is left untouched by the replacement. Using replaceAll with this regex then produces the desired masking.
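On the first issue in the question (find() never succeeding): the ^ and $ anchors in the original pattern only allow a match at the very start of the input, or one spanning the whole input, so an SSN in the middle of a sentence is never found. A sketch without anchors that keeps the last four digits in a capturing group; the class SsnMask and the method mask are illustrative names, not from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SsnMask {
    // No ^ or $ anchors, so find() can locate an SSN anywhere in the text.
    // Group 1 holds the last four digits, which stay visible.
    private static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-(\\d{4})");

    // Hypothetical helper: masks every dashed SSN in the text.
    static String mask(String text) {
        return SSN.matcher(text).replaceAll("XXX-XX-$1");
    }

    public static void main(String[] args) {
        String s = "This is an SSN,123-31-4321, and here is another 987-65-8765";
        Matcher m = SSN.matcher(s);
        while (m.find()) {
            System.out.println("found " + m.group());
        }
        System.out.println(mask(s));
    }
}
```

This is the same masking as the lookahead version, written with a back-reference ($1) in the replacement instead.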
Edit:
If you also want 9 consecutive digits to be targeted by this replacement, you can do the following:
String initial = "This is an SSN,123-31-4321, and here is another 987658765";
String processed = initial.replaceAll("\\d{3}\\-\\d{2}(?=\\-\\d{4})","XXX-XX")
.replaceAll("\\d{5}(?=\\d{4})","XXXXX");
System.out.println(initial);
System.out.println(processed);
Output:
This is an SSN,123-31-4321, and here is another 987658765
This is an SSN,XXX-XX-4321, and here is another XXXXX8765
The regex \d{5}(?=\d{4}) matches five digits that are followed by four more digits; the lookahead checks those four without consuming them. A second call to replaceAll targets these sequences with the appropriate replacement.
Edit:
Here's a more robust version of the previous regex, and a longer demonstration of how the new regex works:
String initial = "123-45-6789 is a SSN that starts at the beginning of the string, "
        + "and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These "
        + "have 10+ digits, so they don't match: 123-31-43214, and 98765876545. "
        + "This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. "
        + "-123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is "
        + "preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've "
        + "tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) "
        + "(777-77-7777), and here's four invalid SSNs in parentheses: (7777777778)(777-77-77778) "
        + "(777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: "
        + "998-76-4321";
String processed = initial.replaceAll("(?<=^|[^-\\d])\\d{3}\\-\\d{2}(?=\\-\\d{4}([^-\\d]|$))","XXX-XX")
.replaceAll("(?<=^|[^-\\d])\\d{5}(?=\\d{4}($|\\D))","XXXXX");
System.out.println(initial);
System.out.println(processed);
Output:
123-45-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, 123-31-4321, and here is another 987658765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :123-31-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (777777777) (777-77-7777), and here's four invalid SSNs in parentheses: (7777777778)(777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: 998-76-4321
XXX-XX-6789 is a SSN that starts at the beginning of the string, and still matches. This is an SSN, XXX-XX-4321, and here is another XXXXX8765. These have 10+ digits, so they don't match: 123-31-43214, and 98765876545. This (123-31-4321-blah) has 9 digits, but is followed by a dash, so it doesn't match. -123-31-4321 is preceded by a dash, so it doesn't match as well. :XXX-XX-4321 is preceded by a non-colon/digit, so it does match. Here's a 4-2-4 non-SSN that would've tricked the initial regex: 1234-56-7890. Here's two SSNs in parentheses: (XXXXX7777) (XXX-XX-7777), and here's four invalid SSNs in parentheses: (7777777778)(777-77-77778) (777-778-7777) (7778-77-7777). At the end of the string is a matching SSN: XXX-XX-4321
How can I make my string only pass a test if every character in the string is in the regex?
Here is what I have so far:
String w = theApplet.Word.getText().toLowerCase();
if(w.matches(".*[a-z-_]+.*")){
theApplet.words.add(w);
theApplet.str.setText("The word: "+w+" has been added to the list");
}
However, the string is valid even if it contains invalid characters, as long as it contains at least 1 of the characters in the regex.
.* means "match any character zero or more times"
[a-z-_]+ means "match any lowercase character or dash (-) or underscore (_) one or more times".
So the first part is consuming nearly the entire string and the regex is returning true if there is at least one lowercase character/dash/underscore.
Simply remove the .*'s to force all characters to be lowercase characters/dashes/underscores.
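A small sketch of the fix, assuming a hypothetical helper isValidWord in place of the applet code. String.matches() only succeeds when the entire string matches, so no anchors are needed; the hyphen is placed last in the class so that it is unambiguously a literal:

```java
final class WordCheck {
    // True only if every character is a lowercase letter, dash or underscore.
    static boolean isValidWord(String w) {
        return w.matches("[a-z_-]+");
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hello_world", "hello-world", "hello world!", "abc1"}) {
            System.out.println(w + " -> " + isValidWord(w));
        }
    }
}
```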
I have a string like the one below, and I want to extract the substrings that follow a special character.
String myString="Regular $express&ions are <patterns <that can# be %matched against *strings";
I want output like this:
express
ions
patterns
that
matched
strings
Can anyone help me? Thanks in advance.
Note: as @MaxZoom pointed out, it seems that I didn't understand the OP's problem properly. The OP apparently does not want to split the string on special characters, but rather to keep the words that start with a special character. The former is addressed by my answer, the latter by @MaxZoom's answer.
You should take a look at the String.split() method.
Give it a regexp matching all the characters you want to split on, and you'll get an array of the pieces between them. For instance:
String myString = "Regular $express&ions are <patterns <that can# be %matched against *strings";
String[] words = myString.split("[$&<#%*]");
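To make the effect of the split concrete, here is a runnable sketch (the class SplitDemo and the method splitOnSpecials are illustrative names only):

```java
import java.util.Arrays;

final class SplitDemo {
    // Splits on any one of the listed special characters.
    static String[] splitOnSpecials(String s) {
        return s.split("[$&<#%*]");
    }

    public static void main(String[] args) {
        String myString =
                "Regular $express&ions are <patterns <that can# be %matched against *strings";
        // Each element is the text between two special characters,
        // including any words and spaces that happen to sit there.
        System.out.println(Arrays.toString(splitOnSpecials(myString)));
    }
}
```

The pieces include everything between two special characters, for example "ions are " rather than just "ions", which is why this does not produce the word list asked for in the question.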
This regex will select words that start with a special character:
[$&<%*](\w*)
explanation:
[$&<%*] match a single character present in the list below
$&<%* a single character in the list $&<%* literally (case sensitive)
1st Capturing group (\w*)
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
g modifier: global. All matches (don't return on first match)
DEMO
MATCH 1 [9-16] express
MATCH 2 [17-21] ions
MATCH 3 [27-35] patterns
MATCH 4 [37-41] that
MATCH 5 [51-58] matched
MATCH 6 [68-75] strings
Solution in Java code:
String str = "Regular $express&ions are <patterns <that can# be %matched against *strings";
Matcher matcher = Pattern.compile("[$&<%*](\\w*)").matcher(str);
List<String> words = new ArrayList<>();
while (matcher.find()) {
words.add(matcher.group(1));
}
System.out.println(words.toString());
// prints [express, ions, patterns, that, matched, strings]
I have a string with data separated by commas like this:
$d4kjvdf,78953626,10.0,103007,0,132103.8945F,
I tried the following regex but it doesn't match the strings I want:
[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,[a-zA-Z0-9]+\\,
The $ at the beginning of your data string is not matching the regex. Change the first character class to [$a-zA-Z0-9]. And a couple of the comma separated values contain a literal dot. [$.a-zA-Z0-9] would cover both cases. Also, it's probably a good idea to anchor the regex at the start and end by adding ^ and $ to the beginning and end of the regex respectively. How about this for the full regex:
^[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,[$.a-zA-Z0-9]+\\,$
Update:
You said the number of commas is your primary matching criterion. If there should be 6 commas, this would work:
^([^,]+,){6}$
That means: match at least 1 character that is anything but a comma, followed by a comma. And perform the aforementioned match 6 times consecutively. Note: your data must end with a trailing comma as is consistent with your sample data.
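A quick sketch of that check in Java, using a made-up helper hasSixFields. Inside a Java string literal the pattern needs no extra escaping, since it contains no backslashes:

```java
final class CsvShape {
    // Six non-empty fields, each terminated by a comma, and nothing else.
    static boolean hasSixFields(String line) {
        return line.matches("^([^,]+,){6}$");
    }

    public static void main(String[] args) {
        System.out.println(hasSixFields("$d4kjvdf,78953626,10.0,103007,0,132103.8945F,"));
        System.out.println(hasSixFields("a,b,c,"));
    }
}
```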
Well, your regular expression is certainly garbled: there are clearly characters (like $ and .) that your expression won't match, and you don't need to escape commas with \\. Let's first describe our requirements. You seem to be saying a valid string is defined as:
A string consisting of 6 commas, with one or more characters before each one
We can represent that with the following pattern:
(?:[^,]+,){6}
This says: match one or more non-commas followed by a comma ([^,]+,), six times ({6}). The (?:...) notation is a non-capturing group, which lets us repeat the whole sub-expression six times; without it, the {6} would apply only to the preceding character.
Alternately, we could use normal, capturing groups to let us select each individual section of the matching string:
([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?
Now we can not only match the string, but extract its contents at the same time, e.g.:
String str = "$d4kjvdf,78953626,10.0,103007,0,132103.8945F,";
Pattern regex = Pattern.compile(
"([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),?");
Matcher m = regex.matcher(str);
if(m.matches()) {
for (int i = 1; i <= m.groupCount(); i++) {
System.out.println(m.group(i));
}
}
This prints:
$d4kjvdf
78953626
10.0
103007
0
132103.8945F
How can I write this as a regular expression?
"blocka#123#456"
I have used the # symbol to separate the parameters in the data. The parameters are the block name, the start X coordinate, and the start Y coordinate.
This is the data embedded in my QR code, so when I scan the QR code I want to check that it is the right one. For that I need a regular expression for the above syntax.
My method body:
public void Store_QR(String qr){
if( qr.matches(regular Expression here)) {
CurrentLocation = qr;
}
else // Break the operation
}
The information you specified does not justify using a regular expression at all.
Try to formulate it in a more general way.
If you really need to scan for "blocka#123#456", then use qr.contains("blocka#123#456");
It depends on what you want to match.
Here are some regex suggestions:
^blocka#[0-9]{3}#[0-9]{3}$
^blocka#[0-9]+#[0-9]+$
^blocka(#[0-9]{3}){2}$
^blocka(#[0-9]+){2}$
^blocka(#[0-9]{3})+$
^blocka(#[0-9]+)+$
Otherwise, just use contains() or similar.
myregexp.com is nice to do some testing.
Official Java Regex Tutorial is quite ok to learn and includes most things one needs to know.
The Pattern documentation also includes fancy predefined character classes that are missing in above tutorial.
You did not specify anything that has to be regular in the example you gave. Regular expressions only make sense if there are rules to validate the input against.
If it has to be exactly "blocka#123#456", then "blocka#123#456" or "^blocka#123#456$" will work as the regex. Wrapping a regex in ^ and $ means it must span the input from beginning to end. That is sometimes required and usually a good idea.
If blocka is dynamic, replace it with [a-z]+ to match any sequence of lowercase letters a through z with a length of at least 1. block[a-z] would match blocka, blockb, etc.
[a-z]{6} would match any sequence of exactly 6 lowercase letters. [a-zA-Z] also includes uppercase letters, and \p{L} matches any letter, including non-ASCII Unicode letters (e.g. Blüc本).
# matches #. Any character without a special regex meaning matches itself; the special ones are \ ^ $ . | ? * + ( ) [ ] { }. [^#] matches every character except #.
Regarding the numbers: [0-9]+ or \d+ is a generic pattern for one or more digits. [0-9]{1,4} would match anything consisting of 1 to 4 digits, like 007, 5, or 9999. (?:0|[1-9][0-9]{0,3}), for example, will only match numbers between 0 and 9999 and does not allow leading zeros. (?:STUFF) is a non-capturing group that does not affect the groups you can extract via Matcher#group(1..?). It is useful for logical grouping with |. The meaning of (?:0|[1-9][0-9]{0,3}) is: either a single 0, or one digit 1-9 followed by 0 to 3 digits 0-9.
[0-9] is so common that there is a predefined class for it: \d (digit). It is written \\d inside a Java string literal, since the \ itself has to be escaped.
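The 0 to 9999 pattern described above can be tried out in isolation. A minimal sketch, assuming a hypothetical helper inRange:

```java
import java.util.regex.Pattern;

final class NumberRange {
    // Either a single 0, or a digit 1-9 followed by up to three more digits.
    private static final Pattern UP_TO_9999 =
            Pattern.compile("(?:0|[1-9][0-9]{0,3})");

    // Hypothetical helper: true for "0" .. "9999" written without leading zeros.
    static boolean inRange(String s) {
        return UP_TO_9999.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "5", "9999", "007", "10000"}) {
            System.out.println(s + " -> " + inRange(s));
        }
    }
}
```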
So some of your options are
".*" which matches absolutely everything
"^[^#]+(?:#[^#]+)+$" which matches anything separated by # like "hello #world!1# -12.f #本#foo#bar"
"^blocka(#\\d+)+$" which matches blocka followed by at least one group of numbers separated by # e.g. blocka#1#12#0007#949432149#3
"^blocka#(?:[0-9]|[1-9][0-9]|[1-3][0-9]{2})#[4-9][0-9]{2}$" which will match only if it finds blocka# followed by numbers 0 - 399, followed by a # and finally numbers 400-999
"^blocka#123#456$" which matches only exactly that string.
All of these are regular expressions that match the example you gave.
But it's probably as simple as
public void Store_QR(String qr) {
    if (qr.matches("^blocka#\\d+#\\d+$")) {
        CurrentLocation = qr;
    } else {
        // Break the operation
    }
}
or
private static final Pattern QR_PATTERN = Pattern.compile("^blocka#(\\d+)#(\\d+)$");
public void Store_QR(String qr) {
    Matcher matcher = QR_PATTERN.matcher(qr);
    if (matcher.matches()) {
        // group(1) and group(2) are the two captured numbers
        int number1 = Integer.parseInt(matcher.group(1));
        int number2 = Integer.parseInt(matcher.group(2));
        CurrentLocation = qr;
    } else {
        // Break the operation
    }
}
BlockName#start_X#start_Y: any block name starting with the string "block", followed by two integers
I guess a good regex for that would be "^block\\w+#\\d+#\\d+$": starting with "block", then any combination of a-z, A-Z, 0-9 and _ (that's the \w), followed by #, digits, #, digits.
It would match block_#0#0, blockZ#9#9, and block_a_Unicorn666#0000#1234, but not block#1#2, because there is no name at all, and not blockName#123#abc, because it has letters instead of a number. It would also not match Block_a#123#456, because of the uppercase B.
If the name part (\\w+) is too liberal (___ and _123 would be legal names), use e.g. "^block_?[a-zA-Z]+#\\d+#\\d+$", which doesn't allow digits; the name may start with a single optional _, and letters have to follow it. It would allow _a, a, and _ABc, but not _, _a_b, or _a9. If you want to allow digits in names, [a-zA-Z0-9] would be the character class to use.
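The two variants can be compared side by side. This is only a sketch; the class BlockQr and its two helper methods are names invented for the example:

```java
import java.util.regex.Pattern;

final class BlockQr {
    // Liberal: "block" + at least one word character, then two numbers.
    private static final Pattern LIBERAL =
            Pattern.compile("^block\\w+#\\d+#\\d+$");
    // Strict: "block" + optional single underscore + letters only, then two numbers.
    private static final Pattern STRICT =
            Pattern.compile("^block_?[a-zA-Z]+#\\d+#\\d+$");

    static boolean matchesLiberal(String qr) {
        return LIBERAL.matcher(qr).matches();
    }

    static boolean matchesStrict(String qr) {
        return STRICT.matcher(qr).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"block_#0#0", "blockZ#9#9", "block#1#2",
                            "Block_a#123#456", "block_a9#1#2"};
        for (String s : samples) {
            System.out.println(s + " liberal=" + matchesLiberal(s)
                    + " strict=" + matchesStrict(s));
        }
    }
}
```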
I suggest:
[a-z]+#\d+#\d+
And if you want to capture the 3 parts:
([a-z]+)#(\d+)#(\d+)
Matcher.group(1), group(2) and group(3) then return the parts.