How to construct specific Java regexp - java

I need to check a login name. It has to (it's a political, not a technical, decision):
be from 5 to 30 characters long;
contain only characters from the group [a-zA-Z0-9*];
contain at least one digit;
not consist only of digits if it has exactly 5 characters; with 6 or more characters it may consist only of digits.
I have the regexp (?=[a-zA-Z0-9*]*[0-9])[a-zA-Z0-9*]{5,30}, which works for points 1-3, but I can't work out how to include a check for point 4.

Use a regex with a negative lookahead assertion:
(?!\d{5}$)(?=[a-zA-Z\d*]*\d)[a-zA-Z\d*]{5,30}
Regex explanation here.
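A minimal sketch of how the pattern above might be used from Java. The class name LoginNameCheck and the method isValid are made up for illustration; the pattern itself is the one from this answer, applied with matches() so that it has to cover the whole input.

```java
import java.util.regex.Pattern;

final class LoginNameCheck {
    // (?!\d{5}$)            rejects input that is exactly five digits
    // (?=[a-zA-Z\d*]*\d)    requires at least one digit somewhere
    // [a-zA-Z\d*]{5,30}     allowed characters and overall length
    private static final Pattern LOGIN =
            Pattern.compile("(?!\\d{5}$)(?=[a-zA-Z\\d*]*\\d)[a-zA-Z\\d*]{5,30}");

    static boolean isValid(String name) {
        // matches() anchors the pattern to the whole input
        return LOGIN.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abcd1", "12345", "123456", "abcde", "ab*1"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With these inputs only "abcd1" and "123456" are accepted: "12345" is five digits, "abcde" has no digit, and "ab*1" is too short.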

It is always tempting to validate all aspects of a string using a single, fairly complicated regular expression. But keep in mind that such an expression might be hard to extend and maintain in the future.
Meaning: depending on the rate of "changes" to your validation rules, there might be better designs. For example, one could envision something like:
interface NameValidator {
    void isValid(String name) throws InvalidNameException;
}
class LengthValidator implements NameValidator ...
class XyzValidator implements NameValidator ...
class NameValidation {
    private final List<NameValidator> validators = Arrays.asList(new LengthValidator(), ...
    void isValid(String name) throws InvalidNameException {
        // run all validators ...
This way, adding or changing one of your validation rules becomes much more straightforward than tampering with a single regular expression and potentially breaking some other part of it.
Beyond that, you can even build different rule sets by combining different NameValidator implementations, whilst avoiding code duplication.
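A runnable sketch of this design, applied to the four login rules from the question. Everything apart from the NameValidator and InvalidNameException names is an assumption made for illustration: the rules are written as lambdas instead of separate classes, the method is called validate, and the error messages are invented.

```java
import java.util.Arrays;
import java.util.List;

final class LoginRules {
    static class InvalidNameException extends Exception {
        InvalidNameException(String message) { super(message); }
    }

    interface NameValidator {
        void validate(String name) throws InvalidNameException;
    }

    // One small rule per requirement; adding a rule means adding one entry here.
    static final List<NameValidator> RULES = Arrays.<NameValidator>asList(
        name -> { if (name.length() < 5 || name.length() > 30)
                      throw new InvalidNameException("length must be 5 to 30"); },
        name -> { if (!name.matches("[a-zA-Z0-9*]*"))
                      throw new InvalidNameException("only letters, digits and * are allowed"); },
        name -> { if (!name.matches(".*[0-9].*"))
                      throw new InvalidNameException("at least one digit is required"); },
        name -> { if (name.matches("[0-9]{5}"))
                      throw new InvalidNameException("a 5-character name cannot be all digits"); }
    );

    // Runs every rule in order; the first violated rule throws.
    static void validate(String name) throws InvalidNameException {
        for (NameValidator rule : RULES) {
            rule.validate(name);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abcd1", "12345", "ab#d1"}) {
            try {
                validate(s);
                System.out.println(s + ": ok");
            } catch (InvalidNameException e) {
                System.out.println(s + ": " + e.getMessage());
            }
        }
    }
}
```

Because each rule throws its own message, the caller can tell the user which requirement failed, which a single combined regex cannot do.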

As others have pointed out, you don’t have to do it with a single regex. Even if that’s possible, it will be obnoxiously complex and difficult for others to understand.
The simplest approach is the best:
boolean passwordValid = password.matches("[a-zA-Z0-9*]{5,30}")
&& password.matches(".*[0-9].*")
&& !password.matches("[0-9]{5}");

I'd like to propose a different approach: don't use a regex but check each character and collect password properties. That way you are able to implement more complex requirements later on, e.g. "it must have 3 out of 4".
Example:
String pw = "1a!cde";
Set<PwProperty> passwordProperties = new HashSet<>();
for( char c : pw.toCharArray() ) {
if( isDigit( c ) ) {
passwordProperties.add( PwProperty.DIGIT );
}
else if ( isSpecialChar( c ) ) {
passwordProperties.add( PwProperty.SPECIAL);
}
else if( isUpperCase( c ) ) {
passwordProperties.add( PwProperty.UPPER_CASE);
}
else if( isLowerCase( c ) ) {
passwordProperties.add( PwProperty.LOWER_CASE);
}
else {
passwordProperties.add( PwProperty.UNKNOWN );
}
}
Then you could check that like this (pseudo code):
if( !pw.length() in range ) {
display "password too short" or "password too long"
}
if( passwordProperties.contains( PwProperty.UNKNOWN ) ) {
display "unsupported characters used"
}
Set<PwProperty> setOf4 = { DIGIT, SPECIAL, LOWER_CASE, UPPER_CASE }
if( intersection( passwordProperties, setOf4 ).size() < 3 ) {
display "use at least 3 out of 4"
}
if( !passwordProperties.contains( DIGIT ) ) {
display "must contain digits"
}
display "password strength" being intersection( passwordProperties, setOfGoodProperties ).size()
etc.
This can then be expanded e.g. with properties like DIGIT_SEQUENCE which might be unwanted etc.
The main advantage is that you have more detailed information on the password rather than "it matches a certain regex or not" and you can use that information to guide the user.
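The property-collecting idea above could be made runnable roughly as follows. This is only a sketch: the PwProperty enum values come from the answer, but the set of special characters, the ASCII-only character tests, and the names PasswordProfile, analyze and categoryCount are assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

final class PasswordProfile {
    enum PwProperty { DIGIT, SPECIAL, UPPER_CASE, LOWER_CASE, UNKNOWN }

    // Assumption: which characters count as "special" is a policy choice.
    private static final String SPECIAL_CHARS = "!@#$%^&*";

    // Collects which kinds of characters occur in the password.
    static Set<PwProperty> analyze(String pw) {
        Set<PwProperty> found = EnumSet.noneOf(PwProperty.class);
        for (char c : pw.toCharArray()) {
            if (c >= '0' && c <= '9') {
                found.add(PwProperty.DIGIT);
            } else if (SPECIAL_CHARS.indexOf(c) >= 0) {
                found.add(PwProperty.SPECIAL);
            } else if (c >= 'A' && c <= 'Z') {
                found.add(PwProperty.UPPER_CASE);
            } else if (c >= 'a' && c <= 'z') {
                found.add(PwProperty.LOWER_CASE);
            } else {
                found.add(PwProperty.UNKNOWN);
            }
        }
        return found;
    }

    // Size of the intersection with the four "good" categories.
    static int categoryCount(Set<PwProperty> found) {
        Set<PwProperty> wanted = EnumSet.of(PwProperty.DIGIT, PwProperty.SPECIAL,
                PwProperty.UPPER_CASE, PwProperty.LOWER_CASE);
        wanted.retainAll(found);
        return wanted.size();
    }

    public static void main(String[] args) {
        Set<PwProperty> found = analyze("1a!cde");
        System.out.println(found + " -> " + categoryCount(found) + " of 4 categories");
    }
}
```

For the sample password "1a!cde" this finds DIGIT, SPECIAL and LOWER_CASE, so the "3 out of 4" rule is satisfied.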

Related

Java Regex for international phone number [duplicate]

I have a database with millions of phone numbers with free-for-all formatting, i.e. the UI does not enforce any constraints and the users are typing in whatever they want.
What I'm looking for is a Java API that can make a best-effort to convert these into a consistent format. Ideally, the API would take the free text value and a country code and produce a valid international phone number or throw an exception.
For example, a phone number in the system might look like any of the following:
(555) 478-1123
555-478-1123
555.478.1123
5554781123
Given the country of US, the API would produce the value "+1 (555) 478-1123" for all these. The exact format does not matter, as long as it's consistent.
There are also numbers in the system without area codes, such as "478-1123". In that case, I would expect a NoAreaCodeException, or something similar.
There could also be data such as "abc", which should also throw exceptions.
Of course, there are countless variations of the examples I have posted, as well as the enormous complication of international phone numbers, which have quite complicated validation rules. This is why I would not consider rolling my own.
Has anyone seen such an API?
You could write your own (for US phone # format):
Strip any non-numeric characters from the string
Check that the remaining string is ten characters long
Put parentheses around the first three characters and a dash between the sixth and seventh character.
Prepend "+1 " to the string
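The four steps above can be sketched as below. The class and method names are hypothetical, and throwing IllegalArgumentException for anything that does not reduce to ten digits is an assumption standing in for the NoAreaCodeException the question asks for. A space is added after the closing parenthesis so the result matches the "+1 (555) 478-1123" format from the question.

```java
final class UsPhoneFormatter {
    static String toInternational(String raw) {
        // Step 1: strip everything that is not a digit
        String digits = raw.replaceAll("\\D", "");
        // Step 2: a US number with area code has exactly ten digits
        if (digits.length() != 10) {
            throw new IllegalArgumentException(
                    "expected 10 digits but found " + digits.length());
        }
        // Steps 3 and 4: group the digits and prepend the country code
        return "+1 (" + digits.substring(0, 3) + ") "
                + digits.substring(3, 6) + "-" + digits.substring(6);
    }

    public static void main(String[] args) {
        System.out.println(toInternational("(555) 478-1123"));
        System.out.println(toInternational("555.478.1123"));
    }
}
```

This only covers the US layout; it does not check whether the area code exists, and a number such as "478-1123" or "abc" is rejected by the length check.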
Update:
Google recently released libphonenumber for parsing, formatting, storing and validating international phone numbers.
You could try this Java phone number formatting library https://github.com/googlei18n/libphonenumber
It has data for hundreds of countries and formats.
Simple regex parser
/**
 * @param pPhoneNumber
 * @return true if the phone number is correct
 */
private boolean isPhoneNumberCorrect(String pPhoneNumber) {
    Pattern pattern = Pattern
            .compile("((\\+[1-9]{3,4}|0[1-9]{4}|00[1-9]{3})\\-?)?\\d{8,20}");
    Matcher matcher = pattern.matcher(pPhoneNumber);
    return matcher.matches();
}
Format
I made this according to my needs, and it accepts numbers:
CountryCode-Number
Number
Country Codes:
They may have a: +, or either one or two zeros.
Then, it may be followed by a -.
Accepts:
+456
00456
+1234
01234
All of the above may or may not be followed by a -
Rejects:
0456
it should be:
00456 or +456 or 04444
Number
A simple number with 8-20 digits.
Accepts:
00456-12345678
+457-12345678
+45712345678
0045712345678
99999999
Extend it?
Feel free, so you may include support for . or '(' separators. Just make sure you escape them, e.g. for ( use \(.
I don't know of such an API, but it looks like it could be done using regular expressions. You probably can't convert all numbers to a valid format, but you can convert most of them.
The recent versions of http://code.google.com/p/libphonenumber/ have added metadata for many new countries and added a lot more detail for some of the countries previously listed.
The current source code version is r74 and the .jar file is at version 2.6. Previous .jar files were compiled for Java 1.6, but as of libphonenumber version 2.5.1 onwards they are now compiled for Java 1.5 and above.
Don't forget there is also a direct port of the code to JavaScript. It can be found in the source code tree at http://code.google.com/p/libphonenumber/source/browse/#svn%2Ftrunk%2Fjavascript
Bug reports are welcome. Updates to metadata are actively encouraged, as even the official government-published area code lists for many countries are either incomplete or out of date.
Don't re-invent the wheel; use an API, e.g. http://libphonenumber.googlecode.com/
This API gives you nice formatting, too.
Example:
String number = "(555) 478-1123";
PhoneNumberUtil phoneNumberUtil = PhoneNumberUtil.getInstance();
try {
Phonenumber.PhoneNumber phoneNumber = phoneNumberUtil.parse(number, Locale.US.getCountry());
} catch (NumberParseException e) {
// error handling
}
You could even use the phoneNumber object to format the number nicely as a valid phone number before saving it to the DB or wherever.
For French numbers, which look like "01 44 55 66 77", we can use the following logic.
DecimalFormatSymbols dfs = new DecimalFormatSymbols();
dfs.setGroupingSeparator(' '); // sometimes '.' is used
DecimalFormat decfmt = new DecimalFormat("0,0", dfs); // enable grouping
decfmt.setMinimumIntegerDigits(10); // we always have 10 digits
decfmt.setGroupingSize(2); // necessary in order to group digits by 2 orders
System.out.println(decfmt.format(144556677)); // outputs "01 44 55 66 77"
Once this is done, with Google's phone number API that the others mentioned, we can parse these sequences easily and reformat them into other forms such as "+33 1 44 55 66 77", like the following:
Iterable<PhoneNumberMatch> numbers = PhoneNumberUtil.getInstance().findNumbers(textWithPhoneNums, "FR");
for(Iterator<PhoneNumberMatch> iterator = numbers.iterator(); iterator.hasNext(); ){
PhoneNumberMatch pnm = iterator.next();
PhoneNumber number = pnm.number();
System.out.println(PhoneNumberUtil.getInstance().formatOutOfCountryCallingNumber(number, null));
}
I don't think there is a way of recognizing the lack of an area code unless your numbers are all from one country (presumably the USA), as each country has its own rules for things like area codes.
I'd start looking for detailed information here, here, and here - if there are APIs to handle it (in Java or otherwise), they might be linked to there as well.
There are commercial programs that format and validate international telephone numbers, like this one which even checks for valid area codes in some countries. For North America, the NANPA provides some resources for validating area codes.
The best I found was javax.telephony, to be found here: http://java.sun.com/products/javaphone/
It has an Address class, but sadly that class does not solve your problem :(
Well, maybe you can find a solution by digging deeper into it.
Apart from that, my first idea was to use a regex. However, that seems to be a rather bad solution to this specific problem.
My own needs were very simple. I just needed to take a 7 or 10-digit number and put separators (a dash, period, some string of characters, etc.) between the area code, exchange, and exchange number. Any value passed into the method that is not all digits or is not a length of 7 or 10 is simply returned. A null value returns an empty string and a null value for the separator is treated like an empty string. My code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// ...
private String formatPhoneNumber(String phnumber, String separator) {
    phnumber = (phnumber == null) ? "" : phnumber;
    if ((phnumber.length() != 7) && (phnumber.length() != 10)) { return phnumber; }
    // If we get here, 'phnumber' is for sure either 7 or 10 chars long
    separator = (separator == null) ? "" : separator;
    Pattern p = Pattern.compile("([0-9]*)");
    Matcher m = p.matcher(phnumber);
    if (m.matches()) {
        if (phnumber.length() == 7) {
            // substring(3), not substring(4), so that the fourth digit is kept
            return phnumber.substring(0, 3) + separator + phnumber.substring(3);
        } else {
            return phnumber.substring(0, 3) + separator + phnumber.substring(3, 6)
                    + separator + phnumber.substring(6);
        }
    }
    // If we get here, it means 1 or more of the chars in 'phnumber'
    // is not a digit and so 'phnumber' is returned just as passed in.
    return phnumber;
}
I have created a helper class using libphonenumber. It relies on an assumption: users mostly save local numbers in local format, i.e. without a country code, but save international numbers with a country code. The helper works for both scenarios: if the number is already in international format it is kept as it is, while local numbers are converted to international format. Below is the code and its usage.
class PhoneNumberHelper {
companion object {
fun correctNumber(number: String, context: Context): String? {
val code = StorageAdapter.get(context).userCountryCode
return validateNumber(number, code)
}
private fun validateNumber(number: String, mUserCountryCode: Int): String? {
return Utils.formatNumber(Utils.removeDelimetersFromNumber(number), mUserCountryCode)
}
fun formatNumber(destinationNumber: String, countryCode: Int): String? {
try {
val phoneUtil = PhoneNumberUtil.getInstance()
val regionCode = phoneUtil.getRegionCodeForCountryCode(countryCode)
var formattedNumber = formatNumber(destinationNumber, regionCode)
if (TextUtils.isEmpty(formattedNumber)) {
formattedNumber = destinationNumber
}
return formattedNumber
} catch (exp: Exception) {
Log.e("formatNumber", exp.toString())
}
return destinationNumber
}
fun formatNumber(destinationNumber: String, regionCode: String): String? {
if (TextUtils.isEmpty(regionCode)) {
return null
}
var number: String? = null
try {
val phoneUtil = PhoneNumberUtil.getInstance()
val phoneNumber = phoneUtil.parse(destinationNumber, regionCode)
if (phoneUtil.isValidNumber(phoneNumber)) {
/*
* E164 format is as per international format but no
* formatting applied e.g. no spaces in between
*/
number = phoneUtil.format(phoneNumber, PhoneNumberUtil.PhoneNumberFormat.E164)
number = number!!.replace("+", "00")
}
} catch (e: Exception) {
// number would be returned as null if it catches here
}
return number
}
}
}
Here is how you would use it:
var globalnumber = PhoneNumberHelper.correctNumber(contact.mobile, context)
Clarification:
val code = StorageAdapter.get(context).userCountryCode
This is the country code you should store during signup, e.g. 0044 or +44.
Don't forget to include the dependency for libphonenumber:
implementation 'com.googlecode.libphonenumber:libphonenumber:8.8.0'

Determine if a JTextField contains an integer

I'm making a word guessing game. The JTextField can't be empty, must contain exactly 5 characters and cannot contain numbers. How do I do the last part?
@Override
public void actionPerformed(ActionEvent e) {
if (text.getText().isEmpty()) {
showErrorMessage("You have to type something");
} else if (text.getText().length() != 5) {
showErrorMessage("Currently only supporting 5-letter words");
} else if (contains integer){
showErrorMessage("Words don't contain numbers!");
} else {
r.setGuess(text.getText());
}
}
Rather than explicitly checking for numbers, I would suggest whitelisting characters which are allowed. Would letters with accents be allowed, for example? Both upper and lower case? Punctuation?
Once you've worked out your requirements, I suggest you express them as a regular expression - and then check whether the text matches the regular expression.
For example:
private static final Pattern VALID_WORD = Pattern.compile("^[A-Za-z]*$");
...
if (!VALID_WORD.matcher(text.getText()).matches()) {
// Show some appropriate error message
}
Note that I haven't included length checks in there, as you're already covering those - and it may well be worth doing so in order to give more specific error messages.
You might also want to consider preventing your text box from accepting the invalid characters to start with - just reject the key presses, rather than waiting for a submission event. (You could also change the case consistently at the same time, for example.) As noted in comments, JFormattedTextField is probably a good match - see the tutorial to get started.
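A small sketch that combines the whitelist idea with the question's existing checks, independent of Swing so it can be tried on plain strings. The class name GuessValidator, the method errorFor and the "Only letters are allowed" message are assumptions; the other messages are taken from the question. The \p{L} class is used here on the assumption that accented letters should be allowed; swap in [A-Za-z] if they should not.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class GuessValidator {
    // \p{L} matches any Unicode letter, so accented letters pass.
    private static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");
    private static final Pattern CONTAINS_DIGIT = Pattern.compile("\\d");

    // Returns an error message, or an empty Optional if the guess is acceptable.
    static Optional<String> errorFor(String guess) {
        if (guess.isEmpty()) {
            return Optional.of("You have to type something");
        }
        if (guess.length() != 5) {
            return Optional.of("Currently only supporting 5-letter words");
        }
        if (CONTAINS_DIGIT.matcher(guess).find()) {
            return Optional.of("Words don't contain numbers!");
        }
        if (!LETTERS_ONLY.matcher(guess).matches()) {
            return Optional.of("Only letters are allowed");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "abc", "ab1de", "ab-de", "hello"}) {
            System.out.println("'" + s + "' -> " + errorFor(s).orElse("ok"));
        }
    }
}
```

In the actionPerformed handler this would be called with text.getText(), showing the message if one is present and calling r.setGuess otherwise.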
Create a method that checks whether the JTextField contains a digit, like this:
private boolean isContainInt(JTextField field) {
Pattern pt = Pattern.compile("\\d+");
Matcher mt = pt.matcher(field.getText());
return mt.find();
}
if (!text.getText().matches("[a-zA-Z]*")) {
// something wrong
}

Replacing tokens in free-text only between certain other tokens in the text in Java

I have a situation in a free-text file, where between any pair of two string matches of my choice - e.g.
<hello> and </hello>
I want to replace the occurrence of a third string-match with a different string e.g. '=' with '&EQ;'
e.g.
hi=I want this equals sign to stay the same,but=<hello>
<I want="this one in the hello tag to be replaced"/>
</hello>,and=of course this one outside the tag to stay the same
becomes
hi=I want this equals sign to stay the same,but=<hello>
<I want&EQ;"this one in the hello tag to be replaced"/>
</hello>,and=of course this one outside the tag to stay the same
Basically this is because an XML body is being sent in a value-pair and it is royally screwing things up (I am sent this format by a venue and don't have control over it).
My immediate approach was to start with a BufferedReader and parse into a StringBuilder, going through line by line and using String.indexOf( ) to toggle whether we are inside the tags or not. But 20 minutes into this approach it occurred to me that this may be a bit brute-force and that there might be an existing solution to this kind of problem.
I know this approach will work eventually, but my question is: is there a better way, that is, one that is higher level and uses existing Java libraries or common frameworks (e.g. Apache Commons), which would make it less error-prone and more maintainable? In other words, is there a more intelligent way of solving this problem than the approach I am taking, which is effectively brute-force parsing?
If you want to escape XML, have a look at Apache Commons Lang StringEscapeUtils, specifically StringEscapeUtils.escapeXml; it should do what you need.
My great impenetrable solution is as follows, and it seems to work.
I do apologise that it's so hard to follow but it basically came down to this from factorising and re-factorising, many times over combining similar pieces of code.
It will replace all the occurrences of String 'replace' with String 'with' between tokens of 'openToken' and 'closeToken', and should be started with mode=false.
As with most things in life, there's probably a really clever, succinct way to do this with a regex.
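boolean mode = false;
StringBuilder output = new StringBuilder();
String line;
// 'reader' is assumed to be a BufferedReader over the input text
while ((line = reader.readLine()) != null) {
    mode = bodge("<hello>", "</hello>", "=", "&EQ;", line, output, mode);
}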
private static boolean bodge( String openToken, String closeToken, String replace, String with, String line, StringBuilder out, boolean mode ) {
String comparator = mode ? closeToken : openToken;
int index = line.indexOf( comparator );
// drop through straight if nothing interesting
if( index == -1 ) {
String outLine = mode ?
replacer( line , replace, with ) :
line;
out.append( outLine );
out.append( "\r\n" );
return mode;
}
else {
int endOfToken = index + comparator.length();
String outLine = line.substring(0, endOfToken);
outLine = mode ?
replacer( outLine , replace, with ) :
outLine;
out.append(outLine );
return bodge( openToken, closeToken, replace, with, line.substring( endOfToken ), out, !mode );
}
}
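For comparison, here is a sketch of a regex-based alternative operating on the whole text at once. It is not the method above: the names TagScopedReplace and replaceBetween are invented, and unlike bodge it leaves an opening token without a matching closing token untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagScopedReplace {
    static String replaceBetween(String text, String open, String close,
                                 String target, String replacement) {
        // Reluctant (.*?) stops at the first closing token; DOTALL lets it span lines.
        Pattern p = Pattern.compile(
                Pattern.quote(open) + "(.*?)" + Pattern.quote(close), Pattern.DOTALL);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Plain (non-regex) replacement inside the tagged region only
            String body = m.group(1).replace(target, replacement);
            // quoteReplacement keeps any '$' or '\' in the text literal
            m.appendReplacement(out, Matcher.quoteReplacement(open + body + close));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "hi=stay,but=<hello>\n<I want=\"this replaced\"/>\n</hello>,and=stay";
        System.out.println(replaceBetween(in, "<hello>", "</hello>", "=", "&EQ;"));
    }
}
```

The whole input has to be in memory for this to work, which is the trade-off against the line-by-line approach.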

Determine if a string can be transformed to another string when only insert/remove/replace operations are allowed

I must write a function that takes two words (strings) as arguments, and determines if the first word can be transformed into the second word using only one first-order transformation.
First-order transformations alter only one letter in a word
The allowed transformations are: insert, remove and replace
insert = insert a letter at any position in the word
remove = delete a letter from any position in the word
replace = replace a letter with another one
Any suggestions? Any Java examples would be great!
Think: If you're only allowed a single transformation, then the difference in length between the "before" and "after" words should give you a very strong hint as to which of those three transformations has any chance of being successful. By the same token, you can tell at a glance which transformations will be simply impossible.
Once you've decided on which transformation, the rest of the problem becomes a job for Brute Force Man and his sidekick, Looping Lady.
This does look like homework so I'm not going to give you the answer, but any time you approach a problem like this the best thing to do is start sketching out some ideas. Break the problem down into smaller chunks, and then it becomes easier to solve.
For example, let's look at the insert operation. To insert a letter, what is that going to do to the length of the word in which we are inserting the letter? Increase it or decrease it? If we increase the length of the word, and the length of this new word is not equal to the length of the word we are trying to match, then what does that tell you? So one condition here is that if you are going to perform an insert operation on the first word to make it match the second word, then there is a known length that the first word must be.
You can apply similar ideas to the other 2 operations.
So once you establish these conditions, it becomes easier to develop an algorithm to solve the problem.
The important thing in any type of assignment like this is to think through it. Don't just ask somebody, "give me the code", you learn nothing like that. When you get stuck, it's ok to ask for help (but show us what you've done so far), but the purpose of homework is to learn.
If you need to check if there is one and exactly one edit from s1 to s2, then this is very easy to check with a simple linear scan.
If both have the same length, then there must be exactly one index where the two differ
They must agree up to a common longest prefix, then skipping exactly one character from both, they must then agree on a common suffix
If one is shorter than the other, then the difference in length must be exactly one
They must agree up to a common longest prefix, then skipping exactly one character from the longer one, they must then agree on a common suffix
If you also allow zero edit from s1 to s2, then simply check if they're equal.
Here's a Java implementation:
static int firstDifference(String s1, String s2, int L) {
for (int i = 0; i < L; i++) {
if (s1.charAt(i) != s2.charAt(i)) {
return i;
}
}
return L;
}
static boolean oneEdit(String s1, String s2) {
if (s1.length() > s2.length()) {
return oneEdit(s2, s1);
}
final int L = s1.length();
final int index = firstDifference(s1, s2, L);
if (s1.length() == s2.length() && index != L) {
return s1.substring(index+1).equals(s2.substring(index+1));
} else if (s2.length() == L + 1) {
return s1.substring(index).equals(s2.substring(index+1));
} else {
return false;
}
}
Then we can test it as follows:
String[][] tests = {
{ "1", "" },
{ "123", "" },
{ "this", "that" },
{ "tit", "tat" },
{ "word", "sword" },
{ "desert", "dessert" },
{ "lulz", "lul" },
{ "same", "same" },
};
for (String[] test : tests) {
System.out.printf("[%s|%s] = %s%n",
test[0], test[1], oneEdit(test[0], test[1])
);
}
This prints (as seen on ideone.com):
[1|] = true
[123|] = false
[this|that] = false
[tit|tat] = true
[word|sword] = true
[desert|dessert] = true
[lulz|lul] = true
[same|same] = false
You can use the Levenshtein distance and only allow distances of 1 (which means exactly one character must be inserted, removed or replaced). There are several implementations; just search for "Levenshtein java" or similar.
The other "not so smart" but working thing would be the good old brute force. Just try out every situation with every char and you get what you want. :-)
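The Levenshtein suggestion can be sketched with the standard two-row dynamic-programming formulation. The class and method names are made up for illustration, and this computes the full distance, which does more work than a single-edit check strictly needs.

```java
final class OneEditByLevenshtein {
    // Classic dynamic programming over prefixes, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i removes
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insert
                                 prev[j] + 1),      // remove
                        prev[j - 1] + cost);        // replace (or keep)
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean isOneEditApart(String a, String b) {
        return levenshtein(a, b) == 1;
    }

    public static void main(String[] args) {
        System.out.println(isOneEditApart("word", "sword"));
        System.out.println(isOneEditApart("this", "that"));
    }
}
```

This runs in time proportional to the product of the two lengths, whereas a linear scan is enough when only a distance of exactly 1 matters.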

Generating the permutations from a number of Characters

I'm working on a predictive text solution and have all the words being retrieved from a Trie based on input for a certain string of characters, i.e. "at" will give all words formed with "at" as a prefix. The problem that I have now is that we are also supposed to return all other possibilities from pressing these 2 buttons, button 2 and button 8 on the mobile phone, which would also give words formed with "au, av, bt, bu, bv, ct, cu, cv" (most of which won't form any actual words).
Can anyone suggest a solution and how I would go about doing this for calculating the different permutations?
(At the moment, I'm prompting the user to enter the prefix; I'm not using a GUI right now.)
Welcome to concepts like recursion and combinatorial explosion :)
Due to combinatorial explosion, you have to be "smart" about it: if the user wants to enter a legitimate 20-letter word, it is unacceptable for your solution to "hang" while blindly trying tens of millions of possibilities.
So you should only recurse when the trie has at least one entry for your prefix.
Here's a way to generate all prefixes and only recurse when there's a match.
In this example, I faked a trie always saying there's an entry. I made this in five minutes so it can surely be beautified/simplified.
The advantage of such a solution is that it works if the user presses on one, two, three, four or 'n' keys, without needing to change your code.
Note that you probably do not want to add all the words starting with 'x' letters when there are too many. It's up to you to find the strategy that matches best your need (wait for more keypresses to reduce candidates or add most common matches as candidates etc.).
private void append( final String s, final char[][] chars, final Set<String> candidates ) {
if ( s.length() >= 2 && doesTrieContainAnyWordStartingWith( s ) ) {
candidates.add( s + "..." ); // TODO: here add all words starting with 's' instead of adding 's'
}
if ( doesTrieContainAnyWordStartingWith( s ) && chars.length > 0 ) {
final char[][] next = new char[chars.length-1][];
for (int i = 1; i < chars.length; i++) {
next[i-1] = chars[i];
}
// our three recursive calls, one for each possible letter
// (you'll want to adapt for a 'real' keyboard, where some keys may only correspond to two letters)
append( s + chars[0][0], next, candidates );
append( s + chars[0][1], next, candidates );
append( s + chars[0][2], next, candidates );
} else {
// we do nothing, it's our recursive termination condition and
// we are sure to hit it seen that we're decreasing our 'chars'
// length at every pass
}
}
private boolean doesTrieContainAnyWordStartingWith( final String s ) {
// You obviously have to change this
return true;
}
Note the recursive call (only when there's a matching prefix).
Here's how you could call it: I faked the user pressing '1', then '2' and then '3' (I faked this in the chars char[][] array I created):
public void testFindWords() {
// Imagine the user pressed 1 then 2 then 3
final char[][] chars = {
{'a','b','c'},
{'d','e','f'},
{'g','h','i'},
};
final Set<String> set = new HashSet<String>();
append( "", chars, set ); // We enter our recursive method
for (final String s : set ) {
System.out.println( "" + s );
}
System.out.println( "Set size: " + set.size() );
}
That example shall create a set containing 36 matches because I "faked" that every prefix is legit and that every prefix leads to exactly one word (and I only added the "word" when it's made of at least two letters). Hence 3*3*3 + 3*3, which gives 36.
You can try the code, it's fully working but you'll have to adapt it of course.
In my fake example (user pressing 1,2 then 3), it creates this:
cdh...
afi...
adi...
beg...
cf...
adh...
cd...
afg...
adg...
bei...
ceg...
bfi...
cdg...
beh...
aeg...
ce...
aeh...
afh...
bdg...
bdi...
cfh...
ad...
cdi...
ceh...
bfh...
aei...
cfi...
be...
af...
bdh...
bf...
cfg...
bfg...
cei...
ae...
bd...
Set size: 36
Welcome to real coding :)
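The same pruning idea can be sketched for a real keypad, where keys carry three or four letters, by looping over the letters of each key instead of hard-coding three recursive calls. The names below are invented, the input is assumed to contain only the digits 2 to 9, and a sorted TreeSet stands in for the trie's prefix lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class KeypadPrefixes {
    // Standard phone keypad letters, indexed by digit; 0 and 1 carry no letters.
    static final String[] KEYS =
            {"", "", "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Stand-in for the trie: true if any word in the sorted set starts with prefix.
    static boolean hasPrefix(TreeSet<String> words, String prefix) {
        String candidate = words.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    static void expand(String digits, int pos, String prefix,
                       TreeSet<String> words, List<String> out) {
        if (!hasPrefix(words, prefix)) {
            return; // prune: no word starts with this prefix
        }
        if (pos == digits.length()) {
            out.add(prefix); // every key press has been consumed
            return;
        }
        for (char c : KEYS[digits.charAt(pos) - '0'].toCharArray()) {
            expand(digits, pos + 1, prefix + c, words, out);
        }
    }

    static List<String> prefixesFor(String digits, TreeSet<String> words) {
        List<String> out = new ArrayList<>();
        expand(digits, 0, "", words, out);
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> words =
                new TreeSet<>(Arrays.asList("at", "auto", "bus", "cut", "dog"));
        System.out.println(prefixesFor("28", words));
    }
}
```

With the five sample words, pressing 2 then 8 yields the prefixes at, au, bu and cu; the other five combinations are pruned because no word starts with them.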
