Remove Spaces and Special Characters (between Numbers only) in a string - java

I am a novice in regex. I am trying to strip all whitespace and special characters that occur between digits in a string. Note that the string may contain other characters besides the numbers.
For example, take this string:
String s1 = "This is Sample AmericanExp Card Number 3400 1000 2000 009";
This is what I have tried:
String s1 = "This is Sample AmericanExp Card Number 3400 1000 2000 009";
String regExp = "[^\\w]+";
String replacement = "";
String changed= s1.replaceAll(regExp, replacement);
System.out.println("changed->" + changed);
It gives the output ThisisSampleAmericanExpCardNumber340010002000009.
The required output is "This is Sample AmericanExp Card Number 340010002000009".
I would appreciate any help. Please also explain the concept behind the solution.
EDIT:-
I am now masking the card number and its PIN (PCI), so I have this formula:
^((4\\d{3})|(5[1-5]\\d{2})|(6011))-?\\d{4}-?\\d{4}-?\\d{4}|3[4,7]\\d{13}$
It checks for some types of credit cards. I am modifying it to also check for the PIN and CVV (that is, to match 4-digit and 6-digit numbers as well).
Sample String = "Sample AmericanExp Card Number 3400 1000 2000 009 and PIN is 1234 , CVV = 654321"
I modified the formula as follows:
^((4\\d{3})|(5[1-5]\\d{2})|(6011))-?\\d{4}-?\\d{4}-?\\d{4}|3[47]\\d{13}$|^[0-9]{4}$|^[0-9]{6}$
This does not give me the correct output (it should also match the 4-digit and 6-digit numbers).

You may use
.replaceAll("(?<=\\d)[\\W_]+(?=\\d)", "")
Or, if you need to deal with Unicode strings:
.replaceAll("(?U)(?<=[0-9])[\\W_]+(?=[0-9])", "")
Regex details:
(?<=\d) - a positive lookbehind that matches a position immediately preceded with a digit
[\W_]+ - one or more non-word or underscore characters
(?=\d) - a positive lookahead that matches a location immediately followed with a digit.
Note that (?U), the embedded form of the Pattern.UNICODE_CHARACTER_CLASS option, makes \W Unicode-aware, so it no longer matches letters from non-ASCII scripts such as Cyrillic.
See the Java demo:
String s1 = "This is Sample AmericanExp Card Number 3400 1000 2000 009";
System.out.println("changed -> " + s1.replaceAll("(?<=\\d)[\\W_]+(?=\\d)", ""));
// => changed -> This is Sample AmericanExp Card Number 340010002000009
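To see what the lookarounds buy you, here is a small sketch that runs both patterns from this answer on a longer input. The class name DigitGapStripper and its method names are made up for illustration; only the two regexes come from the answer. It shows that separators are removed only when a digit sits on both sides, and that without (?U) a non-ASCII letter between two digits counts as \W and gets deleted.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original post.
final class DigitGapStripper {
    // ASCII-only variant: \W is the complement of [a-zA-Z0-9_].
    private static final Pattern ASCII_GAP =
            Pattern.compile("(?<=\\d)[\\W_]+(?=\\d)");
    // Unicode-aware variant: with (?U), \W no longer matches letters of any script.
    private static final Pattern UNICODE_GAP =
            Pattern.compile("(?U)(?<=[0-9])[\\W_]+(?=[0-9])");

    static String strip(String input) {
        return ASCII_GAP.matcher(input).replaceAll("");
    }

    static String stripUnicodeAware(String input) {
        return UNICODE_GAP.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "Card Number 3400 1000 2000 009 and PIN is 1234 , CVV = 654321";
        // Only the gaps inside the card number have a digit on both sides.
        System.out.println(strip(s));
        // => Card Number 340010002000009 and PIN is 1234 , CVV = 654321

        // \u0434 is a Cyrillic letter placed between two digits.
        System.out.println(strip("1\u04342").length());             // => 2 (letter removed)
        System.out.println(stripUnicodeAware("1\u04342").length()); // => 3 (letter kept)
    }
}
```

Precompiling the patterns is a design choice for repeated use; calling replaceAll on the string directly, as in the demo above, behaves the same.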

Related

Why does this regex fail to check accurately?

I have the following method, which matches a given string against a regex in 3 stages. For some reason the regex fails to reject some inputs, even though the patterns look correct to me. Can someone please tell me what I am doing wrong?
I have the following code:
public class App {
public static void main(String[] args) {
String identifier = "urn:abc:de:xyz:234567.1890123";
if (identifier.matches("^urn:abc:de:xyz:.*")) {
System.out.println("Match ONE");
if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[0-9]{1,7}.*")) {
System.out.println("Match TWO");
if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$")) {
System.out.println("Match Three");
}
}
}
}
}
Ideally, this code should generate the output
Match ONE
Match TWO
Match Three
only when the identifier is "urn:abc:de:xyz:234567.1890123.abd12". However, it produces the same output even when the identifier should not match, as with the following inputs:
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ANC"
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ACB.123"
I do not understand why it allows alphanumeric characters after the first . and why it does not check the characters after the second one.
I would like my Regex to check that the string has the following format:
String starts with urn:abc:de:xyz:
Then it has 6 to 12 digits [0-9] (234567).
Then it has the decimal point .
Then it has 1 to 7 digits [0-9] (1890123).
Then it has another decimal point .
Finally, it has 1 to 20 alphanumeric or special characters (ABC123.-_12).
This is a valid string for my regex: urn:abc:de:xyz:234567.1890123.ABC123.-_12
This is an invalid string for my regex as it misses the elements from point 6:
urn:abc:de:xyz:234567.1890123
This is also an invalid string for my regex as it misses the elements from point 4 (it has ABC instead of decimal numbers).
urn:abc:de:xyz:234567.1890ABC.ABC123.-_12
This part of the regex:
[0-9]{6,12}.[0-9]{1,7} matches 6 to 12 digits followed by any character followed by 1 to 7 digits
To match a dot, it needs to be escaped. Try this:
^urn:abc:de:xyz:[0-9]{6,12}\.[0-9]{1,7}\.[a-zA-Z0-9\-_]{1,20}$
To allow one or more dot-separated alphanumeric groups at the end of the string, as in your examples, use:
^urn:abc:de:xyz:\d{6,12}\.\d{1,7}(?:\.[\w-]{1,20})+$
Demo & explanation
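A quick way to check the escaped pattern against the inputs from the question is a sketch like the following. The class and method names are hypothetical. Two things are worth knowing about the original attempt: an unescaped . matches any character, and inside a character class .-_ is a range running from . to _, which happens to include digits and uppercase letters. The single-segment variant below puts the dot and hyphen where they are literal ([a-zA-Z0-9._-]); allowing the dot there is my assumption, based on the question's valid example ABC123.-_12.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
final class UrnCheck {
    // One trailing segment; the literal dot inside the class is an assumption
    // taken from the question's valid example "ABC123.-_12".
    private static final Pattern SINGLE_SEGMENT = Pattern.compile(
            "^urn:abc:de:xyz:[0-9]{6,12}\\.[0-9]{1,7}\\.[a-zA-Z0-9._-]{1,20}$");
    // One or more dot-separated trailing segments, as in the second regex above.
    private static final Pattern REPEATED_SEGMENTS = Pattern.compile(
            "^urn:abc:de:xyz:\\d{6,12}\\.\\d{1,7}(?:\\.[\\w-]{1,20})+$");

    static boolean matchesSingle(String id) {
        return SINGLE_SEGMENT.matcher(id).matches();
    }

    static boolean matchesRepeated(String id) {
        return REPEATED_SEGMENTS.matcher(id).matches();
    }

    public static void main(String[] args) {
        String[] ids = {
            "urn:abc:de:xyz:234567.1890123.ABC123.-_12", // valid
            "urn:abc:de:xyz:234567.1890123",             // missing last part
            "urn:abc:de:xyz:234567.1890ABC.ABC123.-_12"  // letters in second number
        };
        for (String id : ids) {
            System.out.println(id + " -> " + matchesSingle(id) + " / " + matchesRepeated(id));
        }
    }
}
```

With these inputs, both variants accept the first identifier and reject the other two.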

How to extract specific substring from an expression in java

I have the following text:
Units Currently On Bed List
[total beds=0]
Number Of Beds Unit Interval Select All
The number after '=' is dynamic and subject to change. How can I extract the number in java using regex?
If only the number after = changes, then for your example data you could capture the number in a group:
\[.+?=(\d+)\]
Match a \[
Match any character one or more times non greedy .+?
Match an equal sign =
Capture 1 or more digits (\d+)
Match a \]
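In Java, the capture group can be read with Matcher.find() and group(1). The following is a minimal sketch; the class name BedCount and the choice to return an OptionalInt are my own, not part of the question.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and return type are illustrative choices.
final class BedCount {
    private static final Pattern TOTAL = Pattern.compile("\\[.+?=(\\d+)\\]");

    // Returns the number inside the first "[...=N]" occurrence, if there is one.
    // Assumes the number fits in an int; Integer.parseInt throws otherwise.
    static OptionalInt extract(String text) {
        Matcher m = TOTAL.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String text = "Units Currently On Bed List\n"
                + "[total beds=0]\n"
                + "Number Of Beds Unit Interval Select All";
        System.out.println(extract(text)); // => OptionalInt[0]
    }
}
```

find() is used rather than matches() because the bracketed part is only a fragment of the whole text.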
Not sure about using regex, but this is one way:
int first = str.indexOf('=') + 1;
int last = str.lastIndexOf(']');
String nbr = str.substring(first, last);
int number = Integer.parseInt(nbr);

Formatting numbers in a mathematical expression from String values in java

I know how to set a thousands separator for numbers, but I need to display numbers with thousands separators inside a mathematical expression built from String values.
Basically, I want these results:
"123456 + 36514" becomes "123,456 + 36,514"
"12345678 + 36542 * 69541 / 987654" becomes "12,345,678 + 36,542 * 69,541 / 987,654"
and ...
You want to insert a comma after a digit that is followed by a whole number of 3-digit blocks, so:
(?<=\d) Positive lookbehind: Match a digit
(?= Positive lookahead:
(?:\d{3})+ One or more sequences of 3 digits
\b Word-boundary, i.e. end of digit sequence
)
That will match the empty positions where you want commas, so do a replaceAll():
str = str.replaceAll("(?<=\\d)(?=(?:\\d{3})+\\b)", ",");
See regex101 for demo.
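Here is the same replaceAll() wrapped in a small runnable sketch and applied to the two expressions from the question. The class name GroupDigits is hypothetical. One caveat I am sure of: the pattern treats every run of digits alike, so digits after a decimal point would be grouped as well (0.123456 becomes 0.123,456).

```java
// Hypothetical wrapper class around the replaceAll() call shown above.
final class GroupDigits {
    // Inserts a comma at every position that has a digit on its left and a
    // multiple of three digits, ending at a word boundary, on its right.
    static String group(String expression) {
        return expression.replaceAll("(?<=\\d)(?=(?:\\d{3})+\\b)", ",");
    }

    public static void main(String[] args) {
        System.out.println(group("123456 + 36514"));
        // => 123,456 + 36,514
        System.out.println(group("12345678 + 36542 * 69541 / 987654"));
        // => 12,345,678 + 36,542 * 69,541 / 987,654
    }
}
```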
I can describe an algorithm rather than give the code:
Loop over the expression string one character at a time
{
If the current char is 0 to 9, append it to the tempNumber string
else
{
// by the time you reach this line, the complete number is in the tempNumber string
Format the number stored in tempNumber (skip this if tempNumber is empty) // java.text.NumberFormat
append the formatted number to the targetExpression string, then clear tempNumber
append the current char to the targetExpression string
}
}
After the loop, format and append any number still left in tempNumber
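The loop described above could look roughly like this in Java. This is only a sketch under a few assumptions of mine: the names ExpressionFormatter, format and flush are invented, Locale.US is chosen to get a comma as the separator, and each number is assumed to fit in a long. Leading zeros are dropped because each number is parsed before it is formatted.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical implementation of the character-by-character algorithm.
final class ExpressionFormatter {
    static String format(String expression) {
        // Locale.US is an assumption so that the grouping separator is a comma.
        NumberFormat grouping = NumberFormat.getIntegerInstance(Locale.US);
        StringBuilder target = new StringBuilder();
        StringBuilder digits = new StringBuilder();
        for (char c : expression.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits.append(c);            // still inside a number
            } else {
                flush(digits, target, grouping);
                target.append(c);            // operator, space, etc.
            }
        }
        flush(digits, target, grouping);     // number at the very end, if any
        return target.toString();
    }

    // Formats the pending digits (if any), appends them, and resets the buffer.
    private static void flush(StringBuilder digits, StringBuilder target, NumberFormat grouping) {
        if (digits.length() > 0) {
            // Assumes the number fits in a long.
            target.append(grouping.format(Long.parseLong(digits.toString())));
            digits.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("123456 + 36514"));
        // => 123,456 + 36,514
    }
}
```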

Matching a sequence of dot-separated digits of variable length with regular expressions

I am parsing text from an Excel spread sheet using Java.
I need to validate whether a sequence of 3 integers is present in the text.
The sequence of integers is:
comma separated inside
whitespace-delimited outside
Integers in the sequence can either have 1 or 2 digits.
This is my attempt:
*((\d|\d\d)[^\w](\d|\d\d)[^\w](\d|\d\d))*
With the * meaning that I can have characters before it, and the [\d|\d\d] being a number of either one or two digits, and [^\w] being a non word character?
Valid text: CPI WEIGHTS 05.1.2 : CARPETS & OTHER FLOOR COVERINGS
Invalid text: CPIH INDEX 05.2 : HOUSEHOLD TEXTILES 2005=100
Your last comment actually clarifies the question a bit.
Assuming you are looking for a dot-separated sequence of 1 or 2 digits, externally delimited by whitespace, here's an example:
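import java.util.regex.Matcher;
import java.util.regex.Pattern;
String ok = "CPI WEIGHTS 05.1.2 : CARPETS & OTHER FLOOR COVERINGS";
String notOk = "CPIH INDEX 05.2 : HOUSEHOLD TEXTILES 2005=100";
// Three groups of 1 or 2 digits, each followed by a dot or a whitespace character
Pattern p = Pattern.compile("(\\d{1,2}(\\.|\\s)){3}");
Matcher m = p.matcher(ok);
// Prints the single match found in the valid text
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}
m = p.matcher(notOk);
// Finds nothing in the invalid text, so nothing is printed
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}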
Output
Found: 05.1.2
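Note that the pattern above includes the trailing whitespace in the match and would also accept space-separated digits such as "1 2 3 ". If that matters, a stricter variant is possible. The sketch below is my own assumption about what "whitespace-delimited outside" should mean: it uses (?<!\S) and (?!\S) so that the sequence must be surrounded by whitespace or the string boundaries, and it only allows dots between the numbers. The class name TripleFinder is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; a stricter alternative, not the answer's pattern.
final class TripleFinder {
    // Exactly three 1-2 digit numbers joined by dots, with no non-whitespace
    // character directly before or after the sequence.
    private static final Pattern TRIPLE =
            Pattern.compile("(?<!\\S)\\d{1,2}(?:\\.\\d{1,2}){2}(?!\\S)");

    static Optional<String> find(String text) {
        Matcher m = TRIPLE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find("CPI WEIGHTS 05.1.2 : CARPETS & OTHER FLOOR COVERINGS"));
        // => Optional[05.1.2]
        System.out.println(find("CPIH INDEX 05.2 : HOUSEHOLD TEXTILES 2005=100"));
        // => Optional.empty
    }
}
```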

Regular expression for phone number starting with '00' or '+'

I've got a regex problem: I'm trying to force a phone number beginning with either "00" or "+" but my attempt doesn't work.
String PHONE_PATTERN = "^[(00)|(+)]{1}[0-9\\s.\\/-]{6,20}$";
It still allows, for example, "0123-45678". What am I doing wrong?
Inside character class every character is matched literally, which means [(00)|(+)] will match a 0 or + or | or ( or )
Use this regex:
String PHONE_PATTERN = "^(?:00|\\+)[0-9\\s.\\/-]{6,20}$";
If you have already removed spaces, hyphens, and other separators from the number, and you want to accept either +xxnnnnnnnnn or 00xxnnnnnnnnn (where xx is the country code and n is the 9-digit number) or 0nnnnnnnnn (a non-international number: a leading zero followed by 9 digits), then try this regex:
String PHONE_PATTERN = "^(?:(?:00|\\+)\\d{2}|0)[1-9](?:\\d{8})$";
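Both patterns can be tried out with a short sketch like the one below. The class and method names are invented for illustration. Keep in mind that inside a Java string literal each regex backslash has to be doubled, so \+ and \d are written as \\+ and \\d.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
final class PhonePatterns {
    // "00" or "+" followed by 6 to 20 digits, spaces, dots, slashes or hyphens.
    private static final Pattern PREFIXED =
            Pattern.compile("^(?:00|\\+)[0-9\\s.\\/-]{6,20}$");
    // Separator-free form: +xx or 00xx (2-digit country code) or a single 0,
    // followed by 9 digits of which the first is 1 to 9.
    private static final Pattern NORMALIZED =
            Pattern.compile("^(?:(?:00|\\+)\\d{2}|0)[1-9](?:\\d{8})$");

    static boolean hasInternationalPrefix(String number) {
        return PREFIXED.matcher(number).matches();
    }

    static boolean isNormalized(String number) {
        return NORMALIZED.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasInternationalPrefix("0123-45678"));   // => false
        System.out.println(hasInternationalPrefix("+49 123-4567")); // => true
        System.out.println(isNormalized("+49123456789"));           // => true
        System.out.println(isNormalized("0123456789"));             // => true
    }
}
```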
