REGEX to format phone number in Java

Given a phone number in which spaces and + are allowed, how would you write a regular expression to format it so that non-digits and extra spaces are removed?
I have this so far
String num = " Ken's Phone is + 123 2213 123 (night time)";
System.out.println(num.replaceAll("[^\\d|+|\\s]", "").replaceAll("\\s\\s+", " ").replaceAll("\\+ ", "\\+").trim());
Would you simplify it so that the same result is obtained?
Thank you

I would call trim() first, or at least before you collapse the runs of multiple spaces.
Also keep in mind that \s means any whitespace character: [ \t\n\x0B\f\r]. If you only mean ' ', use that instead.
A nicer way to express that you only want runs of at least two whitespace characters to be replaced would be
replaceAll("\\s{2,}", " ")
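To illustrate the difference between \s and a literal space, here is a minimal sketch. The class name WhitespaceDemo, its method names and the sample input are made up for illustration and do not come from the question.

```java
// Hypothetical demo class; names and input are illustrative only.
class WhitespaceDemo {
    // Collapses runs of two or more of ANY whitespace (space, tab, newline, ...).
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s{2,}", " ");
    }

    // Collapses runs of two or more plain space characters only.
    static String collapseSpaces(String s) {
        return s.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        String input = "123\t\t456   789";
        // Tabs and spaces are both collapsed here.
        System.out.println(collapseWhitespace(input));
        // Only the run of spaces is collapsed; the two tabs survive.
        System.out.println(collapseSpaces(input));
    }
}
```

If the input can only ever contain ordinary spaces, both variants behave the same; the choice only matters once tabs or line breaks may appear.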

First extract the number-with-spaces part, then compress multiple spaces to single spaces, and finally remove all spaces that follow a plus sign:
String numberWithSpaces = str.replaceAll("^[^\\d+]*([+\\d\\s]+)[^\\d]*$", "$1").replaceAll("\\s+", " ").replaceAll("\\+\\s*", "+");
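I tested this code and it works. Note that whitespace at the end of the captured number part is kept, so append .trim() if that matters.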

You can simplify it as:
num.replaceAll("[^\\d+\\s]", "") // [^\\d|+|\\s] => [^\\d+\\s]
.replaceAll("\\s{2,}", " ") // \\s\\s+ => \\s{2,}
.replaceAll("\\+\\s", "+") // \\+ => +
.trim()
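The simplified chain above can be wrapped in a small helper and run against the string from the question. This is only a sketch: the class name PhoneNormalizer and the method name normalize are hypothetical.

```java
// Hypothetical helper; class and method names are illustrative only.
class PhoneNormalizer {
    static String normalize(String raw) {
        return raw
                .replaceAll("[^\\d+\\s]", "") // drop everything except digits, '+' and whitespace
                .replaceAll("\\s{2,}", " ")   // collapse runs of whitespace to one space
                .replaceAll("\\+\\s", "+")    // glue '+' to the digits that follow it
                .trim();                      // remove leading/trailing whitespace
    }

    public static void main(String[] args) {
        String num = " Ken's Phone is + 123 2213 123 (night time)";
        System.out.println(normalize(num)); // prints "+123 2213 123"
    }
}
```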

Related

Regex to remove all whitespace except around keywords and between quotes

I want to:
remove all whitespace unless it is directly before or after one of a set of predefined keywords (0-1 space before and 0-1 after). For example, with the keywords and, or, if, the spaces in " and ", " and" or "and " are left unchanged
ignore everything between quotes
I've tried many patterns. The closest I've come up with is pretty close, but it still removes the space after keywords, which I'm trying to avoid.
regex:
\s(?!and|or|if)(?=(?:[^"]*"[^"]*")*[^"]*$)
Test String:
if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = "hello my friend and or " and(cond({1,2}) $1 123
Ideal result:
if (ans(this)>=ans({1,2}) and (cond({3,4}) or ans(this)<=ans({5,6})),7,8) and {111}>{222} or ans(this)="hello my friend and or " and(cond({1,2})$1123
I can then use str = str.replaceAll in Java to remove that whitespace. I don't mind doing multiple steps to get to the result, but I am not familiar with regex, so I'm kind of stuck.
Any help would be appreciated!
Note: I edited the result. Sorry about that. For the space around keywords: shrink it to 1 if there are spaces; if there are 0, either leave it or add 1 space. I just don't want "or ans" to become "orans", but "and(cond" becoming "and (cond" is fine (shrink to 1 space before and 1 space after if they exist). Ignore everything between quotes.
You can make intelligent use of capturing groups here. The general idea would be
match_this|or_this|or_even_this|(but_capture_this)
In terms of a regular expression this could be
(?:(?:\s+(?:and|or|if)\s+)|"[^"]+")|(\s+)
You'd then need to replace the match only if the first capturing group is not empty.
See a demo on regex101.com (using the PCRE verbs (*SKIP)(*FAIL), which serve the same purpose).
You may use
String example = " if (ans(this) >= ans({1,2}) and (cond({3,4}) or ans(this) <= ans({5,6})), 7, 8) and {111} > {222} or ans(this) = \"hello my friend and or \" and(cond({1,2}) $1 123 ";
String rx = "\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)";
Matcher m = Pattern.compile(rx).matcher(example);
example = m.replaceAll(r -> r.group(3) != null ? "" : r.group(2) != null ? r.group(2) : " " + r.group(1) + " ").trim();
System.out.println( example );
See the Java demo.
The pattern matches
\s*\b(and|or|if)\b\s* - 0+ whitespaces, word boundary, Group 1: and, or, if, word boundary and then 0+ whitespaces
| - or
(\"[^\"]*\") - Group 2: ", any 0+ chars other than " and then a "
| - or
(\s+) - Group 3: 1+ whitespaces.
If Group 3 matches, the whitespace is removed; if Group 2 matches, the quoted text is put back into the result unchanged; and if Group 1 matches, the keyword is wrapped with single spaces and pasted back. The whole result is then passed through trim().
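The three-way replacement described above can be sketched as a reusable method. The class name KeywordSpacing and the method name compact are hypothetical. The sketch assumes Java 9 or later, because Matcher.replaceAll with a function argument was added in that release. One deliberate difference from the snippet above: the quoted text is passed through Matcher.quoteReplacement, since the string returned by the function is still treated as a replacement string, so a $ or \ inside a quoted literal would otherwise be interpreted as a group reference or escape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative only.
class KeywordSpacing {
    // Group 1: keyword, Group 2: quoted text, Group 3: any other whitespace.
    private static final Pattern TOKENS =
            Pattern.compile("\\s*\\b(and|or|if)\\b\\s*|(\"[^\"]*\")|(\\s+)");

    static String compact(String input) {
        return TOKENS.matcher(input).replaceAll(r -> {
            if (r.group(3) != null) {
                return ""; // plain whitespace: drop it
            }
            if (r.group(2) != null) {
                // quoted text: keep verbatim, escaping '$' and '\'
                return Matcher.quoteReplacement(r.group(2));
            }
            return " " + r.group(1) + " "; // keyword: exactly one space on each side
        }).trim();
    }

    public static void main(String[] args) {
        System.out.println(compact("a  =  \"x  y $1\"  and(b)"));
    }
}
```

Precompiling the pattern into a constant avoids recompiling it on every call, which matters if the method runs on many strings.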

How to remove spaces from string only if it occurs once between two words but not if it occurs thrice?

I am a beginner working on a diff and regenerate algorithm but for Strings. I store the patch in a file. To regenerate the new string from old I use that file. Although the code works, I face a problem when using space.
I use replaceAll(" ", ""); to remove spaces. This is fine when the string is [char][space][char], but it creates a problem when the string contains [space][space][space]. In that case I want one space to be retained.
I thought of doing replaceAll("   ", " ");, but this would leave the spaces in [char][space][char]. I am using a Scanner to scan through the string.
Is there a way to achieve this?
Input => Output
c => c
cc => cc
c[space]c => cc
c[space][space]c => This is not possible, since there will be padding of one space for each character
c[space][space][space]c => c c
We can also split the string wherever there is more than one whitespace character, then join the resulting array back into a string using the Stream and Collectors APIs.
The single spaces inside each piece are removed with replaceAll() in a Stream#map operation:
String test = " this  is a test of   space in string ";
//using the pattern \\s{n,} for splitting at multi spaces
String[] arr = test.split("\\s{2,}");
String s = Arrays.stream(arr)
.map(str -> str.replaceAll(" ", ""))
.collect(Collectors.joining(" "));
System.out.println(s);
Output:
this isatestof spaceinstring
You could use lookarounds to do your replacement:
String newText = text
.replaceAll("(?<! ) (?! )", "")
.replaceAll(" +", " ");
The first replaceAll removes any space not surrounded by spaces; the second one replaces the remaining sequences of spaces by a single one.
Ideone example. Sequences of two or more spaces become a single space, and single spaces are removed.
Lookarounds
A lookaround in the context of regular expressions is a collective term for lookbehinds and lookaheads. These are so-called zero-width assertions, which means they check whether a certain pattern matches at the current position but do not actually consume any characters. There are positive and negative lookarounds.
A short example: the pattern Ira(?!q) matches the substring Ira, but only if it's not followed by a q. So if the input string is Iraq, it won't match, but if the input string is Iran, then the match is Ira.
More info:
https://www.regular-expressions.info/lookaround.html
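Both the lookaround-based replacement and the Ira(?!q) example can be tried out with a short sketch. The class name LookaroundDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class LookaroundDemo {
    // Removes isolated single spaces, then collapses longer runs to one space.
    static String squeeze(String text) {
        return text
                .replaceAll("(?<! ) (?! )", "") // a space with no space before or after it
                .replaceAll(" +", " ");         // any remaining run of spaces
    }

    // True if "Ira" occurs somewhere without being followed by 'q'.
    static boolean findsIra(String s) {
        return Pattern.compile("Ira(?!q)").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(squeeze("c c"));     // prints "cc"
        System.out.println(squeeze("c   c"));   // prints "c c"
        System.out.println(findsIra("Iraq"));   // prints "false"
        System.out.println(findsIra("Iran"));   // prints "true"
    }
}
```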
If you want to replace any run of whitespace with a single space, you could use:
value.replaceAll("\\s+", " ")
I had to use two replacements:
String e = "a b   c";
e = e.replaceAll("([A-Za-z])\\s([A-Za-z])", "$1$2");
e = e.replaceAll("   ", " ");
System.out.println(e);
Which prints
ab c
The first one replaces any letter-space-letter combo with just the two letters, and then the second replaces any triple-space with a single space.
The first replacement uses group references in the replacement string: $1 refers to the part matched inside the first set of parentheses (the first letter), and $2 refers to the part inside the second set of parentheses.
If you have leading/trailing spaces on the input, you can call trim() before doing the replacements.
e = e.trim();

Replacing certain combination of characters

I'm trying to remove the leading bad characters (capital letter + dot + space) from these strings:
A. Shipping Length of Unit
C. OVERALL HEIGHT
Overall Weigth
X. Max Cutting Height
I tried something like that, but it doesn't work:
string.replaceAll("[A-Z]+". ", "");
The result should look like this:
Shipping Length of Unit
OVERALL HEIGHT
Overall Weigth
Max Cutting Height
This should work:
string.replaceAll("^[A-Z]\\. ", "")
Examples
"A. Shipping Length of Unit".replaceAll("^[A-Z]\\. ", "")
// => "Shipping Length of Unit"
"Overall Weigth".replaceAll("^[A-Z]\\. ", "")
// => "Overall Weigth"
input.replaceAll("[A-Z]\\.\\s", "");
[A-Z] matches an upper case character from A to Z
\. matches the dot character
\s matches any white space character
However, this will replace every character sequence that matches the pattern.
For matching a sequence at the beginning you should use
input.replaceAll("^[A-Z]\\.\\s", "");
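The difference between the unanchored and the anchored pattern can be seen in a small sketch. The class name PrefixStripper, its method names and the sample string "A. Plan B. Details" are hypothetical. The third method additionally assumes that all labels arrive as one multi-line string, which the question does not state; in that case the (?m) flag makes ^ match at the start of every line.

```java
// Hypothetical helper; names and sample inputs are illustrative only.
class PrefixStripper {
    // Removes every "capital letter, dot, whitespace" sequence anywhere in the string.
    static String stripAnywhere(String s) {
        return s.replaceAll("[A-Z]\\.\\s", "");
    }

    // Removes the sequence only at the very beginning of the string.
    static String stripLeading(String s) {
        return s.replaceAll("^[A-Z]\\.\\s", "");
    }

    // Assumption: one multi-line string; (?m) lets ^ match after each line break.
    static String stripLeadingPerLine(String s) {
        return s.replaceAll("(?m)^[A-Z]\\. ", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAnywhere("A. Plan B. Details")); // prints "Plan Details"
        System.out.println(stripLeading("A. Plan B. Details"));  // prints "Plan B. Details"
        System.out.println(stripLeadingPerLine(
                "A. Shipping Length of Unit\nOverall Weigth\nX. Max Cutting Height"));
    }
}
```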
Without seeing your code it is hard to tell what the problem is, but in my experience this is a common mistake that many of us make early on:
String string = "A. Test String";
string.replaceAll("^[A-Z]\\. ", "");
System.out.println(string);
String is an immutable class in Java, which means that once you have created an object it cannot be changed. When you call replaceAll on an existing String, it simply creates a new String object, which you need to assign to a new variable or use to overwrite the existing value, like this:
String string = "A. Test String";
string = string.replaceAll("^[A-Z]\\. ", "");
System.out.println(string);
Try this :
myString.replaceAll("([A-Z]\\.\\s)","")
[A-Z] : match a single character in the range between A and Z.
\. : match the dot character.
\s : match any whitespace character.

Java Regex - alphanumeric, allowing leading whitespace but not blank string

I've been trying to make a java regex that allows only alphanumeric characters, which can have white spaces, BUT the whole string cannot be blank...
Few examples..
" hello world123" //fine
"hello123world" //fine
"hello123world " //fine
" " //not allowed
So far I've gotten
^[a-zA-Z0-9][a-zA-Z0-9\s]*$
However, this does not allow leading whitespace, so any string that starts with whitespace is not matched.
Any ideas what I could add to the expression to allow leading whitespace?
How about just ^\s*[\da-zA-Z][\da-zA-Z\s]*$: 0 or more whitespace characters at the start, followed by at least 1 digit or letter, followed by digits/letters/whitespace.
Note: I did not use \w because \w includes "_", which is not alphanumeric.
Edit: Just tested all your cases on regexpal, and all worked as expected. This regex seems like the simplest one.
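As a quick way to check this pattern against the cases from the question, here is a minimal sketch. The class name AlnumValidator and the method name isValid are hypothetical, and the empty-string and underscore cases are extra inputs added for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical validator; class and method names are illustrative only.
class AlnumValidator {
    // Optional leading whitespace, one letter or digit, then letters/digits/whitespace.
    private static final Pattern VALID =
            Pattern.compile("^\\s*[\\da-zA-Z][\\da-zA-Z\\s]*$");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(" hello world123")); // prints "true"
        System.out.println(isValid("hello123world "));  // prints "true"
        System.out.println(isValid(" "));               // prints "false"
        System.out.println(isValid("hello_world"));     // prints "false"
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors are redundant here, but they keep the pattern usable with find() as well.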
Just use a look ahead to assert that there's at least one non-blank:
(?=.*[^ ])[a-zA-Z0-9 ]+
This may be used as-is with String.matches():
if (input.matches("(?=.*[^ ])[a-zA-Z0-9 ]+")) {
// input is OK
}
You can test it using the look-ahead mechanism: ^(?=\s*[a-zA-Z0-9])[a-zA-Z0-9\s]*$
^(?=\s*[a-zA-Z0-9]) makes the regex check that the string starts with zero or more whitespace characters (\s*) followed by a character from the class [a-zA-Z0-9].
Demo:
String[] data = {
" hello world123", //fine
"hello123world", //fine
"hello123world ", //fine
" " //not allowed
};
for(String s:data){
System.out.println(s.matches("(?=.*\\S)[a-zA-Z0-9\\s]*"));
}
output
true
true
true
false

How to add spaces between numbers in string with word and integer?

Having a string like this
"APM35 2FAST4YOU -5ABBA STEVE0.5&Tom"
and using a regular expression, I'm not getting the result I want. How can I add a space before and after each number?
Code:
String s = "APM35 2FAST4YOU -5ABBA STEVE0.5&Tom";
s = s.replaceAll("(\\d)([A-Za-z])", "\\1 \\2");
System.out.println(s);
I'm getting such result:
APM35 1 2AST1 2OU -1 2BBA STEVE0.5&Tom
and I'd like get this string as result:
APM 35 2 FAST 4 YOU -5 ABBA STEVE 0.5 &Tom
You could do it in two steps:
String s = "APM35 2FAST4YOU -5ABBA STEVE0.5&Tom";
//add a space after the numbers
String step1 = s.replaceAll("(-?\\d\\.?\\d*)([^\\d\\s])", "$1 $2");
//add a space before the numbers
String step2 = step1.replaceAll("([^0-9\\-\\s])(-?\\d\\.?\\d*)", "$1 $2");
Try this:
s.replaceAll("([^\\d-]?)(-?[\\d\\.]+)([^\\d]?)", "$1 $2 $3").replaceAll(" +", " ");
First regexp can generate some extra spaces, they are removed by second one.
You could use the expression "(-?[0-9]+(\.[0-9]+)?)"
(0.5 is not an integer; if you only want integers, (-?[0-9]+) should be enough)
and replace it with " $1 " (spaces before and after). In Java replacement strings, groups are referenced with $1, not \1.
Sorry, I wrote too fast, but I might as well ask: are you sure Java's replacement syntax recognizes \1 and \2 as group references?
It seems that parts of the string are being replaced by literal 1s and 2s, so this is probably not the correct syntax.
(It also seems that you are only checking for numbers followed by text, not the other way around.)
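The effect of \\1 versus $1 in a Java replacement string, and the two-step approach from the earlier answer, can be checked with a short sketch. The class name NumberSpacing and its method names are hypothetical; the patterns are the ones quoted in this thread.

```java
// Hypothetical demo class; names are illustrative only.
class NumberSpacing {
    // The question's original attempt: in a Java replacement string a backslash
    // just escapes the next character, so "\\1 \\2" inserts the literal text "1 2".
    static String withBackslashRefs(String s) {
        return s.replaceAll("(\\d)([A-Za-z])", "\\1 \\2");
    }

    // Two-step variant using $1/$2, which is how Java references groups.
    static String addSpaces(String s) {
        // Step 1: add a space after each number that is followed by a non-digit, non-space.
        String after = s.replaceAll("(-?\\d\\.?\\d*)([^\\d\\s])", "$1 $2");
        // Step 2: add a space before each number that follows a non-digit, non-space.
        return after.replaceAll("([^0-9\\-\\s])(-?\\d\\.?\\d*)", "$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(withBackslashRefs("2FAST")); // prints "1 2AST"
        System.out.println(addSpaces("APM35 2FAST4YOU -5ABBA STEVE0.5&Tom"));
        // prints "APM 35 2 FAST 4 YOU -5 ABBA STEVE 0.5 &Tom"
    }
}
```

These patterns only allow a single digit before the decimal point, so an input such as 12.5 would need a different number pattern.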
