Regex Replacing issue understanding


I'm trying to write replacement logic for invalid phone numbers, which I provide in a Map.
I've read through a few regex threads, but I don't know whether this is actually possible.
Example:
Input phone number: +410712345678
regex I'm trying to use:
"^\\+(?:[0-9] ?){6,14}[0-9]$"
The number after regex filtering should be +41712345678, i.e. the first 0 is removed.
Second example:
input phone number: +41(071)2345678
regex I'm trying to use:
"^\\+(?:[0-9] ?)\\({0,3}\\){3,11}[0-9]$"
The number after regex filtering should be +41712345678, i.e. the first 0 and the parentheses are removed.
I'm trying to use some kind of pattern to automatically remove these invalid pieces from the phone numbers. The numbers need to be formatted this way to work with my VoIP application.
Is there any way to create a filter pattern like that with regex?

It seems you should apply that rule only to Swiss phone numbers, i.e. +41 numbers, because simply removing the first 0 from any international number is wrong.
So, ph = ph.replaceFirst("^(\\+41)\\(?0?([0-9]{2})\\)?", "$1$2").
See regex101 for how it works.
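A minimal sketch of that call in a standalone class, run on the two example numbers from the question plus a non-Swiss number. SwissNumberFix, normalize and the +49 sample number are illustrative names and data, not part of the original answer:

```java
class SwissNumberFix {
    // Answer's pattern: keep "+41" (group 1), drop an optional "(" and an optional 0,
    // keep the next two digits (group 2), and drop an optional ")".
    static final String PATTERN = "^(\\+41)\\(?0?([0-9]{2})\\)?";

    static String normalize(String number) {
        // The ^ anchor means only the start of the number can be rewritten.
        return number.replaceFirst(PATTERN, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+410712345678"));   // +41712345678
        System.out.println(normalize("+41(071)2345678")); // +41712345678
        System.out.println(normalize("+4930123456"));     // +4930123456 (not a +41 number)
    }
}
```

Because the 0 is optional (0?), a number already in the target format, such as +41712345678, passes through unchanged, so the replacement can safely be applied more than once.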

Thank you for your answer.
I applied the Regex to my TestImport with the following code:
//...
log.debug("Applying Regex :" + SearchString + " with Replace: " + ReplaceString);
log.debug("Applying Regex for Number:" + Person.get(EPerson.Rufnummer));
Person.put(EPerson.Rufnummer, Person.get(EPerson.Rufnummer).replaceFirst(SearchString, ReplaceString));
log.debug("New Number is:" +Person.get(EPerson.Rufnummer));
log.debug("Applying Regex for Number:" + Person.get(EPerson.RufnummerMobil));
Person.put(EPerson.RufnummerMobil, Person.get(EPerson.RufnummerMobil).replaceFirst(SearchString, ReplaceString));
log.debug("New Number is:" +Person.get(EPerson.RufnummerMobil));
//...
DEBUG [AddressbookFactory] Applying Numberfilter to: {Vorname=Testinator, Nachname=Test, Rufnummer=+410717271818, RufnummerMobil=, RufnummerPrivat=+41(071)7271818, Fax=, Strasse=, PLZ=, Stadt=, Bundesland=, Email=, Firma=, URL=}
DEBUG [AddressbookFactory] Regex Detected
DEBUG [AddressbookFactory] Applying Regex :^(\+41)\(?0?([0-9]{2})\)? with Replace: $1$2
DEBUG [AddressbookFactory] Applying Regex for Number:+410717271818
DEBUG [AddressbookFactory] New Number is: +41717271818
DEBUG [AddressbookFactory] Applying Regex for Number:+41(071)7271818
DEBUG [AddressbookFactory] New Number is: +41717271818
...
And it worked!
Thank you so much for your quick response!
I marked your answer as useful, but because of my "newbie" reputation it isn't shown yet.
This question is resolved.
Sincerely, Fabian95qw

Related

Any suggestions on how to create a regex for this in Java for String.replaceAll()?

My string looks like this:
{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\",\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,\\\"estimated_time_to_regain_access\\\":0}]}
Since the key here is a variable value, I am trying to replace 692950841314120 (or whatever value I get from the server) with a constant like ID. My main goal is to parse this into a POJO. I have tried using:
string.replaceAll("^[0-9]{15}$","ID")
but I think the backslashes are preventing me from getting the desired result. Is there a better way to do this? I know I can use the code below, but I don't want it to turn a longer number into something like ID123, or otherwise distort other information in the JSON.
string.replaceAll("[0-9]{15}","ID")

Strictly speaking, if you have a valid JSON string, you should parse it using something like GSON, rather than using regex. That being said, if you must use regex, you could try removing the starting and ending anchors:
string.replaceAll("[0-9]{15}", "ID")
Or maybe use double quotes instead:
string.replaceAll("\"[0-9]{15}\"", "\"ID\"")

It is safer to assume the value is inside \" and \":.
You can then use
.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2")
The regex is (\\")[0-9]{15}(\\":) and it means:
(\\") - match and capture \" substring into Group 1
[0-9]{15} - fifteen digits
(\\":) - Group 2: a \": substring.
The $1 and $2 are placeholders holding the Group 1 and 2 values.
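A small, runnable comparison, assuming the string really contains backslash-escaped quotes as in the question. It also shows why the original attempt did nothing: without the MULTILINE flag, ^ and $ anchor to the start and end of the whole input, so ^[0-9]{15}$ only matches a string that is exactly 15 digits. The class and method names are hypothetical:

```java
class JsonKeyReplace {
    // Sample from the question: the key is wrapped in backslash-escaped quotes (\"...\").
    static final String INPUT = "{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\","
            + "\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,"
            + "\\\"estimated_time_to_regain_access\\\":0}]}";

    // The question's attempt: ^ and $ require the WHOLE string to be 15 digits.
    static String anchored(String s) {
        return s.replaceAll("^[0-9]{15}$", "ID");
    }

    // The capture-group version: keep \" (group 1) and \": (group 2), swap only the digits.
    static String keyOnly(String s) {
        return s.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2");
    }

    public static void main(String[] args) {
        System.out.println(anchored(INPUT).equals(INPUT)); // true: nothing was replaced
        System.out.println(keyOnly(INPUT));
    }
}
```

Because the pattern requires \": right after exactly 15 digits, a longer number (or a 15-digit value that is not a key) is left alone.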

You should use a word boundary, \b.
Try this:
public class Main {
    public static void main(String[] args) {
        String input = "{\\\"692950841314120\\\":"
                + "[{\\\"type\\\":\\\"12345678901234567890\\\","
                + "\\\"call_count\\\":3,"
                + "\\\"total_cputime\\\":1,"
                + "\\\"total_time\\\":5,"
                + "\\\"estimated_time_to_regain_access\\\":0}]}";
        // \b on both sides rejects a 15-digit run inside a longer number,
        // so the 20-digit "type" value is not partially replaced.
        System.out.println(input.replaceAll("\\b[0-9]{15}\\b", "ID"));
    }
}
output:
{\"ID\":[{\"type\":\"12345678901234567890\",\"call_count\":3,\"total_cputime\":1,\"total_time\":5,\"estimated_time_to_regain_access\":0}]}

Set RegEx in Java to be non-greedy by default

I have Strings like the following:
"parameter: param0=true, param1=401230 param2=asset client: desktop"
"parameter: param0=false, param1=15230 user: user213 client: desktop"
"parameter: param0=false, param1=51235 param2=asset result: ERROR"
The pattern is parameter:, then the params, and after the params client:, user:, and/or result:.
I want to match the text between parameter: and the first occurrence of client:, user: or result:.
So for the 2nd String it should match param0=false, param1=15230.
My regex is:
parameter:\s+(.*)\s+(result|client|user):
But now if I match the 2nd String it captures param0=false, param1=15230 user: user213 (looks like regex is matching greedy)
How do I fix this? parameter:\s+(.*)\s+(result|client|user)+?: doesn't fix it.
With this regex tester I can add the modifier U to make the regex lazy by default. Is this possible in Java too?

Try putting the ? character inside the first captured group (the subpattern you intend to extract):
parameter:\\s+(.*?)\\s+(result|client|user):
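A quick sketch comparing the greedy and reluctant versions on the second sample line; LazyParams and params are made-up names for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyParams {
    // Greedy (.*) backtracks only as far as the LAST keyword it can still satisfy.
    static final Pattern GREEDY =
            Pattern.compile("parameter:\\s+(.*)\\s+(result|client|user):");
    // Reluctant (.*?) grows one character at a time and stops at the FIRST keyword.
    static final Pattern LAZY =
            Pattern.compile("parameter:\\s+(.*?)\\s+(result|client|user):");

    // Returns group 1 of the first match, or null if there is none.
    static String params(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "parameter: param0=false, param1=15230 user: user213 client: desktop";
        System.out.println(params(GREEDY, line)); // param0=false, param1=15230 user: user213
        System.out.println(params(LAZY, line));   // param0=false, param1=15230
    }
}
```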

No. There is no ungreedy modifier in Java. You have to put ? after each quantifier to make it lazy (reluctant).
This means you should mark every quantifier with a ?, as in the following pattern:
"parameter:\\s+?(.*?)\\s+?(result|client|user):"
Specified by: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Java Regular Expressions - Matching the First Occurrence of a Pattern

I'm matching URLs against a regular expression, testing if they reflect a "shutdown" command.
Here's a URL that performs a shutdown:
/exec?debug=true&command=shutdown&f=0
Here's another, legitimate but confusing URL that performs shutdown:
/exec?commando=yes&zcommand=34&command=shutdown&p
Now, I must ensure there's only one command=... parameter and it is command=shutdown. Alternatively, I can live with ensuring the first command=... parameter is command=shutdown.
Here are my test cases for the requested regular expression:
/exec?version=0.4&command=shutdown&out=JSON&zcommand=1
Should match
/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown
Should fail to match
/exec?command=shutdown&out=JSON
Should match
/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown
Should fail to match
Here's my baseline - a regular expression that passes the above tests - all but the last one:
^/exec?(.*\&)*command=shutdown(\&.*)*$
The problem is with the occurrence of more than one command=..., where the first one is not shutdown.
I tried using lookbehind:
^/exec?(.*\&)*(?<!(\&|\?)command=.*)command=shutdown(\&.*)*$
But I'm getting:
Look-behind group does not have an obvious maximum length near index 31
I even tried atomic grouping, to no avail. I can't make it NOT match the following URL:
/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown
Can anyone help with a regular expression that passes all the tests?
Clarifications
I see I owe you some context.
My task is to configure a Filter that guards the entrance of all our system’s servlets, and verifies there’s an open HTTP session (in other words: that a successful Login has occurred). The filter also allows configuring which URLs do not require login.
Some exceptions are easy: /login does not need login. Calls to localhost do not need login.
But sometimes it gets complicated. Like the shutdown command that cannot require login while other commands can and should (the strange reason for that is out of the scope of my question).
Since it’s a security matter, I can’t allow users to merely append &command=shutdown to a URL and bypass the filter.
So I really need a regular expression, or otherwise I’ll need to redefine the configuration specs.

You would need to do it in multiple steps:
(1) Find match of ^(?=\/exec\?).*?(?<=[?&])command=([^&]+)
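A minimal sketch of those two steps, run against the test URLs from the question. It implements the relaxed requirement (the first command=... parameter is command=shutdown), not the stricter one-and-only-one rule; FirstCommand and isShutdown are hypothetical names, and the answer's \/ is written as a plain /, which means the same thing:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstCommand {
    // Step 1: the lazy .*? stops at the earliest point where (?<=[?&])command= matches,
    // so group 1 holds the value of the FIRST command= parameter.
    static final Pattern FIRST_COMMAND =
            Pattern.compile("^(?=/exec\\?).*?(?<=[?&])command=([^&]+)");

    static boolean isShutdown(String url) {
        Matcher m = FIRST_COMMAND.matcher(url);
        // Step 2: compare the captured value in plain Java.
        return m.find() && m.group(1).equals("shutdown");
    }

    public static void main(String[] args) {
        String[] urls = {
            "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1",
            "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown",
            "/exec?command=shutdown&out=JSON",
            "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown"
        };
        for (String url : urls) {
            System.out.println(isShutdown(url) + "  " + url);
        }
    }
}
```

The fixed-length lookbehind (?<=[?&]) is what keeps zcommand= and commando= from counting, and it sidesteps the "no obvious maximum length" error from the question.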
(2) Check if match is shutdown

Ok. I thank you all for your great answers! I tried some of the suggestions, struggled with others, and all in all I have to agree that even if the right regex exists, it would look terrible and be unmaintainable; it could serve well as a nasty university exercise, but not in a real system configuration.
I also realize that since a Filter is involved here, and the Filter already parses its own URI, it is absolutely ridiculous to glue back all the URI parts into a string and match it against a regular expression. What was I thinking??
I'll therefore redesign the Filter and its configuration.
Thanks a lot, people! I appreciate the help :)
Noam Rotem.
P.S. - why was I getting a userXXXX nick? Very strange...

This tested (and fully commented) regex solution meets all your requirements:
import java.util.regex.*;
public class TEST {
public static void main(String[] args) {
Pattern re = Pattern.compile(
" # Match URI having command=shutdown query variable value. \n" +
" ^ # Anchor to start of string. \n" +
" (?:[^:/?\\#\\s]+:)? # URI scheme (Optional). \n" +
" (?://[^/?\\#\\s]*)? # URI authority (Optional). \n" +
" [^?\\#\\s]* # URI path. \n" +
" \\? # Literal start of URI query. \n" +
" # Match var=value pairs preceding 'command=xxx'. \n" +
" (?: # Zero or more 'var=values' \n" +
" (?!command=) # only if not-'command=xxx'. \n" +
" [^&\\#\\s]* # Next var=value. \n" +
" & # var=value separator. \n" +
" )* # Zero or more 'var=values' \n" +
" command=shutdown # variable and value to match. \n" +
" # Match var=value pairs following 'command=shutdown'. \n" +
" (?: # Zero or more 'var=values' \n" +
" & # var=value separator. \n" +
" (?!command=) # only if not-'command=xxx'. \n" +
" [^&\\#\\s]* # Next var=value. \n" +
" )* # Zero or more 'var=values' \n" +
" (?:\\#\\S*)? # URI fragment (Optional). \n" +
" $ # Anchor to end of string.",
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
String s = "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1";
// Should match
// String s = "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown";
// Should fail to match
// String s = "/exec?command=shutdown&out=JSON";
// Should match
// String s = "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown";
// Should fail to match
Matcher m = re.matcher(s);
if (m.find()) {
// Successful match
System.out.print("Match found.\n");
} else {
// Match attempt failed
System.out.print("No match found.\n");
}
}
}
The above regex matches any RFC3986 valid URI having any scheme, authority, path, query or fragment components, but it must have one (and only one) query "command" variable whose value must be exactly, but case insensitively: "shutdown".
A carefully crafted complex regex is perfectly fine (and maintainable) to use when written with proper indentation and commented steps (like shown above). (For more information on using regex to validate a URI, see my article: Regular Expression URI Validation)

If you can live with just accepting the first match, you could just use \\Wcommand=([^&]+) and fetch the first group.
Otherwise, you could call Matcher.find twice to check for a second command= parameter, and use the first match only if there is none. Why do you want to do this with a single complex regex?
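A sketch of the find-twice idea, assuming the rule is "the first command= parameter is shutdown and there is no second one"; SingleCommandCheck and isOnlyShutdown are made-up names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleCommandCheck {
    // \W requires a non-word character (in these URLs, '?' or '&') before "command",
    // so "zcommand=" does not match; "commando=" fails on the '='.
    static final Pattern COMMAND = Pattern.compile("\\Wcommand=([^&]+)");

    static boolean isOnlyShutdown(String url) {
        Matcher m = COMMAND.matcher(url);
        // First find(): the first command= parameter must be shutdown.
        if (!m.find() || !m.group(1).equals("shutdown")) {
            return false;
        }
        // Second find(): any further command= parameter means reject.
        return !m.find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1",
            "/exec?command=shutdown&command=admin"
        };
        for (String url : urls) {
            System.out.println(isOnlyShutdown(url) + "  " + url);
        }
    }
}
```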

I am not a Java coder, but try this one (it works in Perl):
^(?=\/exec\?)(?:[^&]+(?<![?&]command)=[^&]+&)*(?<=[?&])command=shutdown(?:&|$)

To match the first occurrence of command=shutdown, use this:
Pattern.compile("^((?!command=).)+command=shutdown.*$");
The results will look like this:
"/exec?version=0.4&command=shutdown&out=JSON&zcommand=1" => true
"/exec?command=shutdown&out=JSON" => true
"/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown" => false
"/exec?commando=yes&zcommand=34&command=shutdown&p" => false
If you want to match strings that ONLY contain one 'command=' use this:
Pattern.compile("^((?!command=).)+command=shutdown((?!command=).)+$");
Please note that negative lookaheads like these are not what regular expressions are best suited for, and performance might not be the best.

If this can be done with a single regular expression (and it may well be possible), it will be so complex as to be unreadable, and thus unmaintainable, because the intent of the logic will be lost. Even if it is "documented", it will still be much less obvious to someone who just knows Java.
A much better approach would be to use the URI class to parse the entire thing, domain and all, pull off the query parameters, and then write a simple loop that walks through them and decides, based on your business logic, what is a shutdown and what isn't. Then it will be simple, self-documenting and probably more efficient (not that that should be a concern).
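A minimal sketch of that approach with java.net.URI, assuming Java 10 or later (for URLDecoder.decode(String, Charset)) and assuming the rule is "path /exec, exactly one command parameter, value shutdown". ShutdownFilterCheck and isShutdown are hypothetical names; inside an actual Filter the request is already parsed, so this only illustrates the loop:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class ShutdownFilterCheck {
    // Accepts only path /exec with exactly one "command" parameter whose value is "shutdown".
    static boolean isShutdown(String requestUri) {
        URI uri;
        try {
            uri = new URI(requestUri);
        } catch (URISyntaxException e) {
            return false; // reject anything that is not a syntactically valid URI
        }
        String query = uri.getRawQuery();
        if (!"/exec".equals(uri.getPath()) || query == null) {
            return false;
        }
        int commandCount = 0;
        boolean valueIsShutdown = false;
        // Split the raw query first, then decode, so an encoded %26 ('&') cannot fake a parameter.
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            if (name.equals("command")) {
                commandCount++;
                valueIsShutdown = URLDecoder.decode(rawValue, StandardCharsets.UTF_8).equals("shutdown");
            }
        }
        return commandCount == 1 && valueIsShutdown;
    }

    public static void main(String[] args) {
        String[] urls = {
            "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1",
            "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown",
            "/exec?command=shutdown&out=JSON",
            "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown",
            "/exec?commando=yes&zcommand=34&command=shutdown&p"
        };
        for (String url : urls) {
            System.out.println(isShutdown(url) + "  " + url);
        }
    }
}
```

Each business rule (path, parameter count, exact value) is a separate, readable condition, which is the point the answer makes about maintainability.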

Try this:
Pattern p = Pattern.compile(
"^/exec\\?(?:(?:(?!\\1)command=shutdown()|(?!command=)\\w+(?:=[^&]+)?)(?:&|$))+$\\1");
Or a little more readably:
^/exec\?
(?:
(?:
(?!\1)command=shutdown()
|
(?!command=)\w+(?:=[^&]+)?
)
(?:&|$)
)+$
\1
The main body of the regex is an alternation that matches either a shutdown command or a parameter whose name is not command. If it does match a shutdown command, the empty group in that branch "captures" an empty string. It doesn't need to consume anything, because we're only using it as a checkbox, confirming en passant that one of the parameters was a shutdown command.
The negative lookahead - (?!\1) - prevents it from matching two or more shutdown commands. I don't know if that's really necessary, but it's a good opportunity to demonstrate (1) how to negate a "back-assertion", and (2) that a backreference can appear before the group it refers to in certain circumstances (what's known as a forward reference).
When the whole URL has been consumed, the backreference (\1) acts like a zero-width assertion. If one of the parameters was command=shutdown, the backreference will succeed. Otherwise it will fail even though it's only trying to match an empty string, because the group it refers to didn't participate in the match.
But I have to concur with the other responders: when your regexes get this complicated, you should be thinking seriously about switching to a different approach.
EDIT: It works for me. Here's the demo.
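A small harness (the class name is hypothetical) that runs the answer's pattern, unchanged, against the question's test URLs plus one URL with two shutdown commands:

```java
import java.util.regex.Pattern;

class ForwardRefCheck {
    // The answer's pattern. The empty group () is set only when a command=shutdown
    // parameter is consumed; (?!\1) blocks a second one, and the final \1 requires the first.
    static final Pattern SHUTDOWN = Pattern.compile(
            "^/exec\\?(?:(?:(?!\\1)command=shutdown()|(?!command=)\\w+(?:=[^&]+)?)(?:&|$))+$\\1");

    static boolean isShutdown(String url) {
        return SHUTDOWN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1",
            "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown",
            "/exec?command=shutdown&out=JSON",
            "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown",
            "/exec?command=shutdown&command=shutdown"
        };
        for (String url : urls) {
            System.out.println(isShutdown(url) + "  " + url);
        }
    }
}
```

Java accepts the single-digit \1 before group 1 is defined (a forward reference). Until the group has participated, the backreference simply fails, which is why (?!\1) succeeds on the first shutdown parameter and fails on any later one.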

java regexp for reluctant matching

I need to find an expression for the following problem:
String given = "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"answer 5\"}";
What I want to get: "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"*******\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"******\"}";
What I am trying:
String regex = "(.*answer\"\\s:\"){1}(.*)(\"[\\s}]?)";
String rep = "$1*****$3";
System.out.println(given.replaceAll(regex, rep));
What I am getting:
"{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"******\"}";
Because of the greedy behaviour, the first group catches both "answer" parts, whereas I want it to stop after finding enough, perform replacement, and then keep looking further.

The pattern
("answer"\s*:\s*")(.*?)(")
Seems to do what you want. Here's the escaped version for Java:
(\"answer\"\\s*:\\s*\")(.*?)(\")
The key here is to use (.*?) to match the answer, not (.*). The latter matches as many characters as possible; the former stops as soon as possible.
The above pattern won't work if the answer contains escaped double quotes (\"). Here's a more complex version that allows them:
("answer"\s*:\s*")((.*?)[^\\])?(")
You'll have to use $4 instead of $3 in the replacement pattern.
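A runnable sketch applying both patterns to the question's string and to a hypothetical answer containing escaped quotes; MaskAnswers and its method names are illustrative:

```java
class MaskAnswers {
    static final String GIVEN = "{ \"questionID\" :\"4\", \"question\":\"What is your favourite hobby?\",\"answer\" :\"answer 4\"},{ \"questionID\" :\"5\", \"question\" :\"What was the name of the first company you worked at?\",\"answer\" :\"answer 5\"}";

    // Reluctant (.*?) stops at the first double quote after each "answer" key.
    static String maskSimple(String json) {
        return json.replaceAll("(\"answer\"\\s*:\\s*\")(.*?)(\")", "$1*****$3");
    }

    // ((.*?)[^\\])? must end on a non-backslash character, so an escaped quote inside
    // the value does not end the match; the closing quote is group 4.
    static String maskAllowingEscapedQuotes(String json) {
        return json.replaceAll("(\"answer\"\\s*:\\s*\")((.*?)[^\\\\])?(\")", "$1*****$4");
    }

    public static void main(String[] args) {
        System.out.println(maskSimple(GIVEN));
        // Hypothetical value containing escaped quotes: "answer" :"He said \"hi\""
        String tricky = "\"answer\" :\"He said \\\"hi\\\"\"";
        System.out.println(maskSimple(tricky));                // stops at the first escaped quote
        System.out.println(maskAllowingEscapedQuotes(tricky)); // "answer" :"*****"
    }
}
```

On the escaped-quote sample the simple pattern leaves part of the answer behind, while the second pattern masks the whole value; on the question's string both give the same result.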

The following regex works for me:
regex = "(?<=answer\"\\s:\")(answer.*?)(?=\"})";
rep = "*****";
given = given.replaceAll(regex, rep);
The \ and " might be incorrectly escaped, since I tested without Java.
http://regexr.com?303mm

Why isn't this regexp backtrack working

I have tried to use the following kind of regex:
([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))|(FakeEmail:)|(Email:)|(\1\2)|(\1\3)
(pretend \1 is the email regex group, \2 is FakeEmail: and \3 is Email:, because I didn't count the parens to figure out the real grouping)
What I am trying to do is say "Find the word Email: and, if you find it, pick up any email address following the word."
I got that email regex from another question on Stack Overflow.
My test string could be something like:
"This guy is spamming me from
FakeEmail: fakeemailAdress#someplace.com
but here is his real info:
Email: testemail#someplace.com"
Any tips? Thanks

I'm either quite confused as to what you're trying to do, or your Regex is just very wrong. In particular:
Why is Email: at the end instead of at the beginning, to match your example?
Why are Email: and \1\2 separated by pipe characters, as if they were separate fields? This compiles the pattern as ORs: find the email pattern, OR the word "Email:", OR whatever \1\2 ends up meaning, which is out of context here.
If all you're trying to do is match something like Email: testemail#someplace.com, you don't need any backtracking.
Something like this is probably all you need:
Email:\s+([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))
Also, I'd strongly advise against trying to validate an email address so strictly. You may want to read http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx . I'd simplify the pattern to something more along the lines of:
Email:\s+(\S+)#(\S+\.\S+)

Try:
(Fake)?Email: *([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))
And captured group \1 will be empty if it's a real email and contain "Fake" if it's a fake email, while \2 will be the email itself.
Do you actually want to capture it if it's FakeEmail though? If you want to capture all Email but ignore all FakeEmail then do:
\bEmail: *([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))
The word boundary prevents the Email bit from matching "FakeEmail".
UPDATE: note that your regex only matches lowercase, since it has a-z in every [] but never A-Z. Make sure you compile the pattern with the case-insensitive flag, i.e.:
Pattern.compile("(Fake)?Email: .....", Pattern.CASE_INSENSITIVE)
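A minimal sketch of the word-boundary idea on the question's sample text. To keep the focus on \b, it swaps the full address pattern for a deliberately loose (\S+); EmailLabels and realEmails are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailLabels {
    static final String TEXT = "This guy is spamming me from\n"
            + "FakeEmail: fakeemailAdress#someplace.com\n"
            + "but here is his real info:\n"
            + "Email: testemail#someplace.com";

    // \b before "Email" cannot match inside "FakeEmail", because 'e' and 'E'
    // are both word characters. CASE_INSENSITIVE also lets it match "email:".
    static final Pattern REAL = Pattern.compile("\\bEmail:\\s*(\\S+)", Pattern.CASE_INSENSITIVE);

    static List<String> realEmails(String text) {
        Matcher m = REAL.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(realEmails(TEXT)); // [testemail#someplace.com]
    }
}
```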

You can use the following code to match all of these kinds of email address:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindEmails {
    public static void main(String[] args) {
        String text = "This guy is spamming me from\n" +
                "FakeEmail: fakeemail+Adress#someplace.com\n" +
                "fakeEmail: \n" +
                "fakeemail#someplace.com\n" +
                "but here is his real info:\n" +
                "Email: test.email+info#someplace.com\n";
        Matcher m = Pattern.compile("(?i)Email:\\s*([_a-z\\d\\+-]+(\\.[_a-z\\d\\+-]+)*#[a-z\\d-]+(\\.[a-z\\d-]+)*(\\.[a-z]{2,4}))").matcher(text);
        while (m.find())
            System.out.printf("Email is [%s]%n", m.group(1));
    }
}
This will match email text:
appearing on the line after the Email: label, because \s* also matches line breaks
ignoring case, by using (?i)
Email address with a period . in it
Email address with a plus sign + in it
The output of the above code is:
Email is [fakeemail+Adress#someplace.com]
Email is [fakeemail#someplace.com]
Email is [test.email+info#someplace.com]
