Java regex replaceAll not working

regex not working as wanted
Code example:
widgetCSS = "#widgetpuffimg{width:100%;position:relative;display:block;background:url(/images/small-banner/Dog-Activity-BXP135285s.jpg) no-repeat 50% 0; height:220px;}
someothertext #widgetpuffimg{width:100%;position:relative;display:block;}"
newWidgetCSS = widgetCSS.replaceAll("#widgetpuffimg\\{(.*?)\\}","");
I want all occurrences in the string that match the pattern "#widgetpuffimg{anycharacters}" to be replaced by nothing
Resulting in newWidgetCSS = someothertext

Update: After edit of question
I think the regex is working properly according to your requirements if you are escaping your { as mentioned below. The exact output I am getting is " someothertext ".
It has to be newWidgetCSS = widgetCSS.replaceAll("#widgetpuffimg\\{(.*?)\\}","");
You need to use \\{ instead of \{ for escaping { properly.

This should work :
String resultString = subjectString.replaceAll("(?s)\\s*#widgetpuffimg\\{.*?\\}\\s*", "");
Explanation :
"(?s)" + // Let “.” also match line break characters (DOTALL mode)
"\\s" + // Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"#widgetpuffimg" + // Match the characters “#widgetpuffimg” literally
"\\{" + // Match the character “{” literally
"." + // Match any single character
"*?" + // Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
"\\}" + // Match the character “}” literally
"\\s" + // Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
"*" // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
As an added bonus it trims the whitespace.
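A minimal, self-contained sketch of the same idea, assuming the CSS arrives as a single string that may contain line breaks. The class and method names (WidgetCssCleaner, stripWidgetRules) are made up for illustration, and the pattern does not try to handle nested braces.

```java
import java.util.regex.Pattern;

final class WidgetCssCleaner {
    // (?s) lets '.' match line breaks, so a rule that spans several lines is still removed.
    private static final Pattern WIDGET_RULE =
            Pattern.compile("(?s)\\s*#widgetpuffimg\\{.*?\\}\\s*");

    // Removes every "#widgetpuffimg{...}" block together with the whitespace around it.
    static String stripWidgetRules(String css) {
        return WIDGET_RULE.matcher(css).replaceAll("");
    }

    public static void main(String[] args) {
        String css = "#widgetpuffimg{width:100%;\nheight:220px;}\n"
                + "someothertext #widgetpuffimg{display:block;}";
        System.out.println("[" + stripWidgetRules(css) + "]"); // prints [someothertext]
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.replaceAll would do.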

Related

Regex XML tags having angle brackets inside

I need a regex which will give me one XML tag e.g. <ABC/> or <ABC></ABC>
So, here if I use <(.)+?>, it will give me <ABC/> or <ABC> or </ABC>. This is fine.
Now, the problem:
I have one XML as
<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>
Here, if you see, PROD_TYPE="COCOG EFI LWL P&C >1Y-5Y" has a greater than symbol in the value of an attribute.
So, the regex returns me
<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&C >
instead of complete
<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>
I need some regex which will not consider the less than and greater than symbols which are part of value i.e. enclosed in double quotes.

You may try this:
(?i)<[a-z][\w:-]+(?: [a-z][\w:-]+="[^"]*")*/?>
And the explanation goes here below:
(?i) # Match the remainder of the regex with the options: case insensitive (i)
< # Match the character “<” literally
[a-z] # Match a single character in the range between “a” and “z”
[\w:-] # Match a single character present in the list below
# A word character (letters, digits, and underscores)
# The character “:”
# The character “-”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?: # Match the regular expression below
\  # Match the character “ ” literally
[a-z] # Match a single character in the range between “a” and “z”
[\w:-] # Match a single character present in the list below
# A word character (letters, digits, and underscores)
# The character “:”
# The character “-”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
=" # Match the characters “="” literally
[^"] # Match any character that is NOT a “"”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" # Match the character “"” literally
)* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
/ # Match the character “/” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
> # Match the character “>” literally
And if you would like to include open, close or self-closed tags, then try the regex below:
(?i)(?:<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*>.+?</\1>|<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*/>)
A Java code fragment implementing the same:
boolean foundMatch = false;
try {
    // matches() succeeds only if the entire subject string is one such tag or element
    foundMatch = subjectString.matches("(?i)(?:<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*>.+?</\\1>|<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*/>)");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
Hope this helps...
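To pull tags out of a longer text rather than test a whole string, the first pattern above can be run with Matcher.find(). This is only a sketch: the class and method names (TagFinder, findTags) are invented for the example, and the pattern still assumes double-quoted attributes separated by single spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagFinder {
    // Opening or self-closing tag whose attribute values are double-quoted;
    // a '>' inside a quoted value is consumed by [^"]* and does not end the tag.
    private static final Pattern TAG = Pattern.compile(
            "(?i)<[a-z][\\w:-]+(?: [a-z][\\w:-]+=\"[^\"]*\")*/?>");

    // Collects every tag found in the text, in order of appearance.
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String xml = "x <VALUE PROD_TYPE=\"P&C >1Y-5Y\" SRC=\"BASE\"/> y <ABC>";
        findTags(xml).forEach(System.out::println);
    }
}
```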

To expand on the point of G_H’s link: Don't use regex to parse XML. Use XPath to return a Node, and pass that Node to an identity Transformer:
Node valueElement = (Node)
XPathFactory.newInstance().newXPath().evaluate("//VALUE",
new InputSource(new StringReader(xmlDocument)),
XPathConstants.NODE);
StringWriter result = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
new DOMSource(valueElement), new StreamResult(result));
String valueElementMarkup = result.toString();
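For completeness, here is a self-contained version of that snippet with its imports. It is a sketch under a few assumptions: the class and method names (ValueElementReader, firstValueElement, toMarkup) are made up, checked exceptions are wrapped in unchecked ones for brevity, and the input must be well-formed XML, which means the bare & in the question's sample has to be written as &amp;amp; or the parser will reject it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ValueElementReader {
    // Returns the first VALUE element anywhere in the document, or null if there is none.
    static Element firstValueElement(String xmlDocument) {
        try {
            return (Element) XPathFactory.newInstance().newXPath().evaluate(
                    "//VALUE",
                    new InputSource(new StringReader(xmlDocument)),
                    XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    // Serializes a node back to markup with an identity transform.
    static String toMarkup(Node node) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROOT><VALUE PROD_TYPE=\"P&amp;C >1Y-5Y\" SRC=\"BASE\"/></ROOT>";
        Element value = firstValueElement(xml);
        System.out.println(value.getAttribute("PROD_TYPE")); // the parsed attribute value
        System.out.println(toMarkup(value));
    }
}
```

The serializer may escape characters or reorder attributes, so the markup is equivalent to the input element but not necessarily identical to it character for character.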

Also try this:
<[^<>"]*(?:"[^"]*"[^<>"]*)*>
It grabs everything between < and > while treating each pair of double quotes, together with whatever they enclose, as one unit. A > symbol inside a quoted value is therefore skipped, and the match ends at the first > that comes after a closing " quote. (With .*? in place of the negated character classes the dot could also consume quotes, and the match would still stop at the first >.)

How to filtrate a long string (dynamic) with regex?

I have stored the response from a web-application in a string. The string contains several URL:s, and it is dynamic. Could be anything from 10-1000 URL:s.
I work with performance engineering, but this time I have to code a plugin in java, and I am far from an expert in programming.
The problem is that my response string contains a lot of gibberish that I don't need, and I don't know how to filter it out. In my print/request I only want to send the URLs.
I've come this far:
responseData = "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65354-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment1_4_av.ts?null=" +
"#EXTINF:10.000, " +
"http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65365-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=" +
"#EXTINF:fgsgsmoregiberish, " +
"http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-6353-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=";
pattern = "^(http://.*\\.ts)";
pr = Pattern.compile(pattern);
math = pr.matcher(responseData);
if (math.find()) {
System.out.println(math.group());
// in this print, I get everything from the response. I only want the URLS (dynamic. could be different names, but they all start with http and end with .ts).
}
else {
System.out.println("No Math");
}

Depending on what your URLs look like, you can use this naive pattern, which works for your examples and stops before the ? (written in Java style):
\\bhttps?://[^?\\s]+
to ensure there is .ts at the end, you can change it to:
\\bhttps?://[^?\\s]+\\.ts
or
\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)
to check that the end of the path is reached.
Note that these patterns don't deal with URLs that contain spaces between double quotes.
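A short sketch of how the last pattern could be used to collect all URLs from the response. The class and method names (TsUrlExtractor, extract) and the host names in the sample are placeholders, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TsUrlExtractor {
    // No leading ^ anchor: the pattern must be allowed to match anywhere in the response.
    private static final Pattern TS_URL =
            Pattern.compile("\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)");

    // Returns every URL ending in .ts, without the query string.
    static List<String> extract(String response) {
        List<String> urls = new ArrayList<>();
        Matcher m = TS_URL.matcher(response);
        while (m.find()) {          // loop instead of a single if, to get all matches
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String response = "http://a.example/i/x_,892,144,.mp4.csmil/segment1_4_av.ts?null="
                + "#EXTINF:10.000, "
                + "http://b.example/segment2_4_av.ts?null=";
        extract(response).forEach(System.out::println);
    }
}
```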

Just make your regex lazy with .*? instead of greedy .*, i.e.:
pr = Pattern.compile("(https?.*?\\.ts)");
Regex demo:
https://regex101.com/r/nQ5pA7/1
Regex Explanation:
(https?.*?\.ts)
Match the regex below and capture its match into backreference number 1 «(https?.*?\.ts)»
Match the character string “http” literally (case sensitive) «http»
Match the character “s” literally (case sensitive) «s?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “.” literally «\.»
Match the character string “ts” literally (case sensitive) «ts»
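The difference between the greedy and the lazy quantifier can be seen side by side in a small sketch. The helper name (firstMatch) and the sample host are invented for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyVsGreedy {
    // Returns the first match of the regex in the input, if there is one.
    static Optional<String> firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String data = "http://h.example/a.ts?x= junk http://h.example/b.ts?x=";
        // Greedy: .* runs to the last ".ts", swallowing both URLs and the junk between them.
        System.out.println(firstMatch("https?.*\\.ts", data).orElse("none"));
        // Lazy: .*? stops at the first ".ts", so only the first URL is returned.
        System.out.println(firstMatch("https?.*?\\.ts", data).orElse("none"));
    }
}
```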

Use the following regex pattern:
(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+([a-z]{2,4})(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%#\.\/_-]+))?(\?[0-9a-zA-Z\+\%#\/&\[\];=_-]+)?)?))\b
Explanation:
scheme http, https or ftp followed by // : ((http|ftp|https):\/{2})
the '+' after that group is a quantifier: the scheme part must occur one or more times
one or more host name labels, each followed by a literal . : ([0-9a-z_-]+\.)+
top-level domain of 2 to 4 letters : ([a-z]{2,4})
optional port, a colon followed by digits (here ? means zero or one time) : (:[0-9]+)?
rest of the URL, an optional path followed by an optional query string : ((\/([~0-9a-zA-Z\#\+\%#\.\/_-]+))?(\?[0-9a-zA-Z\+\%#\/&\[\];=_-]+)?)?

Replace all content within braces?

In the end I need a regex that converts a phone number into an E.164-conformant number. So far I have this:
result = s.replaceAll("[(*)|+| ]", "");
It removes the spaces, the "+" sign and the parentheses "()" just fine. But it does not remove the content between the parentheses; I want e.g. the number +49 (0)11 111 11 11 to become 49111111111.
How can I get this to work?

You can do it, but what if there's more than just a zero between parentheses?
result = s.replaceAll("\\([^()]*\\)|[*+ ]+", "");
As a verbose regex:
result = s.replaceAll(
"(?x) # Allow comments in the regex. \n" +
"\\( # Either match a ( \n" +
"[^()]* # then any number of characters except parentheses \n" +
"\\) # then a ). \n" +
"| # Or \n" +
"[*+\\ ]+ # Match one or more asterisks, pluses or spaces", "");

[(*)|+| ]
is a character class, matching any single parenthesis, asterisk, bar, plus or space character. Get rid of the square brackets and use something like
s.replaceAll("\\(.*?\\)|\\D", "");
This will remove anything between (and including) parentheses, as well as anything else that is not a digit. Note that this will not handle nested parentheses very well - it will eat everything from an open parenthesis to the first closing one it finds, so would change (123(45)67) into 67 (the unbalanced close parenthesis being removed as it's a \D)
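A quick sketch of this approach, using the negated class [^()]* for the parenthesized part instead of .*? so that a match never runs across another parenthesis. The class and method names (PhoneDigits, normalize) are hypothetical, and the method only strips characters; it does not check that the result is a valid number.

```java
final class PhoneDigits {
    // Drops "(...)" groups first, then every remaining non-digit character.
    static String normalize(String raw) {
        return raw.replaceAll("\\([^()]*\\)|\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+49 (0)11 111 11 11")); // prints 49111111111
    }
}
```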

You might try this: "(\\(\\d+\\))|\\+|\\s". It removes parentheses together with the digits inside them, plus signs, and whitespace.

I think you are expecting a little too much magic from character classes. Firstly, in character classes, don't use |. It is just another character that will be matched by the character class. Simply list all the characters you want to include without any delimiters.
Secondly, a character class really just matches single characters. So (*) inside a character class can by definition do nothing more than match a single (, * (literally) or ). If you are 100% sure that your input will never have nested or unmatched parentheses, then you can do something like this:
"(?:\\([^)]*\\)|\\D)+"

Password matching with regex

I'm using java.util.regex.Pattern to match passwords that meet the following criteria:
At least 7 characters
Must consist of only letters and digits
At least one letter and at least one digit
I have 1 & 2 covered, but I can't think of how to do 3.
1 & 2 - [\\w]{7,}
Any ideas?

You can use this. This basically uses lookahead for achieving the 3rd requirement.
(?=.*\d)(?=.*[a-zA-Z])\w{7,}
or the Java string
"(?=.*\\d)(?=.*[a-zA-Z])\\w{7,}"
Explanation
"(?=" + // Assert that the regex below can be matched, starting at this position (positive lookahead)
"." + // Match any single character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\d" + // Match a single digit 0..9
")" +
"(?=" + // Assert that the regex below can be matched, starting at this position (positive lookahead)
"." + // Match any single character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"[a-zA-Z]" + // Match a single character present in the list below
// A character in the range between “a” and “z”
// A character in the range between “A” and “Z”
")" +
"\\w" + // Match a single character that is a “word character” (letters, digits, and underscores)
"{7,}" // Between 7 and unlimited times, as many times as possible, giving back as needed (greedy)
Edit
If you want to include unicode letter support, then use this
(?=.*\d)(?=.*\pL)[\pL\d]{7,}

Doing this with only regex will very easily become convoluted and very difficult to understand/read if you ever need to change the requirements for a password.
Instead iterate over the password in a loop and count the different types of characters and then do simple if-checks.
Such as (untested):
if (password.length() < 7) return false;
int countDigit = 0;
int countLetter = 0;
for (int i = 0; i < password.length(); i++) {
    char c = password.charAt(i);
    if (Character.isDigit(c)) {
        countDigit++;
    }
    else if (Character.isLetter(c)) {
        countLetter++;
    }
    else {
        return false; // only letters and digits are allowed
    }
}
if (countDigit == 0 || countLetter == 0) {
    return false;
}
return true;

You won't need a character class for using \w, it is a character class by itself. However it also matches underscore which you didn't mention. So it might be better to use a custom character class.
To the "at least one" part, use look aheads:
/(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{7,}/
You may need to add some extra escapes to make it work with Java*.
* which unfortunately I can't help!
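In Java the slashes are dropped and each backslash is doubled inside the string literal. A minimal sketch, with a made-up class name (PasswordRule) and an ASCII-only notion of "letter":

```java
import java.util.regex.Pattern;

final class PasswordRule {
    // The two lookaheads demand a digit and a letter somewhere in the input;
    // the character class then restricts every character to ASCII letters and digits.
    private static final Pattern RULE =
            Pattern.compile("(?=.*\\d)(?=.*[A-Za-z])[A-Za-z0-9]{7,}");

    static boolean isValid(String password) {
        // matches() requires the whole string to fit, so no ^ or $ anchors are needed.
        return RULE.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc1234"));  // true
        System.out.println(isValid("abcdefg"));  // false: no digit
        System.out.println(isValid("abc_1234")); // false: underscore not allowed
    }
}
```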

It's possible to do this in a single regexp, but I wouldn't as it'll be hard to maintain.
I would just do:
if (pass.matches("[a-zA-Z0-9]{7,}") &&
pass.matches(".*[a-zA-Z].*") &&
pass.matches(".*\\d.*"))
{
// password is OK
}
It then becomes obvious how to apply additional constraints to the password - they just get added on with additional && ... clauses.
NB: I've deliberately used [a-z] rather than \w because I'm unsure what happens to \w if you use it in alternate locales where other characters might be considered "letters".

I would add another regex to cover the 3rd criterion (you don't have to nail them all in one regex, but may want to combine them). I would go with something like ^(?=.*\d)(?=.*[a-zA-Z])
taken from here-
http://www.mkyong.com/regular-expressions/10-java-regular-expression-examples-you-should-know/

Repeat pattern in RegEx

I've got string parts that match the following pattern.
abcd|(|a|ab|abc)e(fghi|(|f|fg|fgh)jklmn)
But the problem is that my whole string is a repeated combination of parts like the ones above, and the whole string must contain more than 14 such parts.
Can anyone help me improve the regex above to get the wanted format?
Thanks
Update
Input examples:
Matched string parts : abcd, abefgjkln, efjkln, ejkln
But whole string is : abcdabefgjklnefjklnejkln (Combination of above 4 parts)
There must be more than 15 parts in whole string. Above one have only 4 parts. So, it's wrong.

This will try to match your "parts" at least 15 times in a string.
boolean foundMatch = false;
try {
foundMatch = subjectString.matches("(?:(?:ab(?:cd|efgjkln))|(?:(?:ef?jkln))){15,}");
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
If there are at least 15 repetitions of any of the above parts foundMatch will be true, else it will remain false.
Breakdown :
"(?:" + // Match the regular expression below
// Match either the regular expression below (attempting the next alternative only if this one fails)
"(?:" + // Match the regular expression below
"ab" + // Match the characters “ab” literally
"(?:" + // Match the regular expression below
// Match either the regular expression below (attempting the next alternative only if this one fails)
"cd" + // Match the characters “cd” literally
"|" + // Or match regular expression number 2 below (the entire group fails if this one fails to match)
"efgjkln" + // Match the characters “efgjkln” literally
")" +
")" +
"|" + // Or match regular expression number 2 below (the entire group fails if this one fails to match)
"(?:" + // Match the regular expression below
"(?:" + // Match the regular expression below
"e" + // Match the character “e” literally
"f" + // Match the character “f” literally
"?" + // Between zero and one times, as many times as possible, giving back as needed (greedy)
"jkln" + // Match the characters “jkln” literally
")" +
")" +
"){15,}" // Between 15 and unlimited times, as many times as possible, giving back as needed (greedy)

What about this:
(?:a(?:b(?:c(?:d)?)?)?ef(?:g(?:h(?:i)?)?)?jklmn){15,}
Explanation: you create a non-capturing group (with (?: ... )), and say that this should be repeated >=15 times, hence the curly braces in the end.
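As a rough sketch of the "non-capturing group plus {N,}" idea, the check below uses the four sample parts from the question (abcd, abefgjkln, efjkln, ejkln) and takes the minimum count as a parameter. The class and method names are hypothetical, and the alternation covers only those sample parts, not every string the original pattern allows.

```java
import java.util.regex.Pattern;

final class RepeatedParts {
    // True if the whole input consists of at least minParts of the sample parts.
    static boolean hasAtLeast(String input, int minParts) {
        String regex = "(?:ab(?:cd|efgjkln)|ef?jkln){" + minParts + ",}";
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String four = "abcdabefgjklnefjklnejkln";       // the question's 4-part example
        System.out.println(hasAtLeast(four, 4));        // true
        System.out.println(hasAtLeast(four, 15));       // false: only 4 parts
    }
}
```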

First, it seems that your pattern can be simplified. Pattern a is really a prefix of ab, which is a prefix of abc, so wherever abc matches, a matches too. Think about this and change your pattern appropriately; right now it is probably not what you really want.
Second, to repeat part of a pattern use {N}, e.g. (abc){5} means "abc repeated five times" (without the parentheses, abc{5} repeats only the c). You can also use {3,}, {0,5}, {3,5}, which mean repeat>=3, repeat<=5, 3<=repeat<=5. Java needs the explicit lower bound, so write {0,5} rather than {,5}.
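One detail worth checking in code: a quantifier binds only to the element directly in front of it, so a multi-character unit has to be grouped before it can be repeated. A small sketch, with an invented helper name (repeats):

```java
import java.util.regex.Pattern;

final class QuantifierScope {
    // True if the input is exactly `times` copies of `unit` in a row.
    static boolean repeats(String unit, int times, String input) {
        // Pattern.quote treats the unit as literal text; the group makes {n} apply to all of it.
        String regex = "(?:" + Pattern.quote(unit) + "){" + times + "}";
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(repeats("abc", 5, "abcabcabcabcabc")); // true
        System.out.println("abcabcabcabcabc".matches("abc{5}"));  // false: {5} applies to c only
        System.out.println("abccccc".matches("abc{5}"));          // true
    }
}
```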
