replace string between xml elements with variable name spaces - java

Using Java, String and replaceAll I have to replace values of elements that may come with different name spaces:
from
<tns:p>to be replaced</tns:p>
<sss:p>to be replaced</sss:p>
to
<tns:p>replaced</tns:p>
<sss:p>replaced</sss:p>
Could you please, help to find regular expression for this replace?
P.S. The elements may appear more than once in a given string:
<tns:p>to be replaced</tns:p>
<tns:w>not to be replaced</tns:w>
<tns:p>to be replaced</tns:p>
I have a problem with variable name spaces in front of elements.
Without them I'd do like this:
str.replaceAll("(?<=<p>)(.*?)(?=</p)", "replacement")

The problem is that look-behinds can't have variable lengths, but if your input is well formed (that is tags are closed with matching tags), and you text to be replaced isn't a CDATA element that itself contains the closing tag (seems unlikely), this will work:
str = str.replaceAll("(?<=[:<]p>)[^<]*(?=</(\\w+:)?p>)", "replacement");
This regex makes the replacement whether or not there's a namespace.
Here's some test code:
String str = "<p>to be replaced</p><tns:p>to be replaced</tns:p><tns:w>not to be replaced</tns:w><tns:p>to be replaced</tns:p>";
str = str.replaceAll("(?<=[:<]p>)[^<]*(?=</(\\w+:)?p>)", "replacement");
System.out.println(str);
Output:
replacementreplacementnot to be replacedreplacement
If you input is not well formed and simple, ie the closing tag namespace may not be the same, you can do it by capturing the namespace, using a back-reference to assert it's the same in the closing tag, and putting it back in the replacement:
str = str.replaceAll("(<(\\w+:)?p>)[^<]*(?=</(\\2)p>)", "$1replacement");
The namespace is still optional, but now the namespace in the closing tag must match that of the opening tag.

Lookbehinds in java regex don't support repitive operators, So unfortunatly this is not possible with just one String#replaceAll(String, String)

Related

Escaping regex variable [duplicate]

Does Java have a built-in way to escape arbitrary text so that it can be included in a regular expression? For example, if my users enter "$5", I'd like to match that exactly rather than a "5" after the end of input.
Since Java 1.5, yes:
Pattern.quote("$5");
Difference between Pattern.quote and Matcher.quoteReplacement was not clear to me before I saw following example
s.replaceFirst(Pattern.quote("text to replace"),
Matcher.quoteReplacement("replacement text"));
It may be too late to respond, but you can also use Pattern.LITERAL, which would ignore all special characters while formatting:
Pattern.compile(textToFormat, Pattern.LITERAL);
I think what you're after is \Q$5\E. Also see Pattern.quote(s) introduced in Java5.
See Pattern javadoc for details.
First off, if
you use replaceAll()
you DON'T use Matcher.quoteReplacement()
the text to be substituted in includes a $1
it won't put a 1 at the end. It will look at the search regex for the first matching group and sub THAT in. That's what $1, $2 or $3 means in the replacement text: matching groups from the search pattern.
I frequently plug long strings of text into .properties files, then generate email subjects and bodies from those. Indeed, this appears to be the default way to do i18n in Spring Framework. I put XML tags, as placeholders, into the strings and I use replaceAll() to replace the XML tags with the values at runtime.
I ran into an issue where a user input a dollars-and-cents figure, with a dollar sign. replaceAll() choked on it, with the following showing up in a stracktrace:
java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:374)
at java.util.regex.Matcher.appendReplacement(Matcher.java:748)
at java.util.regex.Matcher.replaceAll(Matcher.java:823)
at java.lang.String.replaceAll(String.java:2201)
In this case, the user had entered "$3" somewhere in their input and replaceAll() went looking in the search regex for the third matching group, didn't find one, and puked.
Given:
// "msg" is a string from a .properties file, containing "<userInput />" among other tags
// "userInput" is a String containing the user's input
replacing
msg = msg.replaceAll("<userInput \\/>", userInput);
with
msg = msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));
solved the problem. The user could put in any kind of characters, including dollar signs, without issue. It behaved exactly the way you would expect.
To have protected pattern you may replace all symbols with "\\\\", except digits and letters. And after that you can put in that protected pattern your special symbols to make this pattern working not like stupid quoted text, but really like a patten, but your own. Without user special symbols.
public class Test {
public static void main(String[] args) {
String str = "y z (111)";
String p1 = "x x (111)";
String p2 = ".* .* \\(111\\)";
p1 = escapeRE(p1);
p1 = p1.replace("x", ".*");
System.out.println( p1 + "-->" + str.matches(p1) );
//.*\ .*\ \(111\)-->true
System.out.println( p2 + "-->" + str.matches(p2) );
//.* .* \(111\)-->true
}
public static String escapeRE(String str) {
//Pattern escaper = Pattern.compile("([^a-zA-z0-9])");
//return escaper.matcher(str).replaceAll("\\\\$1");
return str.replaceAll("([^a-zA-Z0-9])", "\\\\$1");
}
}
Pattern.quote("blabla") works nicely.
The Pattern.quote() works nicely. It encloses the sentence with the characters "\Q" and "\E", and if it does escape "\Q" and "\E".
However, if you need to do a real regular expression escaping(or custom escaping), you can use this code:
String someText = "Some/s/wText*/,**";
System.out.println(someText.replaceAll("[-\\[\\]{}()*+?.,\\\\\\\\^$|#\\\\s]", "\\\\$0"));
This method returns: Some/\s/wText*/\,**
Code for example and tests:
String someText = "Some\\E/s/wText*/,**";
System.out.println("Pattern.quote: "+ Pattern.quote(someText));
System.out.println("Full escape: "+someText.replaceAll("[-\\[\\]{}()*+?.,\\\\\\\\^$|#\\\\s]", "\\\\$0"));
^(Negation) symbol is used to match something that is not in the character group.
This is the link to Regular Expressions
Here is the image info about negation:

Replace and modify String using regex in java

I have a part of HTML from a website in the below String format:
srcset=" /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#200w.jpg?20170808 200w, /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#338w.jpg?20170808 338w, /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#445w.jpg?20170808 445w, tesla_theme/assets/img/homepage/mobile/homepage-models--touch#542w.jpg?20170808 542w, /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#750w.jpg?20170808 750w"
I want to add http://tesla.com in front of all the urls in the srcset element like http://tesla_theme/assets/img/homepage/mobile/homepage-models--touch#750w.jpg?20170808 750w
I believe this could be done using regex, but I am not sure.
How do I do this using Java if I have multiple srcset elements in a html string variable, and I want to replace all of the srcset url.'s and add the server url in front?
Note: The /tesla_theme will not be consistent, so I cannot use replaceAll, instead, i will have to use regex.
You can simply use String Class replace method as below, It will replace all "/_tesla" in the given String. No special regex required unless you have a kind of pattern instead of "/tesla"
String srcset=" /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#200w.jpg?20170808 200w, /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#338w.jpg?20170808 338w, /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#445w.jpg?20170808 445w, tesla_theme/assets/img/homepage/mobile/homepage-models--touch#542w.jpg?20170808 542w, /tesla_theme/assets/img/homepage/mobile/homepage-models--touch#750w.jpg?20170808 750w";
String requiredSrcSet = srcset.replace("/tesla_", "http://tesla_");

Modifying xml via regex in java

I have the following XML String:
<asd1:content></asd1:content>
The namespace prefix asd1 could be different at different places in the XML file.
I want to modify it to :
<asd1:content>*</asd1:content>
I am trying to do it via regex as follows:
myString.replaceAll("<.*:content></.*:content>","replacement text");
The problem is that I don,t want to lose the namespace prefix. What should I do?
Please note that you've 2 typos:
cotent instead of content
replaceAlll instead of replaceAll
If you still need a regex, you can use:
String resultString = subjectString.replaceAll("(?ism)<(.*?):content></(.*?)\\.content>", "<$1:content>*</$2.content>");

Unable to split a string

I have a string
Mr praneel PIDIKITI
When I use this regular expression
String[] nameParts = name.split("\\s+");
instead of getting three parts I am only getting two, Mr and Praneel PIDIKITI.
I am unable to split the second string. Does anyone know what could be the problem?
I even used split(" ");.
The problem is I used replaceAll("\\<.*?>", " ").trim(); to convert html into this string and then I am using name.split("\\s+"); to get the name value.
I think it must be something other than space (some special character).
Your code should work. I suspect your input. There could be a non printable junk character between Praneel and PIDIKITI. For example,
String name = "Mr praneel" + (char)1 +"PIDIKITI";
String[] nameParts = name.split("\\s+");
for(String s : nameParts)
System.out.println(s);
Are you sure that there is no junk character between Praneel and PIDIKITI?
Remove non printable characters like this:
// remove non printable characters excluding white space characters
name = name.replaceAll("[^\\p{Print}\\s]","");
If you're parsing HTML, may I recommend JSoup? Its a good HTML parser for java

How to escape text for regular expression in Java?

Does Java have a built-in way to escape arbitrary text so that it can be included in a regular expression? For example, if my users enter "$5", I'd like to match that exactly rather than a "5" after the end of input.
Since Java 1.5, yes:
Pattern.quote("$5");
Difference between Pattern.quote and Matcher.quoteReplacement was not clear to me before I saw following example
s.replaceFirst(Pattern.quote("text to replace"),
Matcher.quoteReplacement("replacement text"));
It may be too late to respond, but you can also use Pattern.LITERAL, which would ignore all special characters while formatting:
Pattern.compile(textToFormat, Pattern.LITERAL);
I think what you're after is \Q$5\E. Also see Pattern.quote(s) introduced in Java5.
See Pattern javadoc for details.
First off, if
you use replaceAll()
you DON'T use Matcher.quoteReplacement()
the text to be substituted in includes a $1
it won't put a 1 at the end. It will look at the search regex for the first matching group and sub THAT in. That's what $1, $2 or $3 means in the replacement text: matching groups from the search pattern.
I frequently plug long strings of text into .properties files, then generate email subjects and bodies from those. Indeed, this appears to be the default way to do i18n in Spring Framework. I put XML tags, as placeholders, into the strings and I use replaceAll() to replace the XML tags with the values at runtime.
I ran into an issue where a user input a dollars-and-cents figure, with a dollar sign. replaceAll() choked on it, with the following showing up in a stracktrace:
java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:374)
at java.util.regex.Matcher.appendReplacement(Matcher.java:748)
at java.util.regex.Matcher.replaceAll(Matcher.java:823)
at java.lang.String.replaceAll(String.java:2201)
In this case, the user had entered "$3" somewhere in their input and replaceAll() went looking in the search regex for the third matching group, didn't find one, and puked.
Given:
// "msg" is a string from a .properties file, containing "<userInput />" among other tags
// "userInput" is a String containing the user's input
replacing
msg = msg.replaceAll("<userInput \\/>", userInput);
with
msg = msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));
solved the problem. The user could put in any kind of characters, including dollar signs, without issue. It behaved exactly the way you would expect.
To have protected pattern you may replace all symbols with "\\\\", except digits and letters. And after that you can put in that protected pattern your special symbols to make this pattern working not like stupid quoted text, but really like a patten, but your own. Without user special symbols.
public class Test {
public static void main(String[] args) {
String str = "y z (111)";
String p1 = "x x (111)";
String p2 = ".* .* \\(111\\)";
p1 = escapeRE(p1);
p1 = p1.replace("x", ".*");
System.out.println( p1 + "-->" + str.matches(p1) );
//.*\ .*\ \(111\)-->true
System.out.println( p2 + "-->" + str.matches(p2) );
//.* .* \(111\)-->true
}
public static String escapeRE(String str) {
//Pattern escaper = Pattern.compile("([^a-zA-z0-9])");
//return escaper.matcher(str).replaceAll("\\\\$1");
return str.replaceAll("([^a-zA-Z0-9])", "\\\\$1");
}
}
Pattern.quote("blabla") works nicely.
The Pattern.quote() works nicely. It encloses the sentence with the characters "\Q" and "\E", and if it does escape "\Q" and "\E".
However, if you need to do a real regular expression escaping(or custom escaping), you can use this code:
String someText = "Some/s/wText*/,**";
System.out.println(someText.replaceAll("[-\\[\\]{}()*+?.,\\\\\\\\^$|#\\\\s]", "\\\\$0"));
This method returns: Some/\s/wText*/\,**
Code for example and tests:
String someText = "Some\\E/s/wText*/,**";
System.out.println("Pattern.quote: "+ Pattern.quote(someText));
System.out.println("Full escape: "+someText.replaceAll("[-\\[\\]{}()*+?.,\\\\\\\\^$|#\\\\s]", "\\\\$0"));
^(Negation) symbol is used to match something that is not in the character group.
This is the link to Regular Expressions
Here is the image info about negation:

Categories