I want to replace every <img> tag in a string with a closed <img></img> pair. The string is actually an HTML document in which the img tags are generated by me and always look like this:
<img src="some_source.jpg" style="some style attributes and values">
The src value is user input, so it can be anything.
I wrote a regular expression. I'm not sure it's correct because it's my first time using one, but it worked when I tested it. The problem is that I don't know how to keep the content of the src.
/<img\ssrc=".+?"\sstyle=".+?">/g
But I'm having difficulty replacing the tags in the string, and all I've got is this:
// Java patterns take no /.../g delimiters; calling find() in a loop already visits every match
Pattern p = Pattern.compile("<img\\ssrc=\".+?\"\\sstyle=\".+?\">");
Matcher m = p.matcher(str);
List<String> imgStrArr = new ArrayList<String>();
while (m.find()) {
    imgStrArr.add(m.group(0));
}
Matcher m2 = p.matcher(str);
You can use the following regex to match:
(<img[^>]+>)
And replace with $1</img>
Code:
str = str.replaceAll("(<img[^>]+>)", "$1</img>");
Edit: Following @MarcusMüller's advice, you can make the tag self-closing instead:
Regex: (<img[^>]+)>
Replace with $1/>
Code:
str = str.replaceAll("(<img[^>]+)>", "$1/>");
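Both replacements above can be tried on a tag in the asker's format with a small self-contained sketch. The class name ImgTagDemo and the sample src/style values are made up for illustration.

```java
public class ImgTagDemo {

    // Append an explicit closing tag: <img ...> becomes <img ...></img>
    static String addClosingTag(String html) {
        return html.replaceAll("(<img[^>]+>)", "$1</img>");
    }

    // Make the tag self-closing instead: <img ...> becomes <img .../>
    static String makeSelfClosing(String html) {
        return html.replaceAll("(<img[^>]+)>", "$1/>");
    }

    public static void main(String[] args) {
        String html = "<p>Hi</p><img src=\"a.jpg\" style=\"width:10px\"><p>Bye</p>";
        System.out.println(addClosingTag(html));   // <p>Hi</p><img src="a.jpg" style="width:10px"></img><p>Bye</p>
        System.out.println(makeSelfClosing(html)); // <p>Hi</p><img src="a.jpg" style="width:10px"/><p>Bye</p>
    }
}
```

Both patterns stop at the first >, so they assume the user-supplied src never contains a raw > character; if it could, that is worth checking before relying on them.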
You don't need the Pattern and Matcher classes; you can use String.replaceAll directly, like this:
str = str.replaceAll("(<img.*?>)", "$1</img>");
This is my problem:
String pattern1 = "<pre.*?>(.+?)</pre>";
Matcher m = Pattern.compile(pattern1).matcher(html);
if(m.find()) {
String temp = m.group(1);
System.out.println(temp);
}
temp does not retain line breaks; it flows as a single line. How can I keep the line breaks within temp?
You shouldn't parse HTML with regular expressions, but to fix this, use the DOTALL modifier:
String pattern1 = "(?s)<pre[^>]*>(.+?)</pre>"; // (?s) forces the . to span across newline sequences
Using jsoup: an HTML parser
It's well known that you shouldn't use regex to parse HTML content; you should use an HTML parser instead. Here is how to do it with jsoup:
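import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<p>lorem ipsum</p><pre>Hello World</pre><p>dolor sit amet</p>";
Document document = Jsoup.parse(html);
Elements pres = document.select("pre"); // CSS selector: every <pre> element
for (Element pre : pres) {
    System.out.println(pre.text());
}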
Pattern.DOTALL: the single-line compile flag
However, if you still want to use regex, bear in mind that the dot is a wildcard that doesn't match line terminators such as \n unless you explicitly ask it to. You can achieve that in several ways, for example with Pattern.DOTALL:
String pattern1 = "<pre.*?>(.+?)</pre>";
Matcher m = Pattern.compile(pattern1, Pattern.DOTALL).matcher(html);
if(m.find()) {
String temp = m.group(1);
System.out.println(temp);
}
Inline single-line flag:
Or use the s flag inline in the regex, like this:
String pattern1 = "(?s)<pre.*?>(.+?)</pre>";
Matcher m = Pattern.compile(pattern1).matcher(html);
if(m.find()) {
String temp = m.group(1);
System.out.println(temp);
}
Regex trick
Or you can use a regex trick that relies on complementary character classes such as [\s\S], [\d\D] or [\w\W], which match any character, newlines included:
String pattern1 = "<pre.*?>([\\s\\S]+?)</pre>";
Matcher m = Pattern.compile(pattern1).matcher(html);
if(m.find()) {
String temp = m.group(1);
System.out.println(temp);
}
But, as nhahtdh pointed out in a comment, this trick may hurt the regex engine's performance.
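The three regex variants above can be compared side by side in a small sketch; the class name, the firstPre helper and the sample HTML are illustrative only. It also shows that with a plain dot, find() fails outright on a <pre> body that spans several lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreContentDemo {

    // Returns the first <pre> body captured by the pattern, or null when nothing matches
    static String firstPre(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p>x</p><pre>line 1\nline 2</pre>";

        // Plain dot: (.+?) cannot cross the \n, so there is no match at all
        Pattern plain = Pattern.compile("<pre.*?>(.+?)</pre>");
        // Three equivalent ways to let the body span lines
        Pattern dotall = Pattern.compile("<pre.*?>(.+?)</pre>", Pattern.DOTALL);
        Pattern inline = Pattern.compile("(?s)<pre.*?>(.+?)</pre>");
        Pattern classTrick = Pattern.compile("<pre.*?>([\\s\\S]+?)</pre>");

        System.out.println(firstPre(plain, html)); // null
        System.out.println(firstPre(dotall, html));
        System.out.println(firstPre(inline, html));
        System.out.println(firstPre(classTrick, html));
    }
}
```

The last three calls each print the two lines of the <pre> body with the line break preserved.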
I have a string that contains text inside brackets. I need to extract the text inside "[[ ]]" brackets using Java. There are multiple occurrences of "[[ ]]", and I would like to extract the text from all of them.
For example:
String text = "[[test]] {{test1}} [[test2]]";
Expected Output:
test
test2
Can anyone help please?
It's a simple regular expression match:
Pattern p = Pattern.compile("\\[\\[.*?\\]\\]");
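Use a Matcher and call find() in a loop to get each match; lookingAt() only attempts a match at the start of the input, so it cannot reach later occurrences.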
To remove the "[[" and "]]" after that, just add a String#replace().
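A sketch of the match-then-strip approach described above, using find() in a loop and String#replace to drop the brackets; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleBracketDemo {

    // Match each [[...]] block, then strip the brackets with String#replace
    static List<String> extract(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile("\\[\\[.*?\\]\\]").matcher(text);
        while (m.find()) { // each call to find() moves on to the next occurrence
            results.add(m.group().replace("[[", "").replace("]]", ""));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(extract("[[test]] {{test1}} [[test2]]")); // [test, test2]
    }
}
```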
You can use this:
String text = "[[test]] {{test1}} [[test2]]";
Pattern p = Pattern.compile("\\[\\[(.*?)]]", Pattern.DOTALL);
Matcher m = p.matcher(text);
while (m.find()) {
    System.out.println(m.group(1)); // group 1 holds the text between [[ and ]]
}
I want to replace all links on an HTML page with a defined URL that carries the original link in the query string.
Here is an example:
"http://www.ex.com abc http://www.anotherex.com"
Should be replaced by:
"http://www.newex.com?old=http://www.ex.com abc http://www.newex.com?old=http://www.anotherex.com"
I thought about using replaceAll, but I don't know exactly how to reuse the matched text in the replacement.
Something like:
String processed = yourString.replaceAll([ugly url regexp],"http://www.newex.com?old=$0")
$0 is a reference to the entire match (group 0) of the regex; see the documentation for Matcher.appendReplacement.
For a suitable URL regex, there are plenty of published ones to pick from.
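A runnable sketch of the $0 approach on the example from the question. The URL pattern here is a deliberately simple assumption ("http://" or "https://" up to the next whitespace), not a robust URL regex; inside real HTML attributes it would also swallow a closing quote, and the old URL is not percent-encoded.

```java
public class LinkRewriteDemo {

    // Hypothetical, simplistic URL pattern used only for illustration
    static final String URL_REGEX = "https?://\\S+";

    static String rewrite(String input) {
        // $0 in the replacement stands for the whole match
        return input.replaceAll(URL_REGEX, "http://www.newex.com?old=$0");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("http://www.ex.com abc http://www.anotherex.com"));
        // http://www.newex.com?old=http://www.ex.com abc http://www.newex.com?old=http://www.anotherex.com
    }
}
```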
I would go about this by doing something like:
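List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("regex here")
        .matcher(StringHere);
while (m.find()) {
    allMatches.add(m.group());
}
// start from the original text and keep applying each replacement to the result
String finalString = OriginalString;
// a LinkedHashSet skips duplicate matches, which would otherwise get prefixed twice
for (String myMatch : new LinkedHashSet<String>(allMatches))
{
    finalString = finalString.replace(myMatch, myNewString + myMatch);
}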
I didn't test any of this, but it should give you an idea of how to approach it
I have the following regex snippet to parse the URL of an <a href> attribute:
(?<=href=)[^\"']+(?=(\"|'))?>
What I'm trying to do is replace part of the following snippet with data I populate at runtime:
<a href=$tracking_url$&landing_url=google.com>
<img src="irrelevant" />
</a>
When I try replaceAll() as follows, it fails:
String fragment = "<a href=$click_tracking_url$&landing_url=google.com><img src=\"10.gif\" /></a>";
String processedFragment = fragment.replaceAll(AHREF_REGEX, ahrefurl);
The error is:
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:724)
at java.util.regex.Matcher.replaceAll(Matcher.java:824)
at java.lang.String.replaceAll(String.java:1572)
How can I fix the regex to match <a href=$click_tracking_url$? How can I escape $ in a regex?
Well, I tried your regex and didn't get the error. However, it didn't replace just $click_tracking_url$ but all the text up to the end of the element.
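The "Illegal group reference" error most likely doesn't come from the regex itself: the stack trace ends in Matcher.appendReplacement, which parses the replacement string (ahrefurl), and there a $ must be followed by a group number or {name}. So the exception would appear only when ahrefurl contains such a $, which could explain why running the regex alone doesn't reproduce it. A minimal sketch with a made-up URL and class name; Matcher.quoteReplacement escapes the $ so it is inserted literally.

```java
import java.util.regex.Matcher;

public class GroupReferenceDemo {

    static String fill(String fragment, String url, boolean quote) {
        // Matcher.quoteReplacement escapes $ and \ so they are inserted literally
        String replacement = quote ? Matcher.quoteReplacement(url) : url;
        return fragment.replaceAll("\\$click_tracking_url\\$", replacement);
    }

    public static void main(String[] args) {
        String fragment = "<a href=$click_tracking_url$&landing_url=google.com>";
        String url = "http://track.example.com/c?id=$abc"; // hypothetical URL containing a literal $
        try {
            fill(fragment, url, false);
        } catch (IllegalArgumentException e) {
            System.out.println("Unquoted: " + e.getMessage()); // Illegal group reference
        }
        System.out.println("Quoted: " + fill(fragment, url, true));
    }
}
```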
In case you need to dynamically replace placeholders, try something like this:
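Map<String, String> placeholders = new HashMap<String, String>();
// init your placeholders somehow...
placeholders.put("click_tracking_url", "something");
String fragment = "<a href=$click_tracking_url$&landing_url=google.com><img src=\"10.gif\" /></a>";
Matcher m = Pattern.compile("\\$(\\w+)\\$").matcher(fragment);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // look up each placeholder; quoteReplacement keeps a $ or \ in the value literal
    m.appendReplacement(sb, Matcher.quoteReplacement(placeholders.get(m.group(1))));
}
m.appendTail(sb);
String processedFragment = sb.toString();
System.out.println(processedFragment);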
Here, give this a shot. Replace this...
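(?<=href=)(['"]?)\$([^$>]+)\$
...with $1 followed by your URL (group 1 keeps the optional opening quote).
In Java, don't forget to escape your backslashes and the double quote, and pass the URL through Matcher.quoteReplacement() so that a $ in it isn't read as a group reference:
String processedFragment = fragment.replaceAll("(?<=href=)(['\"]?)\\$([^$>]+)\\$", "$1" + Matcher.quoteReplacement(url));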
If I understand your question correctly, you're trying to replace patterns of the form $someIdentifier$ with a value that your application looks up using someIdentifier.
It seems you want the pattern \$([^\$]+)\$: find each occurrence in the string, grab the value of group one (1), look up the value, and then replace all occurrences of that specific sequence with the value you looked up.
Map<String, String> tokenMap = new HashMap<String, String>();
tokenMap.put("string", "value"); // hypothetical token value
String someString = "some$string$withatoken";
Pattern tokenPattern = Pattern.compile("\\$([^\\$]+)\\$");
Matcher tokenMatcher = tokenPattern.matcher(someString);
// Use find(), not matches(): matches() compares the entire string against the pattern,
// basically enforcing ^{pattern}$.
while (tokenMatcher.find()) {
    String tokenName = tokenMatcher.group(1);
    String tokenValue = tokenMap.get(tokenName); // look up the value of the token name
    // String.replace is literal, so a $ in the token or the value needs no escaping
    someString = someString.replace("$" + tokenName + "$", tokenValue);
    // reset the matcher onto the new value of someString
    tokenMatcher.reset(someString);
}
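If Java 9 or later is available, the find/replace/reset loop above could be collapsed into a single pass with Matcher.replaceAll(Function<MatchResult, String>), which also avoids rescanning the string after every replacement. A sketch under that assumption; the class name, the fill helper and the sample token value are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenReplaceDemo {

    private static final Pattern TOKEN = Pattern.compile("\\$([^$]+)\\$");

    // Replaces every $name$ token in one pass; unknown tokens are left as they are.
    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String fill(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        return m.replaceAll(match -> {
            String value = values.getOrDefault(match.group(1), match.group());
            // The returned text is parsed like a replacement string, so escape $ and \
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("string", "[value]"); // hypothetical token value
        System.out.println(fill("some$string$withatoken", values)); // some[value]withatoken
    }
}
```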
I'd like to get a portion of a matched string coming from a Matcher, like this:
Pattern pat = Pattern.compile("a.*l.*z");
Matcher match = pat.matcher("abcdlmnoz"); // I'd want to get bcd AND mno
ArrayList<String> values = match.magic(); //here is where your magic happens =)
ArrayList<String> is only for this example; I'd be happy to receive either a List or individual String items. The best would be what .htaccess files and RewriteRules do:
RewriteRule (.*)/path?(.*) $1/$2/modified-path/
Getting what those (.*) capture as $-style arguments would be as good as an ArrayList or accessing the Strings separately. I've been looking for something in the Java Matcher API, but I didn't see anything useful.
Thanks in advance, guys.
You can capture groups in a regex match using parentheses, (...):
Pattern pat = Pattern.compile("a(.*)l(.*)z");
Matcher match = pat.matcher("abcdlmnoz");
boolean b = match.matches(); // don't forget to attempt the match
Then use match.group(n) to get the n-th captured portion; here group(1) is "bcd" and group(2) is "mno". The groups are stored in the Matcher object.
See also: Capturing Groups (Oracle).
Look at the matcher's "group" method and peruse the doc you linked to for references to groups, which is what the parentheses in the regex do :)
...
String testStr = "abcdlmnoz";
String myRE = "a(.*)l(.*)z";
Pattern myRECompiled = Pattern.compile(myRE, Pattern.DOTALL);
Matcher myMatcher = myRECompiled.matcher(testStr);
if (myMatcher.find()) {
    System.out.println(myMatcher.group(1)); // bcd
    System.out.println(myMatcher.group(2)); // mno
}
...
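Putting both answers together, the match.magic() wished for in the question can be approximated by looping from 1 to groupCount(). A sketch; captureGroups is a made-up helper name, not part of the Matcher API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupListDemo {

    // Hypothetical helper playing the role of match.magic():
    // returns groups 1..groupCount() if the whole input matches, else an empty list.
    // A group that did not take part in the match is added as null.
    static List<String> captureGroups(Pattern pattern, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Pattern pat = Pattern.compile("a(.*)l(.*)z");
        System.out.println(captureGroups(pat, "abcdlmnoz")); // [bcd, mno]
    }
}
```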