How to append a string with dynamic data
I have an HTML string and want to add !important to the font-size declaration.
Conditions:
The font size is a dynamic value (36pt, 24pt, 14pt, and so on); it is not constant.
The span tags do not appear in any fixed order.
I have tried a regular expression with the replace method, but it's not working. Please help me.
Input String
String htmlData = "<span style=\"font-weight: bold;\">First</span><span style=\"font-size: 36pt;\">Second</span><span style=\"font-family: Arial;\">third</span>";
Output
!important added in font-size element
String htmlData = "<span style=\"font-weight: bold;\">First</span><span style=\"font-size: 36pt;!important\">Second</span><span style=\"font-family: Arial;\">third</span>";
It depends on what the input string ends up being. Your list of 'conditions' is not adequate.
Any HTML, really
Okay, then, you need a heck of a lot more information about which font size style needs to be changed. All of them, anywhere in the document? As in, 'take the input HTML, find any and all spans anywhere with a style attribute, then parse those style attributes to find any font-size CSS keys, and add !important to those'?
Then the answer is very, very tricky. HTML is not regular and thus cannot be parsed with regular expressions. You'd need to add a third party dep like JSoup, use that to parse this HTML and change it.
Furthermore, CSS parsing is not exactly trivial either, so on top of this you'd need a CSS parser.
Really, you need to go back to the drawing board. You have systems in place that ended up in 'I need to parse any HTML for CSS in style attributes and modify those', and the solution to that problem lies further up the chain. Use classes instead of style attributes, or find the place that makes these style attributes and fix it there.
It's always this exact form
This will fail horribly unless your input is exactly like this:
a span with a style and nothing else, with plain text content.
Optionally, any number of those spans, but not nested.
If style is present, then via style=, and all style declarations necessarily end in a semicolon, even though the final semicolon is optional in CSS.
The font size is always specified using the exact spelling font-size.
Well, as long as you take great care to write down someplace that the HTML you input into this algorithm is restricted to remain that simple, you could in theory do this with regular expressions, but be aware that any fancying up of this HTML is going to break this, and the only true answer, capable of dealing with future changes, remains JSoup!
input.replaceAll("font-size\\s*:(.*?);", "font-size:$1 !important;");
will do the job.
NB: If font-size appears as text inside the spans, that's bad. You'd have to extend your regex to look for e.g. the enclosing style=" too, but quoted strings are hard to parse with regexes as well (specifically, trying to dance around backslash-escaped quotes is tricky). This all goes back to: You have found yourself in a nasty place; you really should fix this elsewhere in the chain.
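To illustrate the point about text content, here is a minimal sketch that only rewrites font-size inside double-quoted style attributes. The class and method names are made up for the example, and it still assumes the simple input shape listed above (style="..." with no escaped or single quotes), so it is no substitute for a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FontSizeImportant {
    // Matches a double-quoted style attribute; group 1 is the attribute value.
    private static final Pattern STYLE_ATTR = Pattern.compile("style=\"([^\"]*)\"");

    // Hypothetical helper: adds !important to font-size declarations,
    // but only inside style attributes, never in the text between tags.
    static String addImportant(String html) {
        Matcher m = STYLE_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // [^;!] keeps a declaration that already contains '!' from matching again.
            String css = m.group(1).replaceAll(
                    "font-size\\s*:\\s*([^;!]*?)\\s*;", "font-size: $1 !important;");
            m.appendReplacement(out, Matcher.quoteReplacement("style=\"" + css + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<span style=\"font-weight: bold;\">First</span>"
                + "<span style=\"font-size: 36pt;\">font-size: 12pt; stays</span>";
        System.out.println(addImportant(html));
    }
}
```

The text "font-size: 12pt;" between the tags is left alone because the inner replacement only ever sees the attribute value.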
You can try using replaceAll:
String html = "<span style=\"font-weight: bold;\">First</span><span style=\"font-size: 36pt;\">Second</span><span style=\"font-family: Arial;\">third</span>";
String replaced = html.replaceAll("font-size: ([0-9]*)pt;", "font-size: $1pt !important;");
System.out.println(replaced);
I think you can build a string such as
"font-size: "+value+"pt;" and then find it in your HTML string and replace it with "font-size: "+value+"pt;!important" by using htmlData.replaceFirst(string1, string2);
It is better to parse your string as HTML and set the !important flag there (this approach lets you change anything you want in the HTML and CSS).
But if you only need this one specific change, adding "!important", then you can choose the easy way with:
input.replaceAll
Related
I'm using StringBuilder in Java to build an HTML page.
I want to know how to escape all quotes (") without placing a "\" every time.
For example, every time when I append a string like this :
StringBuilder a = new StringBuilder();
a.append("<div id = \"Name\" ...>");
I want to write directly :
a.append("<div id = "Name" ..>");
Thanks.
Short answer: There is no way around this in Java
Long answer: Java does not have multiple ways to enclose Strings. You always do it with double quotes, so if you want to have double quotes in your String you have to escape them.
But if they really annoy you, you can apply some trickery:
put your Strings in a text file and read them from there.
use a different character instead of the quote character and use replace to put in the proper quotes. Of course your replacement character must not appear anywhere else in the string.
Write the code in question in a different programming language like Groovy, which has different ways to delimit Strings.
Since you seem to generate HTML: use a proper templating engine, which really is option 1 on steroids.
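A rough sketch of the substitute-character option, assuming a single quote is used as the stand-in. The helper name q is invented for the example. It only works if the stand-in never occurs as real content, so an apostrophe in the page text would be converted too.

```java
final class QuoteSubstitution {
    // Hypothetical helper: write the HTML with ' and swap in real double quotes.
    static String q(String s) {
        return s.replace('\'', '"');
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder();
        a.append(q("<div id='Name' class='box'>"));
        a.append("</div>");
        System.out.println(a); // <div id="Name" class="box"></div>
    }
}
```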
When building a HTML template, the easiest solution is to use a text file.
You can do this as
a simple text file where you replace() tags with code you want to alter
use a properties file for the sections of text to inline.
use a library which has a fluent API for generating HTML
use Velocity to perform the substitution for you.
use one of the other many web page formats like JSP.
However, there is no way to avoid escaping " in Java code. The only alternative is you use another character like ” (Alt-Graphic-B) which you replace at the end.
You can't, which is only one of the reasons it's a bad idea to fill a StringBuilder with HTML code by hand.
This exists in languages other than Java, but in Java it is not possible.
With CoffeeScript you can, for example:
html = """
<div id="Name" > ... </div>
"""
There's no proper way to do it, but you might be able to put a rarely used substitute character (a tilde or something) in your String and then call .replace() on it.
Ideally, you should be loading the data from a file if you want the raw string.
I need to strip accents and HTML accent codes from an HTML string. I have found a lot of code that does this, but none of it seems to work with the file I need to clean.
This file contains words like Postulación Ayudantías and also Gestión or Árbol
I found many snippets that use text.normalize and regexes to clean the string. They work well with short strings, but I'm using very long strings, and the same code doesn't work on those.
I am really lost here and I need help please!
This is the code I tried that didn't work:
Easy way to remove UTF-8 accents from a string? (return "?" for every accent in the String)
and I used regular expressions to remove the HTML accent codes, but neither is working:
string=string.replaceAll("&aacute;","a");
string=string.replaceAll("&eacute;","e");
string=string.replaceAll("&iacute;","i");
string=string.replaceAll("&oacute;","o");
string=string.replaceAll("&uacute;","u");
string=string.replaceAll("&ntilde;","n");
Edit: never mind, the replaceAll is working; I had written it wrong ("/&aacute; instead of "&aacute;)
Any help or ideas?
I think there are several options that would work. I would suggest that you first
use StringEscapeUtils.unescapeHtml4(String) to unescape your html entities (that is convert them to their normal Java "utf-8" form).
Then you could use an ASCIIFoldingFilter to filter to "ASCII" equivalents.
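If pulling in Lucene just for ASCIIFoldingFilter is too much, a standard-library alternative (not what this answer proposes, just a sketch) is java.text.Normalizer: decompose to NFD and drop the combining marks. The class name is made up, and the sample words are written with \u escapes so the result does not depend on the source file's encoding.

```java
import java.text.Normalizer;

final class AccentStripper {
    // Splits each accented letter into base letter + combining mark (NFD),
    // then removes the marks (Unicode category M).
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Postulación Ayudantías Gestión Árbol"
        String text = "Postulaci\u00f3n Ayudant\u00edas Gesti\u00f3n \u00c1rbol";
        System.out.println(stripAccents(text)); // Postulacion Ayudantias Gestion Arbol
    }
}
```

This runs on the already-decoded string, so HTML entities need to be unescaped first. Letters with no decomposition (for example ø or ß) pass through unchanged.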
You need to differentiate whether you're talking about a whole HTML document containing tags and so forth or just a string containing HTML encoded data.
If you're working with an entire HTML document, say, something returned by fetching a web page, then the solution is really more than could fit into a stack overflow answer, since you basically need an HTML parser to navigate the data.
However, if you're just dealing with a string that's HTML encoded, then you first need to decode it. There are lots of utilities to do so, such as the Apache Commons Lang library StringEscapeUtils class. See this question for an example.
Once you've decoded the string, you need to iterate over it character by character and replace anything that's unwanted. Your current method won't work for hex encoded items, and you're going to end up having to build a huge table to cover all the possible HTML entities.
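For the hex-encoded items specifically, a small regex pass can decode numeric character references without any lookup table. This is a sketch only, with a made-up class name. It handles &#225; and &#xE1; style references, not named entities such as &aacute;, which still need a library or a table.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDecoder {
    // Group 1: hex digits (&#xE1;), group 2: decimal digits (&#225;).
    private static final Pattern NUMERIC =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String s) {
        Matcher m = NUMERIC.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Gesti&#xF3;n &#193;rbol"));
    }
}
```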
I am facing a very difficult problem:
I have a number of HTML-formatted strings. They were generated by a Document-Element, where the text was edited as RTF and saved as HTML (to display it on a website).
The problem is that some RTF elements that are parsed to HTML seem to be unusable in HTML, which causes a crash. One of the characters disallowed in HTML is, for example, %0b
According to http://www.tutorialspoint.com/html/html_url_encoding.htm it has no function, or at least I can't figure out why it is needed (in fact, it can't even be copied).
My question is: is there a function out there (I have already searched) that can remove all non-HTML characters from such an rtf2html string?
I just need to remove them when the HTML is loaded, so that there aren't any display problems.
Use methods provided by Apache Commons Lang
import org.apache.commons.lang.StringEscapeUtils;
String afterDecoding = StringEscapeUtils.unescapeHtml(beforeDecoding);
Credit to: #jlordo
Or you can use replaceAll("%0b", "");
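A slightly broader variant of that replaceAll, as a sketch: %0b is the URL-encoded form of the vertical tab (U+000B), so the helper below removes both the literal %0b text and the raw control characters themselves. The class name and the exact set of characters to drop are assumptions. Tab, line feed and carriage return are kept.

```java
final class ControlCharCleaner {
    // Removes the literal text "%0b"/"%0B" and raw C0 control characters,
    // keeping tab (0x09), line feed (0x0A) and carriage return (0x0D).
    static String clean(String html) {
        return html
                .replaceAll("(?i)%0b", "")
                .replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
    }

    public static void main(String[] args) {
        String dirty = "<p>a" + (char) 0x0B + "b%0Bc</p>";
        System.out.println(clean(dirty)); // <p>abc</p>
    }
}
```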
Ok, so I know this question has been asked in different forms several times, but I am having trouble with specific syntax. I have a large string which contains html snippets. I need to find every link tag that does not already have a target= attribute (so that I can add one as needed).
^((?!target).)* will give me text leading up to 'target', and <a.+?>[\w\W]+?</a> will give me a link, but that's where I'm stuck. An example:
<a href="http://www.someSite.com">Link</a> (This should be a match)
<a href="http://www.someSite.com" target="...">Link</a> (this should not be a match).
Any suggestions? Using DOM or XPATH are not really options since this snippet is not well-formed html.
1. You are being wilfully evil by trying to parse HTML with Regexes. Don't.
2. That said, you are being extra evil by trying to do everything in one regexp. There is no need for that; it makes your code regex-engine-dependent, unreadable, and quite possibly slow. Instead, simply match tags and then check your first-stage hits again with the trivial regex /target=/. Of course, that character string might occur elsewhere in an HTML tag, but see (1)... you have already thrown good practice out of the window, so why not at least make things un-obfuscated so everyone can see what you're doing?
If you insist on doing it with Regex a pattern such as this should help...
<a(?![^>]*target=) [^>]*>.*?</a>
It's by no means 100% perfect. Technically speaking, a tag can contain a > in places other than the end, so it won't work for all HTML tags.
NB. I work with PHP, you may have to make slight syntax adjustments for Java.
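For Java, the adjustments are mostly string escaping. A sketch with an invented class name, using a slightly tightened form of the pattern above (whitespace is required before target, and the match is case-insensitive and spans newlines). The same caveats about a > inside attribute values apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinksWithoutTarget {
    // <a ...>...</a> where no target= attribute appears before the tag's closing '>'.
    private static final Pattern LINK = Pattern.compile(
            "<a(?![^>]*\\starget\\s*=)\\s[^>]*>.*?</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> find(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.example\">One</a> "
                + "<a href=\"http://b.example\" target=\"_blank\">Two</a>";
        System.out.println(find(html)); // [<a href="http://a.example">One</a>]
    }
}
```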
You could try a negative lookahead like this:
<a(?!.*?target.*?).*?>[\w\W]+?</a>
I didn't test this and spent about a minute writing it, but for your specific example if you can do it on the client-side, try this via the DOM:
var links = document.getElementsByTagName("a");
for (var linkIndex = 0; linkIndex < links.length; linkIndex++) {
    var link = links[linkIndex];
    if (link.href && !link.target) {
        link.target = "someTarget";
        // or link.setAttribute("target", "someTarget");
    }
}
Possible duplicate: RegEx matching HTML tags and extracting text
I need to get the text between HTML tags like <p></p> or whatever. My pattern is this:
Pattern pText = Pattern.compile(">([^>|^<]*?)<");
Does anyone know a better pattern? This one is not very useful. I need it to index the content of a web page.
Thanks
SO is about to descend on you. But let me be the first to say, don't use regular expressions to parse HTML. Here is a list of Java HTML Parsers. Look around until you see an API that suits your fancy and use that instead.
It looks like you are trying to use the | operator inside a negative set, which is neither working nor needed. Just specify the characters that you don't want to match:
Pattern pText = Pattern.compile(">([^<>]*?)<");
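A short usage sketch for that pattern, with made-up class and method names. It collects group 1 of every match and skips the empty hits that occur between adjacent tags. It will also happily return script and style content, which is one more reason to prefer a parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagTextExtractor {
    private static final Pattern P_TEXT = Pattern.compile(">([^<>]*?)<");

    static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = P_TEXT.matcher(html);
        while (m.find()) {
            String text = m.group(1).trim();
            if (!text.isEmpty()) { // skip the gaps between adjacent tags
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>Hello</p><p>World</p>")); // [Hello, World]
    }
}
```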
Don't use regular expressions when parsing HTML.
Use XPath instead (if your HTML is well formed). You can reference text nodes using the text() function very easily.
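A minimal sketch of the XPath route using only the JDK's javax.xml.xpath API. The class name and the //p/text() expression are just for illustration, and it only works when the input is well-formed XML (XHTML), which most real-world pages are not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTextExtractor {
    // Returns the text nodes that are direct children of <p> elements.
    static List<String> paragraphTexts(String xhtml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//p/text()",
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getNodeValue());
            }
            return texts;
        } catch (XPathExpressionException e) {
            // Thrown for a bad expression or input that is not well-formed.
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><p>Hello</p><p>World</p></body></html>";
        System.out.println(paragraphTexts(xhtml)); // [Hello, World]
    }
}
```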