Java equivalent of System.Xml.XmlNode.InnerXml


Is there a java equivalent of .NET's System.Xml.XmlNode.InnerXml?
I need to replace some words in an XML document.
I cannot use Java's org.w3c.dom.Node.setTextContent() because it removes the child nodes.
Thanks!
Source:
<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a Home Owners Agreement is that...</p>
</body>
Desired output:
<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a HOA is that...</p>
</body>
I only want text in <p> tags to be replaced. I tried the following:
void replaceText(String term, String replaceWith, org.w3c.dom.Node p) {
    // setTextContent replaces ALL children of p with a single text node, so <b> is lost
    p.setTextContent(p.getTextContent().replace(term, replaceWith));
}
The problem with the above code is that all the child nodes of p get lost.

You could have a look at JDOM.
Something like document.getRootElement().getChild("ELEMENT1").setText("replacement text");
You have a little bit of work to do in converting the document into a JDOM document, but there are adapters that make that fairly easy for you. Or, if the XML is in a file, you can just use a JDOM Builder class to create the DOM that you want to manipulate.

Okay, I figured out the solution.
The key is that you don't want to replace the text of the element node itself. The text is actually held in separate child nodes of type TEXT_NODE, and you can change those without touching the element children. I was able to accomplish what I needed with this code:
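import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Walks the tree and rewrites only TEXT_NODEs, so element children such as <b> survive
private static void replace(Node root) {
    if (root.getNodeType() == Node.TEXT_NODE) {
        root.setTextContent(root.getTextContent().replace("Home Owners Agreement", "HOA"));
    }
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        replace(children.item(i)); // recurse into every child
    }
}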
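The recursive replace rewrites every text node under the node it is given, so calling it on the document root would also change the <title>, which the desired output keeps intact. A minimal sketch that limits the replacement to text inside <p> elements, using only the JDK's DOM and Transformer APIs; it assumes the whole document is available as a String, and the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: replaces a term only in text that lives inside <p> elements.
final class ParagraphTextReplacer {
    static String replaceInParagraphs(String xml, String term, String replacement) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList paragraphs = doc.getElementsByTagName("p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            replaceInTextNodes(paragraphs.item(i), term, replacement);
        }
        // Serialize back to a string without the <?xml ...?> declaration
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Recursively rewrites text nodes; element children such as <b> stay in place.
    private static void replaceInTextNodes(Node node, String term, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(term, replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), term, replacement);
        }
    }
}
```

One limitation of this sketch: a term that is split across elements (for example Home <b>Owners</b> Agreement) or across adjacent text nodes will not be matched, because each text node is searched on its own.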

Related

Create a Java parser for html content similar to mailchimp one

I'd like to create a simple parser in Java that analyzes a string containing HTML and replaces custom tags and if/elseif/else statements, similar to Mailchimp.
Currently I simply replace my custom tags, for example *|NAME|* with the name of the recipient, *|AGE|* with the age of the recipient, and so on.
I'd like to add conditional statements to permit expressions like:
<html>
<head>....</head>
<body>
*|IF:GENDER=M|*
Hello Mr.<b>*|NAME|*</b>
*|ELSEIF:GENDER=F|*
Hello Mrs.<i>*|NAME|*</i>
*|ELSE:|*
Dear customer
*|END:IF|*
<p>Bla bla bla</p>
....
....
</body>
</html>
In short, a syntax very similar to Mailchimp's.
The user can write their own HTML and add custom syntax to customize the content based on data.
I think I can write code that works, but I'm looking for the best practice and a clean way to implement it.
It would be nice if the parser could handle nested if/elseif/else statements.
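One common approach is to split the template into text and tag tokens and then evaluate it with a small recursive-descent renderer, which handles nesting naturally. The sketch below is hypothetical: the class name MergeTagTemplate, its render method, and the condition syntax (only KEY=VALUE string equality or a bare KEY) are assumptions, not Mailchimp's actual grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical renderer for *|NAME|* merge tags with nested IF/ELSEIF/ELSE/END:IF blocks.
final class MergeTagTemplate {
    // Matches *|...|*; group 1 is the tag body, e.g. "NAME" or "IF:GENDER=M".
    private static final Pattern TAG = Pattern.compile("\\*\\|([^|]*)\\|\\*");

    private final List<String> tokens = new ArrayList<>();  // text chunks and tag bodies, in order
    private final List<Boolean> isTag = new ArrayList<>();  // parallel flags: true for tag bodies
    private final Map<String, String> data;
    private int pos;                                         // index of the next unread token

    private MergeTagTemplate(String template, Map<String, String> data) {
        this.data = data;
        Matcher m = TAG.matcher(template);
        int last = 0;
        while (m.find()) {
            add(template.substring(last, m.start()), false);
            add(m.group(1).trim(), true);
            last = m.end();
        }
        add(template.substring(last), false);
    }

    private void add(String token, boolean tag) {
        tokens.add(token);
        isTag.add(tag);
    }

    static String render(String template, Map<String, String> data) {
        MergeTagTemplate t = new MergeTagTemplate(template, data);
        StringBuilder out = new StringBuilder();
        t.renderBlock(out, true);
        if (t.pos < t.tokens.size()) {
            throw new IllegalArgumentException("Unexpected tag: " + t.tokens.get(t.pos));
        }
        return out.toString();
    }

    // Renders tokens until ELSEIF/ELSE/END:IF (left unconsumed for the enclosing IF) or end of input.
    // When emit is false the tokens are still consumed, but nothing is written (inactive branch).
    private void renderBlock(StringBuilder out, boolean emit) {
        while (pos < tokens.size()) {
            String tok = tokens.get(pos);
            if (!isTag.get(pos)) {
                if (emit) out.append(tok);
                pos++;
            } else if (tok.startsWith("ELSEIF:") || tok.startsWith("ELSE:") || tok.equals("END:IF")) {
                return;
            } else if (tok.startsWith("IF:")) {
                pos++;
                renderIf(tok.substring("IF:".length()), out, emit);
            } else {
                if (emit) out.append(data.getOrDefault(tok, "")); // unknown names render as ""
                pos++;
            }
        }
    }

    // Renders one IF chain; at most one branch is emitted, and only if the enclosing block emits.
    private void renderIf(String condition, StringBuilder out, boolean emit) {
        boolean branchTaken = false;
        boolean active = matches(condition);
        while (true) {
            renderBlock(out, emit && active && !branchTaken);
            branchTaken |= active;
            if (pos >= tokens.size()) throw new IllegalArgumentException("Missing *|END:IF|*");
            String tok = tokens.get(pos++);
            if (tok.equals("END:IF")) return;
            // ELSEIF tests its own condition; ELSE always matches
            active = tok.startsWith("ELSEIF:") ? matches(tok.substring("ELSEIF:".length())) : true;
        }
    }

    // Assumption: KEY=VALUE means string equality; a bare KEY means "present and non-empty".
    private boolean matches(String condition) {
        int eq = condition.indexOf('=');
        if (eq < 0) {
            String v = data.get(condition.trim());
            return v != null && !v.isEmpty();
        }
        String key = condition.substring(0, eq).trim();
        return condition.substring(eq + 1).trim().equals(data.get(key));
    }
}
```

Inactive branches are still walked (with emit set to false), which keeps nested IF/END:IF pairs balanced without building a separate syntax tree. For richer conditions (other operators, AND/OR), a small AST or an existing template engine would probably scale better.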

Jsoup remove ONLY html tags

What is the proper way to remove ONLY HTML tags (preserving all custom/unknown tags) with jsoup (NOT regex)?
Expected input:
<html>
<customTag>
<div> dsgfdgdgf </div>
</customTag>
<123456789/>
<123>
<html123/>
</html>
Expected output:
<customTag>
dsgfdgdgf
</customTag>
<123456789/>
<123>
<html123/>
I tried to use Cleaner with Whitelist.none(), but it removes custom tags as well.
Also I tried:
String str = Jsoup.parse(html).text();
But it removes custom tags also.
This answer isn't suitable for me, because the number of possible custom tags is unlimited.

You might want to try something like this:
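import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String[] tags = new String[]{"html", "div"}; // every HTML tag to strip must be listed here
Document thing = Jsoup.parse("<html><customTag><div>dsgfdgdgf</div></customTag><123456789/><123><html123/></html>");
for (String tag : tags) {
    for (Element elem : thing.getElementsByTag(tag)) {
        // move the element's children into its parent at the same position, then drop the element
        elem.parent().insertChildren(elem.siblingIndex(), elem.childNodes());
        elem.remove();
    }
}
System.out.println(thing.getElementsByTag("body").html());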
Please note that <123456789/> and <123> don't conform to the XML standard (tag names cannot start with a digit), so they are treated as text and get escaped. Another downside is that you have to explicitly list every tag you want removed (i.e. all HTML tags), and it may be slow; I haven't measured how fast it runs.
Regards,
MiSt

Using OutputRaw in Java Tapestry

I have a web application running Java Tapestry, with a lot of user-inputted content. The only formatting that users may input is linebreaks.
I load a text string from a database and output it into a template. The string contains line breaks as \r, which I replace with <br>. However, these are escaped on output, so the page shows the tags literally, like text<br>text. I think I can use outputRaw or writeRaw to fix this, but I can't find any information on how to use outputRaw or writeRaw in a Tapestry class or template.
The class is:
public String getText() {
KMedium textmedium = getTextmedium();
return (textmedium == null || textmedium.getTextcontent() == null) ? "" : textmedium.getTextcontent().replaceAll("\r", "<br>");
}
The tml is:
<p class="categorytext" id="${currentCategory.id}">
${getText()}
</p>
Where would I add the raw output handling to have my line breaks display properly?

To answer my own question, this is how to output the result of ${getText()} as raw HTML:
Change the tml from this:
<p class="categorytext" id="${currentCategory.id}">
${getText()}
</p>
To this:
<p class="categorytext" id="${currentCategory.id}">
<t:outputraw value="${getText()}"/>
</p>

Note that this is quite dangerous, as it likely opens your site to XSS (cross-site scripting) attacks. You may need to use jsoup or a similar library to sanitize the input.
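Instead of a full HTML sanitizer such as jsoup, plain-text input like this can also be made safe by escaping every HTML-special character first and only then inserting the <br> tags, so the only markup in the output is markup you added. A minimal sketch, assuming the stored text is meant as plain text rather than user-authored HTML; the class and method names are hypothetical and not part of Tapestry:

```java
// Hypothetical helper: escape user text first, then add <br> tags.
final class SafeLineBreaks {
    // Replaces the five HTML-significant characters with entity references.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escapes the text, then turns \r\n, \r or \n into <br>.
    static String toHtmlWithBreaks(String text) {
        if (text == null) return "";
        return escapeHtml(text).replace("\r\n", "<br>").replace("\r", "<br>").replace("\n", "<br>");
    }
}
```

getText() could return toHtmlWithBreaks(textmedium.getTextcontent()) and still be rendered with <t:outputraw>, since any < or & typed by a user arrives as an entity rather than as markup.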

An alternative might be:
<p class="categorytext" id="${currentCategory.id}">
<t:loop source="textLines" value="singleLine">
${singleLine} <br/>
</t:loop>
</p>
This assumes a getTextLines() method that returns a List or array of Strings; it could use the same logic as your getText() but split the result on CRs instead of replacing them. This does a better job when the text lines contain unsafe characters such as & or <, because each ${singleLine} expansion is escaped on output. With a little more work, you could add the <br> only between lines (not after each line), and this feels like it might make a nice component as well.
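The splitting that a getTextLines() method needs could look like this. A minimal sketch; TextLines and splitLines are hypothetical names, and it also accepts \r\n and \n in case the stored text does not use bare CRs:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that a page's getTextLines() could delegate to.
final class TextLines {
    // Splits on \r\n, \r or \n; String.split drops trailing empty strings.
    static List<String> splitLines(String text) {
        if (text == null || text.isEmpty()) return Collections.emptyList();
        return Arrays.asList(text.split("\\r\\n|\\r|\\n"));
    }
}
```

In the page class, getTextLines() would then return TextLines.splitLines(textmedium.getTextcontent()) after the same null checks used in getText().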

How to remove CDATA from XML in Java and do some conversion?

I am trying to create a Java servlet that modifies existing XML.
This is part of my original XML:
<customfieldvalues>
<div id="errorDiv" style="display:none;"/>
<![CDATA[
Vinduer, dører
]]>
</customfieldvalues>
I want to get the following result:
<customfieldvalues>
<div id="errorDiv" style="display:none;"/>
Vinduer, dører
</customfieldvalues>
I iterate over the XML structure with:
Document doc = parseXML(connection.getInputStream());
NodeList descNodes = doc.getElementsByTagName("customfieldvalues");
for (int i=0; i<descNodes.getLength();i++) {
Node node = descNodes.item(i);
// how to ?
}
So, I need to remove CDATA and convert the content.
I saw that I can use this for the conversion.

From the javax.xml.parsers.DocumentBuilderFactory.setCoalescing API documentation:
Specifies that the parser produced by this code will convert CDATA nodes to Text nodes and append it to the adjacent (if any) text node. By default the value of this is set to false.
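The setCoalescing option quoted above handles this at parse time. A minimal sketch using only the JDK's DOM and Transformer APIs; the class and method names are hypothetical. replaceCdata is an alternative for a DOM that was already parsed without coalescing, for example called as replaceCdata(node) inside the question's loop over descNodes (it only looks at direct children):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdataToText {
    // With coalescing=true the parser turns CDATA sections into (merged) Text nodes.
    static Document parse(String xml, boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Swaps each direct CDATA child of `element` for an equivalent Text node.
    static void replaceCdata(Node element) {
        NodeList children = element.getChildNodes(); // live list; replaceChild keeps its length
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                Node text = element.getOwnerDocument().createTextNode(child.getNodeValue());
                element.replaceChild(text, child);
            }
        }
        element.normalize(); // merge adjacent Text nodes
    }

    // Serializes the DOM; characters such as < and & in Text nodes come out escaped.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Once the content is an ordinary Text node, the serializer escapes characters like < and & itself, which is the "conversion" that CDATA previously made unnecessary.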

java dom xml parser get html tags(<p color="something">some text</p>) from xml

I have an xml file with html tags like:
<?xml version="1.0" encoding="utf-8" ?>
<blog>
<blogid>49</blogid>
<title>[FIXED] Job requests page broken</title>
<fulltext>
<img title="page broken" src="images/west/blog/site-broken.jpg" alt="page broken" />
<p><span style="background-color: #ccffcc;">Update 28/05/2011</span>: Job requests page seems to be working OK now. If you find any issues please use the contact page to notify us. Thank you for your patience!</p>
<p> </p>
<p>Well, what can I say? Why does it always have to be that way? You are trying to create something new and something else gets broken on the way...</p>
</fulltext>
Now I want the whole HTML part inside the <fulltext> tag, as it is.
Right now I get a blank result; I think the DOM parser is parsing the HTML tags as well.
I tried XPath, but it does not work on Android.

I don't think you can get this not well-formed XML into a DOM as-is. (EDIT: or is it well-formed?)
You would need to either a) escape the characters, making the XML well-formed and parseable (but probably not into a DOM you want; I guess you want to display the HTML in a different system), b) parse it using a stream processor, or c) fix it using string manipulation (wrap the content in <![CDATA[ ... ]]>) and then parse it into a DOM.
HTH
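Option c) could look roughly like the following. A sketch only: it assumes the <fulltext> tag appears once and has no attributes, the helper names are hypothetical, and naive string matching like this will break on more complex input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataWrapper {
    // Wraps the contents of the first <tag>...</tag> in a CDATA section.
    static String wrapInCdata(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        int end = start < 0 ? -1 : xml.indexOf(close, start);
        if (end < 0) return xml; // tag not found: leave the document unchanged
        String inner = xml.substring(start + open.length(), end);
        // "]]>" cannot appear inside CDATA, so split it across two sections
        inner = inner.replace("]]>", "]]]]><![CDATA[>");
        return xml.substring(0, start + open.length()) + "<![CDATA[" + inner + "]]>" + xml.substring(end);
    }

    // Parses the patched XML; the raw HTML is now plain character data of <tag>.
    static String rawContent(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapInCdata(xml, tag))));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }
}
```

Because the HTML inside <fulltext> is now character data, even markup that is not well-formed (such as an unclosed <br>) parses, and getTextContent() returns it unchanged.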

Well-formed HTML like this (essentially XHTML, without getting into the details) is also valid XML markup. Therefore, there is no reason for the DOM parser not to treat those inner tags as XML elements.
Maybe what you're looking for is a way to flatten what's inside <fulltext>?
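One way to flatten the contents of <fulltext> into a string, roughly what the getter of .NET's XmlNode.InnerXml returns, is to serialize each child node with the JDK's identity Transformer and concatenate the results. A minimal sketch; the class and method names are hypothetical, and it covers only reading the inner markup, not assigning it:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper approximating the getter of .NET's XmlNode.InnerXml.
final class InnerXml {
    // Serializes each child of `node` (elements, text, comments) and concatenates the results.
    static String innerXml(Node node) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            t.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }
}
```

For the blog file this would be called as InnerXml.innerXml(doc.getElementsByTagName("fulltext").item(0)), assuming the file parses as well-formed XML.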

Use a library like jsoup for this purpose:
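import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class JsoupXmlExample {
    public static void main(String[] args) {
        // jsoup's XML parser is lenient: it does not reject the &mdash; entity the way a strict XML parser would
        String html = "<?xml version=\"1.0\"?><foo>" +
                "<bar>Some text &mdash; invalid!</bar></foo>";
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        for (Element e : doc.select("bar")) {
            System.out.println(e); // e.html() would return only the inner markup
        }
    }
}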
