Bug reading CDATA from xml element in Java


I've been scratching my head over this for a while...
So I have a Java application. In this application I need to read an XML file, get the Character Data from an element, pass it into a new DOM Document, change some of the elements, and convert the new Document back into CDATA, reattach it to the original message and send it off.
So... Here is the message I need to read in, and the function which reads it in:
private static String getCharacterDataFromElement(Node e) {
Node child = e.getFirstChild();
if (child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
System.out.println(cd.getBaseURI());
System.out.println(cd.getData());
return cd.getBaseURI();
}
return "error...";
}
And here is the xml file which needs to be changed
<RLSOLVE_MSG version="5.0">
<MESSAGE>
<SOURCE_ID>DP01</SOURCE_ID>
<TRANS_NUM>000001</TRANS_NUM>
</MESSAGE>
<POI_MSG type="interaction">
<INTERACTION name="posPrintReceipt">
<RECEIPT type="merchant" format="xml">
<![CDATA[<RECEIPT>
<AUTH_CODE>06130</AUTH_CODE>
<CARD_SCHEME>VISA</CARD_SCHEME>
<CURRENCY_CODE>GBP</CURRENCY_CODE>
<CUSTOMER_PRESENCE>internet</CUSTOMER_PRESENCE>
<FINAL_AMOUNT>1.00</FINAL_AMOUNT>
<MERCHANT_NUMBER>8888888</MERCHANT_NUMBER>
<PAN_NUMBER>454420******0382</PAN_NUMBER>
<PAN_EXPIRY>12/15</PAN_EXPIRY>
<TERMINAL_ID>04176421</TERMINAL_ID>
<TOKEN>454420bbbbbkqrm0382</TOKEN>
<TOTAL_AMOUNT>1.00</TOTAL_AMOUNT>
<TRANSACTION_DATA_SOURCE>keyed</TRANSACTION_DATA_SOURCE>
<TRANSACTION_DATE>14/02/2014</TRANSACTION_DATE>
<TRANSACTION_NUMBER>000001</TRANSACTION_NUMBER>
<TRANSACTION_RESPONSE>06130</TRANSACTION_RESPONSE>
<TRANSACTION_TIME>17:13:17</TRANSACTION_TIME>
<TRANSACTION_TYPE>purchase</TRANSACTION_TYPE>
<VERIFICATION_METHOD>unknown</VERIFICATION_METHOD>
<DUPLICATE>false</DUPLICATE>
</RECEIPT>]]>
</RECEIPT>
</INTERACTION>
</POI_MSG>
</RLSOLVE_MSG>
When cd.getData() is executed, it returns "\n \t \t \t \t"
Aaaany ideas?

Look closely at your XML. Written on one line, it is actually:
<RECEIPT type="merchant" format="xml">\n\t\t\t<![CDATA[...]]>\n\t\t\t</RECEIPT>
So the tree will actually look something like:
RECEIPT
/ | \
\n\t\t\t CDATA \n\t\t\t
So RECEIPT has three children. Because you only read the first child, you only get \n\t\t\t.
Loop through all the children and concatenate their data, and you will have everything.
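A minimal sketch of that loop, using only the JDK's DOM API. The class name CharacterDataReader and the parse helper are hypothetical, and trimming the result is an assumption about what the caller wants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CharacterDataReader {

    // Concatenates the data of every text and CDATA child of the given node,
    // then trims the surrounding whitespace-only text nodes away.
    static String getCharacterDataFromElement(Node e) {
        StringBuilder sb = new StringBuilder();
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<RECEIPT type=\"merchant\" format=\"xml\">\n\t\t\t"
                + "<![CDATA[<RECEIPT><AUTH_CODE>06130</AUTH_CODE></RECEIPT>]]>"
                + "\n\t\t\t</RECEIPT>";
        Document doc = parse(xml);
        System.out.println(getCharacterDataFromElement(doc.getDocumentElement()));
    }
}
```

The returned string can then be parsed into a new Document of its own, which is the next step described in the question.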

Related

How to get the whole text of StAX XMLEvent object?

I am using StAX Iterator api to read an xml.
XML:
<FormData OID="QUAL">
<IGData IGRepeatKey="1" IGOID="SQUAL" TType="Insert">
<IData Value="0859" IOID="SID"></IData>
<IData Value="DM" IOID="RDOMAIN"></IData>
</IGData>
<IGData IGRepeatKey="1" IGOID="SQUAL" TType="Insert">
<IData Value="0860" IOID="SID"></IData>
<IData Value="2013-01-03T02:00" IOID="QVAL"></IData>
</IGData>
</FormData>
And Stax code:
while(xmlEventReader.hasNext()){
xmlEvent = xmlEventReader.nextEvent();
eventString = xmlEvent.toString();
if(xmlEvent.isStartElement() && eventString.contains("FormData") && eventString.contains("QUAL")){
//do something
}
}
This works in my local environment: eventString holds the whole text of the xmlEvent.
But when I deploy it to the server, eventString contains something like "Stax Event #1", so the if condition returns false.
I thought the two environments might be using different XMLEvent implementations, so I checked through code, and the jar is the same in both: jre1.8.0_73/lib/rt.jar!/javax/xml/stream/events/XMLEvent.class
How do I get the whole text of an XMLEvent object? Am I doing anything wrong here? Please suggest any alternatives.

Don't rely on xmlEvent.toString(): its output is not specified and depends on the StAX implementation in use. Check the event type instead. The three event types you usually care about are:
Start Element
Characters
End Element
For example, to read IGRepeatKey from your XML file, check in the Start Element case whether the IGData tag has started. If it has, iterate over all of its attributes, i.e. IGRepeatKey, IGOID, TType.
Try something like this:
Iterator<Attribute> iterator = element.getAttributes();
while (iterator.hasNext())
{
Attribute attribute = (Attribute)iterator.next();
QName name = attribute.getName();
String value = attribute.getValue();
System.out.println(name+" + "+value);
}
Put this loop inside the xmlEvent.isStartElement() block, with element obtained from xmlEvent.asStartElement().
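Applied to the condition in the question, the check could look roughly like the sketch below. It tests the element name and the OID attribute directly instead of searching the toString() output. The class name FormDataFinder and the method hasFormData are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class FormDataFinder {

    // Returns true if the XML contains a <FormData> start element
    // whose OID attribute equals the given value.
    static boolean hasFormData(String xml, String oid) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (!event.isStartElement()) {
                        continue;
                    }
                    StartElement start = event.asStartElement();
                    if (!"FormData".equals(start.getName().getLocalPart())) {
                        continue;
                    }
                    // getAttributeByName returns null when the attribute is absent
                    Attribute attr = start.getAttributeByName(new QName("OID"));
                    if (attr != null && oid.equals(attr.getValue())) {
                        return true;
                    }
                }
                return false;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Could not read XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<FormData OID=\"QUAL\"><IGData IGRepeatKey=\"1\" IGOID=\"SQUAL\"/></FormData>";
        System.out.println(hasFormData(xml, "QUAL"));
    }
}
```

Because this uses only the StartElement and Attribute interfaces, it should behave the same whichever StAX implementation the server picks up.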

unable to parse a node from xml string with dom4j

I'm parsing a xml string with dom4j and I'm using xpath to select some element from it, the code is :
String test = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><epp xmlns=\"urn:ietf:params:xml:ns:epp-1.0\"><response><result code=\"1000\"><msg lang=\"en-US\">Command completed successfully</msg></result><trID><clTRID>87285586-99412370</clTRID><svTRID>52639BB8-1-ARNES</svTRID></trID></response></epp>";
SAXReader reader = new SAXReader();
reader.setIncludeExternalDTDDeclarations(false);
reader.setIncludeInternalDTDDeclarations(false);
reader.setValidation(false);
Document xmlDoc;
try {
xmlDoc = reader.read(new StringReader(test));
xmlDoc.getRootElement();
Node nodeStatus = xmlDoc.selectSingleNode("//epp/response/result");
System.out.print(nodeStatus.getText());
} catch (DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I always get null for the nodeStatus variable. I actually need to read the code attribute from the result node in the XML:
<result code="1000">
This is the XML that I am reading from the String test:
<?xml version="1.0" encoding="UTF-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
<response>
<result code="1000">
<msg lang="en-US">Command completed successfully</msg>
</result>
<trID>
<clTRID>87285586-99412370</clTRID>
<svTRID>52639BB8-1-ARNES</svTRID>
</trID>
</response>
</epp>
Any hints?

Your XML has a default namespace, so an XPath with unprefixed names does not match its elements, and dom4j returns null.
To make it work, you first have to register the namespaces you are using. You will need a prefix. Any one. And you will have to use that prefix in your XPath.
You could use tns for "target namespace". Then you have to create a xpath object with it like this:
XPath xpath = new DefaultXPath("/tns:epp/tns:response/tns:result");
To register the namespaces, create a Map, add the namespace with the prefix you used in the XPath expression, and pass it to the setNamespaceURIs() method:
Map<String, String> namespaces = new HashMap<String, String>();
namespaces.put("tns", "urn:ietf:params:xml:ns:epp-1.0");
xpath.setNamespaceURIs(namespaces);
Now you can call selectSingleNode, but you will call it on your XPath object passing the document as the argument:
Node nodeStatus = xpath.selectSingleNode(xmlDoc);
From there you can extract the data you need. getText() won't give you the data you want. If you want the contents of the result node as XML, you can use:
nodeStatus.asXML()
Edit: to retrieve just the code, change your XPath to:
/tns:epp/tns:response/tns:result/@code
And retrieve the result with
nodeStatus.getText();
I replaced the double slash // (which means descendant-or-self) with / since the expression contains the full path and / is more efficient. But if you only have one result node in your whole file, you can use:
//tns:result/@code
to extract the data. It searches all descendants. If there is more than one result, the expression matches a node-set rather than a single node.
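For comparison, here is a sketch of the same prefix-binding idea using the JDK's own javax.xml.xpath API instead of dom4j. This is not the dom4j code from the answer: the class name EppResultCode and the method resultCode are hypothetical, and the tns prefix is as arbitrary here as it is above.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EppResultCode {

    static final String EPP_NS = "urn:ietf:params:xml:ns:epp-1.0";

    // Returns the value of the code attribute of /epp/response/result.
    static String resultCode(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required, otherwise namespaces are ignored
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the arbitrary prefix "tns" to the document's default namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "tns".equals(prefix) ? EPP_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return EPP_NS.equals(uri) ? "tns" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return prefix == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singleton(prefix).iterator();
                }
            });
            return xpath.evaluate("/tns:epp/tns:response/tns:result/@code", doc);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not evaluate XPath", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<epp xmlns=\"urn:ietf:params:xml:ns:epp-1.0\"><response>"
                + "<result code=\"1000\"><msg lang=\"en-US\">Command completed successfully</msg>"
                + "</result></response></epp>";
        System.out.println(resultCode(xml));
    }
}
```

Attributes without a prefix are in no namespace, which is why @code needs no tns: prefix in either API.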

JAXB Unmarshalling an subset of Unknown XML content

I need to unmarshal a subset of otherwise unknown XML content, modify some of the resulting object's fields, and then write that subset back into the original XML.
Sample Input XML:
<Message>
<x>
</x>
<y>
</y>
<z>
</z>
<!-- Need to unmarshall this content to "Content" - java Object -->
<Content>
<Name>Robin</Name>
<Role>SM</Role>
<Status>Active</Status>
</Content>
.....
</Message>
I need to unmarshal the <Content> tag alone, keeping the rest of the XML unchanged, then modify the elements in the <Content> tag and bind the modified part back into the original, as shown below:
Expected Output XML:
<Message>
<x>
</x>
<y>
</y>
<z>
</z>
<!-- Need to unmarshall this content to "Content" - java Object -->
<Content>
<Name>Robin_123</Name>
<Role>Senior Member</Role>
<Status>1</Status>
</Content>
.....
</Message>
My questions:
What is a possible solution for this requirement? (Not DOM parsing, as the XML content is very large.)
Is there any option to do this in JAXB 2.0?
Please provide your suggestions.

Consider cutting your source document down to size using the StAX API.
For the given sample, this code creates a DOM document with a root element of the Content element:
class ContentFinder implements StreamFilter {
private boolean capture = false;
@Override public boolean accept(XMLStreamReader xml) {
if (xml.isStartElement() && "Content".equals(xml.getLocalName())) {
capture = true;
} else if (xml.isEndElement() && "Content".equals(xml.getLocalName())) {
capture = false;
return true;
}
return capture;
}
}
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Source src = new StAXSource(reader);
DOMResult res = new DOMResult();
TransformerFactory.newInstance().newTransformer().transform(src, res);
Document doc = (Document) res.getNode();
This can then be passed to JAXB as a DOMSource.
Similar techniques can be used when rewriting the XML on output.
JAXB doesn't seem to accept a StreamSource directly, at least in the Oracle 1.7 implementation.

You can annotate an Object property on your class with @XmlAnyElement and by default the unmapped content will be captured as DOM nodes. If you specify a DomHandler on the @XmlAnyElement then you can control the format. Here is a link to an example where the content is kept as a String.
JAXB use String as it is

Error when using xmlstream reader

I am using xmlstreamreader in java to read attribute values and other data. This is the xml String:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><samlp:AuthnReques
t xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" AssertionConsumerServiceURL
="http://localhost:8080/consumer.jsp" **ID="abc"** **IssueInstant="2012-04-14T11:44:49
:796"** ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Version="
2.0">**<saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">http://loca
lhost:8080/saml/SProvider.jsp</saml:Issuer>**<Signature xmlns="http://www.w3.org/2
000/09/xmldsig#"><SignedInfo><CanonicalizationMethod Algorithm="http://www.w3.or
g/2001/10/xml-exc-c14n#WithComments"/><SignatureMethod Algorithm="http://www.w3.
org/2000/09/xmldsig#rsa-sha1"/><Reference URI=""><Transforms><Transform Algorith
m="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/></Transforms><DigestM
ethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><DigestValue>VzKYOu1g
ert3DDrNUSO1/Au3PGeD1PEyPuJeI2GO6ec=</DigestValue></Reference></SignedInfo><Sign
atureValue>k7hVlbsEhGW5ryelSbrwWWyJq3cdyDuVeQCOqRilbky8hEk/1sHI9DNOvOlPZ7OC9bI4d
EHm46R1
CDXoXkyOoXdq+3M/HbUakHM7eNvF5+j+NUXUX9dijb/rDzq05VNHcSIDXRpvMc1IRBremi0voVqX
ZuHRn+IBeD8hSK1LXsE=</SignatureValue></Signature></samlp:AuthnRequest>
Then I tried to read the attribute ID, IssueInstant and the element Issuer. all the 3 are highlighted(actually between **) in the above string. I have used the following code:
while(reader.hasNext()){
reader.next();
if(reader.getEventType() == XMLStreamReader.START_ELEMENT){
if(reader.getLocalName().equals("AuthnRequest"))
{
String ns=reader.getNamespaceURI();
System.out.println(ns);
id=reader.getAttributeValue(ns,"ID");
rec_instant=reader.getAttributeValue(ns,"IssueInstant");
System.out.println("1"+id);
System.out.println("2"+rec_instant);
}
else if(reader.getLocalName().equals("Issuer"))
{
rec_issuer=reader.getElementText();
System.out.println("4"+reader.getElementText());
}
}
}
But I am getting the following output:
1null
2null
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,436]
Message: parser must be on START_ELEMENT to read next text
What is the issue?

You are using ns for the attributes, but the attributes are in fact in the null ns (they have no namespace). As for the exception, you are calling getElementText twice. This method is not a pure getter, it also advances the reader to the end element (as per its Javadoc).

As Marko suggests, the exception is due to calling getElementText() twice in a row.
If I change this:
String rec_issuer=reader.getElementText();
System.out.println("4"+reader.getElementText());
to this:
String rec_issuer = reader.getElementText();
System.out.println("4" + rec_issuer);
then I get the following output:
urn:oasis:names:tc:SAML:2.0:protocol
1null
2null
4http://localhost:8080/saml/SProvider.jsp
If I also change the getAttributeValue calls to use null instead of ns, like this:
String id = reader.getAttributeValue(null,"ID");
String rec_instant = reader.getAttributeValue(null,"IssueInstant");
I get:
urn:oasis:names:tc:SAML:2.0:protocol
1abc
22012-04-14T11:44:49:796
4http://localhost:8080/saml/SProvider.jsp
That's using your original XML.

How to get node contents from JDOM

I'm writing an application in java using import org.jdom.*;
My XML is valid,but sometimes it contains HTML tags. For example, something like this:
<program-title>Anatomy &amp; Physiology</program-title>
<overview>
<content>
For more info click here
<p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&#160; Online studies options are available.</p>
</content>
</overview>
<key-information>
<category>Health &amp; Human Services</category>
So my problem is with the < p > tags inside the overview.content node.
I was hoping that this code would work :
Element overview = sds.getChild("overview");
Element content = overview.getChild("content");
System.out.println(content.getText());
but it returns blank.
How do I return all the text ( nested tags and all ) from the overview.content node ?
Thanks

content.getText() returns only the element's immediate text, so it is really only useful for leaf elements with text content.
The trick is to use org.jdom.output.XMLOutputter (with the CompactFormat text mode):
public static void main(String[] args) throws Exception {
SAXBuilder builder = new SAXBuilder();
String xmlFileName = "a.xml";
Document doc = builder.build(xmlFileName);
Element root = doc.getRootElement();
Element overview = root.getChild("overview");
Element content = overview.getChild("content");
XMLOutputter outp = new XMLOutputter();
outp.setFormat(Format.getCompactFormat());
//outp.setFormat(Format.getRawFormat());
//outp.setFormat(Format.getPrettyFormat());
//outp.getFormat().setTextMode(Format.TextMode.PRESERVE);
StringWriter sw = new StringWriter();
outp.output(content.getContent(), sw);
StringBuffer sb = sw.getBuffer();
System.out.println(sb.toString());
}
Output
For more info click here<p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&#160; Online studies options are available.</p>
Do explore other formatting options and modify above code to your need.
"Class to encapsulate XMLOutputter format options. Typical users can use the standard format configurations obtained by getRawFormat() (no whitespace changes), getPrettyFormat() (whitespace beautification), and getCompactFormat() (whitespace normalization). "

You could try using the getValue() method for the closest approximation, but what this does is concatenate all text within the element and its descendants. This won't give you the <p> tag in any form. If that tag is in your XML like you've shown, it has become part of the XML markup. It would need to be included as &lt;p&gt; or embedded in a CDATA section to be treated as text.
Alternatively, if you know all elements that either may or may not appear in your XML, you could apply an XSLT transformation that turns stuff which isn't intended as markup into plain text.

Well, maybe that's what you need:
import java.io.StringReader;
import java.util.List;
import org.custommonkey.xmlunit.XMLTestCase;
import org.custommonkey.xmlunit.XMLUnit;
import org.jdom.Content;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.testng.annotations.Test;
import org.xml.sax.InputSource;
public class HowToGetNodeContentsJDOM extends XMLTestCase
{
private static final String XML = "<root>\n" +
" <program-title>Anatomy &amp; Physiology</program-title>\n" +
" <overview>\n" +
" <content>\n" +
" For more info click here\n" +
" <p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&#160; Online studies options are available.</p>\n" +
" </content>\n" +
" </overview>\n" +
" <key-information>\n" +
" <category>Health &amp; Human Services</category>\n" +
" </key-information>\n" +
"</root>";
private static final String EXPECTED = "For more info click here\n" +
"<p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&#160; Online studies options are available.</p>";
@Test
public void test() throws Exception
{
XMLUnit.setIgnoreWhitespace(true);
Document document = new SAXBuilder().build(new InputSource(new StringReader(XML)));
List<Content> content = document.getRootElement().getChild("overview").getChild("content").getContent();
String out = new XMLOutputter().outputString(content);
assertXMLEqual("<root>" + EXPECTED + "</root>", "<root>" + out + "</root>");
}
}
Output:
PASSED: test on instance null(HowToGetNodeContentsJDOM)
===============================================
Default test
Tests run: 1, Failures: 0, Skips: 0
===============================================
I am using JDom with generics: http://www.junlu.com/list/25/883674.html
Edit: Actually that's not that much different from Prashant Bhate's answer. Maybe you need to tell us what you are missing...

If you're also generating the XML file you should be able to encapsulate your html data in <![CDATA[]]> so that it isn't parsed by the XML parser.
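A small sketch of the effect, using the JDK's DOM parser rather than JDOM so that it runs without extra libraries. The class name CdataHtmlDemo and the sample string are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataHtmlDemo {

    // Parses the XML and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA, <p> is plain character data, not a child element.
        String xml = "<content><![CDATA[For more info click here <p>Learn more</p>]]></content>";
        System.out.println(rootText(xml));
    }
}
```

With the HTML wrapped this way, a plain text accessor returns the tags as part of the string, so no serializer is needed to get them back.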

The problem is that the <content> node doesn't have a text child; it has a <p> child that happens to contain text.
Try this:
Element overview = sds.getChild("overview");
Element content = overview.getChild("content");
Element p = content.getChild("p");
System.out.println(p.getText());
If you want all the immediate child nodes, call p.getChildren(). If you want to get ALL the child nodes, you'll have to call it recursively.

Not particularly pretty but works fine (using JDOM API):
public static String getRawText(Element element) {
if (element.getContent().size() == 0) {
return "";
}
StringBuffer text = new StringBuffer();
for (int i = 0; i < element.getContent().size(); i++) {
final Object obj = element.getContent().get(i);
if (obj instanceof Text) {
text.append( ((Text) obj).getText() );
} else if (obj instanceof Element) {
Element e = (Element) obj;
text.append( "<" ).append( e.getName() );
// dump all attributes
for (Attribute attribute : (List<Attribute>)e.getAttributes()) {
text.append(" ").append(attribute.getName()).append("=\"").append(attribute.getValue()).append("\"");
}
text.append(">");
text.append( getRawText( e )).append("</").append(e.getName()).append(">");
}
}
return text.toString();
}
Prashant Bhate's solution is nicer though!

If you want to output the content of some JDOM node, just use
System.out.println(new XMLOutputter().outputString(node));
