Using JAXB to extract content of several XML elements as text

I have the following XML file:
<items>
<title>blabla</title>
<text>123</text>
</items>
I'm unmarshalling the XML into the following Java object using JAXB and the @XmlAnyElement annotation, with two classes that implement DomHandler. I want to extract the inner XML of the "title" and "text" elements as Strings.
public class Item implements Serializable {
private String title;
private String text;
public String getTitle() {
return title;
}
@XmlAnyElement(value = TitleHandler.class)
public void setTitle(String title) {
this.title = title;
}
public String getText() {
return text;
}
@XmlAnyElement(value = TextHandler.class)
public void setText(String text) {
this.text = text;
}
}
But when I put a breakpoint in the method "String getElement(StreamResult rt)" of both TitleHandler and TextHandler, both elements are unmarshalled with TextHandler: the "title" element uses TextHandler instead of TitleHandler.
Any help will be greatly appreciated.
UPDATE
Usage restriction for the XmlAnyElement annotation:
There can be only one XmlAnyElement annotated JavaBean property in a class and its super classes.

The @XmlAnyElement annotation is used as a catch-all for elements in the XML input that aren't mapped by name to some specific property. That's why there can be only one such annotation per class (including inherited properties). What you want is this:
public class Item implements Serializable {
private String title;
private String text;
public String getTitle() {
return title;
}
@XmlElement(name = "title")
@XmlJavaTypeAdapter(value = TitleHandler.class)
public void setTitle(String title) {
this.title = title;
}
public String getText() {
return text;
}
@XmlElement(name = "text")
@XmlJavaTypeAdapter(value = TextHandler.class)
public void setText(String text) {
this.text = text;
}
}
The @XmlElement annotation indicates that the corresponding property is mapped to elements with that name. So the Java text property derives from the XML <text> element, and the title property from the <title> element. Since the names of the properties and the elements are the same, this is also the default behavior without the @XmlElement annotations, so you could leave them out.
In order to handle the conversion from XML content to a String instead of an actual structure (like a Title class or Text class), you'll need an adapter. That's what the @XmlJavaTypeAdapter annotation is for. It specifies how marshalling/unmarshalling for that property must be handled.
See this useful answer: https://stackoverflow.com/a/18341694/630136
An example of how you could implement TitleHandler:
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class TitleHandler extends XmlAdapter<Object, String> {
/**
* Factory for building DOM documents.
*/
private final DocumentBuilderFactory docBuilderFactory;
/**
* Factory for building transformers.
*/
private final TransformerFactory transformerFactory;
public TitleHandler() {
docBuilderFactory = DocumentBuilderFactory.newInstance();
transformerFactory = TransformerFactory.newInstance();
}
@Override
public String unmarshal(Object v) throws Exception {
// The provided Object is a DOM Element
Element titleElement = (Element) v;
// Getting the "a" child elements
NodeList anchorElements = titleElement.getElementsByTagName("a");
// If there's none or multiple, return empty string
if (anchorElements.getLength() != 1) {
return "";
}
Element anchor = (Element) anchorElements.item(0);
// Creating a DOMSource as input for the transformer
DOMSource source = new DOMSource(anchor);
// Default transformer: identity transformer (doesn't alter input)
Transformer transformer = transformerFactory.newTransformer();
// This is necessary to avoid the <?xml ...?> prolog
transformer.setOutputProperty("omit-xml-declaration", "yes");
// Transform to a StringWriter
StringWriter stringWriter = new StringWriter();
StreamResult result = new StreamResult(stringWriter);
transformer.transform(source, result);
// Returning result as string
return stringWriter.toString();
}
@Override
public Object marshal(String v) throws Exception {
// DOM document builder
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
// Creating a new empty document
Document doc = docBuilder.newDocument();
// Creating the <title> element
Element titleElement = doc.createElement("title");
// Setting as the document root
doc.appendChild(titleElement);
// Creating a DOMResult as output for the transformer
DOMResult result = new DOMResult(titleElement);
// Default transformer: identity transformer (doesn't alter input)
Transformer transformer = transformerFactory.newTransformer();
// String reader from the input and source
StringReader stringReader = new StringReader(v);
StreamSource source = new StreamSource(stringReader);
// Transforming input string to the DOM
transformer.transform(source, result);
// Return DOM root element (<title>) for JAXB marshalling to XML
return doc.getDocumentElement();
}
}
If the type for unmarshalling input/marshalling output is left as Object, JAXB will provide DOM nodes. The above uses XSLT transformations (though without an actual stylesheet, just an "identity" transform) to turn the DOM input into a String and vice-versa. I've tested it on a minimal input document and it works for both XML to an Item object and the other way around.
EDIT:
The following version will handle any XML content in <title> rather than expecting a single <a> element. You'll probably want to turn this into an abstract class and then have TitleHandler and TextHandler extend it, so that the currently hardcoded <title> tags are provided by the implementation.
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class TitleHandler extends XmlAdapter<Object, String> {
/**
* Factory for building DOM documents.
*/
private final DocumentBuilderFactory docBuilderFactory;
/**
* Factory for building transformers.
*/
private final TransformerFactory transformerFactory;
/**
* XSLT that will strip the root element. Used to only take the content of an element given
*/
private final static String UNMARSHAL_XSLT = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">\n" +
"\n" +
" <xsl:output method=\"xml\" omit-xml-declaration=\"yes\" />\n" +
"\n" +
" <xsl:template match=\"/*\">\n" +
" <xsl:apply-templates select=\"@*|node()\"/>\n" +
" </xsl:template>\n" +
"\n" +
" <xsl:template match=\"@*|node()\">\n" +
" <xsl:copy>\n" +
" <xsl:apply-templates select=\"@*|node()\"/>\n" +
" </xsl:copy>\n" +
" </xsl:template>\n" +
" \n" +
"</xsl:transform>";
public TitleHandler() {
docBuilderFactory = DocumentBuilderFactory.newInstance();
transformerFactory = TransformerFactory.newInstance();
}
@Override
public String unmarshal(Object v) throws Exception {
// The provided Object is a DOM Element
Element rootElement = (Element) v;
// Creating a DOMSource as input for the transformer
DOMSource source = new DOMSource(rootElement);
// Creating a transformer that will strip away the root element
StreamSource xsltSource = new StreamSource(new StringReader(UNMARSHAL_XSLT));
Transformer transformer = transformerFactory.newTransformer(xsltSource);
// Transform to a StringWriter
StringWriter stringWriter = new StringWriter();
StreamResult result = new StreamResult(stringWriter);
transformer.transform(source, result);
// Returning result as string
return stringWriter.toString();
}
@Override
public Object marshal(String v) throws Exception {
// DOM document builder
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
// Creating a new empty document
Document doc = docBuilder.newDocument();
// Creating a DOMResult as output for the transformer
DOMResult result = new DOMResult(doc);
// Default transformer: identity transformer (doesn't alter input)
Transformer transformer = transformerFactory.newTransformer();
// String reader from the input and source
StringReader stringReader = new StringReader("<title>" + v + "</title>");
StreamSource source = new StreamSource(stringReader);
// Transforming input string to the DOM
transformer.transform(source, result);
// Return DOM root element for JAXB marshalling to XML
return doc.getDocumentElement();
}
}
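The abstract-class idea mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: InnerXmlCodec, TitleCodec and TextCodec are made-up names, the class is kept free of JAXB so it compiles with the JDK alone (in a real project it would extend XmlAdapter<Object, String>), and instead of the stripping XSLT it serializes each child node with an identity transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical base class; with JAXB on the classpath it would
// extend XmlAdapter<Object, String> and be used via @XmlJavaTypeAdapter.
abstract class InnerXmlCodec {

    /** Name of the wrapper element, supplied by each subclass. */
    protected abstract String elementName();

    /** Serializes the children of the given DOM element to a String. */
    public String unmarshal(Object v) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Avoid the <?xml ...?> prolog in front of every child
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        NodeList children = ((Element) v).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    /** Parses the String back into a DOM element named elementName(). */
    public Object marshal(String v) throws Exception {
        String name = elementName();
        String wrapped = "<" + name + ">" + v + "</" + name + ">";
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)))
                .getDocumentElement();
    }
}

// The subclasses only provide the element name.
final class TitleCodec extends InnerXmlCodec {
    @Override
    protected String elementName() {
        return "title";
    }
}

final class TextCodec extends InnerXmlCodec {
    @Override
    protected String elementName() {
        return "text";
    }
}
```

With this layout the only thing that differs between the two handlers is the element name, so a third element would need just one more small subclass.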

Related

How to keep xml attribute in fasterXml Jackson XmlMapper?

I am writing test cases which test generated XML structures. I am supplying the XML structures via an XML file. I am currently using FasterXML's Jackson XmlMapper to read the file and test for the expected XML.
Java: adoptopenjdk 11
Maven: 3.6.3
JUnit (Jupiter): 5.7.1 (JUnit Jupiter)
Mapper: com.fasterxml.jackson.dataformat.xml.XmlMapper
Dependency: <dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-xml</artifactId>
<version>2.11.4</version>
</dependency>
I have an XML file which contains the expected XML (e.g. /test/testcases.xml):
<testcases>
<testcase1>
<response>
<sizegroup-list>
<sizeGroup id="1">
<sizes>
<size>
<technicalSize>38</technicalSize>
<textSize>38</textSize>
</size>
<size>
<technicalSize>705</technicalSize>
<textSize>110cm</textSize>
</size>
</sizes>
</sizeGroup>
</sizegroup-list>
</response>
</testcase1>
</testcases>
My code looks like this (simplified):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import org.apache.commons.lang3.StringUtils;
import org.junit.jupiter.api.Test;
import java.io.FileInputStream;
import java.io.InputStream;
import static org.junit.jupiter.api.Assertions.assertEquals;
class Testcases {
private static final String OBJECT_NODE_START_TAG = "<ObjectNode>";
private static final String OBJECT_NODE_CLOSE_TAG = "</ObjectNode>";
private static final String TESTCASES_XML = "/test/testcases.xml";
private static final XmlMapper XML_MAPPER = new XmlMapper();
@Test
void testcase1() throws Exception {
final String nodePtr = "/testcase1/response";
try (InputStream inputStream = new FileInputStream(TESTCASES_XML)) {
JsonNode rootNode = XML_MAPPER.readTree(inputStream);
JsonNode subNode = rootNode.at(nodePtr);
if (subNode.isMissingNode()) {
throw new IllegalArgumentException(
"Node '" + nodePtr + "' not found in file " + TESTCASES_XML);
}
String expectedXml = XML_MAPPER.writeValueAsString(subNode);
expectedXml = unwrapObjectNode(expectedXml);
// Testcalls, e.g. someService.generateXmlData()
String generatedXml = "...";
assertEquals(expectedXml, generatedXml);
}
}
// FIXME: Ugly: Tell XmlMapper to unwrap ObjectNode automatically
private String unwrapObjectNode(String xmlString) {
if(StringUtils.isBlank(xmlString)) {
return xmlString;
}
if(xmlString.startsWith(OBJECT_NODE_START_TAG)) {
xmlString = xmlString.substring(OBJECT_NODE_START_TAG.length());
if(xmlString.endsWith(OBJECT_NODE_CLOSE_TAG)) {
xmlString = xmlString.substring(0, xmlString.length() - OBJECT_NODE_CLOSE_TAG.length());
}
}
return xmlString;
}
}
But the returned expected xml looks like this:
<sizegroup-list>
<sizeGroup>
<id>1</id>
<sizes>
<size>
<technicalSize>38</technicalSize>
<textSize>38</textSize>
</size>
<size>
<technicalSize>705</technicalSize>
<textSize>110cm</textSize>
</size>
</sizes>
</sizeGroup>
</sizegroup-list>
The former id attribute of the sizeGroup element gets mapped as a sub-element, which makes my test fail. How can I tell XmlMapper to keep the attributes of XML elements?
Best regards,
David

I was not able to tell XmlMapper to keep the attributes of XML tags from the loaded XML file, but I found another way: parsing the XML test data with XPath expressions.
A simple String.equals(...) proved to be unreliable if the expected and actual XML differ in whitespace or tag order. Luckily, there is a library for comparing XML: XMLUnit.
Additional dependency (seems to be present as transitive dependency as of Spring Boot 2.6.x):
<dependency>
<groupId>org.xmlunit</groupId>
<artifactId>xmlunit-core</artifactId>
<!-- version transitive in spring-boot-starter-parent 2.6.7 -->
<version>2.8.4</version>
<scope>test</scope>
</dependency>
ResourceUtil.java:
import org.apache.commons.lang3.StringUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.net.URL;
public class ResourceUtil {
private static final DocumentBuilderFactory XML_DOCUMENT_BUILDER_FACTORY = DocumentBuilderFactory.newInstance();
private static final XPathFactory X_PATH_FACTORY = XPathFactory.newInstance();
private ResourceUtil() {}
/** Reads an xml file named after the testcase class (e.g. MyTestcase.class
* -> MyTestcase.xml) and parses the data at the supplied xPath expression. */
public static String xmlData(Class<?> testClass, String xPathExpression) {
return getXmlDocumentAsString(testClass, testClass.getSimpleName() + ".xml", xPathExpression);
}
/** Reads the specified xml file and parses the data at the supplied xPath
* expression. The xml file is expected in the same package/directory as
* the testcase class. */
private static String getXmlDocumentAsString(Class<?> ctxtClass, String fileName, String xPathExpression) {
Document xmlDocument = getXmlDocument(ctxtClass, fileName);
XPath xPath = X_PATH_FACTORY.newXPath();
try {
Node subNode = (Node)xPath.compile(xPathExpression).evaluate(xmlDocument, XPathConstants.NODE);
return nodeToString(subNode.getChildNodes());
} catch (TransformerException | XPathExpressionException var6) {
throw new IllegalArgumentException("Unable to read value of '" + xPathExpression + "' from file " + fileName, var6);
}
}
/** Reads the specified xml file and returns a Document instance of the
* xml data. The xml file is expected in the same package/directory as
* the testcase class. */
private static Document getXmlDocument(Class<?> ctxtClass, String xmlFileName) {
InputStream inputStream = getResourceFile(ctxtClass, xmlFileName);
try {
DocumentBuilder builder = XML_DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
return builder.parse(inputStream);
} catch (SAXException | IOException | ParserConfigurationException var4) {
throw new IllegalStateException("Unable to read xml content from file '" + xmlFileName + "'.", var4);
}
}
/** Returns an InputStream of the specified xml file. The xml file is
* expected in the same package/directory as the testcase class. */
private static InputStream getResourceFile(Class<?> ctxtClass, String fileName) {
String pkgPath = StringUtils.replaceChars(ctxtClass.getPackage().getName(), ".", "/");
String filePath = "/" + pkgPath + "/" + fileName;
URL url = ctxtClass.getResource(filePath);
if (url == null) {
throw new IllegalArgumentException("Resource file not found: " + filePath);
}
return ResourceUtil.class.getResourceAsStream(filePath);
}
/** Serializes a NodeList to a String with (formatted) xml. */
private static String nodeToString(NodeList nodeList) throws TransformerException {
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer(getXsltAsResource());
xform.setOutputProperty("omit-xml-declaration", "yes");
xform.setOutputProperty("indent", "no");
for(int i = 0; i < nodeList.getLength(); ++i) {
xform.transform(new DOMSource(nodeList.item(i)), new StreamResult(buf));
}
return buf.toString().trim();
}
/** Returns a Source of an XSLT file for formatting xml data */
private static Source getXsltAsResource() {
return new StreamSource(ResourceUtil.class.getResourceAsStream("xmlstylesheet.xslt"));
}
}
xmlstylesheet.xslt (works for me, you may alter to your preferences):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" encoding="UTF-8"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
MyTestcase.java:
import org.junit.jupiter.api.Test;
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.ElementSelectors;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static ResourceUtil.xmlData;
public class MyTestcase {
@Test
void testcase1() {
// Execute logic to generate xml
String xml = ...
assertXmlEquals(xmlData(getClass(), "/test/testcase1/result"), xml);
}
/** Compare xml using XmlUnit assertion. Expected and actual xml need
* to be equal in content (ignoring whitespace and xml tag order) */
void assertXmlEquals(String expectedXml, String testXml) {
Diff diff = DiffBuilder.compare(expectedXml)
.withTest(testXml)
.ignoreWhitespace()
.checkForSimilar()
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText, ElementSelectors.byName))
.build();
// JUnit 5 takes the condition first and the failure message second
assertFalse(diff.hasDifferences(), diff.fullDescription());
}
}
MyTestcase.xml:
<test>
<testcase1>
<result>
<myData>
...
</myData>
</result>
</testcase1>
</test>
Best regards,
David

How to pass integer and date in a node element for XML

import java.io.File;
import java.sql.Date;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
public class XmlOutput1 {
public static void main(String[] args) {
try
{
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("Employee");
doc.appendChild(rootElement);
rootElement.appendChild(getEmployee(doc, "1234", "Anupam", "Engineer", "AManager", "17/10/2014","10000","email"));
rootElement.appendChild(getEmployee(doc, "1235", "Anirban", "Doctor", "BManager", "25/10/2014","20000","phone"));
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult console = new StreamResult(System.out);
StreamResult console1 = new StreamResult(new File("C:\\Users\\anupam.a.mukherjee\\Desktop\\anupam1.xml"));
transformer.transform(source, console);
transformer.transform(source,console1);
}
catch (Exception e) {
e.printStackTrace();
}
}
private static Node getEmployee(Document document, String empno, String name, String job, String manager,String date,String salary,String communication)
{
Element employee = document.createElement("Employee");
employee.setAttribute("empno", empno);
employee.appendChild(getEmployeeElements(document, employee, "name", name));
employee.appendChild(getEmployeeElements(document, employee, "job", job));
employee.appendChild(getEmployeeElements(document, employee, "manager", manager));
employee.appendChild(getEmployeeElements(document, employee, "date", date));
employee.appendChild(getEmployeeElements(document, employee, "salary", salary));
employee.appendChild(getEmployeeElements(document, employee, "communication", communication));
return employee;
}
private static Node getEmployeeElements(Document document, Element element, String name, String value) {
Element node = document.createElement(name);
node.appendChild(document.createTextNode(value));
return node;
}
private static Node getEmployeeElements1(Document document, Element element, int name, int value) {
Element node = document.createElement(name);
node.appendChild(document.createTextNode(value));
return node;
}
}
My question is about the second function, getEmployeeElements1: it gives an error because a node accepts only strings. How can I write a node-creating function for integer, date, double and similar values?

For the name: an Element is a Node, and the string expected as a parameter is the tag name. An integer tag name makes no real sense.
For the value: the DOM representation is minimal and does not handle advanced typing, so your code has to check the integer/date/double data itself and store it as a String in the DOM document.
You can use frameworks that generate code and provide containers with typed data, so that you are sure to store a correct value in a node (e.g. JAXB), but as long as you use the native DOM, you can only manipulate text data.
Here you can simply use methods like static String Integer.toString(int i) to fill your DOM object.
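As a rough illustration of that advice, the sketch below converts each typed value to a String before it goes into the DOM. The class and method names (TypedElements, intElement, doubleElement, dateElement) are made up for this example, and writing dates in ISO format (yyyy-MM-dd) is an assumption; use whatever format your consumers expect.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: the DOM only stores text, so every typed
// value is converted to a String before the text node is created.
final class TypedElements {

    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    // The tag name always stays a String; only the value type varies.
    static Element textElement(Document doc, String name, String value) {
        Element node = doc.createElement(name);
        node.appendChild(doc.createTextNode(value));
        return node;
    }

    static Element intElement(Document doc, String name, int value) {
        return textElement(doc, name, Integer.toString(value));
    }

    static Element doubleElement(Document doc, String name, double value) {
        return textElement(doc, name, Double.toString(value));
    }

    // Assumption: dates are written in ISO format, e.g. 2014-10-17
    static Element dateElement(Document doc, String name, LocalDate value) {
        return textElement(doc, name, DateTimeFormatter.ISO_LOCAL_DATE.format(value));
    }
}
```

In the code from the question, the salary line would then become something like employee.appendChild(TypedElements.intElement(document, "salary", 10000)).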

File name truncation in java.io.File(String) if the file name contains '#'

I am trying to create a new file using java.io.File(String). If the file path string contains the '#' symbol, the created file's name gets truncated. Can anyone explain why this happens and how to create a file name containing '#' from Java?
My code:
new File("d:\\file#test.xml");
Expected output:
file#test.xml
Real output:
file
Note: I need to create the file on both Windows and Unix file systems.
Normally both Windows and Unix systems allow a '#' in a file name.
Edited:
Thanks for all your replies. Yes, the problem is not in java.io.File(String). I actually get this problem when I try to create the XML file using an XML transformer.
Please find the full code below.
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
public class XMLWriterDOM {
public static void main(String[] args) {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
//add elements to Document
Element rootElement =
doc.createElementNS("http://www.journaldev.com/employee", "Employees");
//append root element to document
doc.appendChild(rootElement);
//append first child element to root element
rootElement.appendChild(getEmployee(doc, "1", "Pankaj", "29", "Java Developer", "Male"));
//for output to file,
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
//for pretty print
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult file = new StreamResult(new File("d:\\\\emps#1.xml"));
//write data
transformer.transform(source, file);
System.out.println("DONE");
} catch (Exception e) {
e.printStackTrace();
}
}
private static Node getEmployee(Document doc, String id, String name, String age, String role,
String gender) {
Element employee = doc.createElement("Employee");
//set id attribute
employee.setAttribute("id", id);
//create name element
employee.appendChild(getEmployeeElements(doc, employee, "name", name));
//create age element
employee.appendChild(getEmployeeElements(doc, employee, "age", age));
//create role element
employee.appendChild(getEmployeeElements(doc, employee, "role", role));
//create gender element
employee.appendChild(getEmployeeElements(doc, employee, "gender", gender));
return employee;
}
//utility method to create text node
private static Node getEmployeeElements(Document doc, Element element, String name, String value) {
Element node = doc.createElement(name);
node.appendChild(doc.createTextNode(value));
return node;
}
}
Please suggest how to solve this issue.

I can't reproduce your problem under Mac OS X.
import java.nio.file.Path;
import java.nio.file.Paths;
public class Main {
public static void main(String[] args) {
String temp = System.getProperty("java.io.tmpdir");
Path path = Paths.get(temp, "foo#bar.txt");
System.out.println(path.toAbsolutePath());
}
}
Output:
/var/folders/qm/0t0z2hfx6lb53gf7d8h2srzm0000gp/T/foo#bar.txt
What is the code that makes the output?
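A possible explanation, stated as an assumption because the problem could not be reproduced here: the edit in the question says the truncation only appears when the transformer writes the file. Some JAXP implementations turn the File given to StreamResult into a system-ID URI, and in a URI '#' starts a fragment. A minimal sketch that sidesteps this by handing the transformer an OutputStream instead of a File; HashFileWriter and sampleDocument are hypothetical names.

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper: the file is opened by java.nio, so the
// transformer never has to interpret the file name itself.
final class HashFileWriter {

    static void write(Document doc, Path target) throws Exception {
        try (OutputStream out = Files.newOutputStream(target)) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    // Tiny document so the sketch can be run on its own.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("Employees"));
        return doc;
    }
}
```

Usage would be along the lines of HashFileWriter.write(doc, Paths.get("d:\\emps#1.xml")).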

Using JDOM to Parse XML file with external DTD that has not been declared in the XML file

In my XML file I have some entities such as &rsquo;
So I have created a DTD for my XML document to define these entities. Below is the Java code used to read the XML file.
SAXBuilder builder = new SAXBuilder();
URL url = new URL("http://127.0.0.1:8080/sample/subject.xml");
InputStream stream = url.openStream();
org.jdom.Document document = builder.build(stream);
Element root = document.getRootElement();
Element name = root.getChild("name");
result = name.getText();
System.err.println(result);
How can I change the Java code to retrieve a DTD over HTTP to allow the parsing of my XML document to be error free?
Simplified example of the xml document.
<main>
<name>hello &lsquo; world &rsquo; foo &amp; bar </name>
</main>

One way to do this would be to read the document and then validate it with the transformer:
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ValidateWithExternalDTD {
private static final String URL = "http://127.0.0.1:8080/sample/subject.xml";
private static final String DTD = "http://127.0.0.1/YourDTD.dtd";
public static void main(String args[]) {
try {
DocumentBuilderFactory factory= DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// Set the error handler
builder.setErrorHandler(new org.xml.sax.ErrorHandler() {
public void fatalError(SAXParseException spex)
throws SAXException {
// output error and exit
spex.printStackTrace();
System.exit(0);
}
public void error(SAXParseException spex)
throws SAXParseException {
// output error and continue
spex.printStackTrace();
}
public void warning(SAXParseException spex)
throws SAXParseException {
// output warning and continue
spex.printStackTrace();
}
});
// Read the document
URL url = new URL(ValidateWithExternalDTD.URL);
Document xmlDocument = builder.parse(url.openStream());
DOMSource source = new DOMSource(xmlDocument);
// Use the transformer to validate the document
StreamResult result = new StreamResult(System.out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, ValidateWithExternalDTD.DTD);
transformer.transform(source, result);
// Process your document if everything is OK
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Another way would be to replace the XML declaration with the XML declaration plus the DTD reference.
Replace this:
<?xml version = "1.0"?>
with this:
<?xml version = "1.0"?><!DOCTYPE ...>
Of course, you would replace the first occurrence only, and not go through the whole XML document.
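A small sketch of this replacement approach, under some assumptions: DoctypeInjector is a made-up name, the sketch parses with the JDK's DocumentBuilder rather than JDOM so that it runs without extra libraries, and it uses an internal DTD subset with two sample entities so that it needs no network access. For an external DTD, the DOCTYPE string would instead be of the form <!DOCTYPE main SYSTEM "..."> with your own DTD URL.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper that inserts a DOCTYPE right after the XML declaration.
final class DoctypeInjector {

    // Assumption: an internal subset declaring just the entities used here.
    static final String DOCTYPE =
            "<!DOCTYPE main ["
            + "<!ENTITY lsquo \"&#8216;\">"
            + "<!ENTITY rsquo \"&#8217;\">"
            + "]>";

    // Only the start of the document is touched, never the body.
    static String withDoctype(String xml) {
        int end = xml.startsWith("<?xml") ? xml.indexOf("?>") + 2 : 0;
        return xml.substring(0, end) + DOCTYPE + xml.substring(end);
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(withDoctype(xml))));
    }
}
```

The string returned by withDoctype could equally be passed to JDOM's SAXBuilder through a StringReader.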
You have to instantiate the SAXBuilder by passing true (validate) to its constructor:
SAXBuilder builder = new SAXBuilder(true);
or call:
builder.setValidation(true);

Is there a way to parse XML via SAX/DOM with line numbers available per node

I have already written a DOM parser for a large XML document format that contains a number of items that can be used to automatically generate Java code. This is limited to small expressions that are then merged into a dynamically generated Java source file.
So far - so good. Everything works.
BUT - I wish to be able to embed the line number of the XML node where the Java code was included from (so that if the configuration contains uncompilable code, each method will have a pointer to the source XML document and the line number for ease of debugging). I don't require the line number at parse-time and I don't need to validate the XML Source Document and throw an error at a particular line number. I need to be able to access the line number for each node and attribute in my DOM or per SAX event.
Any suggestions on how I might be able to achieve this?
P.S.
Also, I read that StAX has a method to obtain the line number whilst parsing, but ideally I would like to achieve the same result with regular SAX/DOM processing in Java 4/5 rather than become a Java 6+ application or take on extra .jar files.

I know this thread is a little old (sorry), but it has taken me so long to crack this nut I had to share the solution with someone...
You only seem to be able to obtain the line numbers with SAX which doesn't build a DOM. The DOM parser does not give the line numbers, and neither does it let you near the SAX parser it is using. My solution is to do an empty XSLT transformation using a SAX source and a DOM result, but even then someone has done their best to hide this. See the code below.
I add the location information to each element as an attribute with my own namespace, so I can find elements using XPath and report where the data came from.
Hope this helps:
// The file to parse.
String systemId = "myxml.xml";
/*
* Create transformer SAX source that adds current element position to
* the element as attributes.
*/
XMLReader xmlReader = XMLReaderFactory.createXMLReader();
LocationFilter locationFilter = new LocationFilter(xmlReader);
InputSource inputSource = new InputSource(new FileReader(systemId));
// Do this so that XPath function document() can take relative URI.
inputSource.setSystemId(systemId);
SAXSource saxSource = new SAXSource(locationFilter, inputSource);
/*
* Perform an empty transformation from SAX source to DOM result.
*/
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMResult domResult = new DOMResult();
transformer.transform(saxSource, domResult);
Node root = domResult.getNode();
...
class LocationFilter extends XMLFilterImpl {
LocationFilter(XMLReader xmlReader) {
super(xmlReader);
}
private Locator locator = null;
@Override
public void setDocumentLocator(Locator locator) {
super.setDocumentLocator(locator);
this.locator = locator;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
// Add extra attribute to elements to hold location
String location = locator.getSystemId() + ':' + locator.getLineNumber() + ':' + locator.getColumnNumber();
Attributes2Impl attrs = new Attributes2Impl(attributes);
attrs.addAttribute("http://myNamespace", "location", "myns:location", "CDATA", location);
super.startElement(uri, localName, qName, attrs);
}
}

I ran into this issue recently and thought I'd share a ready-made utility class for handling it. It works with Java 11, whereas Reg Whitton's code uses some now-deprecated classes.
Mostly based on this article, with a few tweaks. Notably, it stores the line number as the node's user data rather than setting it as an attribute.
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class XmlDom {
public static Document readXML(InputStream is, final String lineNumAttribName) throws IOException, SAXException {
final Document doc;
SAXParser parser;
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
parser = factory.newSAXParser();
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.newDocument();
} catch(ParserConfigurationException e){
throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
}
final Deque<Element> elementStack = new ArrayDeque<>();
final StringBuilder textBuffer = new StringBuilder();
DefaultHandler handler = new DefaultHandler() {
private Locator locator;
@Override
public void setDocumentLocator(Locator locator) {
this.locator = locator; //Save the locator, so that it can be used later for line tracking when traversing nodes.
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
addTextIfNeeded();
Element el = doc.createElement(qName);
for(int i = 0;i < attributes.getLength(); i++)
el.setAttribute(attributes.getQName(i), attributes.getValue(i));
el.setUserData(lineNumAttribName, String.valueOf(locator.getLineNumber()), null);
elementStack.push(el);
}
@Override
public void endElement(String uri, String localName, String qName){
addTextIfNeeded();
Element closedEl = elementStack.pop();
if (elementStack.isEmpty()) { // Is this the root element?
doc.appendChild(closedEl);
} else {
Element parentEl = elementStack.peek();
parentEl.appendChild(closedEl);
}
}
@Override
public void characters (char ch[], int start, int length) throws SAXException {
textBuffer.append(ch, start, length);
}
// Outputs text accumulated under the current node
private void addTextIfNeeded() {
if (textBuffer.length() > 0) {
Element el = elementStack.peek();
Node textNode = doc.createTextNode(textBuffer.toString());
el.appendChild(textNode);
textBuffer.delete(0, textBuffer.length());
}
}
};
parser.parse(is, handler);
return doc;
}
}
Access the line number with:
node.getUserData(lineNumAttribName);
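To see the underlying mechanism in isolation, here is a minimal sketch that only records the line number the SAX Locator reports for each start tag, without building a DOM. ElementLines is a made-up name; note that the locator reports the position where the start tag ends, so a tag spread over several lines is attributed to its last line.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects "elementName:lineNumber" pairs.
final class ElementLines {

    /** Returns one "name:line" entry per element, in document order. */
    static List<String> of(String xml) throws Exception {
        List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over the locator before parsing starts
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                lines.add(qName + ":" + locator.getLineNumber());
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }
}
```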
