Is traversing (reading) a static DOM Document object thread-safe? - java

I created a static DOM Document object as shown below, using the javax.xml.parsers.* and org.w3c.dom.* APIs:
DocumentBuilderFactory docBldrFactry = DocumentBuilderFactory.newInstance();
docBldrObj = docBldrFactry.newDocumentBuilder();
File file = new File(fileDirectory);
// Parse the XML file and return a DOM document object
document = docBldrObj.parse(file);
//FYI, document is declared as private static org.w3c.dom.Document document elsewhere.
Once the document has been created as above, if this static DOM Document object is shared by several threads, and all of the threads only read (traverse) it, is that thread-safe?
I assume it is, since reading should not modify the shared state, but I am not sure whether there is some internal magic that I don't know about.
Thanks

I solved the problem by writing my own simple document structure and cloning the DOM document into it; that structure is thread-safe for read operations.
FYI, for my own purposes I don't clone everything when copying the document, only the information I need (COMMENT_NODE, TEXT_NODE, ELEMENT_NODE, and attributes).
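A minimal sketch of what such a read-only copy might look like. The class name ReadOnlyNode and its fields are hypothetical, not part of any library, and the copy is assumed to be made by a single thread before the result is shared. Only element, text, and comment nodes are kept, as described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical immutable snapshot of one DOM node (illustrative only). */
final class ReadOnlyNode {
    final short type;                      // Node.ELEMENT_NODE, TEXT_NODE or COMMENT_NODE
    final String name;                     // element name, or "#text" / "#comment"
    final String value;                    // text or comment content; null for elements
    final Map<String, String> attributes;  // empty for non-element nodes
    final List<ReadOnlyNode> children;

    private ReadOnlyNode(short type, String name, String value,
                         Map<String, String> attributes, List<ReadOnlyNode> children) {
        this.type = type;
        this.name = name;
        this.value = value;
        // Final fields plus unmodifiable views: nothing can change after construction.
        this.attributes = Collections.unmodifiableMap(attributes);
        this.children = Collections.unmodifiableList(children);
    }

    /** Copies a node and its descendants; run this on one thread before sharing. */
    static ReadOnlyNode copyOf(Node node) {
        Map<String, String> attrs = new LinkedHashMap<>();
        NamedNodeMap domAttrs = node.getAttributes(); // null unless node is an element
        if (domAttrs != null) {
            for (int i = 0; i < domAttrs.getLength(); i++) {
                Node attr = domAttrs.item(i);
                attrs.put(attr.getNodeName(), attr.getNodeValue());
            }
        }
        List<ReadOnlyNode> kids = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            short t = c.getNodeType();
            // Other node types (CDATA sections, processing instructions, ...) are skipped.
            if (t == Node.ELEMENT_NODE || t == Node.TEXT_NODE || t == Node.COMMENT_NODE) {
                kids.add(copyOf(c));
            }
        }
        return new ReadOnlyNode(node.getNodeType(), node.getNodeName(),
                                node.getNodeValue(), attrs, kids);
    }

    /** Convenience helper for the demo: parse a string and snapshot its root element. */
    static ReadOnlyNode parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return copyOf(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        ReadOnlyNode root = parse("<a id=\"1\"><!--c--><b>hi</b></a>");
        System.out.println(root.name + " " + root.attributes + " " + root.children.size());
    }
}
```

Because every field is final and the collections are wrapped as unmodifiable, reader threads never see a partially built or changing tree once the root reference has been handed out.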

Related

How to append Elements to an existing NodeList while parsing an XSD file with Java DocumentBuilder

Application Background:
Basically, I am building an application that parses an XML document with a SAX parser. For every incoming tag I would like to know its datatype and other information, so I am using the XSD associated with that XML file. I parse the XSD file and store all the information in a HashMap, so that whenever a tag arrives I can pass that XML tag as the key and obtain the value (the information gathered during XSD parsing) associated with it.
Problem I am facing:
As of now, I am able to parse my XSD using DocumentBuilderFactory. But when collecting elements, I can get only one type of element into my NodeList, such as elements with the tag name "xs:element". My XSD also has other element types such as "xs:complexType", "xs:any", etc. I would like to read all of them into a single NodeList, which I can later loop over and push into a HashMap. However, I am unable to add any further elements to my NodeList after adding one type to it.
The code below collects the tags named xs:element:
NodeList list = doc.getElementsByTagName("xs:element");
How can I add the tags named xs:complexType and xs:any to the same NodeList?
Is this a good way to find the datatype and other attributes from the XSD, or is there a better approach? Since I may need to hit the HashMap many times for every tag in the XML, will there be a performance issue?
Is DocumentBuilderFactory a good approach for parsing an XSD, or are there better libraries for XSD parsing? I looked into Xerces2 but could not find a good example, so I got stuck and posted the question here.
Following is my code for parsing the XSD using DocumentBuilderFactory:
public class DOMParser {
    private static Map<String, Element> xmlTags = new HashMap<String, Element>();

    public static void main(String[] args) throws URISyntaxException, SAXException, IOException, ParserConfigurationException {
        String xsdPath1 = Paths.get(Xerces2Parser.class.getClassLoader().getResource("test.xsd").toURI()).toFile().getAbsolutePath();
        String filePath1 = Path.of(xsdPath1).toString();
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(new File(filePath1));
        NodeList list = doc.getElementsByTagName("xs:element");
        System.out.println(list.getLength());
        // How to add the xs:complexType to same list as above
        // list.add(doc.getElementsByTagName("xs:complexType"));
        // list = doc.getElementsByTagName("xs:complexType");
        // Loop and add data to Map for future lookups
        for (int i = 0; i < list.getLength(); i++) {
            Element element = (Element) list.item(i);
            if (element.hasAttributes()) {
                xmlTags.put(element.getAttribute("name"), element);
            }
        }
    }
}
I don't know what you are trying to achieve (you have described the code you are writing, not the problem it is designed to solve) but what you are doing seems misguided. Trying to get useful information out of an XSD schema by parsing it at the XML level is really hard work, and it's clear from the questions you are asking that you haven't appreciated the complexities of what you are attempting.
It's hard to advise you on the low-level detail of maintaining hash maps and node lists when we don't understand what you are trying to achieve. What information are you trying to extract from the schema, and why?
There are a number of ways of getting information out of a schema at a higher level. Xerces has a Java API for accessing a compiled schema. Saxon has an XML representation of compiled schemas called SCM (the difference from raw XSD is that all the work of expanding xs:include and xs:import, expanding attribute groups, model groups, and substitution groups etc has been done for you). Saxon also has an XPath API (a set of extension functions) for accessing compiled schema information.
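On the narrower question of combining results: org.w3c.dom.NodeList exposes only getLength() and item(int), so nothing can be added to it. One possible workaround, sketched below, is to copy each result into an ordinary java.util.List. The class and method names are hypothetical, and the sketch does not address the concerns raised above about reading a schema at the raw XML level; in particular it still depends on the schema using the "xs" prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper that gathers elements with several tag names into one list. */
class SchemaElementCollector {

    /** Results are grouped by tag name, in the order the names are given. */
    static List<Element> collect(Document doc, String... tagNames) {
        List<Element> result = new ArrayList<>();
        for (String tagName : tagNames) {
            NodeList found = doc.getElementsByTagName(tagName);
            for (int i = 0; i < found.getLength(); i++) {
                result.add((Element) found.item(i));
            }
        }
        return result;
    }

    /** Demo helper: parse an XSD held in a string and count the matching elements. */
    static int countIn(String xsd, String... tagNames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
            return collect(doc, tagNames).size();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"a\" type=\"T\"/>"
                + "<xs:complexType name=\"T\"><xs:sequence><xs:any/></xs:sequence></xs:complexType>"
                + "</xs:schema>";
        System.out.println(countIn(xsd, "xs:element", "xs:complexType", "xs:any"));
    }
}
```

The combined list is not in document order; if order matters, a single recursive walk over the tree that checks each element's tag name would be an alternative.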

Special characters create a problem while writing XML

First of all, please excuse my shallow understanding of coding, as I am a business analyst. Now my question: I am writing Java code to convert a CSV into XML. I am able to read the CSV successfully into objects. However, while writing the XML, an error is thrown when a special character such as a space or "=" is encountered.
Here is the problematic piece of code. I have improvised the value passed to createElement just to highlight the problem; in the actual code I get this value from an object:
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
Document xmlDocument= documentBuilder.newDocument();
Element root = xmlDocument.createElement("Media NationalGroupId=\"8\" AllFTA=\"1002\" AllSTV=\"1001\"");
xmlDocument.appendChild(root);
My xml should look something like this
<Media DateCreated="20200224 145251" NationalGroupId="8" AllFTA="1002" AllSTV="1001" AllTV="1000" NextId="1000000">
createElement should only receive Media as the argument.
To add the other attributes (DateCreated, NationalGroupId, etc), you need to call setAttribute on root, one by one.
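A short sketch of that approach, using the attribute names and values from the desired output in the question. The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical example: element name and attributes are set separately. */
class MediaElementExample {

    static Element buildMediaRoot() {
        Document xmlDocument;
        try {
            xmlDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
        // The element name must be a plain XML name: no spaces, "=" or quotes.
        Element root = xmlDocument.createElement("Media");
        // Each attribute is added with its own setAttribute(name, value) call.
        root.setAttribute("DateCreated", "20200224 145251");
        root.setAttribute("NationalGroupId", "8");
        root.setAttribute("AllFTA", "1002");
        root.setAttribute("AllSTV", "1001");
        root.setAttribute("AllTV", "1000");
        root.setAttribute("NextId", "1000000");
        xmlDocument.appendChild(root);
        return root;
    }

    public static void main(String[] args) {
        Element root = buildMediaRoot();
        System.out.println(root.getTagName() + " has "
                + root.getAttributes().getLength() + " attributes");
    }
}
```

Attribute values may contain spaces (as DateCreated does here); the restriction applies only to element and attribute names.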

DOM handling using marklogic java client api

I'm new to the MarkLogic Java API and am trying to create an XML document. When the Document is constructed using DocumentBuilderFactory and DocumentBuilder, everything works fine with the following code.
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder=factory.newDocumentBuilder();
Document doc=docBuilder.newDocument(); //Works fine
Now that I have the doc reference, I can call doc.createElement() to build up a structured XML document.
In the same way if I refer a Document using DOMHandle from com.marklogic.client.io.DOMHandle;
DOMHandle handle=new DOMHandle();
Document doc=handle.get();
doc.createElement(); //NULL POINTER EXCEPTION
Now the document reference obtained from the handle gives a NullPointerException.
I understand that I am getting the document from a getter method that returns an empty document, but I am not trying to read anything from the empty document. Instead I am trying to create an element using doc.createElement(), and that is where the NullPointerException arises.
Please explain the issue.
A DOMHandle represents XML content as a DOM Document. It is not a factory which would create a DOM Document. The handle is just an adapter that wraps a document that we read from the database or create in Java. Unless explicitly set with the constructor DOMHandle(Document content) or with the method public void set(Document content), the content of the DOMHandle would be null and hence the NullPointerException. You should probably do one of these
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder=factory.newDocumentBuilder();
Document doc=docBuilder.newDocument();
// Build the Document completely and assign it to the handle and use the handle
DOMHandle handle = new DOMHandle();
handle.set(doc);
// or DOMHandle handle = new DOMHandle(doc);
// or DOMHandle handle = new DOMHandle().with(doc);

Mockito: How to mock XML Document object to retrieve value from XPath

I am trying to mock a org.w3c.dom.Document object such that calling XPath.evaluate() should return a defined value, e.g., foo, as below:
Document doc = Mockito.mock(Document.class);
Mockito.when(XPathUtils.xpath.evaluate("/MyNode/text()", doc, XPathConstants.STRING)).thenReturn("foo");
I shall pass the doc object to the target method that will extract the textual contents of node MyNode as foo.
I have tried mocking nodes and setting in the doc object as follows:
Node nodeMock = mock(Node.class);
NodeList list = Mockito.mock(NodeList.class);
Element element = Mockito.mock(Element.class);
Mockito.when(list.getLength()).thenReturn(1);
Mockito.when(list.item(0)).thenReturn(nodeMock);
Mockito.when(doc.getNodeType()).thenReturn(Node.DOCUMENT_NODE);
Mockito.when(element.getNodeType()).thenReturn(Node.ELEMENT_NODE);
Mockito.when(nodeMock.getNodeType()).thenReturn(Node.TEXT_NODE);
Mockito.when(doc.hasChildNodes()).thenReturn(false);
Mockito.when(element.hasChildNodes()).thenReturn(true);
Mockito.when(nodeMock.hasChildNodes()).thenReturn(false);
Mockito.when(nodeMock.getNodeName()).thenReturn("MyNode");
Mockito.when(nodeMock.getTextContent()).thenReturn("MyValue");
Mockito.when(element.getChildNodes()).thenReturn(list);
Mockito.when(doc.getDocumentElement()).thenReturn(element);
But this is giving error like:
org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
String cannot be returned by hasChildNodes()
hasChildNodes() should return boolean
Is my approach correct and I am missing just another mock, or should I approach it differently? Please help.
Don't mock types you don't own; it's wrong.
To avoid repeating it here, this answer explains why: https://stackoverflow.com/a/28698223/48136
EDIT: What I mean is that the code should have usable builder methods (in either the production or the test classpath) that can create a real Document, whatever the source of that document, but certainly not a mock.
For example, this factory method or builder could use the DocumentBuilder this way:
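class FakeXMLBuilder {
    // Needs java.io.ByteArrayInputStream and java.nio.charset.StandardCharsets in addition to the JAXP imports.
    static Document fromString(String xml) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        // "UTF_8" is not a valid charset name; use the StandardCharsets constant instead.
        return dBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}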
Of course this has to be tailored to the needs of the project, and it can be customized a lot more. In my current project we have a lot of test builders that can create objects, JSON, etc. For example:
Leg leg = legWithRandomId().contact("Bob").duration(234, SECONDS).build();
String leg = legWithRandomId().contact("Bob").duration(234, SECONDS).toJSON();
String leg = legWithRandomId().contact("Bob").duration(234, SECONDS).toXML();
Whatever leg = legWithRandomId().contact("Bob").duration(234, SECONDS).to(WhateverFactory.whateverFactory());
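To illustrate why a real Document makes the mock unnecessary, here is a minimal sketch that parses a small XML string and evaluates the XPath expression from the question against it. The class and method names are hypothetical; the JAXP and XPath calls are the standard ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical test helper: evaluate an XPath expression against real parsed XML. */
class RealDocumentXPathExample {

    static String evaluate(String xml, String expression) {
        try {
            // Build a real org.w3c.dom.Document from the test XML.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // XPath.evaluate(String, Object) returns the result as a String.
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("<MyNode>foo</MyNode>", "/MyNode/text()"));
    }
}
```

The Document built this way can be passed straight to the method under test, so no stubbing of getNodeType(), hasChildNodes(), and so on is needed.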
Use Jsoup.parse() with a test XML String (note that jsoup returns its own org.jsoup.nodes.Document type, not an org.w3c.dom.Document). Something like the following should work for testing an instance method that I've assumed is testClassInstance.readFromConnection(String url):
// Turn a block of XML (static String or one read from a file) into a Document
Document document = Jsoup.parse(articleXml);
// Tell Mockito what to do with the Document
Mockito.when(testClassInstance.readFromConnection(Mockito.any()))
.thenReturn(document);
I'm used to referring to this as "mocking" a Document, but it's just creating and using a real Document, which is then used in mocking other methods. You can construct the xml version of document however you like, or use its regular setters to manipulate it for testing whatever your code is supposed to do with the file. You could also replace that mess of reading-the-file with a constant string, provided it's short enough to be manageable.

Steps for creating a Document object

I am learning about XML in Java and every time I want to use a Document object I have to write:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
I know how it works further, but what actually happens in those 3 lines? Why do I need a DocumentBuilderFactory and then a DocumentBuilder to build a Document?
Update: Could you give me an example where I shouldn't write the first 2 lines exactly the same? I don't see the point of instantiating 2 more objects for a new Document. What is their effective role?
1) Factory (creates something) can create a DocumentBuilder
Obtain a new instance of a DocumentBuilderFactory. This static method
creates a new factory instance.
2)
Creates a new instance of a DocumentBuilder using the currently
configured parameters.
3)
Parse the content of the given file as an XML document and return a
new DOM Document object. An IllegalArgumentException is thrown if the
File is null.
Source
This is how the library is built. Without the factory you cannot create a new DocumentBuilder object, and so you cannot parse a file.
The approach you use for the XML parsing is known as the Document Object Model (DOM) approach (note: it is not the only one available) and a part of Java API for XML Processing (JAXP). Quoting:
Designed to be flexible, JAXP allows you to use any XML-compliant
parser from within your application
To allow the programmer to use any XML parser, the system needs to avoid using a specific implementation. To be able to do that it decides the implementation during runtime using a design pattern known as the Factory pattern which (quoting) "...deals with the problem of creating objects (products) without specifying the exact class of object that will be created."
So when you use DocumentBuilder dBuilder = dbFactory.newDocumentBuilder(); the returned instance is not actually a plain DocumentBuilder (it couldn't be, since this is an abstract class) but an instance of another class that extends DocumentBuilder. You can print the actual class at runtime to verify that.
// returns com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl in my system
System.out.println( dbFactory.getClass().getName() );
// returns com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl in my system
System.out.println( dBuilder.getClass().getName() );
Cases where you wouldn't need the first two lines are those where you use a specific parser implementation directly (and thus introduce a third-party dependency into your project).
I hope this helps
Right from the javadocs:
DocumentBuilderFactory.newInstance()
Obtain a new instance of a DocumentBuilderFactory. This static method
creates a new factory instance. This method uses the following ordered
lookup procedure to determine the DocumentBuilderFactory
implementation class to load:
1) Use the javax.xml.parsers.DocumentBuilderFactory system property.
2) Use the properties file "lib/jaxp.properties" in the JRE directory. This configuration file is in standard java.util.Properties format
and contains the fully qualified name of the implementation class
with the key being the system property defined above. The
jaxp.properties file is read only once by the JAXP implementation and
it's values are then cached for future use. If the file does not
exist when the first attempt is made to read from it, no further
attempts are made to check for its existence. It is not possible to
change the value of any property in jaxp.properties after it has been
read for the first time.
3) Use the Services API (as detailed in the JAR specification), if available, to determine the classname. The Services API will look for
a classname in the file
META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars
available to the runtime.
4) Platform default DocumentBuilderFactory instance.
Once an application has obtained a reference to a
DocumentBuilderFactory it can use the factory to configure and obtain
parser instances.
DocumentBuilderFactory.newDocumentBuilder()
Creates a new instance of a DocumentBuilder using the currently
configured parameters.
Returns: A new instance of a DocumentBuilder.
Throws: ParserConfigurationException - if a DocumentBuilder cannot be
created which satisfies the configuration requested.
DocumentBuilder.parse()
Parse the content of the given file as an XML document and return a
new DOM Document object. An IllegalArgumentException is thrown if the
File is null.
Parameters: f - The file containing the XML to parse.
Returns: A new DOM Document object.
Throws: IOException - If any IO errors occur. SAXException - If any
parse errors occur. IllegalArgumentException - When f is null
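As a rough illustration of the "configure and obtain parser instances" step quoted above, the sketch below sets two options on the factory and then checks that the builder it produces reflects them. This is one case where the first two lines would not be written exactly the same each time. The class and method names are hypothetical; setNamespaceAware, setIgnoringComments, and isNamespaceAware are standard JAXP methods.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

/** Hypothetical example: the factory holds the settings, the builder inherits them. */
class FactoryConfigurationExample {

    static DocumentBuilder newBuilder(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Settings are applied to the factory, before any builder exists.
            factory.setNamespaceAware(namespaceAware);
            factory.setIgnoringComments(true);
            // Every builder created from now on uses the current configuration.
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Requested configuration is not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newBuilder(true).isNamespaceAware());
        System.out.println(newBuilder(false).isNamespaceAware());
    }
}
```

Separating the two objects means one configured factory can hand out many identically configured builders, for example one per thread.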
