Steps for creating a Document object - java

I am learning about XML in Java and every time I want to use a Document object I have to write:
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
I know how it works further, but what actually happens in those 3 lines? Why do I need a DocumentBuilderFactory and then a DocumentBuilder to build a Document?
Update: Could you give me an example where I shouldn't write the first 2 lines exactly the same? I don't see the point of instantiating 2 more objects for a new Document. What is their effective role?

1) Factory (creates something) can create a DocumentBuilder
Obtain a new instance of a DocumentBuilderFactory. This static method
creates a new factory instance.
2)
Creates a new instance of a DocumentBuilder using the currently
configured parameters.
3)
Parse the content of the given file as an XML document and return a
new DOM Document object. An IllegalArgumentException is thrown if the
File is null.
Source
This is how the library is built. Without the factory you cannot create a new DocumentBuilder object, and so you cannot parse a file.
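As a minimal sketch of the three steps in one place, the example below parses an in-memory string rather than the fXmlFile from the question. The class name ThreeStepsDemo and the helper rootNameOf are hypothetical, introduced only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ThreeStepsDemo {

    // Hypothetical helper: runs the three steps and returns the root element's name.
    static String rootNameOf(String xml) {
        try {
            // 1) Ask JAXP for whichever factory implementation is configured.
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            // 2) Let the factory create a parser using its current settings.
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            // 3) Parse the input into a DOM tree.
            Document doc = dBuilder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap the checked exceptions so callers of this sketch stay simple.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNameOf("<note><to>Bob</to></note>")); // prints "note"
    }
}
```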

The approach you use for the XML parsing is known as the Document Object Model (DOM) approach (note: it is not the only one available) and a part of Java API for XML Processing (JAXP). Quoting:
Designed to be flexible, JAXP allows you to use any XML-compliant
parser from within your application
To allow the programmer to use any XML parser, the system needs to avoid using a specific implementation. To be able to do that it decides the implementation during runtime using a design pattern known as the Factory pattern which (quoting) "...deals with the problem of creating objects (products) without specifying the exact class of object that will be created."
So when you call DocumentBuilder dBuilder = dbFactory.newDocumentBuilder(); the returned instance is not literally a DocumentBuilder (it couldn't be, since that is an abstract class) but an instance of another class that extends DocumentBuilder. You can print the actual class at runtime to verify that.
// prints com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl on my system
System.out.println( dbFactory.getClass().getName() );
// prints com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl on my system
System.out.println( dBuilder.getClass().getName() );
The cases where you wouldn't need the first two lines are those where you use a specific parser implementation directly (and thus introduce a third-party dependency into your project).
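On the update in the question: one situation where the first two lines are not written identically is when the factory is configured before the builder is created, since newDocumentBuilder() uses "the currently configured parameters". The sketch below is only an illustration; the class name ConfiguredFactoryDemo and the helper parseNamespaceAware are hypothetical, and the two settings shown are just examples of what can be configured.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ConfiguredFactoryDemo {

    // Hypothetical helper: configures the factory before asking it for a builder.
    static Document parseNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            // Settings applied here are inherited by every builder the factory creates.
            dbFactory.setNamespaceAware(true);
            dbFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            return dBuilder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseNamespaceAware(
                "<p:root xmlns:p=\"urn:example\"><p:child/></p:root>");
        // With a namespace-aware parser the namespace URI is available on the element.
        System.out.println(doc.getDocumentElement().getNamespaceURI()); // prints "urn:example"
    }
}
```

The factory is the place where such choices live, which is why it exists as a separate object from the builder.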
I hope this helps

Right from the javadocs:
DocumentBuilderFactory.newInstance()
Obtain a new instance of a DocumentBuilderFactory. This static method
creates a new factory instance. This method uses the following ordered
lookup procedure to determine the DocumentBuilderFactory
implementation class to load:
Use the javax.xml.parsers.DocumentBuilderFactory system property.
Use the properties file "lib/jaxp.properties" in the JRE directory. This configuration file is in standard java.util.Properties format
and contains the fully qualified name of the implementation class
with the key being the system property defined above. The
jaxp.properties file is read only once by the JAXP implementation and
its values are then cached for future use. If the file does not
exist when the first attempt is made to read from it, no further
attempts are made to check for its existence. It is not possible to
change the value of any property in jaxp.properties after it has been
read for the first time.
Use the Services API (as detailed in the JAR specification), if available, to determine the classname. The Services API will look for
a classname in the file
META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars
available to the runtime.
Platform default DocumentBuilderFactory instance.
Once an application has obtained a reference to a
DocumentBuilderFactory it can use the factory to configure and obtain
parser instances.
DocumentBuilderFactory.newDocumentBuilder()
Creates a new instance of a DocumentBuilder using the currently
configured parameters.
Returns: A new instance of a DocumentBuilder.
Throws: ParserConfigurationException - if a DocumentBuilder cannot be
created which satisfies the configuration requested.
DocumentBuilder.parse()
Parse the content of the given file as an XML document and return a
new DOM Document object. An IllegalArgumentException is thrown if the
File is null.
Parameters: f - The file containing the XML to parse.
Returns: A new DOM Document object.
Throws: IOException - If any IO errors occur. SAXException - If any
parse errors occur. IllegalArgumentException - When f is null

Related

Special characters creates problem while writing xml

First of all, please excuse my shallow understanding of coding, as I am a business analyst. Now my question: I am writing Java code to convert a CSV into XML. I can read the CSV into objects successfully. However, while writing the XML, an error is thrown when a special character such as a space or "=" is encountered.
Here is a piece of the problematic code. I have improvised the value passed to createElement just to highlight the problem; in the actual code I get this value from an object:
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
Document xmlDocument= documentBuilder.newDocument();
Element root = xmlDocument.createElement("Media NationalGroupId=\"8\" AllFTA=\"1002\" AllSTV=\"1001\"");
xmlDocument.appendChild(root);
My xml should look something like this
<Media DateCreated="20200224 145251" NationalGroupId="8" AllFTA="1002" AllSTV="1001" AllTV="1000" NextId="1000000">
createElement should only receive Media as the argument.
To add the other attributes (DateCreated, NationalGroupId, etc), you need to call setAttribute on root, one by one.
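A minimal sketch of that advice, using the attribute names and values from the sample XML in the question. The class name MediaElementDemo and the helper buildMediaRoot are hypothetical; in the real code the values would come from the CSV objects.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MediaElementDemo {

    // Hypothetical helper: builds the <Media> root with its attributes set one by one.
    static Element buildMediaRoot() {
        try {
            DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
            Document xmlDocument = documentBuilder.newDocument();

            // The element name is only "Media"; it must not contain spaces or "=".
            Element root = xmlDocument.createElement("Media");
            // Each attribute is a separate name/value pair.
            root.setAttribute("DateCreated", "20200224 145251");
            root.setAttribute("NationalGroupId", "8");
            root.setAttribute("AllFTA", "1002");
            root.setAttribute("AllSTV", "1001");
            root.setAttribute("AllTV", "1000");
            root.setAttribute("NextId", "1000000");

            xmlDocument.appendChild(root);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DocumentBuilder", e);
        }
    }

    public static void main(String[] args) {
        Element root = buildMediaRoot();
        System.out.println(root.getTagName());                      // prints "Media"
        System.out.println(root.getAttribute("NationalGroupId"));   // prints "8"
    }
}
```

Attribute values may contain spaces (as DateCreated does here); the restriction applies only to element and attribute names.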

DOM handling using marklogic java client api

I'm new to MarkLogic java API and trying to create an xml document where Document is constructed using DocumentBuilderFactory and DocumentBuilder and everything is working fine with the following code.
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder=factory.newDocumentBuilder();
Document doc=docBuilder.newDocument(); //Works fine
Now that I have the doc reference, I can call doc.createElement() to build an XML-structured document.
In the same way, I tried to obtain a Document using DOMHandle from com.marklogic.client.io.DOMHandle:
DOMHandle handle=new DOMHandle();
Document doc=handle.get();
doc.createElement(); //NULL POINTER EXCEPTION
Now the document reference obtained from the handle gives a null pointer exception.
I understand that I am getting the document from a getter method which returns an empty document, but I am not trying to access anything in the empty document. Instead I am trying to create a document element using doc.createElement(), and that is where the null pointer exception arises.
Please explain the issue.
A DOMHandle represents XML content as a DOM Document. It is not a factory which would create a DOM Document. The handle is just an adapter that wraps a document that we read from the database or create in Java. Unless explicitly set with the constructor DOMHandle(Document content) or with the method public void set(Document content), the content of the DOMHandle would be null and hence the NullPointerException. You should probably do one of these
DocumentBuilderFactory factory=DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder=factory.newDocumentBuilder();
Document doc=docBuilder.newDocument();
// Build the Document completely and assign it to the handle and use the handle
DOMHandle handle = new DOMHandle();
handle.set(doc);
// or DOMHandle handle = new DOMHandle(doc);
// or DOMHandle handle = new DOMHandle().with(doc);

How to read file in compilation time using Java?

I have a project which consists of reading 1000 XML files, each defining a rule of processing the different types of data. The consequence is that the application takes a few seconds to load the XML files when it starts. It's an Android mobile app so the CPU isn't very powerful.
Is there a way to create static objects at compilation time by reading these XML files? If I can pre-process the XML by defining static objects that already have the XML read into them, the app should be able to start already loaded, and a lot faster. The drawback that the XML files can't change at runtime is acceptable.
This is a generic question - I am not bound to use any specific method or library. Anything that allows me to pre-parse the XML will do. But as comments asked for my current runtime-parsing implementation, I provide it in the following paragraphs which uses the DOM parser shipped with Java.
The current implementation:
The XML processing class simply creates an object by reading each XML file. It is used like this:
lst.add(new XMLData(new FileInputStream(new File("assets/001.xml"))));
lst.add(new XMLData(new FileInputStream(new File("assets/002.xml"))));
....
Where XMLData is the object that reads the XML file and keeps the relevant information. lst is a List of such objects.
The XMLData class looks like this:
class XMLData {
public XMLData(InputStream xml) throws IOException, SAXException {
DocumentBuilder dBuilder;
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dBuilder = dbFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
// TODO: if schema has problems (e.g. defined twice).
// all XML well-formedness were checked before shipping them
e.printStackTrace(); // shouldn't happen
return;
}
Document doc = dBuilder.parse(xml); // parse the stream passed to the constructor

Mockito: How to mock XML Document object to retrieve value from XPath

I am trying to mock an org.w3c.dom.Document object so that calling XPath.evaluate() returns a defined value, e.g., foo, as below:
Document doc = Mockito.mock(Document.class);
Mockito.when(XPathUtils.xpath.evaluate("/MyNode/text()", doc, XPathConstants.STRING)).thenReturn("foo");
I shall pass the doc object to the target method that will extract the textual contents of node MyNode as foo.
I have tried mocking nodes and setting in the doc object as follows:
Node nodeMock = mock(Node.class);
NodeList list = Mockito.mock(NodeList.class);
Element element = Mockito.mock(Element.class);
Mockito.when(list.getLength()).thenReturn(1);
Mockito.when(list.item(0)).thenReturn(nodeMock);
Mockito.when(doc.getNodeType()).thenReturn(Node.DOCUMENT_NODE);
Mockito.when(element.getNodeType()).thenReturn(Node.ELEMENT_NODE);
Mockito.when(nodeMock.getNodeType()).thenReturn(Node.TEXT_NODE);
Mockito.when(doc.hasChildNodes()).thenReturn(false);
Mockito.when(element.hasChildNodes()).thenReturn(true);
Mockito.when(nodeMock.hasChildNodes()).thenReturn(false);
Mockito.when(nodeMock.getNodeName()).thenReturn("MyNode");
Mockito.when(nodeMock.getTextContent()).thenReturn("MyValue");
Mockito.when(element.getChildNodes()).thenReturn(list);
Mockito.when(doc.getDocumentElement()).thenReturn(element);
But this gives an error like:
org.mockito.exceptions.misusing.WrongTypeOfReturnValue: String cannot
be returned by hasChildNodes() hasChildNodes() should return boolean
Is my approach correct and I am missing just another mock, or should I approach it differently? Please help.
Don't mock types you don't own! It's wrong.
To avoid repeating it here, this answer explains why: https://stackoverflow.com/a/28698223/48136
EDIT: What I mean is that the code should have usable builder methods (either in the production or in the test classpath) that can create a real Document, whatever the source of that document, but certainly not a mock.
For example, this factory method or builder could use the DocumentBuilder this way:
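class FakeXMLBuilder {
    static Document fromString(String xml) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        // "UTF_8" is not a valid charset name; use the java.nio.charset.StandardCharsets constant
        return dBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}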
Of course this has to be tailored to the needs of the project, and it can be customized a lot more. In my current project we have a lot of test builders that can create objects, JSON, etc. For example:
Leg leg = legWithRandomId().contact("Bob").duration(234, SECONDS).build();
String leg = legWithRandomId().contact("Bob").duration(234, SECONDS).toJSON();
String leg = legWithRandomId().contact("Bob").duration(234, SECONDS).toXML();
Whatever leg = legWithRandomId().contact("Bob").duration(234, SECONDS).to(WhateverFactory.whateverFactory());
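Coming back to the XPath in the question, here is a sketch of how a real Document built from a string satisfies "/MyNode/text()" without any mocks. The class name XPathOnRealDocumentDemo and both helper methods are hypothetical, and the sketch creates its own XPath instance rather than using the XPathUtils.xpath field from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class XPathOnRealDocumentDemo {

    // Hypothetical test helper: builds a real DOM Document from an XML string.
    static Document fromString(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse test XML", e);
        }
    }

    // Stands in for the target method that extracts the text of MyNode.
    static String textOfMyNode(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate("/MyNode/text()", doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document doc = fromString("<MyNode>foo</MyNode>");
        System.out.println(textOfMyNode(doc)); // prints "foo"
    }
}
```

In a test, the real Document from fromString(...) is passed to the method under test in place of the mocked one.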
Use Jsoup.parse() with a test XML String. Something like the following should work, for testing some instance method I've assumed is testClassInstance.readFromConnection(String url):
// Turn a block of XML (static String or one read from a file) into a Document
Document document = Jsoup.parse(articleXml);
// Tell Mockito what to do with the Document
Mockito.when(testClassInstance.readFromConnection(Mockito.any()))
.thenReturn(document);
I'm used to referring to this as "mocking" a Document, but it's just creating and using a real Document, which is then used in mocking other methods. You can construct the xml version of document however you like, or use its regular setters to manipulate it for testing whatever your code is supposed to do with the file. You could also replace that mess of reading-the-file with a constant string, provided it's short enough to be manageable.

Traverse (Read) static DOM document object is thread-safe or not?

I created a static DOM document object as shown below, using the javax.xml.parsers.* and org.w3c.dom.* APIs:
DocumentBuilderFactory docBldrFactry = DocumentBuilderFactory.newInstance();
docBldrObj = docBldrFactry.newDocumentBuilder();
File file = new File(fileDirectory);
// Parse the XML file and return a DOM document object
document = docBldrObj.parse(file);
//FYI, document is declared as private static org.w3c.dom.Document document elsewhere.
Once it has been created as above, if this static DOM document object is shared between threads, and all threads only read (traverse) the document, is that thread-safe?
I assume it is, since reads should not modify the shared state, but I am not sure whether something happens internally that I don't know about.
Thanks
I solved the problem by writing my own simple document structure and cloning the DOM document into it; that structure is thread-safe for read operations.
FYI, for my own purposes I don't clone everything, only the information I need (COMMENT_NODE, TEXT_NODE, ELEMENT_NODE, attributes).
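One possible shape for such a structure is sketched below; it is not the answerer's actual code. The class ElementSnapshot and its methods are hypothetical, and for brevity the sketch copies only elements, attributes and text, skipping comments. The idea is that all fields are final and the collections are unmodifiable, so nothing changes after construction and concurrent readers have no shared mutable state to race on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical immutable copy of a DOM element.
final class ElementSnapshot {
    final String name;
    final Map<String, String> attributes;
    final String text;
    final List<ElementSnapshot> children;

    private ElementSnapshot(String name, Map<String, String> attributes,
                            String text, List<ElementSnapshot> children) {
        this.name = name;
        this.attributes = Collections.unmodifiableMap(attributes);
        this.text = text;
        this.children = Collections.unmodifiableList(children);
    }

    // Recursively copies one DOM element; call this from a single thread.
    static ElementSnapshot copyOf(Element element) {
        Map<String, String> attrs = new LinkedHashMap<>();
        NamedNodeMap domAttrs = element.getAttributes();
        for (int i = 0; i < domAttrs.getLength(); i++) {
            Node attr = domAttrs.item(i);
            attrs.put(attr.getNodeName(), attr.getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        List<ElementSnapshot> kids = new ArrayList<>();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                kids.add(copyOf((Element) c));
            } else if (c.getNodeType() == Node.TEXT_NODE) {
                text.append(c.getNodeValue());
            }
        }
        return new ElementSnapshot(element.getTagName(), attrs, text.toString(), kids);
    }

    // Convenience for the sketch: parse a string and snapshot its root element.
    static ElementSnapshot fromXml(String xml) {
        try {
            return copyOf(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        ElementSnapshot root = fromXml("<a id=\"1\"><b>hi</b></a>");
        System.out.println(root.children.get(0).text); // prints "hi"
    }
}
```

The snapshot would be built once, in one thread, and then stored in the static field in place of the org.w3c.dom.Document.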
