Hi,
I created a class that has a single constructor, as follows:
public class ABC {
private static String host;
private static String port;
private static String browser;
private static String url;
private static String fullurl;
public ABC(){
try {
File file = new File("Element.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
Please tell me how to write a test case for this constructor.
First of all, doc is not the output. It is a local variable inside the constructor and can't be tested or validated in a unit test. On the other hand, you can rely on the (already tested) parser: it will produce the correct DOM for the given input file.
What you may want to test is whether the values from the input file are stored in the fields as specified.
So create an input file with legal values, create an instance, and assert that the fields contain the correct values:
@Test
public void testABCWithLegalValues() {
ABC abc = new ABC("correct.xml"); // NOTE! I invented a new constructor
// to allow passing test config files!!
assertEquals("www.google.com", abc.getHost());
assertEquals(80, abc.getPort());
// ...
}
This is an example test method based on JUnit 4.
Other tests would feed the constructor malformed XML files or files with illegal data (such as a port number greater than 65535) and verify that the class reacts as specified.
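As a rough illustration of what such negative tests would check, here is a minimal sketch that uses only the JDK parser. The class name, both helper methods and the "1 to 65535" port rule are assumptions made for this example; they are not part of the ABC class from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NegativeInputSketch {

    // Hypothetical helper: true if the JDK parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // malformed input
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical rule (an assumption): a port is legal if it is an integer from 1 to 65535.
    static boolean isLegalPort(String port) {
        try {
            int value = Integer.parseInt(port.trim());
            return value >= 1 && value <= 65535;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<selenium><name>port</name></selenium>"));
        System.out.println(isWellFormed("<selenium><name>port</selenium>"));
        System.out.println(isLegalPort("65536"));
    }
}
```

In a real test you would pass such inputs to the class under test and assert on the exception or error state it is specified to produce.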
I'm not sure what kind of test case you need.
However, you can verify the final result this way: feed in the XML and assert the values of host, port, browser, url and fullurl.
You may need to refactor the class so that the XML text or file can be set by the test case.
Your class is performing two distinct tasks:
Reading a file and parsing it as a Document
Processing the Document to determine host, port, browser, url and fullurl
Since this currently all occurs within the constructor and the file name is hardcoded, this class is pretty hard to unit test.
If you can modify the class, then here are my suggestions to make this class testable:
Don't hardcode the name of the file to be parsed. I would pass it as a constructor argument here because you don't need the fileName later on so no need to keep it as a private field.
Separate the tasks, let the constructor read the file and create a separate method to process the document.
Since you wanted the codez, here is the modified class:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.log4j.Logger;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class ABC {
private static final Logger LOG = Logger.getLogger(ABC.class);
private static final String DEFAULT_FILENAME = "Element.xml";
private String host;
private String port;
private String browser;
private String url;
private String fullurl;
public static class AbcFileAccessException extends Exception {
private static final long serialVersionUID = 1L;
public AbcFileAccessException(Exception e) {
super(e);
}
}
public ABC() throws AbcFileAccessException {
this(DEFAULT_FILENAME);
}
public ABC(String fileName) throws AbcFileAccessException {
File file = new File(fileName);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
process(db.parse(file));
} catch (ParserConfigurationException e) {
throw new AbcFileAccessException(e);
} catch (SAXException e) {
throw new AbcFileAccessException(e);
} catch (IOException e) {
throw new AbcFileAccessException(e);
}
}
public ABC(Document document) {
process(document);
}
public void process(Document document) {
if (document == null) {
throw new IllegalArgumentException("Document may not be null");
}
document.getDocumentElement().normalize();
LOG.info("Root element " + document.getDocumentElement().getNodeName());
NodeList nodeLst = document.getElementsByTagName("selenium");
for (int s = 0; s < nodeLst.getLength(); s++) {
Node fstNode = nodeLst.item(s);
if (fstNode.getNodeType() == Node.ELEMENT_NODE) {
Element fstElmnt = (Element) fstNode;
NodeList fstNmElmntLst = fstElmnt.getElementsByTagName("name");
Element fstNmElmnt = (Element) fstNmElmntLst.item(0);
NodeList fstNm = fstNmElmnt.getChildNodes();
String name = ((Node) fstNm.item(0)).getNodeValue();
NodeList lstNmElmntLst = fstElmnt.getElementsByTagName("value");
Element lstNmElmnt = (Element) lstNmElmntLst.item(0);
NodeList lstNm = lstNmElmnt.getChildNodes();
String value = ((Node) lstNm.item(0)).getNodeValue();
if (name.equals("host")) {
host = value;
}
if (name.equals("port")) {
port = value;
}
if (name.equals("browser")) {
browser = value;
}
if (name.equals("url")) {
url = value;
}
if (name.equals("fullurl")) {
fullurl = value;
}
}
}
}
public String getHost() {
return host;
}
public String getPort() {
return port;
}
public String getBrowser() {
return browser;
}
public String getUrl() {
return url;
}
public String getFullurl() {
return fullurl;
}
}
Other improvements I made:
Avoid static fields for runtime data like this. If they are private (as in your example), they can simply be instance fields, since you are already creating a (non-singleton) instance of the class. If you intended them to be accessed by other classes it is even worse, because those other classes could access the fields like ABC.host, which makes them hard to test and locks them in to your implementation class. Let's not go into that now (-:
NEVER call setters from a constructor (see http://www.javapractices.com/topic/TopicAction.do?Id=215 for an explanation).
Scope try-catch blocks as narrowly as possible (or practical). This makes your code more readable because it is clear where the exceptions are being thrown.
Catch each exception type separately. Bundling them together makes the code less readable. I agree this is a pain for some parts of the API (try using reflection), but it is good practice. Assume a developer should be able to read and understand your code from a printout (so without the hovering and code navigation features of your IDE).
Don't handle exceptions by calling printStackTrace, logging an error or throwing a RuntimeException if you can avoid it. If you do, at least document these error conditions thoroughly. It is OK to create your own exception types for error conditions; this makes for a very understandable API (so other developers don't have to delve into your code, but can use the class after reading the JavaDoc).
Don't use System.out.println for logging, use a logging framework like Log4j.
This class can now be tested as follows:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import xcon.pilot.ABC.AbcFileAccessException;
public class ABCTest {
private Document document;
@Before
public void setUp() throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.newDocument();
}
@Test
public void testProcess() throws ParserConfigurationException,
AbcFileAccessException, TransformerFactoryConfigurationError,
TransformerException {
Element root = (Element) document.createElement("root");
document.appendChild(root);
String host = "myhost";
String port = "myport";
String browser = "mybrowser";
String url = "myurl";
String fullurl = "myfullurl";
root.appendChild(createElement("host", host));
root.appendChild(createElement("port", port));
root.appendChild(createElement("browser", browser));
root.appendChild(createElement("url", url));
root.appendChild(createElement("fullurl", fullurl));
// for your convenience
printXml();
ABC instance = new ABC(document);
Assert.assertEquals(host, instance.getHost());
Assert.assertEquals(port, instance.getPort());
Assert.assertEquals(browser, instance.getBrowser());
Assert.assertEquals(url, instance.getUrl());
Assert.assertEquals(fullurl, instance.getFullurl());
}
private Element createElement(String name, String value) {
Element result = (Element) document.createElement("selenium");
Element nameElement = document.createElement("name");
nameElement.setTextContent(name);
result.appendChild(nameElement);
Element valueElement = document.createElement("value");
valueElement.setTextContent(value);
result.appendChild(valueElement);
return result;
}
private void printXml() throws TransformerConfigurationException,
TransformerFactoryConfigurationError, TransformerException {
Transformer transformer = TransformerFactory.newInstance()
.newTransformer();
Source source = new DOMSource(document);
Result output = new StreamResult(System.out);
transformer.transform(source, output);
}
}
This tests your Document processing logic. Testing the reading and parsing of files is notably tricky and can't really be seen as unit testing because you are always dependent on the operating system and its filesystem. I usually leave that as part of integration testing and build/deployment support. It helps to build good sanity checks and error handling in your code so missing/incorrect files are reported clearly and early.
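A sanity check of that kind could look roughly like the sketch below. The class name, the method name and the message wording are made up for illustration; the idea is only to turn a missing or unreadable file into a clear message before any parsing starts.

```java
import java.io.File;
import java.util.Optional;

class FileSanityCheckSketch {

    // Hypothetical helper: describes what is wrong with the file, or returns empty if it looks usable.
    static Optional<String> problemWith(File file) {
        if (!file.exists()) {
            return Optional.of("Configuration file not found: " + file.getAbsolutePath());
        }
        if (!file.isFile()) {
            return Optional.of("Not a regular file: " + file.getAbsolutePath());
        }
        if (!file.canRead()) {
            return Optional.of("Configuration file is not readable: " + file.getAbsolutePath());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Report the problem early, before handing the file to the XML parser.
        problemWith(new File("Element.xml")).ifPresent(System.err::println);
    }
}
```

The constructor could run such a check first and wrap the message in its own exception type, so the caller sees the file path instead of a low-level parser error.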
Hope this helped you.
Related
I have created a method that digitally signs an XML document (XAdES with timestamping) using the xades4j library: https://github.com/luisgoncalves/xades4j
Unfortunately the XML files are sometimes quite big (1.8 GB), and I was wondering if there is a way to do this by streaming the XML instead of creating a DOM and loading the whole document into memory. Is there a way? Can I do that with xades4j?
Below is the current code, which signs the document using a DOM representation of the XML. The entry point is signXml().
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStoreException;
import javax.annotation.PostConstruct;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.commons.io.FilenameUtils;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import com.safe.AbstractManager;
import com.safe.model.DirectPasswordProvider;
import xades4j.algorithms.ExclusiveCanonicalXMLWithoutComments;
import xades4j.production.Enveloped;
import xades4j.production.SignatureAlgorithms;
import xades4j.production.XadesSigner;
import xades4j.production.XadesTSigningProfile;
import xades4j.providers.KeyingDataProvider;
import xades4j.providers.impl.FileSystemKeyStoreKeyingDataProvider;
import xades4j.providers.impl.HttpTsaConfiguration;
import xades4j.providers.impl.KeyStoreKeyingDataProvider;
import xades4j.utils.DOMHelper;
@Component
public class FileOperationsManager extends AbstractManager {
@Value("${certificates.digital-signature.filepath}")
private String certPath;
@Value("${certificates.digital-signature.password}")
private String certPass;
@Value("${certificates.digital-signature.type}")
private String certType;
private DocumentBuilder db;
private TransformerFactory tf;
@PostConstruct
public void init() throws Exception {
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
this.db = dbf.newDocumentBuilder();
this.tf = TransformerFactory.newInstance();
}
public Path signXml(final Path xmlFile, final Path targetDir) {
final String baseName = FilenameUtils.getBaseName(xmlFile.getFileName().toString())
.concat("_Signed")
.concat(".")
.concat(FilenameUtils.getExtension(xmlFile.getFileName().toString()));
final Path target = Paths.get(targetDir.toString(), baseName);
try (final FileInputStream fis = new FileInputStream(String.valueOf(xmlFile))) {
final Document doc = this.parseDocument(fis);
final Element elementToSign = doc.getDocumentElement();
final SignatureAlgorithms algorithms = new SignatureAlgorithms()
.withCanonicalizationAlgorithmForTimeStampProperties(new ExclusiveCanonicalXMLWithoutComments("ds", "xades"))
.withCanonicalizationAlgorithmForSignature(new ExclusiveCanonicalXMLWithoutComments());
final KeyingDataProvider kdp = this.createFileSystemKeyingDataProvider(certType, certPath, certPass, true);
final XadesSigner signer = new XadesTSigningProfile(kdp)
.withSignatureAlgorithms(algorithms)
.with(new HttpTsaConfiguration("http://timestamp.digicert.com"))
.newSigner();
new Enveloped(signer).sign(elementToSign);
this.exportDocument(doc, target);
} catch (final FileNotFoundException e) {
// Keep the cause so the failure can be diagnosed
throw new RuntimeException(e);
} catch (final Exception e) {
throw new RuntimeException(e);
}
return target;
}
private FileSystemKeyStoreKeyingDataProvider createFileSystemKeyingDataProvider(
final String keyStoreType,
final String keyStorePath,
final String keyStorePwd,
final boolean returnFullChain) throws KeyStoreException {
return FileSystemKeyStoreKeyingDataProvider
.builder(keyStoreType, keyStorePath, KeyStoreKeyingDataProvider.SigningCertificateSelector.single())
.storePassword(new DirectPasswordProvider(keyStorePwd))
.entryPassword(new DirectPasswordProvider(keyStorePwd))
.fullChain(returnFullChain)
.build();
}
public Document parseDocument(final InputStream is) {
try {
final Document doc = this.db.parse(is);
final Element elem = doc.getDocumentElement();
DOMHelper.useIdAsXmlId(elem);
return doc;
} catch (final Exception e) {
throw new RuntimeException(e);
}
}
public void exportDocument(final Document doc, final Path target) {
try (final FileOutputStream out = new FileOutputStream(target.toFile())) {
this.tf.newTransformer().transform(
new DOMSource(doc),
new StreamResult(out));
} catch (final Exception e) {
throw new RuntimeException(e);
}
}
}
Unfortunately xades4j doesn't support streaming on the XML document to which the signature will be appended. I don't know if there are other alternative libraries that do.
A possible workaround using xades4j is to use a detached signature instead of an enveloped signature. The signature can be added to an empty XML document and the large XML file is explicitly added as a Reference to that signature.
xades4j delegates the core XML-DSIG handling to Apache Santuario, so if Santuario uses streaming for Reference resolution, this should avoid your issue. I'm not sure, though, but it may be worth testing.
https://github.com/luisgoncalves/xades4j/wiki/DefiningSignedResources
You may need to use a file URI and/or base URIs.
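For the file URI part, a minimal sketch with plain java.nio is shown below. It deliberately contains no xades4j calls; the class and method names are made up, and how the resulting string is passed to the library as a reference should be taken from the wiki page above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FileUriSketch {

    // Hypothetical helper: converts a (possibly relative) path into an absolute file: URI string.
    static String toFileUri(String path) {
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // The exact output depends on the working directory and the operating system.
        System.out.println(toFileUri("big.xml"));
    }
}
```

Whether the reference is then actually read as a stream is the part that still needs testing, as noted above.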
I'm working on an application that has these kinds of xml file (document.xml):
<root>
<subRoot myAttribute="CN=Ok">
Ok
</subRoot>
<subRoot myAttribute="CN=&quot;Problem&quot;">
Problem
</subRoot>
</root>
I need to retrieve elements using XPath expressions. I'm not able to retrieve the second element, which I need to select by the value of myAttribute. The cause seems to be the &quot; entity in the attribute value.
Here is a test class. The second assertion is throwing an AssertionError because the object is null.
import static org.junit.Assert.assertNotNull;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
import org.junit.Test;
public class XPathTest {
@Test
public void quotesXpath() throws JDOMException, IOException {
Document document = getDocumentFromContent(getClasspathResource("document.xml"));
String okXPath = "/root/subRoot[@myAttribute=\"CN=Ok\"]";
assertNotNull(getElement(document, okXPath)); // Ok ...
String problemXPath = "/root/subRoot[@myAttribute=\"CN=&quot;Problem&quot;\"]";
assertNotNull(getElement(document, problemXPath)); // Why null ?
}
public String getClasspathResource(String filePath) throws IOException {
try (InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream(filePath)) {
return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
}
}
public static Document getDocumentFromContent(String content) throws IOException, JDOMException {
try (InputStream is = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8))) {
SAXBuilder builder = new SAXBuilder();
return builder.build(is);
}
}
public Element getElement(Document document, String xpathExpression) throws JDOMException {
XPath xpath = XPath.newInstance(xpathExpression);
return (Element) xpath.selectSingleNode(document);
}
}
The application is using Jdom 1.1.3
<dependency>
<groupId>org.jdom</groupId>
<artifactId>jdom</artifactId>
<version>1.1.3</version>
</dependency>
How can I change my XPath expression so that the second element is returned? Is this possible with this version of JDOM?
Thank you for your help !
Try this expression:
String problemXPath = "/root/subRoot[@myAttribute='CN=\"Problem\"']";
Firstly, when the document is parsed, the entity &quot; is replaced with the " character, so that character should be used directly in the XPath expression.
Secondly, in XPath you can use either single or double quotes for string constants, which is convenient if you have strings that contain quotes.
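The same quoting rule can be tried out without JDOM. The sketch below uses the JDK's javax.xml.xpath API instead of JDOM 1.1.3 (a substitution made only to keep the example self-contained); the class name and the select helper are made up, and the XML is the document from the question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathQuoteSketch {

    // The document from the question; &quot; is resolved to " by the XML parser.
    static final String XML =
          "<root>"
        + "<subRoot myAttribute=\"CN=Ok\">Ok</subRoot>"
        + "<subRoot myAttribute=\"CN=&quot;Problem&quot;\">Problem</subRoot>"
        + "</root>";

    // Hypothetical helper: text of the first matching node, or null if nothing matches.
    static String select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node node = (Node) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Single-quoted XPath literal that contains double quotes: matches the second element.
        System.out.println(select("/root/subRoot[@myAttribute='CN=\"Problem\"']"));
        // XPath does not know XML entities, so this literal matches nothing.
        System.out.println(select("/root/subRoot[@myAttribute=\"CN=&quot;Problem&quot;\"]"));
    }
}
```

If a value ever contains both kinds of quotes, XPath 1.0 has no escape syntax; the usual workaround is to build the string with concat().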
I have a requirement to create a sort of 'skeleton' xml based on an XSD schema.
The documents defined by these schemas have no namespace. They are authored by other developers, not in an automated way.
There is no mixed content allowed. That is, elements can contain elements only, or text only.
The rules for this sample xml are:
elements that can contain only text content should not be created in the sample xml
all other optional and mandatory elements should be included in the sample xml
elements should be created only once even if they can occur multiple times
any other nodes such as attributes, comments, processing instructions, etc. should be omitted - the sample xml would be an 'element tree'
Are there APIs or tools in Java that can generate such sample xml? I'm looking for pointers where to get started.
This needs to be done programmatically in a reliable way, as the sample xml is used by other XSLT transformations.
I hope the code below serves your purpose:
package com.example.demo;
import java.io.File;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import jlibs.xml.sax.XMLDocument;
import jlibs.xml.xsd.XSInstance;
import jlibs.xml.xsd.XSParser;
public interface xsdtoxml {
public static void main(String[] pArgs) {
try {
String filename = "out.xsd";
// instance.
final Document doc = loadXsdDocument(filename);
//Find the docs root element and use it to find the targetNamespace
final Element rootElem = doc.getDocumentElement();
String targetNamespace = null;
if (rootElem != null && rootElem.getNodeName().equals("xs:schema"))
{
targetNamespace = rootElem.getAttribute("targetNamespace");
}
//Parse the file into an XSModel object
org.apache.xerces.xs.XSModel xsModel = new XSParser().parse(filename);
//Define defaults for the XML generation
XSInstance instance = new XSInstance();
instance.minimumElementsGenerated = 1;
instance.maximumElementsGenerated = 1;
instance.generateDefaultAttributes = true;
instance.generateOptionalAttributes = true;
instance.maximumRecursionDepth = 0;
instance.generateAllChoices = true;
instance.showContentModel = true;
instance.generateOptionalElements = true;
//Build the sample xml doc
//Replace first param to XMLDoc with a file input stream to write to file
QName rootElement = new QName(targetNamespace, "out");
XMLDocument sampleXml = new XMLDocument(new StreamResult(System.out), true, 4, null);
instance.generate(xsModel, rootElement, sampleXml);
} catch (TransformerConfigurationException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static Document loadXsdDocument(String inputName) {
final String filename = inputName;
final DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
factory.setValidating(false);
factory.setIgnoringElementContentWhitespace(true);
factory.setIgnoringComments(true);
Document doc = null;
try {
final DocumentBuilder builder = factory.newDocumentBuilder();
final File inputFile = new File(filename);
doc = builder.parse(inputFile);
} catch (final Exception e) {
e.printStackTrace();
// throw new ContentLoadException(msg);
}
return doc;
}
}
xsd to xml :
1 : you can use eclipse (right click and select Generate)
2 : Sun/Oracle Multi-Schema Validator
3 : xmlgen
see:
How to generate sample XML documents from their DTD or XSD?
for subtle requirements, you should program it yourself
I need to build a simple program for homework that retrieves data from an XML attribute based on user input to a web service. To that end, I started by building a class that can parse my XML string, and I also built a simple Java service that does nothing but respond with a simple message. The problem is: how do I put these together to get my program to work? Is this a good way to begin? Please advise.
Also, to make things a little easier, the XML string contains keywords in both English and Serbian, so the web service can look up one from the other:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Recnik {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
String xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE language [<!ATTLIST phrase id ID #IMPLIED>]><language id=\"sr\"><phrase key=\"house\" value=\"kuca\"/><phrase key=\"dog\" value=\"pas\"/><phrase key=\"cat\" value=\"macka\"/></language>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
//FileInputStream fis = new FileInputStream("myBooks.xml");
InputSource is = new InputSource(new StringReader(xmlString));
Document doc = db.parse(is);
Element r = doc.getDocumentElement();
NodeList language = r.getElementsByTagName("phrase");
System.out.println(language.item(1).getAttributes().item(0).getTextContent());
}
}
package Prevodilac;
import javax.jws.WebService;
import javax.jws.WebMethod;
import javax.jws.WebParam;
@WebService(serviceName = "Prevodilac")
public class Prevodilac {
@WebMethod(operationName = "pretraga")
public String pretraga(int a, int b) {
Integer res = a+b;
return res.toString();
}
}
@WebService(serviceName = "Prevodilac")
public class Prevodilac {
Document doc;
public Prevodilac() throws ParserConfigurationException, SAXException, IOException{
// Fill the document just once, not for each method call
String xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE language [<!ATTLIST phrase id ID #IMPLIED>]><language id=\"sr\"><phrase key=\"house\" value=\"kuca\"/><phrase key=\"dog\" value=\"pas\"/><phrase key=\"cat\" value=\"macka\"/></language>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlString));
doc = db.parse(is);
}
@WebMethod(operationName = "pretraga")
public String pretraga(String key) {
Element r = doc.getDocumentElement();
NodeList language = r.getElementsByTagName("phrase");
String result = "Not found";
for( int index = 0; index < language.getLength(); index++ ) {
Node attribute = language.item(index).getAttributes().getNamedItem("key");
// TODO (It's homework after all):
// check if the attribute corresponds to key parameter
if( attribute..... ){
// fill result with attribute value
result = ...;
}
}
return result;
}
}
I have an XML file that I want to avoid loading entirely into memory.
As everyone knows, for such a file I should use a SAX parser (which walks through the file and fires events when something relevant is found).
My current problem is that I would like to process the file "by chunk", which means:
Parse the file and find a relevant tag (node)
Load this tag entirely in memory (like we would do it in DOM)
Do the process of this entity (that local chunk)
When I'm done with the chunk, release it and continue to 1. (until "end of file")
In a perfect world I'm looking for something like this:
// 1. Create a parser and set the file to load
IdealParser p = new IdealParser("BigFile.xml");
// 2. Set an XPath to define the interesting nodes
p.setRelevantNodesPath("/path/to/relevant/nodes");
// 3. Add a handler to callback the right method once a node is found
p.setHandler(new Handler(){
// 4. The method callback by the parser when a relevant node is found
void aNodeIsFound(saxNode aNode)
{
// 5. Inflate the current node i.e. load it (and all its content) in memory
DomNode d = aNode.expand();
// 6. Do something with the inflated node (method to be defined somewhere)
doThingWithNode(d);
}
});
// 7. Start the parser
p.start();
I'm currently stuck on how to expand a "sax node" (understand me…) efficiently.
Is there any Java framework or library relevant to this kind of task?
UPDATE
You could also just use the javax.xml.xpath APIs:
package forum7998733;
import java.io.FileReader;
import javax.xml.xpath.*;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
public class XPathDemo {
public static void main(String[] args) throws Exception {
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
InputSource xml = new InputSource(new FileReader("BigFile.xml"));
Node result = (Node) xpath.evaluate("/path/to/relevant/nodes", xml, XPathConstants.NODE);
System.out.println(result);
}
}
Below is a sample of how it could be done with StAX.
input.xml
Below is some sample XML:
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
Demo
In this example a StAX XMLStreamReader is used to find the node that will be converted to a DOM. In this example we convert each statement fragment to a DOM, but your navigation algorithm could be more advanced.
package forum7998733;
import java.io.FileReader;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("src/forum7998733/input.xml"));
xsr.nextTag(); // Advance to statements element
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
DOMResult domResult = new DOMResult();
t.transform(new StAXSource(xsr), domResult);
DOMSource domSource = new DOMSource(domResult.getNode());
StreamResult streamResult = new StreamResult(System.out);
t.transform(domSource, streamResult);
}
}
}
Output
<?xml version="1.0" encoding="UTF-8" standalone="no"?><statement account="123">
...stuff...
</statement><?xml version="1.0" encoding="UTF-8" standalone="no"?><statement account="456">
...stuff...
</statement>
It could be done with SAX... But I think the newer StAX (Streaming API for XML) will serve your purpose better. You could create an XMLEventReader and use that to parse your file, detecting which nodes adhere to one of your criteria. For simple path-based selection (not really XPath, but some simple / delimited path) you'd need to maintain a path to your current node by adding entries to a String on new elements or cutting off entries on an end tag. A boolean flag can suffice to maintain whether you're currently in "relevant mode" or not.
As you obtain XMLEvents from your reader, you could copy the relevant ones over to an XMLEventWriter that you've created on some suitable placeholder, like a StringWriter or ByteArrayOutputStream. Once you've completed the copying for some XML extract that forms a "subdocument" of what you wish to build a DOM for, simply supply your placeholder to a DocumentBuilder in a suitable form.
The limitation here is that you're not harnessing all the power of the XPath language. If you wish to take stuff like node position into account, you'd have to foresee that in your own path. Perhaps someone knows of a good way of integrating a true XPath implementation into this.
StAX is really nice in that it gives you control over the parsing, rather than using some callback interface through a handler like SAX.
There's yet another alternative: using XSLT. An XSLT stylesheet is the ideal way to filter out only relevant stuff. You could transform your input once to obtain the required fragments and process those. Or run multiple stylesheets over the same input to get the desired extract each time. An even nicer (and more efficient) solution, however, would be the use of extension functions and/or extension elements.
Extension functions can be implemented in a way that's independent from the XSLT processor being used. They're fairly straightforward to use in Java and I know for a fact that you can use them to pass complete XML extracts to a method, because I've done so already. Might take some experimentation, but it's a powerful mechanism. A DOM extract (or node) is probably one of the accepted parameter types for such a method. That'd leave the document building up to the XSLT processor which is even easier.
Extension elements are also very useful, but I think they need to be used in an implementation-specific manner. If you're okay with tying yourself to a specific JAXP setup like Xerces + Xalan, they might be the answer.
When going for XSLT, you'll have all the advantages of a full XPath 1.0 implementation, plus the peace of mind that comes from knowing XSLT is in really good shape in Java. It limits the building of the input tree to those nodes that are needed at any time and is blazing fast because the processors tend to compile stylesheets into Java bytecode rather than interpreting them. It is possible that using compilation instead of interpretation loses the possibility of using extension elements, though. Not certain about that. Extension functions are still possible.
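To make the basic filtering idea concrete (without extension functions or elements), here is a minimal sketch using the JDK's javax.xml.transform API. The stylesheet, the class name and the choice of account 456 are made up for illustration; the input reuses the statements sample from the StAX answer above. How much memory the processor needs for a very large input is something to measure rather than assume.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltFilterSketch {

    // Hypothetical stylesheet: copies only the statement with account 456 into an <extract> wrapper.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<extract><xsl:copy-of select=\"/statements/statement[@account='456']\"/></extract>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the filtered result as a string.
    static String filter(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT filtering failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<statements>"
                + "<statement account=\"123\">a</statement>"
                + "<statement account=\"456\">b</statement>"
                + "</statements>";
        System.out.println(filter(input));
    }
}
```

Swapping the StreamResult for a DOMResult would hand the extracted fragment to your code as a DOM node instead of text.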
Whatever way you choose, there's so much out there for XML processing in Java that you'll find plenty of help in implementing this, should you have no luck in finding a ready-made solution. That'd be the most obvious thing, of course... No need to reinvent the wheel when someone did the hard work.
Good luck!
EDIT: because I'm actually not feeling depressed for once, here's a demo using the StAX solution I whipped up. It's certainly not the cleanest code, but it'll give you the basic idea:
package staxdom;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.Stack;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class DOMExtractor {
private final Set<String> paths;
private final XMLInputFactory inputFactory;
private final XMLOutputFactory outputFactory;
private final DocumentBuilderFactory docBuilderFactory;
private final Stack<QName> activeStack = new Stack<QName>();
private boolean active = false;
private String currentPath = "";
public DOMExtractor(final Set<String> paths) {
this.paths = Collections.unmodifiableSet(new HashSet<String>(paths));
inputFactory = XMLInputFactory.newFactory();
outputFactory = XMLOutputFactory.newFactory();
docBuilderFactory = DocumentBuilderFactory.newInstance();
}
public void parse(final InputStream input) throws XMLStreamException, ParserConfigurationException, SAXException, IOException {
final XMLEventReader reader = inputFactory.createXMLEventReader(input);
XMLEventWriter writer = null;
StringWriter buffer = null;
final DocumentBuilder builder = docBuilderFactory.newDocumentBuilder();
XMLEvent currentEvent = reader.nextEvent();
do {
if(active)
writer.add(currentEvent);
if(currentEvent.isEndElement()) {
if(active) {
activeStack.pop();
if(activeStack.isEmpty()) {
writer.flush();
writer.close();
final Document doc;
final StringReader docReader = new StringReader(buffer.toString());
try {
doc = builder.parse(new InputSource(docReader));
} finally {
docReader.close();
}
//TODO: use doc
//Next bit is only for demo...
outputDoc(doc);
active = false;
writer = null;
buffer = null;
}
}
int index;
if((index = currentPath.lastIndexOf('/')) >= 0)
currentPath = currentPath.substring(0, index);
} else if(currentEvent.isStartElement()) {
final StartElement start = (StartElement)currentEvent;
final QName qName = start.getName();
final String local = qName.getLocalPart();
currentPath += "/" + local;
if(!active && paths.contains(currentPath)) {
active = true;
buffer = new StringWriter();
writer = outputFactory.createXMLEventWriter(buffer);
writer.add(currentEvent);
}
if(active)
activeStack.push(qName);
}
currentEvent = reader.nextEvent();
} while(!currentEvent.isEndDocument());
}
private void outputDoc(final Document doc) {
try {
final Transformer t = TransformerFactory.newInstance().newTransformer();
t.transform(new DOMSource(doc), new StreamResult(System.out));
System.out.println("");
System.out.println("");
} catch(TransformerException ex) {
ex.printStackTrace();
}
}
public static void main(String[] args) {
final Set<String> paths = new HashSet<String>();
paths.add("/root/one");
paths.add("/root/three/embedded");
final DOMExtractor me = new DOMExtractor(paths);
InputStream stream = null;
try {
stream = DOMExtractor.class.getResourceAsStream("sample.xml");
me.parse(stream);
} catch(final Exception e) {
e.printStackTrace();
} finally {
if(stream != null)
try {
stream.close();
} catch(IOException ex) {
ex.printStackTrace();
}
}
}
}
And the sample.xml file (should be in the same package):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<one>
<two>this is text</two>
look, I can even handle mixed!
</one>
... not sure what to do with this, though
<two>
<willbeignored/>
</two>
<three>
<embedded>
<and><here><we><go>
</go></we></here></and>
</embedded>
</three>
</root>
EDIT 2: Just noticed in Blaise Doughan's answer that there's a StAXSource. That'll be even more efficient, so use it if you're going with StAX: it eliminates the need to keep a buffer. StAX allows you to "peek" at the next event, so you can check whether it's a start element with the right path without consuming it before passing the reader into the transformer.
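A rough sketch of the StAXSource idea, under a few assumptions: it uses the cursor API (`XMLStreamReader`) rather than peeking on an `XMLEventReader`, and it matches elements by local name only instead of by full path, to keep it short. `StaxSourceDemo` and `extract` are names invented for this example. Where the reader is left after the transform appears to depend on the implementation, so the loop re-reads the current event instead of assuming a position.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

class StaxSourceDemo {
    // Returns one small DOM Document per element whose local name matches.
    static List<Document> extract(String xml, String localName) {
        List<Document> docs = new ArrayList<Document>();
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            int event = xsr.getEventType();
            while (event != XMLStreamConstants.END_DOCUMENT) {
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(xsr.getLocalName())) {
                    DOMResult result = new DOMResult();
                    // Copies the element the reader is positioned on, no buffer needed.
                    identity.transform(new StAXSource(xsr), result);
                    docs.add((Document) result.getNode());
                    // The transform has moved the reader; inspect where it stopped
                    // rather than calling next() blindly and skipping an element.
                    event = xsr.getEventType();
                } else {
                    event = xsr.next();
                }
            }
            xsr.close();
        } catch (Exception e) {
            // Wrapped only to keep the demo's signature simple.
            throw new IllegalStateException(e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<root><item>a</item><item>b</item><skip>x</skip><item>c</item></root>";
        for (Document d : extract(xml, "item")) {
            System.out.println(d.getDocumentElement().getTextContent());
        }
    }
}
```

To match full paths as in the demo above, you would additionally have to track the current path yourself while advancing the reader.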
OK, thanks to your pieces of code, I finally ended up with my own solution.
Usage is quite intuitive:
try
{
/* CREATE THE PARSER */
XMLParser parser = new XMLParser();
/* CREATE THE FILTER (THIS IS A REGEX (X)PATH FILTER) */
XMLRegexFilter filter = new XMLRegexFilter("statements/statement");
/* CREATE THE HANDLER WHICH WILL BE CALLED WHEN A NODE IS FOUND */
XMLHandler handler = new XMLHandler()
{
public void nodeFound(StringBuilder node, XMLStackFilter withFilter)
{
// DO SOMETHING WITH THE FOUND XML NODE
System.out.println("Node found");
System.out.println(node.toString());
}
};
/* ATTACH THE FILTER WITH THE HANDLER */
parser.addFilterWithHandler(filter, handler);
/* SET THE FILE TO PARSE */
parser.setFilePath("/path/to/bigfile.xml");
/* RUN THE PARSER */
parser.parse();
}
catch (Exception ex)
{
ex.printStackTrace();
}
Notes:
I made an XMLNodeFoundNotifier and an XMLStackFilter interface so you can easily integrate or build your own handler / filter.
You should normally be able to parse very large files with this class. Only the returned nodes are actually loaded into memory.
You can enable attribute support by uncommenting the relevant part of the code; I disabled it for simplicity.
You can use as many filters per handler as you need, and vice versa.
All of the code is here:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Stack;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.*;
/* IMPLEMENT THIS IN YOUR CLASS IN ORDER TO BE NOTIFIED WHEN A NODE IS FOUND */
interface XMLNodeFoundNotifier {
abstract void nodeFound(StringBuilder node, XMLStackFilter withFilter);
}
/* A SMALL HANDLER, USEFUL FOR EXPLICIT CLASS DECLARATION */
abstract class XMLHandler implements XMLNodeFoundNotifier {
}
/* INTERFACE TO WRITE YOUR OWN FILTER BASED ON THE CURRENT NODES STACK (PATH)*/
interface XMLStackFilter {
abstract boolean isRelevant(Stack fullPath);
}
/* A VERY USEFUL FILTER USING REGEX AS THE PATH FILTER */
class XMLRegexFilter implements XMLStackFilter {
Pattern relevantExpression;
XMLRegexFilter(String filterRules) {
relevantExpression = Pattern.compile(filterRules);
}
/* HERE WE ARE ASKED TO TELL WHETHER THE CURRENT STACK (LIST OF NODES) IS RELEVANT
* OR NOT ACCORDING TO WHAT WE WANT. RETURN TRUE IF THIS IS THE CASE */
@Override
public boolean isRelevant(Stack fullPath) {
/* A POSSIBLE CLEVER WAY COULD BE TO SERIALIZE THE WHOLE PATH (INCLUDING
* ATTRIBUTES) TO A STRING AND TO MATCH IT WITH A REGEX BEING THE FILTER
* FOR NOW StackToString DOES NOT SERIALIZE ATTRIBUTES */
String stackPath = XMLParser.StackToString(fullPath);
Matcher m = relevantExpression.matcher(stackPath);
return m.matches();
}
}
/* THE MAIN PARSER'S CLASS */
public class XMLParser {
HashMap<XMLStackFilter, XMLNodeFoundNotifier> filterHandler;
HashMap<Integer, Integer> feedingStreams;
Stack<HashMap> currentStack;
String filePath;
XMLParser() {
currentStack = new Stack<HashMap>();
filterHandler = new HashMap<XMLStackFilter, XMLNodeFoundNotifier>();
feedingStreams = new HashMap<Integer, Integer>();
}
public void addFilterWithHandler(XMLStackFilter f, XMLNodeFoundNotifier h) {
filterHandler.put(f, h);
}
public void setFilePath(String filePath) {
this.filePath = filePath;
}
/* CONVERT A STACK OF NODES TO A REGULAR PATH STRING. NOTE THAT BY DEFAULT
* THE ATTRIBUTES ARE NOT ADDED TO THE PATH. UNCOMMENT THE LINES BELOW TO
* DO SO
*/
public static String StackToString(Stack<HashMap> s) {
int k = s.size();
if (k == 0) {
return null;
}
StringBuilder out = new StringBuilder();
out.append(s.get(0).get("tag"));
for (int x = 1; x < k; ++x) {
HashMap node = s.get(x);
out.append('/').append(node.get("tag"));
/*
// UNCOMMENT THIS TO ADD THE ATTRIBUTES SUPPORT TO THE PATH
ArrayList <String[]>attributes = (ArrayList)node.get("attr");
if (attributes.size()>0)
{
out.append("[");
for (int i = 0 ; i<attributes.size(); i++)
{
String[]keyValuePair = attributes.get(i);
if (i>0) out.append(",");
out.append(keyValuePair[0]);
out.append("=\"");
out.append(keyValuePair[1]);
out.append("\"");
}
out.append("]");
}*/
}
return out.toString();
}
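/*
* ONCE A NODE HAS BEEN SUCCESSFULLY FOUND, WE HAVE ITS START AND END OFFSETS
* IN THE FILE. WE THEN RETRIEVE THE DATA BETWEEN THEM.
*/
private StringBuilder getChunk(int from, int to) throws Exception {
int length = to - from;
BufferedReader br = new BufferedReader(new FileReader(filePath));
try {
br.skip(from);
char[] readb = new char[length];
// read() MAY RETURN FEWER CHARS THAN REQUESTED, SO LOOP UNTIL THE CHUNK IS COMPLETE
int total = 0;
while (total < length) {
int n = br.read(readb, total, length - total);
if (n < 0) {
break;
}
total += n;
}
StringBuilder b = new StringBuilder();
b.append(readb, 0, total);
return b;
} finally {
// CLOSE THE READER SO WE DO NOT LEAK A FILE HANDLE PER CHUNK
br.close();
}
}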
/* TRANSFORMS AN XSR NODE TO A HASHMAP NODE'S REPRESENTATION */
public HashMap XSRNode2HashMap(XMLStreamReader xsr) {
HashMap h = new HashMap();
ArrayList attributes = new ArrayList();
for (int i = 0; i < xsr.getAttributeCount(); i++) {
String[] s = new String[2];
s[0] = xsr.getAttributeName(i).toString();
s[1] = xsr.getAttributeValue(i);
attributes.add(s);
}
h.put("tag", xsr.getName());
h.put("attr", attributes);
return h;
}
public void parse() throws Exception {
FileReader f = new FileReader(filePath);
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(f);
Location previousLoc = xsr.getLocation();
while (xsr.hasNext()) {
switch (xsr.next()) {
case XMLStreamConstants.START_ELEMENT:
currentStack.add(XSRNode2HashMap(xsr));
for (XMLStackFilter filter : filterHandler.keySet()) {
if (filter.isRelevant(currentStack)) {
feedingStreams.put(currentStack.hashCode(), Integer.valueOf(previousLoc.getCharacterOffset()));
}
}
previousLoc = xsr.getLocation();
break;
case XMLStreamConstants.END_ELEMENT:
Integer stream = null;
if ((stream = feedingStreams.get(currentStack.hashCode())) != null) {
// FIND ALL THE FILTERS RELATED TO THIS feedingStreams ENTRY AND CALL THEIR HANDLER.
for (XMLStackFilter filter : filterHandler.keySet()) {
if (filter.isRelevant(currentStack)) {
XMLNodeFoundNotifier h = filterHandler.get(filter);
StringBuilder aChunk = getChunk(stream.intValue(), xsr.getLocation().getCharacterOffset());
h.nodeFound(aChunk, filter);
}
}
feedingStreams.remove(currentStack.hashCode());
}
previousLoc = xsr.getLocation();
currentStack.pop();
break;
default:
break;
}
}
}
}
It's been a little while since I did SAX, but what you want to do is process each of the tags until you find the end tag for the group you want to process, then run your process, clear the collected data and look for the next start tag.
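The collect / process / clear cycle described above might look something like this with a SAX `DefaultHandler`. It is only a sketch: the class `SaxGroupDemo`, the method `collectGroups` and the choice to "process" a group by storing its text are assumptions made for the example, and the default (non-namespace-aware) parser is assumed, so elements are matched on `qName`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxGroupDemo {
    // Returns the character data of every element named groupName, one entry per group.
    static List<String> collectGroups(String xml, final String groupName) {
        final List<String> groups = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder buffer = new StringBuilder();
            private int depth = 0; // greater than 0 while inside a group

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Start a group on its start tag; count nested elements while inside one.
                if (depth > 0 || qName.equals(groupName)) {
                    depth++;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    buffer.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // depth reaching 0 means the group's own end tag was just seen.
                if (depth > 0 && --depth == 0) {
                    groups.add(buffer.toString()); // "run your process" goes here
                    buffer.setLength(0);           // clear, then wait for the next start tag
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped only to keep the demo's signature simple.
            throw new IllegalStateException(e);
        }
        return groups;
    }

    public static void main(String[] args) {
        String xml = "<statements><statement><a>1</a><b>2</b></statement>"
                   + "<other>x</other><statement>3</statement></statements>";
        System.out.println(collectGroups(xml, "statement"));
    }
}
```

Only one group's data is held at a time, which is the point of the approach for large files.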