I am querying XML files of around 1 MB (20k+ lines). I use XPath to describe what I want and the VTD-XML library to retrieve it, and I think I have a performance problem.
The problem is that I make 5k+ queries against each XML file, and retrieving all the values takes approximately 16-17 seconds. Is this normal performance for such a task? How can I improve it?
I am using VTD-XML with the AutoPilot navigation approach, which lets me use XPath. The implementation is as follows:
private VTDGen vg = new VTDGen();
private VTDNav vn;
private AutoPilot ap = new AutoPilot();
public void init(String xml) {
log.info("Creating document");
xml = xml.replace("<?xml version=\"1.0\"?>", "<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
vg.setDoc(bytes);
try {
vg.parse(true);
vn = vg.getNav();
} catch (ParseException e) {
e.printStackTrace();
}
log.info("Document created");
}
public String parseXmlOrReturnNull(String query) {
String xPathStringVal = null;
try {
ap.selectXPath(query);
ap.bind(vn);
int i = -1;
while ((i = ap.evalXPath()) != -1) {
xPathStringVal = vn.getXPathStringVal();
}
} catch (XPathEvalException e) {
e.printStackTrace();
} catch (NavException e) {
e.printStackTrace();
} catch (XPathParseException e) {
e.printStackTrace();
}
return xPathStringVal;
}
My XML files have a specific format: they are divided into many parts (segments), and my queries are the same for every segment (I run them in a loop). For example, part of the XML:
<segment>
<a>
<b>value1</b>
<c>
<d>value2</d>
<e>value3</e>
</c>
</a>
</segment>
<segment>
<a>
<b>value4</b>
<c>
<d>value5</d>
<e>value6</e>
<f>value6</f>
</c>
</a>
</segment>
...
To get value1 from the first segment, I use the query:
//segment[1]/a/b
and for value4 in the second segment:
//segment[2]/a/b
etc.
Intuition suggests the cause: in my approach every query is independent (it knows nothing about the other queries), which means that AutoPilot, my iterator, always starts at the beginning of the file for each query.
My question is: is there a way to position AutoPilot at the beginning of the segment being processed and, when I finish querying it, move AutoPilot to the next segment? I think that if my method started searching from a specified point rather than from the beginning, it would be much faster.
Another option is to split the XML file into small XML files (one file per segment) and query those.
What do you think? Thanks in advance.
Minor: the replace is not needed, as UTF-8 is the default encoding; only when a different encoding is declared would one need to patch it to UTF-8.
The XPath should be evaluated only once, so that each query does not start again from the first segment and walk to the next index.
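As a rough illustration of "evaluate once, then work relative to each segment", here is a minimal sketch. Note that it swaps in the JDK's own javax.xml.xpath API rather than VTD-XML, and the class name SegmentQueries and the <root> wrapper element are assumptions made for the example. One expression selects all segments in a single pass, and a second, relative expression is then evaluated against each segment node instead of against the whole document:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SegmentQueries {

    /** Returns the text of a/b for every segment, in document order. */
    public static List<String> valuesOfB(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Both expressions are compiled once, outside the loop.
        XPathExpression allSegments = xpath.compile("//segment");
        XPathExpression relativeB = xpath.compile("a/b");

        // One pass over the document to collect every segment.
        NodeList segments = (NodeList) allSegments.evaluate(doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < segments.getLength(); i++) {
            // The relative expression starts at the current segment,
            // not at the beginning of the file.
            values.add(relativeB.evaluate(segments.item(i)));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<segment><a><b>value1</b></a></segment>"
                + "<segment><a><b>value4</b></a></segment>"
                + "</root>";
        System.out.println(valuesOfB(xml));
    }
}
```

The same idea should carry over to other XPath engines: select the segment nodes once, then run short relative queries from each of them.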
If you need a List representation, you could use JAXB with annotations.
Event-based parsing without building a DOM (SAXParser) is probably best.
DefaultHandler handler = new org.xml.sax.helpers.DefaultHandler() {
    @Override
    public void startElement(String uri,
            String localName, String qName, Attributes attributes) throws SAXException {
        // called for every opening tag
    }
    @Override
    public void endElement(String uri,
            String localName, String qName) throws SAXException {
        // called for every closing tag
    }
    @Override
    public void characters(char ch[], int start, int length) throws SAXException {
        // called for text content, possibly in several chunks per element
    }
};
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
InputStream in = new ByteArrayInputStream(bytes);
parser.parse(in, handler);
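The empty callbacks above could be filled in along these lines. This is only a sketch under assumptions: the class and method names (SegmentSaxReader, BHandler, readB) are made up for the example, and it collects just the <b> value of each segment from the question's sample format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SegmentSaxReader {

    /** Collects the text of every <b> element found inside a <segment>. */
    static class BHandler extends DefaultHandler {
        final List<String> values = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inSegment;
        private boolean inB;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("segment".equals(qName)) {
                inSegment = true;
            } else if (inSegment && "b".equals(qName)) {
                inB = true;
                text.setLength(0); // start a fresh buffer for this <b>
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append instead of assigning.
            if (inB) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inB && "b".equals(qName)) {
                values.add(text.toString());
                inB = false;
            } else if ("segment".equals(qName)) {
                inSegment = false;
            }
        }
    }

    public static List<String> readB(String xml) throws Exception {
        BHandler handler = new BHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<segment><a><b>value1</b></a></segment>"
                + "<segment><a><b>value4</b></a></segment>"
                + "</root>";
        System.out.println(readB(xml));
    }
}
```

The whole file is read in a single pass, so the cost no longer grows with the number of queries.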
I am new to XML. I want to read the following XML on the basis of request name. Please help me on how to read the below XML in Java -
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>
If your XML is a String, then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
document.getDocumentElement() returns the node that is the document element of the document (in your case <config>).
Once you have the rootElement, you can access the element's attributes (by calling the rootElement.getAttribute() method), etc. For more methods, see Java's org.w3c.dom.Element.
More info is available on Java's DocumentBuilder and DocumentBuilderFactory. Bear in mind that the example provided creates an XML DOM tree, so if you have a huge amount of XML data, the tree can be huge.
Related question.
Update: Here's an example to get the value of the <requestqueue> element:
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can call it like this:
String requestQueueName = getString("requestqueue", element);
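Putting the pieces together for the question's actual goal (reading by request name), a sketch might look like this. The class name RequestConfigDom and the method queueFor are hypothetical; the DOM calls are the standard org.w3c.dom ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RequestConfigDom {

    /**
     * Finds the <Request> whose name attribute equals requestName and returns
     * the text of its child element tagName, or null if nothing matches.
     */
    public static String queueFor(String xml, String requestName, String tagName) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList requests = document.getElementsByTagName("Request");
        for (int i = 0; i < requests.getLength(); i++) {
            Element request = (Element) requests.item(i);
            if (requestName.equals(request.getAttribute("name"))) {
                NodeList children = request.getElementsByTagName(tagName);
                return children.getLength() > 0 ? children.item(0).getTextContent() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "<responsequeue>emailresponse</responsequeue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "<responsequeue>Cleanresponse</responsequeue>"
                + "</Request>"
                + "</config>";
        System.out.println(queueFor(xml, "CleanEmail", "responsequeue"));
    }
}
```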
In case you just need to retrieve one (the first) value from the XML:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
In case you want to parse the whole XML document, use JSoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}
If you are just looking to get a single value from the XML you may want to use Java's XPath library. For an example see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
// Selects the name attribute of the first <Request> under the <config> root
Node node = (Node) xPath.evaluate("/config/Request/@name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
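To select by request name rather than just reading the first attribute, an XPath predicate can do the filtering. The following is a sketch, not the original answer's code: the class name RequestConfigXPath and the variable name $name are assumptions. The name is passed through an XPath variable instead of being concatenated into the expression, which avoids quoting problems.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RequestConfigXPath {

    /** Returns the requestqueue of the named Request, or null if there is none. */
    public static String requestQueueFor(String xml, String requestName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve the $name variable used in the expression below.
        xpath.setXPathVariableResolver(
                variable -> "name".equals(variable.getLocalPart()) ? requestName : null);

        // The predicate keeps only the Request whose name attribute matches.
        String value = xpath.evaluate("/config/Request[@name=$name]/requestqueue", doc);
        return value.isEmpty() ? null : value;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(requestQueueFor(xml, "CleanEmail"));
    }
}
```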
There are a number of different ways to do this. You might want to check out XStream or JAXB. There are tutorials and examples.
If the XML is well-formed, you can convert it to a Document. Using XPath, you can then get the XML elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
From the XML string, create a Document and find the elements using their XPath.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get the value between two strings.
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
Output:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>
The following links might help:
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152
There are two general ways of doing that. You can either create a Document Object Model (DOM) of the XML file,
or use event-driven parsing, which is an alternative to the DOM representation of XML. Of course there is much more to know about processing XML; for instance, if you are given an XML schema definition (XSD), you could use JAXB.
There are various APIs available to read and write XML files in Java.
I would prefer using StAX.
An overview of the Java XML APIs can also be useful.
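For the config file in the question, a StAX-based lookup could be sketched as follows. The class name RequestConfigStax and the method requestQueueFor are made up for the example; it uses the StAX implementation bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RequestConfigStax {

    /** Returns the requestqueue of the named Request, or null if it is not found. */
    public static String requestQueueFor(String xml, String requestName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        try {
            boolean inWantedRequest = false;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only opening tags are interesting here
                }
                String element = reader.getLocalName();
                if ("Request".equals(element)) {
                    // Remember whether this is the Request we were asked for.
                    inWantedRequest = requestName.equals(reader.getAttributeValue(null, "name"));
                } else if (inWantedRequest && "requestqueue".equals(element)) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(requestQueueFor(xml, "CleanEmail"));
    }
}
```

Because the reader is pulled forward by the caller, parsing can stop as soon as the value is found.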
You can make a class that extends org.xml.sax.helpers.DefaultHandler and dispatches each element to a method named after its tag, calling
start_<tag_name>(uri, attrs);
and
end_<tag_name>();
For example:
start_requestqueue(uri, attrs);
etc.
Then extend that class to implement the XML configuration file parsers you want. Example:
...
public void startElement(String uri, String name, String qname,
        org.xml.sax.Attributes attrs)
        throws org.xml.sax.SAXException {
    // Parameter types of the start_<tag_name>(String, Attributes) methods
    Class<?>[] args = { String.class, org.xml.sax.Attributes.class };
    try {
        // Map the tag to a method name, e.g. <Request> -> start_Request.
        // Note: name is the local name, which is empty unless the parser
        // is namespace-aware.
        String mname = "start_" + name.replace("-", "_");
        java.lang.reflect.Method m =
                getClass().getDeclaredMethod(mname, args);
        m.invoke(this, uri, attrs);
    }
    catch (IllegalAccessException e) {
        throw new RuntimeException(e);
    }
    catch (NoSuchMethodException e) {
        throw new RuntimeException(e);
    }
    catch (java.lang.reflect.InvocationTargetException e) {
        // Report a failure inside the invoked method as a SAXException
        org.xml.sax.SAXException se =
                new org.xml.sax.SAXException(e);
        se.setStackTrace(e.getTargetException().getStackTrace());
        throw se;
    }
}
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name=" + attrs.getValue(0));
}
Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to
In an XSL transformation I have an XSLT file that includes some other XSLT files. The problem is that the URIs of these files contain illegal characters, in particular '##'. The XSLT looks like this:
<xsl:include href="/appdm/tomcat/webapps/sentys##1.0.0/WEB-INF/classes/xslt/release_java/xslt/gen.xslt" />
and when I try to instantiate a java Transformer I get the error:
javax.xml.transform.TransformerConfigurationException: javax.xml.transform.TransformerConfigurationException: javax.xml.transform.TransformerException: org.xml.sax.SAXException: org.apache.xml.utils.URI$MalformedURIException: Fragment contains invalid character:#
This is the java code:
public String xslTransform2String(String sXml, String sXslt) throws Exception {
String sResult = null;
try {
Source oStrSource = createStringSource(sXml);
DocumentBuilderFactory oDocFactory = DocumentBuilderFactory.newInstance();
oDocFactory.setNamespaceAware(true);
//sXslt is the xslt content with the inclusions
//<xsl:include href="/appdm/tomcat/webapps/sentys##1.0.0/WEB-INF/classes/xslt/release_java/xslt/gen.xslt" />"
Document oDocXslt = oDocFactory.newDocumentBuilder().parse(new InputSource(new StringReader(sXslt)));
Source oXsltSource = new DOMSource(oDocXslt);
StringWriter oStrOut = new StringWriter();
Result oTransRes = createStringResult(oStrOut);
Transformer oTrans = createXsltTransformer(oXsltSource);
oTrans.transform(oStrSource, oTransRes);
sResult = oStrOut.toString();
} catch (Exception oEx) {
throw new BddException(oEx, XmlProvider.ERR_XSLT, null);
}
return sResult;
}
private Transformer createXsltTransformer(Source oXsltSource) throws Exception {
Transformer transformer = getXsltTransformerFactory().newTransformer(
oXsltSource);
ErrorListener errorListener = new DefaultErrorListener();
transformer.setErrorListener(errorListener);
return transformer;
}
Is there a way I can use relative paths instead of absolute paths?
Thank you
To avoid the MalformedURIException, replace the second or both # with %23.
See https://stackoverflow.com/a/5007362/4092205
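If the href values are generated in code, the escaping does not have to be done by hand. As a sketch (the class name IncludeHrefEscaper is hypothetical), the multi-argument java.net.URI constructor percent-encodes characters that are illegal in the path component, including '#':

```java
import java.net.URI;
import java.net.URISyntaxException;

public class IncludeHrefEscaper {

    /** Turns an absolute file-system path into a file: URI with illegal characters escaped. */
    public static String toFileUri(String absolutePath) throws URISyntaxException {
        // Unlike URI.create(String), this constructor quotes illegal
        // characters in the path, so each '#' becomes %23.
        return new URI("file", null, absolutePath, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(toFileUri(
                "/appdm/tomcat/webapps/sentys##1.0.0/WEB-INF/classes/xslt/release_java/xslt/gen.xslt"));
    }
}
```

The resulting string can then be written into the href attribute of the xsl:include element.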
We are performing some operations on embedded (nested) XML. I am using SAXParser to parse the entire XML file, and I want to get the entire nested XML, with tags and values. For example, my XML looks like the sample below.
I want the entire XML within the <ANY-ELEMENT>.....</ANY-ELEMENT> tag.
<?xml version="1.0" encoding="UTF-8"?>
<x:xMessage xmlns:x="http://www.connecture.com/integration/x" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.connecture.com/integration/x xMessageWrapper.xsd
">
<x:xMessageHeader>
<Version>850</Version>
<Source>Source</Source>
<Target>target</Target>
<Timestamp>2013-12-31T12:00:00</Timestamp>
<RequestID>123456</RequestID>
<ResponseID>54321</ResponseID>
<Priority>3</Priority>
<Username>Deepak</Username>
<Password>Kumar</Password>
</x:xMessageHeader>
<x:xMessageBody>
<ANY-ELEMENT>
<xEnveloped_834A1 xsi:schemaLocation="....." xmlns="......."
..........................
..........................
some Complex XML
..........................
..........................
..........................
</ANY-ELEMENT>
</x:xMessageBody>
</x:xMessage>
Handler class Sample code:
public class MessageWrapperHandler extends DefaultHandler {
private boolean bActualMessage = false;
private String actualMessage = null;
private long lengthActualMessage=0;
public void startElement(String uri, String localName, String qName, Attributes attributes) {
if (qName.equalsIgnoreCase("ANY-ELEMENT")) {
bActualMessage = true;
//lengthActualMessage=How to know the length of Child XML
}
}
public void characters(char ch[], int start, int length) {
if (bActualMessage) {
actualMessage = new String(ch, start, length);
//trying to get embedded XML
bActualMessage = false;
}
}
}
But since what follows <ANY-ELEMENT> is XML content rather than text, this gives me nothing. How can I achieve this?
EDIT: You are free to modify the XML after <ANY-ELEMENT>, for example by wrapping the contents in CDATA.
Instead of SAX, I would recommend using StAX (a StAX implementation is included in the JDK/JRE since Java SE 6). StAX is similar to SAX except instead of having the events pushed to you, you pull (request) them.
In the code below the XMLStreamReader is advanced to the ANY-ELEMENT element. Once it is at the correct position you can interact with it as you wish.
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
StreamSource xmlSource = new StreamSource("src/forum19559825/input.xml");
XMLStreamReader xsr = xif.createXMLStreamReader(xmlSource);
Demo demo = new Demo();
demo.positionXMLStreamReaderAtAnyElement(xsr);
demo.processAnyElement(xsr);
}
private void positionXMLStreamReaderAtAnyElement(XMLStreamReader xsr) throws Exception {
while(xsr.hasNext()) {
if(xsr.getEventType() == XMLStreamReader.START_ELEMENT && "ANY-ELEMENT".equals(xsr.getLocalName())) {
break;
}
xsr.next();
}
}
private void processAnyElement(XMLStreamReader xmlStreamReaderAtAnyElement) {
// TODO: Stuff
System.out.println("FOUND IT");
}
}
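One possible way to fill in the "TODO: Stuff" step is to hand the positioned reader to an identity Transformer, which copies the current element and everything below it to a string. This is a sketch under assumptions: the class name NestedXmlExtractor and the method firstChildXml are invented for the example, and it assumes the wrapper contains exactly one child element with no text in front of it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class NestedXmlExtractor {

    /** Returns the first child element of the wrapper as an XML string, or null. */
    public static String firstChildXml(String xml, String wrapperLocalName) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.START_ELEMENT
                    && wrapperLocalName.equals(xsr.getLocalName())) {
                // Step from the wrapper to its first child element;
                // nextTag() skips whitespace but fails on other text.
                if (xsr.nextTag() != XMLStreamConstants.START_ELEMENT) {
                    return null; // the wrapper is empty
                }
                // A Transformer without a stylesheet copies its input unchanged.
                Transformer copier = TransformerFactory.newInstance().newTransformer();
                copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                // StAXSource reads only the subtree of the current element.
                copier.transform(new StAXSource(xsr), new StreamResult(out));
                return out.toString();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><ANY-ELEMENT>"
                + "<payload><inner>hello</inner></payload>"
                + "</ANY-ELEMENT></root>";
        System.out.println(firstChildXml(xml, "ANY-ELEMENT"));
    }
}
```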
I have a problem parsing attributes with special characters using JDOM, for example:
<tag xml:lang="123">
In this case the getAttributes() method returns null.
Is there any solution to fix this?
Works without problems for me:
public class TestJdom
{
public static void main(String[] args) throws JDOMException, IOException {
String xmlString = "<test><tag xml:lang=\"123\"></tag></test>";
SAXBuilder builder = new SAXBuilder();
StringReader stringReader = new StringReader(new String(xmlString
.getBytes()));
Document doc = builder.build(stringReader);
List<?> attrs = doc.getRootElement().getChild("tag").getAttributes();
System.out.println(attrs);
}
}
You probably need to set the namespace; see http://cs.au.dk/~amoeller/XML/programming/jdomexample.html