Read few xml elements only in an efficient way - java

I want to read only a few XML tag values, and I have written the code below. The XML is big and somewhat complex, but I have simplified it for this example. Is there a more efficient way to do this? I am using Java 8.
DocumentBuilderFactory dbfaFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = dbfaFactory.newDocumentBuilder();
Document doc = documentBuilder.parse("xml_val.xml");
System.out.println(doc.getElementsByTagName("date_added").item(0).getTextContent());
<item_list id="item_list01">
<numitems_intial>5</numitems_intial>
<item>
<date_added>1/1/2014</date_added>
<added_by person="person01" />
</item>
<item>
<date_added>1/6/2014</date_added>
<added_by person="person05" />
</item>
<numitems_current>7</numitems_current>
<manager person="person48" />
</item_list>

Use XPath and pass a specific expression to get the desired element:
public class MainJaxbXpath {
public static void main(String[] args) {
try {
FileInputStream fileIS;
fileIS = new FileInputStream("/home/luis/tmp/test.xml");
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
builder = builderFactory.newDocumentBuilder();
Document xmlDocument;
xmlDocument = builder.parse(fileIS);
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//item_list[@id=\"item_list01\"]//date_added[1]";
String nodeList =(String) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING);
System.out.println(nodeList);
} catch (SAXException | IOException | ParserConfigurationException | XPathExpressionException e3) {
e3.printStackTrace();
}
}
}
Result:
1/1/2014
To look for more than one element in the same operation:
String expression01 = "//item_list[@id=\"item_list01\"]//date_added[1]";
String expression02 = "//item_list[@id=\"item_list02\"]//date_added[2]";
String expression = String.format("%s | %s", expression01, expression02);
NodeList nodeList =(NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
Node currentNode = nodeList.item(i);
if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
System.out.println(currentNode.getTextContent());
}
}

Some suggestions.
Firstly, don't use DOM. There's a wide range of dom-like XML tree representations available in Java; DOM is the first and the worst. Later third-party models like JDOM2 and XOM are much better designed.
Secondly, consider doing the whole thing in an XML-oriented language like XSLT or XQuery rather than in Java. In XQuery, using Saxon's XQuery API, this would be:
Processor proc = new Processor(false);
XQueryCompiler comp = proc.newXQueryCompiler();
XQueryExecutable exec = comp.compile("//date_added");
XQueryEvaluator eval = exec.load();
eval.setSource(new StreamSource(new File("/home/luis/tmp/test.xml")));
for (XdmItem item : eval.evaluate()) {
System.out.println(item.getStringValue());
}
But since the query is so simple, Saxon also has a direct map/reduce style API to access the tree. This would be:
Processor proc = new Processor(false);
XdmNode doc = proc.newDocumentBuilder().build(
new StreamSource(new File("/home/luis/tmp/test.xml")));
for (XdmItem item : doc.select(descendant("date_added")).asList()) {
System.out.println(item.getStringValue());
}
A suggestion that has nothing to do with efficiency: please use international standard dates. 1/6/2014 could be 1st June or 6th January. Writing it as 2014-06-01 (or 2014-01-06 if that's what you intended) not only avoids the kind of dangerous bugs that arise if you use an ambiguous format, it also means you can use standard date-and-time processing libraries, such as the XPath 2.0+ function library.
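To illustrate the point about standard dates, here is a minimal sketch using java.time from Java 8. The class name DateFormats and the day/month/year reading of the legacy values are assumptions made for the example; if the data is really month/day/year, the pattern needs to be swapped.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateFormats {
    // Assumption: legacy values are day/month/year. Use "M/d/yyyy" if they are not.
    private static final DateTimeFormatter LEGACY = DateTimeFormatter.ofPattern("d/M/yyyy");

    // ISO 8601 dates (yyyy-MM-dd) parse without any custom pattern.
    static LocalDate parseIso(String text) {
        return LocalDate.parse(text);
    }

    // Converts an ambiguous legacy value into its ISO 8601 form.
    static String legacyToIso(String text) {
        return LocalDate.parse(text, LEGACY).toString();
    }

    public static void main(String[] args) {
        System.out.println(legacyToIso("1/6/2014"));
        System.out.println(parseIso("2014-06-01").getMonth());
    }
}
```

Once the values are stored in ISO form, no pattern has to be guessed at all.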

Related

Link XML and XSD using java

I'm trying to write the header for an XML file so that it looks something like this:
<file xmlns="http://my_namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://my_namespace file.xsd">
However, I can't find how to do this using the Document class in Java. This is what I have:
public void exportToXML() {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
doc.setXmlStandalone(true);
doc.createTextNode("<file xmlns=\"http://my_namespace\"\n" +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
"xsi:schemaLocation=\"http://my_namespace file.xsd\">");
Element mainRootElement = doc.createElement("MainRootElement");
doc.appendChild(mainRootElement);
for(int i = 0; i < tipoDadosParaExportar.length; i++) {
mainRootElement.appendChild(criarFilhos(doc, tipoDadosParaExportar[i]));
}
Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.transform(new DOMSource(doc),
new StreamResult(new FileOutputStream(filename)));
} catch (Exception e) {
e.printStackTrace();
}
}
I also tried writing it to the file using createTextNode, but that didn't work either: the output contains only the XML declaration (the version), followed by the elements.
Would appreciate if you could help me. Have a nice day
createTextNode() is only suitable for creating text nodes, not elements; for those you need createElement(). If you are building a tree, you have to build nodes; you can't write lexical markup.
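As a rough sketch of what building nodes looks like for the header in the question, the following uses only the standard DOM and transformer APIs. The class name NamespacedRoot is made up for the example, and it uses the namespace-aware createElementNS()/setAttributeNS() variants rather than plain createElement(), since the desired root element is in a namespace.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NamespacedRoot {
    static final String NS = "http://my_namespace";
    static final String XSI = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

    // Builds the <file> root as nodes and serializes the document to a string.
    static String build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // The root element lives in the default namespace.
            Element root = doc.createElementNS(NS, "file");
            // Declare the xsi prefix, then add the namespaced attribute.
            root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xsi", XSI);
            root.setAttributeNS(XSI, "xsi:schemaLocation", NS + " file.xsd");
            doc.appendChild(root);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Child elements would be created the same way with createElementNS(NS, ...) and appended to the root.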
I'm not sure what MainRootElement is supposed to be; you've only given a fragment of your desired output so it's hard to tell.
Creating a DOM tree and then serializing it is a pretty laborious way of constructing an XML file. Using something like an XMLEventWriter is easier. But to be honest, I got frustrated by all the existing approaches and wrote a new library for the purpose as part of Saxon 10. It's called simply "Push", and looks something like this:
Processor proc = new Processor();
Serializer serializer = proc.newSerializer(new File(fileName));
Push push = proc.newPush(serializer);
Document doc = push.document(true);
doc.setDefaultNamespace("http://my_namespace");
Element root = doc.element("root")
.attribute(new QName("xsi", "http://www.w3.org/2001/XMLSchema-instance", "schemaLocation"),
"http://my_namespace file.xsd");
doc.close();
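For the streaming-writer route mentioned above, here is a minimal sketch that stays within the JDK. Note that it uses XMLStreamWriter, the cursor-style sibling of XMLEventWriter in the same StAX package, not XMLEventWriter itself. The class name StreamedHeader and the child element "record" are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StreamedHeader {
    // Writes the namespaced root element plus one child, without building a tree.
    static String write() {
        String ns = "http://my_namespace";
        String xsi = "http://www.w3.org/2001/XMLSchema-instance";
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("file");
            w.writeDefaultNamespace(ns);      // xmlns="..."
            w.writeNamespace("xsi", xsi);     // xmlns:xsi="..."
            w.writeAttribute("xsi", xsi, "schemaLocation", ns + " file.xsd");
            w.writeStartElement("record");    // hypothetical child element
            w.writeCharacters("example");
            w.writeEndElement();
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```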

Extract XML blocks as string in Java

I have an XML as below
<accountProducts>
<accountProduct>...</accountProduct>
<accountProduct>...</accountProduct>
<accountProduct>...</accountProduct>
<accountProduct>...</accountProduct>
</accountProducts>
Now I want to extract each accountProduct block as a string. Is there an XML parsing technique for this, or do I need to do string manipulation?
Any help, please.
Using the DOM as suggested above, you will need to parse your XML with a DocumentBuilder.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
//if your document has namespaces, you can specify that in your builder.
DocumentBuilder db = dbf.newDocumentBuilder();
Using this object, you can call the parse() method.
Your XML input can be provided to a DOM parser as a file or as a stream.
As a file...
File f = new File("MyXmlFile.xml");
Document d = db.parse(f);
As a string...
String myXmlString = "...";
InputSource ss = new InputSource(new StringReader(myXmlString));
Document d = db.parse(ss);
Once you have a Document object, you can traverse the document with DOM functions or with XPATH. This example illustrates the DOM methods.
In your example, assuming that accountProduct nodes contain only text, the following should work.
NodeList nl = d.getElementsByTagName("accountProduct");
for(int i=0; i<nl.getLength(); i++) {
Element elem = (Element)nl.item(i);
System.out.println(elem.getTextContent());
}
If accountProduct contains mixed content (text and elements), you would need more code to extract what you need.
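For the mixed-content case, and for getting each block back as markup rather than as text, one possible sketch is to serialize every matching element with an identity Transformer. The class and method names here (BlockExtractor, extract) are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class BlockExtractor {
    // Returns every element named tagName, serialized back to an XML string.
    static List<String> extract(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> declaration on each fragment.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> blocks = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                blocks.add(out.toString());
            }
            return blocks;
        } catch (ParserConfigurationException | SAXException | IOException
                | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling extract(xml, "accountProduct") on the document in the question should give one string per accountProduct element, tags included.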
Use JAXP for this.
The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language.

Xpath approach in case of large files

The class below is the classic approach to parsing an XML document via XPath in Java:
public class Main {
private Document createXMLDocument(String fileName) throws Exception {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
return doc;
}
private NodeList readXMLNodes(Document doc, String xpathExpression) throws Exception {
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(xpathExpression);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
return nodes;
}
public static void main(String[] args) throws Exception {
Main m = new Main();
Document doc = m.createXMLDocument("tv.xml");
NodeList nodes = m.readXMLNodes(doc, "//serie/eason/@id");
int n = nodes.getLength();
Map<Integer, List<String>> series = new HashMap<Integer, List<String>>();
for (int i = 1; i <= n; i++) {
nodes = m.readXMLNodes(doc, "//serie/eason[@id='" + i + "']/episode/text()");
List<String> episodes = new ArrayList<String>();
for (int j = 0; j < nodes.getLength(); j++) {
episodes.add(nodes.item(j).getNodeValue());
}
series.put(i, episodes);
}
for (Map.Entry<Integer, List<String>> entry : series.entrySet()) {
System.out.println("Season: " + entry.getKey());
for (String ep : entry.getValue()) {
System.out.println("Episodio: " + ep);
}
System.out.println("+------------------------------------+");
}
}
}
Some of these calls worry me in the case of a huge XML file, such as
Document doc = builder.parse(fileName);
return doc;
or
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
return nodes;
I'm worried because the XML document I need to handle is created by the customer, and it can contain an unbounded number of records describing emails and their contents (every user has their own personal email, so there is a lot of HTML in there). I know it's not the smartest approach, but it's one of the possibilities, and it was already up and running before I arrived here.
My question is: how can I parse and evaluate huge xml files using xpath?
You could use the StAX parser. It will take less memory than the DOM options. A good introduction to StAX is at http://tutorials.jenkov.com/java-xml/stax.html
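A minimal sketch of the StAX approach, assuming the elements of interest contain only text (as the episode elements in the question appear to). The class name StaxTextCollector is made up for the example. The reader pulls one event at a time, so no tree is built in memory.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxTextCollector {
    // Collects the text of every element named tagName, streaming through the input.
    static List<String> collect(Reader xml, String tagName) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && tagName.equals(reader.getLocalName())) {
                        // getElementText() requires a text-only element.
                        values.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }
}
```

For a large file, the Reader would typically wrap a buffered file stream instead of a string.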
First of all, XPath doesn't parse XML. Your createXMLDocument() method does that, producing as output a tree representation of the parsed XML. The XPath is then used to search the tree representation.
What you are really looking for is something that searches the XML on the fly, while it is being parsed.
One way to do this is with an XQuery system that implements "document projection" (for example, Saxon-EE). This will analyze your query to see what parts of the document are needed, and when you parse your document, it will build a tree containing only those parts of the document that are actually needed.
If the query is as simple as the one in your example, however, then it isn't too hard to code it as a SAX application, where events such as startElement and endElement are notified by the XML parser to the application, without building a tree in memory.
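A sketch of such a SAX application, under the assumption that matching elements are not nested inside each other. The class name SaxTextCollector is hypothetical; the handler callbacks are the standard org.xml.sax ones.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTextCollector {
    // Collects the text of every element named tagName using SAX callbacks.
    static List<String> collect(Reader xml, final String tagName) {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if (tagName.equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so accumulate.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current != null && tagName.equals(qName)) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(xml), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }
}
```

Filtering on an attribute such as id would be done in startElement via attributes.getValue("id").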

Parsing XML from website to an Android device

I am starting an Android application that will parse XML from the web. I've created a few Android apps but they've never involved parsing XML and I was wondering if anyone had any tips on the best way to go about it?
Here's an example:
try {
URL url = new URL(/*your xml url*/);
URLConnection conn = url.openConnection();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
NodeList nodes = doc.getElementsByTagName(/*tag from xml file*/);
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName(/*item within the tag*/);
Element line = (Element) title.item(0);
phoneNumberList.add(line.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
In my example, my XML file looks a little like:
<numbers>
<phone>
<string name = "phonenumber1">555-555-5555</string>
</phone>
<phone>
<string name = "phonenumber2">555-555-5555</string>
</phone>
</numbers>
and I would replace /*tag from xml file*/ with "phone" and /*item within the tag*/ with "string".
I always use the w3c dom classes. I have a static helper method that I use to parse the xml data as a string and returns to me a Document object. Where you get the xml data can vary (web, file, etc) but eventually you load it as a string.
something like this...
Document document = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(data));
document = builder.parse(is);
}
catch (SAXException e) { }
catch (IOException e) { }
catch (ParserConfigurationException e) { }
Several parsing mechanisms are available: one is SAX, another is DOM. It is not clear from your question what you want, but these may be good starting points.
There are three types of parsing I know: DOM, SAX and XMLPullParsing.
In my example here you need the URL and the parent node of the XML element.
try {
URL url = new URL("http://www.something.com/something.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("parent node here");
for (int i = 0; i < nodeList1.getLength(); i++) {
Node node = nodeList1.item(i);
}
} catch(Exception e) {
}
Also try this.
I would use the DOM parser if the XML file is not too large: it is not as efficient as SAX, but it is easier to use in that case.
I have made just one Android app that involved XML parsing; the XML came from a SOAP web service. I used XmlPullParser. The implementation returned by Xml.newPullParser() had a bug where calls to nextText() did not always advance to the END_TAG as the documentation promised. There is a workaround for this.

How do I extract child element from XML to a string in Java?

If I have an XML document like
<root>
  <element1>
    <child attr1="blah">
      <child2>blahblah</child2>
    </child>
  </element1>
</root>
I want to get an XML string with the first child element. My output string would be
<element1>
  <child attr1="blah">
    <child2>blahblah</child2>
  </child>
</element1>
There are many approaches, would like to see some ideas. I've been trying to use Java XML APIs for it, but it's not clear that there is a good way to do this.
thanks
You're right, with the standard XML API, there's not a good way - here's one example (may be bug ridden; it runs, but I wrote it a long time ago).
import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;
public class Proc
{
public static void main(String[] args) throws Exception
{
//Parse the input document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("in.xml"));
//Set up the transformer to write the output string
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty("indent", "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
//Find the first child node - this could be done with xpath as well
NodeList nl = doc.getDocumentElement().getChildNodes();
DOMSource source = null;
for(int x = 0;x < nl.getLength();x++)
{
Node e = nl.item(x);
if(e instanceof Element)
{
source = new DOMSource(e);
break;
}
}
//Do the transformation and output
transformer.transform(source, result);
System.out.println(sw.toString());
}
}
It would seem like you could get the first child just by using doc.getDocumentElement().getFirstChild(), but the problem with that is if there is any whitespace between the root and the child element, that will create a Text node in the tree, and you'll get that node instead of the actual element node. The output from this program is:
D:\home\tmp\xml>java Proc
<?xml version="1.0" encoding="UTF-8"?>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
I think you can suppress the xml version string if you don't need it, but I'm not sure on that. I would probably try to use a third party XML library if at all possible.
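The whitespace issue described above can be seen in a small sketch. The class name FirstElementChild and its helper methods are made up for the example; only standard DOM calls are used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class FirstElementChild {
    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Skips text, comment and other non-element nodes; returns null if none is found.
    static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <element1><child/></element1>\n</root>");
        Element root = doc.getDocumentElement();
        // The first child is the whitespace between <root> and <element1>.
        System.out.println(root.getFirstChild().getNodeType() == Node.TEXT_NODE);
        System.out.println(firstElementChild(root).getTagName());
    }
}
```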
Since this is the top Google result, here is a version for those of you who just want the basics:
public static String serializeXml(Element element) throws Exception
{
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
StreamResult result = new StreamResult(buffer);
DOMSource source = new DOMSource(element);
TransformerFactory.newInstance().newTransformer().transform(source, result);
return buffer.toString("UTF-8"); // the serializer writes UTF-8 by default, so decode with the same charset
}
I use this for debugging, which is most likely what you need it for.
I would recommend JDOM. It's a Java XML library that makes dealing with XML much easier than the standard W3C approach.
public String getXML(String xmlContent, String tagName){
String startTag = "<"+ tagName + ">";
String endTag = "</"+ tagName + ">";
int startposition = xmlContent.indexOf(startTag);
int endposition = xmlContent.indexOf(endTag, startposition);
if (startposition == -1){
return "ddd";
}
startposition += startTag.length();
if(endposition == -1){
return "eee";
}
return xmlContent.substring(startposition, endposition);
}
Pass your XML as a string to this method and, in your case, pass "element1" as the tagName parameter.
XMLBeans is an easy to use (once you get the hang of it) tool to deal with XML without having to deal with the annoyances of parsing.
It requires that you have a schema for the XML file, but it also provides a tool to generate a schema from an existing XML file (depending on your needs, the generated one is probably fine).
If your xml has schema backing it, you could use xmlbeans or JAXB to generate pojo objects that help you marshal/unmarshal xml.
http://xmlbeans.apache.org/
https://jaxb.dev.java.net/
As the question is really about finding the first occurrence of one string inside another, I would use String class methods instead of XML parsers:
public static String getElementAsString(String xml, String tagName){
int beginIndex = xml.indexOf("<" + tagName);
int endIndex = xml.indexOf("</" + tagName, beginIndex) + tagName.length() + 3;
return xml.substring(beginIndex, endIndex);
}
You can use the following function to extract an XML block as a string by passing the appropriate XPath expression:
private static String nodeToString(Node node) throws TransformerException
{
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return(buf.toString());
}
public static void main(String[] args) throws Exception
{
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
XPath xPath = XPathFactory.newInstance().newXPath();
Node result = (Node)xPath.evaluate("A/B/C", doc, XPathConstants.NODE); //"A/B[id = '1']" //"//*[@type='t1']"
System.out.println(nodeToString(result));
}
