Reading XML as string in Java - java

Could somebody help me with this? I would like to know how to read this example as a string. I know how to read the first element, but I don't know how to read all of them.
<Tr rn=\"000000000000000\" vr=\"T\" sSpre=\"S\" reg=\"P\" dSpre=\"2000-01-01\" dOdprt=\"2000-01-01\" iban=\"SI00\" eno=\"R\" vir=\"B\" maticnaPps=\"0000000000\"><Imetnik davcna=\"00000000\" matSub=\"0000000000\" drz=\"705\"><PopolnoIme>UNKNOWN</PopolnoIme><KratkoIme>UNKNOWN</KratkoIme><Naslov sifTipNaslova=\"00\" sifObcina=\"000\" sifPosta=\"0000\" sifUlica=\"0000\" sifNaselje=\"000\" stHisna=\"000\" sifHsmid=\"00000000\"><Obcina>UNKNOWN</Obcina><Posta>UNKNOWN</Posta><Ulica>UNKNOWN</Ulica><Naselje>UNKNOWN</Naselje></Naslov></Imetnik></Tr>

Maybe this is what you are looking for? Example here: http://ideone.com/N4jIO
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Main {
public static void main(String... args) throws IOException, SAXException, ParserConfigurationException {
String xml = "<Tr rn=\"000000000000000\" vr=\"T\" sSpre=\"S\" reg=\"P\" dSpre=\"2000-01-01\" dOdprt=\"2000-01-01\" iban=\"SI00\" eno=\"R\" vir=\"B\" maticnaPps=\"0000000000\"><Imetnik davcna=\"00000000\" matSub=\"0000000000\" drz=\"705\"><PopolnoIme>UNKNOWN</PopolnoIme><KratkoIme>UNKNOWN</KratkoIme><Naslov sifTipNaslova=\"00\" sifObcina=\"000\" sifPosta=\"0000\" sifUlica=\"0000\" sifNaselje=\"000\" stHisna=\"000\" sifHsmid=\"00000000\"><Obcina>UNKNOWN</Obcina><Posta>UNKNOWN</Posta><Ulica>UNKNOWN</Ulica><Naselje>UNKNOWN</Naselje></Naslov></Imetnik></Tr>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
print(doc.getDocumentElement(), "");
}
private static void print(Node e, String tab) {
if (e.getNodeType() == Node.TEXT_NODE) {
System.out.println(tab + e.getNodeValue());
return;
}
System.out.print(tab + e.getNodeName());
NamedNodeMap as = e.getAttributes();
if (as != null && as.getLength() > 0) {
System.out.print(" attributes=[");
for (int i = 0; i < as.getLength(); i++)
System.out.print((i == 0 ? "" : ", ") + as.item(i));
System.out.print("]");
}
System.out.println();
if (e.getNodeValue() != null)
System.out.println(tab + " " + e.getNodeValue());
NodeList childs = e.getChildNodes();
for (int i = 0; i < childs.getLength(); i++)
print(childs.item(i), tab + " ");
}
}

If your goal is to load/parse an XML Document from a String object, you can use the usual XML document loading code and wrap the string in a StringReader passed to an InputSource (or use a ByteArrayInputStream, or anything else, as long as you build up a chain of transformations that lets you access your data as a Reader or an InputStream).
An example follows here (untested and without exception handling. Sorry, I don't have a test environment at the moment):
final DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
final DocumentBuilder db = f.newDocumentBuilder();
final InputSource is = new InputSource();
is.setCharacterStream(new StringReader(YOURSTRING));
final Document doc = db.parse(is);
doc.getDocumentElement().normalize();
/*
* do whatever you want/need here.
*/
If that's not what you wanted, sorry I am not quite sure what you were asking here.
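To make the snippet above concrete, here is a minimal, self-contained sketch of the same StringReader/InputSource approach that then walks every element and collects all attributes and text values. The class name XmlStringWalker, the helper names, and the shortened sample XML are assumptions made for illustration, not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class XmlStringWalker {

    // Parses an XML string into a DOM document; checked exceptions are wrapped for brevity.
    public static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Adds one "path@attribute=value" entry per attribute and one "path=text"
    // entry per non-blank text node, recursing into every child element.
    public static List<String> flatten(Element e, String parentPath, List<String> out) {
        String path = parentPath + "/" + e.getTagName();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.add(path + "@" + a.getNodeName() + "=" + a.getNodeValue());
        }
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                flatten((Element) c, path, out);
            } else if (c.getNodeType() == Node.TEXT_NODE && !c.getNodeValue().trim().isEmpty()) {
                out.add(path + "=" + c.getNodeValue().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened version of the XML from the question.
        String xml = "<Tr rn=\"0\" vr=\"T\"><Imetnik drz=\"705\">"
                + "<PopolnoIme>UNKNOWN</PopolnoIme></Imetnik></Tr>";
        for (String line : flatten(parse(xml).getDocumentElement(), "", new ArrayList<>())) {
            System.out.println(line);
        }
    }
}
```

The order in which attributes come back from a NamedNodeMap is not guaranteed, so it is safer not to rely on it.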

Using JDOM could be more understandable (the SAXBuilder class used below comes from JDOM, not from Xerces itself):
public static void loadImetniks(String filePath) {
File xmlFile;
SAXBuilder builder;
Element root, child;
Imetnik imet;//another class that you have to create to help you for parsing
Tr tr = new Tr();//likewise a class you have to create, holding the attributes of the Tr element
Document doc;
try {
xmlFile = new File(filePath);
builder = new SAXBuilder(); // parameters control validation, etc
doc = builder.build(xmlFile);
root = doc.getRootElement(); // Tr could be the root but I am not sure if you will have more Tr nodes in the same file??
tr.setRn(root.getAttributeValue(Constants.RN));//define the constants string in another file
tr.setVr(root.getAttributeValue(Constants.VR));
tr.setSspre(root.getAttributeValue(Constants.SSPRE));
tr.setReg(root.getAttributeValue(Constants.REG));
tr.setIban(root.getAttributeValue(Constants.IBAN));
.... //repeat for every attribute
....
List children = root.getChildren(); // depends of how many Imetnik you will have
for (Iterator iter = children.iterator(); iter.hasNext();) {
child = (Element) iter.next();
imet = new Imetnik();
imet.loadXML(child); // you have to define the loadXML function in your object Imetnik which should extract the attributes and internal nodes
//imets.add(imet); // just use in the case that you will have to extract more than one Imetnik node
}
} catch (Exception e) {
log.error("Error while parsing contests.xml!");
log.error(e.getMessage());
}
}
For instance, your Imetnik class should contain:
public void loadXML(Element root) {
Element child;
//Naslov naslov; // for Naslov because it could be an object itself
davcna = root.getAttributeValue(Constants.DAVCNA); //define the string constant
matSub = root.getAttributeValue(Constants.MATSUB); //define the string constant
drz = Integer.parseInt(root.getAttributeValue(Constants.DRZ)); //define the string constant
List children = root.getChildren(); // your root is Imetnik now
for (Iterator iter = children.iterator(); iter.hasNext();) {
.....
.......
}
}

The best solution to parse XML files in Java is to use a dedicated parser or parsing API such as:
Xerces (a parser implementation)
SAX (an event-based streaming API)
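As a rough illustration of the SAX style, the sketch below uses the SAX parser that ships with the JDK (via javax.xml.parsers) to collect every attribute of every element from an XML string. The class name SaxAttributeDump, the method name attributesOf, and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class SaxAttributeDump {

    // Returns one "element@attribute=value" entry per attribute found in the XML.
    public static List<String> attributesOf(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Called once for every opening tag, in document order.
                for (int i = 0; i < atts.getLength(); i++) {
                    out.add(qName + "@" + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<Tr rn=\"0\"><Imetnik drz=\"705\"><Naslov sifPosta=\"0000\"/></Imetnik></Tr>";
        attributesOf(xml).forEach(System.out::println);
    }
}
```

Because SAX streams through the input instead of building a tree, it tends to use far less memory than DOM on large documents, at the cost of having to track state yourself.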

Related

evaluateXPath runs slow for repeating 1 XML Element in java

I have about 1,000,000 XML files, and I am using XPathExpression in Java to walk through the XML tags and extract the data I need.
Imagine each file has about 5000 tags for name, 5000 tags for family name, 5000 tags for age, and only 1 tag for date. I want to repeat the date tag 5000 times as well.
The code below runs for XML files smaller than 20MB, but I have files larger than 20MB; these take a very long time to process, and in some cases I get an out-of-memory error in Eclipse (I tried adding VM arguments in the Eclipse run configuration, but it still takes a long time and remains slow).
I am fairly sure the problem is the array I use to repeat the date tag, which is not optimized. I would really appreciate it if you could have a look at my code. I should add that I am new to Java:
package TEST;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.util.Arrays;
public class Data {
//function started
public static void main(String[] args) throws Exception
{
//Get Files
String doc ="MyFileNew";
String dump="";
int number =200;
for (int i=1; i<=201; i++) {
number ++;
dump = doc+number;
String fileName= "/root/MyFiles/" + dump + ".xml";
Document document = getDocument(fileName);
;
FileWriter fw = null;
BufferedWriter bw = null;
PrintWriter pw = null;
//Using Document Builder
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc1 = documentBuilder.parse(fileName);
/*******Get attribute values using xpath******/
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
try {
fw = new FileWriter("/root/Results/" + dump + ".txt");
bw = new BufferedWriter(fw);
pw = new PrintWriter(bw);
//Printing Name tags
pw.println( "Name"+ evaluateXPath(document, "/xml/item/item[@key='Name']/text()") );
//Counting Name tags
XPathExpression expr1 = xpath.compile("count(/xml/item/item[@key='Name'])");
Number result1 = (Number) expr1.evaluate(doc1, XPathConstants.NUMBER);
int n = result1.intValue();
//Printing FamilyName tags
pw.println( "FamilyName: " + evaluateXPath(document, "/xml/item/item[@key='FamilyName']/text() \n") );
//Printing Age tags
pw.println( "Age: " + evaluateXPath(document, "/xml/item/item[@key='Age']/text() \n") );
//Repeating Date based on counting name tags
String[] strArray = new String[0];
for (int q=0; q<n;q++){
List<String> strArraytmp = evaluateXPath(document,"/xml/item/item[@key='date']/text()");
String[] strings = strArraytmp.stream().toArray(String[]::new);
strArray= Stream.of(strArray, strings ).flatMap(Stream::of).toArray(String[]::new);
}
pw.println("date: " + Arrays.toString(strArray));
System.out.println("this file goes to path:" + "/root/Results/Data/" + dump + ".txt");
pw.flush();
}
catch (IOException e)
{ e.printStackTrace(); } }
}
private static List<String> evaluateXPath(Document document, String xpathExpression) throws Exception
{
// Create XPathFactory object
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
List<String> values = new ArrayList<>();
try
{
// Create XPathExpression object
XPathExpression expr = xpath.compile(xpathExpression);
// Evaluate expression result on XML document
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
values.add(nodes.item(i).getNodeValue());
}
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return values;
}
private static Document getDocument(String fileName) throws Exception
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(fileName);
return doc;
}
}
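One thing that stands out in the code above is that the loop re-evaluates the same date XPath once per name and rebuilds the whole array on every iteration, so the work grows roughly quadratically with the number of names. A possible alternative, sketched here under the assumption that each file has exactly one date item and uses the /xml/item/item[@key=...] layout from the question, is to evaluate the date once and repeat it with Collections.nCopies. The class name RepeatDate and the sample XML are made up for this example, and this is not a measured fix for the out-of-memory error.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class RepeatDate {

    // Parses an XML string; the real code would parse the file once and reuse the Document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates the date once and returns it repeated once per Name item.
    public static List<String> repeatedDates(Document document) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double count = (Double) xpath.evaluate(
                    "count(/xml/item/item[@key='Name'])", document, XPathConstants.NUMBER);
            // The two-argument evaluate returns the string value of the first matching node.
            String date = xpath.evaluate("/xml/item/item[@key='date']/text()", document);
            // nCopies returns an immutable list that stores the value only once.
            return Collections.nCopies(count.intValue(), date);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xml><item>"
                + "<item key='Name'>A</item><item key='Name'>B</item>"
                + "<item key='date'>2020-01-01</item>"
                + "</item></xml>";
        System.out.println("date: " + repeatedDates(parse(xml)));
    }
}
```

The code in the question also parses each file twice (once in getDocument and once into doc1), so reusing a single Document may help as well.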

Not able to parse inner elements of XML using DocumentBuilderFactory in Java

I'm receiving a response as XML and trying to parse it to get the inner details. I'm using DocumentBuilderFactory for this. The parent object is not null, but when I try to get the nested node list elements, it returns null. Am I missing anything?
Here is my response XML
ResponseXML
<DATAPACKET REQUEST-ID = "1">
<HEADER>
</HEADER>
<BODY>
<CONSUMER_PROFILE2>
<CONSUMER_DETAILS2>
<NAME>David</NAME>
<DATE_OF_BIRTH>1949-01-01T00:00:00+03:00</DATE_OF_BIRTH>
<GENDER>001</GENDER>
</CONSUMER_DETAILS2>
</CONSUMER_PROFILE2></BODY></DATAPACKET>
and Im parsing in the following way
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(responseXML));
// Consumer details.
if(doc.getDocumentElement().getElementsByTagName("CONSUMER_DETAILS2") != null) {
Node consumerDetailsNode = doc.getDocumentElement().getElementsByTagName("CONSUMER_DETAILS2").item(0); // This is coming as null
dateOfBirth = getNamedItem(consumerDetailsNode, "DATE_OF_BIRTH");
System.out.println("DOB:"+dateOfBirth);
}
getNamedItem
private static String getNamedItem(Node searchResultNode, String param) {
return searchResultNode.getAttributes().getNamedItem(param) != null ? searchResultNode.getAttributes().getNamedItem(param).getNodeValue() : "";
}
Any ideas would be greatly appreciated.
The easiest way to search for individual elements within an XML document is with XPath. It provides a search syntax similar to file system notation.
Here is a solution to the specific problem in your document:
EDIT: solution adapted to support multiple CONSUMER_PROFILE2 elements. You just need to get and parse a NodeList instead of one Node
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public class XpathDemo
{
public static void main(String[] args)
{
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document xmlDoc = builder.parse(new InputSource(new FileReader("C://Temp/xx.xml")));
// Selects all CONSUMER_PROFILE2 elements no matter where they are in the document
String cp2_nodes = "//CONSUMER_PROFILE2";
// Selects each DATE_OF_BIRTH element that is the first such child of its parent, anywhere under the current element
String dob_nodes = "//DATE_OF_BIRTH[1]";
// Selects text child node of current element
String text_node = "/child::text()";
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList dob_list = (NodeList)xPath.compile(cp2_nodes + dob_nodes + text_node)
.evaluate(xmlDoc, XPathConstants.NODESET);
for (int i = 0; i < dob_list.getLength() ; i++) {
Node dob_node = dob_list.item(i);
String dob_text = dob_node.getNodeValue();
System.out.println(dob_text);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
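For comparison, the same value can be read with plain DOM calls. In the response XML from the question, DATE_OF_BIRTH is a child element of CONSUMER_DETAILS2 rather than an attribute, so getAttributes().getNamedItem(...) cannot find it; the posted snippet also never shows the db.parse(is) call that would create doc. A minimal sketch, with the class name ChildTextDemo and both helper names invented for this example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class ChildTextDemo {

    // Returns the text of the first descendant element with the given tag, or "" if none.
    public static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : "";
    }

    // Parses the response and reads DATE_OF_BIRTH under the first CONSUMER_DETAILS2.
    public static String dateOfBirth(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList details = doc.getElementsByTagName("CONSUMER_DETAILS2");
            if (details.getLength() == 0) {
                return "";
            }
            return childText((Element) details.item(0), "DATE_OF_BIRTH");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DATAPACKET REQUEST-ID=\"1\"><HEADER></HEADER><BODY><CONSUMER_PROFILE2>"
                + "<CONSUMER_DETAILS2><NAME>David</NAME>"
                + "<DATE_OF_BIRTH>1949-01-01T00:00:00+03:00</DATE_OF_BIRTH>"
                + "</CONSUMER_DETAILS2></CONSUMER_PROFILE2></BODY></DATAPACKET>";
        System.out.println("DOB:" + dateOfBirth(xml));
    }
}
```

Note that getElementsByTagName never returns null; it returns an empty NodeList when nothing matches, so checking getLength() is the more useful test.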

Java Dom parser reports wrong number of child nodes

I have the following xml file:
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user id="0" firstname="John"/>
</users>
Then I'm trying to parse it with Java, but getChildNodes() reports the wrong number of child nodes.
Java code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(this.file);
document.getDocumentElement().normalize();
Element root = document.getDocumentElement();
NodeList nodes = root.getChildNodes();
System.out.println(nodes.getLength());
Result: 3
Also I'm getting NPEs for accessing the nodes attributes, so I'm guessing something's going horribly wrong.
The child nodes consist of elements and text nodes for whitespace. You will want to check the node type before processing the attributes. You may also want to consider using the javax.xml.xpath APIs available in the JDK/JRE starting with Java SE 5.
Example 1
This example demonstrates how to issue an XPath statement against a DOM.
package forum11649396;
import java.io.StringReader;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<?xml version='1.0' encoding='UTF-8'?><users><user id='0' firstname='John'/></users>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new InputSource(new StringReader(xml)));
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
Element userElement = (Element) xpath.evaluate("/users/user", document, XPathConstants.NODE);
System.out.println(userElement.getAttribute("id"));
System.out.println(userElement.getAttribute("firstname"));
}
}
Example 2
The following example demonstrates how to issue an XPath statement against an InputSource to get a DOM node. This saves you from having to parse the XML into a DOM yourself.
package forum11649396;
import java.io.StringReader;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class Demo {
public static void main(String[] args) throws Exception {
String xml = "<?xml version='1.0' encoding='UTF-8'?><users><user id='0' firstname='John'/></users>";
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
InputSource inputSource = new InputSource(new StringReader(xml));
Element userElement = (Element) xpath.evaluate("/users/user", inputSource, XPathConstants.NODE);
System.out.println(userElement.getAttribute("id"));
System.out.println(userElement.getAttribute("firstname"));
}
}
There are three child nodes:
a text node containing a line break
an element node (tagged user)
a text node containing a line break
So when processing the child nodes, check for element nodes.
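A small sketch of that check, keeping only the element children and skipping the whitespace text nodes. The class name ElementChildren and its helper methods are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class ElementChildren {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns only the direct children that are elements, ignoring text and comment nodes.
    public static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<users>\n<user id=\"0\" firstname=\"John\"/>\n</users>";
        Element root = parse(xml).getDocumentElement();
        System.out.println("all child nodes: " + root.getChildNodes().getLength());
        for (Element user : elementChildren(root)) {
            System.out.println(user.getTagName() + " id=" + user.getAttribute("id"));
        }
    }
}
```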
You have to make sure you account for the '\n' between the nodes, which count for text nodes. You can test for that using if(root.getNodeType() == Node.ELEMENT_NODE)
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(this.file);
document.getDocumentElement().normalize();
for(Node root = document.getFirstChild(); root != null; root = root.getNextSibling()) {
if(root.getNodeType() == Node.ELEMENT_NODE) {
NodeList nodes = root.getChildNodes();
System.out.println(root.getNodeName() + " has "+nodes.getLength()+" children");
for(int i=0; i<nodes.getLength(); i++) {
Node n = nodes.item(i);
System.out.println("\t"+n.getNodeName());
}
}
}
I didn't notice any of the answers addressing your last note about NPEs when trying to access attributes.
"Also I'm getting NPEs for accessing the nodes attributes, so I'm guessing something's going horribly wrong."
Since I've seen the following suggestion on a few sites, I assume it's a common way to access attributes:
String myPropValue = node.getAttributes().getNamedItem("myProp").getNodeValue();
which works fine if the node always has a myProp attribute. However, getAttributes returns null for any node that is not an element (such as a whitespace text node), and if the node is an element but has no myProp attribute, getNamedItem returns null.
I'm currently using
public static String getStrAttr(Node node, String key) {
if (node.hasAttributes()) {
Node item = node.getAttributes().getNamedItem(key);
if (item != null) {
return item.getNodeValue();
}
}
return null;
}
public static int getIntAttr(Node node, String key) {
if (node.hasAttributes()) {
Node item = node.getAttributes().getNamedItem(key);
if (item != null) {
return Integer.parseInt(item.getNodeValue());
}
}
return -1;
}
in a utility class, but your mileage may vary.
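A variation on the same idea, sketched under the assumption that you are happy to cast to Element: Element.getAttribute returns an empty string rather than null when the attribute is missing, and hasAttribute lets you tell "missing" apart from "empty". The class name AttrDefaults and the fallback parameter are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class AttrDefaults {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the attribute value, or the fallback when the node is not an
    // element or does not carry the attribute. Never throws a NullPointerException.
    public static String attr(Node node, String key, String fallback) {
        if (node instanceof Element) {
            Element element = (Element) node;
            if (element.hasAttribute(key)) {
                return element.getAttribute(key);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Element root = parse("<users>\n<user id=\"0\"/>\n</users>").getDocumentElement();
        // Walk all children, including the whitespace text nodes.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName() + " -> " + attr(n, "id", "(none)"));
        }
    }
}
```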

Why am I getting a DeferredDocumentImpl when trying to create a JDOM Document?

I'm trying to parse the input stream from a zip file entry and create an org.w3c.dom.Document, but for some reason I'm getting a DeferredDocumentImpl back. I'm also creating a new org.w3c.dom.Document, and this returns a DocumentImpl. I then use XPath to select a single node, but I get the error "org.apache.xerces.dom.DocumentImpl incompatible with org.jdom.Element" when I try to find my specific node. I've done some searching but can't seem to find any examples. Does anyone know why my documents are not being created as DOM documents? Thanks in advance for the help.
//create a zip file from the crate location
File downloadFile = crate.getLocation();
ZipFile zipFile = new ZipFile(downloadFile);
//put all the contents of the zip file into an enumeration
Enumeration entries = zipFile.entries();
while (entries.hasMoreElements()){
ZipEntry entry = (ZipEntry) entries.nextElement();
String currentEntry = entry.getName();
if (currentEntry.equals("ATTACH 8130-3 XML/signature.xml")){
InputStream zipStream = zipFile.getInputStream(entry);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
org.w3c.dom.Document doc = (org.w3c.dom.Document)dBuilder.parse(zipStream);
doc.getDocumentElement().normalize();
NodeList certNode = doc.getElementsByTagName("ATA_PartCertificationForm");
int testInt = certNode.getLength();
org.w3c.dom.Document doc2 = (org.w3c.dom.Document) dBuilder.newDocument();
Node parentNode = doc.getParentNode();
Element rootElement = doc2.createElement("CurrentCertificate");
doc2.appendChild(rootElement);
for(int i=0; i<certNode.getLength(); i++){
Node childNode = certNode.item(i);
Element childElement;
childElement = (Element)certNode.item(i);
rootElement.appendChild(doc2.importNode(childNode, true));
String nameString = childNode.getNodeName();
Element block13Element = (Element) XPath.selectSingleNode(doc2, "//Block13M");
System.out.println("tester test");
}
System.out.println("Test break");
}
}
You're passing an org.w3c.dom.Document to org.jdom.xpath.XPath.selectSingleNode(), but that method expects an org.jdom.Document or an org.jdom.Element.
Here's one way to parse your XML as a JDOM document and execute an XPath query using Jaxen, which must also be in the classpath.
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
public class JdomXpathSandbox {
public static void main(String[] args) throws Exception {
InputStream is = ...;
Document document = new SAXBuilder().build(is);
Element rootElement = document.getRootElement();
String xpathExpression = ...
List matchingNodes = XPath.selectNodes(rootElement, xpathExpression);
}
}
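If JDOM is not needed for anything else, another option is to stay with org.w3c.dom and run the query through the JDK's own javax.xml.xpath API, which accepts W3C DOM nodes directly. A minimal sketch; the class name W3cXpathDemo, the helper names, and the sample document (including the content of Block13M) are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class W3cXpathDemo {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the first element matching the expression, or null when nothing matches.
    public static Element selectSingle(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Element) xpath.evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up stand-in for signature.xml.
        String xml = "<ATA_PartCertificationForm><Block13M>sample</Block13M></ATA_PartCertificationForm>";
        Element block13 = selectSingle(parse(xml), "//Block13M");
        System.out.println(block13 == null ? "not found" : block13.getTextContent());
    }
}
```

Here XPath refers to javax.xml.xpath.XPath, so it should not be imported alongside org.jdom.xpath.XPath in the same file.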

Get the name of all attributes in a XML-File

Hey I have an XML file and I would like to navigate to a given node and grab the name of all Attributes to that node.
For example: (XML File)
<RootName>
<SubNode>
<Attribute1>Value 1</Attribute1>
<Attribute2>Value 2</Attribute2>
</SubNode>
</RootName>
Here is my code so far: (Java Code)
File file = new File("data.xml");
try
{
/* Parse File */
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(file);
/* Find Starting Tag */
NodeList nodes = doc.getElementsByTagName(StartTag);
for (int i = 0; i < nodes.getLength(); i++)
{
Element element = (Element) nodes.item(i);
System.out.println(element);
}
Now I know you can find a specific attribute given the name
String name = element.getAttribute("Attribute1");
But I would like to find all these names dynamically.
Thanks in advance
-Scott
What you are looking for are the Elements. Here is a sample on how to get the Elements in an XML:
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public class DOMElements{
static public void main(String[] arg){
try {
BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
System.out.print("Enter XML File name: ");
String xmlFile = bf.readLine();
File file = new File(xmlFile);
if(file.exists()){
// Create a factory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Use the factory to create a builder
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(xmlFile);
// Get a list of all elements in the document
NodeList list = doc.getElementsByTagName("*");
System.out.println("XML Elements: ");
for (int i=0; i<list.getLength(); i++) {
// Get element
Element element = (Element)list.item(i);
System.out.println(element.getNodeName());
}
}
else{
System.out.print("File not found!");
}
}
catch (Exception e) {
e.printStackTrace(); // report the failure instead of exiting silently
System.exit(1);
}
}
}
Also see my comment below your question on how to properly design XML and when to use elements, and when to use attributes.
element.getAttributes(); gets you a org.w3c.dom.NamedNodeMap. You can loop through this using the item(int index) method to get org.w3c.dom.Attr nodes, and get the names of those from the getName() method.
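A short sketch of that loop. Since Attribute1 and Attribute2 in the question's XML are actually child elements rather than attributes, the example also lists child element names. The class name NameLister, both helper names, and the id/kind attributes added to the sample are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class NameLister {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Names of the real XML attributes on the element (order is not guaranteed).
    public static List<String> attributeNames(Element element) {
        List<String> names = new ArrayList<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(((Attr) attrs.item(i)).getName());
        }
        return names;
    }

    // Names of the direct child elements, in document order.
    public static List<String> childElementNames(Element element) {
        List<String> names = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<RootName><SubNode id=\"1\" kind=\"x\">"
                + "<Attribute1>Value 1</Attribute1><Attribute2>Value 2</Attribute2>"
                + "</SubNode></RootName>";
        Element subNode = (Element) parse(xml).getElementsByTagName("SubNode").item(0);
        System.out.println("attributes: " + attributeNames(subNode));
        System.out.println("child elements: " + childElementNames(subNode));
    }
}
```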
