XML Parsing Error "well-formed" - java

I have XML data stored in a database (not in a file).
I need to parse it so that I can write tests that verify the data in the XML.
The XML (content data):
<brid:AccountData xmlns:brid="http://billing.a1telekom.at/BridgeIT/BridgeITDefinition">
<brid:AccountNumber>250000000032</brid:AccountNumber>
<brid:CustomerNumber>100104653</brid:CustomerNumber>
<brid:AccountType>NORM</brid:AccountType>
<brid:BillCycle>M2</brid:BillCycle>
<brid:LastInvoiceDate>0001-01-01T00:00:00.000</brid:LastInvoiceDate>
<brid:BillThroughDate>0001-01-01T00:00:00.000</brid:BillThroughDate>
<brid:StartDate>2016-02-26T15:27:13</brid:StartDate>
<brid:EndDate>9999-12-31T23:59:59.000</brid:EndDate>
<brid:AccountStatus>Active</brid:AccountStatus>
<brid:TaxCode>U2</brid:TaxCode>
<brid:CostCentre></brid:CostCentre>
</brid:AccountData>
<brid:PaymentData xmlns:brid="http://billing.a1telekom.at/BridgeIT/BridgeITDefinition">
<brid:PaymentMethod>Manual</brid:PaymentMethod>
</brid:PaymentData>
<brid:MediaData xmlns:brid="http://billing.a1telekom.at/BridgeIT/BridgeITDefinition">
<brid:AccountNumber>250000000032</brid:AccountNumber>
<brid:MediaType>PAPIER</brid:MediaType>
<brid:StartDate>2016-02-26T15:27:13</brid:StartDate>
<brid:EndDate>9999-12-31T23:59:59.000</brid:EndDate>
<brid:InvoiceName>ApuiafgjkLrjgdna Fouydf</brid:InvoiceName>
<brid:Language>DE</brid:Language>
<brid:LocationID>118298</brid:LocationID>
<brid:FirstName>Fouydf</brid:FirstName>
<brid:LastName>ApuiafgjkLrjgdna</brid:LastName>
<brid:TitleCode></brid:TitleCode>
<brid:TitleText></brid:TitleText>
<brid:Prefix></brid:Prefix>
<brid:Suffix>Cxvb</brid:Suffix>
<brid:Currency>EUR</brid:Currency>
<brid:Format>2</brid:Format>
<brid:Atomized>N</brid:Atomized>
</brid:MediaData>
This is how I try to parse it:
package simbaOnlineReplacement;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import ta.maxcare.help.DBSor;
public class XMLTest2 {
public static void main(String[] args) {
try {
String data = new DBSor().getContentData("106897066");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(data));
Document doc = dBuilder.parse(is);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("MediaData");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("First Name : " + eElement.getElementsByTagName("FirstName").item(0).getTextContent());
System.out.println("LastName : " + eElement.getElementsByTagName("LastName").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
and it gives this error:
[Fatal Error] :14:2: The markup in the document following the root element must be well-formed.
org.xml.sax.SAXParseException; lineNumber: 14; columnNumber: 2; The markup in the document following the root element must be well-formed.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at simbaOnlineReplacement.XMLTest2.main(XMLTest2.java:23)

In well-formed XML, a document must have exactly one root element. Here brid:AccountData, brid:PaymentData and brid:MediaData are siblings at the top level, which is why the parser fails at line 14, where the second top-level element begins. You need a single root element enclosing all three.
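As a minimal sketch of that idea, the fragment can be wrapped in a synthetic root element before it is parsed. The wrapper element name, the class name and the helper methods are made up for illustration, and SAMPLE is a shortened stand-in for the value read from the database; the sketch assumes the stored content carries no XML declaration or DOCTYPE of its own. The factory is made namespace-aware so the brid: elements can be looked up by namespace URI and local name, since getElementsByTagName("MediaData") does not match an element whose tag name is brid:MediaData.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WrapFragment {
    // Namespace URI copied from the xmlns:brid declaration in the data.
    public static final String NS = "http://billing.a1telekom.at/BridgeIT/BridgeITDefinition";

    // Shortened stand-in for the string returned by the database lookup.
    public static final String SAMPLE =
        "<brid:PaymentData xmlns:brid=\"" + NS + "\">"
        + "<brid:PaymentMethod>Manual</brid:PaymentMethod>"
        + "</brid:PaymentData>"
        + "<brid:MediaData xmlns:brid=\"" + NS + "\">"
        + "<brid:FirstName>Fouydf</brid:FirstName>"
        + "<brid:LastName>ApuiafgjkLrjgdna</brid:LastName>"
        + "</brid:MediaData>";

    // Wraps the sibling elements in one root ("wrapper" is an arbitrary name) and parses the result.
    public static Document parseFragment(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            String wrapped = "<wrapper>" + fragment + "</wrapper>";
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    // Returns the text of the first element with the given local name, or null if there is none.
    public static String firstText(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(NS, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parseFragment(SAMPLE);
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("First Name: " + firstText(doc, "FirstName"));
        System.out.println("Last Name: " + firstText(doc, "LastName"));
    }
}
```

A test can then assert directly on the values returned by firstText instead of printing them.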

Your XML is a well-formed external general parsed entity, but it is not a well-formed document. The easiest way to handle it is therefore to create a simple wrapper file that references it as an external entity:
<!DOCTYPE wrapper [
<!ENTITY e SYSTEM "data.xml">
]>
<wrapper>&e;</wrapper>
and then parse the wrapper file.
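Since the data in the question comes from a database rather than a file, one possible way to use the wrapper without writing data.xml to disk is to register an EntityResolver that hands the string to the parser when it asks for the entity. This is only a sketch: the class and method names are illustrative, SAMPLE stands in for the database value, and it assumes the parser is allowed to process DTDs and external entities (hardened configurations often disable both).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EntityWrapper {
    public static final String NS = "http://billing.a1telekom.at/BridgeIT/BridgeITDefinition";

    // The wrapper document from above, held as a string instead of a file.
    public static final String WRAPPER =
        "<!DOCTYPE wrapper [<!ENTITY e SYSTEM \"data.xml\">]><wrapper>&e;</wrapper>";

    // Shortened stand-in for the string returned by the database lookup.
    public static final String SAMPLE =
        "<brid:AccountData xmlns:brid=\"" + NS + "\">"
        + "<brid:AccountType>NORM</brid:AccountType>"
        + "</brid:AccountData>"
        + "<brid:PaymentData xmlns:brid=\"" + NS + "\">"
        + "<brid:PaymentMethod>Manual</brid:PaymentMethod>"
        + "</brid:PaymentData>";

    // Parses the wrapper and serves the fragment whenever the parser requests data.xml.
    public static Document parseViaEntity(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("data.xml")
                    ? new InputSource(new StringReader(fragment))
                    : null); // null falls back to the parser's default resolution
            return builder.parse(new InputSource(new StringReader(WRAPPER)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse wrapper", e);
        }
    }

    // Returns the text of the first element with the given local name, or null if there is none.
    public static String firstText(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(NS, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parseViaEntity(SAMPLE);
        System.out.println("Payment method: " + firstText(doc, "PaymentMethod"));
    }
}
```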

Related

XML parsing by tag-names and attributes method in DOM tree structure

I am trying to parse this XML file, but I am only getting the root element and not its child nodes.
I need to read some specific values from the nodes, for example with the .item() method. Since I never reach the child nodes, I do not get the values I want. Please help me solve this.
XML file:
<Ws>
<Id V='862631039910699'>
<Dt V='08/07/22;11/25'>
<T V='24.3;24.3;24.3'/>
<H V='98.0;98.0;98.0'/>
<W V='1.3;272'/>
<G V='25;2.4'/>
<A V='0.00;468;472;471'/>
<D V='0.00;8.9;8.065;0.0000;0.0000'/>
</Dt>
</Id>
</Ws>
package api;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class Web{
public static void main(String argv[])
{
try {
File file = new File("C:\\Users\\Prakhar\\OneDrive\\Desktop\\WBE2.xml.txt");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
NodeList nodeList = doc.getElementsByTagName("Ws");
for (int i = 0; i < nodeList.getLength(); ++i) {
Node node = nodeList.item(i);
System.out.println("\nNode Name :" + node.getNodeName());
if (node.getNodeType()== Node.ELEMENT_NODE) {
Element tElement = (Element)node;
System.out.println("IMEI: " +
doc.getDocumentElement().getChildNodes().item(0).getFirstChild().getChildNodes().item(0).getAttributes().getNamedItem("V").getNodeValue());
System.out.println("Date/Time: " +
doc.getDocumentElement().getChildNodes().item(0).getFirstChild().getChildNodes().item(1).getAttributes().getNamedItem("V").getNodeValue());
System.out.println("Temperature: " +
doc.getDocumentElement().getChildNodes().item(0).getFirstChild().getChildNodes().item(2).getAttributes().getNamedItem("V").getNodeValue());
System.out.println("Humidity: " +
doc.getDocumentElement().getChildNodes().item(0).getFirstChild().getChildNodes().item(1).getAttributes().getNamedItem("V").getNodeValue());
System.out.println("Wind Speed: " +
doc.getDocumentElement().getChildNodes().item(0).getFirstChild().getChildNodes().item(2).getAttributes().getNamedItem("V").getNodeValue());
}
}
}
catch (Exception e) {
System.out.println(e);
}
}
}

Java: search for a specific attribute name in an XML file

I would like to find every "name" attribute in my XML file without specifying the element tag:
XML:
<test 1><test1/>
<test2> <test2/>
<test 3 id="aaa"> </test3>
<test 5> </test5>
<test 6 id="bbb" name="ijof"> </test6>
Java:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(path));
root = document.getDocumentElement();
String attribut = root.getAttribute("name");
System.out.println(attribut); // Expected ijof
Did you execute your code at least once? I don't think so. Otherwise you would surely have noticed that your XML cannot be parsed.
There are several flaws in your example XML:
No root element.
Wrong end tags: it should be <test1></test1> and not <test1><test1/>.
Element names must not contain whitespace, and start and end tags must match: it should be <test5> </test5> and not <test 5> </test5>.
Apart from that, you can use XPath to get all elements that have a name attribute.
Here is a complete example. It uses the XML as a string, but that makes no difference to the approach:
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.IOException;
import java.io.StringReader;
public class FindNameAttribute {
    private static final String XML =
        "<root>\n" +
        " <test1></test1>\n" +
        " <test2> </test2>\n" +
        " <test3 id=\"aaa\"> </test3>\n" +
        " <test4 name=\"4\"/>\n" +
        " <test5> </test5>\n" +
        " <test6 id=\"bbb\" name=\"ijof\"> </test6>\n" +
        " <test7 id=\"bbb\"><child name=\"childname\"/> </test7>\n" +
        "</root>\n";

    public static void main(String[] args) {
        System.out.println(XML);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            StringReader reader = new StringReader(XML);
            InputSource source = new InputSource(reader);
            Document document = builder.parse(source);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select every element, at any depth, that has a "name" attribute.
            NodeList nodes = (NodeList) xpath.evaluate("//*[@name]", document, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                String elementName = el.getTagName();
                String nameAttribute = el.getAttribute("name");
                System.out.println(String.format("Element name: %s, name attribute: %s", elementName, nameAttribute));
            }
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            e.printStackTrace();
        }
    }
}
This is the output:
<root>
<test1></test1>
<test2> </test2>
<test3 id="aaa"> </test3>
<test4 name="4"/>
<test5> </test5>
<test6 id="bbb" name="ijof"> </test6>
<test7 id="bbb"><child name="childname"/> </test7>
</root>
Element name: test4, name attribute: 4
Element name: test6, name attribute: ijof
Element name: child, name attribute: childname
The relevant XPath expression is: //*[@name]
//: Selects matching nodes at any depth in the document.
*: Wildcard for the element name; any name matches.
*[@name]: The [] denotes a predicate. We only want elements that have a name attribute.
@: Marks the following name as an attribute name. Without it, name would be interpreted as an element name.
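If only the attribute values are needed, and not the elements that own them, an XPath expression can also select the attribute nodes themselves with //@name. The following is a small sketch of that variant on a shortened version of the sample document; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NameAttributeValues {
    // Shortened version of the sample document.
    public static final String XML =
        "<root>"
        + "<test3 id=\"aaa\"/>"
        + "<test4 name=\"4\"/>"
        + "<test6 id=\"bbb\" name=\"ijof\"/>"
        + "<test7 id=\"bbb\"><child name=\"childname\"/></test7>"
        + "</root>";

    // Collects the value of every "name" attribute in document order.
    public static List<String> nameValues(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attributes = (NodeList) xpath.evaluate(
                "//@name", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                values.add(attributes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameValues(XML)); // prints [4, ijof, childname]
    }
}
```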

Reading XML tags getting value from inner tag

I am not sure how best to explain my situation, so here is an example.
I have an XML file to be read in Java, something like this:
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>Wei</GivenName>
<GivenName>Long</GivenName>
<FamilyName>
<Value>Tan</Value>
</FamilyName>
</AuthorName>
</Author>
As you can see, the text inside the <FamilyName> tag is wrapped in a <Value> tag. This is because the XSD declares the element with maxOccurs="unbounded", which means <FamilyName> can contain more than one value. How should I modify the code so that it reads the <FamilyName> tag and gets every <Value> element, no matter how many occurrences of <Value> exist?
Example:
<FamilyName>
<Value>Sarah</Value>
<Value>Johnson</Value>
</FamilyName>
The code looks like this:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
File fXmlFile = new File("/fileaddress/test-1.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("AuthorName");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Given Name : " + eElement.getElementsByTagName("GivenName").item(0).getTextContent());
System.out.println("Family Name : " + eElement.getElementsByTagName("FamilyName").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Get the FamilyName node with getElementsByTagName("FamilyName").item(0), loop over its child nodes (.getChildNodes()) and read the text content of each child element.
Or:
You can call getElementsByTagName("Value") directly if you are sure that the Value tag does not occur anywhere other than inside FamilyName.
Here is a code sample for the first approach:
NodeList children = doc.getElementsByTagName("FamilyName").item(0).getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    // skip the whitespace text nodes between the <Value> elements
    if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
        Element child = (Element) children.item(i);
        System.out.println(child.getTextContent());
    }
}
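The second option can be sketched in the same way. The version below scopes the lookup to each FamilyName element instead of the whole document, so Value elements elsewhere are not picked up. The inline sample and the helper name familyNames are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FamilyNameValues {
    // Small inline sample in the shape of the XML from the question.
    public static final String SAMPLE =
        "<Author AffiliationIDS=\"Aff1\">"
        + "<AuthorName DisplayOrder=\"Western\">"
        + "<GivenName>Wei</GivenName>"
        + "<FamilyName><Value>Sarah</Value><Value>Johnson</Value></FamilyName>"
        + "</AuthorName>"
        + "</Author>";

    // Returns the text of every <Value> found inside any <FamilyName>, in document order.
    public static List<String> familyNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList families = doc.getElementsByTagName("FamilyName");
            for (int i = 0; i < families.getLength(); i++) {
                Element family = (Element) families.item(i);
                // Element.getElementsByTagName only searches below this element.
                NodeList values = family.getElementsByTagName("Value");
                for (int j = 0; j < values.getLength(); j++) {
                    result.add(values.item(j).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(familyNames(SAMPLE)); // prints [Sarah, Johnson]
    }
}
```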

How to get only the direct children of the first element in an XML document?

I am working on an XML example in order to understand DOM and XML better. I have an XML document with cars, from which I want to get the first level of nodes below cars.
I also want to do this generically, without giving a specific tag name (find elements by tag "supercars" / "luxurycars" ...). More like "give me all the direct subnodes of cars" -> "supercars, supercars, luxurycars".
Therefore I've written the following code in order to understand the structure.
But the output confuses me:
Why is the NodeList length 7? Is it "[cars], [supercars], [content of supercars], [supercars], [content of supercars]"? I can't manage to get the elements out and see for myself.
Why are there 4 empty "Current Element:" entries?
Why is the first NodeName "#text" and not "supercars", which comes AFTER that?
My XML document sportcars.xml:
<?xml version="1.0"?>
<cars>
<supercars company="Ferrari">
<carname type="formula one">Ferarri 101</carname>
<carname type="sports car">Ferarri 201</carname>
<carname type="sports car">Ferarri 301</carname>
</supercars>
<supercars company="Lamborgini">
<carname>Lamborgini 001</carname>
<carname>Lamborgini 002</carname>
<carname>Lamborgini 003</carname>
</supercars>
<luxurycars company="Benteley">
<carname>Benteley 1</carname>
<carname>Benteley 2</carname>
<carname>Benteley 3</carname>
</luxurycars>
</cars>
My Java file QueryXmlFileDemo.java:
package xml;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class QueryXmlFileDemo {
public static void main(String[] args) {
try {
File inputFile = new File("sportcars.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
doc.getDocumentElement().normalize();
Node n = doc.getFirstChild();
NodeList nL = n.getChildNodes();
System.out.println("Nodelist length: " + nL.getLength());
for (int i = 0; i < nL.getLength(); i++) {
Node temp = nL.item(i);
System.out.println("Current Element: " + temp.getTextContent());
System.out.println("NodeName: " + temp.getNodeName());
System.out.println("Root Element: " + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("supercars");
}
} catch (Exception e) {
}
}
}
Output:
Nodelist length: 7
Current Element:
NodeName: #text
Current Element:
Ferarri 101
Ferarri 201
Ferarri 301
NodeName: supercars
Current Element:
NodeName: #text
Current Element:
Lamborgini 001
Lamborgini 002
Lamborgini 003
NodeName: supercars
Current Element:
NodeName: #text
Current Element:
Benteley 1
Benteley 2
Benteley 3
NodeName: luxurycars
Current Element:
NodeName: #text
So, how can I print only the nodes "supercars, supercars, luxurycars" and nothing else?
A better way of retrieving nodes is to use XPath or XQuery; the queries are inherently easier to reason about.
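As a rough sketch of the XPath route, the expression /*/* selects every element that is a direct child of the root element, whatever its name, and never returns the whitespace text nodes. The inline document is a shortened copy of sportcars.xml, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectChildren {
    // Shortened copy of sportcars.xml, including the line breaks that produce #text nodes.
    public static final String SAMPLE =
        "<cars>\n"
        + "<supercars company=\"Ferrari\"><carname>Ferarri 101</carname></supercars>\n"
        + "<supercars company=\"Lamborgini\"><carname>Lamborgini 001</carname></supercars>\n"
        + "<luxurycars company=\"Benteley\"><carname>Benteley 1</carname></luxurycars>\n"
        + "</cars>";

    // Returns the names of the elements directly below the root element.
    public static List<String> childElementNames(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList children = (NodeList) xpath.evaluate(
                "/*/*", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                names.add(children.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childElementNames(SAMPLE)); // prints [supercars, supercars, luxurycars]
    }
}
```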
You get the "#text" in the output because in XML there are text nodes between the elements, even if these are just white space like line breaks or indentation. See the Node Javadoc on the different possible node types.
getTextContent() returns the concatenated text of the node and all of its descendants, as per the Javadoc, which is why printing it for supercars shows all the car names.
If you just want to ignore the #text nodes (or any other ones), you can check in your loop what node you're dealing with. In your case, it would be something like this:
if (Node.ELEMENT_NODE != temp.getNodeType()) {
    continue;
}
I found the solution, but I also have to admit that my question was too broad and confusing. I am therefore posting how I solved the problem, and I hope it clarifies what I was asking.
package xml;
import javax.xml.parsers.DocumentBuilder;
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class QueryXmlFileDemo {
    public static void main(String[] args) {
        try {
            File inputFile = new File("sportcars.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document inputDocument = dBuilder.parse(inputFile);
            inputDocument.getDocumentElement().normalize();
            // the root element <cars>; unlike getFirstChild(), this still works if a comment precedes the root
            Node carsNode = inputDocument.getDocumentElement();
            NodeList carsNodeList = carsNode.getChildNodes();
            for (int i = 0; i < carsNodeList.getLength(); i++) {
                Node carTypes = carsNodeList.item(i);
                // hides the #text-entries
                if (Node.ELEMENT_NODE != carTypes.getNodeType()) {
                    continue;
                }
                System.out.println("CarType: " + carTypes.getNodeName());
            }
        } catch (Exception e) {
            // report failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}
Output:
CarType: supercars
CarType: supercars
CarType: luxurycars
So without knowing the attributes of my XML-document I can get the "first level" of the nodes - the first nodes within <cars>: <supercars>, <supercars> and <luxurycars>.

reading data using JAVA from XML files

I know there are a lot of answers to this question, but none of them worked in my case. I would like to read data from the European Central Bank from this link (ECB). For example, how do I read the "rate" of USD where time="2015-02-27", and how do I read the "rate" of USD for all 90 days?
One of the simplest ways to do it is to use a DOM (Document Object Model) parser. It loads your XML document into memory and turns it into a tree of nodes that you can traverse to get the information of any node at any position. Because the whole document is held in memory, it uses more memory than a streaming SAX parser, which is usually preferred for very large documents.
Here is an example:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class DomParsing {
public static final String ECB_DATAS ="C:\\xml\\eurofxref-hist-90d.xml";
public static void main(String argv[]) {
try {
File fXmlFile = new File(ECB_DATAS);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("Cube");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("currency : " + eElement.getAttribute("currency") + " and rate is " + eElement.getAttribute("rate"));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Applied to your file, this produces output like the following:
currency : BGN and rate is 1.9558
Current Element :Cube
currency : CZK and rate is 27.797
Current Element :Cube
currency : DKK and rate is 7.444
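To answer the two specific questions (the USD rate for one date, and for all days), the same DOM approach can be narrowed down. The sketch below assumes the file nests a Cube element per currency inside a Cube element that carries the time attribute. SAMPLE is a made-up document in that shape with placeholder rate values, not real ECB data, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UsdRates {
    // Made-up sample; the nesting is an assumption and the rates are placeholders.
    public static final String SAMPLE =
        "<Envelope><Cube>"
        + "<Cube time=\"2015-02-27\">"
        + "<Cube currency=\"USD\" rate=\"1.10\"/><Cube currency=\"JPY\" rate=\"130.0\"/>"
        + "</Cube>"
        + "<Cube time=\"2015-02-26\">"
        + "<Cube currency=\"USD\" rate=\"1.20\"/>"
        + "</Cube>"
        + "</Cube></Envelope>";

    // Maps each date (time attribute of the parent Cube) to the rate of the given currency.
    public static Map<String, String> ratesByDate(String xml, String currency) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> rates = new LinkedHashMap<>();
            NodeList cubes = doc.getElementsByTagName("Cube");
            for (int i = 0; i < cubes.getLength(); i++) {
                Element cube = (Element) cubes.item(i);
                Node parent = cube.getParentNode();
                // getAttribute returns "" when the attribute is absent, so outer Cubes are skipped.
                if (currency.equals(cube.getAttribute("currency")) && parent instanceof Element) {
                    rates.put(((Element) parent).getAttribute("time"), cube.getAttribute("rate"));
                }
            }
            return rates;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> usd = ratesByDate(SAMPLE, "USD");
        System.out.println(usd);                   // prints {2015-02-27=1.10, 2015-02-26=1.20}
        System.out.println(usd.get("2015-02-27")); // prints 1.10
    }
}
```

With the real file, the string passed to ratesByDate would be replaced by the file content (or the method adapted to parse the File directly), and the returned map then holds one entry per day.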
