How can I make DOMParser parse rootless XML without throwing exceptions? [duplicate] - java

I am trying to read this XML file using PHP and I have two root elements. The code that I wrote in PHP reads only one root element and when I add the other one (<action>) it gives me an error.
I want to do something like this : if($xml->action=="register") then print all parameters.
This is my XML file:
<?xml version='1.0' encoding='ISO-8859-1'?>
<action>register</action>
<paramters>
<name>Johnny B</name>
<username>John</username>
</paramters>
And this is my PHP script:
<?php
$xml = simplexml_load_file("test.xml");
echo $xml->getName() . "<br />";
foreach($xml->children() as $child)
{
echo $child->getName() . ": " . $child . "<br />";
}
?>
I really don't know how to do all this...

Fix your XML, it's invalid. XML files can only have 1 root element.
Example valid XML:
<?xml version='1.0' encoding='ISO-8859-1'?>
<action>
<type>register</type>
<name>Johnny B</name>
<username>John</username>
</action>
Or if you want only parameters to have own elements:
<?xml version='1.0' encoding='ISO-8859-1'?>
<action type="register">
<name>Johnny B</name>
<username>John</username>
</action>
or if you want multiple actions:
<?xml version='1.0' encoding='ISO-8859-1'?>
<actions>
<action type="register">
<name>Johnny B</name>
<username>John</username>
</action>
</actions>
EDIT:
As I've said in my comment, your teacher should fix his XML. It is invalid. Also he should put his XML through a validator.
If you're really desperate you can introduce an artificial root element, but this is really bad practice and should be avoided at all costs:
$xmlstring = str_replace(
array('<action>','</paramters>'),
array('<root><action>', '</paramters></root>'),
$xmlstring
);

None of the previous answers is quite accurate. The XML specification defines several kinds of entity: document entities, external parsed entities, document type definitions for example. Your example is not a well-formed document entity, which is what XML parsers are normally asked to parse. However, it is a well-formed external parsed entity. The way to process a well-formed external parsed entity is to reference it from a skeletal document entity, like this:
<!DOCTYPE wrapper [
<!ENTITY e SYSTEM "my.xml">
]>
<wrapper>&e;</wrapper>
and then pass the document entity to the XML parser.
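In Java, the wrapper approach above might be sketched as follows. This is a minimal sketch under assumptions: the class name EntityWrapperDemo and the method parseFragment are made up for illustration, and instead of reading my.xml from disk an EntityResolver hands the fragment to the parser from memory. It also relies on the parser being allowed to load external entities, which hardened configurations often disable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityWrapperDemo {
    // Skeletal document entity that references the fragment as external entity "e".
    static final String WRAPPER =
        "<!DOCTYPE wrapper [\n"
        + "<!ENTITY e SYSTEM \"my.xml\">\n"
        + "]>\n"
        + "<wrapper>&e;</wrapper>";

    // Hypothetical helper: parses a rootless fragment through the wrapper document.
    static Document parseFragment(String fragment) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: serve the fragment from memory whenever the parser asks for my.xml;
            // returning null falls back to the parser's default lookup.
            db.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith("my.xml")
                    ? new InputSource(new StringReader(fragment))
                    : null);
            return db.parse(new InputSource(new StringReader(WRAPPER)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the wrapped fragment", e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
            + "<action>register</action>\n"
            + "<paramters>\n<name>Johnny B</name>\n<username>John</username>\n</paramters>";
        Document doc = parseFragment(fragment);
        System.out.println(doc.getElementsByTagName("action").item(0).getTextContent());
    }
}
```

The fragment keeps its leading declaration because, inside an external parsed entity, it is read as a text declaration rather than as a second XML declaration.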

As it is an invalid xml file, you can do the following trick.
Insert a dummy start tag at the second line as <dummy>
In the end finish it with </dummy>
Happy parsing ;)
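For the Java side of the question, the dummy-tag trick could look roughly like this. It is a sketch, not a recommendation: the class name DummyRootDemo and the method parseRootless are invented for illustration, and the input is assumed to fit in memory as a string. The leading XML declaration is removed first, because a declaration is not allowed inside an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DummyRootDemo {
    // Hypothetical helper: wraps rootless XML text in <dummy>...</dummy> and parses it.
    static Document parseRootless(String text) {
        // Drop a leading XML declaration, if present, so it does not end up inside <dummy>.
        String body = text.trim().replaceFirst("^<\\?xml\\s[^>]*\\?>", "");
        String wrapped = "<dummy>" + body + "</dummy>";
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed even with a dummy root", e);
        }
    }

    public static void main(String[] args) {
        String rootless = "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
            + "<action>register</action>\n"
            + "<paramters>\n<name>Johnny B</name>\n<username>John</username>\n</paramters>";
        Document doc = parseRootless(rootless);
        System.out.println(doc.getElementsByTagName("action").item(0).getTextContent());
    }
}
```

Because the text is parsed from a string, the encoding named in the original declaration is no longer used; the caller has to decode the bytes correctly before calling the helper.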

Related

How to remove duplicate XML declaration

I am receiving following XML response via Jersey client
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><aaa><bbb key="Data"><?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<my-data xsi:noNamespaceSchemaLocation="MyData.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<data name="abc" uniqueId="4fe95637-a381-4e0c-bf7f-49f794df5f23">
<variable var1="xyz" value="44"/>
</data>
</my-data>
</bbb></aaa>
I am saving this as an XML file and getting 'premature end of file' error during parsing, since the XML is malformed (duplicate XML declarations)...is there a way to remove following duplicate entry from the output?
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Following is my Java code snippet:
String output = response.getEntity(String.class);
file = writeResponseToFile(output,"MyData.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(file); //Error
Ideally, you should fix the problem at the source. What you're receiving is not XML because having more than one XML declaration violates XML's basic grammar, making the data not well-formed.
If it is impossible to properly fix the problem at the source, and you wish to attempt repair, you have to treat that data as text, not XML, until you remove the extra XML declaration (via text-level operations, not XML parsing).
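A text-level repair of that kind might be sketched as below. The class name DeclarationCleaner and both method names are assumptions made for this example. The regular expression is a simplification: it would also strip something that looks like a declaration inside a comment or CDATA section, so treat it as a stopgap.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DeclarationCleaner {
    // Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>.
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^>]*\\?>");

    // Keeps the first declaration and removes every later one, working on plain text only.
    static String keepFirstDeclaration(String text) {
        Matcher first = XML_DECL.matcher(text);
        if (!first.find()) {
            return text;
        }
        int end = first.end();
        return text.substring(0, end) + XML_DECL.matcher(text.substring(end)).replaceAll("");
    }

    // Parses the repaired text with a namespace-aware builder, as in the question.
    static Document parseRepaired(String text) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(keepFirstDeclaration(text))));
        } catch (Exception e) {
            throw new IllegalArgumentException("text is still not well-formed after repair", e);
        }
    }
}
```

In the question's code this would be applied to the output string before it is written to MyData.xml or handed to the DocumentBuilder.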
Fix the XML that you are receiving: it contains two XML declarations.
The XML is malformed. Remember that in Jersey you can exchange JSON, XML, HTML and other formats via annotations, for example with @Produces.
Also remember that there are XML validators on the internet that you can use to validate your XML.
Regards.

XML parsing:Retrieve multiple rows in xml using digester

While parsing an XML file like the one below, I want to get the list of telephone numbers for one particular id. I am using Digester to do this, but I don't understand how to add the call methods or create objects. Can anyone help me with this? My XML file contains thousands of entries of these types:
<?xml version='1.0' encoding='utf-8'?>
<address-book>
<contact type="individual">
<id>50</id>
<city>New York</city>
<province>NY</province>
<postalcode>10013</postalcode>
<country>USA</country>
<address>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
</address>
</contact>
<contact type="business">
<id>52</id>
<city>Zagreb</city>
<province></province>
<postalcode>10000</postalcode>
<country>Croatia</country>
<address>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
</address>
</contact>
Also, how should I stop the parsing once I reach the required id?
Although the question is specific to apache-commons-digester, this can be solved with the host of libraries already available in the XML family, namely a SAX parser coupled with an XPath search. If you know what you are searching for, an XPath query can find the data relatively efficiently instead of brute-forcing through the data. Otherwise, if you need to traverse the entire data set for indexing or other purposes, I again recommend a simple SAX parser that loops through the elements (possibly via an XPath query of the //MyElement type) and passes each value to a function for indexing or whatever operation is needed. The apache-commons-digester may be overly complicated and/or slow for what is needed.
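The XPath part of this suggestion could be sketched with the JDK's javax.xml.xpath package as follows. The class name PhoneLookup and the method telephonesFor are hypothetical, and the element names are taken from the sample file in the question. Note that evaluating XPath against an InputSource reads the whole document into memory first, so this does not stop early at the matching id.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PhoneLookup {
    // Hypothetical helper: returns all telephone numbers of the contact with the given id.
    static List<String> telephonesFor(String xml, String id) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $id through a variable resolver instead of concatenating it into the query.
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                "/address-book/contact[id = $id]/address/telephone",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> phones = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                phones.add(nodes.item(i).getTextContent());
            }
            return phones;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate the query on this input", e);
        }
    }
}
```

A call such as PhoneLookup.telephonesFor(xml, "50") would then return the telephone entries of the first contact in the sample. For files too large to hold in memory, a SAX or StAX handler that stops once the wanted contact has been read is the usual alternative.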

How to get data with tag name & their values inside parent tag in xml

I am working on Java. I am parsing an xml file, I am getting tag values, it is working. I have xml file as follows:
<DOC>
<STUDENT>
<ID>1</ID>
<NAME>DAN</NAME>
<ADDRESS>U.K</ADDRESS>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>JACK</NAME>
<ADDRESS>U.S</ADDRESS>
</STUDENT>
</DOC>
I want to fetch the data inside <DOC>....</DOC> together with the tag names as well as the values. That is, I want the data as follows:
"<STUDENT>
<ID>1</ID>
<NAME>DAN</NAME>
<ADDRESS>U.K</ADDRESS>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>JACK</NAME>
<ADDRESS>U.S</ADDRESS>
</STUDENT>"
Please guide me how to do it.
The most common approaches in Java are to use either a SAX or a DOM parsing library.
If you look them up you should find loads of documentation/tutorials about them.
DOM is normally the easiest to use, as it stores the entire XML in memory and you can then access any tag; however, this is less performant and can be problematic with very large XML. SAX requires more work, but it reads the XML and processes each tag as it gets to it.
Both are able to do what you need though.
Take a look at SAX Parser.
This link might be helpful too: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
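With the DOM route, getting the markup inside <DOC> back as text could be sketched like this. The class name InnerXml and the method innerXml are made up for the example, and only element children are serialized, so whitespace between the <STUDENT> elements is dropped. The identity Transformer re-serializes the nodes, which means details such as quoting may differ from the source file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class InnerXml {
    // Hypothetical helper: serializes the element children of the first <tagName> element.
    static String innerXml(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node parent = doc.getElementsByTagName(tagName).item(0);
            if (parent == null) {
                return "";
            }
            // An identity transformer copies each node to text without an XML declaration.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = parent.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    identity.transform(new DOMSource(child), new StreamResult(out));
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not extract inner XML", e);
        }
    }
}
```

For the file in the question, InnerXml.innerXml(xml, "DOC") would return the two <STUDENT> blocks as one string.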

Update a single element in an xml document

Is it possible to parse and then modify a single element in an XML document?
I'm currently writing a script in ruby which needs to modify a value (specified by xpath) in an xml file. I'm currently using the REXML library to do this:
xmldocument = Document.new(File.new(filename))
property = XPath.first(xmldocument, "/parent/element/property")
property.text = "New property value"
puts xmldocument
Where the input xml is:
<?xml version="1.0" encoding="UTF-8"?>
<parent>
<element>
<property>Old property value</property>
<verbose />
</element>
...
(more elements here)
...
</parent>
And the output is:
<?xml version='1.0' encoding='UTF-8'?>
<parent>
<element>
<property>New property value</property>
<verbose/>
</element>
...
(more elements here)
...
</parent>
Notice that the output XML is slightly reformatted, so more than my desired change has been made. For example, the tag <verbose /> is changed to <verbose/> and the double quotes in the first line are replaced with single quotes.
What is the best way to modify just a given element of an XML file and leave the rest of the file intact? Ideally there is a solution for Ruby, but I'd love to know the solution in other languages such as Java.
In Java, the Saxon library should accomplish everything you're looking for:
http://sourceforge.net/projects/saxon/

Parse XML ampersand in Java

I download an XML file, which I generate using PHP, that looks similar to this:
<?xml version="1.0" encoding="utf-8" ?>
<customersXML>
...
<customer id="12" name="Me+%26+My+Brother" swid="1" />
...
</customersXML>
Now I need to parse it in Java, but before that I URL-decode it, so the XML becomes this:
<?xml version="1.0" encoding="utf-8" ?>
<customersXML>
...
<customer id="12" name="Me & My Brother" swid="1" />
...
</customersXML>
But when I parse the XML-file using SAX, I get a problem with "&". How can I get around this?
The ampersand is a special character in XML (O'Reilly XML: Entities: Handling Special Content) and needs to be encoded. Replace it with &amp; before sending it.
If the XML in question isn't urlencoded in the first place (which it doesn't look like it is), then you shouldn't be urldecoding it. Breaking the xml and then "unbreaking" it really doesn't seem like the best way to go about it. Just use the original xml and parse that.
Never process XML as a string without parsing it, or you are liable to end up with something that is no longer XML. As you have discovered.
You should FIRST parse, THEN url decode.
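The parse-first, decode-second order might look like this in Java. It is a small sketch with invented names (CustomerNames, decodedName), and it uses a DOM parser for brevity where the question uses SAX; the same order applies inside a SAX startElement handler.

```java
import java.io.StringReader;
import java.net.URLDecoder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class CustomerNames {
    // Hypothetical helper: parses the untouched XML, then URL-decodes one attribute value.
    static String decodedName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element customer = (Element) doc.getElementsByTagName("customer").item(0);
            // Decoding happens only after parsing, so the "&" never appears in the markup.
            return URLDecoder.decode(customer.getAttribute("name"), "UTF-8");
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read the customer name", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customersXML>"
            + "<customer id=\"12\" name=\"Me+%26+My+Brother\" swid=\"1\" />"
            + "</customersXML>";
        System.out.println(decodedName(xml));
    }
}
```

The XML stays well-formed because the raw ampersand only exists in the decoded Java string, never in the document that the parser sees.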
