How to remove duplicate XML declaration - java

I am receiving the following XML response via a Jersey client:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><aaa><bbb key="Data"><?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<my-data xsi:noNamespaceSchemaLocation="MyData.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<data name="abc" uniqueId="4fe95637-a381-4e0c-bf7f-49f794df5f23">
<variable var1="xyz" value="44"/>
</data>
</my-data>
</bbb></aaa>
I am saving this as an XML file and getting a 'premature end of file' error during parsing, since the XML is malformed (it contains duplicate XML declarations). Is there a way to remove the following duplicate entry from the output?
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Following is my Java code snippet:
String output = response.getEntity(String.class);
file = writeResponseToFile(output,"MyData.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(file); //Error

Ideally, you should fix the problem at the source. What you're receiving is not XML because having more than one XML declaration violates XML's basic grammar, making the data not well-formed.
If it is impossible to properly fix the problem at the source, and you wish to attempt repair, you have to treat that data as text, not XML, until you remove the extra XML declaration (via text-level operations, not XML parsing).
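A minimal sketch of such a text-level repair, assuming the response fits in a String and that only a declaration at offset 0 should survive. The class and method names are hypothetical, and the regular expression is a heuristic: it would also remove a declaration-like string inside a comment or CDATA section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlDeclarationCleaner {
    // Matches an XML declaration such as <?xml version="1.0" ...?>.
    // The \s after "xml" keeps it from matching <?xml-stylesheet ...?>.
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^?]*\\?>");

    // Keeps a declaration only if it starts at offset 0; drops every other one.
    static String stripExtraDeclarations(String text) {
        Matcher m = XML_DECL.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            boolean keep = m.start() == 0;
            // Copy up to the end of a kept declaration, or up to the start of a dropped one.
            out.append(text, last, keep ? m.end() : m.start());
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

In the question's code this would be applied to output before writeResponseToFile is called, so the parser only ever sees the repaired text.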

Fix the XML that you are receiving: it contains two XML declarations, so it is malformed.
Remember that in Jersey a resource can return JSON, XML, HTML, etc., depending on its @Produces annotation.
There are also online XML validators that you can use to check your XML.
Regards.

Related

How can I make DOMParser parse rootless XML without throwing exceptions? [duplicate]

I am trying to read this XML file using PHP and I have two root elements. The code that I wrote in PHP reads only one root element and when I add the other one (<action>) it gives me an error.
I want to do something like this : if($xml->action=="register") then print all parameters.
This is my XML file:
<?xml version='1.0' encoding='ISO-8859-1'?>
<action>register</action>
<paramters>
<name>Johnny B</name>
<username>John</username>
</paramters>
And this is my PHP script:
<?php
$xml = simplexml_load_file("test.xml");
echo $xml->getName() . "<br />";
foreach($xml->children() as $child)
{
echo $child->getName() . ": " . $child . "<br />";
}
?>
I really don't know how to do all this...
Fix your XML, it's invalid. XML files can only have 1 root element.
Example valid XML:
<?xml version='1.0' encoding='ISO-8859-1'?>
<action>
<type>register</type>
<name>Johnny B</name>
<username>John</username>
</action>
Or, if you want only the parameters to have their own elements:
<?xml version='1.0' encoding='ISO-8859-1'?>
<action type="register">
<name>Johnny B</name>
<username>John</username>
</action>
or if you want multiple actions:
<?xml version='1.0' encoding='ISO-8859-1'?>
<actions>
<action type="register">
<name>Johnny B</name>
<username>John</username>
</action>
</actions>
EDIT:
As I've said in my comment, your teacher should fix his XML. It is invalid. Also he should put his XML through a validator.
If you're really desperate you can introduce an artificial root element, but this is really bad practice and should be avoided at all costs:
$xmlstring = str_replace(
array('<action>','</paramters>'),
array('<root><action>', '</paramters></root>'),
$xmlstring
);
None of the previous answers is quite accurate. The XML specification defines several kinds of entity: document entities, external parsed entities, document type definitions for example. Your example is not a well-formed document entity, which is what XML parsers are normally asked to parse. However, it is a well-formed external parsed entity. The way to process a well-formed external parsed entity is to reference it from a skeletal document entity, like this:
<!DOCTYPE wrapper [
<!ENTITY e SYSTEM "my.xml">
]>
<wrapper>&e;</wrapper>
and then pass the document entity to the XML parser.
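For Java readers, the same idea can be sketched with JAXP. This assumes the rootless content is already available as a String and is handed to the parser through an EntityResolver instead of a real my.xml file. The class and method names are hypothetical, and resolving external entities should only be done for input you trust.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityWrapperDemo {
    // Skeletal document entity that pulls in the rootless content as &e;.
    static final String WRAPPER =
        "<!DOCTYPE wrapper [\n"
      + "<!ENTITY e SYSTEM \"my.xml\">\n"
      + "]>\n"
      + "<wrapper>&e;</wrapper>";

    static Document parseWrapped(String fragment) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Serve the fragment whenever the parser asks for my.xml;
        // returning null falls back to the parser's default resolution.
        db.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.endsWith("my.xml")
                ? new InputSource(new StringReader(fragment))
                : null);
        return db.parse(new InputSource(new StringReader(WRAPPER)));
    }
}
```

After parsing, the former top-level elements appear as children of wrapper.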
Since the file is not well-formed XML, you can use the following trick.
Insert a dummy start tag, <dummy>, on the second line (right after the XML declaration).
At the end of the file, close it with </dummy>.
Happy parsing ;)

How to remove encoding="UTF-8" standalone="no" from xml Document object in Java

I want to create XML in Java.
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder;
docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.newDocument();
but Java automatically creates declaration like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
How can I remove encoding="UTF-8" standalone="no" so it will be
<?xml version="1.0"?>
Thanks!
Why do you need to remove the encoding? In any case,
doc.setXmlStandalone(true);
will erase standalone="no"
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
This omits the XML declaration entirely, which would resolve your issue (verified on JDK 6).
I think there is no legal way to exclude these attributes from being generated.
But after the document is generated, you can use XSLT to remove them.
I think this is a good way.
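One way to combine these suggestions, as a hedged sketch and not the only option: let the Transformer omit its own declaration and write the shorter one by hand. The class and method names are hypothetical. Because the hand-written declaration names no encoding, the resulting text should be written out as UTF-8, which is what parsers assume by default.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

final class MinimalDeclarationWriter {
    // Serializes the document with a hand-written <?xml version="1.0"?> declaration.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Suppress the generated declaration (with its encoding and standalone attributes).
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter body = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(body));
        return "<?xml version=\"1.0\"?>\n" + body;
    }

    // Small helper for trying the sketch: a document with a single empty root element.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("root"));
        return doc;
    }
}
```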

Java DOM, namespace / version problem

I'm in the process of creating XML as a Node for an RMI program I am developing, but I have run across a problem. I can create the XML using DOM, but I am struggling to add the namespace and version to the top of my XML. I have tried using setAttribute and setAttributeNS, but at the moment I am lost as to what else I can do.
The java code to create the element is:
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Node root = doc.createElement("Request");
doc.appendChild(root);
// code omitted
The result I get currently is:
<Request>
<Identification>
<UserID>user</UserID>
<Password>pass</Password>
</Identification>
</Request>
In the request section I need it to look like:
<Request xsi:noNamespaceSchemaLocation="URL" Version="1.0">
Any help will be appreciated to help solve this issue!
Thanks
I think you'd want something like:
...
Element root = doc.createElement("Request");
root.setAttributeNS("http://www.w3.org/2001/XMLSchema-instance", "xsi:noNamespaceSchemaLocation", "URL");
root.setAttribute("Version", "1.0");
doc.appendChild(root);
...
Defining root as an Element gives you the .setAttribute* methods.
This would give you
<Request Version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="URL"/>
I know that includes a bit more, but the xmlns:xsi attribute is needed so that the xsi namespace is defined.

Modify existing XML stylesheet processing instruction in Java

I'm reading an existing XML file and outputting it (using DOM).
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test"?>
<Books>
<Book name="MyBook" />
</Books>
But how do I modify the xml-stylesheet processing instruction? Its href is currently set to "test".
Something like this should work (untested)
Element root = doc.getDocumentElement();
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "/processing-instruction('xml-stylesheet')";
ProcessingInstruction pi;
pi = (ProcessingInstruction)xpath.evaluate(expression, doc, XPathConstants.NODE);
pi.setData("type='text/xsl' href='foo.xsl'");
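A self-contained variant of the same idea without XPath, offered as a sketch: since the xml-stylesheet instruction sits at the top level of the document, it can also be found by walking the Document's child nodes. The class and method names are hypothetical, and the helper overwrites the whole data string, so it assumes a text/xsl stylesheet and an href that contains no double quotes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

final class StylesheetPiEditor {
    // Rewrites the first top-level xml-stylesheet instruction; returns false if none exists.
    static boolean setStylesheetHref(Document doc, String href) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && "xml-stylesheet".equals(((ProcessingInstruction) n).getTarget())) {
                ((ProcessingInstruction) n).setData("type=\"text/xsl\" href=\"" + href + "\"");
                return true;
            }
        }
        return false;
    }

    // Parses XML held in a String, for trying the helper above.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```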
That's a bit tricky, but why not read the file into a String first and do a replace before sending it via a stream into the DOM parser?

Android java XML junk after document element

I'm using SAX to read/parse XML documents and I have it working fine, except for this particular site, where Eclipse tells me "junk after document element" and I get no data returned:
http://www.zachblume.com/apis/rhyme.php?format=xml&word=example
The site is not mine; I'm just trying to get some data from it.
Yes, that's not an XML document. It's trying to include more than one root element:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
The parser regards everything after <word>ampal</word> as junk, because by that time it has read a complete document, hence the complaint about "junk after document element".
An XML document can only have one root, but several children within the root. For example:
<?xml version="1.0"?>
<words>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
</words>
The page does not contain XML. It contains an XML snippet at best:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
This is not well-formed, since there is no single document element. SAX interprets the first <word> as the document element and correctly reports "junk after document element", since for all it knows the document ends with the first </word>.
To get around the error, do not treat this document as XML. Download it as text, remove the XML declaration (<?xml version="1.0"?>) and then wrap it in a fake document element before you try to process it.
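That workaround might look roughly like this in Java. It is a sketch that assumes the response has already been downloaded into a String and uses a DOM parser for brevity; the question uses SAX, which would accept the wrapped text just as well. The class name, method name and the <words> wrapper element are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class FragmentWrapper {
    // Drops a leading XML declaration, wraps the rest in <words>, then parses it.
    static Document parseFragment(String text) throws Exception {
        // Remove only a declaration at the start of the text (optionally after whitespace).
        String body = text.replaceFirst("^\\s*<\\?xml\\s[^?]*\\?>", "");
        String wrapped = "<words>" + body + "</words>";
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
    }
}
```

The declaration has to be removed before wrapping, because an XML declaration is only allowed at the very start of a document.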
