How to remove the <?xml ...?> processing instruction - java

I need to remove the processing instruction from a DOM. I load several files, merge them and save the result. The problem is that the result looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<frag>
Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
<?xml version="1.0" encoding="iso-8859-2" standalone="no"?>
<frag>
Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<frag>
Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
I haven't found a way to either remove the <?xml ...?> processing instruction from the DOM or ignore it when saving the resulting DOM. I'm using Java 6 and the default parser.

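There is no such method for removing the processing instruction. Strictly speaking, <?xml ...?> is the XML declaration, which the DOM does not expose as a node at all.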
Your merge process is broken. I'll bet you're reading the fragment files and simply concatenating the strings together to create this example.
The right way to do it is to parse each fragment and add the Elements you want into the final DOM, which is then output.
Even if you remove the processing instructions, what you've posted is not well-formed XML. There's no root element that I can see, and you must have one and only one.
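A minimal sketch of that approach, assuming the fragments are available as strings (for real files you would call builder.parse(file) instead) and using a hypothetical <merged> element as the single root. Each fragment is parsed on its own, its root element is deep-copied into the result with importNode, and the result is serialized once:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FragmentMerger {

    // Parses each fragment separately and copies its root element under a
    // single new root element ("merged" is a hypothetical name).
    static String merge(List<String> fragments) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document result = builder.newDocument();
        Element root = result.createElement("merged");
        result.appendChild(root);

        for (String fragment : fragments) {
            // The parser consumes each fragment's <?xml ...?> declaration;
            // it never becomes a node in the parsed Document.
            Document part = builder.parse(new InputSource(new StringReader(fragment)));
            // importNode(..., true) deep-copies the element into the result document.
            Node copy = result.importNode(part.getDocumentElement(), true);
            root.appendChild(copy);
        }

        // Serializing the merged document once produces exactly one declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(result), new StreamResult(out));
        return out.toString();
    }
}
```

If you ever need to write a document with no declaration at all, setting OutputKeys.OMIT_XML_DECLARATION to "yes" on the Transformer leaves it out.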

You can remove processing instructions by using the StAX API, with an XMLStreamReader for example. You can create a filtered reader using XMLInputFactory.createFilteredReader and a StreamFilter.
The constant XMLStreamConstants.PROCESSING_INSTRUCTION lets your filter recognize processing instructions and hold them back. Note, though, that a StAX reader reports the <?xml ...?> declaration itself as the START_DOCUMENT event, not as a processing instruction.
Something similar is definitely possible with SAX too.
Regardless of the technical feasibility, the merge really looks broken as suggested by duffymo.
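A minimal sketch of such a StAX filter, assuming the XML is available as a string; the class name and sample document are illustrative. It records element names and processing-instruction targets so you can see what the filter holds back, and it also shows that the XML declaration never arrives as a PROCESSING_INSTRUCTION event:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PiFilterDemo {

    // Returns "element:<name>" and "pi:<target>" entries in document order.
    // When dropPIs is true, a StreamFilter hides processing instructions.
    static List<String> scan(String xml, boolean dropPIs) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        if (dropPIs) {
            StreamFilter noPIs = new StreamFilter() {
                @Override
                public boolean accept(XMLStreamReader r) {
                    return r.getEventType() != XMLStreamConstants.PROCESSING_INSTRUCTION;
                }
            };
            reader = factory.createFilteredReader(reader, noPIs);
        }
        List<String> seen = new ArrayList<String>();
        // The reader starts on START_DOCUMENT (the <?xml ...?> declaration).
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                seen.add("element:" + reader.getLocalName());
            } else if (event == XMLStreamConstants.PROCESSING_INSTRUCTION) {
                seen.add("pi:" + reader.getPITarget());
            }
        }
        reader.close();
        return seen;
    }
}
```

For a document containing an xml-stylesheet instruction and one processing instruction inside the root element, the unfiltered scan lists both targets, while the filtered scan lists only the elements.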

Related

Partial XML parsing giving an element not bound exception

I'm trying to parse an XML document with the DSpace XOAI library. This is the input XML:
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
...
<ListRecords>
<record>
<header>
...
</header>
<metadata>
<crossref xmlns="http://www.crossref.org/xschema/1.1" xsi:schemaLocation="http://www.crossref.org/xschema/1.1 http://www.crossref.org/schema/unixref1.1.xsd">
...
</crossref>
</metadata>
</record>
</ListRecords>
</OAI-PMH>
From what I could deduce from debugging, each metadata node is parsed individually by the XOAI library. In that context I get this error (which makes sense, because the xsi namespace is declared in the parent OAI-PMH node):
ERROR: 'The prefix "xsi" for attribute "xsi:schemaLocation" associated with an element type "crossref" is not bound.'
From what I could understand from the library's source code, it uses the JDK's javax.xml.transform.Transformer to perform the transformations. We can set any Transformer to be executed.
I'm already using an XSLT file to transform the input XML into the format expected by the library. However, I couldn't find a way to write an XSLT rule that ignores the xsi:schemaLocation attribute that is causing the error.
The other option is to create a new Transformer in Java. I looked at Transformer.setOutputProperty, but I couldn't find a configuration that ignores this error in the crossref node.
Do you guys know how I can correctly parse the contents of the crossref node in that local context?
Thanks in advance!
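One common XSLT technique for the kind of rule described in the question is an identity transform plus an empty template that matches the unwanted attribute, so that attribute is simply not copied. The sketch runs such a stylesheet through javax.xml.transform directly. Whether XOAI applies your stylesheet before it parses each metadata node on its own is an assumption I have not checked; the class name and the `cr` prefix are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DropCrossrefSchemaLocation {

    // Identity transform: copies every node and attribute, except that the
    // empty template drops xsi:schemaLocation on crossref elements
    // (its default priority 0.5 beats the identity template's -0.5).
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
        + " xmlns:cr=\"http://www.crossref.org/xschema/1.1\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"cr:crossref/@xsi:schemaLocation\"/>"
        + "</xsl:stylesheet>";

    static String apply(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The xsi:schemaLocation on the OAI-PMH root element is left alone; only the one on crossref is removed.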

Write stylesheet tag with XML API (StAX/DOM/..)

I'm having some trouble writing a particular XML tag (using an XMLStreamWriter).
Basically, we have an XMLWriter that is based on "javax.xml.stream.XMLStreamWriter" (StAX), which is working fine.
All the XML files that are written automatically begin with the tag:
<?xml version="1.0" encoding="ISO-8859-1"?>
What we need now is to add a new line (stylesheet) so that every XML file is written with these beginning lines:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="myXsl.xsl"?>
I tried to do it the hard-coded way, using XMLStreamWriter.writeCharacters(String), but the problem is that "<" and ">" are special characters, so the output in the XML file is "&lt;"/"&gt;".
Also, this is not very clean code.
In the same way that StAX writes the first line using "XMLStreamWriter.writeStartDocument(String encoding, String version)", does anyone know an XML (XSL/XSLT?) API whose writer does write the tag:
<?xml-stylesheet type="text/xsl" href="myXsl.xsl"?>
Any help would be much appreciated :)
It is called a processing instruction.
See XMLStreamWriter.writeProcessingInstruction, for instance.
In your case:
writer.writeProcessingInstruction("xml-stylesheet",
"type=\"text/xsl\" href=\"myXsl.xsl\"");
(Not tested.)
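A self-contained version of that call in context, as a sketch: it writes to an in-memory stream whose encoding matches the one passed to writeStartDocument, and the <report> root element is hypothetical. writeProcessingInstruction emits the instruction verbatim, so nothing is escaped:

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StylesheetPiWriter {

    static String write() throws XMLStreamException, UnsupportedEncodingException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The stream encoding should match the encoding named in the declaration.
        XMLStreamWriter writer = XMLOutputFactory.newInstance()
            .createXMLStreamWriter(bytes, "ISO-8859-1");
        // XML declaration: version 1.0, encoding ISO-8859-1.
        writer.writeStartDocument("ISO-8859-1", "1.0");
        // Stylesheet processing instruction, written without escaping < and >.
        writer.writeProcessingInstruction("xml-stylesheet",
            "type=\"text/xsl\" href=\"myXsl.xsl\"");
        writer.writeStartElement("report"); // hypothetical root element
        writer.writeCharacters("content");
        writer.writeEndElement();
        writer.writeEndDocument();
        // close() does not close the underlying stream, so flush first.
        writer.flush();
        writer.close();
        return bytes.toString("ISO-8859-1");
    }
}
```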

How to force XMLSpy-generated code to write out qualified namespaces on elements?

I have used XMLSpy 2013 to generate Java program code from a schema. My application basically reads XML from a file, modifies it, and writes it back out to the file. The generated code provides classes and functions to do the load:
sampleSchema2 doc = sampleSchema2.loadFromFile(filePath);
// Load the file into Java objects...
and to write the file back out:
sampleSchema2 sampleDoc = sampleSchema2.createDocument();
// Populate the doc from the modified Java objects...
sampleDoc.saveToFile(path, true);
The schema I used to generate the code has the following attributes:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bfrs="http://www.example.com/schema/bfrs" xmlns:cnc="http://www.example.com/schema/cnc" targetNamespace="http://www.example.com/schema/cnc" elementFormDefault="qualified" attributeFormDefault="unqualified" version="2006/05/30" xml:lang="en">
The xml files I read in use qualified namespaces for the elements like so:
<?xml version="1.0" encoding="UTF-8"?>
<cnc:cnc versionNumber="v.2.2.1" versionDate="2012-04-03" xsi:schemaLocation="http://www.example.com/schema/cnc exampleSchema.xsd" xmlns:cnc="http://www.example.com/schema/cnc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cnc:Revisions>
<cnc:Revision>S003</cnc:Revision>
</cnc:Revisions>
...
But after I write the file out again using saveToFile() as above, the namespace prefixes are removed from all of the elements, like so:
<?xml version="1.0" encoding="UTF-8"?>
<cnc versionNumber="v.2.2.1" versionDate="2012-04-03" xmlns:cnc="http://www.example.com/schema/cnc">
<Revisions>
<Revision>S003</Revision>
</Revisions>
...
Does anyone know how I can get the XMLSpy-generated code to qualify the namespaces on the elements, so the documents look the way they did when I read them in? Thank you for any help.
As it turns out, this is currently impossible with Altova-generated code, according to the response I got from Altova technical support:
Thanks for contacting us.
I'm afraid that it's currently not presently possible to control the
namespace prefix in the generated code.
I'll forward your message on to our development team for future
consideration.
Best regards,
Mxxxxxxxx Kxxxxx
Support Engineer
Altova GmbH

How to add XML declaration to xml with XMLBeans

I have this question about Java XMLBeans. I want to include the following declaration at the top of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
Is there any way to do this natively with XMLBeans? The above can always be concatenated as a String to the XML content, but that's ugly.
Thanks!
The <?xml version="1.0" encoding="UTF-8"?> construct is an XML declaration.
Per org.apache.xmlbeans.XmlTokenSource.save(ContentHandler ch, LexicalHandler lh):
Writes the XML represented by this source to the given SAX content and
lexical handlers. Note that this method does not save the XML
declaration, including the encoding information. To save the XML
declaration with the XML, see save(OutputStream), save(OutputStream,
XmlOptions), save(File) or save(File, XmlOptions).

In XML what do you call this: //#Dohicky.0 and how to address it in Java

It's my first time parsing XML and I don't really know what I'm doing at the moment. Here's my XML:
<?xml version="1.0" encoding="UTF-8"?>
<MyDocument xmi:version="2.0">
<Thingamabob name="A" hasDohicky="//#Dohicky.0">
<Dingus/>
</Thingamabob>
<Dohicky name="B"/>
</MyDocument>
So what is "//#Dohicky.0" called? I understand the purpose, but I don't know how to deal with it when I'm parsing XML through Java JAXP. I guess I could parse the hasDohicky attribute's value and then look for the 0th occurrence of an element by that name... but I bet there's got to be a better way, right?
Thanks All!
In general it's an attribute (like the "name" attributes in Dohicky and Thingamabob).
In this case, the hasDohicky value looks a bit like an XQuery string, though I am not sure about the ".0" part. See the XQuery documentation for more information.
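The lookup the question describes (parse the attribute, then find the nth element with that name) can be sketched with the JAXP DOM API. Treating ".0" as a zero-based position among all elements with that name, in document order, is an assumption taken from the question's own guess rather than a documented format; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DohickyRefResolver {

    // Uses JAXP's default non-namespace-aware parsing, so the unbound
    // "xmi:" prefix in the sample is accepted as part of the attribute name.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Assumed reference form: "//#<ElementName>.<zeroBasedIndex>".
    // Returns null when no element exists at that position.
    static Element resolve(Document doc, String ref) {
        if (!ref.startsWith("//#")) {
            throw new IllegalArgumentException("Unexpected reference: " + ref);
        }
        String body = ref.substring(3);
        int dot = body.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("Missing index in reference: " + ref);
        }
        String name = body.substring(0, dot);
        int index = Integer.parseInt(body.substring(dot + 1));
        // getElementsByTagName returns matches in document order;
        // item() returns null when the index is out of range.
        NodeList matches = doc.getElementsByTagName(name);
        return (Element) matches.item(index);
    }
}
```

With the question's document, resolving the hasDohicky value of Thingamabob "A" yields the Dohicky element whose name is "B".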
