Partial XML parsing giving an element not bound exception - java

I'm trying to parse an XML document with the DSpace XOAI library. This is the input XML:
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
...
<ListRecords>
<record>
<header>
...
</header>
<metadata>
<crossref xmlns="http://www.crossref.org/xschema/1.1" xsi:schemaLocation="http://www.crossref.org/xschema/1.1 http://www.crossref.org/schema/unixref1.1.xsd">
...
</crossref>
</metadata>
</record>
</ListRecords>
</OAI-PMH>
From what I could deduce from debugging, each metadata node is parsed individually by the XOAI library. In that context I get this error (which makes sense, because the xsi namespace is declared on the parent OAI-PMH node):
ERROR: 'The prefix "xsi" for attribute "xsi:schemaLocation" associated with an element type "crossref" is not bound.'
From what I can tell from the library source code, it uses the standard Java javax.xml.transform.Transformer to perform the transformations, and any Transformer can be plugged in.
I'm already using an XSLT file to transform the input XML into the format expected by the library. However, I couldn't find a way to write an XSLT rule that ignores the xsi:schemaLocation attribute that is causing the error.
The other option is to create a new Transformer in Java. I was looking at Transformer.setOutputProperty, but I couldn't come up with a working configuration that ignores this error in the crossref node.
Does anyone know how I can correctly parse the contents of the crossref node in that local context?
Thanks in advance!
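For anyone who wants to reproduce the error outside the library, here is a minimal sketch. It assumes the crossref element is handed to a namespace-aware parser as a standalone document, which is only my reading of the debugging notes above. The class, method and constant names are made up for illustration. It also shows that the same fragment parses once the xmlns:xsi declaration is copied onto it; whether XOAI offers a hook for doing that is not something this sketch establishes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class UnboundPrefixDemo {

    // The metadata payload cut out of its parent: xsi is used but never declared.
    static final String CUT_OUT =
            "<crossref xmlns=\"http://www.crossref.org/xschema/1.1\""
            + " xsi:schemaLocation=\"http://www.crossref.org/xschema/1.1"
            + " http://www.crossref.org/schema/unixref1.1.xsd\"/>";

    // The same payload with the declaration from the OAI-PMH root copied onto it.
    static final String REDECLARED =
            "<crossref xmlns=\"http://www.crossref.org/xschema/1.1\""
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
            + " xsi:schemaLocation=\"http://www.crossref.org/xschema/1.1"
            + " http://www.crossref.org/schema/unixref1.1.xsd\"/>";

    // Returns true if the text parses as a standalone, namespace-aware document.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. the 'prefix "xsi" ... is not bound' error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cut out:    " + parses(CUT_OUT));
        System.out.println("redeclared: " + parses(REDECLARED));
    }
}
```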

Related

How can I modify the xml-stylesheet attribute value in Java

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet href="Sample.xsl" type="text/xsl"?>
<MyDoc>.....</MyDoc>
I want to change the value of the href attribute to 'MyDoc.xsl'. I have tried using XPath, but it returns nothing:
//xml-stylesheet[contains(text(), 'Sample.xsl')]/@href
Also, using Document only gives me the elements starting at MyDoc:
NodeList list = taggedC32Doc.getElementsByTagName("*");
Is there any way I can do this?
The line you want to change is a processing instruction, not an element, so neither of your attempts to find it as an element will work. Try
/processing-instruction('xml-stylesheet')
You can then get that node's data, which will be href="Sample.xsl" type="text/xsl". Most XML APIs don't help you beyond that point, because as far as XML is concerned the PI's data is an unformatted string, even though it is usually structured to resemble attributes. So do the string manipulation yourself to find and change the href pseudo-attribute, then set the new data back into the ProcessingInstruction node.
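Here is a minimal sketch of those steps using only the JDK's DOM and XPath APIs. The class and method names are made up for illustration, and the regular expression is just one way to do the string manipulation; it assumes the href pseudo-attribute is quoted with either single or double quotes.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class StylesheetHrefDemo {

    // Replaces the value of the href pseudo-attribute inside the PI data string.
    static String withHref(String piData, String newHref) {
        return piData.replaceFirst(
                "href\\s*=\\s*(\"[^\"]*\"|'[^']*')",
                Matcher.quoteReplacement("href=\"" + newHref + "\""));
    }

    // Parses the XML, rewrites the stylesheet href in the DOM, and returns the PI's new data.
    static String rewrite(String xml, String newHref) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            ProcessingInstruction pi = (ProcessingInstruction) XPathFactory.newInstance().newXPath()
                    .evaluate("/processing-instruction('xml-stylesheet')", doc, XPathConstants.NODE);
            pi.setData(withHref(pi.getData(), newHref));
            return pi.getData();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>"
                + "<?xml-stylesheet href=\"Sample.xsl\" type=\"text/xsl\"?>"
                + "<MyDoc/>";
        System.out.println(rewrite(xml, "MyDoc.xsl"));
    }
}
```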

How to force xmlspy code to write out qualified namespaces on elements?

I have used XmlSpy 2013 to generate program code in Java from a schema. My application basically reads in xml from a file, modifies the xml, and writes it back out to the file. The generated code provides classes and functions to do the load:
sampleSchema2 doc = sampleSchema2.loadFromFile(filePath);
// Load the file into Java objects...
and to write the file back out:
sampleSchema2 sampleDoc = sampleSchema2.createDocument();
// Populate the doc from the modified Java objects...
sampleDoc.saveToFile(path, true);
The schema I used to generate the code has the following attributes:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bfrs="http://www.example.com/schema/bfrs" xmlns:cnc="http://www.example.com/schema/cnc" targetNamespace="http://www.example.com/schema/cnc" elementFormDefault="qualified" attributeFormDefault="unqualified" version="2006/05/30" xml:lang="en">
The xml files I read in use qualified namespaces for the elements like so:
<?xml version="1.0" encoding="UTF-8"?>
<cnc:cnc versionNumber="v.2.2.1" versionDate="2012-04-03" xsi:schemaLocation="http://www.example.com/schema/cnc exampleSchema.xsd" xmlns:cnc="http://www.example.com/schema/cnc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cnc:Revisions>
<cnc:Revision>S003</cnc:Revision>
</cnc:Revisions>
...
But after I write the file out again using saveToFile() as above, all of the qualified namespaces are removed from the elements like so:
<?xml version="1.0" encoding="UTF-8"?>
<cnc versionNumber="v.2.2.1" versionDate="2012-04-03" xmlns:cnc="http://www.example.com/schema/cnc">
<Revisions>
<Revision>S003</Revision>
</Revisions>
...
Does anyone know how I can get xmlspy to qualify the namespaces on the documents so they look like how I read them in? Thank you for any help.
As it turns out, this is currently not possible with Altova-generated code, according to the response I received from Altova technical support:
Thanks for contacting us.
I'm afraid that it's currently not presently possible to control the
namespace prefix in the generated code.
I'll forward your message on to our development team for future
consideration.
Best regards,
Mxxxxxxxx Kxxxxx
Support Engineer
Altova GmbH

Java getNodeName and namespaces

Given two XML files that conform to the same schema (both valid), one declares the namespace as the default namespace and the other uses a namespace prefix (sample):
XML File 1
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<message xmlns="http://www.somewhere.com/X">
<messageheader>
...
</message>
XML File 2
<?xml version="1.0" encoding="UTF-8"?>
<ns1:message xmlns:ns1="http://www.somewhere.com/X">
<ns1:messageheader>
...
</ns1:message>
The issue is that the code that parses the file uses the Element.getNodeName() getter to determine node names before extracting and storing the text content. This method was used as opposed to XPath when parsing the XML due to performance.
Therefore the following sample code was implemented to do the parsing:
for(int i = 0; i < someElement.getChildNodes().getLength(); i++) {
if(someElement.getChildNodes().item(i).getNodeType() == org.w3c.dom.Node.ELEMENT_NODE) {
Element element = (Element) someElement.getChildNodes().item(i);
if(element.getNodeName().equals("ns1:messageheader")) {
...
}
...
}
}
The above code only works with XML File 2.
Is it possible to determine whether a file uses the namespace prefix on elements so both files can be parsed using getNodeName() - so I can use the same code to parse both files?
I agree this is an awful way to parse the XML. Unfortunately, my code was implemented before switching to JAXB which I love (for the moment).
Thanks
Andez
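Use getLocalName() instead of getNodeName(). This returns the unqualified name of the element. Note that the DOM must be built by a namespace-aware parser (DocumentBuilderFactory.setNamespaceAware(true)); otherwise getLocalName() returns null.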
Try this:
if (element.getLocalName().equals("messageheader") &&
"http://www.somewhere.com/X".equals(element.getNamespaceURI())) { ...
You have to check that the local name and the namespace match, regardless of the prefix.
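To see that this check handles both files, here is a small self-contained sketch. The sample documents are shortened versions of the two files in the question with an invented text value, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LocalNameDemo {

    static final String NS = "http://www.somewhere.com/X";

    // Returns the text of the first child <messageheader> in namespace NS, or null if none.
    static String headerText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, getLocalName() returns null
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE
                        && "messageheader".equals(n.getLocalName())
                        && NS.equals(n.getNamespaceURI())) {
                    return n.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String defaultNs = "<message xmlns=\"" + NS + "\">"
                + "<messageheader>hello</messageheader></message>";
        String prefixed = "<ns1:message xmlns:ns1=\"" + NS + "\">"
                + "<ns1:messageheader>hello</ns1:messageheader></ns1:message>";
        System.out.println(headerText(defaultNs));
        System.out.println(headerText(prefixed));
    }
}
```

Iterating with getFirstChild()/getNextSibling() also avoids calling getChildNodes() repeatedly inside the loop, as the code in the question does.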

How to remove the <?xml ...?> processing instruction

I need to remove the processing instruction from a DOM. I load several files, merge them and save. The problem is that the result looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<frag>
Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
<?xml version="1.0" encoding="iso-8859-2" standalone="no"?>
<frag>
Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<frag>
Jó foxim és don Quijote húszwattos lámpánál ülve egy pár bűvös cipőt készít.
</frag>
I haven't found a way to either remove the <?xml ...?> processing instruction from the DOM or have it ignored when saving the resulting DOM. I'm using Java 6 and the default parser.
There is no method for removing that processing instruction.
Your merge process is broken. I'll bet you're reading the fragment files and simply concatenating the strings together to create this example.
The right way to do it is to parse each fragment and add the elements you want to the final DOM, which is then output.
Even if you remove the processing instruction, what you've posted is invalid XML. There's no root tag that I can see, and you must have one and only one.
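A minimal sketch of that merge, assuming the fragments are available as strings and that a wrapper element is acceptable as the single root (the name "frags" and the class and method names are made up for illustration). Strictly speaking, the <?xml ...?> line is the XML declaration rather than a processing instruction, and it is not a node in the DOM, which is why nothing can be removed there. Whether it is written on output is a serializer setting, shown here with OutputKeys.OMIT_XML_DECLARATION.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MergeFragmentsDemo {

    // Parses each fragment and imports its root element under one common root.
    static String merge(String... fragments) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document merged = builder.newDocument();
            merged.appendChild(merged.createElement("frags")); // hypothetical wrapper element
            for (String fragment : fragments) {
                Document part = builder.parse(new InputSource(new StringReader(fragment)));
                // importNode copies the element into the target document.
                merged.getDocumentElement()
                        .appendChild(merged.importNode(part.getDocumentElement(), true));
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            // Leave this out if you want a single declaration at the top of the output.
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(merged), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(merge(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><frag>a</frag>",
                "<?xml version=\"1.0\" encoding=\"iso-8859-2\" standalone=\"no\"?><frag>b</frag>"));
    }
}
```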
You can remove processing instructions by using the StAX API, with an XMLStreamReader for example. You can create a filtered reader using XMLInputFactory.createFilteredReader and a StreamFilter.
There is a constant XMLStreamConstants.PROCESSING_INSTRUCTION that can help your filter recognize the processing instructions and hold them back.
Something similar is definitely possible with SAX too.
Regardless of the technical feasibility, the merge really looks broken as suggested by duffymo.
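For reference, a sketch of the filtered-reader idea with the JDK's javax.xml.stream classes. The class and method names and the sample input are made up for illustration. One caveat: StAX reports the <?xml ...?> declaration as part of START_DOCUMENT, not as a PROCESSING_INSTRUCTION event, so this filter only holds back real processing instructions such as <?note ...?> below.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class SkipProcessingInstructionsDemo {

    // Returns the event types seen through a filter that holds back processing instructions.
    static List<Integer> filteredEvents(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader raw = factory.createXMLStreamReader(new StringReader(xml));
            // StreamFilter has a single accept(XMLStreamReader) method, so a lambda works.
            XMLStreamReader filtered = factory.createFilteredReader(
                    raw, r -> r.getEventType() != XMLStreamConstants.PROCESSING_INSTRUCTION);
            List<Integer> events = new ArrayList<>();
            events.add(filtered.getEventType()); // the reader starts on START_DOCUMENT
            while (filtered.hasNext()) {
                events.add(filtered.next());
            }
            return events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><?note hold me back?><frag>text</frag>";
        System.out.println(filteredEvents(xml));
    }
}
```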

How to reference the XSD schema location while parsing an XML document via SAX (Xerces)?

How do I reference the XSD schema location while parsing XML via SAX (Xerces)?
<?xml version="1.0" encoding="ISO-8859-1"?>
<com.firma xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- xsi:noNamespaceSchemaLocation="F:\...\myschema_v2.5.xsd"
Must I really reference the schema within the XML document? I hope not...
-->
I can also set it in Java code as follows, which is not elegant because the schema location is hard-coded (not appropriate for production):
SaxParser.setProperty(
"http://java.sun.com/xml/jaxp/properties/schemaSource",
"F:...\myschema_v2.5.xsd"
);
Include the schema in your jar and load it using getResourceAsStream, in the following way:
reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
new InputSource(getClass().getResourceAsStream(xsdLocation)));
I got it.
The path has to be passed as "/com/firma/project/.../myschema_v2.5.xsd",
not forgetting the "/" at the very beginning of the path.
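For completeness, a self-contained sketch of the schemaSource approach. Everything specific in it is an assumption for illustration: the tiny schema, the class and method names, and above all the in-memory byte stream, which only stands in for getClass().getResourceAsStream(...) so the example runs without a jar. As far as I know, the schemaLanguage property has to be set before schemaSource, and the factory has to be validating and namespace-aware.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SchemaSourceDemo {

    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Hypothetical schema; in real code this would be the .xsd packaged in the jar.
    static final String XSD = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"note\" type=\"xs:string\"/></xs:schema>";

    // Returns true if the document validates against the schema supplied as a stream.
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(SCHEMA_LANGUAGE, "http://www.w3.org/2001/XMLSchema");
            // Stand-in for getClass().getResourceAsStream("/.../myschema_v2.5.xsd").
            InputStream schema = new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8));
            parser.setProperty(SCHEMA_SOURCE, new InputSource(schema));
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validation errors are otherwise silently ignored
                }
            });
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note>hi</note>"));
        System.out.println(isValid("<other/>"));
    }
}
```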
