XML Validation: Am I Doing It Right?

I was just wondering if someone could give my XML validation code a once-over to see if I'm doing it right. Here's the portion of the code that is giving me trouble:
SAXParserFactory factory = SAXParserFactory.newInstance();
SchemaFactory schemaFactory = SchemaFactory
        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// *** CODE FAILS ON THE BELOW LINE ***
factory.setSchema(schemaFactory
        .newSchema(new Source[] { new StreamSource(schemaStream) }));
SAXParser parser = factory.newSAXParser();
SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(false);
reader.setErrorHandler(new ResultProducingErrorHandler());
reader.read(content);
Whenever I run the above code, I get an error along the lines of:
src-resolve: Cannot resolve the name 'ns:myStructure' to a(n) 'type definition' component.
The elements mentioned in the error messages are all ones that are imported into the schema via calls to <xs:import />. The schema seems to validate OK via the W3C XML Schema Validator.
Do I have to include each of these schemas individually, or is Java smart enough to go off and fetch these extra schemas too? I tried adding them to the array passed to the newSchema call, but that didn't make any difference.
I don't think I can give out the link to the schema, so I'm really just looking for a yes or no regarding if my code looks at least acceptable.

Ensure that the xs:import statements point to paths that are reachable from the current directory of your application. The current directory may not be what you think it is.
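When the schema is read from a stream, as in the question, one way to take the current directory out of the picture is to give the StreamSource a system ID, because relative schemaLocation values in xs:import are resolved against it. A minimal sketch, assuming the main schema and its imports sit in the same directory; the class name, the file names and the urn:example:types namespace are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ImportAwareSchemaLoader {

    // Compiles a schema read from a stream. The second StreamSource argument is the
    // system ID; relative schemaLocation values are resolved against it.
    static Schema load(Path mainXsd) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try (InputStream in = Files.newInputStream(mainXsd)) {
            return factory.newSchema(new StreamSource(in, mainXsd.toUri().toString()));
        }
    }

    // Hypothetical demo: writes two tiny schemas into a temp directory and compiles them.
    static boolean demo() {
        String types =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:types'>"
            + "<xs:simpleType name='myStructure'>"
            + "<xs:restriction base='xs:string'/></xs:simpleType>"
            + "</xs:schema>";
        String main =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " xmlns:ns='urn:example:types'>"
            + "<xs:import namespace='urn:example:types' schemaLocation='types.xsd'/>"
            + "<xs:element name='root' type='ns:myStructure'/>"
            + "</xs:schema>";
        try {
            Path dir = Files.createTempDirectory("xsd-demo");
            Files.write(dir.resolve("types.xsd"), types.getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("main.xsd"), main.getBytes(StandardCharsets.UTF_8));
            return load(dir.resolve("main.xsd")) != null;
        } catch (IOException | SAXException e) {
            return false; // e.g. src-resolve if the import could not be found
        }
    }

    public static void main(String[] args) {
        System.out.println("schema compiled: " + demo());
    }
}
```

Passing a bare `new StreamSource(schemaStream)` leaves the parser without a base location, so a relative schemaLocation falls back to the process working directory, which matches the symptom described in the question.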

Related

XSL - Exclude access to ACCESS_EXTERNAL_STYLESHEET

I'm looking for information and a solution regarding an issue with my implementation of SAXParser to perform XSL transformations.
In order to improve the quality of our project, the SonarQube sensitivity has been raised, and a new error then appeared for my implementation.
SonarQube is asking me to set some properties to an empty value in order to rule out attacks based on those values.
The problem: I can set ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA to empty without trouble, but ACCESS_EXTERNAL_STYLESHEET does not seem to be a valid property for SAXParser. Without it set, SonarQube doesn't remove the blocker error, as it seems to be mandatory for XSL transformations.
SAXParser saxParser = saxParserFactory.newSAXParser();
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, ""); // Works
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, ""); // Works
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, ""); // Doesn't work: throws org.xml.sax.SAXNotRecognizedException
What should I do?
I'm using Saxon-HE 9.8.0-8.
Thank you in advance
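For context, ACCESS_EXTERNAL_STYLESHEET is defined in JAXP as a property of the XSLT processor, so it is normally set as an attribute on the TransformerFactory and not on the SAX parser. The sketch below shows that with whatever factory `TransformerFactory.newInstance()` returns; it was written against the JDK's built-in processor, and whether Saxon-HE 9.8 accepts these attributes is an assumption this sketch does not verify. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RestrictedTransform {

    // Applies the stylesheet to the input with external DTD and
    // external stylesheet access switched off on the factory.
    static String transform(String xslt, String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Assumption: the factory recognises these JAXP 1.5 attributes.
            // An implementation that does not will throw IllegalArgumentException.
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");

            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xslt =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'><out><xsl:value-of select='/in'/></out></xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xslt, "<in>hello</in>"));
    }
}
```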

Java SAX parser, How do I prevent character references entirely? (DoS attack)

The XML files of incoming requests need to be validated. One requirement is that character references are prevented entirely because of possible DoS attacks. If I configure the SAXParserFactory like below:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
then the parser still resolves 100,000 entity expansions:
The parser has encountered more than "100.000" entity expansions in this document; this is the limit imposed by the application.
The prevention of external references was done via an EntityResolver which works fine. But how do I prevent the character references?

Character references cannot cause a denial of service attack, so there is no reason to prevent them.

An instance of org.apache.xerces.util.SecurityManager can limit the number of entity expansions. Here's an example.
SAXParser saxParser = spf.newSAXParser();
org.apache.xerces.util.SecurityManager mgr = new org.apache.xerces.util.SecurityManager();
mgr.setEntityExpansionLimit(-1);
saxParser.setProperty("http://apache.org/xml/properties/security-manager", mgr);
With this, the parsing process terminates if the XML file contains at least one entity reference. Now there's no more need for an EntityResolver.
The jar file which contains the SecurityManager can be downloaded here.
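If the extra jar is not an option, a different technique with a similar effect is to reject any document that carries a DOCTYPE declaration, since entities can only be declared there. This is a sketch, not the approach from the answer above: it relies on the Xerces feature http://apache.org/xml/features/disallow-doctype-decl, which the JDK's built-in parser supports; support in other parsers is an assumption. The class and method names are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class NoDoctypeParser {

    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // Builds a factory whose parsers fail as soon as they see <!DOCTYPE ...>.
    static SAXParserFactory newFactory() throws ParserConfigurationException, SAXException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        spf.setFeature(DISALLOW_DOCTYPE, true);
        return spf;
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean accepts(String xml) {
        try {
            newFactory().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // includes the fatal error raised for a DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<a>&#65;&amp;</a>"));
        System.out.println(accepts("<!DOCTYPE a [<!ENTITY e 'x'>]><a>&e;</a>"));
    }
}
```

Character references such as `&#65;` and the predefined entities such as `&amp;` need no DOCTYPE, so they still parse; only documents that try to declare their own entities are turned away.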

validating a schema file in local location with saxparser

I was looking at http://docs.oracle.com/javaee/1.4/tutorial/doc/JAXPSAX9.html.
You can associate the XML file with a schema in two ways: in the application or in the XML document. In the application you call
saxParser.setProperty(JAXP_SCHEMA_SOURCE,
        new File(schemaSource));
In the XML document you add this:
<documentRoot
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'
>
The problem is that both locations for the .xsd file are URL strings. The .xsd file I have is a local copy. Is there a way to specify the location, maybe as an input stream?

You can set the schema directly on the SAX Parser factory.
SAXParserFactory factory = SAXParserFactory.newInstance();
SchemaFactory schemafactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema sc = schemafactory.newSchema(new File("path to xsd file"));
factory.setSchema(sc);
SAXParser parser = factory.newSAXParser();
parser.parse(file, handler);
The xsd location in the xml file can also be relative to the xml file, so if your xsd is present along with the xml file locally then your current xml file should work.
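Since the question asks about an input stream: the same approach works with `new StreamSource(inputStream)` in place of the File. Below is a minimal sketch under that assumption. Two details are easy to miss: the factory has to be namespace-aware for schema validation, and DefaultHandler silently ignores validation errors unless error() is overridden. The class name and the `stream` helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class StreamSchemaValidation {

    // Compiles a schema from any InputStream (file, classpath resource, memory).
    static Schema loadSchema(InputStream xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(xsd));
    }

    // Returns true if the document is well-formed and valid against the schema.
    static boolean isValid(InputStream xsd, InputStream xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for schema validation
            factory.setSchema(loadSchema(xsd));
            factory.newSAXParser().parse(xml, new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // the default implementation swallows validation errors
                }
            });
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper for the demo: wraps a string in an InputStream.
    static InputStream stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='documentRoot' type='xs:int'/></xs:schema>";
        System.out.println(isValid(stream(xsd), stream("<documentRoot>42</documentRoot>")));
        System.out.println(isValid(stream(xsd), stream("<documentRoot>x</documentRoot>")));
    }
}
```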

I assume you're using Java. If the schema is on the classpath, you can probably use this post to get it: URL to load resources from the classpath in Java
Having the schemaLocation in the instance document can be hard to handle if you receive the XML file from a third party. The schemaLocation may already be defined in the XML and may point to the wrong schema (or to nothing at all). If you want to add it programmatically, you have to modify the data before validating it, which can be risky. For validation, IMO, it is better to trust your local copy.

java.net.ConnectException : Validating Xml against XSD : local machine

I need to validate an XML file against a local XSD, and I do not have an internet connection on the target machine (on which this process runs). The code looks like this:
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
File schemaLocation = new File(xsd);
Schema schema = factory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
Source source = new StreamSource(new BufferedInputStream(new FileInputStream(new File(xml))));
validator.validate(source);
I always get a java.net.ConnectException when validate() is called.
Can you please let me know what is not being done correctly?
Many Thanks.
Abhishek

Agreed with Mads' comment - there are likely many references here that will attempt outgoing connections to the Internet, and you will need to download local copies for them. However, I'd advise against changing references within the XML or schema files, etc. - but instead, provide an EntityResolver to return the contents of your local copies instead of connecting out to the Internet. (I previously wrote a little bit about this at http://blogger.ziesemer.com/2009/01/xml-and-xslt-tips-and-tricks-for-java.html#InputValidation.)
However, in your case, since you're using a Validator, call Validator.setResourceResolver(...) and pass in an LSResourceResolver before calling validate.
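The resolver approach can be sketched roughly as follows. The example assumes the outgoing connection is caused by a DOCTYPE in the instance document that points at a remote DTD; the URL, the DTD content and the class name are invented for illustration, and the local copies are held in a map keyed by system ID. The sketch was written against the JDK's built-in validator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

final class LocalCopyResolver implements LSResourceResolver {

    private final Map<String, String> localCopies; // system ID -> local content
    private final DOMImplementationLS ls;

    LocalCopyResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
        try {
            this.ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        String content = localCopies.get(systemId);
        if (content == null) {
            return null; // unknown resource: let the parser resolve it as usual
        }
        LSInput input = ls.createLSInput();
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        input.setCharacterStream(new StringReader(content));
        return input;
    }

    // Validates xml against xsd, serving external resources from localCopies.
    static boolean validates(String xsd, String xml, Map<String, String> localCopies) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setResourceResolver(new LocalCopyResolver(localCopies));
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='root' type='xs:string'/></xs:schema>";
        String xml = "<!DOCTYPE root SYSTEM 'http://example.invalid/root.dtd'><root>ok</root>";
        Map<String, String> copies = Collections.singletonMap(
            "http://example.invalid/root.dtd", "<!ELEMENT root (#PCDATA)>");
        System.out.println(validates(xsd, xml, copies));
    }
}
```

If the connection is instead triggered while the schema itself is compiled (for example by an xs:import with a remote schemaLocation), the same resolver would presumably need to be set on the SchemaFactory before calling newSchema.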

XML to be validated against multiple xsd schemas

I'm writing the xsd and the code to validate, so I have great control here.
I would like to have an upload facility that adds stuff to my application based on an xml file. One part of the xml file should be validated against different schemas based on one of the values in the other part of it. Here's an example to illustrate:
<foo>
  <name>Harold</name>
  <bar>Alpha</bar>
  <baz>Mercury</baz>
  <!-- ... more general info that applies to all foos ... -->
  <bar-config>
    <!-- the content here is specific to the bar named "Alpha" -->
  </bar-config>
  <baz-config>
    <!-- the content here is specific to the baz named "Mercury" -->
  </baz-config>
</foo>
In this case, there is some controlled vocabulary for the content of <bar>, and I can handle that part just fine. Then, based on the bar value, the appropriate xml schema should be used to validate the content of bar-config. Similarly for baz and baz-config.
The code doing the parsing/validation is written in Java. Not sure how language-dependent the solution will be.
Ideally, the solution would permit the xml author to declare the appropriate schema locations and what-not so that s/he could get the xml validated on the fly in a sufficiently smart editor.
Also, the possible values for <bar> and <baz> are orthogonal, so I don't want to do this by extension for every possible bar/baz combo. What I mean is, if there are 24 possible bar values/schemas and 8 possible baz values/schemas, I want to be able to write 1 + 24 + 8 = 33 total schemas, instead of 1 * 24 * 8 = 192 total schemas.
Also, I'd prefer to NOT break out the bar-config and baz-config into separate xml files if possible. I realize that might make all the problems much easier, as each xml file would have a single schema, but I'm trying to see if there is a good single-xml-file solution.

I finally figured this out.
First of all, in the foo schema, the bar-config and baz-config elements have a type which includes an any element, like this:
<sequence>
<any minOccurs="0" maxOccurs="1"
processContents="lax" namespace="##any" />
</sequence>
In the xml, then, you must specify the proper namespace using the xmlns attribute on the child element of bar-config or baz-config, like this:
<bar-config>
<config xmlns="http://www.example.org/bar/Alpha">
... config xml here ...
</config>
</bar-config>
Then, your XML schema file for bar Alpha will have a target namespace of http://www.example.org/bar/Alpha and will define the root element config.
If your XML file has namespace declarations and schema locations for both of the schema files, this is sufficient for the editor to do all of the validating (at least good enough for Eclipse).
So far, we have satisfied the requirement that the xml author may write the xml in such a way that it is validated in the editor.
Now, we need the consumer to be able to validate. In my case, I'm using Java.
If by some chance, you know the schema files that you will need to use to validate ahead of time, then you simply create a single Schema object and validate as usual, like this:
Schema schema = factory().newSchema(new Source[] {
new StreamSource(stream("foo.xsd")),
new StreamSource(stream("Alpha.xsd")),
new StreamSource(stream("Mercury.xsd")),
});
In this case, however, we don't know which xsd files to use until we have parsed the main document. So, the general procedure is to:
1. Validate the xml using only the main (foo) schema
2. Determine the schema to use to validate the portion of the document
3. Find the node that is the root of the portion to validate using a separate schema
4. Import that node into a brand new document
5. Validate the brand new document using the other schema file
Caveat: it appears that the document must be built namespace-aware in order for this to work.
Here's some code (this was ripped from various places of my code, so there might be some errors introduced by the copy-and-paste):
// Contains the filename of the xml file
String filename;
// Load the xml data using a namespace-aware builder (the method
// 'stream' simply opens an input stream on a file)
Document document;
DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
document = docBuilderFactory.newDocumentBuilder().parse(stream(filename));
// Create the schema factory
SchemaFactory sFactory = SchemaFactory.newInstance(
        XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Load the main schema
Schema schema = sFactory.newSchema(
        new StreamSource(stream("foo.xsd")));
// Validate using main schema
schema.newValidator().validate(new DOMSource(document));
// Get the node that is the root for the portion you want to validate
// using another schema
Node node = getSpecialNode(document);
// Build a Document from that node
Document subDocument = docBuilderFactory.newDocumentBuilder().newDocument();
subDocument.appendChild(subDocument.importNode(node, true));
// Determine the schema to use using your own logic
Schema subSchema = parseAndDetermineSchema(document);
// Validate using other schema
subSchema.newValidator().validate(new DOMSource(subDocument));
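For readers who want something that runs as-is, here is a self-contained sketch of the same five steps. It is not the code from my project: the schemas are tiny in-memory stand-ins, only bar/bar-config is handled, and the names TwoStageValidation, SCHEMA_BY_BAR and the `level` element are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class TwoStageValidation {

    static final String FOO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/>"
        + "<xs:element name='bar-config'><xs:complexType><xs:sequence>"
        + "<xs:any minOccurs='0' processContents='lax' namespace='##any'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String ALPHA_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.example.org/bar/Alpha'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='level' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Maps the value of <bar> to the schema for its configuration.
    static final Map<String, String> SCHEMA_BY_BAR =
        Collections.singletonMap("Alpha", ALPHA_XSD);

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for schema validation of the DOM
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

            // 1. Validate against the main schema only.
            compile(sf, FOO_XSD).newValidator().validate(new DOMSource(doc));

            // 2. Pick the second schema from the value of <bar>.
            String bar = doc.getElementsByTagName("bar").item(0).getTextContent().trim();
            String subXsd = SCHEMA_BY_BAR.get(bar);
            if (subXsd == null) {
                return false; // unknown bar value
            }

            // 3. Find the root of the part to validate separately.
            Node config = firstElementChild(doc.getElementsByTagName("bar-config").item(0));
            if (config == null) {
                return true; // the wildcard is optional, nothing more to check
            }

            // 4. Import that node into a brand new document.
            Document sub = dbf.newDocumentBuilder().newDocument();
            sub.appendChild(sub.importNode(config, true));

            // 5. Validate the new document against the second schema.
            compile(sf, subXsd).newValidator().validate(new DOMSource(sub));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or invalid in either stage
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Schema compile(SchemaFactory sf, String xsd) throws SAXException {
        return sf.newSchema(new StreamSource(new StringReader(xsd)));
    }

    static Node firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n;
            }
        }
        return null;
    }
}
```

With these stand-in schemas, a document whose config contains `<level>3</level>` passes both stages, while `<level>high</level>` passes the first stage (the wildcard is lax) and fails the second.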

Take a look at NVDL (Namespace-based Validation Dispatching Language) - http://www.nvdl.org/
It is designed to do what you want to do (validate parts of an XML document that have their own namespaces and schemas).
There is a tutorial here - http://www.dpawson.co.uk/nvdl/ - and a Java implementation here - http://jnvdl.sourceforge.net/
Hope that helps!
Kevin

You need to define a target namespace for each separately validated portion of the instance document. Then you define a master schema that uses <xsd:import> to reference the schema documents for these components (<xsd:include> only works for schema documents that share the master schema's target namespace).
The limitation with this approach is that you can't let the individual components define the schemas that should be used to validate them. But it's a bad idea in general to let a document tell you how to validate it (i.e., validation should be something that your application controls).

You can also use a "resource resolver" to allow "xml authors" to specify their own schema file, at least to some extent, e.g. https://stackoverflow.com/a/41225329/32453. At the end of the day, you want a fully compliant xml file that can be validated with normal tools anyway :)
