Offline XML Validation with Java

I need to figure out how to validate my XML files against schemas offline. After looking around for a couple of days, what I was able to find was basically that I needed to have an internal reference to the schema. I needed to find the schemas, download them, and change the reference to a local system path. What I was unable to find was exactly how to do that. Where and how can I change the reference to point internally instead of externally? What is the best way to download the schemas?

There are three ways you could do this. What they all have in common is that you need a local copy of the schema document(s). I'm assuming that the instance documents currently use xsi:schemaLocation and/or xsi:noNamespaceSchemaLocation to point to a location holding the schema document(s) on the web.
(a) Modify your instance documents to refer to the local copy of the schema documents. This is usually inconvenient.
(b) Redirect the references so that a request for a remote file is redirected to a local file. The way to set this up depends on which schema validator you are using and how you are invoking it.
(c) Tell the schema processor to ignore the values of xsi:schemaLocation and xsi:noNamespaceSchemaLocation, and to validate instead against a schema that you supply using your schema processor's invocation API. Again the details depend on which schema processor you are using.
My preferred approach is (c): if only because when you are validating a source document, then by definition you don't fully trust it - so why should you trust it to contain a correct xsi:schemaLocation attribute?
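Approach (c) can be sketched with the standard javax.xml.validation API. This is a minimal sketch, not a definitive implementation: the class name LocalSchemaValidation and the inline NOTE_XSD string are made up for illustration, and the string stands in for a schema file stored locally. The schema is handed to the SchemaFactory directly, and ACCESS_EXTERNAL_SCHEMA is set to the empty string so the factory refuses to fetch any further schema documents.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class LocalSchemaValidation {

    // Hypothetical stand-in for a local schema file; with a real file this
    // would be something like new StreamSource(new File("schemas/note.xsd")).
    static final String NOTE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Compile the schema that the caller supplies, not one named by the instance document.
    static Schema loadSchema(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Empty string = no protocol allowed, so no further schema documents are fetched.
        // A schema that includes other local files would need "file" here instead.
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns true if the document validates, false if the validator reports an error.
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = loadSchema(NOTE_XSD);
        System.out.println(isValid(schema, "<note><to>Ann</to></note>"));
        System.out.println(isValid(schema, "<note><from>Bob</from></note>"));
    }
}
```

A Schema object is immutable and thread-safe, so it can be compiled once and reused; a new Validator is created for each document.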

XmlValidate is a simple but powerful command-line tool that can perform offline validation of single or multiple XML files against target schemas. It can scan local xml files by file name, directory, or URL.
XmlValidate automatically adds the schemaLocation based on the schema namespace and a config file that maps each namespace to a local file. The tool validates against whatever XML Schema is referenced in the config file.
Here are example mappings of namespace to target Schema in config file:
http://www.opengis.net/kml/2.2=${XV_HOME}/schemas/kml22.xsd
http://appengine.google.com/ns/1.0=C:/xml/appengine-web.xsd
urn:oasis:names:tc:ciq:xsdschema:xAL:2.0=C:/xml/xAL.xsd
Note that the ${XV_HOME} token above is simply an alias for the top-level directory that XmlValidate is running from. The location can likewise be a full file path.
XmlValidate is an open-source project (source code available) that runs with the Java Runtime Environment (JRE). The bundled application (Java jars, examples, etc.) can be downloaded here.
If XmlValidate is run in batch mode against multiple XML files, it will provide a summary of validation results.
Errors: 17 Warnings: 0 Files: 11 Time: 1506 ms
Valid files 8/11 (73%)

You can set your own implementation of LSResourceResolver on the SchemaFactory and have it return your own LSInput, so that the call of LSInput.getCharacterStream() provides a schema from a local path.
I have written an extra class to do offline validation. You can call it like
new XmlSchemaValidator().validate(xmlStream, schemaStream, "https://schema.datacite.org/meta/kernel-4.1/",
"schemas/datacite/kernel-4.1/");
Two InputStreams are passed: one for the XML, one for the schema. A baseUrl and a localPath (relative to the classpath) are passed as the third and fourth parameters. The validator uses these last two parameters to look up additional schemas locally at localPath or relative to the provided baseUrl.
I have tested with a set of schemas and examples from https://schema.datacite.org/meta/kernel-4.1/ .
Complete Example:
@Test
public void validate4() throws Exception {
InputStream xmlStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(
"schemas/datacite/kernel-4.1/example/datacite-example-complicated-v4.1.xml");
InputStream schemaStream = Thread.currentThread().getContextClassLoader()
.getResourceAsStream("schemas/datacite/kernel-4.1/metadata.xsd");
new XmlSchemaValidator().validate(xmlStream, schemaStream, "https://schema.datacite.org/meta/kernel-4.1/",
"schemas/datacite/kernel-4.1/");
}
The XmlSchemaValidator will validate the xml against the schema and will search locally for included Schemas. It uses a ResourceResolver to override the standard behaviour and to search locally.
public class XmlSchemaValidator {
/**
* @param xmlStream
* xml data as a stream
* @param schemaStream
* schema as a stream
* @param baseUri
* to search for relative paths on the web
* @param localPath
* to search for schemas in a local directory
* @throws SAXException
* if validation fails
* @throws IOException
* not further specified
*/
public void validate(InputStream xmlStream, InputStream schemaStream, String baseUri, String localPath)
throws SAXException, IOException {
Source xmlFile = new StreamSource(xmlStream);
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
LSInput input = new DOMInputImpl();
input.setPublicId(publicId);
input.setSystemId(systemId);
input.setBaseURI(baseUri);
input.setCharacterStream(new InputStreamReader(
getSchemaAsStream(input.getSystemId(), input.getBaseURI(), localPath)));
return input;
});
Schema schema = factory.newSchema(new StreamSource(schemaStream));
javax.xml.validation.Validator validator = schema.newValidator();
validator.validate(xmlFile);
}
private InputStream getSchemaAsStream(String systemId, String baseUri, String localPath) {
InputStream in = getSchemaFromClasspath(systemId, localPath);
// You could just return in; , if you are sure that everything is on
// your machine. Here I call getSchemaFromWeb as last resort.
return in == null ? getSchemaFromWeb(baseUri, systemId) : in;
}
private InputStream getSchemaFromClasspath(String systemId, String localPath) {
System.out.println("Try to get stuff from localdir: " + localPath + systemId);
return Thread.currentThread().getContextClassLoader().getResourceAsStream(localPath + systemId);
}
/*
* You can leave out the webstuff if you are sure that everything is
* available on your machine
*/
private InputStream getSchemaFromWeb(String baseUri, String systemId) {
try {
URI uri = new URI(systemId);
if (uri.isAbsolute()) {
System.out.println("Get stuff from web: " + systemId);
return urlToInputStream(uri.toURL(), "text/xml");
}
System.out.println("Get stuff from web: Host: " + baseUri + " Path: " + systemId);
return getSchemaRelativeToBaseUri(baseUri, systemId);
} catch (Exception e) {
// maybe the systemId is not a valid URI or
// the web has nothing to offer under this address
}
return null;
}
private InputStream urlToInputStream(URL url, String accept) {
HttpURLConnection con = null;
InputStream inputStream = null;
try {
con = (HttpURLConnection) url.openConnection();
con.setConnectTimeout(15000);
con.setRequestProperty("User-Agent", "Name of my application.");
con.setReadTimeout(15000);
con.setRequestProperty("Accept", accept);
con.connect();
int responseCode = con.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_MOVED_PERM
|| responseCode == HttpURLConnection.HTTP_MOVED_TEMP || responseCode == 307
|| responseCode == 303) {
String redirectUrl = con.getHeaderField("Location");
try {
URL newUrl = new URL(redirectUrl);
return urlToInputStream(newUrl, accept);
} catch (MalformedURLException e) {
URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
return urlToInputStream(newUrl, accept);
}
}
inputStream = con.getInputStream();
return inputStream;
} catch (SocketTimeoutException e) {
throw new RuntimeException(e);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
private InputStream getSchemaRelativeToBaseUri(String baseUri, String systemId) {
try {
URL url = new URL(baseUri + systemId);
return urlToInputStream(url, "text/xml");
} catch (Exception e) {
e.printStackTrace();
throw new RuntimeException(e);
}
}
}
prints
Try to get stuff from localdir: schemas/datacite/kernel-4.1/http://www.w3.org/2009/01/xml.xsd
Get stuff from web: http://www.w3.org/2009/01/xml.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-titleType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-contributorType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-dateType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-resourceType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-relationType-v4.1.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-relatedIdentifierType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-funderIdentifierType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-descriptionType-v4.xsd
Try to get stuff from localdir: schemas/datacite/kernel-4.1/include/datacite-nameType-v4.1.xsd
The print shows that the validator was able to validate against a set of local schemas. Only http://www.w3.org/2009/01/xml.xsd was not available locally and therefore fetched from the internet.
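The class above relies on DOMInputImpl, which appears to come from the JDK-internal Xerces packages. As a hedged alternative, the sketch below builds the LSInput through the public DOMImplementationLS API instead. The class name MapBackedResolver and the two inline schemas are made up for illustration, and an in-memory map stands in for the local schema directory; a resolver reading from the classpath would follow the same shape.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

class MapBackedResolver implements LSResourceResolver {

    // Hypothetical schemas: MAIN_XSD includes TYPES_XSD by the relative name "types.xsd".
    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:include schemaLocation='types.xsd'/>"
        + "<xs:element name='item' type='ItemType'/>"
        + "</xs:schema>";
    static final String TYPES_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='ItemType'><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:schema>";

    private final Map<String, String> schemasBySystemId;
    private final DOMImplementationLS ls;

    MapBackedResolver(Map<String, String> schemasBySystemId) throws Exception {
        this.schemasBySystemId = schemasBySystemId;
        // Public way to obtain an LSInput factory, without internal classes.
        this.ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI, String publicId,
                                   String systemId, String baseURI) {
        String text = schemasBySystemId.get(systemId);
        if (text == null) {
            return null; // unknown id: let the parser fall back to its default lookup
        }
        LSInput input = ls.createLSInput();
        input.setPublicId(publicId);
        input.setSystemId(systemId);
        input.setBaseURI(baseURI);
        input.setCharacterStream(new StringReader(text));
        return input;
    }

    // Compile the main schema, serving every included document from the map.
    static Schema compile(String mainXsd, Map<String, String> includes) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(new MapBackedResolver(includes));
        return factory.newSchema(new StreamSource(new StringReader(mainXsd)));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> includes = new HashMap<>();
        includes.put("types.xsd", TYPES_XSD);
        Schema schema = compile(MAIN_XSD, includes);
        schema.newValidator().validate(
            new StreamSource(new StringReader("<item><name>x</name></item>")));
        System.out.println("validated without touching the network");
    }
}
```

Returning null for unknown system ids keeps the web fallback of the original class; throwing an exception there instead would make the validation strictly offline.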

Related

Can't access file in .jar via URL

I need to access a file inside the currently executed .jar using a URL.
URL url = BlockConverter.class.getResource("/test.txt");
System.out.println(url.toString());
InputStream is = url.openStream();
This is what I did.
The output is:
jar:file:/C:/Users/User/Desktop/SERVER/plugins/MyJar.jar!/test.txt
My InputStream always ends up throwing an IOException when being initialized, even though the URL is being output correctly.
So how is that possible?
Why can't I open the stream?
EDIT:
Also, please don't answer with "use getResourceAsStream", since it uses the same kind of code:
public InputStream getResourceAsStream(String name) {
URL url = getResource(name);
try {
return url != null ? url.openStream() : null;
} catch (IOException e) {
return null;
}
}
I would open it as a stream directly e.g.
InputStream is = BlockConverter.class.getResourceAsStream("/test.txt");
The above method is the way I normally access resources within a jar. Note that it opens the resource regardless of whether it is packaged within a jar or deployed unpackaged.

How do you access .properties files inside an AppEngine app?

I've got an AppEngine app with two different instances, one for prod and one for staging. Accordingly, I'd like to configure the staging instance slightly differently, since it'll be used for testing. Disabling emails, talking to a different test backend for data, that kind of thing.
My first intuition was to use a .properties file, but I can't seem to get it to work. I'm using Gradle as a build system, so the file is saved in src/main/webapp/WEB-INF/staging.properties (and a matching production.properties next to it). I'm trying to access it like so:
public class Config {
private static Config sInstance = null;
private Properties mProperties;
public static Config getInstance() {
if (sInstance == null) {
sInstance = new Config();
}
return sInstance;
}
private Config() {
// Select properties filename.
String filename;
if (!STAGING) { // PRODUCTION SETTINGS
filename = "/WEB-INF/production.properties";
} else { // DEBUG SETTINGS
filename = "/WEB-INF/staging.properties";
}
// Get handle to file.
InputStream stream = this.getClass().getClassLoader().getResourceAsStream(filename);
if (stream == null) {
// --> Crashes here. <--
throw new ExceptionInInitializerError("Unable to open settings file: " + filename);
}
// Parse.
mProperties = new Properties();
try {
mProperties.load(stream);
} catch (IOException e) {
throw new ExceptionInInitializerError(e);
}
}
}
The problem is that getResourceAsStream() is always returning null. I checked the build/exploded-app directory, and the .properties file shows up there. I also checked the .war file, and found the .properties file there as well.
I've also tried moving the file into /WEB-INF/classes, but that didn't make a difference either.
What am I missing here?
Try
BufferedReader reader = new BufferedReader(new FileReader(filename));
or
InputStream stream = this.getClass().getResourceAsStream(filename);
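One likely cause, offered as an assumption rather than a confirmed diagnosis: ClassLoader.getResourceAsStream expects a name relative to the classpath root with no leading slash, and in a web application the classpath root is WEB-INF/classes (plus the jars in WEB-INF/lib), not WEB-INF itself. The sketch below assumes the file is packaged into WEB-INF/classes and looked up as "staging.properties". The class name ClasspathConfig and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class ClasspathConfig {

    // ClassLoader lookups are always relative to the classpath root,
    // so a leading '/' is dropped before the lookup.
    static String toClassLoaderName(String name) {
        return name.startsWith("/") ? name.substring(1) : name;
    }

    // Load a properties file from the classpath, failing loudly if it is missing.
    static Properties load(ClassLoader loader, String name) throws IOException {
        String resource = toClassLoaderName(name);
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Not found on classpath: " + resource);
            }
            Properties properties = new Properties();
            properties.load(in);
            return properties;
        }
    }

    public static void main(String[] args) {
        // "staging.properties" is assumed to sit in WEB-INF/classes at runtime.
        try {
            Properties p = load(Thread.currentThread().getContextClassLoader(),
                                "staging.properties");
            System.out.println("Loaded " + p.size() + " entries");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A file that has to stay directly under WEB-INF is not reachable through the class loader at all; in a servlet container it would be read through the ServletContext instead.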

How to match catalog.xml entities with database?

I have to validate some XML files against .xsd files that are listed in catalog.xml but stored in a database. So I need a resolver that matches a systemId from catalog.xml with the .xsd file stored as a blob in the database.
I found that the method XMLInputSource resolveEntity(XMLResourceIdentifier resourceIdentifier) does this, but I can't find how the parser uses this method, so I'm not sure how to override it properly. I thought that it returns an XMLInputSource containing the .xsd file as a stream, but that seems not to be true because of "leaving resolution of the entity and opening of the input stream up to the caller", according to the XMLInputSource documentation.
So my question is: how do I map entities from catalog.xml to .xsd files stored in a database?
I really hope that I explained the problem clearly, but I know that my English is really poor, so feel free to ask for more details or a better explanation.
Greetings,
Rzysia
Here's the resolver I wrote for the maven-jaxb2-plugin. This resolver resolves system ids to resources in Maven artifacts. This is a somewhat similar task to yours.
Your task is, basically, to implement the resolveEntity method.
Normally it is practical to extend an existing CatalogResolver.
Then you can override the getResolvedEntity method.
Typically you first call the super method to resolve systemId/publicId.
Then you try to do your custom resolution.
systemId is normally the resource location URL (or logical URI).
publicId is often the namespace URI.
Here's a simple code snippet from another resolver which resolves classpath:com/acme/foo/schema.xsd in the classpath:
public static final String URI_SCHEME_CLASSPATH = "classpath";
@Override
public String getResolvedEntity(String publicId, String systemId) {
final String result = super.getResolvedEntity(publicId, systemId);
if (result == null) {
return null;
}
try {
final URI uri = new URI(result);
if (URI_SCHEME_CLASSPATH.equals(uri.getScheme())) {
final String schemeSpecificPart = uri.getSchemeSpecificPart();
final URL resource = Thread.currentThread()
.getContextClassLoader()
.getResource(schemeSpecificPart);
if (resource == null) {
return null;
} else {
return resource.toString();
}
} else {
return result;
}
} catch (URISyntaxException urisex) {
return result;
}
}
In your scenario, I'd do the following:
Define a URI scheme like database:schema:table:value:id:schema.xsd.
Write a catalog resolver which is capable of resolving such URIs.
Define a catalog file which rewrites namespace URIs or absolute schema location URLs to your database:... URIs.
In simple notation this would be something like:
REWRITE_SYSTEM "http://example.com/schemas" "database:schemas:content:schema_id:example/schemas"
So the "base" catalog resolver would first resolve http://example.com/schemas/schema.xsd into database:schemas:content:schema_id:example/schemas/schema.xsd.
Then your code resolves database:schemas:content:schema_id:example/schemas/schema.xsd into a database resource.
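The second step, turning a database:... URI into something a query can use, could be sketched as below. This is only an illustration under assumptions: the class DatabaseSchemaRef is hypothetical, and reading the four segments of the example URI as table, content column, key column, and key value is one interpretation of the example above, not something the answer defines.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class DatabaseSchemaRef {

    static final String SCHEME = "database";

    final String table;         // assumed: table holding the schemas
    final String contentColumn; // assumed: blob column with the .xsd content
    final String keyColumn;     // assumed: column used to look the schema up
    final String key;           // assumed: lookup value, e.g. a path-like id

    private DatabaseSchemaRef(String table, String contentColumn, String keyColumn, String key) {
        this.table = table;
        this.contentColumn = contentColumn;
        this.keyColumn = keyColumn;
        this.key = key;
    }

    // Returns null for URIs of any other scheme so the caller can fall through
    // to normal resolution; throws if a database: URI is malformed.
    static DatabaseSchemaRef parse(String resolved) throws URISyntaxException {
        URI uri = new URI(resolved);
        if (!SCHEME.equals(uri.getScheme())) {
            return null;
        }
        // Limit of 4 keeps any ':' inside the key value intact.
        String[] parts = uri.getSchemeSpecificPart().split(":", 4);
        if (parts.length != 4) {
            throw new URISyntaxException(resolved,
                "expected database:<table>:<contentColumn>:<keyColumn>:<key>");
        }
        return new DatabaseSchemaRef(parts[0], parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) throws URISyntaxException {
        DatabaseSchemaRef ref =
            parse("database:schemas:content:schema_id:example/schemas/schema.xsd");
        System.out.println(ref.table + " / " + ref.contentColumn + " / "
            + ref.keyColumn + " / " + ref.key);
    }
}
```

The actual lookup would then pass the key value as a PreparedStatement parameter. Table and column names cannot be bound as parameters, so they should be checked against a fixed list before being used in SQL.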
OK, I found a solution. As I thought, the method XMLInputSource resolveEntity(XMLResourceIdentifier resourceIdentifier) should return an XMLInputSource whose byte stream is set to my own InputStream containing the specified XSD schema.
My version of the overridden method:
public XMLInputSource resolveEntity(XMLResourceIdentifier resourceIdentifier)
throws XNIException, IOException {
String resolvedId = resolveIdentifier(resourceIdentifier);
if (resolvedId != null) {
XMLInputSource xmlis = new XMLInputSource(resourceIdentifier.getPublicId(),
resolvedId,
resourceIdentifier.getBaseSystemId());
try {
InputStream is = getXSDFromDb(resourceIdentifier.getLiteralSystemId());
xmlis.setByteStream(is);
} catch (SQLException ex) {
ex.printStackTrace();
}
return xmlis;
}
return null;
}

How do I validate an XML file against an XSD through https URL?

I would like to validate an XML file using a schema located at a secure https site. How do I tell the validator to accept a self-signed certificate or use an https URL? I have a file called test.xml and a schema located at https://localhost:1234/module/testschema.xsd. I'm using the same code found here. If I use a regular URL (http://localhost/module/testschema.xsd), it works great. If I substitute an https URL, then I get this error:
schema_reference.4: Failed to read schema document 'https://localhost:1234/module/testschema.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.
Copied Code:
public boolean validateFile(String xml, String strSchemaLocation)
{
Source xmlFile = null;
try {
URL schemaFile = new URL(strSchemaLocation);
xmlFile = new StreamSource(new File(xml));
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlFile);
System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
System.out.println(xmlFile.getSystemId() + " is NOT valid");
System.out.println("Reason: " + e.getLocalizedMessage());
return false;
} catch (IOException ioe) {
System.out.println("IOException");
return false;
}
return true;
}
This has very little to do with schema validation. Your problem is that you need to establish an HTTPS connection and trust a self-signed certificate. See How can I use different certificates on specific connections? or google around for that.
I don't think you'll be able to use the SchemaFactory.newSchema factory method that takes a URL, so just use the one that takes a Source and pass it a StreamSource:
URL schemaFile = new URL(strSchemaLocation);
HttpsURLConnection schemaConn = (HttpsURLConnection)schemaFile.openConnection();
// Magic from the other answer to accept self-signed cert
InputStream is = schemaConn.getInputStream();
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new StreamSource(is));
(I'm leaving out the try..catch to close the input stream and the connection)
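The omitted stream handling could look roughly like the sketch below, which uses try-with-resources so the stream is closed whether or not schema compilation succeeds. The class name SchemaFromUrl is made up for illustration, and the certificate handling itself is deliberately not shown. It opens a generic URLConnection, so the same method works for file:, http: and https: URLs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaFromUrl {

    static Schema load(URL schemaUrl) throws IOException, SAXException {
        URLConnection conn = schemaUrl.openConnection();
        // For an https URL with a self-signed certificate, this is the point where
        // the connection (an HttpsURLConnection) would be given an SSLSocketFactory
        // that trusts the certificate. That setup is not part of this sketch.
        try (InputStream in = conn.getInputStream()) {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The system id lets relative xs:include / xs:import locations resolve
            // against the schema URL. Those follow-up fetches use the default URL
            // handling, not the connection opened above.
            return factory.newSchema(new StreamSource(in, schemaUrl.toExternalForm()));
        } // the stream is closed here, even if newSchema throws
    }

    public static void main(String[] args) throws Exception {
        Schema schema = load(new URL(args[0]));
        System.out.println("Compiled schema: " + schema);
    }
}
```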
It's not a validation problem; java.net.URL supports https, so there should be no difference. Just make sure that you can open https://localhost:1234/module/testschema.xsd with a browser.

Can Xerces support XMLCatalogResolver and <xs:include/> at the same time?

Xerces claims to allow XML Catalog support to be added to a reader like this:
XMLCatalogResolver resolver = new XMLCatalogResolver();
resolver.setPreferPublic(true);
resolver.setCatalogList(catalogs);
XMLReader reader = XMLReaderFactory.createXMLReader(
"org.apache.xerces.parsers.SAXParser");
reader.setProperty("http://apache.org/xml/properties/internal/entity-resolver",
resolver);
But as soon as I do this, any <xs:include/> tags in my schemas are no longer processed. It seems like the XMLCatalogResolver becomes the only go-to place for entity resolution once it's added, so includes can't work anymore. Eclipse, on the other hand, successfully validates using the same catalog, so it should be possible.
Is there a way around this, or are there any other Java based validators that support catalogs?
Thanks, Dominic.
I finally solved this by overriding the XMLCatalogResolver and logging the various calls made to the resolveEntity() method. I observed 3 types of call being made, only one of which made sense to be resolved using the XML catalog. So, I merely returned a FileInputStream directly for the other two call types.
Here is the code I used inside my custom XMLCatalogResolver class:
public XMLInputSource resolveEntity(XMLResourceIdentifier resourceIdentifier)
throws IOException
{
if(resourceIdentifier.getExpandedSystemId() != null)
{
return new XMLInputSource(resourceIdentifier.getPublicId(),
resourceIdentifier.getLiteralSystemId(),
resourceIdentifier.getBaseSystemId(),
new FileReader(getFile(resourceIdentifier.getExpandedSystemId())),
"UTF-8");
}
else if((resourceIdentifier.getBaseSystemId() != null) &&
(resourceIdentifier.getNamespace() == null))
{
return new XMLInputSource(resourceIdentifier.getPublicId(),
resourceIdentifier.getLiteralSystemId(),
resourceIdentifier.getBaseSystemId(),
new FileReader(getFile(resourceIdentifier.getBaseSystemId())),
"UTF-8");
}
else
{
return super.resolveEntity(resourceIdentifier);
}
}
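private File getFile(String urlString) throws IOException
{
try {
// MalformedURLException is an IOException, so it propagates as-is.
return new File(new URL(urlString).toURI());
} catch (URISyntaxException e) {
// URL.toURI() throws a checked exception the original signature did not declare.
throw new IOException("Not a valid file URI: " + urlString, e);
}
}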
I'm not sure why this wouldn't be done by default within Xerces, but hopefully this helps the next person that encounters this problem.
