java.net.ConnectException when validating XML against an XSD on a local machine (Java)

I need to validate an XML file against a local XSD, and I do not have an Internet connection on the target machine (on which this process runs). The code looks like this:
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
File schemaLocation = new File(xsd);
Schema schema = factory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
Source source = new StreamSource(new BufferedInputStream(new FileInputStream(new File(xml))));
validator.validate(source);
I always get a java.net.ConnectException when validate() is called.
Can you please let me know what I am doing wrong?
Many Thanks.
Abhishek

Agreed with Mads' comment - there are likely many references here that will attempt outgoing connections to the Internet, and you will need to download local copies for them. However, I'd advise against changing references within the XML or schema files, etc. - but instead, provide an EntityResolver to return the contents of your local copies instead of connecting out to the Internet. (I previously wrote a little bit about this at http://blogger.ziesemer.com/2009/01/xml-and-xslt-tips-and-tricks-for-java.html#InputValidation.)
However, in your case, since you're using a Validator, call Validator.setResourceResolver(...) instead and pass in an LSResourceResolver before calling validate.
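A resolver of this kind might look like the sketch below. It is only an illustration under stated assumptions: the class name LocalResourceResolver, the helper validateOffline, and the idea of looking up local copies by the last segment of the requested system ID are all hypothetical choices, not something taken from the question. It relies only on the standard JAXP and DOM Load/Save APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

/** Hypothetical resolver that serves external references from a local directory. */
class LocalResourceResolver implements LSResourceResolver {
    private final Path localDir;
    private final DOMImplementationLS domLs;

    LocalResourceResolver(Path localDir) throws ParserConfigurationException {
        this.localDir = localDir;
        // The DOM Load/Save feature is used only as a factory for LSInput objects.
        this.domLs = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation().getFeature("LS", "3.0");
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        if (systemId == null) {
            return null; // nothing to map; let the parser use its default behaviour
        }
        // Assumption: the local copy is named after the last segment of the system ID.
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        Path local = localDir.resolve(fileName);
        if (!Files.isRegularFile(local)) {
            return null; // no local copy; the parser falls back to its default lookup
        }
        try {
            LSInput input = domLs.createLSInput();
            input.setPublicId(publicId);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            // The byte stream takes precedence over the system ID, so no connection is opened.
            input.setByteStream(Files.newInputStream(local));
            return input;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Validates xml against xsd, resolving external references from localDir. */
    static void validateOffline(Path xsd, Path xml, Path localDir)
            throws ParserConfigurationException, SAXException, IOException {
        LSResourceResolver resolver = new LocalResourceResolver(localDir);
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver(resolver);   // used while the schema is compiled
        Schema schema = factory.newSchema(xsd.toFile());
        Validator validator = schema.newValidator();
        validator.setResourceResolver(resolver); // used while the instance is validated
        validator.validate(new StreamSource(xml.toFile()));
    }
}
```

Returning null from resolveResource leaves the lookup to the parser, so a stricter variant could throw instead when no local copy exists, to guarantee that nothing ever goes out to the network.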

Related

javax.xml.validation.Schema not reading server file in JBoss EAP 5.1 correctly

I have the following Java code running on a RESTEasy web service, to get a schema file to validate xml against (note: The project folder is called "MyWebService"):
String classDir = this.getClass().getProtectionDomain().getCodeSource().getLocation().getPath();
String myWebServiceDir = classDir.substring(0, classDir.lastIndexOf("MyWebService"));
String instanceXSDPath = myWebServiceDir + SCHEMAS_FOLDER + xsdFile;
System.out.println("Streamsource file location: " + instanceXSDPath);
File file = new File(instanceXSDPath);
//StreamSource streamSource = new StreamSource(file);
// Note: Using StreamSource or File seems to make no difference to the issue
Schema schema = factory.newSchema(file);
When I run the above code on my local machine, everything works great. However, when I run the above code on my development server, I get the following error:
2013-11-20 12:47:48,275 INFO [STDOUT] (ajp-127.0.0.1-8009-4)
Streamsource file location:
file:/Y:/jboss/jboss-as/server/default/deploy/DEVELOPER_DEPLOY/Schemas/Extensions/1/instance.xsd
2013-11-20 12:47:48,275 ERROR
[com.mywebservice.dao.validate]
(ajp-127.0.0.1-8009-4) Could not parse the given object for schema
validation: org.xml.sax.SAXParseException: schema_reference.4: Failed
to read schema document
'file:/Y:/jboss/jboss-as/bin/file:/Y:/jboss/jboss-as/server/default/deploy/DEVELOPER_DEPLOY/Schemas/Extensions/1/instance.xsd',
because 1) could not find the document; 2) the document could not be
read; 3) the root element of the document is not <xsd:schema>. at
org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown
Source)
I don't understand why on the server EAP logs the above code is trying to read in instance.xsd from a path that looks like this:
'file:/Y:/jboss/jboss-as/bin/file:/Y:/jboss/jboss-as/server/default/deploy/DEVELOPER_DEPLOY/Schemas/Extensions/1/instance.xsd'
... because on my localhost EAP logs it is reading instance.xsd from a location that looks like this:
'location:
/C:/Users/chen/workspace/.metadata/.plugins/org.jboss.ide.eclipse.as.core/JBoss_5.1_Runtime_Server1382499548190/deploy/Schemas/Extensions/1/instance.xsd'
Does anyone have any idea why? And why does the printed message say 'file' on the server but 'location' on my localhost, in their respective server logs? Perhaps that has something to do with the issue.
The answer was to use a StreamSource and set its system ID explicitly to the original path, like so:
StreamSource streamSource = new StreamSource(file);
System.out.println("Stream source system id: " + streamSource.getSystemId());
streamSource.setSystemId(xsltPath);
Schema schema = factory.newSchema(streamSource); // change for each artefact type
This worked: hardwiring the system ID to a single path prevented the two 'file:' locations from being mashed together. I have no idea why the server was mixing two 'file:' locations into the system ID, but who cares, problem solved!
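One plausible explanation for the doubled prefix, offered here as an assumption rather than something the thread confirms: on the server the computed path string already began with "file:", and java.io.File treats such a string as a relative path name, so it gets resolved against the working directory (here the JBoss bin folder). The sketch below shows one way to get a single, well-formed system ID in either case. The class and method names are hypothetical, and URI.create will throw an IllegalArgumentException if the string contains characters that are illegal in a URI, such as unescaped spaces.

```java
import java.io.File;
import java.net.URI;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper; the names are illustrative and not from the original post. */
class SchemaLocationDemo {

    /** Accepts either a plain file system path or a "file:" URI string. */
    static File toFile(String location) {
        // new File("file:/...") would be a RELATIVE path, so parse file: URIs explicitly.
        return location.startsWith("file:")
                ? new File(URI.create(location))
                : new File(location);
    }

    /** Builds a StreamSource whose system ID is exactly one file: URI. */
    static StreamSource toSource(String location) {
        File file = toFile(location);
        StreamSource source = new StreamSource(file);
        // StreamSource(File) already derives the system ID; setting it again makes the intent explicit.
        source.setSystemId(file.toURI().toString());
        return source;
    }
}
```

With this, factory.newSchema(SchemaLocationDemo.toSource(instanceXSDPath)) would receive the same system ID whether the computed path is a bare path or a file: URI.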

Validate and parse xml using woodstox with local dtd

I have seen multiple questions about parsing XML with Woodstox, unmarshalling with JAXB through an XMLStreamReader, and validating against schemas. Reading through them hasn't helped. What I need is to validate an incoming XML document against a local DTD and parse the entire contents into an object representation. The incoming XML can have a DOCTYPE that references a DTD. This needs to be skipped, and a local DTD needs to be used instead. The implementation should be very quick: validation and parsing are expected to take less than 1 ms. I managed to do the parsing alone in 5 ms using the code below. Adding validation by setting the schema on the unmarshaller (the commented-out lines) doesn't work.
xmlif = XMLInputFactory2.newInstance();
xmlif.setProperty(XMLInputFactory2.SUPPORT_DTD, false);
JAXBContext ucontext;
ucontext = JAXBContext.newInstance(XMLOuterElementClass.class);
unmarshaller = ucontext.createUnmarshaller();
/*SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.XML_DTD_NS_URI);
Schema schema = sf.newSchema(new File("c:/resources/schma.dtd"));
unmarshaller.setSchema(schema);*/
XMLStreamReader xsr = xmlif
        .createXMLStreamReader(new StringReader(xml));
//xsr = new StreamReaderDelegate(xsr);
long start = System.currentTimeMillis();
try {
    // Advance to the start tag of the outer element
    while (xsr.hasNext()) {
        // Compare strings with equals(), not ==
        if (xsr.isStartElement()
                && "XMLOuterElementClass".equals(xsr.getLocalName())) {
            break;
        }
        xsr.next();
    }
    JAXBElement<XMLOuterElementClass> jb = unmarshaller.unmarshal(xsr,
            XMLOuterElementClass.class);
    long end = System.currentTimeMillis();
    System.out.println("Total time taken in ms :" + (end - start));
} finally {
    xsr.close();
}
There are multiple ways to do it; and the best way to get an answer with more depth is to ask this on Woodstox user list (see http://xircles.codehaus.org/projects/woodstox/lists).
But one thing to note is that JAXB knows nothing about Stax2 (Woodstox/Aalto extension over basic Stax), so you need to access it via Stax2 API, not JAXB. So, to enable "external" validation, you need to call:
xmlStreamReader2.validateAgainst(schemaFromDTD);
and you can do this right after constructing the stream reader (you need to cast it to XMLStreamReader2, or at least to Validatable).
Note that you can validate when reading OR writing, both work similarly (in latter case you enable it via XMLStreamWriter).
Another possibility is to define XMLResolver property (see XMLInputFactory.RESOLVER).
It gets called when trying to read an external dtd, that is, when DOCTYPE contains reference to an external file. Custom XMLResolver can then redirect this read to use some other source.
Note that the first approach (one you started with) is likely more efficient as it only needs to read and parse Schema once, assuming you read it once and reuse afterwards.
Validation itself should be fast, and if parsing takes 4 milliseconds, should not take more than 1 millisecond; especially if you include JAXB processing in 4 milliseconds (that's technically data-binding, above lower level parsing).

Commons configuration library to add elements

I am using the apache commons configuration library to read a configuration xml and it works nicely. However, I am not able to modify the value of the elements or add new ones.
To read the xml I use the following code:
XMLConfiguration config = new XMLConfiguration(dnsXmlPath);
boolean enabled = config.getBoolean("enabled", true);
int size = config.getInt("size");
To write I am trying to use:
config.setProperty("newProperty", "valueNewProperty");
config.save();
If I call config.getString("newProperty"), I obtain "valueNewProperty", but the xml has not been changed.
Obviously it is not the right way or I am missing something, because it does not work.
Could anybody tell me how to do this?
Thanks in advance.
You're modifying the XML structure in memory. From the documentation:
The parsed document will be stored keeping its structure. The class also tries to preserve as much information from the loaded XML document as possible, including comments and processing instructions. These will be contained in documents created by the save() methods, too.
Like other file based configuration classes this class maintains the name and path to the loaded configuration file. These properties can be altered using several setter methods, but they are not modified by save() and load() methods. If XML documents contain relative paths to other documents (e.g. to a DTD), these references are resolved based on the path set for this configuration.
You need to use the XMLConfiguration.save(java.io.Writer) method.
For example, after you've done all your modifications save it:
config.save(new PrintWriter(new File(dnsXmlPath)));
EDIT
As mentioned in comment, calling config.load() before calling setProperty() method fixes the issue.
I solved it with the following lines. I was missing the config.load().
XMLConfiguration config = new XMLConfiguration(dnsXmlPath);
config.load();
config.setProperty("newProperty", "valueNewProperty");
config.save();
It is true, though, that you can use the following line instead of config.save() and it works the same.
config.save(new PrintWriter(new File(dnsXmlPath)));

XML to be validated against multiple xsd schemas

I'm writing the xsd and the code to validate, so I have great control here.
I would like to have an upload facility that adds stuff to my application based on an xml file. One part of the xml file should be validated against different schemas based on one of the values in the other part of it. Here's an example to illustrate:
<foo>
  <name>Harold</name>
  <bar>Alpha</bar>
  <baz>Mercury</baz>
  <!-- ... more general info that applies to all foos ... -->
  <bar-config>
    <!-- the content here is specific to the bar named "Alpha" -->
  </bar-config>
  <baz-config>
    <!-- the content here is specific to the baz named "Mercury" -->
  </baz-config>
</foo>
In this case, there is some controlled vocabulary for the content of <bar>, and I can handle that part just fine. Then, based on the bar value, the appropriate xml schema should be used to validate the content of bar-config. Similarly for baz and baz-config.
The code doing the parsing/validation is written in Java. Not sure how language-dependent the solution will be.
Ideally, the solution would permit the xml author to declare the appropriate schema locations and what-not so that s/he could get the xml validated on the fly in a sufficiently smart editor.
Also, the possible values for <bar> and <baz> are orthogonal, so I don't want to do this by extension for every possible bar/baz combo. What I mean is, if there are 24 possible bar values/schemas and 8 possible baz values/schemas, I want to be able to write 1 + 24 + 8 = 33 total schemas, instead of 1 * 24 * 8 = 192 total schemas.
Also, I'd prefer to NOT break out the bar-config and baz-config into separate xml files if possible. I realize that might make all the problems much easier, as each xml file would have a single schema, but I'm trying to see if there is a good single-xml-file solution.
I finally figured this out.
First of all, in the foo schema, the bar-config and baz-config elements have a type which includes an any element, like this:
<sequence>
<any minOccurs="0" maxOccurs="1"
processContents="lax" namespace="##any" />
</sequence>
In the xml, then, you must specify the proper namespace using the xmlns attribute on the child element of bar-config or baz-config, like this:
<bar-config>
<config xmlns="http://www.example.org/bar/Alpha">
... config xml here ...
</config>
</bar-config>
Then, your XML schema file for bar Alpha will have a target namespace of http://www.example.org/bar/Alpha and will define the root element config.
If your XML file has namespace declarations and schema locations for both of the schema files, this is sufficient for the editor to do all of the validating (at least good enough for Eclipse).
So far, we have satisfied the requirement that the xml author may write the xml in such a way that it is validated in the editor.
Now, we need the consumer to be able to validate. In my case, I'm using Java.
If by some chance, you know the schema files that you will need to use to validate ahead of time, then you simply create a single Schema object and validate as usual, like this:
Schema schema = factory().newSchema(new Source[] {
new StreamSource(stream("foo.xsd")),
new StreamSource(stream("Alpha.xsd")),
new StreamSource(stream("Mercury.xsd")),
});
In this case, however, we don't know which xsd files to use until we have parsed the main document. So, the general procedure is to:
Validate the xml using only the main (foo) schema
Determine the schema to use to validate the portion of the document
Find the node that is the root of the portion to validate using a separate schema
Import that node into a brand new document
Validate the brand new document using the other schema file
Caveat: it appears that the document must be built namespace-aware in order for this to work.
Here's some code (this was ripped from various places of my code, so there might be some errors introduced by the copy-and-paste):
// Contains the filename of the xml file
String filename;

// Load the xml data using a namespace-aware builder (the method
// 'stream' simply opens an input stream on a file)
Document document;
DocumentBuilderFactory docBuilderFactory =
        DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
document = docBuilderFactory.newDocumentBuilder().parse(stream(filename));

// Create the schema factory
SchemaFactory sFactory = SchemaFactory.newInstance(
        XMLConstants.W3C_XML_SCHEMA_NS_URI);

// Load the main schema
Schema schema = sFactory.newSchema(
        new StreamSource(stream("foo.xsd")));

// Validate using main schema
schema.newValidator().validate(new DOMSource(document));

// Get the node that is the root for the portion you want to validate
// using another schema
Node node = getSpecialNode(document);

// Build a Document from that node
Document subDocument = docBuilderFactory.newDocumentBuilder().newDocument();
subDocument.appendChild(subDocument.importNode(node, true));

// Determine the schema to use using your own logic
Schema subSchema = parseAndDetermineSchema(document);

// Validate using other schema
subSchema.newValidator().validate(new DOMSource(subDocument));
Take a look at NVDL (Namespace-based Validation Dispatching Language) - http://www.nvdl.org/
It is designed to do what you want to do (validate parts of an XML document that have their own namespaces and schemas).
There is a tutorial here - http://www.dpawson.co.uk/nvdl/ - and a Java implementation here - http://jnvdl.sourceforge.net/
Hope that helps!
Kevin
You need to define a target namespace for each separately validated portion of the instance document. Then you define a master schema that references the schema documents for these components. Use <xsd:import> for this, since <xsd:include> only works for schema documents that have the same target namespace as the including schema, or none at all.
The limitation with this approach is that you can't let the individual components define the schemas that should be used to validate them. But it's a bad idea in general to let a document tell you how to validate it (i.e., validation should be something that your application controls).
You can also use a "resource resolver" to allow "xml authors" to specify their own schema file, at least to some extent; for example, see https://stackoverflow.com/a/41225329/32453. At the end of the day, you want a fully compliant xml file that can be validated with normal tools, anyway :)

XML Validation: Am I Doing It Right?

I was just wondering if someone could give my XML validation code a once over to see if I'm doing it right. Here's the portion of code that is giving me the trouble...
SAXParserFactory factory = SAXParserFactory.newInstance();
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// *** CODE FAILS ON THE BELOW LINE **/
factory.setSchema(schemaFactory
.newSchema(new Source[] { new StreamSource(schemaStream) }));
SAXParser parser = factory.newSAXParser();
SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(false);
reader.setErrorHandler(new ResultProducingErrorHandler());
reader.read(content);
Whenever I run the above code, I get an error along the lines of:
src-resolve: Cannot resolve the name 'ns:myStructure' to a(n) 'type definition' component.
The elements mentioned in the error messages are all ones that are imported into the schema via calls to <xs:import />. The schema seems to validate OK via the W3C XML Schema Validator.
Do I have to include each of these schemas individually, or is Java smart enough to go off and fetch these extra schemas too? I tried adding them to the array passed to the newSchema call, but that didn't make any difference.
I don't think I can give out the link to the schema, so I'm really just looking for a yes or no regarding if my code looks at least acceptable.
Ensure that the xs:import statements point to paths that are reachable from the current directory of your application. The current directory may not be what you think it is.
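A related point: a StreamSource built from a bare InputStream carries no system ID, so relative schemaLocation values in xs:import have no base to resolve against and end up being looked up relative to the working directory. A minimal sketch of one way around this, assuming the main schema and the schemas it imports sit side by side on disk (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical loader; assumes imported schemas are reachable relative to mainXsd. */
class ImportAwareSchemaLoader {

    static Schema load(Path mainXsd) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try (InputStream in = Files.newInputStream(mainXsd)) {
            // The second argument is the system ID: it gives relative
            // schemaLocation values in xs:import a base URI to resolve against.
            StreamSource source = new StreamSource(in, mainXsd.toUri().toString());
            return factory.newSchema(source);
        }
    }
}
```

In the question's code, the returned Schema could be passed to factory.setSchema(...) in place of the one built from schemaStream. Passing a File (or a StreamSource created from a File) to newSchema should have the same effect, since the system ID is then derived from the file path.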
