Solve security issue parsing xml using SAX parser - java

I have an Android app in which the user can enter the URL of any XML source. The app then parses the XML (if valid) and displays the results.
The issue is that if the user enters the URL of an untrusted XML source, the app and/or the device might be affected.
What are the best ways to identify the risk and prevent an exploit?
From my research, I found that enabling FEATURE_SECURE_PROCESSING and disabling entity expansion might help. Can anyone tell me what these are and how to achieve them?
Thanks.

After researching, I found this. I hope it will solve my problem.
To enable FEATURE_SECURE_PROCESSING:
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
To disable DTDs:
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
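
Putting the two settings together, a minimal sketch might look like the following. The class and method names (HardenedSax, newFactory, parses) are made up for illustration. The disallow-doctype-decl feature is Xerces-specific, so this assumes a Xerces-derived parser such as the one bundled with the desktop JDK; a parser that does not recognize the feature (which may include Android's default one) would throw from setFeature, so check on your target platform.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class HardenedSax {
    // Hypothetical helper: a factory with secure processing on and DOCTYPEs rejected.
    static SAXParserFactory newFactory() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-specific feature name; other parsers may throw SAXNotRecognizedException here.
        spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return spf;
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean parses(String xml) throws Exception {
        try {
            newFactory().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, including a forbidden DOCTYPE.
            return false;
        }
    }
}
```

With this configuration, a document that carries any DOCTYPE declaration is rejected before its entities can be expanded, while plain documents parse as usual.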

For SAX and DOM parsers, disallowing DTDs should be sufficient, as dcanh121 noted:
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
For StAX parser:
factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
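
As a complementary sketch for StAX, the standard SUPPORT_DTD and IS_SUPPORTING_EXTERNAL_ENTITIES properties can be switched off as well. This is a different pair of properties from the one in the answer above, shown here as an assumption about what a stricter setup could look like; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class HardenedStax {
    // Hypothetical helper: a StAX factory with DTD and external entity support disabled.
    static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory;
    }

    // Counts start tags, just to show the hardened factory in use.
    static int countStartElements(String xml) throws XMLStreamException {
        XMLStreamReader reader = newFactory().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```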

Related

XSL - Exclude access to ACCESS_EXTERNAL_STYLESHEET

I'm looking for information and a solution regarding an issue with my use of SAXParser to perform an XSL transformation.
To improve the quality of our project, the SonarQube sensitivity has been raised, and a new error then appeared for my implementation.
SonarQube is asking me to set certain properties to an empty value in order to rule out attacks based on those values.
The problem: I can set ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA to empty without issue, but ACCESS_EXTERNAL_STYLESHEET does not seem to be a valid property for SAXParser. Without it set, SonarQube does not clear the blocker error, since it appears to be mandatory for XSL transformation.
SAXParser saxParser = saxParserFactory.newSAXParser();
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, ""); // Works
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, ""); // Works
saxParser.setProperty(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, ""); // Doesn't work: throws org.xml.sax.SAXNotRecognizedException
What should I do?
I'm using Saxon-HE:9.8.0-8.
Thank you in advance.
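
One possible direction, offered as a sketch rather than a confirmed fix: in JAXP, ACCESS_EXTERNAL_STYLESHEET is defined for the transformer side (TransformerFactory), not for SAX parsers, which would explain the SAXNotRecognizedException. The example below sets it as a factory attribute. It assumes the JDK's built-in TransformerFactory; whether Saxon-HE 9.8 accepts the same attribute is not something this sketch verifies, and an implementation that does not recognize it throws IllegalArgumentException. The class name is hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;

class RestrictedTransformer {
    // Hypothetical helper: a TransformerFactory that refuses external DTDs and stylesheets.
    static TransformerFactory newFactory() throws Exception {
        // Returns whichever implementation is configured (the JDK's own unless another is on the classpath).
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // An empty string means "no protocols allowed".
        // Throws IllegalArgumentException if the implementation does not know the attribute.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return factory;
    }
}
```

The SAXParser properties for ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA can stay as they are; only the stylesheet restriction moves to the factory that creates the Transformer.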

jsoup to w3c-document: INVALID_CHARACTER_ERR

My use case: fetch HTML pages with jsoup and return a W3C DOM for further processing by XML transformations:
...
org.jsoup.nodes.Document document = connection.get();
org.w3c.dom.Document dom = new W3CDom().fromJsoup(document);
...
This works well for most documents, but for some it throws INVALID_CHARACTER_ERR without saying where.
It seems extremely difficult to find the error. I changed the code to first read the URL into a String and then check for bad characters with a regexp, but that does not help with bad attributes (e.g. ones without a value) and the like.
My current solution is to minimize the risk by removing elements by tag from the jsoup document (head, img, script ...).
Is there a more elegant solution?
Try setting the output syntax to XML for your document:
document.outputSettings().syntax(OutputSettings.Syntax.xml);
document.outputSettings().charset("UTF-8");
This should ensure that the resulting XML is valid.
Solution found by OP in reply to nyname00:
Thank you very much; this solved the problem:
Whitelist whiteList = Whitelist.relaxed();
Cleaner cleaner = new Cleaner(whiteList);
jsoupDom = cleaner.clean(jsoupDom);
"relaxed" indeed means a relaxed developer...

JAXB generates unformatted XML using XMLEventWriter

I'm using JAXB in order to generate XML files, and due to a business need I'm currently writing it to the middle of some other XML file using XMLEventWriter:
marshaller.marshal(jaxbElement, xmlEventWriter);
And currently setting some properties like:
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(Marshaller.JAXB_ENCODING, "utf-8");
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
But despite JAXB_FORMATTED_OUTPUT being set to true, my XML is not being formatted.
Does anyone know what the problem might be?
This only happens when I use the XMLEventWriter...
Thanks in advance.
When you are using an XMLEventWriter as a sink, the JAXB marshaller is only in charge of sending the appropriate XML events to it and the XMLEventWriter may still choose to write out unformatted XML. My advice is to check the configuration of your XMLEventWriter in addition to Marshaller.
Unfortunately, the default XMLEventWriter implementation does not indent. The stax-utils library provides an IndentingXMLEventWriter, which might help in these cases.
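
If adding stax-utils is not an option, one alternative (a different technique from the two answers above, shown only as a sketch) is to leave the event writer unformatted and re-indent the finished document afterwards with an identity Transformer from the JDK. The class and method names are hypothetical, and the exact indentation produced depends on the Transformer implementation and JDK version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PrettyPrint {
    // Hypothetical helper: re-serializes XML text with indentation enabled.
    static String indent(String xml) throws Exception {
        // newTransformer() with no stylesheet gives an identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For large outputs, the same transform can read from and write to files or streams instead of strings.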

why can't I load this URL with JDOM? Browser spoofing?

I'm writing some code to load and parse HTML docs from the web.
I'm using JDOM like so:
SAXBuilder parser = new SAXBuilder();
Document document = (Document)parser.build("http://www.google.com");
Element rootNode = document.getRootElement();
/* and so on ...*/
It works fine like that. However, when I change the URL to some other website, "http://www.kijiji.com" for example, the parser.build(...) line hangs.
Any idea why it hangs? I'm wondering if it might be because kijiji knows I'm not a "real" web browser. Perhaps I have to spoof my HTTP request so it looks like it's coming from IE or something like that?
Any ideas are useful, thanks!
Rob
I think a few things may be going on here. The first issue is that you cannot parse regular HTML with JDOM; HTML is not XML.
Secondly, when I run kijiji.com through JDOM I get an immediate HTTP 400 response.
When I parse google.com I get an immediate XML error about well-formedness.
If you happen to be parsing XHTML at some point, though, you will likely run into this problem: http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
XHTML has a doctype that references other DTDs, and so on. These each take 30 seconds to load from w3c.org....
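
To avoid the slow DTD downloads described above, one common approach is to install an EntityResolver that answers every external entity request with an empty document, so nothing is fetched over the network. The sketch below uses the plain JDK SAX API; JDOM's SAXBuilder also has a setEntityResolver method that should accept the same resolver. The class and method names are made up for illustration. Note that with the DTD suppressed, named entities such as &nbsp; are no longer defined and will cause a parse error if the document uses them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NoDtdFetch {
    // Hypothetical helper: parses XML without fetching external DTDs and returns the root element name.
    static String rootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Resolve every external entity (including the DTD) to empty content.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName; // remember only the first element seen
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```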

Validating a HUGE XML file

I'm trying to find a way to validate a large XML file against an XSD. I saw the question ...best way to validate an XML... but the answers all pointed to using the Xerces library for validation. The only problem is that when I use that library to validate a 180 MB file, I get an OutOfMemoryError.
Are there any other tools, libraries, or strategies for validating a larger-than-normal XML file?
EDIT: The SAX solution worked for Java validation, but the other two suggestions for the libxml tool were very helpful as well for validation outside of Java.
Instead of using a DOMParser, use a SAXParser. This reads from an input stream or reader so you can keep the XML on disk instead of loading it all into memory.
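SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true); // turns on DTD validation
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
// SimpleErrorHandler is not a JDK class; supply your own org.xml.sax.ErrorHandler
reader.setErrorHandler(new SimpleErrorHandler());
// Pass a byte stream so the parser picks up the encoding from the XML declaration
reader.parse(new InputSource(new FileInputStream("document.xml")));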
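
Since the question is about an XSD, note that setValidating(true) only enables DTD validation. For XML Schema, the factory can be given a compiled Schema via setSchema, and the document is still validated as a stream. The following is a minimal sketch under that assumption; the class name and the choice to collect errors into a list are illustrative, not part of the original answer.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class StreamingXsdCheck {
    // Hypothetical helper: validates xml against xsd while streaming; returns the validation errors found.
    static List<String> validate(Reader xsd, Reader xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(xsd));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for schema validation
        factory.setSchema(schema);       // XSD validation; setValidating(true) is not needed
        List<String> problems = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(xml), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Record recoverable validation errors and keep parsing.
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        return problems;
    }
}
```

For a file on disk, passing a FileReader or, better, an InputSource built from a FileInputStream keeps memory use roughly independent of file size, because no tree of the document is built.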
Use libxml, which performs validation and has a streaming mode.
Personally I like to use XMLStarlet, which has a command-line interface and works on streams. It is a set of tools built on libxml2.
SAX and libxml will help, as already mentioned. You could also try increasing the maximum heap size for the JVM using the -Xmx option. E.g. to set the maximum heap size to 512 MB: java -Xmx512m com.foo.MyClass
