Get schema location from XML file (noNamespaceSchemaLocation) - java

We are parsing an XML file with the SAX parser. Is it possible to get the schema location from the XML?
<view id="..." title="..."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="{schema}">
I want to retrieve the {schema} value from the XML. Is this possible, and how do I access the value of noNamespaceSchemaLocation? I'm using the default SAX parser.
@Override
public void startElement(String uri, String localName,
String name, Attributes attributes)
{ .... }
Thank you.

It all depends on which tool/library you are working with (a basic SAXParser? Xerces? JDom? ...). What you want is the value of the attribute "noNamespaceSchemaLocation" in the namespace identified by the URI "http://www.w3.org/2001/XMLSchema-instance".
In JDom, it would be something like:
Element view = ...; // get the view element
String value = view.getAttributeValue("noNamespaceSchemaLocation", Namespace.getNamespace("http://www.w3.org/2001/XMLSchema-instance"));
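For the plain SAX parser from the question, the same lookup can be done inside startElement. The following is only a minimal sketch: the class name SchemaLocationSax and the method findSchemaLocation are made up for illustration, and it assumes the factory is switched to namespace-aware mode so that Attributes.getValue(uri, localName) can match the attribute regardless of the prefix used in the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class SchemaLocationSax {

    /** Returns xsi:noNamespaceSchemaLocation of the root element, or null if it is absent. */
    static String findSchemaLocation(byte[] xml) {
        final String[] result = new String[1];
        final boolean[] rootSeen = {false};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Without this, attributes cannot be looked up by namespace URI.
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if (!rootSeen[0]) {
                        // Only the first (root) element is inspected.
                        rootSeen[0] = true;
                        result[0] = attributes.getValue(
                                XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI,
                                "noNamespaceSchemaLocation");
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xml = "<view id=\"v1\" title=\"t\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:noNamespaceSchemaLocation=\"view.xsd\"/>";
        System.out.println(findSchemaLocation(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Note that SAX keeps parsing the rest of the document after the root element, so for very large files this does more work than strictly needed.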

Here is how I get the XSD's name using XMLStreamReader:
public static String extractXsdValueOrNull(@NonNull final InputStream xmlInput)
{
final XMLInputFactory f = XMLInputFactory.newInstance();
try
{
final XMLStreamReader r = f.createXMLStreamReader(xmlInput);
while (r.hasNext())
{
final int eventType = r.next();
if (XMLStreamReader.START_ELEMENT == eventType)
{
for (int i = 0; i < r.getAttributeCount(); i++) // valid indices are 0 .. count-1
{
final boolean foundSchemaNameSpace = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI.equals(r.getAttributeNamespace(i));
// SCHEMA_LOCATION is a String constant defined elsewhere in the class (not shown here)
final boolean foundLocationAttributeName = SCHEMA_LOCATION.equals(r.getAttributeLocalName(i));
if (foundSchemaNameSpace && foundLocationAttributeName)
{
return r.getAttributeValue(i);
}
}
return null; // only checked the first element
}
}
return null;
}
catch (final XMLStreamException e)
{
throw new RuntimeException(e);
}
}
Actually XMLStreamReader does all the magic, namely:
only parses the XML's beginning (not the whole XML)
does not assume a particular namespace alias (i.e. xsi)

Related

How to improve performance of querying xml file with VTD-XML and XPath?

I am querying XML files of around 1 MB (20k+ lines). I use XPath to describe what I want and the VTD-XML library to retrieve it, and I think I have a performance problem.
I make 5k+ queries against one XML file, and it takes approximately 16-17 seconds to retrieve all values. Is this normal performance for such a task? How can I improve it?
I am using VTD-XML with the AutoPilot navigation approach, which lets me use XPath. The implementation is as follows:
private VTDGen vg = new VTDGen();
private VTDNav vn;
private AutoPilot ap = new AutoPilot();
public void init(String xml) {
log.info("Creating document");
xml = xml.replace("<?xml version=\"1.0\"?>", "<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
vg.setDoc(bytes);
try {
vg.parse(true);
vn = vg.getNav();
} catch (ParseException e) {
e.printStackTrace();
}
log.info("Document created");
}
public String parseXmlOrReturnNull(String query) {
String xPathStringVal = null;
try {
ap.selectXPath(query);
ap.bind(vn);
int i = -1;
while ((i = ap.evalXPath()) != -1) {
xPathStringVal = vn.getXPathStringVal();
}
}catch (XPathEvalException e) {
e.printStackTrace();
} catch (NavException e) {
e.printStackTrace();
} catch (XPathParseException e) {
e.printStackTrace();
}
return xPathStringVal;
}
My XML files have a specific format: they are divided into many parts (segments), and my queries are the same for all segments (I run them in a loop). For example, part of the XML:
<segment>
<a>
<b>value1</b>
<c>
<d>value2</d>
<e>value3</e>
</c>
</a>
</segment>
<segment>
<a>
<b>value4</b>
<c>
<d>value5</d>
<e>value6</e>
<f>value6</f>
</c>
</a>
</segment>
...
If I want to get value1 in first segment I am using query:
//segment[1]/a/b
for value 4 in second segment
//segment[2]/a/b
etc.
My intuition is this: in my approach every query is independent (it knows nothing about the other queries), which means that AutoPilot, my iterator, always starts at the beginning of the file when I query it.
My question is: is there any way to position AutoPilot at the beginning of the segment being processed and, when I finish querying it, move AutoPilot to the next segment? I think that if my method started searching from a specified point rather than from the beginning, it would be much faster.
Another way is to divide the XML file into small XML files (one file = one segment) and query those small files.
What do you think guys? Thanks in advance
Minor: The replace is not needed as UTF-8 is the default encoding; only when there is an encoding, one would need to patch it to UTF-8.
The XPath should be evaluated only once, rather than restarting from the beginning of the document for every segment index.
If you need a List representation you could use JAXB with annotations.
An event-based parse that does not build DOM objects is probably best (SAXParser).
DefaultHandler handler = new DefaultHandler() {
@Override
public void startElement(String uri,
String localName, String qName, Attributes attributes) throws SAXException {
// called for every opening tag, e.g. <segment>
}
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
// called for every closing tag
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
// called with the text between tags (possibly in several chunks)
}
};
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
InputStream in = new ByteArrayInputStream(bytes);
parser.parse(in, handler);
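To make the single-pass idea concrete, here is a rough sketch of a handler that collects the leaf values of every segment in one parse, so that no per-segment query has to rescan the file. It swaps in plain SAX for VTD-XML. The class name SegmentCollector, the "a/b" style keys and the assumption that the segments sit under a single root element are all mine, not from the question.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one map per <segment>, keyed by the path below the segment.
final class SegmentCollector extends DefaultHandler {
    private final List<Map<String, String>> segments = new ArrayList<>();
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // null while outside a <segment>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("segment".equals(qName)) {
            current = new LinkedHashMap<>();
            path.clear();
        } else if (current != null) {
            path.addLast(qName);
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so it is accumulated.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("segment".equals(qName)) {
            segments.add(current);
            current = null;
        } else if (current != null) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                current.put(String.join("/", path), value);
            }
            path.removeLast();
        }
        text.setLength(0);
    }

    static List<Map<String, String>> parse(String xml) {
        try {
            SegmentCollector handler = new SegmentCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.segments;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With this, the value that //segment[2]/a/b would select is available as SegmentCollector.parse(xml).get(1).get("a/b"), and all 5k+ lookups become map reads after a single parse.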

How to get Embeded/nested XML from a big XML file using SAX parser

We are performing some operations on embedded/nested XML. I am using SAXParser to parse the entire XML file. I want to get the entire nested XML, with tags and values. For example, my XML looks like the one below.
I want the entire XML within the <ANY-ELEMENT>...</ANY-ELEMENT> tag.
<?xml version="1.0" encoding="UTF-8"?>
<x:xMessage xmlns:x="http://www.connecture.com/integration/x" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.connecture.com/integration/x xMessageWrapper.xsd
">
<x:xMessageHeader>
<Version>850</Version>
<Source>Source</Source>
<Target>target</Target>
<Timestamp>2013-12-31T12:00:00</Timestamp>
<RequestID>123456</RequestID>
<ResponseID>54321</ResponseID>
<Priority>3</Priority>
<Username>Deepak</Username>
<Password>Kumar</Password>
</x:xMessageHeader>
<x:xMessageBody>
<ANY-ELEMENT>
<xEnveloped_834A1 xsi:schemaLocation="....." xmlns="......."
..........................
..........................
some Complex XML
..........................
..........................
..........................
</ANY-ELEMENT>
</x:xMessageBody>
</x:xMessage>
Handler class Sample code:
public class MessageWrapperHandler extends DefaultHandler {
private boolean bActualMessage = false;
private String actualMessage = null;
private long lengthActualMessage=0;
public void startElement(String uri, String localName, String qName, Attributes attributes) {
if (qName.equalsIgnoreCase("ANY-ELEMENT")) {
bActualMessage = true;
//lengthActualMessage=How to know the length of Child XML
}
}
public void characters(char ch[], int start, int length) {
if (bActualMessage) {
actualMessage = new String(ch, start, length);
//trying to get embedded XML
bActualMessage = false;
}
}
}
But since what follows <ANY-ELEMENT> is XML content rather than text, this gives me nothing. How can I achieve this?
EDIT: You are free to modify the XML after <ANY-ELEMENT>, for example by wrapping the contents in CDATA.
Instead of SAX, I would recommend using StAX (a StAX implementation is included in the JDK/JRE since Java SE 6). StAX is similar to SAX except instead of having the events pushed to you, you pull (request) them.
In the code below the XMLStreamReader is advanced to the ANY-ELEMENT element. Once it is at the correct position you can interact with it as you wish.
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
StreamSource xmlSource = new StreamSource("src/forum19559825/input.xml");
XMLStreamReader xsr = xif.createXMLStreamReader(xmlSource);
Demo demo = new Demo();
demo.positionXMLStreamReaderAtAnyElement(xsr);
demo.processAnyElement(xsr);
}
private void positionXMLStreamReaderAtAnyElement(XMLStreamReader xsr) throws Exception {
while(xsr.hasNext()) {
if(xsr.getEventType() == XMLStreamReader.START_ELEMENT && "ANY-ELEMENT".equals(xsr.getLocalName())) {
break;
}
xsr.next();
}
}
private void processAnyElement(XMLStreamReader xmlStreamReaderAtAnyElement) {
// TODO: Stuff
System.out.println("FOUND IT");
}
}
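One possible way to fill in the "TODO: Stuff" step, if the goal is the nested XML as a string, is to hand the positioned XMLStreamReader to an identity Transformer through StAXSource. This is only a sketch under a few assumptions: the class name NestedXmlExtractor and the method extractFirstChild are invented for illustration, the wrapper element contains exactly one child element, and the exact serialization (for example where namespace declarations end up) depends on the JAXP implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper, not part of any library.
final class NestedXmlExtractor {

    /** Serializes the first child element of the named wrapper element, or returns null. */
    static String extractFirstChild(String xml, String wrapperName) {
        try {
            XMLStreamReader xsr = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            boolean insideWrapper = false;
            while (xsr.hasNext()) {
                int event = xsr.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (insideWrapper) {
                        // The reader now sits on the nested root element;
                        // the transformer copies that subtree to the writer.
                        Transformer copier = TransformerFactory.newInstance().newTransformer();
                        copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                        StringWriter out = new StringWriter();
                        copier.transform(new StAXSource(xsr), new StreamResult(out));
                        return out.toString();
                    }
                    if (wrapperName.equals(xsr.getLocalName())) {
                        insideWrapper = true;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && insideWrapper) {
                    return null; // wrapper closed without any child element
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract nested XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><ANY-ELEMENT><inner a=\"1\"><leaf>text</leaf></inner>"
                + "</ANY-ELEMENT></root>";
        System.out.println(extractFirstChild(xml, "ANY-ELEMENT"));
    }
}
```

The design choice here is to let the transformer do the serialization instead of rebuilding tags by hand from events, which avoids escaping and namespace mistakes.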

xml parsing using SAXParser

I am working on an application that uses SAX parsing. To get the city and state name from a latitude and longitude I am using the Google API.
I want to get the long_name, short_name and type values of the address_component tag.
I get all the other information from this XML successfully, but there is a problem with the type tag: there are two type tags in each address_component and I always get the value of the second one.
Sample XML:
<address_component>
<long_name>Gujarat</long_name>
<short_name>Gujarat</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
How can I get both type values, administrative_area_level_1 as well as political?
I came across the following link, which is a really easy way to get started:
http://javarevisited.blogspot.com/2011/12/parse-read-xml-file-java-sax-parser.html
I put the data into a file named location.xml (if you get it from the web, use your own logic to fetch it, convert it into an InputStream and pass that to the following code). I wrote a method with which you can read it:
public void ReadAndWriteXMLFileUsingSAXParser(){
try
{
DefaultHandler handler = new MyHandler();
// parseXmlFile("infilename.xml", handler, true);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
InputStream rStream = null;
rStream = getClass().getResourceAsStream("location.xml");
saxParser.parse(rStream, handler);
}catch (Exception e)
{
System.out.println(e.getMessage());
}
}
This is the MyHandler class. Your final data is stored in a vector called "data".
class MyHandler extends DefaultHandler {
String rootname;Attributes atr;
private boolean flag=false;private Vector data;
public void startElement(String namespaceURI, String localName,
String qName, Attributes atts) {
rootname=localName;
atr=atts;
if(rootname.equalsIgnoreCase("address_component")){
data=new Vector();
flag=true;
}
}
public void characters(char[] ch, int start, int length){
String value=new String(ch,start,length);
if(flag)
{
if(rootname.equalsIgnoreCase("type")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
if(rootname.equalsIgnoreCase("long_name")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
if(rootname.equalsIgnoreCase("short_name")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
}
}
public void endElement(String uri, String localName, String qName){
rootname=localName;
if(rootname.equalsIgnoreCase("address_component")){
flag=false;
}
}
}
You can find all the data in the data vector, and it is also printed to the console as:
++++++++++++++Gujarat
++++++++++++++Gujarat
++++++++++++++administrative_area_level_1
++++++++++++++political
Read this tutorial. This will help you to parse xml file using sax parser.
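The underlying reason only the second type shows up is usually that each value overwrites a single field. A minimal sketch of the alternative, collecting every type into a list, could look like this. The class name AddressComponentHandler is made up, it handles a single address_component as in the sample, and it reads element names from qName, which is what the JDK's default (non namespace-aware) parser reports; on parsers that report names through localName instead, swap that in.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for one <address_component> block.
final class AddressComponentHandler extends DefaultHandler {
    final List<String> types = new ArrayList<>(); // every <type>, in document order
    String longName;
    String shortName;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "long_name":  longName = value;  break;
            case "short_name": shortName = value; break;
            case "type":       types.add(value);  break; // append, do not overwrite
            default: break;
        }
        text.setLength(0);
    }

    static AddressComponentHandler parse(String xml) {
        try {
            AddressComponentHandler handler = new AddressComponentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Reading the value in endElement rather than in characters also avoids picking up the whitespace between tags as data.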

How can we parse the DOCTYPE information using XMLEventReader?

I have some existing code which parses the top-level element namespace to determine what kind of XML file we're looking at.
XMLEventReader reader = createXMLEventReader(...);
try {
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
switch (event.getEventType()) {
case XMLStreamConstants.DTD:
// No particularly useful information here?
//((DTD) event).getDocumentTypeDeclaration();
break;
case XMLStreamConstants.START_ELEMENT:
formatInfo.qName = ((StartElement) event).getName();
return formatInfo;
default:
break;
}
}
} finally {
reader.close();
}
If I allow the parser to load DTDs from the web, getDocumentTypeDeclaration() returns a gigantic string with far more information than I know how to deal with, because it inlines all related DTDs into the string before handing it over. On the other hand, if I block the parser from loading DTDs from the web (which is preferable anyway, for obvious reasons), it only gives me the string "<!DOCTYPE".
Is there no way to get back the values inside the DOCTYPE?
I'm using the default parser which ships with the JRE, in case that matters.
I know it's an old post but I couldn't find an answer on the Web until I've found your question which pointed me in the right direction.
Here the external unparsed entities for a DTD are retrieved by switching on the value given by the XMLEvent#getEventType() method.
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setXMLResolver(new XMLResolver() {
@Override
public Object resolveEntity(String publicID, String systemID,
String baseURI, String namespace) throws XMLStreamException {
//return a closed input stream if external entities are not needed
return new InputStream() {
@Override
public int read() throws IOException {
return -1;
}
};
}
});
XMLEventReader reader = factory.createXMLEventReader( . . . );
try {
while(reader.hasNext()) {
XMLEvent event = reader.nextEvent();
switch (event.getEventType()) {
case XMLStreamConstants.DTD:
List<EntityDeclaration> entities = ((DTD)event).getEntities();
if (entities != null) {
for (EntityDeclaration entity : entities)
System.out.println(entity.getName() + " = " + entity.getSystemId());
}
break;
case . . .
}
}
} finally {
reader.close();
}
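If the goal is just the root name, public ID and system ID from the DOCTYPE line, a different route from XMLEventReader is SAX with a LexicalHandler, whose startDTD callback receives exactly those three values. The sketch below swaps in that technique. The class name DoctypeSniffer is invented, and it assumes a parser that honours EntityResolver2 (the JDK's built-in one does by default), so that the overridden resolveEntity can answer with an empty stream instead of fetching the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper; DefaultHandler2 implements LexicalHandler and EntityResolver2.
final class DoctypeSniffer extends DefaultHandler2 {
    String rootName;
    String publicId;
    String systemId;

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        // Reported when the DOCTYPE declaration is encountered.
        this.rootName = name;
        this.publicId = publicId;
        this.systemId = systemId;
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        // Answer every external entity (including the DTD) with empty content,
        // so nothing is loaded from the web.
        return new InputSource(new StringReader(""));
    }

    static DoctypeSniffer sniff(String xml) {
        try {
            DoctypeSniffer handler = new DoctypeSniffer();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Standard SAX2 property for registering a LexicalHandler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

This still parses the whole document; a handler that only needs the DOCTYPE could stop early by throwing an exception from startElement.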

How to get an attribute from an XMLReader

I have some HTML that I'm converting to a Spanned using Html.fromHtml(...), and I have a custom tag that I'm using in it:
<customtag id="1234">
So I've implemented a TagHandler to handle this custom tag, like so:
public void handleTag( boolean opening, String tag, Editable output, XMLReader xmlReader ) {
if ( tag.equalsIgnoreCase( "customtag" ) ) {
String id = xmlReader.getProperty( "id" ).toString();
}
}
In this case I get a SAX exception, as I believe the "id" field is actually an attribute, not a property. However, there isn't a getAttribute() method for XMLReader. So my question is, how do I get the value of the "id" field using this XMLReader? Thanks.
Here is my code to get the private attributes of the xmlReader by reflection:
Field elementField = xmlReader.getClass().getDeclaredField("theNewElement");
elementField.setAccessible(true);
Object element = elementField.get(xmlReader);
Field attsField = element.getClass().getDeclaredField("theAtts");
attsField.setAccessible(true);
Object atts = attsField.get(element);
Field dataField = atts.getClass().getDeclaredField("data");
dataField.setAccessible(true);
String[] data = (String[])dataField.get(atts);
Field lengthField = atts.getClass().getDeclaredField("length");
lengthField.setAccessible(true);
int len = (Integer)lengthField.get(atts);
String myAttributeA = null;
String myAttributeB = null;
for(int i = 0; i < len; i++) {
if("attrA".equals(data[i * 5 + 1])) {
myAttributeA = data[i * 5 + 4];
} else if("attrB".equals(data[i * 5 + 1])) {
myAttributeB = data[i * 5 + 4];
}
}
Note you could put the values into a map but for my usage that's too much overhead.
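The magic numbers in the loop are easier to follow once the array layout is spelled out. The sketch below assumes the parser's private attribute class uses the same layout as org.xml.sax.helpers.AttributesImpl, which stores five consecutive slots per attribute (URI, local name, qualified name, type, value); the class AttributeSlots and its constants are purely illustrative and work on a plain array rather than on the real private field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: decodes a flat array laid out like AttributesImpl's internal data.
final class AttributeSlots {
    static final int SLOTS_PER_ATTRIBUTE = 5; // uri, localName, qName, type, value
    static final int LOCAL_NAME = 1;
    static final int VALUE = 4;

    static Map<String, String> toMap(String[] data, int attributeCount) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < attributeCount; i++) {
            int base = i * SLOTS_PER_ATTRIBUTE;
            result.put(data[base + LOCAL_NAME], data[base + VALUE]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] data = {
            "", "id", "id", "CDATA", "1234",      // first attribute
            "", "color", "color", "CDATA", "red"  // second attribute
        };
        System.out.println(toMap(data, 2)); // prints {id=1234, color=red}
    }
}
```

Because this depends on private fields of a specific parser implementation, the reflection approach can break whenever that implementation changes.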
Based on the answer by rekire I made this slightly more robust solution that will handle any tag.
private TagHandler tagHandler = new TagHandler() {
final HashMap<String, String> attributes = new HashMap<String, String>();
private void processAttributes(final XMLReader xmlReader) {
try {
Field elementField = xmlReader.getClass().getDeclaredField("theNewElement");
elementField.setAccessible(true);
Object element = elementField.get(xmlReader);
Field attsField = element.getClass().getDeclaredField("theAtts");
attsField.setAccessible(true);
Object atts = attsField.get(element);
Field dataField = atts.getClass().getDeclaredField("data");
dataField.setAccessible(true);
String[] data = (String[])dataField.get(atts);
Field lengthField = atts.getClass().getDeclaredField("length");
lengthField.setAccessible(true);
int len = (Integer)lengthField.get(atts);
/**
* MSH: Look for supported attributes and add to hash map.
* This is as tight as things can get :)
* The data index is "just" where the keys and values are stored.
*/
for(int i = 0; i < len; i++)
attributes.put(data[i * 5 + 1], data[i * 5 + 4]);
}
catch (Exception e) {
Log.d(TAG, "Exception: " + e);
}
}
...
And inside handleTag do:
@Override
public void handleTag(boolean opening, String tag, Editable output, XMLReader xmlReader) {
processAttributes(xmlReader);
...
And then the attributes will be accessible as so:
attributes.get("my attribute name");
It is possible to use the XMLReader provided by TagHandler to access tag attribute values without reflection, but that method is even less straightforward than reflection. The trick is to replace the ContentHandler used by the XMLReader with a custom object. Replacing the ContentHandler can only be done in the call to handleTag(). That makes it a problem to get attribute values for the first tag, which can be solved by adding a custom tag at the start of the HTML.
import android.text.Editable;
import android.text.Html;
import android.text.Spanned;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import java.util.ArrayDeque;
public class HtmlParser implements Html.TagHandler, ContentHandler
{
public interface TagHandler
{
boolean handleTag(boolean opening, String tag, Editable output, Attributes attributes);
}
public static Spanned buildSpannedText(String html, TagHandler handler)
{
// add a tag at the start that is not handled by default,
// allowing custom tag handler to replace xmlReader contentHandler
return Html.fromHtml("<inject/>" + html, null, new HtmlParser(handler));
}
public static String getValue(Attributes attributes, String name)
{
for (int i = 0, n = attributes.getLength(); i < n; i++)
{
if (name.equals(attributes.getLocalName(i)))
return attributes.getValue(i);
}
return null;
}
private final TagHandler handler;
private ContentHandler wrapped;
private Editable text;
private ArrayDeque<Boolean> tagStatus = new ArrayDeque<>();
private HtmlParser(TagHandler handler)
{
this.handler = handler;
}
@Override
public void handleTag(boolean opening, String tag, Editable output, XMLReader xmlReader)
{
if (wrapped == null)
{
// record result object
text = output;
// record current content handler
wrapped = xmlReader.getContentHandler();
// replace content handler with our own that forwards to calls to original when needed
xmlReader.setContentHandler(this);
// handle endElement() callback for <inject/> tag
tagStatus.addLast(Boolean.FALSE);
}
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException
{
boolean isHandled = handler.handleTag(true, localName, text, attributes);
tagStatus.addLast(isHandled);
if (!isHandled)
wrapped.startElement(uri, localName, qName, attributes);
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException
{
if (!tagStatus.removeLast())
wrapped.endElement(uri, localName, qName);
handler.handleTag(false, localName, text, null);
}
@Override
public void setDocumentLocator(Locator locator)
{
wrapped.setDocumentLocator(locator);
}
@Override
public void startDocument() throws SAXException
{
wrapped.startDocument();
}
@Override
public void endDocument() throws SAXException
{
wrapped.endDocument();
}
@Override
public void startPrefixMapping(String prefix, String uri) throws SAXException
{
wrapped.startPrefixMapping(prefix, uri);
}
@Override
public void endPrefixMapping(String prefix) throws SAXException
{
wrapped.endPrefixMapping(prefix);
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException
{
wrapped.characters(ch, start, length);
}
@Override
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
{
wrapped.ignorableWhitespace(ch, start, length);
}
@Override
public void processingInstruction(String target, String data) throws SAXException
{
wrapped.processingInstruction(target, data);
}
@Override
public void skippedEntity(String name) throws SAXException
{
wrapped.skippedEntity(name);
}
}
With this class reading attributes is easy:
HtmlParser.buildSpannedText("<x id=1 value=a>test<x id=2 value=b>", new HtmlParser.TagHandler()
{
@Override
public boolean handleTag(boolean opening, String tag, Editable output, Attributes attributes)
{
if (opening && tag.equals("x"))
{
String id = HtmlParser.getValue(attributes, "id");
String value = HtmlParser.getValue(attributes, "value");
}
return false;
}
});
This approach has the advantage that it lets you disable processing of some tags while using default processing for others, e.g. you can make sure that ImageSpan objects are not created:
Spanned result = HtmlParser.buildSpannedText("<b><img src=nothing>test</b><img src=zilch>",
new HtmlParser.TagHandler()
{
@Override
public boolean handleTag(boolean opening, String tag, Editable output, Attributes attributes)
{
// return true here to indicate that this tag was handled and
// should not be processed further
return tag.equals("img");
}
});
There's an alternative to the other solutions, that doesn't allow you to use custom tags, but has the same effect:
<string name="foobar">blah <annotation customTag="1234">inside blah</annotation> more blah</string>
Then read it like this:
CharSequence annotatedText = context.getText(R.string.foobar);
// wrap, because getText returns a SpannedString, which is not mutable
CharSequence processedText = replaceCustomTags(new SpannableStringBuilder(annotatedText));
public static <T extends Spannable> T replaceCustomTags(T text) {
Annotation[] annotations = text.getSpans(0, text.length(), Annotation.class);
for (Annotation a : annotations) {
String attrName = a.getKey();
if ("customTag".equals(attrName)) {
String attrValue = a.getValue();
int contentStart = text.getSpanStart(a);
int contentEnd = text.getSpanEnd(a);
int contentFlags = text.getSpanFlags(a);
Object newFormat1 = new StyleSpan(Typeface.BOLD);
Object newFormat2 = new ForegroundColorSpan(Color.RED);
text.setSpan(newFormat1, contentStart, contentEnd, contentFlags);
text.setSpan(newFormat2, contentStart, contentEnd, contentFlags);
text.removeSpan(a);
}
}
return text;
}
Depending on what you wanted to do with your custom tags, the above may help you. If you just want to read them, you don't need a SpannableStringBuilder, just cast getText to Spanned interface to investigate.
Note that Annotation, representing <annotation foo="bar">...</annotation>, is an Android built-in since API level 1! It's one of those hidden gems again. It has the limitation of one attribute per <annotation> tag, but nothing prevents you from nesting multiple annotations to achieve multiple attributes:
<string name="gold_admin_user"><annotation user="admin"><annotation rank="gold">$$username$$</annotation></annotation></string>
If you use the Editable interface instead of Spannable you can also modify the content around each annotation. For example changing the above code:
String attrValue = a.getValue();
text.insert(text.getSpanStart(a), attrValue);
text.insert(text.getSpanStart(a) + attrValue.length(), " ");
int contentStart = text.getSpanStart(a);
will result as if you had this in the XML:
blah <b><font color="#ff0000">1234 inside blah</font></b> more blah
One caveat to look out for is when you make modifications that affect the length of the text, the spans move around. Make sure you read the span start/end indices at the correct times, best if you inline them to the method calls.
Editable also allows you to do simple search and replace substitution:
int index = TextUtils.indexOf(text, needle); // for example $$username$$ above
text.replace(index, index + needle.length(), replacement);
If all you need is just one attribute, the suggestion by vorrtex is actually pretty solid. To give you an example of just how simple it would be to handle, have a look here:
<xml>Click on <user1>Johnni</user1> or <user2>Jenny</user2> to see...</xml>
And in your custom TagHandler you don't use equals but indexOf:
final static String USER = "user";
if(tag.indexOf(USER) == 0) {
// Extract tag postfix.
String postfix = tag.substring(USER.length());
Log.d(TAG, "postfix: " + postfix);
}
And you can then pass the postfix value in your onClick view parameter as a tag to keep it generic.
