How to decode XML file without using XML Decoder?

I have an XML file that was encoded using java.beans.XMLEncoder. I cannot use java.beans.XMLDecoder to decode it, because the class of the encoded object is not present in my project. Is there a way to obtain the values in the XML without using java.beans.XMLDecoder and its readObject() method?

Java has several XML parsing APIs, such as DOM, SAX and StAX, that read the file as plain XML and don't require you to deserialize an object. Once you have a DOM, XPath is a useful way to query its contents. You will need to know what you're looking for, though.
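As a rough illustration of the DOM plus XPath approach, the sketch below reads the top-level properties from XMLEncoder-style output without loading the encoded class. The sample document, the class com.example.Person, and the helper name readProperties are all assumptions made for the example; real XMLEncoder files can nest objects more deeply, so the XPath expression would need adjusting.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlEncoderReader {

    // Hypothetical XMLEncoder output; com.example.Person is an invented class
    // that does not need to exist on the classpath.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<java version=\"1.8.0\" class=\"java.beans.XMLDecoder\">\n"
        + " <object class=\"com.example.Person\">\n"
        + "  <void property=\"name\"><string>Alice</string></void>\n"
        + "  <void property=\"age\"><int>30</int></void>\n"
        + " </object>\n"
        + "</java>\n";

    // Returns property name -> raw text value for the top-level object.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Every <void property="..."> directly under the encoded object.
            NodeList props = (NodeList) xpath.evaluate(
                "/java/object/void[@property]", doc, XPathConstants.NODESET);
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                // string(*[1]) is the text of the first child element,
                // for example <string> or <int>.
                values.put(prop.getAttribute("property"),
                           xpath.evaluate("string(*[1])", prop));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readProperties(SAMPLE));
    }
}
```

All values come back as strings; converting them to numbers or dates is left to the caller, since the original class is not available to describe the types.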

Related

How would you go about separating strings in a file if they may contain any arbitrary characters?

I'm currently writing a Java program which involves goals. It's basically a to-do list. Each goal has a few strings, such as name, description etc. I can save and load these goals to a file. My issue was separating the strings: I couldn't think of a character that couldn't be in the string itself. I ended up prefixing each string with its length and then a colon.
I'm sure there is something in the Java API that will handle this, like ObjectOutputStream. I'm curious about the 'general case', though. This must be an issue for any program that saves and loads strings from a file without being able to assume anything about the string. Is there a better way to go about this?

There are a couple of ways to handle your case, e.g.:
Encoding your strings with something like Base64
Applying a well-defined format, e.g. JSON or CSV
There are plenty of tools to support you, including:
Apache Commons Codec for Base64 encoding/decoding
Jackson for JSON serializing/deserializing
opencsv for CSV serializing/deserializing
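A minimal sketch of the Base64 option follows. It uses java.util.Base64 from the JDK (Java 8 and later) instead of Apache Commons Codec, purely to stay dependency-free; the class name GoalCodec and the one-field-per-line layout are assumptions for illustration. Because the basic Base64 alphabet never contains a newline, a newline is a safe separator whatever the original strings contain.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

class GoalCodec {

    // Encodes each field as one Base64 line, so the content of a field
    // can never collide with the '\n' separator.
    static String encode(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.append(Base64.getEncoder().encodeToString(bytes)).append('\n');
        }
        return out.toString();
    }

    // Reverses encode(): every line, including an empty one, is one field.
    static List<String> decode(String data) {
        List<String> fields = new ArrayList<>();
        // Each field ends with '\n', so the last split element is always
        // the empty remainder after the final separator and is skipped.
        String[] lines = data.split("\n", -1);
        for (int i = 0; i < lines.length - 1; i++) {
            byte[] bytes = Base64.getDecoder().decode(lines[i]);
            fields.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> goal = new ArrayList<>();
        goal.add("Buy milk: 2 litres");
        goal.add("Line one\nLine two, with a comma");
        String stored = encode(goal);
        System.out.println(decode(stored).equals(goal));
    }
}
```

The trade-off is that the file is no longer human-readable; JSON or CSV libraries keep it readable by escaping only the characters that need it.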

In an XML document, is it possible to tell the difference between an entity-encoded character and one that is not?

I am being fed an XML document with metadata about online resources that I need to parse. Among the different metadata items is a collection of tags, which are comma-delimited. Here is an example:
<tags>Research skills, Searching&#44; evaluating and referencing</tags>
The issue is that one of these "tags" contains a comma in it. The comma within the tag is encoded, but the commas intended to delimit tags are not. I am (currently) using the getText() method on org.dom4j.Node to read the text content of the <tags> element, which returns a String.
The problem is that I am not able -- as far as I'm aware -- to differentiate the encoded comma (from the ones that aren't encoded) in the String I receive.
Short of writing my own XML parser, is there another way to access the text content of this node in a more "raw" state? (viz. a state where the encoded comma is still encoded.)

When you use dom4j or DOM all the entities are already resolved, so you would need to go back to the parsing step to catch character references.
SAX is a lower-level interface, and its LexicalHandler interface lets you get notified when the parser encounters entity references, but it does not report character references. So it seems that you would really need to write your own parser, or patch an existing one.
But in the end it would be best if you can change the schema of your document:
<tags>
<tag>Research skills</tag>
<tag>Searching, evaluating and referencing</tag>
</tags>
In your current document character references are used to act as metadata. XML elements are a better way to express that.

Using LexEv from http://andrewjwelch.com/lexev/, with xercesImpl.jar from Apache Xerces on the class path, I was able to compile and run a short sample using dom4j:
LexEv lexEv = new LexEv();
SAXReader reader = new SAXReader(lexEv);
Document doc = reader.read("input1.xml");
System.out.println(doc.getRootElement().asXML());
If input1.xml contains your sample XML snippet, then the output is
<tags xmlns:lexev="http://andrewjwelch.com/lexev">Research skills, Searching<lexev:char-ref name="#44">,</lexev:char-ref> evaluating and referencing</tags>
So that way you could get a representation of your input where a pure character and a character reference can be distinguished.

As far as I know, every XML processing framework (except vtd-xml) resolves entities during parsing.
With vtd-xml, you can distinguish a character from its entity-encoded counterpart by using VTDNav's toRawString() method.

Special parsing XML with Simple XML

How can I parse this XML fragment with SimpleXML in Android?
<txtList>
message <bold>message</bold> message
</txtList>
Specifically, I do not know how to retrieve the value of an element that contains another element in this way.

You have to write a class with proper fields. Or, and that's what I recommend, you write a class that contains the values and a Transformer which "parses" the XML.
However, Simple XML is not an XML parser, it's a serializer. So you may want to have a look at e.g. JSoup, a very good HTML / XML parser.
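To show what parsing the fragment directly can look like, here is a small sketch that walks the mixed content of txtList. It uses the JDK's DOM parser as a stand-in for JSoup, assuming javax.xml.parsers is available on the target platform; the class name MixedContentReader and the "text:" / element-name labels are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentReader {

    // Lists the direct children of the root element in document order:
    // text runs as "text:<value>", child elements as "<name>:<value>".
    static List<String> describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    String text = child.getNodeValue().trim();
                    if (!text.isEmpty()) {
                        parts.add("text:" + text);
                    }
                } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                    parts.add(child.getNodeName() + ":"
                        + child.getTextContent().trim());
                }
            }
            return parts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<txtList>\nmessage <bold>message</bold> message\n</txtList>";
        for (String part : describeChildren(xml)) {
            System.out.println(part);
        }
    }
}
```

Keeping the parts in document order is what makes it possible to rebuild the styled text later, which a field-per-element mapping cannot express.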

Escaping an xml string in java

I read elements with CDATA sections from an RSS feed, which I need to convert to valid XML. The content in the CDATA section is mostly valid XHTML, but sometimes characters like ampersands appear in attributes (URLs).
I can use .replaceAll("&", "&amp;") to solve this, but thinking ahead, other invalid characters may show up in attributes or text.
The CMS to which I'm importing the element won't accept CDATA sections without setting up another configuration for the content, so my question is: is there any simple way to escape the string, only for attributes and text?
I'm using the jdom library to manipulate the xml after the import.
Edit: I've checked out Apache's StringEscapeUtils, but it escapes the whole string. I need something that will only escape attribute values and text inside elements.

Apache Commons provides handy functions for this: StringEscapeUtils

When you use JDOM it will automatically and correctly escape any content that needs it. Is your CMS loaded with the output of JDOM, or are you using some other library to populate the CMS?
In essence, if you have valid XML input, and you use JDOM (something from org.jdom2.output.*) to output the data, then you will always have good output. So, what are you doing to get broken output?
Rolf
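The same principle, that a serializer escapes attribute values and text on output, can be sketched without JDOM. The example below uses the JDK's DOM and Transformer classes as a stand-in, not the JDOM API the answer refers to; the class name EscapeOnOutput and the sample link are assumptions for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EscapeOnOutput {

    // Builds <a href="...">text</a> from raw, unescaped strings and lets
    // the serializer escape whatever the XML syntax requires.
    static String link(String href, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element a = doc.createElement("a");
            a.setAttribute("href", href);   // raw value, no manual escaping
            a.setTextContent(text);         // raw value, no manual escaping
            doc.appendChild(a);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(link("http://example.com/?a=1&b=2", "Tom & Jerry"));
    }
}
```

This only helps once the content is held as a tree; markup that is already malformed as a string still has to be repaired or parsed leniently before it can be loaded.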

Validate using one element of XSD in java

Is there any way to use an xsd file to validate input of a string?
I have found some examples of xsd being used to validate an xml file, but what I really want is to just use one element of the xsd to validate some user input.
Is there a simple way to do this or should I just treat the xsd file as an xml file, extract the element and compare it to the given string to see if it's valid?
Thanks

If I'm understanding your question correctly, you typically use JAXB along with an XSD (schema) to validate a whole XML file, not the contents of a single node in it. You may be better off using XPath to parse the XML file, get the contents of the particular node, and then do your comparison that way.
Here is a link to one of the JAXB tutorials and a link to an XPath tutorial.
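One other possibility, which is not what the answer above describes, is to wrap the user input in the element in question and validate that one-element document with javax.xml.validation. The sketch below assumes the element is declared globally in the schema; the schema text, the element name age, and the class name ElementValidator are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ElementValidator {

    // Hypothetical schema with one global element, <age>, limited to 0..150.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"age\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:int\">"
        + "<xs:minInclusive value=\"0\"/><xs:maxInclusive value=\"150\"/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    static final Schema SCHEMA = loadSchema();

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema itself is invalid", e);
        }
    }

    // Wraps the input in <elementName> and validates that tiny document.
    static boolean isValid(String elementName, String userInput) {
        // Escape markup characters so the input is treated as text only.
        String text = userInput.replace("&", "&amp;")
                               .replace("<", "&lt;")
                               .replace(">", "&gt;");
        String xml = "<" + elementName + ">" + text + "</" + elementName + ">";
        try {
            Validator validator = SCHEMA.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;  // the input violates the element's declared type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("age", "42"));
        System.out.println(isValid("age", "abc"));
    }
}
```

Elements declared only locally inside a complex type cannot be used as a document root this way, so for those the wrapper would have to include the enclosing parent element as well.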
