Validate using one element of XSD in Java

Is there any way to use an xsd file to validate input of a string?
I have found some examples of xsd being used to validate an xml file, but what I really want is to just use one element of the xsd to validate some user input.
Is there a simple way to do this or should I just treat the xsd file as an xml file, extract the element and compare it to the given string to see if it's valid?
Thanks

If I'm understanding your question correctly: an XSD (schema) is typically used, often together with JAXB, to validate a whole XML file, not the contents of a single node in that file. You may be better off using XPath to pull the contents of the particular node out of the XML file and doing your comparison that way.
Here is a link to one of the JAXB tutorials and a link to an XPath tutorial.
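Another option, if the element in question is declared at the top level of the schema, is to wrap the user input in that element and validate the resulting one-element document with the JDK's javax.xml.validation API. The following is only a minimal sketch under assumptions: the inline schema, the zipCode element and the class and method names are hypothetical, and a real XSD file would be loaded in place of the inline string.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SingleElementCheck {
    // Hypothetical schema with one global element (assumption, not from the question).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='zipCode'>"
        + "<xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:pattern value='[0-9]{5}'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Compile the schema; a broken schema is a programming error, not invalid input.
    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema could not be compiled", e);
        }
    }

    // Escape markup characters so the input is always treated as text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap the input in the named element and validate that tiny document.
    static boolean isValid(String elementName, String userInput) {
        String xml = "<" + elementName + ">" + escape(userInput) + "</" + elementName + ">";
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the input violates the element's declaration
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("zipCode", "12345"));
        System.out.println(isValid("zipCode", "12a45"));
    }
}
```

This only works for global element declarations; an element declared locally inside a complex type cannot serve as the document root on its own.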

Related

How to parse a text file based on the modified BNF form specified in config file, in Java?

First of all, I am new here, so sorry guys if I do something wrong.
I need to write a parser of text in Java, which parses the input text file based on the modified BNF form specified in the configuration file.
I CAN'T use any of the parsing libraries and libraries for BNF forms.
Basically, the only thing I can use is Regex.
But, for me, the biggest problem is how to read that configuration file so I can then use that modified BNF form inside of it. That must be done so it would work for any configuration file so I cannot hardcode for one example.
How should I do that? Any help is appreciated.
This is a simple example of what that config file should look like:
<a> ::= regex(^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$)
a represents any expression that matches the specified regex
My output should be an XML file representing the parse tree of the input text file, based on the specified modified BNF form. So basically any valid email address should be matched by the regex above, and then I should write that match to the XML file.
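One possible way to read such a configuration file with nothing but java.util.regex is to match each line against a pattern for the rule syntax and compile the captured expression. This is a rough sketch, assuming every rule has the form `<name> ::= regex(...)` on a single line; nonterminal references, alternation and sequencing are not handled, and the class name, method name and the sample `<digits>` rule are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BnfConfigSketch {
    // Matches "<name> ::= regex(expression)"; group 1 is the name, group 2 the expression.
    private static final Pattern RULE =
            Pattern.compile("^\\s*<(\\w+)>\\s*::=\\s*regex\\((.*)\\)\\s*$");

    // Turn the config lines into a map from rule name to compiled pattern.
    static Map<String, Pattern> parseRules(Iterable<String> lines) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Matcher m = RULE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unrecognized rule: " + line);
            }
            rules.put(m.group(1), Pattern.compile(m.group(2)));
        }
        return rules;
    }

    public static void main(String[] args) {
        // Hypothetical rule used only for illustration.
        Map<String, Pattern> rules = parseRules(Arrays.asList("<digits> ::= regex(^\\d+$)"));
        System.out.println(rules.get("digits").matcher("2024").matches());
    }
}
```

Because the rules are kept in a map keyed by name, nothing is hardcoded for one particular configuration file; the lines could come from `Files.readAllLines` instead of the inline list.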

In an XML document, is it possible to tell the difference between an entity-encoded character and one that is not?

I am being fed an XML document with metadata about online resources that I need to parse. Among the different metadata items is a collection of tags, which are comma-delimited. Here is an example:
<tags>Research skills, Searching&#44; evaluating and referencing</tags>
The issue is that one of these "tags" contains a comma in it. The comma within the tag is encoded, but the commas intended to delimit tags are not. I am (currently) using the getText() method on org.dom4j.Node to read the text content of the <tags> element, which returns a String.
The problem is that I am not able -- as far as I'm aware -- to differentiate the encoded comma (from the ones that aren't encoded) in the String I receive.
Short of writing my own XML parser, is there another way to access the text content of this node in a more "raw" state? (viz. a state where the encoded comma is still encoded.)
When you use dom4j or DOM all the entities are already resolved, so you would need to go back to the parsing step to catch character references.
SAX is a lower-level interface and, via its LexicalHandler interface, can notify you when the parser encounters entity references, but it does not report character references. So it seems that you would really need to write your own parser, or patch an existing one.
But in the end it would be best if you can change the schema of your document:
<tags>
<tag>Research skills</tag>
<tag>Searching, evaluating and referencing</tag>
</tags>
In your current document character references are used to act as metadata. XML elements are a better way to express that.
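With the restructured document, reading the tags no longer depends on how commas are encoded. As a small sketch, assuming the `<tags>`/`<tag>` layout shown above and using the JDK's built-in DOM parser rather than dom4j to stay self-contained (the class and method names are made up for illustration):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagListSketch {
    // Collect the text of every <tag> element; commas inside a tag stay part of that tag.
    static List<String> readTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("tag");
            List<String> tags = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                tags.add(nodes.item(i).getTextContent());
            }
            return tags;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse tags document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tags><tag>Research skills</tag>"
                + "<tag>Searching, evaluating and referencing</tag></tags>";
        for (String tag : readTags(xml)) {
            System.out.println(tag);
        }
    }
}
```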
Using LexEv from http://andrewjwelch.com/lexev/, and putting xercesImpl.jar from Apache Xerces on the class path, I am able to compile and run a short sample using dom4j:
LexEv lexEv = new LexEv();
SAXReader reader = new SAXReader(lexEv);
Document doc = reader.read("input1.xml");
System.out.println(doc.getRootElement().asXML());
If the input1.xml has your sample XML snippet, then the output is
<tags xmlns:lexev="http://andrewjwelch.com/lexev">Research skills, Searching<lexev:char-ref name="#44">,</lexev:char-ref> evaluating and referencing</tags>
So that way you could get a representation of your input where a pure character and a character reference can be distinguished.
As far as I know, every XML processing framework (except vtd-xml) resolves entities during parsing.
With vtd-xml you can distinguish a character from its entity-encoded counterpart by using VTDNav's toRawString() method.

Formatting invalid XML into a pretty format

Let's say there were errors in an XML message:
Well-Formed
<Person><Name>Attila</Name><ID>001</ID><Age>45</Age></Person>
Not Well-Formed
<Pxxxon><Name>Attila</9327><ID>001</ID><Age>45</Age></Person>
Are there any Java libraries or code to format the non-well formed XML message to:
<Pxxxon>
<Name>Attila</9327>
<ID>001</ID>
<Age>45</Age>
</Person>
I understand that current Java libraries only format valid XML messages to this prettified format.
No, because what you list as "Invalid" is actually not well-formed.
Well-formed and valid are not the same thing.
Well-formed means that a textual object meets the W3C requirements for being XML.
Valid means that well-formed XML meets additional requirements given by a specified schema.
See Well-formed vs Valid XML for more details, but if data is not well-formed, it is not XML at all and no XML parser will be able to read it to reformat it.
You might then ask what about non-XML parsers? To which we would reply, if it's not XML, what format is it? For any parser to be able to read any data, the syntax of the data has to be defined. Simply saying that the data resembles XML insufficiently specifies the format, and that is why you'll not find a tool that can pretty-print the data sample you provided.
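The point that a standard parser refuses such input can be checked directly. The sketch below, using the JDK's DOM parser and the two samples from the question, simply reports whether parsing succeeds; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true only if the parser can read the text as XML at all.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, so there is no document to pretty-print
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<Person><Name>Attila</Name><ID>001</ID><Age>45</Age></Person>";
        String bad = "<Pxxxon><Name>Attila</9327><ID>001</ID><Age>45</Age></Person>";
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```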

How to decode XML file without using XML Decoder?

I have an XML file which was encoded using java.beans.XMLEncoder. I cannot use java.beans.XMLDecoder to decode it, as the class of the encoded object is not present in my project. Is there a way to obtain the values in the XML without using java.beans.XMLDecoder and its readObject() method?
There are several APIs for parsing XML in Java which don't require you to deserialize an object. Once you have a DOM, XPath is a useful way to query its contents. You will need to know what you're looking for, though.
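As an illustration of the DOM plus XPath approach, the sketch below reads property values straight out of XMLEncoder-style output without loading the bean class. The sample document, the com.example.Person class name and the property names are assumptions made for the example; the XPath expression has to be adapted to whatever your file actually contains.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodedBeanReader {
    // Hypothetical XMLEncoder output for a bean with "age" and "name" properties.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<java version=\"1.8.0\" class=\"java.beans.XMLDecoder\">\n"
        + " <object class=\"com.example.Person\">\n"
        + "  <void property=\"age\">\n   <int>45</int>\n  </void>\n"
        + "  <void property=\"name\">\n   <string>Attila</string>\n  </void>\n"
        + " </object>\n"
        + "</java>\n";

    // Return the text of the first value element under <void property="...">.
    // The property name is spliced into the expression, so pass trusted names only.
    static String readProperty(String xml, String property) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String expr = "/java/object/void[@property='" + property + "']/*[1]";
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read property " + property, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readProperty(SAMPLE, "name"));
        System.out.println(readProperty(SAMPLE, "age"));
    }
}
```

All values come back as strings; converting them to numbers or other types is left to the caller, since the bean class that would normally define the types is not available.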

Can I validate data type, case, length and empty check for XML elements against XSD?

I have a requirement to validate an XML file. Each XML element should pass all of the conditions below.
Data type
Case
Length
Empty
Can all of the above be checked against an XSD? I'm trying to do this validation using a Java API.
Yes, you can do that. XSD provides built-in data types such as string and int.
For the case and empty checks, XSD lets you restrict a value with a regular expression (the pattern facet), so you can require the content to match a defined expression.
Length can also be set easily in XSD, using the length, minLength and maxLength facets.
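To make this concrete, here is a small sketch that validates a document against an inline schema using javax.xml.validation. The schema itself is an assumption invented for the example: a hypothetical item element whose quantity must be an xs:int and whose code must be one to five upper-case letters, which covers the data type, case, length and empty checks in one place.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdFacetCheck {
    // Hypothetical schema: data type via xs:int, case via pattern, length via min/maxLength.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='item'><xs:complexType><xs:sequence>"
        + "<xs:element name='quantity' type='xs:int'/>"
        + "<xs:element name='code'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[A-Z]+'/>"
        + "<xs:minLength value='1'/>"
        + "<xs:maxLength value='5'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile the inline schema once per call; a real application would cache it.
    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema could not be compiled", e);
        }
    }

    // True if the document satisfies every restriction in the schema.
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the exception message names the violated facet
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<item><quantity>3</quantity><code>ABC</code></item>"));
        System.out.println(isValid("<item><quantity>3</quantity><code>abc</code></item>"));
    }
}
```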
