Validating an XML NCName in Java

Validating an XML NCName in Java - java

I'm getting some values from Java annotations in an annotation processor to generate metadata. Some of these values are supposed to indicate XML element or attribute names. I'd like to validate the input to find out if the provided values are actually legal NCNames according to the XML specification. Only the local name is important in this case, the namespace URI doesn't play a part here.
Is there some simple way of finding out if a string is a legal XML element or attribute name? Preferably I'd use some XML API that is readily available in Java SE. One of the reasons I'm doing this stuff in the first place is to cut back on dependencies. I'm using JDK 7 so I have access to the most up-to-date classes/methods.
So far, browsing through content handler classes and SAX/DOM stuff hasn't yielded any result.

If you're prepared to have Saxon on your class path you can do
new Name10Checker().isValidNCName(s);
I can't see anything simpler in the public JDK interface.

didn't find anything straightforward in any of the jdk 6 APIs (don't know about jdk 7). a quick but possibly "hackish" way to check would be to convert it to an xml doc and see if it parses:
String name = ...;
if(name.contains(">")) {
return false;
}
String xmlDoc = "<" + name + "/>";
DocumentBuilder db = ...;
db.parse(new InputSource(new StringReader(xmlDoc)));

I ran into the same problem and found lots of implementations in foss libraries, and even an old implementation in a Java class library, which has been removed ages ago... So here's a few options to choose from:
Java Class Library: XMLUtils.isValidNCName(String ncName) (note: removed in 2004)
Apache Axis: NCName.isValid(String stValue)
Saxonica: NameChecker.isValidNCName(CharSequence ncName)
OWL API: XMLUtils.isNCName(java.lang.CharSequence s)
Validator.nu HTML Parser: NCName.isNCName(java.lang.String str)
So, if you're using one of these libraries anyway, you're fine.
As I am not, I'll go with a copy of the XMLUtils from the OWL API, which has no external dependencies, is available under non-restrictive licenses (LGPL and Apache 2.0) and consists of nice and clean code.

Related

JAXP parse error on valid XML

I am trying to run some XPath Queries on XML in Java and apparently the recommended way to do so is to construct a document first.
Here is the standard JAXP code sample that I was using:
import org.w3c.dom.Document;
import javax.xml.parsers.*;
final DocumentBuilder xmlParser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
final Document doc = xmlParser.parse(xmlFile);
I also tried the Saxon API, but got the same errors:
import net.sf.saxon.s9api.*;
final DocumentBuilder documentBuilder = new Processor(false).newDocumentBuilder();
final XdmNode xdm = documentBuilder.build(new File("out/data/blog.xml"));
Here is a minimal reconstructed example XML which the DocumentBuilder in JDK 1.8 can't parse:
<?xml version="1.1" encoding="UTF-8" ?>
<xml>
<![CDATA[Some example text with [funny highlight]]]>
</xml>
According to the spec, the square bracket ] just before the end of CDATA marker ]]> is perfectly legal, but the parser just exits with a stack trace and the message org.xml.sax.SAXParseException; XML document structures must start and end within the same entity..
On my original data file which contains a lot of CDATA sections, the message is instead org.xml.sax.SAXParseException; The element type "item" must be terminated by the matching end-tag "</item>". In both cases ´com.sun.org.apache.xerces´ shows up in the stacktrace a lot.
Form both observations it seems as if the parser just didn't end the CDATA section at ]]>.
EDIT: As it turned out, the example will pass when the <?xml ... ?> declaration is omitted. I hadn't checked that before posting here and added it just now.

Short answer: add Apache Xerces to the build path, it will automatically be loaded instead of the parser from the JDK and the XML will be parsed just fine! Copy-paste Gradle Dependency:
implementation "xerces:xercesImpl:2.11.0"
Some background: Apache Xerces is indeed the same parser which is also used in the JDK, but even though Xerces 2.11 dates from 2013 the JDK comes with a much older version. That really sucks!
As the Saxon team puts it:
Saxonica recommends use of the Xerces parser from Apache in preference to the version bundled in the JDK, which is known to have some serious bugs.
In case you wonder how simply putting Xerces on the classpath makes the problem disappear: even though the JDK and Saxon DocumentBuilders construct entirely different document types, they both use the same Standard Java Interfaces to call the parser and also the same mechanism to find and load the parser (or rather, the parser factory). In short, a java.util.ServiceLoader is called and looks into all the JARs in the classpath for properties files in META-INF/services and this is how the xercesJar announces that it does provide an XML parser. And good for us, the JDK's own implementation is superseded by anything found there.
After making this bad experience with JDK XML classes, I am even more motivated to refactor projects to use Saxon for XPath processing instead of the implementation of XPath in the JDK. The other reason is the technical advantage of XDM over DOM (same link as above).

OpenSearch Compatible Response From Java

Here is an example OpenSearch description file:
http://webcat.hud.ac.uk/OpenSearch.xml
When I send a query as like that:
http://webcat.hud.ac.uk/perl/opensearch.pl?keyword=new&startpage=1&itemsperpage=20
I get a response which is compatible to OpenSearch. How can I implement OpenSearch specification at Java or is there any library for it or is there any xsd that I can generate a Java code from it?

According to the OpenSearch website's section on "Reading OpenSearch", there is a Java library which can do this, called Apache Abdera. I have not used it myself, so I cannot comment on its quality, but it should be worth looking into - apparently it can both interpret AND create OpenSearch responses, so this may be exactly what you're looking for.
Alternatively, there are quite a few very good XML parsers for Java (see this question for some suggestions), so writing your own parser for a simple OpenSearch XML file shouldn't be too difficult, since the full specification is available online.
As for an XSD, I can't find an "official" one, however there are XSD's for OpenSearch in various open source projects which have been tested and you can use, such as this one, which is part of a project called "OpenSearch Validator."
Another potential choice for writing OpenSearch results is the very mature and widely-used Apache Lucene library, which is in the list of software "writing OpenSearch results" in the previously linked OpenSearch website.

ROME also supports OpenSearch with its ROME Module A9 OpenSearch.
Sample usage:
SyndFeed feed = new SyndFeedImpl();
feed.setFeedType(feedType);
// Add the opensearch module, you would get information like totalResults from the
// return results of your search
List mods = feed.getModules();
OpenSearchModule osm = new OpenSearchModuleImpl();
osm.setItemsPerPage(1);
osm.setStartIndex(1);
osm.setTotalResults(1024);
osm.setItemsPerPage(50);
OSQuery query = new OSQuery();
query.setRole("superset");
query.setSearchTerms("Java Syndication");
query.setStartPage(1);
osm.addQuery(query);
Link link = new Link();
link.setHref("http://www.bargainstriker.com/opensearch-description.xml");
link.setType("application/opensearchdescription+xml");
osm.setLink(link);
mods.add(osm);
feed.setModules(mods);
// end add module

Using org.apache.commons.digester.Digester in XML with attributes

I'm going to extract values from this XML/RDF:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="urn:turismoculturale.itdc.filas-1.0.0-RC1#">
<j.0:Chiesa rdf:ID="turismoCulturale_POI_880">
<j.0:title xml:lang="en">Church of S. Giuda Taddeo or S. Onofrio - Gaeta</j.0:title>
<j.0:title xml:lang="it">Chiesa S. Giuda Taddeo o S. Onofrio - Gaeta</j.0:title>
</j.0:Chiesa>
</rdf:RDF>
I would like to get en title when I am in "en" language and "it" title otherwise. I am able to set the title value in the Poi bean by using:
Digester digester = new Digester();
digester.setNamespaceAware( true );
digester.setRuleNamespaceURI( "urn:turismoculturale.itdc.filas-1.0.0-RC1#" );
digester.addObjectCreate( "*/Chiesa", Poi.class);
digester.addBeanPropertySetter("*/title", "title");
...
but I don't know if it is the english title or the italian one.

Ok - first and foremost, don't try to parse RDF/XML with an XML parser. It's never going to work because the semantics of the XML document are irrelevant with respect to RDF/XML and it is a bad idea (if you know how RDF/XML works), especially in your case where the RDF/XML is being generated dynamically (you can tell by the namespaces). You need to use an RDF parser to parse RDF.
So that means don't use an XML to Java object mapping tool, use an RDF to Java Object mapping tool.
Here is a great link explaining how to do this:
http://answers.semanticweb.com/questions/859/best-way-to-convert-rdfxml-file-to-pojos
And another:
http://answers.semanticweb.com/questions/3251/experience-using-java-based-frameworks-for-rdf-to-pojo-and-vice-versa-mapping
Along with links to all the tools in the aforementioned resource:
Jenabean
Empire
AliBaba
RDFReactor
For an out and out RDF parser, look at Jena:
http://incubator.apache.org/jena
It's an Apache project that is also nicely Maven'ed up.

The Commons Digester FAQ says:
Occasionally, people ask how they can fire a rule for an element based on the value of an attribute
There is no simple way to do this with Digester; the built-in rule-matching engines only provide the ability to match on element name. There is no support available for XPath expressions
It might be possible to create a custom "filtering" rule that has a child rule, and fires that child rule only when the appropriate conditions are set. There are no examples of such a solution, however.
Digester isn't a very good tool. It's too simplistic. Consider using a more comprehensive event-based API such as StAX.

how to create an odt file programmatically with java?

How can I create an odt (LibreOffice/OpenOffice Writer) file with Java programmatically? A "hello world" example will be sufficient. I looked at the OpenOffice website but the documentation wasn't clear.

Take a look at ODFDOM - the OpenDocument API
ODFDOM is a free OpenDocument Format
(ODF) library. Its purpose is to
provide an easy common way to create,
access and manipulate ODF files,
without requiring detailed knowledge
of the ODF specification. It is
designed to provide the ODF developer
community with an easy lightwork
programming API portable to any
object-oriented language.
The current reference implementation
is written in Java.
// Create a text document from a standard template (empty documents within the JAR)
OdfTextDocument odt = OdfTextDocument.newTextDocument();
// Append text to the end of the document.
odt.addText("This is my very first ODF test");
// Save document
odt.save("MyFilename.odt");
later
As of this writing (2016-02), we are told that these classes are deprecated... big time, and the OdfTextDocument API documentation tells you:
As of release 0.8.8, replaced by org.odftoolkit.simple.TextDocument in
Simple API.
This means you still include the same active .jar file in your project, simple-odf-0.8.1-incubating-jar-with-dependencies.jar, but you want to be unpacking the following .jar to get the documentation: simple-odf-0.8.1-incubating-javadoc.jar, rather than odfdom-java-0.8.10-incubating-javadoc.jar.
Incidentally, the documentation link downloads a bunch of jar files inside a .zip which says "0.6.1"... but most of the stuff inside appears to be more like 0.8.1. I have no idea why they say "as of 0.8.8" in the documentation for the "deprecated" classes: just about everything is already marked deprecated.
The equivalent simple code to the above is then:
odt_doc = org.odftoolkit.simple.TextDocument.newTextDocument()
para = odt_doc.getParagraphByIndex( 0, False )
para.appendTextContent( 'stuff and nonsense' )
odt_doc.save( 'mySpankingNewFile.odt' )
PS am using Jython, but the Java should be obvious.

I have not tried it, but using JOpenDocument may be an option. (It seems to be a pure Java library to generate OpenDocument files.)

A complement of previously given solutions would be JODReports, which allows creating office documents and reports in ODT format (from templates, composed using the LibreOffice/OpenOffice.org Writer word processor).
DocumentTemplateFactory templateFactory = new DocumentTemplateFactory();
DocumentTemplate template = templateFactory .getTemplate(new File("template.odt"));
Map data = new HashMap();
data.put("title", "Title of my doc");
data.put("picture", new RenderedImageSource(ImageIO.read(new File("/tmp/lena.png"))));
data.put("answer", "42");
//...
template.createDocument(data, new FileOutputStream("output.odt"));
Optionally the documents can then be converted to PDF, Word, RTF, etc. with JODConverter.
Edit/update
Here you can find a sample project using JODReports (with non-trivial formatting cases).

I have written a jruby DSL for programmatically manipulating ODF documents.
https://github.com/noah/ocelot
It's not strictly java, but it aims to be much simpler to use than the ODFDOM.
Creating a hello world document is as easy as:
% cat examples/hello.rb
include OCELOT
Text::create "hello" do
paragraph "Hello, world!"
end
There are a few more examples (including a spreadsheet example or two) here.

I have been searching for an answer about this question for myself. I am working on a project for generating documents with different formats and I was in a bad need for library to generate ODT files.
I finally can say the that ODFToolkit with the latest version of the simple-odf library is the answer for generating text documents.
You can find the the official page here :
Apache ODF Toolkit(Incubating) - Simple API
Here is a page to download version 0.8.1 (the latest version of Simple API) as I didn't find the latest version at the official page, only version 0.6.1
And here you can find Apache ODF Toolkit (incubating) cookbook

You can try using JasperReports to generate your reports, then export it to ODS. The nice thing about this approach is
you get broad support for all JasperReports output formats, e.g. PDF, XLS, HTML, etc.
Jasper Studio makes it easy to design your reports

The ODF Toolkit project (code hosted at Github) is the new home of the former ODFDOM project, which was until 2018-11-27 a Apache Incubator project.

the solution may be JODF Java API Independentsoft company.
For example, if we want to create an Open Document file using this Java API we could do the following:
import com.independentsoft.office.odf.Paragraph;
import com.independentsoft.office.odf.TextDocument;
public class Example {
public static void main(String[] args)
{
try
{
TextDocument doc = new TextDocument();
Paragraph p1 = new Paragraph();
p1.add("Hello World");
doc.getBody().add(p1);
doc.save("c:\\test\\output.odt", true);
}
catch (Exception e)
{
System.out.println(e.getMessage());
e.printStackTrace();
}
}
}
There are also .NET solutions for this API.

How can I transform xml to html on android?

I'm relatively new to java and android, in my android application I need to get an XML file, transform it and show it to a user.
I know how to parse XML, but I don't want to parse it and generate views after. I'd like to transform it to an HTML and display in a WebView.
I'm trying to find something on the Internet but can't find anything.
How can I do it? Any ideas or links will be appreciated,
thanks

The usual tool to use for transforming XML to HTML is XSLT. Search SO for XSLT tutorial and you'll get some good results.
Here is another question showing how one developer used XSLT on Android.
You can also search for examples of the use of Transformer in Java, as in this helpful article:
// JAXP reads data using the Source interface
Source xmlSource = new StreamSource(xmlFile);
Source xsltSource = new StreamSource(xsltFile);
// the factory pattern supports different XSLT processors
TransformerFactory transFact =
TransformerFactory.newInstance();
Transformer trans = transFact.newTransformer(xsltSource);
trans.transform(xmlSource, new StreamResult(System.out));
Update:
For older versions of the Android java API:
This article shows how to use a SAX parser to parse the input XML, then use XmlSerializer to output XML. The latter could easily output whatever XHTML you want. Both are available since API level 1.
Unfortunately I don't see a way to do XPath in API level 3, but if your input XML isn't too complex you should be able to code your own transformations. I know you "don't want to parse it and generate view after", but if you mean you don't want to even use an XML parser that's provided by Android, then I don't know of any alternative.
Update 2:
I just learned about XOM, which supports a subset of XPath. This question shows someone using XOM on Android (to write XML) with API level 4. You could take advantage of the XPath features, as well as the serialization features. It requires a small external library, XOM's jar. I don't know if it's compatible with API level 3.

Please see the complete working example, created as a part of "AndStatus" application.
It uses XSLT to show localized (!) Application Change Log in a WebView of the HelpActivity.
The working example consists of these parts:
The small utility class without any dependencies (i.e. it may be easily reused) which has this function:
/**
Transform XML input files using supplied XSL stylesheet and show it in the WebView
#param activity Activity hosting the WebView
#param resView WebView in which the output should be shown
#param resXml XML file to transform. This file is localized! It should be put into "raw-" folder
#param resXsl XSL stylesheet. In the "raw" folder. May be single for all languages...
*/
public static void toWebView(Activity activity, int resView, int resXml, int resXsl) {
...
}
See full source code here: Xslt.java
The example of its usage: See HelpActivity.java
The XML file to be transformed: changes.xml
and the corresponding XSL stylesheet: changesxsl.xsl

Use XSLT, XSLT convert your XML in to HTML that we can display in Android Webview.
This question will help.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.