Bad Characters when parsing GML in Java - java

I'm using the org.w3c.dom package to parse the gml schemas (http://schemas.opengis.net/gml/3.1.0/base/).
When I parse the gmlBase.xsd schema and then save it back out, the quote characters around GeometryCollections in the BagType complex type come out converted to bad characters (See code below).
Is there something wrong with how I'm parsing or saving the xml, or is there something in the schema that is off?
Thanks,
Curtis
public static void main(String[] args) throws IOException
{
File schemaFile = File.createTempFile("gml_", ".xsd");
FileUtils.writeStringToFile(schemaFile, getSchema(new URL("http://schemas.opengis.net/gml/3.1.0/base/gmlBase.xsd")));
System.out.println("wrote file: " + schemaFile.getAbsolutePath());
}
public static String getSchema(URL schemaURL)
{
try
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(IOUtils.toString(schemaURL.openStream()))));
Element rootElem = doc.getDocumentElement();
rootElem.normalize();
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(doc);
ByteArrayOutputStream xmlOutStream = new ByteArrayOutputStream();
StreamResult result = new StreamResult(xmlOutStream);
transformer.transform(source, result);
return xmlOutStream.toString();
}
catch (Exception e)
{
e.printStackTrace();
}
return "";
}

I'm suspicious of this line:
Document doc = db.parse(new InputSource(
new StringReader(IOUtils.toString(schemaURL.openStream()))));
I don't know what IOUtils.toString does here but presumably it's assuming a particular encoding, without taking account of the XML declaration.
Why not just use:
Document doc = db.parse(schemaURL.openStream());
Likewise your FileUtils.writeStringToFile doesn't appear to specify a character encoding... which encoding does it use, and why encoding is in the StreamResult?

Related

Import and parse an xml file without FileOutputStream

Consider the code fragment that I have at the moment which works and the right elements are found and placed into my map:
public void importXml(InputSource emailAttach)throws Exception {
Map<String, String> hWL = new HashMap<String, String>();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(emailAttach);
FileOutputStream fos=new FileOutputStream("temp.xml");
OutputStreamWriter os = new OutputStreamWriter(fos,"UTF-8");
// Transform to XML UTF-8 format
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(doc), new StreamResult(os));
os.close();
fos.close();
doc = db.parse(new File("temp.xml"));
NodeList nl = doc.getElementsByTagName("Email");
Element eE=(Element)nl.item(0);
int ctr=eE.getChildNodes().getLength();
String sNName;
String sNValue;
Node nTemp;
for (int i=0;i<ctr;i++){
nTemp=eE.getChildNodes().item(i);
sNName=nTemp.getNodeName().toUpperCase().trim();
if (nTemp.getChildNodes().item(0)!=null) {
sNValue=nTemp.getChildNodes().item(0).getNodeValue().trim();
hWL.put(sNName,sNValue);
}
}
}
However I prefer not to create a temp file first after converting the data to UTF-8 and parsing from the temp file. Is there anyway I can do this?
I've tried using a ByteArrayOutputStream in place of OutputStreamWriter, and calling toString() on the ByteArrayOutputStream as such:
doc = db.parse(bos.toString("UTF-8");
But then my Map ends up being empty.
From the API docs (the ability of its meticulous studying is a valuable asset for any programmer) - the parse method with the String argument seems to take something different from what you feed to it:
Document parse(String uri)
Parse the content of the given URI as an XML document and return a new DOM >Document object.
This might be your friend:
db.parse ( new ByteArrayInputStream( bos.toByteArray()));
Update
#user2496748 sorry I should have searched for the API but instead I was looking at the source code through a decompiler which tells me the parameter is arg0 instead of uri. Big difference.
I think I understand stream readers/writers and byte to char or vice versa a little more now.
After some review I was able to simply my code to this and achieve what I wanted to do. Since I am able to get the email attachment as a InputSource:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
emailAttach.setEncoding("UTF-8");
Document doc = db.parse(emailAttach);
Works as well and tested with non-english characters.
You don't need to write and re-read and re-parse the transformed document. Just change this:
t.transform(new DOMSource(doc), new StreamResult(os));
to this:
DOMResult result = new DOMResult();
t.transform(new DOMSource(doc), result);
doc = (Document)result.getNode();
and then continue from after your present doc = db.parse(new File("temp.xml"));.

Replace text in XML before XSLT

I need to replace a certain text in a XML file before giving it to the XSL-Transformer.
It's the DTD-URL in the DOCTYPE tag. It points to a webserver, but I want it to be usable offline, so I want to change it to a URL pointing to a local file.
However I mustn't edit the original XML directly. I thought of reading the file into a string, use String.replaceAll() on the text and save it into another file, which I pass to the Transformer. I already tried it, but it's really slow; the file I'm using has a size of ca. 500kiB.
Is there any better (=faster) way to accomplish this?
EDIT: The code used for the transformation:
public String getPlaylist(String playlist) {
Source source = new StreamSource(library);
StreamSource xsl = new StreamSource(getClass().getResourceAsStream("M3Utransformation.xml"));
StringWriter w = new StringWriter();
Result result = new StreamResult(w);
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
transformer.setParameter("playlist", playlist);
transformer.transform(source, result);
return w.getBuffer().toString();
} catch (Throwable t) {
t.printStackTrace();
return null;
}
}
You can create an entity resolver, and make use of it.
The following example uses the JAXP DocumentBuilder, and a CatalogResolver
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, TransformerConfigurationException, TransformerException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new CatalogResolver());
File src = new File("src/article.xml");
Document doc = db.parse(src);
// Here, we execute the transformation
// Use a Transformer for output
File stylesheet = new File("src/aticle.xsl");
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = tFactory.newTransformer(stylesource);
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
}
create a catalog properties file, and place it on your classpath
CatalogManager.properties has to be the name, see CatalogManager API documentation
define a catalog XML file, point your properties file, above to it. From
http://www.xml.com/pub/a/2004/03/03/catalogs.html you can find a very simple catalog XML file :
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OYRM/foo" uri="src/bar.dtd"/>
</catalog>
With the above catalog.xml and CatalogManager.properties, you'll end up resolving references to the publicId "-//OYRM/foo" to the uri src/bar.dtd
xml-commons contains the resolver :
http://xerces.apache.org/mirrors.cgi#binary
for a more complete treatment of the topic of Resolvers read Tom White's article from XML.com
The transformer application was cribbed from the Java trail for Extensible StyleSheet Language Transformations > Transforming Data with XSLT

append node to an xml in Java

I can't append correctly some info to my xml file. That's the scrivi function
public String scrivi (Document doc, File dest)
{
try
{
DOMSource sorgente = new DOMSource (doc);
StreamResult sr = new StreamResult (dest);
TransformerFactory tf =
TransformerFactory.newInstance();
Transformer transf = tf.newTransformer();
transf.transform (sorgente, sr);
return "Tutto ok";
}
catch (TransformerConfigurationException tce)
{
System.out.println(tce.getMessage());
return "<h1> Config </h1>";
}
catch (TransformerException te)
{
System.out.println(te.getMessage());
return "<h1> Transformer Errore </h1>";
}
}
and tath is my code:
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(getClass().getResourceAsStream("/azioni.xml"));
Element root = document.getDocumentElement();
Element new_azione = document.createElement("azione");
Element id = document.createElement("id_azione");
id.setTextContent(id_azione);
Element nome = document.createElement("nome_azione");
nome.setTextContent(nome_azione);
Element prezzo_a = document.createElement("prezzo");
prezzo_a.setTextContent(prezzo);
new_azione.appendChild(id);
new_azione.appendChild(nome);
new_azione.appendChild(prezzo_a);
document.getDocumentElement().appendChild(new_azione);
String nomexmlOut="/azioni.xml";
File filedest = new File(nomexmlOut);
out.println(this.scrivi(document, filedest));
}
I get the error Transformer Errore ... how can I solve? what's Wrong?
* UPDATE *
Error Info
java.io.FileNotFoundException: /azioni.xml (Permission denied)
Hard to tell without actual exception trace or message, but my guess is that your problem is the ouput stream.
File("/azioni.xml");
is not the same as
getClass().getResourceAsStream("/azioni.xml")
Try with directing the output to system out and see if it works. i.e. declare scrivi
public String scrivi (Document doc, OutputStream out)
and call it with
scrivi(document, System.out);
UPDATE:
To write to the same file location, try something like this (untested)
File out = new File(getClasss().getResource("...").getFile());
and make sure that you close the input stream that you originally read from, before trying to write.

How to avoid encoding of <,>,& with Document.createTextNode

class XMLencode
{
public static void main(String[] args)
{
try{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Text elmnt=doc.createTextNode("<data>sun</data><abcdefg/><end/>");
root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
}catch(Exception e){
System.out.println(e.getMessage());
}
}
}
Here is my above piece of code.
The output generated is like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
I dont want the tags to be encoded. I need the output in this fashion.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
Please help me on this.
Thanks,
Mohan
Short Answer
You could leverage the CDATA mechanism in XML to prevent characters from being escaped. Below is an example of the DOM code:
doc.createCDATASection("<foo/>");
The content will be:
<![CDATA[<foo/>]]>
LONG ANSWER
Below is a complete example of leveraging a CDATA section using the DOM APIs.
package forum12525152;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.newDocument();
Element rootElement = document.createElement("root");
document.appendChild(rootElement);
// Create Element with a Text Node
Element fooElement = document.createElement("foo");
fooElement.setTextContent("<foo/>");
rootElement.appendChild(fooElement);
// Create Element with a CDATA Section
Element barElement = document.createElement("bar");
CDATASection cdata = document.createCDATASection("<bar/>");
barElement.appendChild(cdata);
rootElement.appendChild(barElement);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
t.transform(source, result);
}
}
Output
Note the difference in the foo and bar elements even though they have similar content. I have formatted the result of running the demo code to make it more readable:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<foo><foo/></foo>
<bar><![CDATA[<bar/>]]></bar>
</root>
Instead of writing like this doc.createTextNode("<data>sun</data><abcdefg/><end/>");
You should create each element.
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
class XMLencode {
public static void main(String[] args) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Element data = doc.createElement("data");
root.appendChild(data);
Text elemnt = doc.createTextNode("sun");
data.appendChild(elemnt);
Element data1 = doc.createElement("abcdefg");
root.appendChild(data1);
//Text elmnt = doc.createTextNode("<data>sun</data><abcdefg/><end/>");
//root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
You can use the doc.createTextNode and use a workaround (long) for the escaped characters.
SOAPMessage msg = messageContext.getMessage();
header.setTextContent(seched);
Then use
Source src = msg.getSOAPPart().getContent();
To get the content, the transform it to string
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer. setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result1 = new StreamResult(new StringWriter());
transformer.transform(src, result1);
Replace the string special characters
String xmlString = result1.getWriter().toString()
.replaceAll("<", "<").
replaceAll(">", ">");
System.out.print(xmlString);
the oposite string to dom with the fixed escaped characters
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlString));
Document doc = db.parse(is);
Source src123 = new DOMSource(doc);
Then set it back to the soap message
msg.getSOAPPart().setContent(src123);
Don't use createTextNode - the whole point of it is to insert some text (as data) into the document, not a fragment of raw XML.
Use a combination of createTextNode for the text and createElement for the elements.
I dont want the tags to be encoded. I need the output in this fashion.
Then you don't want a text node at all - which is why createTextNode isn't working for you. (Or rather, it's working fine - it's just not doing what you want). You should probably just parse your XML string, then import the document node from the result into your new document.
Of course, if you know the elements beforehand, don't express them as text in the first place - use a mixture of createElement, createAttribute, createTextNode and appendChild to create the structure.
It's entirely possible that something like JDOM will make this simpler, but that's the basic approach.
Mohan,
You can't use Document.createTextNode(). That methos transforms (or escapes) the charactes in your XML.
Instead, you need to build two separate Documents from the 2 XML's and use importNode.
I use Document.importNode() like this to solve my problem:
Build your builders:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document oldDoc = builder.parse(isOrigXml); //this is XML as InputSource
Document newDoc = builder.parse(isInsertXml); //this is XML as InputSource
Next, build a NodeList of the Element/Node you want to import. Create a Node from the NodeList. Create another Node of what you are going to import using importNode. Build the last Node of the final XML as such:
NodeList nl = newDoc.getElementByTagName("roseindia"); //or whatever the element name is
Node xmlToInsert = nl.item(0);
Node importNode = oldDoc.importNode(xmlToImport, true);
Node target = ((NodeList) oldDoc.getElementsByTagName("ELEMENT_NAME_OF_LOCATION")).item(0);
target.appendChild(importNode);
Source source = new DOMSource(target);
....
The rest is standard Transformer - StringWriter to StreamResult stuff to get the results.

Java, XML DocumentBuilder - setting the encoding when parsing

I'm trying to save a tree (extends JTree) which holds an XML document to a DOM Object having changed it's structure.
I have created a new document object, traversed the tree to retrieve the contents successfully (including the original encoding of the XML document), and now have a ByteArrayInputStream which has the tree contents (XML document) with the correct encoding.
The problem is when I parse the ByteArrayInputStream the encoding is changed to UTF-8 (in the XML document) automatically.
Is there a way to prevent this and use the correct encoding as provided in the ByteArrayInputStream.
It's also worth adding that I have already used the
transformer.setOutputProperty(OutputKeys.ENCODING, encoding) method to retrieve the right encoding.
Any help would be appreciated.
Here's an updated answer since OutputFormat is deprecated :
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");
The second part will return the XML Document as String
// Read XML
String xml = "xml"
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
// Append formatting
OutputFormat format = new OutputFormat(document);
if (document.getXmlEncoding() != null) {
format.setEncoding(document.getXmlEncoding());
}
format.setLineWidth(100);
format.setIndenting(true);
format.setIndent(5);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
String result = out.toString();
I solved it, given alot of trial and errors.
I was using
OutputFormat format = new OutputFormat(document);
but changed it to
OutputFormat format = new OutputFormat(d, encoding, true);
and this solved my problem.
encoding is what I set it to be
true refers to whether or not indent is set.
Note to self - read more carefully - I had looked at the javadoc hours ago - if only I'd have read more carefully.
This worked for me and is very simple. No need for a transformer or output formatter:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(inputStream);
is.setEncoding("ISO-8859-1"); // set your encoding here
Document document = builder.parse(is);

Categories