i'm trying to write the header for an xml file so it would be something like this:
<file xmlns="http://my_namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://my_namespace file.xsd">
however, I can't seem to find how to do it using the Document class in java. This is what I have:
public void exportToXML() {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
doc.setXmlStandalone(true);
doc.createTextNode("<file xmlns=\"http://my_namespace"\n" +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
"xsi:schemaLocation=\"http://my_namespace file.xsd\">");
Element mainRootElement = doc.createElement("MainRootElement");
doc.appendChild(mainRootElement);
for(int i = 0; i < tipoDadosParaExportar.length; i++) {
mainRootElement.appendChild(criarFilhos(doc, tipoDadosParaExportar[i]));
}
Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.transform(new DOMSource(doc),
new StreamResult(new FileOutputStream(filename)));
} catch (Exception e) {
e.printStackTrace();
}
}
I tried writing it on the file using the createTextNode but it didn't work either, it only writes the version before showing the elements.
PrintStartXMLFile
Would appreciate if you could help me. Have a nice day
Your createTextNode() method is only suitable for creating text nodes, it's not suitable for creating elements. You need to use createElement() for this. If you're doing this by building a tree, then you need to build nodes, you can't write lexical markup.
I'm not sure what MainRootElement is supposed to be; you've only given a fragment of your desired output so it's hard to tell.
Creating a DOM tree and then serializing it is a pretty laborious way of constructing an XML file. Using something like an XMLEventWriter is easier. But to be honest, I got frustrated by all the existing approaches and wrote a new library for the purpose as part of Saxon 10. It's called simply "Push", and looks something like this:
Processor proc = new Processor();
Serializer serializer = proc.newSerializer(new File(fileName));
Push push = proc.newPush(serializer);
Document doc = push.document(true);
doc.setDefaultNamespace("http://my_namespace");
Element root = doc.element("root")
.attribute(new QName("xsi", "http://www.w3.org/2001/XMLSchema-instance", "schemaLocation"),
"http://my_namespace file.xsd");
doc.close();
Related
Consider the code fragment that I have at the moment which works and the right elements are found and placed into my map:
public void importXml(InputSource emailAttach)throws Exception {
Map<String, String> hWL = new HashMap<String, String>();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(emailAttach);
FileOutputStream fos=new FileOutputStream("temp.xml");
OutputStreamWriter os = new OutputStreamWriter(fos,"UTF-8");
// Transform to XML UTF-8 format
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(doc), new StreamResult(os));
os.close();
fos.close();
doc = db.parse(new File("temp.xml"));
NodeList nl = doc.getElementsByTagName("Email");
Element eE=(Element)nl.item(0);
int ctr=eE.getChildNodes().getLength();
String sNName;
String sNValue;
Node nTemp;
for (int i=0;i<ctr;i++){
nTemp=eE.getChildNodes().item(i);
sNName=nTemp.getNodeName().toUpperCase().trim();
if (nTemp.getChildNodes().item(0)!=null) {
sNValue=nTemp.getChildNodes().item(0).getNodeValue().trim();
hWL.put(sNName,sNValue);
}
}
}
However I prefer not to create a temp file first after converting the data to UTF-8 and parsing from the temp file. Is there anyway I can do this?
I've tried using a ByteArrayOutputStream in place of OutputStreamWriter, and calling toString() on the ByteArrayOutputStream as such:
doc = db.parse(bos.toString("UTF-8");
But then my Map ends up being empty.
From the API docs (the ability of its meticulous studying is a valuable asset for any programmer) - the parse method with the String argument seems to take something different from what you feed to it:
Document parse(String uri)
Parse the content of the given URI as an XML document and return a new DOM >Document object.
This might be your friend:
db.parse ( new ByteArrayInputStream( bos.toByteArray()));
Update
#user2496748 sorry I should have searched for the API but instead I was looking at the source code through a decompiler which tells me the parameter is arg0 instead of uri. Big difference.
I think I understand stream readers/writers and byte to char or vice versa a little more now.
After some review I was able to simply my code to this and achieve what I wanted to do. Since I am able to get the email attachment as a InputSource:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
emailAttach.setEncoding("UTF-8");
Document doc = db.parse(emailAttach);
Works as well and tested with non-english characters.
You don't need to write and re-read and re-parse the transformed document. Just change this:
t.transform(new DOMSource(doc), new StreamResult(os));
to this:
DOMResult result = new DOMResult();
t.transform(new DOMSource(doc), result);
doc = (Document)result.getNode();
and then continue from after your present doc = db.parse(new File("temp.xml"));.
I have some 3000 elements in arraylist.Need to write these 3000 elements to xml.
The execution is taking too much time like 20 minutes in eclipse.
Is there any efficient way to do this?
OR any modifications to my code?
The elements in arraylist are supposed to grow in future...
MY code snippet..
---------------------------------------------------
---------------------------------------------------
for(int i=0;i<candidates.size();i++)//Candidates is my arraylist
{
String text=candidates.get(i);
//System.out.println(text);
text=text+"\n";
file= new File("./test.xml");
WriteToXML wr= new WriteToXML(file,"fullname",text);
}
-------------------------------------------------------
-------------------------------------------------------
//WritetoXML class constructor
public WriteToXML(File xml,String tag,String data)
{
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(xml)));
Element element = doc.getDocumentElement();
NodeList node1 = doc.getElementsByTagName(tag);
Element fn= (Element) node1.item(0);
Text text = doc.createTextNode(data);
fn.appendChild(text);
printtoXML(doc);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
public static final void printtoXML(Document xml) throws Exception {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(xml);
tf.transform(source, result);
String xmlString = sw.toString();
File file= new File(xml);
FileWriter fw=new FileWriter(file,false);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(xmlString);
bw.flush();
bw.close();
}
Right now you are doing this for each of your 3000 elements:
Open the file
Parse the document in there
add an element to the dom structure
flush the file to disk and close it
Way faster would be to do steps 1, 2 and 4 only once (1 & 2 before the loop; 4 after the loop) and just do step 3 for every element in your list (in the loop).
Just write a new method which takes your tag variable and a Document instance and add the tag to the document.
What is really expensive here is the multiple conversion of Object structures to XML and back. This has a huge overhead.
File IO is also giving a lot of overhead, but even this should be small compared to the multiple creation and parsing of DOM structures.
put the for loop inside the WriteToXML().
Main function:
file= new File("./test.xml");
WriteToXML wr= new WriteToXML(file,"fullname",candidates)
Inside WriteToXML
public WriteToXML(File xml,String tag,List candidates )
{
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(xml)));
Element element = doc.getDocumentElement();
NodeList node1 = doc.getElementsByTagName(tag);
Element fn= (Element) node1.item(0);
for (int i=0;i<candidates.size();i++) {
Text text = doc.createTextNode(candidates.get(i)+"\n");
fn.appendChild(text);
}
printtoXML(doc);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
This way you are not re-parsing the XML all the time and write it only once.
I've tried to do minimal changes. I would not recommend doing this in the constructor - unless there is a good reason to do so.
Use SAX instead of DOM. SAX is more efficient than DOM
as #Masud suggests, Use SAX there is no other way. There is a great example about this Generating XML from an Arbitrary Data Structure
I am starting an Android application that will parse XML from the web. I've created a few Android apps but they've never involved parsing XML and I was wondering if anyone had any tips on the best way to go about it?
Here's an example:
try {
URL url = new URL(/*your xml url*/);
URLConnection conn = url.openConnection();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
NodeList nodes = doc.getElementsByTagName(/*tag from xml file*/);
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName(/*item within the tag*/);
Element line = (Element) title.item(0);
phoneNumberList.add(line.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
In my example, my XML file looks a little like:
<numbers>
<phone>
<string name = "phonenumber1">555-555-5555</string>
</phone>
<phone>
<string name = "phonenumber2">555-555-5555</string>
</phone>
</numbers>
and I would replace /*tag from xml file*/ with "phone" and /*item within the tag*/ with "string".
I always use the w3c dom classes. I have a static helper method that I use to parse the xml data as a string and returns to me a Document object. Where you get the xml data can vary (web, file, etc) but eventually you load it as a string.
something like this...
Document document = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(data));
document = builder.parse(is);
}
catch (SAXException e) { }
catch (IOException e) { }
catch (ParserConfigurationException e) { }
There are different types of parsing mechanisms available, one is SAX Here is SAX parsing example, second is DOM parsing Here is DOM Parsing example.. From your question it is not clear what you want, but these may be good starting points.
There are three types of parsing I know: DOM, SAX and XMLPullParsing.
In my example here you need the URL and the parent node of the XML element.
try {
URL url = new URL("http://www.something.com/something.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("parent node here");
for (int i = 0; i < nodeList1.getLength(); i++) {
Node node = nodeList1.item(i);
}
} catch(Exception e) {
}
Also try this.
I would use the DOM parser, it is not as efficient as SAX, if the XML file is not too large, as it is easier in that case.
I have made just one android App, that involved XML parsing. XML received from a SOAP web service. I used XmlPullParser. The implementation from Xml.newPullParser() had a bug where calls to nextText() did not always advance to the END_TAG as the documentation promised. There is a work around for this.
I try to transform XML document using XSLT. As an input I have www.wordpress.org XHTML source code, and XSLT is dummy example retrieving site's title (actually it could do nothing - it doesn't change anything).
Every single API or library I use, transformation takes about 2 minutes! If you take a look at wordpress.org source, you will notice that it is only 183 lines of code. As I googled it is probably due to DOM tree building. No matter how simple XSLT is, it is always 2 minutes - so it confirms idea that it's related to DOM building, but anyway it should not take 2 minutes in my opinion.
Here is an example code (nothing special):
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = null;
try {
transformer = tFactory.newTransformer(
new StreamSource("/home/pd/XSLT/transf.xslt"));
} catch (TransformerConfigurationException e) {
e.printStackTrace();
}
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
System.out.println("START");
try {
transformer.transform(new SAXSource(new InputSource(
new FileInputStream("/home/pd/XSLT/wordpress.xml"))),
new StreamResult(outputStream));
} catch (TransformerException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("STOP");
System.out.println(new String(outputStream.toByteArray()));
It's between START and STOP where java "pauses" for 2 minutes. If I take a look at the processor or memory usage, nothing increases. It looks like really JVM stopped...
Do you have any experience in transforming XMLs that are longer than 50 (this is random number ;)) lines? As I read XSLT always needs to build DOM tree in order to do its work. Fast transformation is crucial for me.
Thanks in advance,
Piotr
Does the sample HTML file use namespaces? If so, your XML parser may be attempting to retrieve contents (a schema, perhaps) from the namespace URIs. This is likely if each run takes exactly two minutes -- it's likely one or more TCP timeouts.
You can verify this by timing how long it takes to instantiate your InputSource object (where the WordPress XML is actually parsed), as this is likely the line which is causing the delay. After reviewing the sample file you posted, it does include a declared namespace (xmlns="http://www.w3.org/1999/xhtml").
To work around this, you can implement your own EntityResolver which essentially disables the URL-based resolution. You may need to use a DOM -- see DocumentBuilder's setEntityResolver method.
Here's a sample using DOM and disabling resolution (note -- this is untested):
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbFactory.newDocumentBuilder();
db.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
return null; // Never resolve any IDs
}
});
System.out.println("BUILDING DOM");
Document doc = db.parse(new FileInputStream("/home/pd/XSLT/wordpress.xml"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(
new StreamSource("/home/pd/XSLT/transf.xslt"));
System.out.println("RUNNING TRANSFORM");
transformer.transform(
new DOMSource(doc.getDocumentElement()),
new StreamResult(outputStream));
System.out.println("TRANSFORMED CONTENTS BELOW");
System.out.println(outputStream.toString());
} catch (Exception e) {
e.printStackTrace();
}
If you want to use SAX, you would have to use a SAXSource with an XMLReader which uses your custom resolver.
The commenters who've posted that the answer likely resides with the EntityResolver are probably correct. However, the solution may not be to simply not load the schemas but rather load them from the local file system.
So you could do something like this
db.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
try {
FileInputStream fis = new FileInputStream(new File("classpath:xsd/" + systemId));
InputSource is = new InputSource(fis);
return is
} catch (FileNotFoundException ex) {
logger.error("File Not found", ex);
return null;
}
}
});
Chances are the problem isn't with the call transfomer.transform. It's more likely that you are doing something in your xslt that is taking forever. My suggestion would be use a tool like Oxygen or XML Spy to profile your XSLT and find out which templates are taking the longest to execute. Once you've determined this you can begin to optimize the template.
If you are debugging your code on an android device, make sure you try it without eclipse attached to the process. When I was debugging my app xslt transformations were taking 8 seconds, where the same process took a tenth of a second on ios in native code. Once I ran the code without eclipse attached to it, the process took a comparable amount of time to the c based counterpart.
I have the following code which turns a string, that I pass into the function, into a document:
DocumentBuilderFactory dbFactory_ = DocumentBuilderFactory.newInstance();
Document doc_;
void toXml(String s)
{
documentBuild();
DocumentBuilder dBuilder = dbFactory_.newDocumentBuilder();
StringReader reader = new StringReader(s);
InputSource inputSource = new InputSource(reader);
doc_ = dBuilder.parse(inputSource);
}
The problem is that some of the legacy code that I'm using passes into this toXml function a single word like RANDOM or FICTION. I would like to turn these calls into valid xml before trying to parse it. Right now if I call the function with s = FICTION it returns a SAXParseExeption error. Could anyone advise me on the right way to do this? If you have any questions let me know.
Thank you for your time
-Josh
This creates an XmlDocument with an element test
function buildXml(string s) {
XmlDocument d = new XmlDocument();
d.AppendChild(d.CreateElement(s));
StringWriter sw = new StringWriter();
XmlTextWriter xw = new XmlTextWriter(sw);
d.WriteTo(xw);
return sw.ToString();
}
buildXml("Test"); //This will return <Test />
Its a bit ugly but it will create the XML without having to do any string work on your own ;)
You could add this in a try catch in your method so if it fails to load it as an XML directly it passes the string to this and then tries to load it.
Have you tried the seemingly obvious <FICTION/> or <FICTION></FICTION>?