Writing text to XML is taking long time --Java - java

I have some 3000 elements in arraylist.Need to write these 3000 elements to xml.
The execution is taking too much time like 20 minutes in eclipse.
Is there any efficient way to do this?
OR any modifications to my code?
The elements in arraylist are supposed to grow in future...
MY code snippet..
---------------------------------------------------
---------------------------------------------------
for(int i=0;i<candidates.size();i++)//Candidates is my arraylist
{
String text=candidates.get(i);
//System.out.println(text);
text=text+"\n";
file= new File("./test.xml");
WriteToXML wr= new WriteToXML(file,"fullname",text);
}
-------------------------------------------------------
-------------------------------------------------------
//WritetoXML class constructor
public WriteToXML(File xml,String tag,String data)
{
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(xml)));
Element element = doc.getDocumentElement();
NodeList node1 = doc.getElementsByTagName(tag);
Element fn= (Element) node1.item(0);
Text text = doc.createTextNode(data);
fn.appendChild(text);
printtoXML(doc);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
public static final void printtoXML(Document xml) throws Exception {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(xml);
tf.transform(source, result);
String xmlString = sw.toString();
File file= new File(xml);
FileWriter fw=new FileWriter(file,false);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(xmlString);
bw.flush();
bw.close();
}

Right now you are doing this for each of your 3000 elements:
Open the file
Parse the document in there
add an element to the dom structure
flush the file to disk and close it
Way faster would be to do steps 1, 2 and 4 only once (1 & 2 before the loop; 4 after the loop) and just do step 3 for every element in your list (in the loop).
Just write a new method which takes your tag variable and a Document instance and add the tag to the document.
What is really expensive here is the multiple conversion of Object structures to XML and back. This has a huge overhead.
File IO is also giving a lot of overhead, but even this should be small compared to the multiple creation and parsing of DOM structures.

put the for loop inside the WriteToXML().
Main function:
file= new File("./test.xml");
WriteToXML wr= new WriteToXML(file,"fullname",candidates)
Inside WriteToXML
public WriteToXML(File xml,String tag,List candidates )
{
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(xml)));
Element element = doc.getDocumentElement();
NodeList node1 = doc.getElementsByTagName(tag);
Element fn= (Element) node1.item(0);
for (int i=0;i<candidates.size();i++) {
Text text = doc.createTextNode(candidates.get(i)+"\n");
fn.appendChild(text);
}
printtoXML(doc);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
This way you are not re-parsing the XML all the time and write it only once.
I've tried to do minimal changes. I would not recommend doing this in the constructor - unless there is a good reason to do so.

Use SAX instead of DOM. SAX is more efficient than DOM

as #Masud suggests, Use SAX there is no other way. There is a great example about this Generating XML from an Arbitrary Data Structure

Related

Link XML and XSD using java

i'm trying to write the header for an xml file so it would be something like this:
<file xmlns="http://my_namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://my_namespace file.xsd">
however, I can't seem to find how to do it using the Document class in java. This is what I have:
public void exportToXML() {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
doc.setXmlStandalone(true);
doc.createTextNode("<file xmlns=\"http://my_namespace"\n" +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
"xsi:schemaLocation=\"http://my_namespace file.xsd\">");
Element mainRootElement = doc.createElement("MainRootElement");
doc.appendChild(mainRootElement);
for(int i = 0; i < tipoDadosParaExportar.length; i++) {
mainRootElement.appendChild(criarFilhos(doc, tipoDadosParaExportar[i]));
}
Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.transform(new DOMSource(doc),
new StreamResult(new FileOutputStream(filename)));
} catch (Exception e) {
e.printStackTrace();
}
}
I tried writing it on the file using the createTextNode but it didn't work either, it only writes the version before showing the elements.
PrintStartXMLFile
Would appreciate if you could help me. Have a nice day
Your createTextNode() method is only suitable for creating text nodes, it's not suitable for creating elements. You need to use createElement() for this. If you're doing this by building a tree, then you need to build nodes, you can't write lexical markup.
I'm not sure what MainRootElement is supposed to be; you've only given a fragment of your desired output so it's hard to tell.
Creating a DOM tree and then serializing it is a pretty laborious way of constructing an XML file. Using something like an XMLEventWriter is easier. But to be honest, I got frustrated by all the existing approaches and wrote a new library for the purpose as part of Saxon 10. It's called simply "Push", and looks something like this:
Processor proc = new Processor();
Serializer serializer = proc.newSerializer(new File(fileName));
Push push = proc.newPush(serializer);
Document doc = push.document(true);
doc.setDefaultNamespace("http://my_namespace");
Element root = doc.element("root")
.attribute(new QName("xsi", "http://www.w3.org/2001/XMLSchema-instance", "schemaLocation"),
"http://my_namespace file.xsd");
doc.close();

How to Parse Complete XML into a string using DOM PARSER?

For example:
If I pass tagValue=1 then it should return complete xml1 as a string to me String input is in some function.
String input = "<1><xml1></xml1></1><2><xml2></xml2><2>.......<10000><xml10000></xml10000></10000‌​>";
String output = "<xml1></xml1>"; // for tagValue=1;
If the xml has a root element then it is doable
1. Parse the XML using dom parser.
2. Iterate through each node
3. Find the desired node.
4. Write the node in a different xml using transform
Sample code
Step 1: I used XML as a string, you can read from file.
DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(uri));
Document doc = dBuilder.parse(is);
Iterate through each node
if (doc.hasChildNodes()) {
printNote(doc.getChildNodes(), doc);
}
Please put in your logic to iterate thorugh the nodes and find the right child node which you want to process.
Write back as xml. Here assumption is that tempNode is the one you want to write as XML.
TransformerFactory tFactory =
TransformerFactory.newInstance();
Transformer transformer;
try {
transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(tempNode);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
You are not parsing XML, you are parsing a String, a replace should be enough in your particular case
String input = "<xml1><note><to>Tove</to><from>Jani</from<heading>Reminder</heading><body>Don't forget me this weekend</body></note><xml1>";
input = input.replace("<xml1>", "");

Import and parse an xml file without FileOutputStream

Consider the code fragment that I have at the moment which works and the right elements are found and placed into my map:
public void importXml(InputSource emailAttach)throws Exception {
Map<String, String> hWL = new HashMap<String, String>();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(emailAttach);
FileOutputStream fos=new FileOutputStream("temp.xml");
OutputStreamWriter os = new OutputStreamWriter(fos,"UTF-8");
// Transform to XML UTF-8 format
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(doc), new StreamResult(os));
os.close();
fos.close();
doc = db.parse(new File("temp.xml"));
NodeList nl = doc.getElementsByTagName("Email");
Element eE=(Element)nl.item(0);
int ctr=eE.getChildNodes().getLength();
String sNName;
String sNValue;
Node nTemp;
for (int i=0;i<ctr;i++){
nTemp=eE.getChildNodes().item(i);
sNName=nTemp.getNodeName().toUpperCase().trim();
if (nTemp.getChildNodes().item(0)!=null) {
sNValue=nTemp.getChildNodes().item(0).getNodeValue().trim();
hWL.put(sNName,sNValue);
}
}
}
However I prefer not to create a temp file first after converting the data to UTF-8 and parsing from the temp file. Is there anyway I can do this?
I've tried using a ByteArrayOutputStream in place of OutputStreamWriter, and calling toString() on the ByteArrayOutputStream as such:
doc = db.parse(bos.toString("UTF-8");
But then my Map ends up being empty.
From the API docs (the ability of its meticulous studying is a valuable asset for any programmer) - the parse method with the String argument seems to take something different from what you feed to it:
Document parse(String uri)
Parse the content of the given URI as an XML document and return a new DOM >Document object.
This might be your friend:
db.parse ( new ByteArrayInputStream( bos.toByteArray()));
Update
#user2496748 sorry I should have searched for the API but instead I was looking at the source code through a decompiler which tells me the parameter is arg0 instead of uri. Big difference.
I think I understand stream readers/writers and byte to char or vice versa a little more now.
After some review I was able to simply my code to this and achieve what I wanted to do. Since I am able to get the email attachment as a InputSource:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
emailAttach.setEncoding("UTF-8");
Document doc = db.parse(emailAttach);
Works as well and tested with non-english characters.
You don't need to write and re-read and re-parse the transformed document. Just change this:
t.transform(new DOMSource(doc), new StreamResult(os));
to this:
DOMResult result = new DOMResult();
t.transform(new DOMSource(doc), result);
doc = (Document)result.getNode();
and then continue from after your present doc = db.parse(new File("temp.xml"));.

How to remove extra empty lines from XML file?

In short; i have many empty lines generated in an XML file, and i am looking for a way to remove them as a way of leaning the file. How can i do that ?
For detailed explanation; I currently have this XML file :
<recent>
<paths>
<path>path1</path>
<path>path2</path>
<path>path3</path>
<path>path4</path>
</paths>
</recent>
And i use this Java code to delete all tags, and add new ones instead :
public void savePaths( String recentFilePath ) {
ArrayList<String> newPaths = getNewRecentPaths();
Document recentDomObject = getXMLFile( recentFilePath ); // Get the <recent> element.
NodeList pathNodes = recentDomObject.getElementsByTagName( "path" ); // Get all <path> nodes.
//1. Remove all old path nodes :
for ( int i = pathNodes.getLength() - 1; i >= 0; i-- ) {
Element pathNode = (Element)pathNodes.item( i );
pathNode.getParentNode().removeChild( pathNode );
}
//2. Save all new paths :
Element pathsElement = (Element)recentDomObject.getElementsByTagName( "paths" ).item( 0 ); // Get the first <paths> node.
for( String newPath: newPaths ) {
Element newPathElement = recentDomObject.createElement( "path" );
newPathElement.setTextContent( newPath );
pathsElement.appendChild( newPathElement );
}
//3. Save the XML changes :
saveXMLFile( recentFilePath, recentDomObject );
}
After executing this method a number of times i get an XML file with right results, but with many empty lines after the "paths" tag and before the first "path" tag, like this :
<recent>
<paths>
<path>path5</path>
<path>path6</path>
<path>path7</path>
</paths>
</recent>
Anyone knows how to fix that ?
------------------------------------------- Edit: Add the getXMLFile(...), saveXMLFile(...) code.
public Document getXMLFile( String filePath ) {
File xmlFile = new File( filePath );
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document domObject = db.parse( xmlFile );
domObject.getDocumentElement().normalize();
return domObject;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
public void saveXMLFile( String filePath, Document domObject ) {
File xmlOutputFile = null;
FileOutputStream fos = null;
try {
xmlOutputFile = new File( filePath );
fos = new FileOutputStream( xmlOutputFile );
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
transformer.setOutputProperty( "{http://xml.apache.org/xslt}indent-amount", "2" );
DOMSource xmlSource = new DOMSource( domObject );
StreamResult xmlResult = new StreamResult( fos );
transformer.transform( xmlSource, xmlResult ); // Save the XML file.
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerException e) {
e.printStackTrace();
} finally {
if (fos != null)
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
First, an explanation of why this happens — which might be a bit off since you didn't include the code that is used to load the XML file into a DOM object.
When you read an XML document from a file, the whitespaces between tags actually constitute valid DOM nodes, according to the DOM specification. Therefore, the XML parser treats each such sequence of whitespaces as a DOM node (of type TEXT);
To get rid of it, there are three approaches I can think of:
Associate the XML with a schema, and then use setValidating(true) along with setIgnoringElementContentWhitespace(true) on the DocumentBuilderFactory.
(Note: setIgnoringElementContentWhitespace will only work if the parser is in validating mode, which is why you must use setValidating(true))
Write an XSL to process all nodes, filtering out whitespace-only TEXT nodes.
Use Java code to do this: use XPath to find all whitespace-only TEXT nodes, iterate through them and remove each one from its parent (using getParentNode().removeChild()). Something like this would do (doc would be your DOM document object):
XPath xp = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);
for (int i=0; i < nl.getLength(); ++i) {
Node node = nl.item(i);
node.getParentNode().removeChild(node);
}
I was able to fix this by using this code after removing all the old "path" nodes :
while( pathsElement.hasChildNodes() )
pathsElement.removeChild( pathsElement.getFirstChild() );
This will remove all the generated empty spaces in the XML file.
Special thanks to MadProgrammer for commenting with the helpful link mentioned above.
You could look at something like this if you only need to "clean" your xml quickly.
Then you could have a method like:
public static String cleanUp(String xml) {
final StringReader reader = new StringReader(xml.trim());
final StringWriter writer = new StringWriter();
try {
XmlUtil.prettyFormat(reader, writer);
return writer.toString();
} catch (IOException e) {
e.printStackTrace();
}
return xml.trim();
}
Also, to compare anche check differences, if you need it: XMLUnit
I faced the same problem, and I had no idea for the long time, but now, after this Brad's question and his own answer on his own question, I figured out where is the trouble.
I have to add my own answer, because Brad's one isn't really perfect, how Isaac said:
I wouldn't be a huge fan of blindly removing child nodes without knowing what they are
So, better "solution" (quoted because it is more likely workaround) is:
pathsElement.setTextContent("");
This completely removes useless blank lines. It is definitely better than removing all the child nodes. Brad, this should work for you too.
But, this is an effect, not the cause, and we got how to remove this effect, not the cause.
Cause is: when we call removeChild(), it removes this child, but it leaves indent of removed child, and line break too. And this indent_and_like_break is treated as a text content.
So, to remove the cause, we should figure out how to remove child and its indent. Welcome to my question about this.
There is a very simple way to get rid of the empty lines if using an DOM handling API (for example DOM4J):
place the text you want to keep in a variable(ie text)
set the node text to "" using node.setText("")
set the node text to text using node.setText(text)
et voila! there are no more empty lines. The other answers delineate very well how the extra empty lines in the xml output are actually extra nodes of type text.
This technique can be used with any DOM parsing system, so long as the name of the text setting function is changed to suit the one in your API, hence the way of representing it slightly more abstractly.
Hope this helps:)
When i used dom4j to remove some elements and i met the same question,the solution above not useful without adding some other required jars.Finally,i find out a simple solution only need to use JDK io pakage:
use BufferedReader to read the xml file and filter empty lines.
StringBuilder stringBuilder = new StringBuilder();
FileInputStream fis = new FileInputStream(outFile);
InputStreamReader isr = new InputStreamReader(fis);
BufferedReader br = new BufferedReader(isr);
String s;
while ((s = br.readLine()) != null) {
if (s.trim().length() > 0) {
stringBuilder.append(s).append("\n");
}
}
write the string to the xml file
OutputStreamWriter osw = new OutputStreamWriter(fou);
BufferedWriter bw = new BufferedWriter(osw);
String str = stringBuilder.toString();
bw.write(str);
bw.flush();
remember to close all the stream
In my case, I converted it to a string then just did a regex:
//save as String
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
tr.transform(new DOMSource(document), result);
strResult = writer.toString();
//remove empty lines
strResult = strResult.replaceAll("\\n\\s*\\n", "\n");
Couple of remarks:
1) When your are manipulating XML (removing elements / adding new one) I strongly advice you to use XSLT (and not DOM)
2) When you tranform a XML Document by XSLT (as you do in your save method), set the OutputKeys.INDENT to "no"
3) For simple post processing of your xml (removing white space, comments, etc.) you can use a simple SAX2 filter
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setIgnoringElementContentWhitespace(true);
I am using below code:
System.out.println("Start remove textnode");
i=0;
while (parentNode.getChildNodes().item(i)!=null) {
System.out.println(parentNode.getChildNodes().item(i).getNodeName());
if (parentNode.getChildNodes().item(i).getNodeName().equalsIgnoreCase("#text")) {
parentNode.removeChild(parentNode.getChildNodes().item(i));
System.out.println("text node removed");
}
i=i+1;
}
Very late answer, but maybe it is still helpful to someone.
I had this code in my class, where the document is built after transformation (Just like you):
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
Change the last line to
transformer.setOutputProperty(OutputKeys.INDENT, "no");

Bad Characters when parsing GML in Java

I'm using the org.w3c.dom package to parse the gml schemas (http://schemas.opengis.net/gml/3.1.0/base/).
When I parse the gmlBase.xsd schema and then save it back out, the quote characters around GeometryCollections in the BagType complex type come out converted to bad characters (See code below).
Is there something wrong with how I'm parsing or saving the xml, or is there something in the schema that is off?
Thanks,
Curtis
public static void main(String[] args) throws IOException
{
File schemaFile = File.createTempFile("gml_", ".xsd");
FileUtils.writeStringToFile(schemaFile, getSchema(new URL("http://schemas.opengis.net/gml/3.1.0/base/gmlBase.xsd")));
System.out.println("wrote file: " + schemaFile.getAbsolutePath());
}
public static String getSchema(URL schemaURL)
{
try
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(IOUtils.toString(schemaURL.openStream()))));
Element rootElem = doc.getDocumentElement();
rootElem.normalize();
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(doc);
ByteArrayOutputStream xmlOutStream = new ByteArrayOutputStream();
StreamResult result = new StreamResult(xmlOutStream);
transformer.transform(source, result);
return xmlOutStream.toString();
}
catch (Exception e)
{
e.printStackTrace();
}
return "";
}
I'm suspicious of this line:
Document doc = db.parse(new InputSource(
new StringReader(IOUtils.toString(schemaURL.openStream()))));
I don't know what IOUtils.toString does here but presumably it's assuming a particular encoding, without taking account of the XML declaration.
Why not just use:
Document doc = db.parse(schemaURL.openStream());
Likewise your FileUtils.writeStringToFile doesn't appear to specify a character encoding... which encoding does it use, and why encoding is in the StreamResult?

Categories