How to remove extra empty lines from XML file? - java

In short; i have many empty lines generated in an XML file, and i am looking for a way to remove them as a way of leaning the file. How can i do that ?
For detailed explanation; I currently have this XML file :
<recent>
<paths>
<path>path1</path>
<path>path2</path>
<path>path3</path>
<path>path4</path>
</paths>
</recent>
And i use this Java code to delete all tags, and add new ones instead :
public void savePaths( String recentFilePath ) {
ArrayList<String> newPaths = getNewRecentPaths();
Document recentDomObject = getXMLFile( recentFilePath ); // Get the <recent> element.
NodeList pathNodes = recentDomObject.getElementsByTagName( "path" ); // Get all <path> nodes.
//1. Remove all old path nodes :
for ( int i = pathNodes.getLength() - 1; i >= 0; i-- ) {
Element pathNode = (Element)pathNodes.item( i );
pathNode.getParentNode().removeChild( pathNode );
}
//2. Save all new paths :
Element pathsElement = (Element)recentDomObject.getElementsByTagName( "paths" ).item( 0 ); // Get the first <paths> node.
for( String newPath: newPaths ) {
Element newPathElement = recentDomObject.createElement( "path" );
newPathElement.setTextContent( newPath );
pathsElement.appendChild( newPathElement );
}
//3. Save the XML changes :
saveXMLFile( recentFilePath, recentDomObject );
}
After executing this method a number of times i get an XML file with right results, but with many empty lines after the "paths" tag and before the first "path" tag, like this :
<recent>
<paths>
<path>path5</path>
<path>path6</path>
<path>path7</path>
</paths>
</recent>
Anyone knows how to fix that ?
------------------------------------------- Edit: Add the getXMLFile(...), saveXMLFile(...) code.
public Document getXMLFile( String filePath ) {
File xmlFile = new File( filePath );
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document domObject = db.parse( xmlFile );
domObject.getDocumentElement().normalize();
return domObject;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
public void saveXMLFile( String filePath, Document domObject ) {
File xmlOutputFile = null;
FileOutputStream fos = null;
try {
xmlOutputFile = new File( filePath );
fos = new FileOutputStream( xmlOutputFile );
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
transformer.setOutputProperty( "{http://xml.apache.org/xslt}indent-amount", "2" );
DOMSource xmlSource = new DOMSource( domObject );
StreamResult xmlResult = new StreamResult( fos );
transformer.transform( xmlSource, xmlResult ); // Save the XML file.
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerException e) {
e.printStackTrace();
} finally {
if (fos != null)
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}

First, an explanation of why this happens — which might be a bit off since you didn't include the code that is used to load the XML file into a DOM object.
When you read an XML document from a file, the whitespaces between tags actually constitute valid DOM nodes, according to the DOM specification. Therefore, the XML parser treats each such sequence of whitespaces as a DOM node (of type TEXT);
To get rid of it, there are three approaches I can think of:
Associate the XML with a schema, and then use setValidating(true) along with setIgnoringElementContentWhitespace(true) on the DocumentBuilderFactory.
(Note: setIgnoringElementContentWhitespace will only work if the parser is in validating mode, which is why you must use setValidating(true))
Write an XSL to process all nodes, filtering out whitespace-only TEXT nodes.
Use Java code to do this: use XPath to find all whitespace-only TEXT nodes, iterate through them and remove each one from its parent (using getParentNode().removeChild()). Something like this would do (doc would be your DOM document object):
XPath xp = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);
for (int i=0; i < nl.getLength(); ++i) {
Node node = nl.item(i);
node.getParentNode().removeChild(node);
}

I was able to fix this by using this code after removing all the old "path" nodes :
while( pathsElement.hasChildNodes() )
pathsElement.removeChild( pathsElement.getFirstChild() );
This will remove all the generated empty spaces in the XML file.
Special thanks to MadProgrammer for commenting with the helpful link mentioned above.

You could look at something like this if you only need to "clean" your xml quickly.
Then you could have a method like:
public static String cleanUp(String xml) {
final StringReader reader = new StringReader(xml.trim());
final StringWriter writer = new StringWriter();
try {
XmlUtil.prettyFormat(reader, writer);
return writer.toString();
} catch (IOException e) {
e.printStackTrace();
}
return xml.trim();
}
Also, to compare anche check differences, if you need it: XMLUnit

I faced the same problem, and I had no idea for the long time, but now, after this Brad's question and his own answer on his own question, I figured out where is the trouble.
I have to add my own answer, because Brad's one isn't really perfect, how Isaac said:
I wouldn't be a huge fan of blindly removing child nodes without knowing what they are
So, better "solution" (quoted because it is more likely workaround) is:
pathsElement.setTextContent("");
This completely removes useless blank lines. It is definitely better than removing all the child nodes. Brad, this should work for you too.
But, this is an effect, not the cause, and we got how to remove this effect, not the cause.
Cause is: when we call removeChild(), it removes this child, but it leaves indent of removed child, and line break too. And this indent_and_like_break is treated as a text content.
So, to remove the cause, we should figure out how to remove child and its indent. Welcome to my question about this.

There is a very simple way to get rid of the empty lines if using an DOM handling API (for example DOM4J):
place the text you want to keep in a variable(ie text)
set the node text to "" using node.setText("")
set the node text to text using node.setText(text)
et voila! there are no more empty lines. The other answers delineate very well how the extra empty lines in the xml output are actually extra nodes of type text.
This technique can be used with any DOM parsing system, so long as the name of the text setting function is changed to suit the one in your API, hence the way of representing it slightly more abstractly.
Hope this helps:)

When i used dom4j to remove some elements and i met the same question,the solution above not useful without adding some other required jars.Finally,i find out a simple solution only need to use JDK io pakage:
use BufferedReader to read the xml file and filter empty lines.
StringBuilder stringBuilder = new StringBuilder();
FileInputStream fis = new FileInputStream(outFile);
InputStreamReader isr = new InputStreamReader(fis);
BufferedReader br = new BufferedReader(isr);
String s;
while ((s = br.readLine()) != null) {
if (s.trim().length() > 0) {
stringBuilder.append(s).append("\n");
}
}
write the string to the xml file
OutputStreamWriter osw = new OutputStreamWriter(fou);
BufferedWriter bw = new BufferedWriter(osw);
String str = stringBuilder.toString();
bw.write(str);
bw.flush();
remember to close all the stream

In my case, I converted it to a string then just did a regex:
//save as String
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
tr.transform(new DOMSource(document), result);
strResult = writer.toString();
//remove empty lines
strResult = strResult.replaceAll("\\n\\s*\\n", "\n");

Couple of remarks:
1) When your are manipulating XML (removing elements / adding new one) I strongly advice you to use XSLT (and not DOM)
2) When you tranform a XML Document by XSLT (as you do in your save method), set the OutputKeys.INDENT to "no"
3) For simple post processing of your xml (removing white space, comments, etc.) you can use a simple SAX2 filter

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setIgnoringElementContentWhitespace(true);

I am using below code:
System.out.println("Start remove textnode");
i=0;
while (parentNode.getChildNodes().item(i)!=null) {
System.out.println(parentNode.getChildNodes().item(i).getNodeName());
if (parentNode.getChildNodes().item(i).getNodeName().equalsIgnoreCase("#text")) {
parentNode.removeChild(parentNode.getChildNodes().item(i));
System.out.println("text node removed");
}
i=i+1;
}

Very late answer, but maybe it is still helpful to someone.
I had this code in my class, where the document is built after transformation (Just like you):
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
Change the last line to
transformer.setOutputProperty(OutputKeys.INDENT, "no");

Related

Edit a XML file in java

I have the following XML file:
<Tables>
<table>
<row></row>
</table>
<Tables>
and I want to edit it to :
<Tables>
<table>
<row>some value</row>
</table>
<Tables>
I write the XML file using file writer. How can I edit it?
What I was found that I create a temp file contains edits then delete the original file and rename the temp file. Is there any other way?
that's my code to write the file:
public boolean createTable(String path, String name, String[] properties) throws IOException {
FileWriter writer = new FileWriter(path);
writer.write("<Tables>");
writer.write("\t<" + name + ">");
for(int i=0; i<properties.length; i++){
writer.write("\t\t<" + properties[0] + "></" + properties[0] + ">");
}
writer.write("\t</" + name + ">");
writer.write("</Tables>");
writer.close();
return false;
}
Don't read and write XML yourself. Java comes with multiple API's for parsing and generating XML, which takes care of all the encoding and escaping issues for you:
DOM XML is loaded into memory in a tree structure.
SAX XML is processed as a sequence of events. This is a push-parser, where the parser calls your code for each event.
StAX XML is read as a sequence of events/tokens. This is a pull-parser, where your code calls the parser to get next value.
You can also find many third-party libraries for parsing XML, and Java itself also supports marshalling of XML to POJO's.
In your case I'd suggest DOM, since it's easiest to use. Don't use DOM for huge XML files, since it loads the entire file into memory. For huge files, I'd suggest StAX.
Other than encoding issues, using an XML parser will make the code less susceptible to minor variations in the input, e.g. the 3 empty row elements below all mean the same. Or is the row element even empty, and how to get rid of existing content like shown:
<!-- row is empty -->
<row></row>
<row/>
<row />
<!-- row has content -->
<row>5 + 7 < 10</row>
<row><![CDATA[5 + 7 < 10]]></row>
<row><condition expr="5 + 7 < 10"></row>
Using DOM:
// Load XML from file
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
Document document = domBuilder.parse(file);
// Modify DOM tree (simple version)
NodeList rowNodes = document.getElementsByTagName("row");
for (int i = 0; i < rowNodes.getLength(); i++) {
Node rowNode = rowNodes.item(i);
// Remove existing content (if any)
while (rowNode.getFirstChild() != null)
rowNode.removeChild(rowNode.getFirstChild());
// Add text content
rowNode.appendChild(document.createTextNode("some value"));
}
// Save XML to file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(document),
new StreamResult(file));
if your xml is static you can use this, here input.xml is your xml file
File file = new File("input.xml");
byte[] data;
try (FileInputStream fis = new FileInputStream(file)) {
data = new byte[(int) file.length()];
fis.read(data);
}
String input = new String(data, "UTF-8");
String tag = "<row>";
String newXML = input.substring(0, input.indexOf(tag) + tag.length()) + "your value" + input.substring(input.indexOf(tag) + tag.length(), input.length());
try (FileWriter fw = new FileWriter(file)) {
fw.write(newXML);
}
System.out.println("XML Updated");

Keep numeric character entity characters such as ` ` when parsing XML in Java

I am parsing XML that contains numeric character entity characters such as (but not limited to)
< > (line feed carriage return < >) in Java. While parsing, I am appending text content of nodes to a StringBuffer to later write it out to a textfile.
However, these unicode characters are resolved or transformed into newlines/whitespace when I write the String to a file or print it out.
How can I keep the original numeric character entity characters symbols when iterating over nodes of an XML file in Java and storing the text content nodes to a String?
Example of demo xml file:
<?xml version="1.0" encoding="UTF-8"?>
<ABCD version="2">
<Field attributeWithChar="A string followed by special symbols
" />
</ABCD>
Example Java code. It loads the XML, iterates over the nodes and collects the text content of each node to a StringBuffer. After the iteration is over, it writes the StringBuffer to the console and also to a file (but no
) symbols.
What would be a way to keep these symbols when storing them to a String? Could you please help me? Thank you.
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, TransformerException {
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
Document document = null;
DocumentBuilder documentBuilder = documentFactory.newDocumentBuilder();
document = documentBuilder.parse(new File("path/to/demo.xml"));
StringBuilder sb = new StringBuilder();
NodeList nodeList = document.getElementsByTagName("*");
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
NamedNodeMap nnp = node.getAttributes();
for (int j = 0; j < nnp.getLength(); j++) {
sb.append(nnp.item(j).getTextContent());
}
}
}
System.out.println(sb.toString());
try (Writer writer = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("path/to/demo_output.xml"), "UTF-8"))) {
writer.write(sb.toString());
}
}
You need to escape all the XML entities before parsing the file into a Document. You do that by escaping the ampersand & itself with its corresponding XML entity &. Something like,
DocumentBuilder documentBuilder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
String xmlContents = new String(Files.readAllBytes(Paths.get("demo.xml")), "UTF-8");
Document document = documentBuilder.parse(
new InputSource(new StringReader(xmlContents.replaceAll("&", "&"))
));
Output :
2A string followed by special symbols
P.S. This is complement of Ravi Thapliyal's answer, not an alternative.
I am having the same problem with handling an XML file which is exported from 2003 format Excelsheet. This XML file stores line-breaks in text contents as
along with other numeric character references. However, after reading it with Java DOM parser, manipulating the content of some elements and transforming it back to the XML file, I see that all the numeric character references are expanded (i.e. The line-break is converted to CRLF) in Windows with J2SE1.6. Since my goal is to keep the content format unchanged as much as possible while manipulating some elements (i.e. retain numeric character references), Ravi Thapliyal's suggestion seems to be the only working solution.
When writing the XML content back to the file, it is necessary to replace all & with &, right? To do that, I had to give a StringWriter to the transformer as StreamResult and obtain String from it, replace all and dump the string to the xml file.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
DOMSource source = new DOMSource(document);
//write into a stringWriter for further processing.
StringWriter stringWriter = new StringWriter();
StreamResult result = new StreamResult(stringWriter);
t.transform(source, result);
//stringWriter stream contains xml content.
String xmlContent = stringWriter.getBuffer().toString();
//revert "&" back to "&" to retain numeric character references.
xmlContent = xmlContent.replaceAll("&", "&");
BufferedWriter wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
wr.write(xmlContent);
wr.close();

Writing text to XML is taking long time --Java

I have some 3000 elements in arraylist.Need to write these 3000 elements to xml.
The execution is taking too much time like 20 minutes in eclipse.
Is there any efficient way to do this?
OR any modifications to my code?
The elements in arraylist are supposed to grow in future...
MY code snippet..
---------------------------------------------------
---------------------------------------------------
for(int i=0;i<candidates.size();i++)//Candidates is my arraylist
{
String text=candidates.get(i);
//System.out.println(text);
text=text+"\n";
file= new File("./test.xml");
WriteToXML wr= new WriteToXML(file,"fullname",text);
}
-------------------------------------------------------
-------------------------------------------------------
//WritetoXML class constructor
public WriteToXML(File xml,String tag,String data)
{
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(xml)));
Element element = doc.getDocumentElement();
NodeList node1 = doc.getElementsByTagName(tag);
Element fn= (Element) node1.item(0);
Text text = doc.createTextNode(data);
fn.appendChild(text);
printtoXML(doc);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
public static final void printtoXML(Document xml) throws Exception {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(xml);
tf.transform(source, result);
String xmlString = sw.toString();
File file= new File(xml);
FileWriter fw=new FileWriter(file,false);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(xmlString);
bw.flush();
bw.close();
}
Right now you are doing this for each of your 3000 elements:
Open the file
Parse the document in there
add an element to the dom structure
flush the file to disk and close it
Way faster would be to do steps 1, 2 and 4 only once (1 & 2 before the loop; 4 after the loop) and just do step 3 for every element in your list (in the loop).
Just write a new method which takes your tag variable and a Document instance and add the tag to the document.
What is really expensive here is the multiple conversion of Object structures to XML and back. This has a huge overhead.
File IO is also giving a lot of overhead, but even this should be small compared to the multiple creation and parsing of DOM structures.
put the for loop inside the WriteToXML().
Main function:
file= new File("./test.xml");
WriteToXML wr= new WriteToXML(file,"fullname",candidates)
Inside WriteToXML
public WriteToXML(File xml,String tag,List candidates )
{
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File(xml)));
Element element = doc.getDocumentElement();
NodeList node1 = doc.getElementsByTagName(tag);
Element fn= (Element) node1.item(0);
for (int i=0;i<candidates.size();i++) {
Text text = doc.createTextNode(candidates.get(i)+"\n");
fn.appendChild(text);
}
printtoXML(doc);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
This way you are not re-parsing the XML all the time and write it only once.
I've tried to do minimal changes. I would not recommend doing this in the constructor - unless there is a good reason to do so.
Use SAX instead of DOM. SAX is more efficient than DOM
as #Masud suggests, Use SAX there is no other way. There is a great example about this Generating XML from an Arbitrary Data Structure

XML Transformation from Java Code

I am doing transformation of an XML File using XSL through Java code. I used this tutorial for the same. Now the problem is that the output file is created but it has no contents in it. I am also adding the code snippet. Kindly check it and tell me what am I missing :
TransformerFactory factory = TransformerFactory.newInstance();
System.out.println("In transform");
File temp = new File(xslFile);
if(temp.exists()){
System.out.println("File found");
}
StreamSource xsl = new StreamSource(temp);
Transformer transformer = factory.newTransformer(xsl);
temp = new File(xmlFile);
if(temp.exists()){
System.out.println("Found Again!!");
}
StreamSource xml = new StreamSource(temp);
temp = new File(outputFile);
if(temp.createNewFile()){
System.out.println("New File Created");
}
StreamResult output = new StreamResult(temp);
transformer.transform(xml, output);
Here, the xslFile, xmlFile and outputFile are string and passed as parameters to the method.
Try to declare three different variables, one for each file. Code will look clearer and you most likely will avoid some unexpected behaviour, as you don't know how is Transformer using those three references.

How to 'transform' a String object (containing XML) to an element on an existing JSP page

Currently, I have a String object that contains XML elements:
String carsInGarage = garage.getCars();
I now want to pass this String as an input/stream source (or some kind of source), but am unsure which one to choose and how to implement it.
Most of the solutions I have looked at import the package: javax.xml.transform and accept a XML file (stylerXML.xml) and output to a HTML file (outputFile.html) (See code below).
try
{
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("styler.xsl"));
transformer.transform(new StreamSource("stylerXML.xml"), new StreamResult(new FileOutputStream("outputFile.html")));
}
catch (Exception e)
{
e.printStackTrace();
}
I want to accept a String object and output (using XSL) to a element within an existing JSP page. I just don't know how to implement this, even having looked at the code above.
Can someone please advise/assist. I have searched high and low for a solution, but I just can't pull anything out.
Use a StringReader and a StringWriter:
try {
StringReader reader = new StringReader("<xml>blabla</xml>");
StringWriter writer = new StringWriter();
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(
new javax.xml.transform.stream.StreamSource("styler.xsl"));
transformer.transform(
new javax.xml.transform.stream.StreamSource(reader),
new javax.xml.transform.stream.StreamResult(writer));
String result = writer.toString();
} catch (Exception e) {
e.printStackTrace();
}
If at some point you want the source to contain more than just a single string, or you don't want to generate the XML wrapper element manually, create a DOM document that contains your source and pass it to the transformer using a DOMSource.
This worked for me.
String str = "<my>xml</my>"
StreamSource src = new StreamSource(new StringReader(str));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Result res = new StreamResult(baos);
transformer.transform(src, res);

Categories