Strange characters appended at beginning of file in Hadoop - java

Whenever I create a new file in Hadoop using Java and write content to it, special characters appear at the beginning of the file. Is there a way to eliminate them? The code is below:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String extractedXML = writer.getBuffer().toString().replaceAll("\\r$", "");
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();
Reading the file back shows the extra characters on the first line:
$ hadoop fs -cat /filelocation/input.txt|head -5
)â–’hello world
input1
hello again
hello
welcome again

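The stray characters come from writeUTF: FSDataOutputStream inherits it from DataOutputStream, and it writes a two-byte length prefix before the string (which is encoded in modified UTF-8). It worked for me after replacing these lines:
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();
with the code below, which writes the text through a Writer instead: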
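// FileSystem.create expects a Path, not a String
OutputStream os = fs.create(new Path("/filelocation/input.txt"), new Progressable() {
    public void progress() {
    }
});
// The writer encodes the text as plain UTF-8, with no length prefix
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(os, "UTF-8"));
br.write(extractedXML);
br.close();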
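The difference between the two approaches can be reproduced without a Hadoop cluster. The sketch below is only an illustration: it uses an in-memory ByteArrayOutputStream in place of the HDFS stream, and the class and method names (WriteUtfPrefixDemo, viaWriteUtf, viaWriter) are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class WriteUtfPrefixDemo {

    // Bytes produced by DataOutputStream.writeUTF:
    // a 2-byte length, followed by the text in modified UTF-8.
    static byte[] viaWriteUtf(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Bytes produced by a Writer: just the UTF-8 encoding of the text.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "hello world";
        System.out.println(viaWriteUtf(text).length); // 13: two length bytes + 11 text bytes
        System.out.println(viaWriter(text).length);   // 11: text bytes only
    }
}
```

Besides the prefix, writeUTF also throws UTFDataFormatException when the encoded string is longer than 65535 bytes, so it is a poor fit for writing whole documents.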

Related

Print the syncml payload to a xml file

I want to print the SyncML payload to an XML file before using it. Is there a method in Java to print the SyncML payload to an XML file, or some other way to check the payload?
I solved it by converting the document to a string and printing that (more details at http://www.theserverside.com/news/thread.tss?thread_id=26060):
public void printXML(Document request){
try
{
DOMSource domSource = new DOMSource(request);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
String checkpayload = writer.toString();
System.out.print(checkpayload);
}
catch(TransformerException ex)
{
ex.printStackTrace();
}
}

Reading Transformer result line by line (Java)

I'm using a Transformer to pretty-print and indent an XML document that is originally one long line.
Here is my code:
BufferedWriter br = null;
Source xmlInput = new StreamSource(inputSR);
StringWriter stringWriter = new StringWriter();
StreamResult xmlOutput = new StreamResult(stringWriter);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setAttribute("indent-number", 2);
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(xmlInput, xmlOutput);
How can I write xmlOutput to a file line by line, without loading the whole string into memory?
Instead of a StringWriter, use a FileOutputStream:
StreamResult xmlOutput = new StreamResult(new FileOutputStream("output.xml"));
This writes to the file incrementally. It won't necessarily write line by line (lines don't mean much in XML), but I can't see why you would want that.
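A slightly fuller sketch of the same idea is shown here, with the stream opened in try-with-resources so that it is closed once the transformation finishes. The class name, the method name prettyPrintToFile, and the temp-file target are assumptions made for the example; the input is passed as a String only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamToFileDemo {

    // Indents the given XML and streams the result straight to the target file;
    // the serialized output is never collected in a String.
    static void prettyPrintToFile(String xml, Path target) {
        try (OutputStream out = Files.newOutputStream(target)) {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new IllegalStateException("XML transformation failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("pretty", ".xml"); // hypothetical location
        prettyPrintToFile("<root><child>text</child></root>", target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```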

Storing the contents in a String variable rather than a file

I have an XML transformation process that writes the transformed XML to a file. Instead of storing the output in a file, I want to store it in a String variable. I have declared the variable (msgxml); how can I put the generated XML into it instead of writing a file?
String msgxml;
System.setProperty("javax.xml.transform.TransformerFactory",
"org.apache.xalan.processor.TransformerFactoryImpl");
FileInputStream xml = new FileInputStream(xmlInput);
FileInputStream xsl = new FileInputStream(xslInput);
FileOutputStream os = new FileOutputStream(outputXmlFile);
TransformerFactory tFactory = TransformerFactory.newInstance();
// Use the TransformerFactory to process the stylesheet source and produce a Transformer
StreamSource styleSource = new StreamSource(xsl);
Transformer transformer = tFactory.newTransformer(styleSource);
StreamSource xmlSource = new StreamSource(xml);
StreamResult result = new StreamResult(os);
//here we are storing it in a file ,
try {
transformer.transform(xmlSource, result);
} catch (TransformerException e) {
e.printStackTrace();
}
One way is to use a ByteArrayOutputStream instead of a FileOutputStream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
TransformerFactory tFactory = TransformerFactory.newInstance();
...
StreamSource xmlSource = new StreamSource(xml);
StreamResult result = new StreamResult(baos); // write to the byte array stream
// here the output goes into the byte array instead of a file
try {
transformer.transform(xmlSource, result);
}
...
msgxml = baos.toString("UTF-8"); // get contents of stream using UTF-8 encoding
Another solution is to use a java.io.StringWriter:
StringWriter stringWriter = new StringWriter();
StreamResult result = new StreamResult(stringWriter);
...
msgxml = stringWriter.toString();
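Put together, the StringWriter variant could look like the sketch below. It is an illustration under assumptions: the stylesheet and input are tiny inline strings rather than the files from the question, the JDK's default TransformerFactory is used instead of the Xalan one configured there, and the names XsltToStringDemo and transformToString are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltToStringDemo {

    // Applies the stylesheet to the input and returns the result as a String.
    static String transformToString(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter stringWriter = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(stringWriter));
            return stringWriter.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet: copies the "to" attribute into an <out> element.
        String xsl =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<out><xsl:value-of select='/greeting/@to'/></out>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String msgxml = transformToString("<greeting to='world'/>", xsl);
        System.out.println(msgxml);
    }
}
```

Because a StringWriter holds characters rather than bytes, no charset has to be chosen when reading the result back, unlike with baos.toString("UTF-8").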

StringWriter write to folder

Good Afternoon. I want to use the StringWriter to write the new file to a network folder. Can anyone give me some examples using the code below on how to do this? It's my first time working with the StringWriter class.
public static final void newOutput(Document xml) throws Exception {
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
StringWriter out = new StringWriter();
tf.transform(new DOMSource(xml), new StreamResult(out));
/*
* need to update to write to folder
*/
System.out.println(out.toString());
}
}
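Pass a File to StreamResult instead of a StringWriter, and the transformer writes directly to that path:
public static final void newOutput(Document xml) throws Exception {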
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(xml);
// Use StreamResult to write to a file; "test.txt" lands in the current directory,
// so replace it with a path inside the target folder
StreamResult out = new StreamResult(new File("test.txt"));
// to print to the console instead:
// StreamResult out = new StreamResult(System.out);
tf.transform(source, out);
}
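To target a specific folder rather than the current directory, the path can be built explicitly. The following is a minimal sketch, assuming the folder is reachable as an ordinary file-system path (for example a mounted network share); writeXml, sampleDocument, and the temp directory used in main are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class WriteXmlToFolderDemo {

    // Serializes the document into folder/fileName and returns the written path.
    static Path writeXml(Document xml, Path folder, String fileName) {
        try {
            Files.createDirectories(folder); // no-op if the folder already exists
            Path target = folder.resolve(fileName);
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            tf.setOutputProperty(OutputKeys.INDENT, "yes");
            // The writer's charset decides the bytes that end up on disk.
            try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                tf.transform(new DOMSource(xml), new StreamResult(out));
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TransformerException e) {
            throw new IllegalStateException("XML transformation failed", e);
        }
    }

    // Small document used only to exercise writeXml.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("RootElement");
            root.setTextContent("payload");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("xml-out"); // stand-in for the network folder
        System.out.println(writeXml(sampleDocument(), folder, "output.xml"));
    }
}
```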

XML Document to String?

I've been fiddling with this for over twenty minutes and my Google-fu is failing me.
Let's say I have an XML Document created in Java (org.w3c.dom.Document):
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document document = docBuilder.newDocument();
Element rootElement = document.createElement("RootElement");
Element childElement = document.createElement("ChildElement");
childElement.appendChild(document.createTextNode("Child Text"));
rootElement.appendChild(childElement);
document.appendChild(rootElement);
String documentConvertedToString = "?"; // <---- How?
How do I convert the document object into a text string?
public static String toString(Document doc) {
try {
StringWriter sw = new StringWriter();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc), new StreamResult(sw));
return sw.toString();
} catch (Exception ex) {
throw new RuntimeException("Error converting to String", ex);
}
}
You can use this piece of code to accomplish what you want to:
public static String getStringFromDocument(Document doc) throws TransformerException {
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
return writer.toString();
}
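One way to check such a conversion is to parse the resulting string back into a Document and compare. The sketch below assumes the same document as in the question; the class name and the parse and sampleDocument helpers are hypothetical additions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DocumentStringRoundTrip {

    // Document -> String, without the XML declaration.
    static String toXmlString(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Error converting to String", e);
        }
    }

    // String -> Document, used here only to verify the conversion.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Error parsing XML", e);
        }
    }

    // Builds the same document as in the question.
    static Document sampleDocument() {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element rootElement = document.createElement("RootElement");
            Element childElement = document.createElement("ChildElement");
            childElement.appendChild(document.createTextNode("Child Text"));
            rootElement.appendChild(childElement);
            document.appendChild(rootElement);
            return document;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = toXmlString(sampleDocument());
        System.out.println(xml);
        System.out.println(parse(xml).getDocumentElement().getTagName());
    }
}
```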
