I am trying to get the DOM element from a parsed, UTF-8-encoded XML file containing Arabic characters.
The method below takes the XML string and is supposed to return the Document.
Here is a link to the XML:
http://212.12.165.44:7201/UniNews121.xml
public Document getDomElement(String xml){
    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource();
        StringReader xmlstring = new StringReader(xml);
        is.setCharacterStream(xmlstring);
        // Note: per the InputSource Javadoc, setEncoding() has no effect
        // when a character stream (Reader) has been set
        is.setEncoding("UTF-8");
        //APP CRASHES HERE
        doc = db.parse(is);
    } catch (ParserConfigurationException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    } catch (SAXException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    } catch (IOException e) {
        Log.e("Error: ", e.getMessage());
        return null;
    }
    // return DOM
    return doc;
}
Error:
09-18 13:36:20.031: E/Error:(3846): Unexpected token (position:TEXT xml version="1.0...#2:1 in java.io.InputStreamReader#4144ac08)
I would appreciate your help, but please be specific in your answers.
This has happened to me many times; you should double-check the encoding of the file you are opening. I suggest testing this with a local copy of the file whose encoding you have set by hand.
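A sketch of two encoding-related checks, written against the desktop JDK parser (Android's XML parser may behave and report errors differently). It assumes, without confirmation from the question, that the downloaded string may begin with a byte-order mark or a blank line; nothing may precede the <?xml ...?> declaration. It also hands the parser UTF-8 bytes instead of a Reader, because InputSource.setEncoding has no effect once a character stream is set. Utf8XmlParse and stripLeadingJunk are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Utf8XmlParse {

    // Hypothetical helper: drop a leading byte-order mark (U+FEFF) and any
    // whitespace, because nothing may come before the XML declaration.
    static String stripLeadingJunk(String xml) {
        int start = 0;
        while (start < xml.length()
                && (xml.charAt(start) == '\uFEFF' || Character.isWhitespace(xml.charAt(start)))) {
            start++;
        }
        return xml.substring(start);
    }

    // Returns null on failure, like the original getDomElement.
    static Document getDomElement(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Give the parser bytes, not a Reader, so the declared encoding is actually used
            byte[] bytes = stripLeadingJunk(xml).getBytes(StandardCharsets.UTF_8);
            InputSource is = new InputSource(new ByteArrayInputStream(bytes));
            is.setEncoding("UTF-8");
            return db.parse(is);
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // A BOM and a newline before the declaration, then Arabic text ("marhaban")
        String xml = "\uFEFF\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<news><title>\u0645\u0631\u062D\u0628\u0627</title></news>";
        Document doc = getDomElement(xml);
        System.out.println(doc.getDocumentElement().getTextContent().length()); // 5
    }
}
```

Printing the length rather than the Arabic text itself keeps the check independent of the console's own encoding.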
Related
I've been reading about this SAXParseException for a while and trying to resolve it, without success.
[Fatal Error] schedule.xml:1:1: Premature end of file.
org.xml.sax.SAXParseException; systemId: file:/U:/schedule.xml; lineNumber: 1; columnNumber: 1; Premature end of file.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at package.DOMclass.importDocument(DOMclass.java:681) //<---.parse(new File(fileToImport));
...
I've read that it could be because the stream being used is empty (or the XML file being read is empty), or because a stream can't be reused. Another suspect is reading from and writing back to the same file. Here's a link that explains it, but I still don't understand.
Another possibility is that my XML is malformed, but I've looked it over and it's fine: the tags are correct, and I've saved it as UTF-8 (without BOM) in Notepad++, since some posts said there might be hidden whitespace before the XML declaration. Again, no success.
This is my code for reading it in:
public TestClass(){
    Node root = DOMclass.importDocument("U:\\schedule.xml");
    // ... read nodes, attribute values, etc.
}
public static Document importDocument(String fileToImport)
{
    Document document = null;
    try
    {
        document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new File(fileToImport)); //<--- where the error is at
        //Do I need a Reader here? to close?
    }
    catch(SAXException saxe)
    {
        saxe.printStackTrace();
    }
    catch(IOException ioe)
    {
        ioe.printStackTrace();
    }
    catch(ParserConfigurationException pce)
    {
        pce.printStackTrace();
    }
    return document;
}
This is my code for writing it out:
public static void addNode(String name, String lastName,...){
    String filepath = "U;\\schedule.xml";
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder;
    try {
        docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(filepath);
        ...
        ... make/traverse xml nodes, elements, appending
        ...
        }
        DOMUtils.exportDocument(doc, filepath);
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public static void exportDocument(Document documentToExport, String fileName)
{
    Transformer transformer = null;
    DOMSource domSource = null;
    StreamResult streamResult = null;
    try
    {
        BufferedWriter out = new BufferedWriter(new FileWriter(fileName));
        transformer = TransformerFactory.newInstance().newTransformer();
        domSource = new DOMSource(documentToExport);
        streamResult = new StreamResult(out);
        System.out.println("\nStream Result: " + streamResult);
        transformer.transform(domSource, streamResult);
        out.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    catch(TransformerException e)
    {
        e.printStackTrace();
    }
    finally
    {
        transformer = null;
        domSource = null;
        streamResult = null;
    }
}
}
Any thoughts? Thank you.
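One plausible chain, not confirmed by the question: new FileWriter(fileName) truncates the file to zero bytes the moment it is opened, and out.close() sits inside the try, so if transform throws (or the file is read before the buffer is flushed and closed), the file stays empty and the next parse reports "Premature end of file". Also worth double-checking: addNode builds its path as "U;\\schedule.xml", with a semicolon where importDocument uses a colon. Below is a sketch of a safer export that writes to a temporary file first and only then replaces the original; SafeExport and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class SafeExport {

    // Write to a temp file in the same directory, then move it over the target,
    // so a failed transform never leaves the real file truncated to 0 bytes.
    static void exportDocument(Document doc, Path target) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "export", ".xml");
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            // A byte stream lets the transformer encode the output itself;
            // FileWriter would use the platform default charset instead.
            try (OutputStream out = Files.newOutputStream(tmp)) {
                t.transform(new DOMSource(doc), new StreamResult(out));
            } // flushed and closed here, before the file is moved or read
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | TransformerException e) {
            throw new RuntimeException("Could not export " + target, e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp); // no-op after a successful move
                } catch (IOException ignored) {
                    // best-effort cleanup only
                }
            }
        }
    }

    static Document importDocument(Path source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source.toFile());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException("Could not import " + source, e);
        }
    }

    // Helper for the demo: an empty document with a single root element
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "schedule-demo.xml");
        exportDocument(newDocument("schedule"), file);
        System.out.println(importDocument(file).getDocumentElement().getNodeName()); // schedule
    }
}
```

Reading the file back right after exportDocument returns is safe here because the output stream is closed before the move.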
I have recently been trying to read from an XML file. I believe I am parsing it correctly, but it returns null, though not always: I got the output I want in about 1 out of 100 executions, with no significant changes to the code. This is the method that receives the XML file's path (filePath in the code):
public Document getXmlDoc(String filePath) {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    Document doc = null;
    try {
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        doc = dBuilder.parse(new File(filePath));
        doc.getDocumentElement().normalize();
        System.out.println(doc.getDocumentElement().getNodeValue());//null
        System.out.println(doc.getDocumentElement().getChildNodes().item(1));//null
        System.out.println(doc.getDocumentElement().getLastChild().getAttributes().getNamedItem("id"));//not null and correct
        System.out.println(doc.getDocumentElement().getElementsByTagName("entry").item(1));//null
    } catch (ParserConfigurationException e) {
        e.printStackTrace();//pipe these
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return doc;
}
I also call my method here and convert the node to a string:
NodeList nList = getXmlDoc(newFilePath).getElementsByTagName("entry");
System.out.println(nodeToString(nList.item(i)));
where nList is a NodeList and the nodeToString method looks like this:
private String nodeToString(Node node){
    StringWriter sw = new StringWriter();
    try{
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,"yes");
        t.transform(new DOMSource(node),new StreamResult(sw));
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    return sw.toString();
}
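Some of those "null" results may not be missing nodes at all. Per the DOM specification, getNodeValue() always returns null for an Element (its text lives in child text nodes), and the JDK's built-in DOM implementation prints a node as [nodeName: nodeValue], so a perfectly good element can show up in the output as [entry: null]. getChildNodes() also includes the whitespace text nodes between tags. A sketch, with a made-up <feed>/<entry> document standing in for the real file; EntryDump and parseRoot are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class EntryDump {

    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<feed>\n  <entry id=\"1\">first</entry>\n"
                + "  <entry id=\"2\">second</entry>\n</feed>");

        // Elements never carry a node value
        System.out.println(root.getNodeValue());                                // null
        // Children: whitespace text, entry, whitespace text, entry, whitespace text
        NodeList children = root.getChildNodes();
        System.out.println(children.getLength());                               // 5
        System.out.println(children.item(0).getNodeType() == Node.TEXT_NODE);   // true

        // Read attributes and text explicitly instead of relying on toString()
        Element second = (Element) root.getElementsByTagName("entry").item(1);
        System.out.println(second.getAttribute("id") + " " + second.getTextContent()); // 2 second
    }
}
```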
When I run HP Fortify, the following code is flagged as an XML External Entity (XXE) injection. The problem line is marked "Error Line". Any help is appreciated.
private Document parseXmlString(String stringname, boolean validating) {
    try {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        ByteArrayInputStream is = new ByteArrayInputStream(stringname.getBytes());
        Document doc = factory.newDocumentBuilder().parse(is);//Error Line
        return doc;
    } catch (SAXException e) {
        // A parsing error occurred; the xml input is not valid
    } catch (ParserConfigurationException e) {
    } catch (IOException e) {
    }
    return null;
}
I hope this is what you are looking for:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
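Following the OWASP guidance for DocumentBuilderFactory, a sketch of a hardened version of parseXmlString: the main defence is rejecting any DOCTYPE declaration, which rules out external entities and entity-expansion payloads. Whether Fortify then clears the finding depends on its rule set, so treat that as an assumption. SafeXmlParser and newHardenedFactory are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class SafeXmlParser {

    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse any DOCTYPE; this blocks external entities and entity expansion.
        // This is a Xerces feature name, supported by the JDK's built-in parser.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Same contract as the original: returns null if the input cannot be parsed.
    static Document parseXmlString(String xml) {
        try {
            // Encode explicitly instead of relying on the platform default charset
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return newHardenedFactory().newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            // Includes the SAXParseException raised when a DOCTYPE is present
            return null;
        }
    }

    public static void main(String[] args) {
        String evil = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE a [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><a>&x;</a>";
        System.out.println(parseXmlString("<a>ok</a>") != null); // true
        System.out.println(parseXmlString(evil) == null);        // true
    }
}
```

The original validating flag is dropped on purpose: DTD validation needs a DOCTYPE, which this factory rejects. If validation is required, validating against an XML Schema via javax.xml.validation is the usual alternative.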
Here is my code snippet:
public EpsXmlParser(String xmlAnswer) {
    DOMParser parser = new DOMParser();
    try {
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xmlAnswer));
        parser.parse(is);
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
    Document doc = parser.getDocument();
    parseXmlDoc(doc.getDocumentElement());
}
In this code block, the String xmlAnswer is actually UTF-8-encoded XML received from another machine by a web service client program.
While debugging, I realized that the problem is that after parser.getDocument() is called, Document doc is null.
I cannot fix the problem. What should I do?
I don't get any exception. The code runs fine, but Document doc looks like this (see the snapshot below). I can't understand what the problem is. Any help will be appreciated.
I used XML like this. Is this XML format standard? If it is, why don't I get any exception when using the standard XML parsing code?
<?xml version="1.0" encoding="UTF-8"?>
<extra_result><status>00</status><data><transaction_id>3c704f15-7c09-4bba-9046- ffbdb8c97b51</transaction_id><card_status>11</card_status><status_msg>Kart numarası yanlış. </status_msg><card_no>48422</card_no><name_surname></name_surname><gsm></gsm><bonus></bonus></data></extra_result>
I managed to run the code without any problem by supplying an example XML string.
This suggests the problem most likely lies in the XML string passed in the xmlAnswer argument.
To see exactly where your problem is, change your code and run the following:
public EpsXmlParser(String xmlAnswer) {
    DOMParser parser = new DOMParser();
    try {
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xmlAnswer));
        parser.parse(is);
    } catch (Exception e) {
        throw new RuntimeException("Error on parsing document", e);
    }
    Document doc = parser.getDocument();
    parseXmlDoc(doc.getDocumentElement());
}
I expect an exception to be thrown with the actual reason for the parsing error.
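For comparison, a sketch of the same fail-loudly idea using the JDK's built-in JAXP DocumentBuilder instead of the Xerces DOMParser (a swapped-in API, not the asker's). It also confirms that XML shaped like the sample above parses normally. One hedged side note: the DOM classes print a parsed document as [#document: null] in toString-based debugger views, which is easy to mistake for a null reference. EpsXmlCheck, parseOrThrow and firstText are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EpsXmlCheck {

    // Parse the web-service answer, or fail with the parser's own error message.
    static Document parseOrThrow(String xmlAnswer) {
        try {
            // xmlAnswer is already a decoded Java String, so a Reader is the right input
            InputSource is = new InputSource(new StringReader(xmlAnswer));
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException("Error on parsing document", e);
        }
    }

    // Text of the first element with the given tag name
    static String firstText(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xmlAnswer = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<extra_result><status>00</status><data>"
                + "<card_status>11</card_status><card_no>48422</card_no>"
                + "</data></extra_result>";
        Document doc = parseOrThrow(xmlAnswer);
        System.out.println(firstText(doc, "card_status")); // 11
    }
}
```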
I want to parse the following XML structure:
<?xml version="1.0" encoding="utf-8"?>
<documents>
    <document>
        <element name="title">
            <value><![CDATA[Personnel changes: Müller]]></value>
        </element>
    </document>
</documents>
To parse this element name="..." structure, I use XPath as follows:
XPath xPath = XPathFactory.newInstance().newXPath();
String currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value", pCurrentXMLAsDOM, XPathConstants.STRING);
The parsing itself works fine, but there are problems with German special characters such as the umlaut "Ü" or "ß". When I print currentString, the String is:
Personnel changes: Müller
But I want the String exactly as it appears in the XML:
Personnel changes: Müller
Just to add: I can't change the content of the XML file. I have to parse it as I receive it, so I definitely have to parse every String correctly.
Sounds like an encoding problem. The XML is UTF-8-encoded Unicode, which you seem to be printing as ISO-8859-1. Check the encoding settings of your Java source.
Edit: See "Setting the default Java character encoding?" for how to set file.encoding.
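The mismatch described above can be reproduced in a few lines: when UTF-8 bytes are decoded as ISO-8859-1, each non-ASCII character turns into two. A sketch using Unicode escapes so the result doesn't depend on the source file's own encoding; MojibakeDemo and roundTrip are made-up names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encode text as UTF-8 (as it is stored in the XML file), then decode the bytes
    // with the given charset, simulating a reader that assumes that encoding.
    static String roundTrip(String text, Charset readAs) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, readAs);
    }

    public static void main(String[] args) {
        String original = "M\u00FCller"; // u-umlaut (U+00FC) is two bytes in UTF-8: 0xC3 0xBC
        String wrong = roundTrip(original, StandardCharsets.ISO_8859_1);
        String right = roundTrip(original, StandardCharsets.UTF_8);
        System.out.println(wrong.equals("M\u00C3\u00BCller")); // true: the umlaut became two Latin-1 chars
        System.out.println(right.equals(original));            // true
    }
}
```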
I found a good and fast solution now:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;

public static String convertXMLToString(File pCurrentXML) {
    // try-with-resources closes the stream; if the file is missing we return null
    // instead of passing a null stream on to IOUtils
    try (InputStream is = new FileInputStream(pCurrentXML)) {
        // Decode the file's bytes explicitly as UTF-8
        return IOUtils.toString(is, "UTF-8");
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
Afterwards I convert the String to a DOM object:
static Document convertStringToXMLDocumentObject(String string) {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document document = null;
    try {
        // One try block, so a failed newDocumentBuilder() can't lead to a null builder below
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The String is already decoded, so a StringReader avoids any re-encoding
        document = builder.parse(new InputSource(new StringReader(string)));
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return document;
}
Then I can query the DOM with XPath, for example, and all element values come out correctly decoded from UTF-8!
Demonstration:
currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value", pCurrentXMLAsDOM, XPathConstants.STRING);
System.out.println(currentString);
Output:
Personnel changes: Müller
:)
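An alternative sketch that skips the intermediate String: pass the file's raw bytes to the parser, which then takes the encoding from the XML declaration itself (UTF-8 is also the default when none is declared). It uses @name, XPath's attribute syntax, in the predicate. TitleReader and its methods are hypothetical names, and the temporary file stands in for the real input.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class TitleReader {

    // Give the parser the raw bytes; it decodes them according to the XML declaration.
    static String readTitle(Path xmlFile) {
        try (InputStream in = Files.newInputStream(xmlFile)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (String) xPath.evaluate(
                    "/documents/document/element[@name='title']/value", doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new RuntimeException("Could not read " + xmlFile, e);
        }
    }

    // Demo helper: write the given text to a temporary file as UTF-8 bytes.
    static Path writeUtf8TempFile(String content) {
        try {
            Path file = Files.createTempFile("documents", ".xml");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeUtf8TempFile("<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<documents><document><element name=\"title\">"
                + "<value><![CDATA[Personnel changes: M\u00FCller]]></value>"
                + "</element></document></documents>");
        System.out.println(readTitle(file).equals("Personnel changes: M\u00FCller")); // true
    }
}
```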
If you know the file is UTF-8 encoded, try something like:
FileInputStream fis = new FileInputStream("yourfile.xml");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
InputSource pCurrentXMLAsDOM = new InputSource(in);
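That InputSource can be passed straight to XPath.evaluate, which has an overload accepting an InputSource, so no separate DOM step is needed. A sketch that also closes the stream via try-with-resources; Utf8XPath and its evaluate method are hypothetical names.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class Utf8XPath {

    // Decode the file explicitly as UTF-8 and evaluate the XPath expression directly on it.
    static String evaluate(String fileName, String expression) {
        try (Reader in = new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8)) {
            InputSource source = new InputSource(in);
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, source, XPathConstants.STRING);
        } catch (IOException | XPathExpressionException e) {
            throw new RuntimeException("Could not evaluate " + expression + " on " + fileName, e);
        }
    }
}
```

Usage: Utf8XPath.evaluate("yourfile.xml", "/documents/document/element[@name='title']/value"). Since the stream is consumed by one evaluation, querying several expressions is cheaper with a parsed Document.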