Here is my code snippet:
public EpsXmlParser(String xmlAnswer) {
DOMParser parser = new DOMParser();
try {
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlAnswer));
parser.parse(is);
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
Document doc = parser.getDocument();
parseXmlDoc(doc.getDocumentElement());
}
In this code block, the String xmlAnswer is a UTF-8 encoded XML document received from another machine through a web service client program.
While debugging, I noticed that the Document doc returned by parser.getDocument() is null.
I cannot fix the problem. What should I do?
No exception is thrown. The code runs without complaint, but Document doc ends up like this (look at the snapshot below). I cannot understand what the problem is. Any help will be appreciated.
I used XML like the following. Is this XML format standard? If it is standard, why do I not get any exception while using the predefined XML parsing code?
<?xml version="1.0" encoding="UTF-8"?>
<extra_result><status>00</status><data><transaction_id>3c704f15-7c09-4bba-9046- ffbdb8c97b51</transaction_id><card_status>11</card_status><status_msg>Kart numarası yanlış. </status_msg><card_no>48422</card_no><name_surname></name_surname><gsm></gsm><bonus></bonus></data></extra_result>
I managed to run the code with no problem by providing an example XML string.
This suggests the problem is most likely in the XML string passed as the xmlAnswer argument.
To see exactly where the problem is, change your code as follows and run it:
public EpsXmlParser(String xmlAnswer) {
DOMParser parser = new DOMParser();
try {
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlAnswer));
parser.parse(is);
} catch (Exception e) {
throw new RuntimeException("Error on parsing document", e);
}
Document doc = parser.getDocument();
parseXmlDoc(doc.getDocumentElement());
}
I expect an exception to be thrown that carries the actual reason for the parsing error.
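Here is a minimal sketch of the same idea using the standard JAXP DocumentBuilder instead of the Xerces DOMParser (that swap is my assumption, and the names StrictXmlParse and parseOrThrow are hypothetical). It wraps a parse failure so that the line and column of the error end up in the exception message, and it never hands back a null document:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class StrictXmlParse {

    // Parses the string or fails loudly; a parse error is never swallowed.
    public static Document parseOrThrow(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // SAXParseException carries the position of the fatal error.
            throw new IllegalArgumentException(
                    "Malformed XML at line " + e.getLineNumber()
                            + ", column " + e.getColumnNumber() + ": " + e.getMessage(), e);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Error on parsing document", e);
        }
    }

    public static void main(String[] args) {
        Document ok = parseOrThrow("<extra_result><status>00</status></extra_result>");
        System.out.println(ok.getDocumentElement().getNodeName());
        try {
            // <status> is never closed, so this input is not well-formed.
            parseOrThrow("<extra_result><status>00</extra_result>");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the web service response is malformed, the message should point at the offending position instead of leaving you with an empty document.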
When I run HP Fortify, the following code is flagged as an XML External Entity injection. The problem line is marked with the comment //Error Line. Any help is appreciated.
private Document parseXmlString(String stringname, boolean validating) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(validating);
ByteArrayInputStream is = new ByteArrayInputStream(stringname.getBytes());
Document doc = factory.newDocumentBuilder().parse(is);//Error Line
return doc;
} catch (SAXException e) {
// A parsing error occurred; the xml input is not valid
} catch (ParserConfigurationException e) {
} catch (IOException e) {
}
return null;
}
I hope this is what you are looking for:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
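As a rough sketch of what that page recommends, the factory can be hardened before the parse call. This assumes the parser behind DocumentBuilderFactory is Xerces-based, as the one bundled with the JDK is; the disallow-doctype-decl feature URI is Xerces-specific, and other parsers may reject it with a ParserConfigurationException. The class name SafeXmlParse is hypothetical, and I also assume the input is meant to be UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class SafeXmlParse {

    public static Document parseXmlString(String xml, boolean validating)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        // Standard JAXP switch that turns on the implementation's security limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-specific: reject any document that contains a DOCTYPE,
        // which removes the place where external entities are declared.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        // Explicit charset instead of the platform default used by getBytes().
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }
}
```

Note that rejecting DOCTYPE declarations also rules out DTD validation, so this only fits if your documents do not need a DTD. Whether Fortify stops reporting the line depends on its rule set, so treat this as a starting point.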
In my android application I am using an xml file to store some history information within the application.
Following is the code I use to enter a new record to the file.
String filename = "file.xml";
File xmlFilePath = new File("/data/data/com.testproject/files/" + filename);
private void addNewRecordToFile(History history)
{
try
{
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(xmlFilePath);
Element rootEle = doc.getDocumentElement();
Element historyElement = doc.createElement("History");
rootEle.appendChild(historyElement);
Element customerEle = doc.createElement("customer");
customerEle.appendChild(doc.createTextNode(history.getCustomer()));
historyElement.appendChild(customerEle);
Element productEle = doc.createElement("product");
productEle.appendChild(doc.createTextNode(history.getProduct()));
historyElement.appendChild(productEle);
//-------->
DOMSource source = new DOMSource(doc);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
StreamResult result = new StreamResult(xmlFilePath);
transformer.transform(source, result);
}
catch (ParserConfigurationException e)
{
Log.v("State", "ParserConfigurationException" + e.getMessage());
}
catch (SAXException e)
{
Log.v("State", "SAXException" + e.getMessage());
}
catch (IOException e)
{
Log.v("State", "IOException" + e.getMessage());
}
catch (TransformerConfigurationException e) {
e.printStackTrace();
}
catch (TransformerFactoryConfigurationError e) {
e.printStackTrace();
}
catch (TransformerException e) {
e.printStackTrace();
}
}
XML file format
<?xml version="1.0" encoding="UTF-8"?>
<HistoryList>
<History>
<customer>Gordon Brown Ltd</customer>
<product>Imac</product>
</History>
<History>
<customer>GG Martin and Sons</customer>
<product>Sony Vaio</product>
</History>
<History>
<customer>PR Thomas Ltd</customer>
<product>Acer Laptop</product>
</History>
</HistoryList>
Using this code I can successfully add a new record to the file. But my minimum target version in Android should be API level 4, and this code only works with API level 8 and above.
The DOMSource and TransformerFactory classes are not available in Android API levels below 8. Everything before the comment //--------> works in APIs below 8.
Does anyone know a way to write to the XML file without using the Transformer APIs? Thanks in advance.
EDIT:
In my case I have to use an XML file to store the information. That is why I am not looking at SharedPreferences or an SQLite DB to store the data. Thanks.
Maybe you should store this information in SharedPreferences instead? If you have a specific reason for writing this to XML, that is fine.
Otherwise, I would suggest using a different method of persistence in Android, as there are a few built-in ways to do this that are easier than working with XML.
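If the file really has to stay XML, one option that avoids the Transformer classes is to serialize the DOM by hand. The following is only a sketch under some assumptions: the class SimpleDomWriter is hypothetical, it uses nothing beyond org.w3c.dom and java.io, it writes elements, attributes and text only (comments and processing instructions are dropped), and it does no indentation. I have not tried it on an API level 4 device.

```java
import java.io.IOException;
import java.io.Writer;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class SimpleDomWriter {

    // Writes the XML declaration followed by the document element.
    public static void write(Document doc, Writer out) throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        writeNode(doc.getDocumentElement(), out);
        out.flush();
    }

    // Recursively writes one node and all of its children.
    private static void writeNode(Node node, Writer out) throws IOException {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.write("<" + node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.write(" " + a.getNodeName() + "=\"" + escape(a.getNodeValue()) + "\"");
                }
                out.write(">");
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    writeNode(c, out);
                }
                out.write("</" + node.getNodeName() + ">");
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.write(escape(node.getNodeValue()));
                break;
            default:
                // Other node types are skipped in this sketch.
                break;
        }
    }

    // Escapes the characters that are special in XML text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

In the code from the question, the part after //--------> would then become something like SimpleDomWriter.write(doc, new OutputStreamWriter(new FileOutputStream(xmlFilePath), "UTF-8")), with the writer closed afterwards. The explicit "UTF-8" matters because the declaration written above claims that encoding.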
I am trying to get the DOM Document from a UTF-8 encoded XML file containing Arabic characters.
The method below takes the XML as a string and is supposed to return the Document.
Here is a link to the XML:
http://212.12.165.44:7201/UniNews121.xml
public Document getDomElement(String xml){
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
StringReader xmlstring=new StringReader(xml);
is.setCharacterStream(xmlstring);
is.setEncoding("UTF-8");
//APP CRASHES HERE
doc = db.parse(is);
} catch (ParserConfigurationException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (SAXException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (IOException e) {
Log.e("Error: ", e.getMessage());
return null;
}
// return DOM
return doc;
}
Error:
09-18 13:36:20.031: E/Error:(3846): Unexpected token (position:TEXT xml version="1.0...@2:1 in java.io.InputStreamReader@4144ac08)
I would appreciate your help, but please be specific in your answers.
This has happened to me many times. You should double-check the encoding of the file you are opening. I suggest testing with a local copy of the file whose encoding you set by hand.
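One thing worth checking, and this is only a guess at the cause, is whether the string starts with a byte-order mark or with whitespace before the <?xml declaration. Parsers tend to reject either. Also note that InputSource.setEncoding has no effect once a character stream is set, so that call in the question does nothing. A small sketch that strips such a prefix before parsing (BomSafeParse and stripPrefix are hypothetical names):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BomSafeParse {

    // Removes a leading U+FEFF byte-order mark and any whitespace that
    // precedes the first real character of the document.
    static String stripPrefix(String xml) {
        int i = 0;
        if (xml.length() > 0 && xml.charAt(0) == '\uFEFF') {
            i = 1;
        }
        while (i < xml.length() && Character.isWhitespace(xml.charAt(i))) {
            i++;
        }
        return xml.substring(i);
    }

    public static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The string is already decoded, so no encoding is set on the InputSource.
        return db.parse(new InputSource(new StringReader(stripPrefix(xml))));
    }
}
```

If the Arabic text still comes out wrong after this, the damage probably happened earlier, when the downloaded bytes were turned into a String with the wrong charset.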
I want to parse the following xml structure:
<?xml version="1.0" encoding="utf-8"?>
<documents>
<document>
<element name="title">
<value><![CDATA[Personnel changes: Müller]]></value>
</element>
</document>
</documents>
To parse this <element name="..."> structure I use XPath in the following way:
XPath xPath = XPathFactory.newInstance().newXPath();
String currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value", pCurrentXMLAsDOM, XPathConstants.STRING);
The parsing itself works fine, but there are problems with German umlauts and similar characters such as "Ü" or "ß". When I print out currentString, the String is:
Personnel changes: Müller
But I want to have the String like in the Xml:
Personnel changes: Müller
Just to add: I can't change the content of the XML file. I have to parse it as I get it, so I definitely have to parse every String correctly.
Sounds like an encoding problem. The XML is UTF-8 encoded Unicode, which you seem to print encoded as ISO-8859-1. Check the encoding settings of your Java source.
Edit: See "Setting the default Java character encoding?" for how to set file.encoding.
I found a good and fast solution now:
public static String convertXMLToString(File pCurrentXML) {
InputStream is = null;
try {
is = new FileInputStream(pCurrentXML);
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
String contents = null;
try {
try {
contents = IOUtils.toString(is, "UTF-8");
} catch (IOException e) {
e.printStackTrace();
}
} finally {
IOUtils.closeQuietly(is);
}
return contents;
}
Afterwards I convert the String to a DOM object:
static Document convertStringToXMLDocumentObject(String string) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
Document document = null;
try {
builder = factory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
try {
document = builder.parse(new InputSource(new StringReader(string)));
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return document;
}
Then I can query the DOM with XPath, for example, and all element values are read correctly as UTF-8.
Demonstration:
currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value", pCurrentXMLAsDOM, XPathConstants.STRING);
System.out.println(currentString);
Output:
Personnel changes: Müller
:)
If you know the file is UTF-8 encoded, try something like this:
FileInputStream fis = new FileInputStream("yourfile.xml");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
InputSource pCurrentXMLAsDOM = new InputSource(in);
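To show the whole round trip in one place, here is a small self-contained sketch. It assumes the data really is UTF-8, feeds an in-memory byte array instead of a file, and the class name Utf8XPathDemo is made up. The umlaut is written as \u00FC so the example does not depend on the encoding of the Java source file:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class Utf8XPathDemo {

    // Decodes the stream explicitly as UTF-8 and evaluates the XPath on it.
    public static String readTitle(InputStream in) throws Exception {
        InputSource source = new InputSource(new InputStreamReader(in, StandardCharsets.UTF_8));
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (String) xPath.evaluate(
                "/documents/document/element[@name='title']/value",
                source, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<documents><document><element name=\"title\">"
                + "<value><![CDATA[Personnel changes: M\u00FCller]]></value>"
                + "</element></document></documents>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        String title = readTitle(in);
        // Print through a UTF-8 PrintStream so the console output does not
        // depend on the platform default encoding.
        new PrintStream(System.out, true, "UTF-8").println(title);
    }
}
```

If the value is correct in memory but still looks wrong on screen, the console or IDE output encoding is the more likely culprit, not the parser.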
The GUI utility of Apache Tika provides an option for getting the main content (apart from formatted text and structured text) of a given document or URL. I want to know which method is responsible for extracting the main content of the document or URL, so that I can incorporate it into my program. I would also like to know whether a heuristic algorithm is used when extracting data from HTML pages, because sometimes the advertisements do not appear in the extracted content.
UPDATE: I found out that BoilerpipeContentHandler is responsible for it.
The "main content" feature in the Tika GUI is implemented using the BoilerpipeContentHandler class that relies on the boilerpipe library for the heavy lifting.
I believe this is powered by the BodyContentHandler, which fetches just the HTML contents of the document body. This can additionally be combined with other handlers to return just the plain text of the body, if required.
public String[] tika_autoParser() {
String[] result = new String[3];
try {
InputStream input = new FileInputStream(new File(path));
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
AutoDetectParser parser = new AutoDetectParser();
ParseContext context = new ParseContext();
parser.parse(input, textHandler, metadata, context);
result[0] = "Title: " + metadata.get(Metadata.TITLE);
result[1] = "Body: " + textHandler.toString();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (TikaException e) {
e.printStackTrace();
}
return result;
}