I want to parse the following XML structure:
<?xml version="1.0" encoding="utf-8"?>
<documents>
<document>
<element name="title">
<value><![CDATA[Personnel changes: Müller]]></value>
</element>
</document>
</documents>
To parse this element name="..." structure I use XPath in the following way:
XPath xPath = XPathFactory.newInstance().newXPath();
String currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value", pCurrentXMLAsDOM, XPathConstants.STRING);
The parsing itself works fine, but there are problems with German special characters such as "Ü" or "ß". When I print currentString, I get:
Personnel changes: Müller
But I want the string to look as it does in the XML:
Personnel changes: Müller
Just to add: I can't change the content of the XML file. I have to parse it as I receive it, so every string has to be decoded correctly.
Sounds like an encoding problem. The XML is UTF-8 encoded Unicode which you seem to print encoded as ISO-8859-1. Check the encoding settings of your Java source.
Edit: See "Setting the default Java character encoding?" for how to set file.encoding.
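To make the diagnosis concrete, here is a minimal sketch of how the garbled output arises. The class and method names are made up for illustration. It encodes a string as UTF-8 and then decodes those bytes with the wrong charset, which is the kind of mismatch described above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only to illustrate the encoding mismatch.
final class MojibakeDemo {

    /** Encodes the text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the same bytes with the matching charset, so the text survives. */
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

Calling MojibakeDemo.misdecode("Müller") gives "MÃ¼ller", the same pattern as in the question, because "ü" is two bytes in UTF-8 and each byte becomes its own character in ISO-8859-1. MojibakeDemo.roundTrip("Müller") returns the text unchanged.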
I found a good and fast solution now:
public static String convertXMLToString(File pCurrentXML) {
InputStream is = null;
try {
is = new FileInputStream(pCurrentXML);
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
String contents = null;
try {
try {
contents = IOUtils.toString(is, "UTF-8");
} catch (IOException e) {
e.printStackTrace();
}
} finally {
IOUtils.closeQuietly(is);
}
return contents;
}
Afterwards I convert the String to a DOM object:
static Document convertStringToXMLDocumentObject(String string) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
Document document = null;
try {
builder = factory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
try {
document = builder.parse(new InputSource(new StringReader(string)));
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return document;
}
And then I can query the DOM with XPath, for example, and all element values come out correctly decoded!
Demonstration:
currentString = (String) xPath.evaluate("/documents/document/element[@name='title']/value", pCurrentXMLAsDOM, XPathConstants.STRING);
System.out.println(currentString);
Output:
Personnel changes: Müller
:)
If you know the file is UTF-8 encoded, try something like:
FileInputStream fis = new FileInputStream("yourfile.xml");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
InputSource pCurrentXMLAsDOM = new InputSource(in);
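Here is a rough, self-contained sketch of that idea combined with the XPath query from the question. The class name and the byte-array parameter are my own choices for the example (the bytes stand in for the file contents). Note that XPath selects an attribute with @name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; xmlBytes stands in for the raw contents of the XML file.
final class Utf8XPathDemo {

    static String readTitle(byte[] xmlBytes) {
        // Decode the bytes explicitly as UTF-8 before the parser sees them.
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), StandardCharsets.UTF_8);
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            // evaluate(String, InputSource, QName) parses the source and runs the query.
            return (String) xPath.evaluate(
                    "/documents/document/element[@name='title']/value",
                    new InputSource(reader),
                    XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

If the string returned here is correct but still prints garbled, the problem is more likely on the output side (console or file.encoding) than in the parsing.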
Related
I am trying to 'GET' an RSS feed.
public RssFeed(String url) {
_url = url;
String res = this.api.get(url);
ByteArrayInputStream bis = new ByteArrayInputStream(res.getBytes());
try {
bis.close();
} catch (IOException e) {
e.printStackTrace();
}
XMLDecoder decoder = new XMLDecoder(bis);
try {
Object xml = decoder.readObject();
_response = xml.toString();
} catch(Exception e) {
e.printStackTrace();
} finally {
decoder.close();
}
}
When I check what's inside 'res', it appears to contain the entire XML.
But when I try to decode it, I get:
java.lang.IllegalArgumentException: Unsupported element: rss
Can someone help me with that? I am new to Java.
Thanks!
XMLDecoder is meant to read documents that were written by XMLEncoder. Since you're fetching this XML from the web, it is an RSS document rather than XMLEncoder output, which is why the rss element is rejected. Use a general-purpose XML parser, such as DocumentBuilder.parse(), to handle this.
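DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
// parse(String) treats its argument as a URI and reads the document from there
Document doc = builder.parse(url);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
// also covers SAXParseException, which is a subclass
e.printStackTrace();
} catch (IllegalArgumentException e) {
e.printStackTrace();
}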
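Since the question already has the feed in a String, here is a small sketch of parsing that string with DocumentBuilder and pulling out the item titles. The class name and the choice of the item and title elements are assumptions for the example, based on the usual RSS 2.0 layout, not on the asker's actual feed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; assumes <item> elements that each contain a <title>.
final class RssTitles {

    static List<String> itemTitles(String rssXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Wrap the already-downloaded text in a Reader; no byte conversion needed.
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList title = item.getElementsByTagName("title");
                if (title.getLength() > 0) {
                    titles.add(title.item(0).getTextContent());
                }
            }
            return titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the feed", e);
        }
    }
}
```

Parsing from a StringReader also avoids the res.getBytes() call, which uses the platform default charset and can mangle non-ASCII text.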
When I run HP Fortify, the following code is reported as an XML External Entity injection. The problem line is marked as "Error Line". Any help is appreciated.
private Document parseXmlString(String stringname, boolean validating) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(validating);
ByteArrayInputStream is = new ByteArrayInputStream(stringname.getBytes());
Document doc = factory.newDocumentBuilder().parse(is);//Error Line
return doc;
} catch (SAXException e) {
// A parsing error occurred; the xml input is not valid
} catch (ParserConfigurationException e) {
} catch (IOException e) {
}
return null;
}
I hope this is what you are looking for:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
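In case the link goes stale, here is a minimal sketch of the usual mitigation for DocumentBuilderFactory: refuse DOCTYPE declarations altogether, so no external entity can be declared. The class name is made up. The feature URI is the Xerces one, which the JDK's built-in parser supports; other parser implementations may need different settings, and I can't promise this alone satisfies Fortify.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper showing a hardened parser configuration.
final class SafeXml {

    static Document parse(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            // Reject any document that contains a DOCTYPE declaration.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Extra caution: do not expand entity references into the tree.
            factory.setExpandEntityReferences(false);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser does not support the requested feature", e);
        } catch (SAXException | IOException e) {
            // Includes the error raised when a DOCTYPE is encountered.
            throw new IllegalArgumentException("Input is not acceptable XML", e);
        }
    }
}
```

With this configuration, ordinary documents parse as before, while input that declares a DOCTYPE is rejected before any entity is resolved.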
Here is my code snippet:
public EpsXmlParser(String xmlAnswer) {
DOMParser parser = new DOMParser();
try {
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlAnswer));
parser.parse(is);
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
Document doc = parser.getDocument();
parseXmlDoc(doc.getDocumentElement());
}
In this code block, String xmlAnswer is UTF-8 encoded XML received from another machine through a web service client.
While debugging I realized that after parser.getDocument() is called, Document doc is null.
I cannot fix the problem. What should I do?
I don't get any exception. The code runs without errors, but Document doc ends up like this (look at the snapshot below). I cannot understand what the problem is. Any help will be appreciated.
I used XML like the following. Is it well-formed? Either way, I don't understand why the standard XML parsing code gives me neither an exception nor a usable document.
<?xml version="1.0" encoding="UTF-8"?>
<extra_result><status>00</status><data><transaction_id>3c704f15-7c09-4bba-9046- ffbdb8c97b51</transaction_id><card_status>11</card_status><status_msg>Kart numarası yanlış. </status_msg><card_no>48422</card_no><name_surname></name_surname><gsm></gsm><bonus></bonus></data></extra_result>
I managed to run the code without problems when I supplied an example XML string.
That suggests the problem is most likely in the XML string passed in the xmlAnswer argument.
To see exactly where the problem is, change your code as follows and run it:
public EpsXmlParser(String xmlAnswer) {
DOMParser parser = new DOMParser();
try {
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlAnswer));
parser.parse(is);
} catch (Exception e) {
throw new RuntimeException("Error on parsing document", e);
}
Document doc = parser.getDocument();
parseXmlDoc(doc.getDocumentElement());
}
I expect an exception will now be thrown that carries the actual reason for the parsing error.
Here I am trying to access two third-party APIs.
I get two XML responses, merge them into a single file, and store it on the local system.
If I print the output to the console I get it in XML format, but I want to display it in the browser.
My solution doesn't work. Please help me.
Here's my code:
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/xml");
PrintWriter out=response.getWriter();
String btn1=request.getParameter("btn1");
String btn2=request.getParameter("btn2");
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setIgnoringComments(true);
DocumentBuilder builder = null;
try {
builder = domFactory.newDocumentBuilder();
} catch (ParserConfigurationException e2) {
e2.printStackTrace();
}
Document doc = null;
try {
doc = builder.parse(new URL("valid url in my program").openStream());
response.setContentType("text/xml");
Object con=doc.getDoctype();
} catch (SAXException e2) {
e2.printStackTrace();
}
Document doc1 = null;
try {
doc1 = builder.parse(new URL(valid url).openStream());
} catch (SAXException e2) {
e2.printStackTrace();
}
NodeList nodes = doc.getElementsByTagName("events");
NodeList node1=doc.getElementsByTagName("events");
Element root=doc.getDocumentElement();
Element root1 = doc.createElement("ObjectId");
doc.getDocumentElement().appendChild(root1);
root.getElementsByTagName("ObjectId").item(0).setTextContent("1");
node1.item(0).getParentNode().insertBefore(root1,node1.item(0));
NodeList nodes1 = doc1.getElementsByTagName("events");
NodeList node2=doc1.getElementsByTagName("event");
Element root2=doc1.getDocumentElement();
Element root3= doc1.createElement("ObjectId");
doc1.getDocumentElement().appendChild(root3);
root2.getElementsByTagName("ObjectId").item(0).setTextContent("2");
node2.item(0).getParentNode().insertBefore(root3,node2.item(0));
for(int i=0;i<nodes1.getLength();i=i+1){
Node n= (Node) doc.importNode(nodes1.item(i), true);
nodes.item(i).getParentNode().appendChild(n);
}
Transformer transformer = null;
try {
transformer = TransformerFactory.newInstance().newTransformer();
} catch (TransformerConfigurationException e1) {
e1.printStackTrace();
} catch (TransformerFactoryConfigurationError e1) {
e1.printStackTrace();
}
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
try {
transformer.transform(source,result);
} catch (TransformerException e) {
e.printStackTrace();
}
Writer output = null;
output = new BufferedWriter(new FileWriter("merge.xml"));
String xmlout = result.getWriter().toString();
output.write(xmlout);
response.setContentType("text/xml");
out.write(xmlout);
//out.println(xmlout);
//System.out.println(xmlout);
//I tried many ways but
//it will not print to the browser in xml the format
}
A point of note: you're setting the content type three times. It only needs to be set once.
You don't appear to be returning the response, but I presume some other part of your program is handling that.
EDITED: the other answer that was here disappeared.
We're going to need more information to work out what is going wrong. Please bear in mind that output from System.out.println() calls will not show up in the browser, because it goes to the console.
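For what it's worth, here is a sketch of serializing a DOM straight into a Writer, with no intermediate string or console output. In a servlet the Writer would be the one from response.getWriter(). I can't run your servlet, so the example uses a plain Writer, and the class and method names are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; "out" could be response.getWriter() in a servlet.
final class XmlOut {

    /** Parses XML text into a DOM (only here so the example is self-contained). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Serializes the document directly into the given writer. */
    static void write(Document doc, Writer out) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }
}
```

Throwing instead of only printing the stack trace matters here: in the original code a failed parse leaves doc null, and the later calls then fail in a less obvious place.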
I am trying to get the DOM document from a UTF-8 encoded XML file containing Arabic characters.
The method below takes the XML as a string and is supposed to return the Document.
Here is a link to the XML:
http://212.12.165.44:7201/UniNews121.xml
public Document getDomElement(String xml){
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
StringReader xmlstring=new StringReader(xml);
is.setCharacterStream(xmlstring);
is.setEncoding("UTF-8");
//APP CRASHES HERE
doc = db.parse(is);
} catch (ParserConfigurationException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (SAXException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (IOException e) {
Log.e("Error: ", e.getMessage());
return null;
}
// return DOM
return doc;
}
Error:
09-18 13:36:20.031: E/Error:(3846): Unexpected token (position:TEXT xml version="1.0...@2:1 in java.io.InputStreamReader@4144ac08)
I would appreciate your help, but please be specific in your answers.
This has happened to me many times. You should double-check the encoding of the file you are opening. I suggest testing with a local copy of the file whose encoding you set by hand.
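One concrete thing to check, and this is only a guess at the cause here, is whether the downloaded text starts with a byte order mark (U+FEFF). Some parsers choke on it when it arrives as part of a String. A minimal sketch, with a made-up class name:

```java
// Hypothetical helper; assumes a leading BOM is the culprit, which may not be the case.
final class BomCheck {

    /** Returns true if the text begins with the Unicode byte order mark. */
    static boolean hasBom(String text) {
        return !text.isEmpty() && text.charAt(0) == '\uFEFF';
    }

    /** Removes a single leading byte order mark, if there is one. */
    static String stripBom(String text) {
        return hasBom(text) ? text.substring(1) : text;
    }
}
```

You could log hasBom(xml) just before db.parse(is) and pass stripBom(xml) to the StringReader. If there is no BOM, the next thing to look at is how the bytes were turned into the String when the file was downloaded.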