How to read XML response in specific character code in Java - java

I use the Wikimedia API Sandbox for the Japanese Wikipedia.
Japanese Version
English Version
I send an HTTP request to Wikimedia and get the result as XML.
When I send the request from the API Sandbox web page, the result shows no character corruption.
But when I fetch the same result in Java, the characters are corrupted.
I cannot find a way to specify the character encoding for the XML.
How can I make Java read the result in a specific character encoding?
How can I resolve my problem?
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db
.parse(new URL(
"http://ja.wikipedia.org/w/api.php?action=query&prop=categories&format=xml&cllimit=10&titles="
+ key).openStream());
Element root = doc.getDocumentElement();
NodeList queryList = root.getChildNodes();
Node query = queryList.item(0);
if (query instanceof Element) {
Element queryEle = (Element) query;
NodeList pagesList = queryEle.getChildNodes();
Node pgs = pagesList.item(0);
if (pgs instanceof Element) {
Element pagesElement = (Element) pgs;
NodeList pageList = pagesElement.getChildNodes();
Node page = pageList.item(0);
if (page instanceof Element) {
Element pageElement = (Element) page;
String title = pageElement.getAttribute("title");
title = new String(title.getBytes("UTF-8"), "UTF-8");
}
}
}
} catch (ParserConfigurationException e) {
} catch (SAXException e) {
} catch (IOException e) {
}
When I send a request whose result has the page title "大学", Java shows "??" instead.
I use the above code in an Android application.

The line title = new String(title.getBytes("UTF-8"), "UTF-8"); can be left out; encoding to UTF-8 and decoding again just gives back the same text.
It worked for me with key=1 (receiving UTF-8), though I am on a UTF-8 Linux PC. Maybe you are not printing the result in a UTF-8 context. Try writing the Document to a file.
You could also inspect the response headers:
URLConnection connection = new URL("...").openConnection();
... connection.getContentEncoding(); // Content-Encoding header (e.g. gzip), not the charset
... connection.getContentType(); // the charset, if the server sends one, appears here
InputStream in = connection.getInputStream();
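To separate a parsing problem from a display problem, it can help to look at the code units of the parsed string instead of printing the string itself. The following is only a sketch: the class name, the helper names, and the sample XML are made up for illustration and merely imitate the shape of the API response. If the hex values come out right, the parser decoded the title correctly and the "??" is produced later, when the text is written to a console or log that cannot represent it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class TitleEncodingCheck {

    // Parses raw XML bytes and returns the title attribute of the first <page> element.
    static String firstPageTitle(byte[] xmlBytes) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Given a byte stream, the parser takes the encoding from the XML
            // declaration and falls back to UTF-8 when there is none.
            Document doc = db.parse(new ByteArrayInputStream(xmlBytes));
            Element page = (Element) doc.getElementsByTagName("page").item(0);
            return page.getAttribute("title");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
    }

    // Lists the UTF-16 code units in hex, so the check does not depend on the console encoding.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like the API response; not a real response.
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<api><query><pages><page title=\"\u5927\u5b66\"/></pages></query></api>";
        String title = firstPageTitle(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(codeUnits(title)); // U+5927 U+5B66 when the title was decoded correctly
    }
}
```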

Related

Unexpected token (<) using Document builder in Android Studio

I'm using DocumentBuilder and NodeList in Android Studio to parse an XML document. I previously found that the XML was incorrect and had un-escaped ampersands within the text. After taking care of this and double-checking with the W3C XML validator, I still get an unexpected token error:
e: "org.xml.sax.SAXParseException: Unexpected token (position:TEXT \n \n 601\n ...@5262:1 in java.io.StringReader@cd0db4a)"
However, when I open the xml and look at the line referred to, I don't see anything that would be considered troublesome:
... ...
5257 <WebSvcLocation>
5258 <Id>1521981</Id>
5259 <Name>Warehouse: Row 3</Name>
5260 <SiteName>Warehouse</SiteName>
5261 </WebSvcLocation>
5262 </ArrayOfWebSvcLocation>
I have checked the xml as well for non printing characters and I have not found any. Below is the code I have been using:
public List<Location> SpinnerXML(String xml){
List<Location> list = new ArrayList<Location>();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
InputSource is;
String s = xml.replaceAll("[&]"," and ");
try {
builder = factory.newDocumentBuilder();
is = new InputSource(new StringReader(s));
Document doc = builder.parse(is);
NodeList lt = doc.getElementsByTagName("WebSvcLocation");
int id;
String name,siteName;
for (int i = 0; i < lt.getLength(); i++) {
Element el = (Element) lt.item(i);
id = Integer.parseInt(getValue(el, "Id"));
name = getValue(el, "Name");
siteName = getValue(el, "SiteName");
list.add(new Location(id, name, siteName));
}
} catch (ParserConfigurationException e){
} catch (SAXException e){
e.printStackTrace();
} catch (IOException e){
}
return list;
}
The XML I have been trying to read is hosted here.
Thanks in advance for the help!
InputSource seems to do some guessing as to the encoding, so here are some things to try.
One reference says:
Android note: The Android platform default (encoding) is always UTF-8.
Another says:
"Java stores strings as UTF-16 internally, but the encoding used externally, the "system default encoding", varies."
(1) I would initially recommend:
is.setEncoding("UTF-8");
(2) But it should do no harm to replace this:
Document doc = builder.parse(is);
With this, passing the charset explicitly so the bytes do not depend on the platform default:
Document doc = builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8")));
(3) OR try this:
String s1 = URLDecoder.decode(s, "UTF-8");
Document doc = builder.parse(new ByteArrayInputStream(s1.getBytes("UTF-8")));
NOTE:
If you try (2) or (3), comment OUT:
is = new InputSource(new StringReader(s));
as it is then no longer used.
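A note on option (1): according to the InputSource documentation, setEncoding has no effect when the source wraps a character stream such as a StringReader, because the text is already decoded at that point. It only matters for byte streams. The sketch below combines the two ideas under that assumption; the class and method names are invented for illustration, and the sample XML is a shortened copy of the fragment from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class Utf8InputSource {

    // Parses an XML string by way of an explicitly UTF-8 encoded byte stream.
    static Document parseAsUtf8(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Encode explicitly instead of relying on the platform default charset.
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            InputSource is = new InputSource(new ByteArrayInputStream(bytes));
            // Tells the parser how to decode the byte stream.
            is.setEncoding("UTF-8");
            return builder.parse(is);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<ArrayOfWebSvcLocation><WebSvcLocation>"
                + "<Id>1521981</Id><Name>Warehouse: Row 3</Name><SiteName>Warehouse</SiteName>"
                + "</WebSvcLocation></ArrayOfWebSvcLocation>";
        Document doc = parseAsUtf8(sample);
        System.out.println(doc.getElementsByTagName("WebSvcLocation").getLength());
    }
}
```

This does not by itself explain the "Unexpected token" error; it only removes the encoding guesswork from the list of suspects.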

Looking for a specific child in XML using java

I've been looking for the past few hours, and I can't find how to do it.
My XML file:
<list>
<Company id="01">
<Name>Atari</Name>
<Founded>1972</Founded>
<Consoles>
2600
5200
7800
</Consoles>
</Company>
<Company id="02">
<Name>Sega</Name>
<Founded>1960</Founded>
<Consoles>
Master System
Megadrive
Saturn
</Consoles>
</Company>
</list>
Basically, I want the code to find not only the name in one company block, but the name in any company block I wish, and be able to show it to other classes. So far, I've been able to show either or, but only by changing the code directly and not on the fly. The code I'm using:
private static String getTextValue(String def, Element doc, String tag) {
String value = def;
NodeList nl;
nl = doc.getElementsByTagName(tag);
if (nl.getLength() > 0 && nl.item(0).hasChildNodes()) {
value = nl.item(0).getFirstChild().getNodeValue();
}
if(value==null) value = " ";
return value;
}
public static boolean readXML(String xml) {
rolev = new ArrayList<String>();
Document dom;
// Make an instance of the DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
// use the factory to take an instance of the document builder
DocumentBuilder db = dbf.newDocumentBuilder();
// parse using the builder to get the DOM mapping of the
// XML file
dom = db.parse(xml);
Element doc = dom.getDocumentElement();
role1 = getTextValue(role1, doc, "Name");
if (getRole1() != null) {
if (!getRole1().isEmpty())
rolev.add(getRole1());
}
role2 = getTextValue(role2, doc, "Founded");
if (role2 != null) {
if (!role2.isEmpty())
rolev.add(role2);
}
role3 = getTextValue(role3, doc, "Consoles");
if (role3 != null) {
if (!role3.isEmpty())
rolev.add(role3);
}
role4 = getTextValue(role4, doc, "Name");
if ( role4 != null) {
if (!role4.isEmpty())
rolev.add(role4);
}
return true;
} catch (ParserConfigurationException pce) {
System.out.println(pce.getMessage());
} catch (SAXException se) {
System.out.println(se.getMessage());
} catch (IOException ioe) {
System.err.println(ioe.getMessage());
}
return false;
}
I used a lot of different methods, some searched up and some made on my own, but this one is the closest I can get to working for what I need. I need to be able to have the place it's looking change on the fly though.
You can use XPath to read XML. See "Parsing XML with XPath in Java" and/or "How to read XML using XPath in Java".
In your case, if you wanted to get all of the company names, the path expression you would use (basically the tag filter) would be "//list/Company/Name/text()".
The rest of the code can basically be copied from the questions I linked. Only your expression (filter) would be different.
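As a rough sketch of what that could look like with the standard javax.xml.xpath API: the class name CompanyLookup and the helpers allNames and field are hypothetical, and the sample XML is a trimmed version of the file from the question. Choosing the company by its id attribute at call time is one way to get the "on the fly" behavior asked for.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CompanyLookup {

    // Trimmed copy of the XML from the question.
    static final String SAMPLE = "<list>"
            + "<Company id=\"01\"><Name>Atari</Name><Founded>1972</Founded></Company>"
            + "<Company id=\"02\"><Name>Sega</Name><Founded>1960</Founded></Company>"
            + "</list>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the text of every Name element, in document order.
    static List<String> allNames(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/list/Company/Name/text()", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeValue());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression", e);
        }
    }

    // Returns one field of one company, both chosen at call time.
    // The arguments are spliced into the expression, so pass only trusted values.
    static String field(Document doc, String companyId, String tag) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/list/Company[@id='" + companyId + "']/" + tag, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(allNames(doc));              // [Atari, Sega]
        System.out.println(field(doc, "02", "Name"));   // Sega
        System.out.println(field(doc, "01", "Founded")); // 1972
    }
}
```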

Retrieve XML Element names with Java from unknown message format

I am parsing XML from lots of JMS messaging topics, so the structure of each message varies a lot and I'd like to make one general tool to parse them all.
To start, all I want to do is get the element names:
<gui-action>
<action>some action</action>
<params>
<param1>blue</param1>
<param2>tall</param2>
</params>
</gui-action>
I just want to retrieve the strings "gui-action", "action", "params", "param1", and "param2." Duplicates are just fine.
I've tried using org.w3c.dom.Node, Element, NodeLists and I'm not having much luck. I keep getting the element values, not the names.
private Element root;
private Document doc;
private NodeList nl;
//messageStr is passed in elsewhere in the code
//but is a string of the full XML message.
doc = xmlParse( messageStr );
root = doc.getDocumentElement();
nl = root.getChildNodes();
int size = nl.getLength();
for (int i=0; i<size; i++) {
log.info( nl.item(i).getNodeName() );
}
public Document xmlParse( String xml ){
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
InputSource is;
try {
//Using factory get an instance of document builder
db = dbf.newDocumentBuilder();
is = new InputSource(new StringReader( xml ) );
doc = db.parse( is );
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch(SAXException se) {
se.printStackTrace();
} catch(IOException ioe) {
ioe.printStackTrace();
}
return doc;
//parse using builder to get DOM representation of the XML file
}
My logged "parsed" XML looks like this:
#text
action
#text
params
#text
Figured it out. I was iterating over only the child nodes and not including the parent. So now I just filter out the #text nodes and include the parent. Derp.
log.info( root.getNodeName() );
for (int i=0; i<size; i++) {
nodeName = nl.item(i).getNodeName();
// Compare string contents with equals(); != only compares object references
if( !"#text".equals( nodeName ) ) {
log.info( nodeName );
}
}
Now if anyone knows a way to get a NodeList of the entire document, that would be awesome.
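One possible answer to that last point: in the W3C DOM API, getElementsByTagName("*") matches every element, and calling it on the Document returns all elements in document order, root included, with no #text nodes to filter. A minimal sketch follows; the class and method names are made up, and the sample is the message from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ElementNames {

    // Returns the name of every element in the message, in document order.
    static List<String> allElementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" is the wildcard tag name: it matches all elements at any depth.
            NodeList all = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String message = "<gui-action>\n"
                + "<action>some action</action>\n"
                + "<params>\n<param1>blue</param1>\n<param2>tall</param2>\n</params>\n"
                + "</gui-action>";
        System.out.println(allElementNames(message));
        // [gui-action, action, params, param1, param2]
    }
}
```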

null pointer exception on getNodeValue() when parsing XML web page - android

I looked over several of the answers posted here, but I can't find the one I need. It may have to do with the web site itself, but I don't think so.
I'm trying to parse an XML document from a web site and I'm getting a null pointer exception.
I run the parsing in a separate thread, as Google requires for network access.
Please see my code below.
class BackgroundTask1 extends AsyncTask<String, Void, String[]> {
protected String[] doInBackground(String... url) {
new HttpGet();
new StringBuffer();
InputStream is = null;
HttpURLConnection con = null;
try {
//Log.d("eyal", "URL: " + boiUrl);
URL url1 = new URL("http://www.boi.org.il/currency.xml");
con = (HttpURLConnection)url1.openConnection();
con.setRequestMethod("GET");
con.connect();
is = con.getInputStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(is);
NodeList lastVld = doc.getElementsByTagName("LAST_UPDATE");
String lastV = lastVld.item(0).getFirstChild().getNodeValue();
}
catch (Exception e) {
e.printStackTrace();
}
I get the error on the last line.
Thanks for your help.
This code worked for me
InputStream is = null;
HttpURLConnection con = null;
try {
URL url1 = new URL("http://www.boi.org.il/currency.xml");
con = (HttpURLConnection)url1.openConnection();
con.setRequestMethod("GET");
con.connect();
is = con.getInputStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(is);
NodeList lastVld = doc.getElementsByTagName("LAST_UPDATE");
Element elem = (Element) lastVld.item(0);
String lastV = elem.getTextContent();
System.out.println(lastV);
} catch (Exception e) {
e.printStackTrace();
}
I verified I was getting good content by adding a transformer to print out the results to the console.
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer xform = tFactory.newTransformer();
xform.transform(new DOMSource(doc), new StreamResult(System.out));
There were a couple of runs where elem came out null, which I think had something to do with bad content being retrieved from the URL. This was the content printed by the transformer in those cases:
<html>
<body>
<script>document.cookie='iiiiiii=11a887d6iiiiiii_11a887d6; path=/';window.location.href=window.location.href;</script>
</body>
</html>
I noticed that if I had this file open in my browser, the code would all of a sudden quit working until I refreshed the page, then it started giving me the right output.
I suspect there's an issue with something at this URL, because when it works properly, this code works fine.
Good luck...
You only have one LAST_UPDATE tag in your XML and it has an inner value, so try reading the text content of the node you get from item(0). Note that getNodeValue() returns null for element nodes, so getTextContent() is the call to use here:
String lastV = lastVld.item(0).getTextContent();
HTHs
There is no node returned for that tag name. You may want to first check the size of the lastVld and then try to access the items in there.
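A defensive version of that check might look like the sketch below. The class SafeText and the helper firstText are hypothetical names, and both sample documents are invented stand-ins (the real feed's layout is not shown in the question); the second one imitates the HTML redirect page mentioned in the other answer. Returning null when the tag is absent lets the caller decide whether to retry or report the problem.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SafeText {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the text of the first element named tag, or null if there is none.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            // item(0) would be null here, which is where the NullPointerException comes from
            return null;
        }
        return nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical documents: one with the expected tag, one without it.
        Document good = parse("<CURRENCIES><LAST_UPDATE>2013-01-01</LAST_UPDATE></CURRENCIES>");
        Document unexpected = parse("<html><body></body></html>");
        System.out.println(firstText(good, "LAST_UPDATE"));       // 2013-01-01
        System.out.println(firstText(unexpected, "LAST_UPDATE")); // null
    }
}
```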

Submitting an XML post request using Java

I am trying to integrate a Usurv survey to my website. To do this, I need to submit an XML request to the URL http://app.usurv.com/API/Gateway.svc/getcampaignforframe, using HTTP POST. Then the response should contain a unique URL pointing to a survey.
Unfortunately I can't get it to work - the code compiles correctly but when I load the webpage I get the following exception:
"WARNING: URL = http://app.usurv.com/API/Gateway.svc/getcampaignforframe
[Fatal Error] CampaignFrameRequest%3E:6:3: The element type "link" must be terminated by the matching end-tag "</link>"."
I'm really confused about that as the XML doesn't even have a link tag, so I'm not sure where the error could be coming from. Does anyone have any ideas what could be causing this and how I can fix it?
Here is the Java code:
public class UsurvSurveyElement extends RenderController
{
private static Logger LOG = Logger.getLogger(UsurvSurveyElement.class.getName());
String xml = "<CampaignFrameRequest xmlns='http://Qsurv/api' xmlns:i='http://www.w3.org/2001/XMLSchema-instance'><PartnerId>236</PartnerId><PartnerWebsiteID>45</PartnerWebsiteID><RespondentID>1</RespondentID><RedirectUrlComplete>http://localhost:8080/eveningstar/home</RedirectUrlComplete><RedirectUrlSkip>http://localhost:8080/eveningstar/home</RedirectUrlSkip></CampaignFrameRequest>";
String strURL = "http://app.usurv.com/API/Gateway.svc/getcampaignforframe";
@Override
public void populateModelBeforeCacheKey(RenderRequest renderRequest, TopModel topModel, ControllerContext controllerContext )
{
super.populateModelBeforeCacheKey( renderRequest, topModel, controllerContext );
PostMethod post = new PostMethod(strURL);
try
{
// Specify content type and encoding
// If content encoding is not explicitly specified
// ISO-8859-1 is assumed
post.setRequestHeader(
"Content-type", "text/xml; charset=ISO-8859-1");
LOG.warning("request headers: " +post.getRequestHeader("Content-type"));
StringRequestEntity requestEntity = new StringRequestEntity(xml);
post.setRequestEntity(requestEntity);
LOG.warning("request entity: " +post.getRequestEntity());
String response = post.getResponseBodyAsString();
LOG.warning("XML string = " + xml);
LOG.warning("URL = " + strURL);
topModel.getLocal().setAttribute("thexmlresponse",response);
}
catch(Exception e)
{
LOG.warning("Errors while executing postMethod "+ e);
}
try
{
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(strURL+xml);
processNode(document.getDocumentElement());
LOG.warning("doc output = " + document);
}
catch(Exception e)
{
LOG.warning("Errors while parsing XML: "+ e);
}
}
private void processNode(Node node) {
// do something with the current node instead of System.out
LOG.warning(node.getNodeName());
NodeList nodeList = node.getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node currentNode = nodeList.item(i);
if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
//calls this method for all the children which is Element
LOG.warning("current node: " + currentNode);
processNode(currentNode);
}
}
}
}
This line looks really strange. Don't you mean to parse the response body instead?
Document document = docBuilder.parse(strURL+xml);
The parse method with a String parameter treats the string as a URI, so the XML parser connects to the server again, this time with a GET request. The server is probably responding with an error message in HTML format, which leads to the exception complaining about the link element.
Something like the following should work better:
Document document = docBuilder.parse(new InputSource(new StringReader(response)));
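To illustrate the difference in isolation, here is a small sketch that parses XML held in a String through a StringReader, so the parser never opens a connection. The class name, the helper, and the response text are all hypothetical; the real service's response format is not shown in the question, so the element names below are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ResponseParsing {

    // Parses XML that is already in memory. By contrast, DocumentBuilder.parse(String)
    // interprets its argument as a URI and tries to fetch it.
    static Document parseXmlString(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder standing in for the HTTP response body; not the service's real format.
        String response = "<CampaignFrameResponse><Url>http://example.invalid/survey</Url></CampaignFrameResponse>";
        Document document = parseXmlString(response);
        System.out.println(document.getDocumentElement().getNodeName());
        System.out.println(document.getElementsByTagName("Url").item(0).getTextContent());
    }
}
```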
