Parsing XML from webpage

Parsing XML from webpage - java

If I copy and paste the XML from this site into an XML file, I can parse it with Java:
http://api.indeed.com/ads/apisearch?publisher=8397709210207872&q=java&l=austin%2C+tx&sort&radius&st&jt&start&limit&fromage&filter&latlong=1&chnl&userip=1.2.3.4&v=2
However, I want to parse it directly from the web page if possible.
Here's my current code:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
public class XMLParser {
public void readXML(String parse) {
File xml = new File(parse);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xml);
// System.out.println("Root element :"
// + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("result");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
// System.out.println("\nCurrent Element :" +
// nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("job title : "
+
eElement.getElementsByTagName("jobtitle").item(0)
.getTextContent());
System.out.println("Company: "
+
eElement.getElementsByTagName("company")
.item(0).getTextContent());
System.out.println("City : "
+
eElement.getElementsByTagName("city").item(0)
.getTextContent());
System.out.println("State : "
+
eElement.getElementsByTagName("state").item(0)
.getTextContent());
System.out.println("Country : "
+
eElement.getElementsByTagName("country").item(0)
.getTextContent());
System.out.println("Date posted : "
+
eElement.getElementsByTagName("date").item(0)
.getTextContent());
System.out.println("Job summary : "
+
eElement.getElementsByTagName("snippet").item(0)
.getTextContent());
System.out.println("Latitude : "
+
eElement.getElementsByTagName("latitude").item(0).getTextContent());
System.out.println("longitude : "
+
eElement.getElementsByTagName("longitude").item(0).getTextContent());
}
}
} catch (ParserConfigurationException | SAXException | IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args) {
new XMLParser().readXML("test.xml");
}
}
Any help would be appreciated.

Give it the URI instead of the XML. The parser will download the document for you.
Document doc = dBuilder.parse(uriString);
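As a rough illustration of the point above: `DocumentBuilder.parse(String)` treats its argument as a URI, so the same call covers `http:` and `file:` locations. The sketch below uses a temporary file's `file:` URI as a stand-in for the web address so that it runs offline; the class name `UriParseSketch` and the sample XML are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class UriParseSketch {

    // Parses the XML found at the given URI and counts its <result> elements.
    static int countResults(String uri) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(uri); // the parser opens the URI itself
        return doc.getElementsByTagName("result").getLength();
    }

    // Writes made-up sample XML to a temp file and returns its file: URI.
    static String sampleUri() throws Exception {
        Path tmp = Files.createTempFile("sample", ".xml");
        tmp.toFile().deleteOnExit();
        String xml = "<response><results>"
                + "<result><jobtitle>Java Developer</jobtitle></result>"
                + "<result><jobtitle>Software Engineer</jobtitle></result>"
                + "</results></response>";
        Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
        return tmp.toUri().toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("results: " + countResults(sampleUri()));
    }
}
```

Swapping `sampleUri()` for the API address from the question should be the only change needed to read from the web, assuming the server is reachable and returns well-formed XML.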

A snippet along these lines should work:
String url = "http://api.indeed.com/ads/apisearch?publisher=8397709210207872&q=java&l=austin%2C+tx&sort&radius&st&jt&start&limit&fromage&filter&latlong=1&chnl&userip=1.2.3.4&v=2";
try
{
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder b = f.newDocumentBuilder();
Document doc = b.parse(url);
}
catch (ParserConfigurationException | SAXException | IOException e)
{
e.printStackTrace();
}

You need to loop over the elements you want, so that the code walks through the XML and finds the nodes you are looking for.
The following reads the XML from an input stream and builds a DOM tree (connection is not defined in this snippet; presumably it is an already opened URLConnection):
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(connection.getInputStream());
NodeList nodes = doc.getElementsByTagName("mode");
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
// Gets the tag from the XML and its content
NodeList nodeMode = element.getElementsByTagName("mode");
Element elemMode = (Element) nodeMode.item(0);
}
After that, if you want to pick out a value and parse it to an int (or whatever type you need), do this inside the loop:
int currentMode = Integer.parseInt(elemMode.getFirstChild().getTextContent());
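The loop described above can be sketched as a complete program. This is only an illustration: a `ByteArrayInputStream` stands in for `connection.getInputStream()` so the example is self-contained, and the class name `ModeReader` and the sample XML are assumptions, not part of the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ModeReader {

    // Reads every <mode> element from the stream and converts its text to an int.
    static List<Integer> readModes(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList nodes = doc.getElementsByTagName("mode");
        List<Integer> modes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // trim() guards against whitespace around the number
            modes.add(Integer.parseInt(nodes.item(i).getTextContent().trim()));
        }
        return modes;
    }

    // Hypothetical sample input, wrapped in a stream like a network response would be.
    static InputStream sampleStream() {
        String xml = "<settings><mode>3</mode><mode> 7 </mode></settings>";
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readModes(sampleStream()));
    }
}
```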

This is how I parsed data directly from a URL of the form http://www.nbp.pl/kursy/xml/ + something:
static class Kurs {
public float kurs_sprzedazy;
public float kurs_kupna;
}
private static Kurs getData(String filename, String currency) throws Exception {
// Build the parser before use
DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = dBuilder.parse("http://www.nbp.pl/kursy/xml/"+filename+".xml");
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("pozycja");
for(int i = 0; i < nList.getLength(); i++) {
Element nNode = (Element)nList.item(i);
if(nNode.getElementsByTagName("kod_waluty").item(0).getTextContent().equals(currency)) {
Kurs kurs = new Kurs();
String data = nNode.getElementsByTagName("kurs_sprzedazy").item(0).getTextContent();
data = data.replace(',', '.');
kurs.kurs_sprzedazy = Float.parseFloat(data);
data = nNode.getElementsByTagName("kurs_kupna").item(0).getTextContent();
data = data.replace(',', '.');
kurs.kurs_kupna = Float.parseFloat(data);
return kurs;
}
}
return null;
}

Related

Possible way to parse the text alone from an xml document using java dom

I need to extract only the text from an XML file. To read a specific tag I use the code below, but I am not sure how to get all the text from the XML. The XML files differ, and I don't know their root or child nodes in advance, but I need just the text.
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(streamLimiter.getFile());
doc.getDocumentElement().normalize();
System.out.println("Root element :"
+ doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("employee");
System.out.println("-----------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
NodeList nlList = eElement.getElementsByTagName("firstname")
.item(0).getChildNodes();
Node nValue = (Node) nlList.item(0);
System.out.println("First Name : "
+ nValue.getNodeValue());
}
}
} catch (Exception e) {
e.printStackTrace();
}

Quoting jsight's reply in this post: Getting XML Node text value with Java DOM
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
class Test {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws Exception {
String xml = "<add job=\"351\">\n"
+ " <tag>foobar</tag>\n"
+ " <tag>foobar2</tag>\n"
+ "</add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
org.w3c.dom.Document doc = db.parse(bis);
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an, an2;
for (int i = 0; i < nl.getLength(); i++) {
an = nl.item(i);
if (an.getNodeType() == Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for (int i2 = 0; i2 < nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if (an2.hasChildNodes()) {
System.out.println(an2.getFirstChild().getTextContent());
}
if (an2.hasChildNodes()) {
System.out.println(an2.getFirstChild().getNodeValue());
}
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
}
}
Output:
#text: type (3):
foobar
foobar
#text: type (3):
foobar2
foobar2
Adapt this code to your problem and it should work.
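For the original goal (all text, without knowing any tag names), one possible adaptation is a recursive walk that keeps only text and CDATA nodes. This is a minimal sketch under that assumption; the class name `TextCollector` and its helper methods are hypothetical, and the sample XML is the one from the quoted answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextCollector {

    // Recursively gathers the trimmed, non-empty text of every text or CDATA node.
    static void collect(Node node, List<String> out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Parses the XML string and returns all text pieces in document order.
    static List<String> allText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<add job=\"351\">\n  <tag>foobar</tag>\n  <tag>foobar2</tag>\n</add>";
        System.out.println(allText(xml));
    }
}
```

If a single concatenated string is enough, `doc.getDocumentElement().getTextContent()` is a shorter alternative, though it also returns the whitespace between tags.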

reading xml - java, dom

I have a problem reading data from XML using DOM. I don't know why "System.out.println(nNode.getChildNodes().item(0).hasAttributes());" prints false. In my XML file this node has attributes. Could you help me, please?
This is my code:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class XmlParser {
private String[] linia;
private String[] wariant;
private String[] przystanek;
private String[] tabliczka;
private String[] dzien;
private String[] godz;
private String[] min;
public void readXml() {
try {
File fXmlFile = new File("c:\\file.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :"
+ doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("linia");
System.out.println("-----------------------");
Node nNode = nList.item(0);
linia = new String[nNode.getAttributes().getLength()];
System.out.println(nNode.getAttributes().getLength());
int i = 0;
while (i < nNode.getAttributes().getLength()) {
linia[i] = nNode.getAttributes().item(i) + "";
System.out.print(linia[i] + " ");
i++;
}
wariant = new String[nNode.getChildNodes().getLength()];
System.out.println();
System.out.println(nNode.getChildNodes().getLength());
System.out.println(nNode.getNodeName());
int j = 0;
System.out.println(nNode.getChildNodes().item(0).hasAttributes());
while (j < nNode.getChildNodes().getLength()) {
wariant[j] = nNode.getChildNodes().item(j).getAttributes()
.item(0)
+ "";
// if(wariant[j].toString()!=null)
System.out.println(" " + wariant[j]);
j++;
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

Have you checked the child node at index 1? My guess is that your parser treats the characters between tags (newlines, tabs, spaces) as text nodes, which do not have attributes.
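A small sketch of that effect, assuming an indented document similar in shape to the asker's (the `nr` and `id` attributes and the class name `ChildElementFilter` are invented here). The first child of the parent is the line break and indentation, so the loop keeps only element nodes before reading attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildElementFilter {

    // Hypothetical sample; the attribute names are made up for illustration.
    static final String XML =
            "<linia nr=\"1\">\n  <wariant id=\"a\"/>\n  <wariant id=\"b\"/>\n</linia>";

    // Parses the sample and returns its root element.
    static Element root() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Returns the "id" attribute of each child element, skipping whitespace text nodes.
    static List<String> childIds(Element parent) {
        List<String> ids = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                ids.add(((Element) child).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        Element linia = root();
        Node first = linia.getChildNodes().item(0);
        // The first child is whitespace text, not <wariant>.
        System.out.println("first child is text: " + (first.getNodeType() == Node.TEXT_NODE));
        System.out.println("first child has attributes: " + first.hasAttributes());
        System.out.println("ids: " + childIds(linia));
    }
}
```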

Failing to get element values using Element.getAttribute()

I would like to read an XML file. I've found an example that works as long as the XML elements don't have any attributes. Of course I've tried to find out how to read attributes, but it doesn't work.
Example XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<car>
<properties>
<test h="1.12" w="4.2">
<colour>red</colour>
</test>
</properties>
</car>
Java Code:
public void readXML(String file) {
try {
File fXmlFile = new File(file);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("test : "
+ getTagValue("test", eElement));
System.out.println("colour : " + getTagValue("colour", eElement));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
public String getTagValue(String sTag, Element eElement) {
NodeList nlList = eElement.getElementsByTagName(sTag).item(0)
.getChildNodes();
Node nValue = (Node) nlList.item(0);
System.out.println(nValue.hasAttributes());
if (sTag.startsWith("test")) {
return eElement.getAttribute("w");
} else {
return nValue.getNodeValue();
}
}
Output:
false
test :
false
colour : red
My problem is that I can't print out the attributes. How can I get them?

There is a lot wrong with your code: undeclared variables and a seemingly crazy algorithm. I rewrote it and it works:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public final class LearninXmlDoc
{
private static String getTagValue(final Element element)
{
System.out.println(element.getTagName() + " has attributes: " + element.hasAttributes());
if (element.getTagName().startsWith("test"))
{
return element.getAttribute("w");
}
else
{
return element.getTextContent(); // getNodeValue() is always null for an element
}
}
public static void main(String[] args)
{
final String fileName = "c:\\tmp\\test\\domXml.xml";
readXML(fileName);
}
private static void readXML(String fileName)
{
Document document;
DocumentBuilder documentBuilder;
DocumentBuilderFactory documentBuilderFactory;
NodeList nodeList;
File xmlInputFile;
try
{
xmlInputFile = new File(fileName);
documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilder = documentBuilderFactory.newDocumentBuilder();
document = documentBuilder.parse(xmlInputFile);
nodeList = document.getElementsByTagName("*");
document.getDocumentElement().normalize();
for (int index = 0; index < nodeList.getLength(); index++)
{
Node node = nodeList.item(index);
if (node.getNodeType() == Node.ELEMENT_NODE)
{
Element element = (Element) node;
System.out.println("\tcolour : " + getTagValue(element));
System.out.println("\ttest : " + getTagValue(element));
System.out.println("-----");
}
}
}
catch (Exception exception)
{
exception.printStackTrace();
}
}
}
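A shorter sketch focused only on the attribute question, using the XML from the question inlined as a string (the class name `AttributeSketch` is made up). The point it tries to show is that attributes belong to the element, so `getAttribute` has to be called on the `<test>` element itself rather than on a parent element or a child text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttributeSketch {

    static final String XML =
            "<car><properties><test h=\"1.12\" w=\"4.2\">"
            + "<colour>red</colour></test></properties></car>";

    // Returns the first <test> element of the sample document.
    static Element testElement() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        return (Element) doc.getElementsByTagName("test").item(0);
    }

    public static void main(String[] args) throws Exception {
        Element test = testElement();
        // Attributes are read from the element that carries them.
        System.out.println("h = " + test.getAttribute("h"));
        System.out.println("w = " + test.getAttribute("w"));
        // Text content comes from the nested <colour> element.
        Element colour = (Element) test.getElementsByTagName("colour").item(0);
        System.out.println("colour = " + colour.getTextContent());
    }
}
```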

If you have a schema for the file, or can make one, you can use XMLBeans. It makes Java beans out of the XML, as the name implies. Then you can just use getters to get the attributes.

Use dom4j library.
InputStream is = new FileInputStream(filePath);
SAXReader reader = new SAXReader();
org.dom4j.Document doc = reader.read(is);
is.close();
Element content = doc.getRootElement(); //this will return the root element in your xml file
List<Element> methodEls = content.elements("element"); // this will return a List of all Elements with name "element"
Attribute attrib = methodEls.get(0).attribute("attributeName"); // this is the "attributeName" attribute of first element with name "element"

If you only need to read settings (e.g. a config/ini file), I would recommend using a Java properties file.
http://docs.oracle.com/javase/tutorial/essential/environment/properties.html
If you just want to read the file, create a new FileReader and wrap it in a BufferedReader.
BufferedReader in = new BufferedReader(new FileReader("example.xml"));

How to read xml string values using this path as shown in code(getFilesDir().getAbsolutePath()+ File.separator + "test.xml")

Hi everybody,
I am new to Android. I am using DOM parsing to read string values from XML. The following code works up to printing the root element name; after that it throws an exception. Please help me solve this problem.
Thanks in advance.
XML code:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<ChangePassword>
<Oldpassword>23545565635354</Oldpassword>
<Newpassword>addsffggfdsfdsfdfs </Newpassword>
</ChangePassword>
Java code:
try {
File file = new File(getFilesDir().getAbsolutePath()+ File.separator + "test.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
System.out.println("Root element " + doc.getDocumentElement().getNodeName());
NodeList nodeLst = doc.getElementsByTagName("ChangePassword");
System.out.println("Information of all entries");
for (int s = 0; s < nodeLst.getLength(); s++) {
Node fstNode = nodeLst.item(s);
if (fstNode.getNodeType() == Node.ELEMENT_NODE)
{
Element fstElmnt = (Element) fstNode;
// Firstname
NodeList fstNmElmntLst = ((Document) fstElmnt).getElementsByTagName("Oldpassword");
Element fstNmElmnt = (Element) fstNmElmntLst.item(0);
NodeList fstNm = ((Node) fstNmElmnt).getChildNodes();
System.out.println("Old password : " + ((Node) fstNm.item(0)).getNodeValue());
// Lastname
NodeList lstNmElmntLst = ((Document) fstElmnt).getElementsByTagName("Newpassword");
Element lstNmElmnt = (Element) lstNmElmntLst.item(0);
NodeList lstNm = ((Node) lstNmElmnt).getChildNodes();
System.out.println("Old password : " + ((Node) lstNm.item(0)).getNodeValue());
// Address
NodeList addrNmElmntLst = ((Document) fstElmnt).getElementsByTagName("Newpassword");
Element addrNmElmnt = (Element) addrNmElmntLst.item(0);
NodeList addrNm = ((Node) addrNmElmnt).getChildNodes();
System.out.println("Address : " + ((Node) addrNm.item(0)).getNodeValue());
}
}
} catch (Exception e) {
Log.e("Exception",e.toString());
//e.printStackTrace();
}

Wow. The DOM Parser code is pretty ugly. Please just try Simple XML instead. Look at what your code could be like:
@Root(name = "ChangePassword")
public class PasswordChange {
@Element(name = "Oldpassword")
public String oldPassword;
@Element(name = "Newpassword")
public String newPassword;
}
And that is much nicer. And then you can just say:
Serializer serial = new Persister();
PasswordChange pc = serial.read(PasswordChange.class, streamOrFileWithXML);
And that is all that there is to it. Though if you want to see how to include it in Android have a look at my blog post.

Document doc = db.parse(in);
Element docElem = doc.getDocumentElement();
NodeList nl = docElem.getElementsByTagName("Oldpassword");
Try that.
Update:
It may help to take a look here: http://www.w3schools.com/xml/default.asp
The following code works; I have just tested it.
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class testxml {
private String filepath = "src/xml.xml";
public void parse() {
File file = new File(filepath);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
try {
db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
Element docElem = doc.getDocumentElement();
NodeList nl1 = docElem.getElementsByTagName("Oldpassword");
for(int i = 0; i < nl1.getLength(); i++) {
Element entry = (Element)nl1.item(i);
System.out.println(entry.getFirstChild().getNodeValue());
}
NodeList nl2 = docElem.getElementsByTagName("Newpassword");
for(int i = 0; i < nl2.getLength(); i++) {
Element entry = (Element)nl2.item(i);
System.out.println(entry.getFirstChild().getNodeValue());
}
} catch (ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String args[]) {
testxml x = new testxml();
x.parse();
}
}

Try changing this line
NodeList nodeLst = doc.getElementsByTagName("ChangePassword");
To this
NodeList nodeLst = doc.getDocumentElement().getElementsByTagName("ChangePassword");
If not, show us your stack trace.
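One likely cause of the exception, going by the question's code alone: `fstElmnt` is an `Element`, and the casts `((Document) fstElmnt)` cannot succeed because an element node is not a `Document`, which would raise a `ClassCastException`. `Element` has its own `getElementsByTagName`, so no cast is needed. A minimal sketch, with the XML inlined as a string instead of read from `getFilesDir()` and a hypothetical class name `PasswordXmlSketch`:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PasswordXmlSketch {

    static final String XML =
            "<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>"
            + "<ChangePassword>"
            + "<Oldpassword>23545565635354</Oldpassword>"
            + "<Newpassword>addsffggfdsfdsfdfs </Newpassword>"
            + "</ChangePassword>";

    // Parses the sample and returns the <ChangePassword> root element.
    static Element parseRoot() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Returns the trimmed text of the first descendant of root with the given tag.
    static String childText(Element root, String tag) {
        return root.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot();
        System.out.println("Old password : " + childText(root, "Oldpassword"));
        System.out.println("New password : " + childText(root, "Newpassword"));
    }
}
```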

Reading contents of the XML using java

I'm trying to read an XML file using Java. I can successfully read the file, but I don't know how to read the values inside the column tags.
Since the column tags are not unique, I have no idea how to read them. Can someone help me?
Thanks in advance.
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class XMLReader {
public static void main(String argv[]) {
try {
//new code
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new URL("http://www.cse.lk/listedcompanies/overview.htm?d-16544-e=3&6578706f7274=1").openStream());
doc.getDocumentElement().normalize();
System.out.println("Root element " + doc.getDocumentElement().getNodeName());
NodeList nodeLst = doc.getElementsByTagName("row");
System.out.println("Information of all Stocks");
for (int s = 0; s < nodeLst.getLength(); s++) {
Node fstNode = nodeLst.item(s);
if (fstNode.getNodeType() == Node.ELEMENT_NODE) {
Element fstElmnt = (Element) fstNode;
//NodeList fstNmElmntLst = fstElmnt.getElementsByTagName("column");
//Element fstNmElmnt = (Element) fstNmElmntLst.item(0);
//NodeList fstNm = fstNmElmnt.getChildNodes();
//System.out.println("First Tag : " + ((Node) fstNm.item(0)).getNodeValue());
NodeList lstNmElmntLst = fstElmnt.getElementsByTagName("column");
// Element lstNmElmnt = (Element) lstNmElmntLst.item(0);
for (int columnIndex = 0; columnIndex < lstNmElmntLst.getLength(); columnIndex++) {
Element lstNmElmnt = (Element) lstNmElmntLst.item(columnIndex);
NodeList lstNm = lstNmElmnt.getChildNodes();
System.out.println("Last Tag : " + ((Node) lstNm.item(0)).getNodeValue());
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

This code:
NodeList fstNmElmntLst = fstElmnt.getElementsByTagName("column");
returns a list of column nodes. Why not use a for loop to iterate over them all instead of reading only the first one?
for (int columnIndex = 0; columnIndex < fstNmElmntLst.getLength(); columnIndex++) {
Element fstNmElmnt = (Element) fstNmElmntLst.item(columnIndex);
...
}

You now get an NPE on:
<column/>
so you should check the list size before getting element 0:
NodeList lstNm = lstNmElmnt.getChildNodes();
if (lstNm.getLength() > 0) {
System.out.println("Last Tag : " + ((Node)lstNm.item(0)).getNodeValue());
} else {
System.out.println("No content");
}
And as you're processing text content in nodes, have a look at the answer to this SO question. Text nodes are irritating because:
<foo>
a
b
c
</foo>
can be (or are) more than one child node of foo, and getTextContent() can ease the pain a bit.
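To make the `getTextContent()` suggestion concrete, here is a small sketch with invented sample data and a made-up class name `ColumnTextSketch`. For an element, `getTextContent()` returns an empty string when there are no children, so `<column/>` needs no special case, and it joins the text of all child nodes when the content is split.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ColumnTextSketch {

    // Hypothetical sample: one normal column, one empty column, one with surrounding whitespace.
    static final String XML =
            "<table><row><column>ABC</column><column/><column>\n 12.5\n</column></row></table>";

    // Returns the trimmed text of every <column> element in document order.
    static List<String> columnValues(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList columns = doc.getElementsByTagName("column");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < columns.getLength(); i++) {
            // No getChildNodes().item(0) lookup, so an empty element cannot cause an NPE here.
            values.add(columns.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        for (String value : columnValues(XML)) {
            System.out.println(value.isEmpty() ? "No content" : "Column : " + value);
        }
    }
}
```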
