Java DOM XML parsing :: getting element attribute value - java

How can i extract attribute value out of the element. My xml node is writen like this
< nodename attribute="value" > i need to extract it out to compare it against another string.
But since i am not calling document.getElementsByTag then i cant use .getAttribute("att.").getNodeValue to get the value.
Instead i have a NodeList and getAttribute() does not have getNodeValue.
package dev;
import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
public class Parser {
static String def = "\"admin\",\"base\",\"Default\",\"simple\"";
static String category = "";
static String sku = "";
static String has_options = "0";
static String name = "";
static String image = "";
static String small_image = "";
static String thumbnail = "";
public static void toCSV() {
try {
BufferedWriter output = new BufferedWriter(new FileWriter("sim.csv", true));
output.newLine();
output.write(def);
output.write(String.format(",\"%s\",\"%s\",\"%s\"", category, sku, has_options));
output.write(String.format(",\"%s\",\"%s\",\"%s\",\"%s\"", name, image, small_image, thumbnail));
output.flush();
output.close();
} catch(Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
toCSV();
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("input.asp.xml"));
document.getDocumentElement().normalize();
NodeList list = document.getElementsByTagName("izdelek");
for(int i = 0; i < 1; i++) {
NodeList child = list.item(i).getChildNodes();
for(int j = 0; j < child.getLength(); j++) {
if(child.item(j).getNodeName().equals("kategorija")) {
category = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("ean")) {
sku = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("izdelekIme")) {
name = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("slikaMala")) {
small_image = child.item(j).getTextContent().trim();
thumbnail = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("slikaVelika")) {
image = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("dodatneLastnosti")) {
NodeList subs = child.item(j).getChildNodes();
// ^ need to parse these nodes they are written as <nodename attribute="value">
// i need to print out the value
}
}
//toCSV();
}
} catch(Exception io) {
io.printStackTrace();
}
}
}

Solved:
XML input:
< nodename attribute="value" > Something </ nodename>
Java code:
NodesList subs = child.item(j).getChildNodes();
System.out.println(subs.item(0).getTextContent()); // >> Something
Element element = (Element) document.adoptNode(subs.item(0));
System.out.println(element.getAttribute("attribute")); // >> value

You also can use this,
child.item(j).getFirstChild().getNodeValue();

Related

How to read an online XML file for currency rates in java

I'm building a simple currency converter which needs to sue online rates. I found the following API from the European Central Bank to use:
http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml
My problem is im struggling to implement it. Here is what i have so far after using a bunch of different sources to try and get this code together.
try{
URL url = new URL("http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("Cube");
for(int i = 0; i < nodeList1.getLength(); i++){
Node node = nodeList1.item(i);
}
}
catch(Exception e){
}
So what i thought is that this code would take down all the nodes which tart with "Cube", and contain the rates.
Anyone have an easier wya to pull down the rates from the API into an array in the order they appear on the XML as that's all I'm trying to do
Thanks
XPath is one way to answer this, since you just want to extract information from the XML and not change the XML. The structure of the XML suggests that you're looking for nodes that are Cube nodes, that are child of Cube which is also a child of Cube -- Cube nested three times, so extract nodes with an XPath compiled using this String: "//Cube/Cube/Cube". This looks for nodes that have Cube nested 3 times located anywhere (the //) in the Document:
XPathExpression expr = xpath.compile("//Cube/Cube/Cube");
Then check the nodes for a "currency" attribute. If they have this, then they also have a "rate" attribute, and then extract this information.
NamedNodeMap attribs = node.getAttributes();
if (attribs.getLength() > 0) {
Node currencyAttrib = attribs.getNamedItem(CURRENCY);
if (currencyAttrib != null) {
String currencyTxt = currencyAttrib.getNodeValue();
String rateTxt = attribs.getNamedItem(RATE).getNodeValue();
// ...
}
}
Where CURRENCY = "currency" and RATE = "rate"
For example:
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class TestXPath {
private static final String CURRENCY = "currency";
private static final String CUBE_NODE = "//Cube/Cube/Cube";
private static final String RATE = "rate";
public static void main(String[] args) {
List<CurrencyRate> currRateList = new ArrayList<>();
DocumentBuilderFactory builderFactory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
builder = builderFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
Document document = null;
String spec = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml";
try {
URL url = new URL(spec);
InputStream is = url.openStream();
document = builder.parse(is);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
String xPathString = CUBE_NODE;
XPathExpression expr = xpath.compile(xPathString);
NodeList nl = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
NamedNodeMap attribs = node.getAttributes();
if (attribs.getLength() > 0) {
Node currencyAttrib = attribs.getNamedItem(CURRENCY);
if (currencyAttrib != null) {
String currencyTxt = currencyAttrib.getNodeValue();
String rateTxt = attribs.getNamedItem(RATE).getNodeValue();
currRateList.add(new CurrencyRate(currencyTxt, rateTxt));
}
}
}
} catch (SAXException | IOException | XPathExpressionException e) {
e.printStackTrace();
}
for (CurrencyRate currencyRate : currRateList) {
System.out.println(currencyRate);
}
}
}
public class CurrencyRate {
private String currency;
private String rate; // ?double
public CurrencyRate(String currency, String rate) {
super();
this.currency = currency;
this.rate = rate;
}
public String getCurrency() {
return currency;
}
public String getRate() {
return rate;
}
#Override
public String toString() {
return "CurrencyRate [currency=" + currency + ", rate=" + rate + "]";
}
// equals, hashCode,....
}

Date Format getting disturb when creating .CSV file in Java

I am creating a web scraper and then store the data in the .CSV file.
My program is running fine but, there is a problem that the website from where I am retrieving data have a date which is in (Month Day, Year) format. So when I save the data in .CSV file it will consider the Year as another column due to which all the data gets manipulated. I actually want to store that data into (MM-MON-YYYY) and store Validity date in one column. I am posting my code below. Kindly, help me out. Thanks!
P.S: I am sorry for not writing the format I want in the original post.
package com.mufapscraping;
//import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
//import java.util.Collections;
import java.util.Iterator;
//import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class ComMufapScraping {
boolean writeCSVToConsole = true;
boolean writeCSVToFile = true;
//String destinationCSVFile = "C:\\convertedCSV.csv";
boolean sortTheList = true;
boolean writeToConsole;
boolean writeToFile;
public static Document doc = null;
public static Elements tbodyElements = null;
public static Elements elements = null;
public static Elements tdElements = null;
public static Elements trElement2 = null;
public static String Dcomma = ", 2";
public static ArrayList<Elements> sampleList = new ArrayList<Elements>();
public static void createConnection() throws IOException {
System.setProperty("http.proxyHost", "191.1.1.123");
System.setProperty("http.proxyPort", "8080");
String tempUrl = "http://www.mufap.com.pk/nav_returns_performance.php?tab=01";
doc = Jsoup.connect(tempUrl).get();
}
public static void parsingHTML() throws Exception {
for (int i = 1; i <= 1; i++) {
tbodyElements = doc.getElementsByTag("tbody");
//Element table = doc.getElementById("dataTable");
if (tbodyElements.isEmpty()) {
throw new Exception("Table is not found");
}
elements = tbodyElements.get(0).getElementsByTag("tr");
for (Element trElement : elements) {
trElement2 = trElement.getElementsByTag("tr");
tdElements = trElement.getElementsByTag("td");
FileWriter sb = new FileWriter("C:\\convertedCSV2.csv", true);
for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
if (it.hasNext()) {
sb.append(" \n ");
}
for (Iterator<Element> it2 = trElement2.iterator(); it.hasNext();) {
Element tdElement = it.next();
sb.append(tdElement.text());
if (it2.hasNext()) {
sb.append(" , ");
}
}
System.out.println(sb.toString());
sb.flush();
sb.close();
}
System.out.println(sampleList.add(tdElements));
/* for (Elements elements2 : zakazky) {
System.out.println(elements2);
}*/
}
}
}
public static void main(String[] args) throws IOException, Exception {
createConnection();
parsingHTML();
}
}
Instead of appeding directly the element text in the FileWriter, format it first then append it.
So, replace the following line:
sb.append(tdElement.text());
into
sb.append(formatData(tdElement.text()));
private static final SimpleDateFormat FORMATTER_MMM_d_yyyy = new SimpleDateFormat("MMM d, yyyy", Locale.US);
private static final SimpleDateFormat FORMATTER_dd_MMM_yyyy = new SimpleDateFormat("dd-MMM-YYYY", Locale.US);
public static String formatData(String text) {
String tmp = null;
try {
Date d = FORMATTER_MMM_d_yyyy.parse(text);
tmp = FORMATTER_dd_MMM_yyyy.format(d);
} catch (ParseException pe) {
tmp = text;
}
return tmp;
}
SAMPLE
public static void main(String[] args) {
String[] fields = new String[] { //
"ABL Cash Fund", //
"AA(f)", //
"Apr 18, 2016", //
"10.4729" //
};
for (String field : fields) {
System.out.format("%s\n%s\n\n", field, formatData(field));
}
}
OUTPUT
ABL Cash Fund
ABL Cash Fund
AA(f)
AA(f)
Apr 18, 2016
18-Apr-2016
10.4729
10.4729
Instead of using the method getElementsByTag many times you can use cssSelector which can be much easier and enables you to get the same output in few lines of code
public static void main (String []args) throws IOException{
String tempUrl = "http://www.mufap.com.pk/nav_returns_performance.php?tab=01";
Document doc = Jsoup.connect(tempUrl).get();
Elements trElements = doc.select("#dataTable tbody tr");
FileWriter sb = new FileWriter("C:\\convertedCSV2.csv", true);
for(Element tr : trElements){
Elements tdElements = tr.select("td");
for (Element td : tdElements){
sb.append(td.text());
sb.append(";");
}
sb.append("\n");
}
}
This could be achieved by simply surrounding your data with double quotes, so month day, year would become "month day, year". Here's modified code that does the job for you:
package com.mufapscraping;
//import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
//import java.util.Collections;
import java.util.Iterator;
//import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class ComMufapScraping {
boolean writeCSVToConsole = true;
boolean writeCSVToFile = true;
//String destinationCSVFile = "C:\\convertedCSV.csv";
boolean sortTheList = true;
boolean writeToConsole;
boolean writeToFile;
public static Document doc = null;
public static Elements tbodyElements = null;
public static Elements elements = null;
public static Elements tdElements = null;
public static Elements trElement2 = null;
public static String Dcomma = ", 2";
public static ArrayList<Elements> sampleList = new ArrayList<Elements>();
public static void createConnection() throws IOException {
System.setProperty("http.proxyHost", "191.1.1.123");
System.setProperty("http.proxyPort", "8080");
String tempUrl = "http://www.mufap.com.pk/nav_returns_performance.php?tab=01";
doc = Jsoup.connect(tempUrl).get();
}
public static void parsingHTML() throws Exception {
for (int i = 1; i <= 1; i++) {
tbodyElements = doc.getElementsByTag("tbody");
//Element table = doc.getElementById("dataTable");
if (tbodyElements.isEmpty()) {
throw new Exception("Table is not found");
}
elements = tbodyElements.get(0).getElementsByTag("tr");
for (Element trElement : elements) {
trElement2 = trElement.getElementsByTag("tr");
tdElements = trElement.getElementsByTag("td");
FileWriter sb = new FileWriter("C:\\convertedCSV2.csv", true);
for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
if (it.hasNext()) {
sb.append(" \n ");
}
for (Iterator<Element> it2 = trElement2.iterator(); it.hasNext();) {
Element tdElement = it.next();
sb.append('\"'); // surround your data
sb.append(tdElement.text());
sb.append('\"'); // with double quotes
if (it2.hasNext()) {
sb.append(" , ");
}
}
System.out.println(sb.toString());
sb.flush();
sb.close();
}
System.out.println(sampleList.add(tdElements));
/* for (Elements elements2 : zakazky) {
System.out.println(elements2);
}*/
}
}
}
public static void main(String[] args) throws IOException, Exception {
createConnection();
parsingHTML();
}
}
Then you do want to split it. ok, then modify the first line by adding "year," column:
Element tdElement = it.next();
final String content = tdElement.text()
sb.append(content);
if (it2.hasNext()) {
sb.append(" , ");
if (content.equals("Validity Date"))
sb.append("Validity Year,");
you probably want to break after the for? or you'll overwrite the file elements.size()-1 times...
FileWriter sb = new FileWriter("C:\\convertedCSV2.csv", true);
for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) { ... }
break;

java - xpath to get rows inside a table

I have an html file like: http://scholar.google.gr/citations?user=v9xULZwAAAAJ&hl=el
In this file exist a table with articles. I want to get the first 20 articles (if exist) with xpath.
I try to find fist article:
String str = (String) xpath.evaluate("//form[contains(#id,'citationsForm')]/div[2]/div[1]/table/tbody/tr[2]/td[#id='col-title']/a", docList.get(0), XPathConstants.STRING);
And its Ok! result: Modern information retrieval
for all articles:
String str = (String) xpath.evaluate("//form[contains(#id,'citationsForm')]/div[2]/div[1]/table/tbody/tr/td[#id='col-title']/a", docList.get(0), XPathConstants.STRING);
but do not work
Any Idea?
Than you!
EDIT:
Also I try:
NodeList result = (NodeList)xpath.evaluate("//form[contains(#id,'citationsForm')]/div[2]/div[1]/table/tbody/tr/td[#id='col-title']/a",
docList.get(0), XPathConstants.NODESET);
ArrayList<String>liste = new ArrayList<String>();
for(int i=0; i<result.getLength();i++){
System.out.println(result.item(i).getNodeValue());
liste.add(result.item(i).getNodeName());
}
EDIT 2 All code
Class FileOperation:
package xmlparse;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.ParserConfigurationException;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
public class FileOperations {
private static final String path = "C:\\Users\\Dimitris\\Desktop\\authors";
public ArrayList<Document> getXmlDocumt() {
ArrayList<Document> xmlFileList = new ArrayList<>();
try {
ArrayList<File> listFiles = listFiles(path);
for (File f : listFiles) {
String html = readfile(f.getAbsolutePath());
xmlFileList.add(ConvertHtml2Xml(html) );
}
} catch (IOException ex) {
Logger.getLogger(FileOperations.class.getName()).log(Level.SEVERE, null, ex);
}
return xmlFileList;
}
private ArrayList<File> listFiles(String directoryName) throws IOException {
ArrayList<File> htmlfilelist = new ArrayList<>();
File directory = new File(directoryName);
//get all the files from a directory
File[] fList = directory.listFiles();
for (File file : fList) {
if (file.isFile()) {
htmlfilelist.add(file);
}
}
return htmlfilelist;
}
private String readfile(String file) throws FileNotFoundException, IOException {
String s = "";
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
StringBuilder content = new StringBuilder(1024);
while ((s = br.readLine()) != null) {
content.append(s);
}
//System.out.println(content.toString());
return content.toString();
}
private Document ConvertHtml2Xml(String html) {
TagNode tagNode = new HtmlCleaner().clean(html);
Document doc = null;
try {
doc = new DomSerializer(new CleanerProperties()).createDOM(tagNode);
} catch (ParserConfigurationException ex) {
Logger.getLogger(FileOperations.class.getName()).log(Level.SEVERE, null, ex);
}
return doc;
}
}
Class XpathQueries:
XPath xpath;
ArrayList<Document> docList;
public XpathQueries() {
xpath = XPathFactory.newInstance().newXPath();
FileOperations fo = new FileOperations();
docList = new ArrayList<>(fo.getXmlDocumt());
}
public void getArticle() throws XPathExpressionException {
// String str = (String) xpath.evaluate("//form[contains(#id,'citationsForm')]/div[2]/div[1]/table/tbody//td[1]/a",
// docList.get(0), XPathConstants.STRING);
String str = (String) xpath.evaluate("//*[#id='col-title']/a", docList.get(0), XPathConstants.STRING);
System.out.println(str);
}
}
Try with this:
Object result = xpath.evaluate("//*[#id='col-title']/a", docList.get(0), XPathConstants.STRING);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
Thank you for help.
The solution is:
int length;
Object result = xpath.evaluate("//a[contains(#href,'citation_for_view')]", docList.get(0), XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
length = nodes.getLength();
if(length>20){
length=20;
}
for (int i = 0; i < length; i++) {
System.out.println(nodes.item(i).getFirstChild().getNodeValue());
}

XML parsing with Child not value parsing

XML parsing with Child not value parsing
import java.io.File;
import java.io.FileInputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList;
public class XPathEvaluator {
/*
* ServiceGroup serviceGroup = new ServiceGroup(); List<Service>
* requiredServices = new ArrayList<Service>(); List<Service>
* recommandedServices = new ArrayList<Service>(); Service service = new
* Service();
*/
public void evaluateDocument(File xmlDocument) {
try {
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
String requiredServicesExpression = "/Envelope/Header";
InputSource requiredServicesInputSource = new InputSource(
new FileInputStream(xmlDocument));
DTMNodeList requiredServicesNodes = (DTMNodeList) xPath.evaluate(
requiredServicesExpression, requiredServicesInputSource,
XPathConstants.NODESET);
System.out.println(requiredServicesNodes.getLength());
NodeList requiredNodeList = (NodeList) requiredServicesNodes;
for (int i = 0; i < requiredNodeList.getLength(); i++) {
Node node = requiredNodeList.item(i);
System.out.println(node.getChildNodes());
}
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] argv) {
XPathEvaluator evaluator = new XPathEvaluator();
File xmlDocument = new File("d://eva.xml");
evaluator.evaluateDocument(xmlDocument);
}
}
my xml is following in this i am try to parse header information
<?xml version="1.0" encoding="UTF-8"?>
<Envelope>
<Header>
<User id="MAKRISH"/>
<Request-Id id="1"/>
<Type name="Response"/>
<Application-Source name="vss" version="1.0"/>
<Application-Destination name="test" />
<Outgo-Timestamp date="2012-08-24" time="14:50:00"/>
<DealerCode>08301</DealerCode>
<Market>00000</Market>
</Header>
</Envelope>
i am not able to get Header child how can i get them it is giving me null on getchildNodes method. i have check for many solution but get any thing.
The following parsing is done with DOM as per tagging , i hope this should help you to solve
{
try{
File file = new File("xmlfile");
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(file);
Element root = document.getDocumentElement();
root.normalize();
printNode(root, 0);
} catch (Exception e) {
}
}
public static void printNode(Node node, int depth) {
if (node.getNodeType() == Node.TEXT_NODE) {
System.out.printf("%s%n", node.getNodeValue());
} else {
NamedNodeMap attributes = node.getAttributes();
if ((attributes == null) || (attributes.getLength() == 0)) {
System.out.printf("%s%n", node.getNodeName());
} else {
System.out.printf("%s ", node.getNodeName());
printAttributes(attributes);
}
}
NodeList children = node.getChildNodes();
for(int i=0; i<children.getLength(); i++) {
Node childNode = children.item(i);
printNode(childNode, depth+1);
}
}
private static void printAttributes(NamedNodeMap attributes) {
for(int i=0; i<attributes.getLength(); i++)
{
Node attribute = attributes.item(i);
System.out.printf(" %s=\"%s\"", attribute.getNodeName(),
attribute.getNodeValue());
}
}
}
The accepted answer to this related question has a good example of parsing xml using xpath.
I've debugged into your code, and the getChildNodes call is in fact not returning null, but it has got a confusing toString().

java find value of an xml attribute

this is my xml :
<-tobject.subject tobject.subject.refnum="01016000" />
<-tobject.subject tobject.subject.refnum="10004000" />
I want to extract 01016000 and 10004000 from it .
I used this code:
NodeList nodeLst4 = doc.getElementsByTagName("tobject.subject");
if (nodeLst4 != null) {
int numberofCOdes = nodeLst4.getLength();
aSubjectCodes = new String[numberofCOdes];
for (int i = 0; i < numberofCOdes; i++) {
XPath xpath = XPathFactory.newInstance().newXPath();
aSubjectCodes[i] = xpath.evaluate("//tobject.subject/#tobject.subject.refnum", doc);
the problem is that when i loop through it the evaluate method just return first number and do not give me the second value.
and i am not sure if using xpath.evaluate is good idea or not.
Thanks
there is no need to use the doc.getElementsByTagName.
You are mixing plain DOM with XPath.
Your xpath is correct:
package net.davymeers;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class XpathTest {
private static String XMLSTRING = "<data>"
+ "<tobject.subject tobject.subject.refnum=\"01016000\" />\r\n"
+ "\r\n"
+ "<tobject.subject tobject.subject.refnum=\"10004000\" />"
+ "</data>";
/**
* #param args
*/
public static void main(final String[] args) {
final Document doc = createDocument();
final XPath xpath = createXpath();
final NodeList nodes = findElements(
"//tobject.subject/#tobject.subject.refnum", doc, xpath);
final Collection<String> results = convertToCollection(nodes);
for (final String result : results) {
System.out.println(result);
}
}
private static Document createDocument() {
Document doc = null;
try {
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
documentBuilderFactory.setNamespaceAware(true); // never forget
// this!
final DocumentBuilder builder = documentBuilderFactory
.newDocumentBuilder();
doc = builder.parse(new ByteArrayInputStream(XMLSTRING
.getBytes("ISO-8859-1")));
} catch (final UnsupportedEncodingException exception) {
// TODO handle exception
} catch (final SAXException exception) {
// TODO handle exception
} catch (final IOException exception) {
// TODO handle exception
} catch (final ParserConfigurationException exception) {
// TODO handle exception
}
return doc;
}
private static XPath createXpath() {
final XPathFactory xpathFactory = XPathFactory.newInstance();
final XPath xpath = xpathFactory.newXPath();
return xpath;
}
private static NodeList findElements(final String xpathExpression,
final Document doc, final XPath xpath) {
NodeList nodes = null;
if (doc != null) {
try {
final XPathExpression expr = xpath.compile(xpathExpression);
final Object result = expr
.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
} catch (final XPathExpressionException exception) {
// TODO handle exception
}
}
return nodes;
}
private static Collection<String> convertToCollection(final NodeList nodes) {
final Collection<String> result = new ArrayList<String>();
if (nodes != null) {
for (int i = 0; i < nodes.getLength(); i++) {
result.add(nodes.item(i).getNodeValue());
}
}
return result;
}
}
Here's a useful class I found a while back for XMLFiles. Takes a lot of the work off of your shoulders.
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
/**
* XMLFile.java
*
* XML file object that represents an xml file and its properties. Used to
* simplify the process of reading from and writing to XML files.
*
* Derived from unknown source. Implemented on 12/03/09. Permission given to
* implement and modify code.
*/
public class XMLFile {
private String name;
private String content;
private Map<String, String> nameAttributes = new HashMap<String, String>();
private Map<String, List<XMLFile>> nameChildren = new HashMap<String, List<XMLFile>>();
private static Element rootElement(String filename, String rootName) {
FileInputStream fileInputStream = null;
try {
fileInputStream = new FileInputStream(filename);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(fileInputStream);
Element rootElement = document.getDocumentElement();
if (!rootElement.getNodeName().equals(rootName))
throw new RuntimeException("Could not find root node: "
+ rootName);
return rootElement;
} catch (Exception exception) {
throw new RuntimeException(exception);
} finally {
if (fileInputStream != null) {
try {
fileInputStream.close();
} catch (Exception exception) {
throw new RuntimeException(exception);
}
}
}
}
/**
* #param (String) Filepath of XML File (String) Root of XML File
**/
public XMLFile(String filename, String rootName) {
this(rootElement(filename, rootName));
}
/**
* #param (Element) XML File Element
**/
private XMLFile(Element element) {
this.name = element.getNodeName();
this.content = element.getTextContent();
NamedNodeMap namedNodeMap = element.getAttributes();
int n = namedNodeMap.getLength();
for (int i = 0; i < n; i++) {
Node node = namedNodeMap.item(i);
String name = node.getNodeName();
addAttribute(name, node.getNodeValue());
}
NodeList nodes = element.getChildNodes();
n = nodes.getLength();
for (int i = 0; i < n; i++) {
Node node = nodes.item(i);
int type = node.getNodeType();
if (type == Node.ELEMENT_NODE)
addChild(node.getNodeName(), new XMLFile((Element) node));
}
}
/**
* Adds attribute to ???
*
* #param (String) Attribute Name (String) Attribute Value
**/
private void addAttribute(String name, String value) {
nameAttributes.put(name, value);
}
/**
* Adds child directory to ???
*
* #param (String) Name of New Child Directory (XMLFile) XML Documentation
* of Child
**/
private void addChild(String name, XMLFile child) {
List<XMLFile> children = nameChildren.get(name);
if (children == null) {
children = new ArrayList<XMLFile>();
nameChildren.put(name, children);
}
children.add(child);
}
public String name() {
return name;
}
public String content() {
return content;
}
/**
*
**/
public XMLFile child(String name) {
List<XMLFile> children = children(name);
if (children.size() != 1)
throw new RuntimeException("Could not find individual child node: "
+ name);
return children.get(0);
}
/**
*
**/
public List<XMLFile> children(String name) {
List<XMLFile> children = nameChildren.get(name);
return children == null ? new ArrayList<XMLFile>() : children;
}
/**
* Gets the value of a specific field and converts it to a String object
*
* #param (String) Name of Field
**/
public String string(String name) {
String value = nameAttributes.get(name);
if (value == null)
throw new RuntimeException("Could not find attribute: " + name
+ ", in node: " + this.name);
return value;
}
/**
* Gets the value of a specific field and converts it to an int
*
* #param (String) Name of Field
**/
public int integer(String name) {
return Integer.parseInt(string(name));
}
/**
* Gets the value of a specific field and converts it to an
* ArrayList<String>
*
* #param (String) Name of Field
**/
public ArrayList<String> arrayListString(String name) {
String left = new String();
int finished = 0;
ArrayList<String> list = new ArrayList<String>();
try {
left = nameAttributes.get(name);
} catch (Exception e) {
System.err.println("Exception: " + e.getMessage());
}
while (finished == 0) {
if (left.indexOf(", ") > -1) {
list.add(left.substring(0, left.indexOf(", ")));
left = left.substring(left.indexOf(", ") + 2);
} else {
list.add(left);
finished = 1;
}
}
return list;
}
}
Would this work for you? I am parsing RSS XML from here:
http://www.kraftfoods.com/rss/dinnerRecipes.aspx
Look at Media and URL towards the bottom:
package recipeSearchAndFinder.xml;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class DomFeedParser extends BaseFeedParser {
public DomFeedParser(String feedUrl) {
super(feedUrl);
}
public List<Message> parse() {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
List<Message> messages = new ArrayList<Message>();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document dom = builder.parse(this.getInputStream());
Element root = dom.getDocumentElement();
NodeList items = root.getElementsByTagName(ITEM);
for (int i = 0; i < items.getLength(); i++) {
Message message = new Message();
Node item = items.item(i);
NodeList properties = item.getChildNodes();
for (int j = 0; j < properties.getLength(); j++) {
Node property = properties.item(j);
String name = property.getNodeName();
if (name.equalsIgnoreCase(TITLE)) {
message.setTitle(property.getFirstChild()
.getNodeValue());
} else if (name.equalsIgnoreCase(LINK)) {
message.setLink(property.getFirstChild().getNodeValue());
} else if (name.equalsIgnoreCase(DESCRIPTION)) {
StringBuilder text = new StringBuilder();
NodeList chars = property.getChildNodes();
for (int k = 0; k < chars.getLength(); k++) {
text.append(chars.item(k).getNodeValue());
}
message.setDescription(text.toString());
} else if (name.equalsIgnoreCase(PUB_DATE)) {
message.setDate(property.getFirstChild().getNodeValue());
} else if (name.equalsIgnoreCase(MEDIA)) {
NamedNodeMap nMap = property.getAttributes();
String mediaurl = nMap.getNamedItem("url")
.getNodeValue();
message.setMedia(mediaurl);
}
}
messages.add(message);
}
} catch (Exception e) {
throw new RuntimeException(e);
}
return messages;
}
}

Categories