Java: XPath to get rows inside a table

I have an HTML file like this one: http://scholar.google.gr/citations?user=v9xULZwAAAAJ&hl=el
The file contains a table of articles. I want to get the first 20 articles (if they exist) with XPath.
I tried to find the first article:
String str = (String) xpath.evaluate("//form[contains(@id,'citationsForm')]/div[2]/div[1]/table/tbody/tr[2]/td[@id='col-title']/a", docList.get(0), XPathConstants.STRING);
And it works! Result: Modern information retrieval
For all articles I tried:
String str = (String) xpath.evaluate("//form[contains(@id,'citationsForm')]/div[2]/div[1]/table/tbody/tr/td[@id='col-title']/a", docList.get(0), XPathConstants.STRING);
but it does not work.
Any ideas?
Thank you!
EDIT:
I also tried:
NodeList result = (NodeList)xpath.evaluate("//form[contains(@id,'citationsForm')]/div[2]/div[1]/table/tbody/tr/td[@id='col-title']/a",
docList.get(0), XPathConstants.NODESET);
ArrayList<String>liste = new ArrayList<String>();
for(int i=0; i<result.getLength();i++){
System.out.println(result.item(i).getNodeValue());
liste.add(result.item(i).getNodeName());
}
EDIT 2: all code
Class FileOperations:
package xmlparse;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.ParserConfigurationException;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.w3c.dom.Document;
public class FileOperations {
private static final String path = "C:\\Users\\Dimitris\\Desktop\\authors";
public ArrayList<Document> getXmlDocumt() {
ArrayList<Document> xmlFileList = new ArrayList<>();
try {
ArrayList<File> listFiles = listFiles(path);
for (File f : listFiles) {
String html = readfile(f.getAbsolutePath());
xmlFileList.add(ConvertHtml2Xml(html) );
}
} catch (IOException ex) {
Logger.getLogger(FileOperations.class.getName()).log(Level.SEVERE, null, ex);
}
return xmlFileList;
}
private ArrayList<File> listFiles(String directoryName) throws IOException {
ArrayList<File> htmlfilelist = new ArrayList<>();
File directory = new File(directoryName);
//get all the files from a directory
File[] fList = directory.listFiles();
for (File file : fList) {
if (file.isFile()) {
htmlfilelist.add(file);
}
}
return htmlfilelist;
}
private String readfile(String file) throws FileNotFoundException, IOException {
String s = "";
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
StringBuilder content = new StringBuilder(1024);
while ((s = br.readLine()) != null) {
content.append(s);
}
//System.out.println(content.toString());
return content.toString();
}
private Document ConvertHtml2Xml(String html) {
TagNode tagNode = new HtmlCleaner().clean(html);
Document doc = null;
try {
doc = new DomSerializer(new CleanerProperties()).createDOM(tagNode);
} catch (ParserConfigurationException ex) {
Logger.getLogger(FileOperations.class.getName()).log(Level.SEVERE, null, ex);
}
return doc;
}
}
Class XpathQueries:
public class XpathQueries {
XPath xpath;
ArrayList<Document> docList;
public XpathQueries() {
xpath = XPathFactory.newInstance().newXPath();
FileOperations fo = new FileOperations();
docList = new ArrayList<>(fo.getXmlDocumt());
}
public void getArticle() throws XPathExpressionException {
// String str = (String) xpath.evaluate("//form[contains(@id,'citationsForm')]/div[2]/div[1]/table/tbody//td[1]/a",
// docList.get(0), XPathConstants.STRING);
String str = (String) xpath.evaluate("//*[@id='col-title']/a", docList.get(0), XPathConstants.STRING);
System.out.println(str);
}
}

Try with this:
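// NODESET returns every match; STRING would return only the first node's text
Object result = xpath.evaluate("//*[@id='col-title']/a", docList.get(0), XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
// getNodeValue() is null for element nodes, so read the link text instead
System.out.println(nodes.item(i).getTextContent());
}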

Thank you for the help.
The solution is:
int length;
Object result = xpath.evaluate("//a[contains(@href,'citation_for_view')]", docList.get(0), XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
length = nodes.getLength();
if(length>20){
length=20;
}
for (int i = 0; i < length; i++) {
System.out.println(nodes.item(i).getFirstChild().getNodeValue());
}
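For reference, here is a minimal, self-contained sketch of why the original expression printed only one title, run against a small hypothetical XML stand-in for the cleaned Scholar page (the sample markup and the class name `ArticleTitles` are illustrative, not taken from the real page). Evaluating a path with `XPathConstants.STRING` converts the node-set to the string value of its first node only, whereas `XPathConstants.NODESET` returns every match. The limit of 20 can also be pushed into the XPath itself with a `position()` predicate instead of the `if (length > 20)` check.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ArticleTitles {

    // Hypothetical stand-in for the cleaned page: three rows, one link each.
    static final String SAMPLE = "<table><tbody>"
        + "<tr><td id='col-title'><a href='view?citation_for_view=1'>First</a></td></tr>"
        + "<tr><td id='col-title'><a href='view?citation_for_view=2'>Second</a></td></tr>"
        + "<tr><td id='col-title'><a href='view?citation_for_view=3'>Third</a></td></tr>"
        + "</tbody></table>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // STRING converts the node-set to the string value of its FIRST node only.
    static String firstTitle(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate("//td[@id='col-title']/a", doc, XPathConstants.STRING);
    }

    // NODESET returns every match; the outer predicate keeps at most `max` of them.
    static List<String> titles(Document doc, int max) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "(//a[contains(@href,'citation_for_view')])[position() <= " + max + "]";
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() gives the link text; getNodeValue() is null for elements
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(firstTitle(doc)); // First
        System.out.println(titles(doc, 20)); // [First, Second, Third]
        System.out.println(titles(doc, 2));  // [First, Second]
    }
}
```

The parentheses in `(//a[...])[position() <= 20]` matter: they apply the position test to the whole node-set in document order rather than to each `a` relative to its own parent.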

Related

How to read an online XML file for currency rates in java

I'm building a simple currency converter which needs to use online rates. I found the following API from the European Central Bank:
http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml
My problem is that I'm struggling to implement it. Here is what I have so far, pieced together from a bunch of different sources:
try{
URL url = new URL("http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("Cube");
for(int i = 0; i < nodeList1.getLength(); i++){
Node node = nodeList1.item(i);
}
}
catch(Exception e){
}
What I thought is that this code would pull down all the nodes named "Cube", which contain the rates.
Does anyone have an easier way to pull the rates from the API into an array, in the order they appear in the XML? That's all I'm trying to do.
Thanks
XPath is one way to do this, since you only want to extract information from the XML, not change it. The structure of the XML suggests that you're looking for Cube nodes that are children of a Cube which is itself a child of a Cube (Cube nested three times), so extract nodes with an XPath compiled from the String "//Cube/Cube/Cube". This finds Cube elements nested three deep anywhere (the //) in the Document:
XPathExpression expr = xpath.compile("//Cube/Cube/Cube");
Then check the nodes for a "currency" attribute. If they have this, then they also have a "rate" attribute, and then extract this information.
NamedNodeMap attribs = node.getAttributes();
if (attribs.getLength() > 0) {
Node currencyAttrib = attribs.getNamedItem(CURRENCY);
if (currencyAttrib != null) {
String currencyTxt = currencyAttrib.getNodeValue();
String rateTxt = attribs.getNamedItem(RATE).getNodeValue();
// ...
}
}
Where CURRENCY = "currency" and RATE = "rate"
For example:
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class TestXPath {
private static final String CURRENCY = "currency";
private static final String CUBE_NODE = "//Cube/Cube/Cube";
private static final String RATE = "rate";
public static void main(String[] args) {
List<CurrencyRate> currRateList = new ArrayList<>();
DocumentBuilderFactory builderFactory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
builder = builderFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
Document document = null;
String spec = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml";
try {
URL url = new URL(spec);
InputStream is = url.openStream();
document = builder.parse(is);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
String xPathString = CUBE_NODE;
XPathExpression expr = xpath.compile(xPathString);
NodeList nl = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
NamedNodeMap attribs = node.getAttributes();
if (attribs.getLength() > 0) {
Node currencyAttrib = attribs.getNamedItem(CURRENCY);
if (currencyAttrib != null) {
String currencyTxt = currencyAttrib.getNodeValue();
String rateTxt = attribs.getNamedItem(RATE).getNodeValue();
currRateList.add(new CurrencyRate(currencyTxt, rateTxt));
}
}
}
} catch (SAXException | IOException | XPathExpressionException e) {
e.printStackTrace();
}
for (CurrencyRate currencyRate : currRateList) {
System.out.println(currencyRate);
}
}
}
public class CurrencyRate {
private String currency;
private String rate; // ?double
public CurrencyRate(String currency, String rate) {
super();
this.currency = currency;
this.rate = rate;
}
public String getCurrency() {
return currency;
}
public String getRate() {
return rate;
}
@Override
public String toString() {
return "CurrencyRate [currency=" + currency + ", rate=" + rate + "]";
}
// equals, hashCode,....
}
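If you want the rates in document order with numeric values (the `// ?double` comment above), a sketch along these lines may help. It assumes the same feed layout, selects `//Cube[@currency]` instead of `//Cube/Cube/Cube`, keeps the order with a `LinkedHashMap`, and uses `BigDecimal` to avoid floating-point rounding. The class name `EcbRates` is hypothetical, and the inline sample and its rates are made up for illustration, not real ECB data.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EcbRates {

    // Made-up sample shaped like eurofxref-daily.xml; these are NOT real rates.
    static final String SAMPLE =
        "<gesmes:Envelope xmlns:gesmes='http://www.gesmes.org/xml/2002-08-01'"
      + " xmlns='http://www.ecb.int/vocabulary/2002-08-01/eurofxref'>"
      + "<Cube><Cube time='2000-01-01'>"
      + "<Cube currency='USD' rate='1.1000'/>"
      + "<Cube currency='JPY' rate='150.00'/>"
      + "</Cube></Cube></gesmes:Envelope>";

    // Returns currency -> rate in document order (LinkedHashMap keeps insertion order).
    static Map<String, BigDecimal> parseRates(InputSource source) throws Exception {
        // The default factory is NOT namespace-aware, which is why the unprefixed
        // XPath below still matches Cube despite the default xmlns on the root.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList cubes = (NodeList) xpath.evaluate("//Cube[@currency]", doc, XPathConstants.NODESET);
        Map<String, BigDecimal> rates = new LinkedHashMap<>();
        for (int i = 0; i < cubes.getLength(); i++) {
            Element cube = (Element) cubes.item(i);
            rates.put(cube.getAttribute("currency"), new BigDecimal(cube.getAttribute("rate")));
        }
        return rates;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseRates(new InputSource(new StringReader(SAMPLE))));
        // For the live feed: parseRates(new InputSource(new java.net.URL(spec).openStream()))
    }
}
```

Running `main` prints `{USD=1.1000, JPY=150.00}`; if you really need an array, `rates.keySet().toArray(new String[0])` preserves the same order.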

How to parse HTML with java properly?

Scenario/Requirement:
Download html page from some URL
Download images that were mentioned in html tags.
Change tags for images in my file, so I can open it with my browser offline and see them.
I have done the first two points, but I am having difficulties with the third one. The tags do not change. What am I doing wrong?
The job is to open a file, find each img tag's src attribute and replace it with another value. Can you give me an example?
Code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.*;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.awt.image.BufferedImage;
import java.net.URL;
import java.net.URLConnection;
import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTMLDocument;
public class ExtractAllImages {
static String result_doc = "/home/foo/index.html";
static String home_folder = "/home/foo/";
static String start_webURL = "http://www.oracle.com/";
public static void main(String args[]) throws Exception {
String webUrl = start_webURL;
URL url = new URL(webUrl);
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
HTMLEditorKit.Parser parser = new ParserDelegator();
HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
parser.parse(br, callback, true);
FileWriter writer = new FileWriter(result_doc);
htmlKit.write(writer, htmlDoc, 0, htmlDoc.getLength());
writer.close();
int number_or_images = 0;
String[] array = new String[4096];
for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.IMG); iterator.isValid(); iterator.next()) {
AttributeSet attributes = iterator.getAttributes();
String imgSrc = (String) attributes.getAttribute(HTML.Attribute.SRC);
System.out.println("img_src = " + imgSrc);
if (imgSrc != null && (imgSrc.endsWith(".jpg") || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico")))) {
try {
downloadImage(webUrl, imgSrc);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
array[number_or_images] = imgSrc;
number_or_images++;
///TODO change
}
for(int i =0; i < number_or_images; i++)
{
System.out.println("before = "+array[i]);
while(true)
{
int count = array[i].indexOf('/');
if(count == -1) break;
array[i] = array[i].substring(count+1);
}
System.out.println("after = " + array[i]);
}
//TODO open file and replace tags
int i =0;
File input = new File(result_doc);
Document doc = Jsoup.parse(input, "UTF-8");
System.out.println( input.canWrite());
for( Element img : doc.select("img[src]") )
{
String s = img.attr("src");
System.out.println(s);
img.attr("src", "/home/foo/"+array[i]); // set attribute 'src' to 'your-source-here'
s = img.attr("src");
System.out.println(s);
++i;
}
}
private static void downloadImage(String url, String imgSrc) throws IOException {
BufferedImage image = null;
try {
if (!(imgSrc.startsWith("http"))) {
url = url + imgSrc;
} else {
url = imgSrc;
}
imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
String imageFormat = null;
imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
String imgPath = null;
imgPath = home_folder + imgSrc + "";
URL imageUrl = new URL(url);
image = ImageIO.read(imageUrl);
if (image != null) {
File file = new File(imgPath);
ImageIO.write(image, imageFormat, file);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Solved.
I wasn't saving the changes. I needed to add this code before "downloadImage()":
int i = 0;
File input = new File(result_doc);
Document doc = Jsoup.parse(input, "UTF-8");
for( Element img : doc.select("img[src]") ) {
img.attr("src",array[i]); // set attribute 'src' to 'your-source-here'
++i;
}
try (BufferedWriter bw = new BufferedWriter(new FileWriter(result_doc))) {
// serialize the modified jsoup Document back to the file; without this the edits exist only in memory
bw.write(doc.outerHtml());
}
catch (Exception ex) {
System.out.println("Program stopped. The problem is " + "\"" +
ex.getMessage()+"\"");
}
You can go with jsoup.
Try something like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.File;
public static void getAllTags(){
try {
File input=new File("H:\\html pages\\index1.html");
Document document=Jsoup.parse(input, "UTF-8");
Document parse=Jsoup.parse(document.html());
Elements body=parse.select("body");
Elements bodyTags=body.select("*");
for (Element element : bodyTags) {
//Do what you want with tag
System.out.println(element.tagName());
}
} catch (Exception e) {
e.printStackTrace();
}
}
If you want to parse the HTML, try this:
public static void parseHTML(){
try {
File input = new File("H:\\html\\index1.html");
Document document = Jsoup.parse(input, "UTF-8");
Document parse = Jsoup.parse(document.html());
Elements bodyElements = parse.select("div");
Elements elements = bodyElements.select("*");
for (Element element : elements) {
FilterHtml.setHtmlTAG(element.tagName());
FilterHtml.ParseXml();
Elements body = bodyElements.select(FilterHtml.getXmlTAG());
if (body.is(FilterHtml.getXmlTAG())) {
Elements tag = parse.select(FilterHtml.getXmlTAG());
//Do something meaning full with tag
System.out.println(tag.text());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
Hope this helps.

Java DOM XML parsing :: getting element attribute value

How can I extract an attribute value from an element? My XML node is written like this:
<nodename attribute="value">
I need to extract the value to compare it against another string.
But since I am not calling document.getElementsByTagName, I can't use .getAttribute("att.").getNodeValue to get the value.
Instead I have a NodeList, and getAttribute() does not have getNodeValue.
package dev;
import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
public class Parser {
static String def = "\"admin\",\"base\",\"Default\",\"simple\"";
static String category = "";
static String sku = "";
static String has_options = "0";
static String name = "";
static String image = "";
static String small_image = "";
static String thumbnail = "";
public static void toCSV() {
try {
BufferedWriter output = new BufferedWriter(new FileWriter("sim.csv", true));
output.newLine();
output.write(def);
output.write(String.format(",\"%s\",\"%s\",\"%s\"", category, sku, has_options));
output.write(String.format(",\"%s\",\"%s\",\"%s\",\"%s\"", name, image, small_image, thumbnail));
output.flush();
output.close();
} catch(Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
toCSV();
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("input.asp.xml"));
document.getDocumentElement().normalize();
NodeList list = document.getElementsByTagName("izdelek");
for(int i = 0; i < 1; i++) {
NodeList child = list.item(i).getChildNodes();
for(int j = 0; j < child.getLength(); j++) {
if(child.item(j).getNodeName().equals("kategorija")) {
category = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("ean")) {
sku = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("izdelekIme")) {
name = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("slikaMala")) {
small_image = child.item(j).getTextContent().trim();
thumbnail = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("slikaVelika")) {
image = child.item(j).getTextContent().trim();
} else if(child.item(j).getNodeName().equals("dodatneLastnosti")) {
NodeList subs = child.item(j).getChildNodes();
// ^ need to parse these nodes they are written as <nodename attribute="value">
// i need to print out the value
}
}
//toCSV();
}
} catch(Exception io) {
io.printStackTrace();
}
}
}
Solved:
XML input:
<nodename attribute="value">Something</nodename>
Java code:
NodeList subs = child.item(j).getChildNodes();
System.out.println(subs.item(0).getTextContent()); // >> Something
// a plain cast is enough (assumes item(0) is the element, not a whitespace text node);
// document.adoptNode() would also detach the node from its parent
Element element = (Element) subs.item(0);
System.out.println(element.getAttribute("attribute")); // >> value
You can also use this to read an element's text:
child.item(j).getFirstChild().getNodeValue();
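A more defensive version of the loop over `subs`: child node lists usually include whitespace text nodes between elements, so casting `subs.item(0)` straight to `Element` can throw `ClassCastException`. The sketch below checks `getNodeType()` before casting; the sample XML and the class name `AttributeValues` are hypothetical. Note that `Element.getAttribute` returns an empty string, not `null`, when the attribute is missing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeValues {

    // Hypothetical sample: whitespace between the children creates text nodes.
    static final String SAMPLE =
        "<dodatneLastnosti>\n"
      + "  <nodename attribute=\"value1\">Something</nodename>\n"
      + "  <nodename attribute=\"value2\">Else</nodename>\n"
      + "</dodatneLastnosti>";

    // Collects the named attribute from every element child of parent.
    static List<String> attributeValues(Node parent, String attrName) {
        List<String> values = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes; only elements carry attributes.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                values.add(((Element) child).getAttribute(attrName));
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        System.out.println(attributeValues(doc.getDocumentElement(), "attribute")); // [value1, value2]
    }
}
```

In the question's loop this corresponds to calling `attributeValues(child.item(j), "attribute")` inside the `dodatneLastnosti` branch.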

Regarding XPath using Java

I have a problem getting the value of an element by providing the XPath in Java. I tried a lot of things but could not succeed.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import com.dell.logistics.framework.transform.NamespaceContext;
public class GetXPath {
protected Object evaluate(String xpathStr, String xml, String namespaces) throws XPathExpressionException {
InputSource inputSource = new InputSource(new StringReader(xml));
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
NamespaceContext nsContext = new NamespaceContext();
nsContext.setNamespacesMap(getNsMap(namespaces));
//System.out.println(nsContext.getPrefix(namespaces));
xPath.setNamespaceContext(nsContext);
XPathExpression xpExp = xPath.compile(xpathStr);
return xpExp.evaluate(inputSource, XPathConstants.NODESET);
}
private Map<String, String> getNsMap(String namespaces) {
String delims = ",";
String[] nsKeyValue = namespaces.split(delims);
Map<String, String> mp = new HashMap<String, String>();
for (String string : nsKeyValue) {
mp.put(string.split("=")[0], string.split("=")[1]);
System.out.println(string.split("=")[0] + string.split("=")[1]);
}
return mp;
}
public static String readFile(String fileName) {
try {
// InputStream is = null;
InputStream is = GetXPath.class.getResourceAsStream(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(is));
StringBuffer sb = new StringBuffer();
String l = null;
while ((l = br.readLine()) != null) {
sb.append(l).append("\n");
}
return sb.toString();
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
public static void main(String[] args)
throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
GetXPath g = new GetXPath();
String xml = readFile("fooewo.xml");
String value = null;
System.out.println(xml);
NodeList containerNodes = (NodeList) g.evaluate(
"/demo",xml,
"a=http://schemas.demo.com/it/WorkOrderChannelAckNackResponse/1.0");
try{
for (int i = 0; i < containerNodes.getLength(); i++) {
// get the node value.
value = containerNodes.item(i).getTextContent();
System.out.println(value);
}
System.out.println("Node Found : " + containerNodes.getLength() + " times");
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
"
XML file:
<?xml version="1.0" encoding="utf-8"?>
<demo xmlns="try with ur schema">
<test>
<value>10</value>
<color>red</color>
<animal>dog</animal>
<day>13</day>
<age>22</age>
</test>
<test>
<value>20</value>
<color>green</color>
<animal>cat</animal>
<day>12</day>
<age>23</age>
</test>
</demo>
Any help appreciated.
Thanks,
Pradeep
I think the easiest way to evaluate XPath is with AXIOMXPath.
Here is an example:
OMElement documentElement = new StAXOMBuilder(inStreamToXML).getDocumentElement();
AXIOMXPath xpathExpression = new AXIOMXPath ("/demo");
List nodeList = xpathExpression.selectNodes(documentElement); // selectNodes already returns a java.util.List
By traversing the list you can get the result easily.
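For comparison, the question's plain-JAXP code most likely finds nothing because `<demo>` is in a default namespace: in XPath 1.0 an unprefixed name such as `/demo` only matches elements in no namespace, so the expression has to use the prefix that the namespace context binds (with the question's `a=...` mapping, `/a:demo`). Below is a JDK-only sketch that uses an anonymous `javax.xml.namespace.NamespaceContext` instead of the proprietary `com.dell.logistics.framework.transform.NamespaceContext`; the URI `urn:example:demo` and the class name are placeholders, so substitute the real `xmlns` value from your file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DefaultNamespaceXPath {

    // Placeholder namespace URI; use the real xmlns value from your document.
    static final String NS = "urn:example:demo";

    static final String SAMPLE = "<demo xmlns='" + NS + "'>"
        + "<test><value>10</value><color>red</color></test>"
        + "<test><value>20</value><color>green</color></test>"
        + "</demo>";

    static List<String> values(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-qualified XPath
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the arbitrary prefix "a" to the document's default namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "a".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "a" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri) ? Collections.singletonList("a").iterator()
                                      : Collections.<String>emptyIterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(values(SAMPLE, "/demo").size());           // 0: unprefixed = no namespace
        System.out.println(values(SAMPLE, "/a:demo/a:test/a:value")); // [10, 20]
    }
}
```

The prefix in the expression does not have to match any prefix in the document; only the namespace URI it maps to matters.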

java find value of an xml attribute

This is my XML:
<tobject.subject tobject.subject.refnum="01016000" />
<tobject.subject tobject.subject.refnum="10004000" />
I want to extract 01016000 and 10004000 from it.
I used this code:
NodeList nodeLst4 = doc.getElementsByTagName("tobject.subject");
if (nodeLst4 != null) {
int numberofCOdes = nodeLst4.getLength();
aSubjectCodes = new String[numberofCOdes];
for (int i = 0; i < numberofCOdes; i++) {
XPath xpath = XPathFactory.newInstance().newXPath();
aSubjectCodes[i] = xpath.evaluate("//tobject.subject/@tobject.subject.refnum", doc);
}
}
The problem is that when I loop through it, the evaluate method just returns the first number and does not give me the second value.
I am also not sure whether using xpath.evaluate is a good idea.
Thanks
There is no need to use doc.getElementsByTagName.
You are mixing plain DOM with XPath.
Your XPath is correct; evaluate it once as a NODESET to get every match:
package net.davymeers;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Collection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class XpathTest {
private static String XMLSTRING = "<data>"
+ "<tobject.subject tobject.subject.refnum=\"01016000\" />\r\n"
+ "\r\n"
+ "<tobject.subject tobject.subject.refnum=\"10004000\" />"
+ "</data>";
/**
* @param args
*/
public static void main(final String[] args) {
final Document doc = createDocument();
final XPath xpath = createXpath();
final NodeList nodes = findElements(
"//tobject.subject/@tobject.subject.refnum", doc, xpath);
final Collection<String> results = convertToCollection(nodes);
for (final String result : results) {
System.out.println(result);
}
}
private static Document createDocument() {
Document doc = null;
try {
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
documentBuilderFactory.setNamespaceAware(true); // never forget this!
final DocumentBuilder builder = documentBuilderFactory
.newDocumentBuilder();
doc = builder.parse(new ByteArrayInputStream(XMLSTRING
.getBytes("ISO-8859-1")));
} catch (final UnsupportedEncodingException exception) {
// TODO handle exception
} catch (final SAXException exception) {
// TODO handle exception
} catch (final IOException exception) {
// TODO handle exception
} catch (final ParserConfigurationException exception) {
// TODO handle exception
}
return doc;
}
private static XPath createXpath() {
final XPathFactory xpathFactory = XPathFactory.newInstance();
final XPath xpath = xpathFactory.newXPath();
return xpath;
}
private static NodeList findElements(final String xpathExpression,
final Document doc, final XPath xpath) {
NodeList nodes = null;
if (doc != null) {
try {
final XPathExpression expr = xpath.compile(xpathExpression);
final Object result = expr
.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
} catch (final XPathExpressionException exception) {
// TODO handle exception
}
}
return nodes;
}
private static Collection<String> convertToCollection(final NodeList nodes) {
final Collection<String> result = new ArrayList<String>();
if (nodes != null) {
for (int i = 0; i < nodes.getLength(); i++) {
result.add(nodes.item(i).getNodeValue());
}
}
return result;
}
}
Here's a useful class I found a while back for XML files. It takes a lot of the work off your shoulders.
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
/**
* XMLFile.java
*
* XML file object that represents an xml file and its properties. Used to
* simplify the process of reading from and writing to XML files.
*
* Derived from unknown source. Implemented on 12/03/09. Permission given to
* implement and modify code.
*/
public class XMLFile {
private String name;
private String content;
private Map<String, String> nameAttributes = new HashMap<String, String>();
private Map<String, List<XMLFile>> nameChildren = new HashMap<String, List<XMLFile>>();
private static Element rootElement(String filename, String rootName) {
FileInputStream fileInputStream = null;
try {
fileInputStream = new FileInputStream(filename);
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(fileInputStream);
Element rootElement = document.getDocumentElement();
if (!rootElement.getNodeName().equals(rootName))
throw new RuntimeException("Could not find root node: "
+ rootName);
return rootElement;
} catch (Exception exception) {
throw new RuntimeException(exception);
} finally {
if (fileInputStream != null) {
try {
fileInputStream.close();
} catch (Exception exception) {
throw new RuntimeException(exception);
}
}
}
}
/**
* @param (String) Filepath of XML File (String) Root of XML File
**/
public XMLFile(String filename, String rootName) {
this(rootElement(filename, rootName));
}
/**
* @param (Element) XML File Element
**/
private XMLFile(Element element) {
this.name = element.getNodeName();
this.content = element.getTextContent();
NamedNodeMap namedNodeMap = element.getAttributes();
int n = namedNodeMap.getLength();
for (int i = 0; i < n; i++) {
Node node = namedNodeMap.item(i);
String name = node.getNodeName();
addAttribute(name, node.getNodeValue());
}
NodeList nodes = element.getChildNodes();
n = nodes.getLength();
for (int i = 0; i < n; i++) {
Node node = nodes.item(i);
int type = node.getNodeType();
if (type == Node.ELEMENT_NODE)
addChild(node.getNodeName(), new XMLFile((Element) node));
}
}
/**
* Adds attribute to ???
*
* @param (String) Attribute Name (String) Attribute Value
**/
private void addAttribute(String name, String value) {
nameAttributes.put(name, value);
}
/**
* Adds child directory to ???
*
* @param (String) Name of New Child Directory (XMLFile) XML Documentation
* of Child
**/
private void addChild(String name, XMLFile child) {
List<XMLFile> children = nameChildren.get(name);
if (children == null) {
children = new ArrayList<XMLFile>();
nameChildren.put(name, children);
}
children.add(child);
}
public String name() {
return name;
}
public String content() {
return content;
}
/**
*
**/
public XMLFile child(String name) {
List<XMLFile> children = children(name);
if (children.size() != 1)
throw new RuntimeException("Could not find individual child node: "
+ name);
return children.get(0);
}
/**
*
**/
public List<XMLFile> children(String name) {
List<XMLFile> children = nameChildren.get(name);
return children == null ? new ArrayList<XMLFile>() : children;
}
/**
* Gets the value of a specific field and converts it to a String object
*
* @param (String) Name of Field
**/
public String string(String name) {
String value = nameAttributes.get(name);
if (value == null)
throw new RuntimeException("Could not find attribute: " + name
+ ", in node: " + this.name);
return value;
}
/**
* Gets the value of a specific field and converts it to an int
*
* @param (String) Name of Field
**/
public int integer(String name) {
return Integer.parseInt(string(name));
}
/**
* Gets the value of a specific field and converts it to an
* ArrayList<String>
*
* @param (String) Name of Field
**/
public ArrayList<String> arrayListString(String name) {
String left = new String();
int finished = 0;
ArrayList<String> list = new ArrayList<String>();
try {
left = nameAttributes.get(name);
} catch (Exception e) {
System.err.println("Exception: " + e.getMessage());
}
while (finished == 0) {
if (left.indexOf(", ") > -1) {
list.add(left.substring(0, left.indexOf(", ")));
left = left.substring(left.indexOf(", ") + 2);
} else {
list.add(left);
finished = 1;
}
}
return list;
}
}
Would this work for you? I am parsing RSS XML from here:
http://www.kraftfoods.com/rss/dinnerRecipes.aspx
Look at Media and URL towards the bottom:
package recipeSearchAndFinder.xml;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class DomFeedParser extends BaseFeedParser {
public DomFeedParser(String feedUrl) {
super(feedUrl);
}
public List<Message> parse() {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
List<Message> messages = new ArrayList<Message>();
try {
DocumentBuilder builder = factory.newDocumentBuilder();
Document dom = builder.parse(this.getInputStream());
Element root = dom.getDocumentElement();
NodeList items = root.getElementsByTagName(ITEM);
for (int i = 0; i < items.getLength(); i++) {
Message message = new Message();
Node item = items.item(i);
NodeList properties = item.getChildNodes();
for (int j = 0; j < properties.getLength(); j++) {
Node property = properties.item(j);
String name = property.getNodeName();
if (name.equalsIgnoreCase(TITLE)) {
message.setTitle(property.getFirstChild()
.getNodeValue());
} else if (name.equalsIgnoreCase(LINK)) {
message.setLink(property.getFirstChild().getNodeValue());
} else if (name.equalsIgnoreCase(DESCRIPTION)) {
StringBuilder text = new StringBuilder();
NodeList chars = property.getChildNodes();
for (int k = 0; k < chars.getLength(); k++) {
text.append(chars.item(k).getNodeValue());
}
message.setDescription(text.toString());
} else if (name.equalsIgnoreCase(PUB_DATE)) {
message.setDate(property.getFirstChild().getNodeValue());
} else if (name.equalsIgnoreCase(MEDIA)) {
NamedNodeMap nMap = property.getAttributes();
String mediaurl = nMap.getNamedItem("url")
.getNodeValue();
message.setMedia(mediaurl);
}
}
messages.add(message);
}
} catch (Exception e) {
throw new RuntimeException(e);
}
return messages;
}
}
