Parsing http returned xml with java - java

So I've tried searching and searching on how to do this but I keep seeing a lot of complicated answers for what I need. I basically am using the Flurry Analytics API to return some xml code from an HTTP request and this is what it returns.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<eventMetrics type="Event" startDate="2011-2-28" eventName="Tip Calculated" endDate="2011-3-1" version="1.0" generatedDate="3/1/11 11:32 AM">
<day uniqueUsers="1" totalSessions="24" totalCount="3" date="2011-02-28"/>
<day uniqueUsers="0" totalSessions="0" totalCount="0" date="2011-03-01"/>
<parameters/>
</eventMetrics>
All I want to get is that totalCount number which is 3 with Java to an int or string. I've looked at the different DOM and SAX methods and they seem to grab information outside of the tags. Is there someway I can just grab totalCount within the tag?
Thanks,
Update
I found this url -http://www.androidpeople.com/android-xml-parsing-tutorial-%E2%80%93-using-domparser/
That helped me considering it was in android. But I thank everyone who responded for helping me out. I checked out every answer and it helped out a little bit for getting to understand what's going on. However, now I can't seem to grab the xml from my url because it requires an HTTP post first to then get the xml. When it goes to grab xml from my url it just says file not found.
Update 2
I got it all sorted out reading it in now and getting the xml from Flurry Analytics (for reference if anyone stumbles upon this question)
HTTP request for XML file

totalCount is what we call an attribute. If you're using the org.w3c.dom API, you call getAttribute("totalCount") on the appropriate element.

If you are using an SAX handler, override the startElement callback method to access attributes:
public void startElement (String uri, String name, String qName, Attributes atts)
{
if("day".equals (qName)) {
String total = attrs.getValue("totalCount");
}
}

A JDOM example. Note the use of SAXBuilder to load the document.
URL httpSource = new URL("some url string");
Document document = SAXBuilder.build(httpSource);
List<?> elements = document.getDescendants(new KeyFilter());
for (Element e : elements) {
//do something more useful with it than this
String total = (Element) e.getAttributeValue("totalCount");
}
class KeyFilter implements Filter {
public boolean matches (Object obj) {
return (Element) obj.getName().equals("key");
}
}

I think that the simplest way is to use XPath, below is an example based on vtd-xml.
import com.ximpleware.*;
public class test {
public static void main(String[] args) throws Exception {
String xpathExpr = "/eventMetrics/day/#totalCount";
VTDGen vg = new VTDGen();
int i = -1;
if (vg.parseHttpUrl("http://localhost/test.xml", true)) {
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot();
ap.selectXPath(xpathExpr);
ap.bind(vn);
System.out.println("total count "+(int)ap.evalXPathtoDouble());
}
}
}

Related

How do I retrieve one specific piece of data from a XML file using XMLreader?

A real example of the XML data I have to parse through and how the file is configured. this is how the file is presented to me.
<?xml version="1.0"?>
<session>
<values>
<value id="FILE_CREATE_DATE">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
<value id="LAST_ACCESSED">
<timestamp>2012-09-17T17:15:23Z</timestamp>
</value>
<value id="VERSION_TIMESTAMP">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
</values>
</session>
I need to go into this file and retrieve the FILE_CREATE_DATE data.
My code so far:
File xmlFile = new File(XMLFileData[i].getPath());
FileInputStream myXMLStream = new FileInputStream(xmlFile);
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLReader.hasText())
{
System.out.println(XMLReader.getText());
break;
}
}
XMLReader.next();
}
the 'getLocalName()' function returns 'Sessions' then 'value' then 'values' but never returns the actual name of the element. I need to test to see if I am at the right element then retrieve the data from that element...
I use Jsoup which is a library for parsing HTML. But it can be used for xml too. you would first have to load the XML file into a Document object then simply call
doc.getElementById("FILE_CREATE_DATE");
This will return an Element object that will have the timestamp as a child. Here's a link to the library: https://jsoup.org/
This is my first StackOverflow answer so let me know if it helps !
Your id is not an element - it's element attribute.
You should read attribute of your value node, see the javadoc for getAttributeValue method:
http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html#getAttributeValue(java.lang.String,%20java.lang.String)
Returns the normalized attribute value of the attribute with the
namespace and localName If the namespaceURI is null the namespace is
not checked for equality
So it will be:
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value")) {
String idValue = XMLReader.getAttributeValue(null, "id");
//here idValue will be equal to FILE_CREATE_DATE, LAST_ACCESSED or VERSION_TIMESTAMP
}
try something like
if(XMLReader.getAttributeValue(0).equalIgnorecase("FILE_CREATE_DATE"))
getAttributeValue : Return value of the given index of the attribute. for
<value id="FILE_CREATE_DATE">
id is the first attribute. So XMLReader.getAttributeValue(0)
but before calling this you have to validate whether element has the first attribute. Because all the tags does not have at least 1 attribute.
in jsoup you can query like this
public static void main(String[] args) {
Document doc;
try {
doc = Jsoup.connect("http://www.dropbox.com/public/xml/yourfile.xml").userAgent("Mozilla").get();
//<value id="FILE_CREATE_DATE">
Elements links = doc.select("value[id=FILE_CREATE_DATE]");
for (Element link : links) {
if(link.attr("id").contains("FILE_CREATE_DATE"))//find the link with some texts
{
System.out.println("here is the element you need");
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value"))
{
String idValue = XMLReader.getAttributeValue(null, "id");
if (idValue.equals("FILE_CREATE_DATE"))
{
System.out.println(idValue);
XMLReader.nextTag();
System.out.println(XMLReader.getElementText());
}
}
}
XMLReader.next();
}
So this code is the final result of all my anguish on the topic of recovering specific data from a XML data file. I want to thank everyone who helped me out with answers - regardless on if it was what I was looking for they got me thinking and that led to the solution...

how to create OSLC docs using Jena?

I need to create RDF/XML documents containing objects in the OSLC namespace.
e.g.
<oslc_disc:ServiceProviderCatalog
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:oslc_disc="http://open-services.net/xmlns/discovery/1.0/"
rdf:about="{self}">
<dc:title>{catalog title}</dc:title>
<oslc_disc:details rdf:resource="{catalog details uri}" />
what is the simplest way to create this doc using the Jena API ?
( I know about Lyo, they use a JSP for this doc :-)
Thanks, Carsten
Here's a complete example to start you off. Be aware that this will be equivalent to XML output you want, but may not be identical. The order of properties, for example, may vary, and there are other ways to write the same content.
import com.hp.hpl.jena.rdf.model.*
import com.hp.hpl.jena.vocabulary.DCTerms;
public class Jena {
// Vocab items -- could use schemagen to generate a class for this
final static String OSLC_DISC_NS = "http://open-services.net/xmlns/discovery/1.0/";
final static Resource ServiceProviderCatalog =
ResourceFactory.createResource(OSLC_DISC_NS + "ServiceProviderCatalog");
final static Property details =
ResourceFactory.createProperty(OSLC_DISC_NS, "details");
public static void main(String[] args) {
// Inputs
String selfURI = "http://example.com/self";
String catalogTitle = "Catalog title";
String catalogDetailsURI = "http://example.com/catalogDetailsURI";
// Create in memory model
Model model = ModelFactory.createDefaultModel();
// Set prefixes
model.setNsPrefix("dc", DCTerms.NS);
model.setNsPrefix("oslc_disc", OSLC_DISC_NS);
// Add item of type spcatalog
Resource self = model.createResource(selfURI, ServiceProviderCatalog);
// Add the title
self.addProperty(DCTerms.title, catalogTitle);
// Add details, which points to a resource
self.addProperty(details, model.createResource(catalogDetailsURI));
// Write pretty RDF/XML
model.write(System.out, "RDF/XML-ABBREV");
}
}

I want to pull Facebook posts from a public page to a Java application

I am creating an app in Java that will take all the information from a public website and load it in the app for people to read using jsoup. I was trying the same kind of function with Facebook but it wasn't working the same way. Does anyone have a good idea about how I should go about this?
Thanks,
Calland
public String[] scrapeEvents(String... args) throws Exception {
Document doc = Jsoup.connect("http://www.facebook.com/cedarstreettimes?fref=ts").get();
Elements elements = doc.select("div._wk");
String s = elements.toString();
return s;
}
edit: I found this link of information,but I'm a little confused on how to manipulate it to get me only the content of what the specific user posts on their wall. http://developers.facebook.com/docs/getting-started/graphapi/
I had a look at the source of that page -- the thing that is tripping up the parse is that all the real content is wrapped in comments, like this:
<code class="hidden_elem" id="u_0_42"><!-- <div class="fbTimelineSection ...> --></code>
There is JS on the page that lifts that data into the real DOM, but as jsoup doesn't execute JS it stays as comments. So before extracting the content, we need to emulate that JS and "un-hide" those elements. Here's an example to get you started:
String url = "https://www.facebook.com/cedarstreettimes?fref=ts";
String ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.33 (KHTML, like Gecko) Chrome/27.0.1438.7 Safari/537.33";
Document doc = Jsoup.connect(url).userAgent(ua).timeout(10*1000).get();
// move the hidden commented out html into the DOM proper:
Elements hiddenElements = doc.select("code.hidden_elem");
for (Element hidden: hiddenElements) {
for (Node child: hidden.childNodesCopy()) {
if (child instanceof Comment) {
hidden.append(((Comment) child).getData()); // comment data parsed as html
}
}
}
Elements articles = doc.select("div[role=article]");
for (Element article: articles) {
if (article.select("span.userContent").size() > 0) {
String text = article.select("span.userContent").text();
String imgUrl = article.select("div.photo img").attr("abs:src");
System.out.println(String.format("%s\n%s\n\n", text,imgUrl));
}
}
That example pulls out the article text and any photo that is associated with it.
(It's possibly better to use the FB API that this method; I wanted to show how you can emulate little bits of JS to make a scrape work properly.)

Retrieving Xpath from SOAP Message

I want to retrieve all the xpaths from soap message at run time.
For example, if I have a soap message like
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Bodyxmlns:ns1="http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail">
<ns1:process>
<ns1:To></ns1:To>
<ns1:Subject></ns1:Subject>
<ns1:Body></ns1:Body>
</ns1:process>
</soap:Body>
</soap:Envelope>
then the possible xpaths from this soap message are
/soap:Envelope/soap:Body/ns1:process/ns1:To
/soap:Envelope/soap:Body/ns1:process/ns1:Subject
/soap:Envelope/soap:Body/ns1:process/ns1:Body
How can i retrive those with java?
Use the XPath type with a NamespaceContext.
Map<String, String> map = new HashMap<String, String>();
map.put("foo", "http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail");
NamespaceContext context = ...; //TODO: context from map
XPath xpath = ...; //TODO: create instance from factory
xpath.setNamespaceContext(context);
Document doc = ...; //TODO: parse XML
String toValue = xpath.evaluate("//foo:To", doc);
The double forward slash makes this expression match the first To element in the http://xmlns.oracle.com/TestAppln_jws/TestEmail/TestEmail in the given node. It does not matter that I used foo instead of ns1; the prefix mapping needs to match the one in the XPath expression, not the one in the document.
You can find further examples in Java: using XPath with namespaces and implementing NamespaceContext. You can find further examples of working with SOAP here.
Something like this could work:
string[] paths;
function RecurseThroughRequest(string request, string[] paths, string currentPath)
{
Nodes[] nodes = getNodesAtPath(request, currentPath);
//getNodesAtPath is an assumed function which returns a set of
//Node objects representing all the nodes that are children at the current path
foreach(Node n in nodes)
{
if(!n.hasChildren())
{
paths.Add(currentPath + "/" + n.Name);
}
else
{
RecurseThroughRequest(paths, currentPath + "/" + n.Name);
}
}
}
And then call the function with something like this:
string[] paths = new string[];
RecurseThroughRequest(request, paths, "/");
Of course that won't work out of the gates, but I think the theory is there.

How to insert/replace XML tag in XmlDocument?

I have a XmlDocument in java, created with the Weblogic XmlDocument parser.
I want to replace the content of a tag in this XMLDocument with my own data, or insert the tag if it isn't there.
<customdata>
<tag1 />
<tag2>mfkdslmlfkm</tag2>
<location />
<tag3 />
</customdata>
For example I want to insert a URL in the location tag:
<location>http://something</location>
but otherwise leave the XML as is.
Currently I use a XMLCursor:
XmlObject xmlobj = XmlObject.Factory.parse(a.getCustomData(), options);
XmlCursor xmlcur = xmlobj.newCursor();
while (xmlcur.hasNextToken()) {
boolean found = false;
if (xmlcur.isStart() && "schema-location".equals(xmlcur.getName().toString())) {
xmlcur.setTextValue("http://replaced");
System.out.println("replaced");
found = true;
} else if (xmlcur.isStart() && "customdata".equals(xmlcur.getName().toString())) {
xmlcur.push();
} else if (xmlcur.isEnddoc()) {
if (!found) {
xmlcur.pop();
xmlcur.toEndToken();
xmlcur.insertElementWithText("schema-location", "http://inserted");
System.out.println("inserted");
}
}
xmlcur.toNextToken();
}
I tried to find a "quick" xquery way to do this since the XmlDocument has an execQuery method, but didn't find it very easy.
Do anyone have a better way than this? It seems a bit elaborate.
How about an XPath based approach? I like this approach as the logic is super-easy to understand. The code is pretty much self-documenting.
If your xml document is available to you as an org.w3c.dom.Document object (as most parsers return), then you could do something like the following:
// get the list of customdata nodes
NodeList customDataNodeSet = findNodes(document, "//customdata" );
for (int i=0 ; i < customDataNodeSet.getLength() ; i++) {
Node customDataNode = customDataNodeSet.item( i );
// get the location nodes (if any) within this one customdata node
NodeList locationNodeSet = findNodes(customDataNode, "location" );
if (locationNodeSet.getLength() > 0) {
// replace
locationNodeSet.item( 0 ).setTextContent( "http://stackoverflow.com/" );
}
else {
// insert
Element newLocationNode = document.createElement( "location" );
newLocationNode.setTextContent("http://stackoverflow.com/" );
customDataNode.appendChild( newLocationNode );
}
}
And here's the helper method findNodes that does the XPath search.
private NodeList findNodes( Object obj, String xPathString )
throws XPathExpressionException {
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expression = xPath.compile( xPathString );
return (NodeList) expression.evaluate( obj, XPathConstants.NODESET );
}
How about an object oriented approach? You could deserialise the XML to an object, set the location value on the object, then serialise back to XML.
XStream makes this really easy.
For example, you would define the main object, which in your case is CustomData (I'm using public fields to keep the example simple):
public class CustomData {
public String tag1;
public String tag2;
public String location;
public String tag3;
}
Then you initialize XStream:
XStream xstream = new XStream();
// if you need to output the main tag in lowercase, use the following line
xstream.alias("customdata", CustomData.class);
Now you can construct an object from XML, set the location field on the object and regenerate the XML:
CustomData d = (CustomData)xstream.fromXML(xml);
d.location = "http://stackoverflow.com";
xml = xstream.toXML(d);
How does that sound?
If you don't know the schema the XStream solution probably isn't the way to go. At least XStream is on your radar now, might come in handy in the future!
You should be able to do this with query
try
fn:replace(string,pattern,replace)
I am new to xquery myself and I have found it to be a painful query language to work with, but it does work quiet well once you get over the initial learning curve.
I do still wish there was an easier way which was as efficient?

Categories