I use the following Java program to extract information from an XML file:
import java.io.File;
import java.net.URL;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ExtractInfo {
public static void main(String argv []) {
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
File file = new File("page.xml");
Document doc = docBuilder.parse(file);
// normalize text representation
doc.getDocumentElement().normalize();
System.out.println ("Root element of the doc is " +
doc.getDocumentElement().getNodeName());
NodeList listOfPersons = doc.getElementsByTagName("person");
int totalPersons = listOfPersons.getLength();
System.out.println("Total no of people : " + totalPersons);
for (int s=0; s<listOfPersons.getLength(); s++) {
Node firstPersonNode = listOfPersons.item(s);
if (firstPersonNode.getNodeType() == Node.ELEMENT_NODE) {
Element firstPersonElement = (Element)firstPersonNode;
//-------
NodeList firstNameList = firstPersonElement.getElementsByTagName("first");
Element firstNameElement = (Element)firstNameList.item(0);
NodeList textFNList = firstNameElement.getChildNodes();
System.out.println("First Name : " +
((Node)textFNList.item(0)).getNodeValue().trim());
//-------
NodeList lastNameList = firstPersonElement.getElementsByTagName("last");
Element lastNameElement = (Element)lastNameList.item(0);
NodeList textLNList = lastNameElement.getChildNodes();
System.out.println("Last Name : " +
((Node)textLNList.item(0)).getNodeValue().trim());
//----
NodeList ageList = firstPersonElement.getElementsByTagName("age");
Element ageElement = (Element)ageList.item(0);
NodeList textAgeList = ageElement.getChildNodes();
System.out.println("Age : " +
((Node)textAgeList.item(0)).getNodeValue().trim());
}
}
} catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line "
+ err.getLineNumber () + ", uri " + err.getSystemId());
System.out.println(" " + err.getMessage());
} catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace();
} catch (Throwable t) {
t.printStackTrace();
}
}
}
Could someone please help me generate RDF triples from the extracted information and create a triple store with Jena that contains all the triples? I am quite new to RDF and Jena, so I really need your help.
Thanks in advance.
Resource resource = ontModel.createResource(NameSpace + "Doutorado_em_Engenharia_de_Sistemas_e_Computacao");
Property prop = ontModel.createProperty("http://www.owl-ontologies.com/OntologyBase.owl#program_Provided_By");
Resource obj = ontModel.createResource(NameSpace + "Universidade_X");
ontModel.add(resource, prop, obj);
Before running this, create an OntModel instance for your ontology (ontModel above), for example with ModelFactory.createOntologyModel().
http://answers.semanticweb.com/questions/11084/add-triples-in-an-ontology-using-jena-api
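The answer above adds one triple at a time. To see how the extracted `<person>` data maps onto triples, here is a JDK-only sketch that emits N-Triples lines directly instead of calling Jena (a swapped-in technique, not the Jena API). The namespace `http://example.org/people#`, the class name `PersonTriples` and the position-based `personN` subject names are assumptions for illustration only. Because the output is plain N-Triples, it can be loaded into a Jena `Model` with `model.read(new StringReader(text), null, "N-TRIPLES")`; for a persistent triple store, Jena TDB is one option.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PersonTriples {
    // Hypothetical namespace, chosen only for this example.
    static final String NS = "http://example.org/people#";

    // Parses the XML string and emits one N-Triples line per <first>, <last> or <age> value.
    static List<String> toNTriples(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList persons = doc.getElementsByTagName("person");
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < persons.getLength(); i++) {
            Element person = (Element) persons.item(i);
            // Subject IRI built from the element's position (an assumption; use a real ID if you have one).
            String subject = "<" + NS + "person" + i + ">";
            for (String tag : new String[] {"first", "last", "age"}) {
                NodeList values = person.getElementsByTagName(tag);
                if (values.getLength() == 0) {
                    continue; // skip missing fields instead of throwing a NullPointerException
                }
                String value = values.item(0).getTextContent().trim();
                triples.add(subject + " <" + NS + tag + "> \"" + escape(value) + "\" .");
            }
        }
        return triples;
    }

    // Minimal N-Triples literal escaping: backslash, double quote and newline only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String xml = "<people><person><first>Ann</first><last>Lee</last><age>30</age></person></people>";
        toNTriples(xml).forEach(System.out::println);
    }
}
```

All values are written as plain string literals here; typing `age` as an integer literal would be a further refinement.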
I am reading specific information from an XML file, and I am having trouble reading some elements such as DataDate: I get a NullPointerException. I think it happens because the XML file contains two nodes named "Project", and the first one does not have a DataDate.
I do not know how to fix this error.
This is a part of the XML File that I am reading:
package testReadXML;
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
public class TestReadXML {
public static void main(String[] args) {
try {
File xmlFile = new File("C:/Users/diani/Downloads/XML Files/CS01.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element:" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("Project");
for (int i = 0; i < nList.getLength(); i ++) {
Node nNode = nList.item(i);
System.out.println("\n" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Object Id : " + eElement.getAttribute("ObjectId"));
System.out.println("Id : " + eElement.getElementsByTagName("Id").item(0).getTextContent());
System.out.println("Name : " + eElement.getElementsByTagName("Name").item(0).getTextContent());
System.out.println("Data Date : " + eElement.getElementsByTagName("DataDate").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Just add this if condition when you fetch the elements, so that a Project without a DataDate child is skipped instead of causing a NullPointerException:
if(eElement.getElementsByTagName("DataDate").getLength() > 0) {
System.out.println("Object Id : " + eElement.getAttribute("ObjectId"));
System.out.println("Id : " + eElement.getElementsByTagName("Id").item(0).getTextContent());
System.out.println("Name : " + eElement.getElementsByTagName("Name").item(0).getTextContent());
System.out.println("Data Date : " + eElement.getElementsByTagName("DataDate").item(0).getTextContent());
}
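If you would rather print the other fields of a Project even when DataDate is missing, a small helper can return a placeholder instead of skipping the whole element. This is a sketch; `SafeText`, `textOrDefault` and the sample XML in `main` are hypothetical and invented for illustration. It relies only on the fact that `getElementsByTagName` returns an empty list (so `item(0)` is null) when no such element exists.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SafeText {
    // Returns the text of the first <tag> descendant of parent, or fallback if there is none.
    static String textOrDefault(Element parent, String tag, String fallback) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent() : fallback;
    }

    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample: the first Project has no DataDate, the second one does.
        Document doc = parse("<Projects><Project ObjectId=\"1\"><Id>A</Id><Name>First</Name></Project>"
                + "<Project ObjectId=\"2\"><Id>B</Id><Name>Second</Name><DataDate>2020-01-01</DataDate></Project></Projects>");
        NodeList projects = doc.getElementsByTagName("Project");
        for (int i = 0; i < projects.getLength(); i++) {
            Element project = (Element) projects.item(i);
            System.out.println("Object Id : " + project.getAttribute("ObjectId"));
            System.out.println("Name : " + textOrDefault(project, "Name", "(none)"));
            System.out.println("Data Date : " + textOrDefault(project, "DataDate", "(none)"));
        }
    }
}
```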
I am trying to read data from an XML file with DocumentBuilderFactory and am struggling to find an attribute ("image") that is not always there: it appears in step_5 but not in step_6 in the data below.
Here is some data from the file:
<screen>
<screenID>step_5</screenID>
<video>/video/Task 5 - Open Word.mp4</video>
<vid_caption>Task 5 - Open Word</vid_caption>
<image>/shared_images/word_icon.png</image>
</screen>
<screen>
<screenID>step_6</screenID>
<video>/video/Task 6 - How to open MS Word.mp4</video>
<vid_caption>Task 6 - How to open MS Word</vid_caption>
</screen>
I have also tried streams; this is my latest attempt. I feel that I am missing something simple. Below is the program I have written for it:
//imports for XML readers
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
public class ReadXML
{
public static void main(String args[])
{
//try to read in the file
try
{
//create the file to read in
//File XmlFile = new File("Documents\\University\\2017\\Courses\\Second Semester\\CSC3003S\\Capstone\\Program\\Capstone-master\\elearnerselfstudy.xml");
File XmlFile = new File("elearnerselfstudy.txt");
//Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document document = dBuilder.parse(XmlFile);
//need to normalize - read the stack overflow
document.getDocumentElement().normalize();
//print out root for testing purposes
System.out.println("The root element is :" + document.getDocumentElement().getNodeName() + "\n");
//list of lessons - it is reading the elements into the list fine
NodeList nLessonList = document.getElementsByTagName("lesson");
System.out.println("I have the lesson list ready");
System.out.println("The length of the lessonList is: " + nLessonList.getLength()+"\n");
//list for the screens - it is reading the elements into the list fine, the error is somewhere else
NodeList nScreenList = document.getElementsByTagName("screen");
System.out.println("I have the screen list ready");
System.out.println("The length of the screenList is: " + nScreenList.getLength()+ "\n");
//lesson list iteration
for (int temp = 0; temp < nLessonList.getLength(); temp++)
{
Node nNode = nLessonList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE)
{
//System.out.println("Are we even insdie the if bro - we definitely penetrate the if");
//System.out.println("Still not a good enough reason to use the word penetrate" + "\n");
Element eElement = (Element) nNode;
//System.out.println("Lesson : " + eElement.getAttribute("lesson"));
System.out.println("Lesson : " + eElement.getAttribute("lesson_title"));
System.out.println("Lesson ID : " + eElement.getAttribute("lesson_id"));
System.out.println("Lesson Type : " + eElement.getAttribute("lesson_type"));
}//end if
}//end for loop through tree
//screen list iteration
for (int temp = 0; temp < nScreenList.getLength(); temp++)
{
Node nNode2 = nScreenList.item(temp);
System.out.println("\nCurrent Element :" + nNode2.getNodeName());
if (nNode2.getNodeType() == Node.ELEMENT_NODE)
{
Element eElement = (Element) nNode2;
//return elements
//System.out.println("Screen : " + eElement.getAttribute("screen"));
System.out.println("ScreenId is : " + eElement.getElementsByTagName("screenID").item(0).getTextContent());
System.out.println("Video is : " + eElement.getElementsByTagName("video").item(0).getTextContent());
System.out.println("Video Caption is : " + eElement.getElementsByTagName("vid_caption").item(0).getTextContent());
if (eElement.getAttributeNode("image")!=null)//(eElement.hasAttribute("image")==true)
{
System.out.println("Image is : " + eElement.getElementsByTagName("image").item(0).getTextContent());
}
}//end if
}//end for list iteration
}//end try
//catch
catch (Exception e)
{
e.printStackTrace();
}//end catch
}//end main
}//end class
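The image is a child element, not an attribute, so getAttributeNode("image") always returns null here. Check whether the element exists instead:
//get the image path if it is there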
if (eElement.getElementsByTagName("image").item(0)!=null)
{
System.out.println("Image path is : " +eElement.getElementsByTagName("image").item(0).getTextContent());
}
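The difference between an attribute and a child element can be checked directly. The sketch below wraps the two screens from the question in a `<lesson>` root (an assumption, since the excerpt has no root) and shows that `hasAttribute("image")` is false even for step_5, while looking up the `<image>` child element works. `ScreenImages` and `imagePaths` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ScreenImages {
    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects the <image> text of each <screen>, or "(none)" when the screen has no <image> child element.
    static List<String> imagePaths(Document doc) {
        NodeList screens = doc.getElementsByTagName("screen");
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < screens.getLength(); i++) {
            Element screen = (Element) screens.item(i);
            NodeList images = screen.getElementsByTagName("image");
            paths.add(images.getLength() > 0 ? images.item(0).getTextContent() : "(none)");
        }
        return paths;
    }

    public static void main(String[] args) {
        Document doc = parse("<lesson><screen><screenID>step_5</screenID>"
                + "<image>/shared_images/word_icon.png</image></screen>"
                + "<screen><screenID>step_6</screenID></screen></lesson>");
        Element first = (Element) doc.getElementsByTagName("screen").item(0);
        // <image> is a child element, not an attribute, so this is false even for step_5.
        System.out.println("hasAttribute(\"image\"): " + first.hasAttribute("image"));
        System.out.println(imagePaths(doc));
    }
}
```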
If I copy and paste the XML from this site into an XML file, I can parse it with Java:
http://api.indeed.com/ads/apisearch?publisher=8397709210207872&q=java&l=austin%2C+tx&sort&radius&st&jt&start&limit&fromage&filter&latlong=1&chnl&userip=1.2.3.4&v=2
However, I want to parse it directly from a webpage if possible!
Here's my current code:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
public class XMLParser {
public void readXML(String parse) {
File xml = new File(parse);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xml);
// System.out.println("Root element :"
// + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("result");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
// System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("job title : " + eElement.getElementsByTagName("jobtitle").item(0).getTextContent());
System.out.println("Company: " + eElement.getElementsByTagName("company").item(0).getTextContent());
System.out.println("City : " + eElement.getElementsByTagName("city").item(0).getTextContent());
System.out.println("State : " + eElement.getElementsByTagName("state").item(0).getTextContent());
System.out.println("Country : " + eElement.getElementsByTagName("country").item(0).getTextContent());
System.out.println("Date posted : " + eElement.getElementsByTagName("date").item(0).getTextContent());
System.out.println("Job summary : " + eElement.getElementsByTagName("snippet").item(0).getTextContent());
System.out.println("Latitude : " + eElement.getElementsByTagName("latitude").item(0).getTextContent());
System.out.println("longitude : " + eElement.getElementsByTagName("longitude").item(0).getTextContent());
}
}
} catch (ParserConfigurationException | SAXException | IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args) {
new XMLParser().readXML("test.xml");
}
}
Any help would be appreciated.
Pass the URI to parse() instead of a File. DocumentBuilder.parse(String uri) downloads the document for you:
Document doc = dBuilder.parse(uriString);
Here is a code snippet:
String url = "http://api.indeed.com/ads/apisearch?publisher=8397709210207872&q=java&l=austin%2C+tx&sort&radius&st&jt&start&limit&fromage&filter&latlong=1&chnl&userip=1.2.3.4&v=2";
try
{
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
DocumentBuilder b = f.newDocumentBuilder();
// parse(String) treats the argument as a URI and downloads the document
Document doc = b.parse(url);
}
catch (ParserConfigurationException | SAXException | IOException e)
{
e.printStackTrace();
}
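To try the URI approach without network access, the same `parse(String uri)` call can be pointed at a local file through its `file:` URI. This is a sketch; `UriParse` and `parseUri` are hypothetical names, and the temporary file merely stands in for the Indeed response, whose exact structure is assumed here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class UriParse {
    // DocumentBuilder.parse(String) treats its argument as a URI and fetches the content itself,
    // so the same call works for URIs the JDK can open, such as http: and file:.
    static Document parseUri(String uri) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(uri);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not load " + uri, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: write a small file and parse it through its file: URI.
        Path tmp = Files.createTempFile("results", ".xml");
        Files.write(tmp, "<response><result><jobtitle>Dev</jobtitle></result></response>"
                .getBytes(StandardCharsets.UTF_8));
        Document doc = parseUri(tmp.toUri().toString());
        System.out.println(doc.getElementsByTagName("jobtitle").item(0).getTextContent());
        Files.delete(tmp);
    }
}
```

Against the live service you would call `parseUri(url)` with the URL string from the question.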
You need to loop over the elements/nodes you want, so the code can scan through the XML file and find the node you are searching for.
This reads the XML from the connection and builds a DOM tree (connection is assumed to be an open URLConnection):
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(connection.getInputStream());
NodeList nodes = doc.getElementsByTagName("mode");
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
//Gets the tag from the XML and its content
NodeList nodeMode = element.getElementsByTagName("mode");
Element elemMode = (Element) nodeMode.item(0);
}
Afterwards, if you want to pick out a value and parse it to an int (or whatever you need), do this inside the loop, after elemMode is assigned:
int currentMode = Integer.parseInt(elemMode.getFirstChild().getTextContent());
This is how I parsed data directly from the URL http://www.nbp.pl/kursy/xml/ followed by a file name:
static class Kurs {
public float kurs_sprzedazy;
public float kurs_kupna;
}
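private static DocumentBuilder dBuilder; // must be initialized before getData is called, e.g. with DocumentBuilderFactory.newInstance().newDocumentBuilder()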
private static Kurs getData(String filename, String currency) throws Exception {
Document doc = dBuilder.parse("http://www.nbp.pl/kursy/xml/"+filename+".xml");
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("pozycja");
for(int i = 0; i < nList.getLength(); i++) {
Element nNode = (Element)nList.item(i);
if(nNode.getElementsByTagName("kod_waluty").item(0).getTextContent().equals(currency)) {
Kurs kurs = new Kurs();
String data = nNode.getElementsByTagName("kurs_sprzedazy").item(0).getTextContent();
data = data.replace(',', '.');
kurs.kurs_sprzedazy = Float.parseFloat(data);
data = nNode.getElementsByTagName("kurs_kupna").item(0).getTextContent();
data = data.replace(',', '.');
kurs.kurs_kupna = Float.parseFloat(data);
return kurs;
}
}
return null;
}
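A self-contained variant of the answer above, as a sketch: the shared builder is created up front so it is never null, and the lookup works on an already parsed `Document` so it can be tried offline. Element names (`pozycja` = entry, `kod_waluty` = currency code, `kurs_kupna` = buying rate, `kurs_sprzedazy` = selling rate) come from the answer; the root name `tabela_kursow`, the class `KursLookup` and the sample values in `main` are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class KursLookup {
    static class Kurs {
        float kurs_sprzedazy; // selling rate
        float kurs_kupna;     // buying rate
    }

    // Built once so the static field is never null.
    private static final DocumentBuilder D_BUILDER = createBuilder();

    private static DocumentBuilder createBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses XML text; for the live feed you could call D_BUILDER.parse(url) instead.
    static Document parse(String xml) {
        try {
            return D_BUILDER.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Converts text such as "3,70" (comma as decimal separator) to a float.
    static float parseCommaFloat(String text) {
        return Float.parseFloat(text.trim().replace(',', '.'));
    }

    // Returns the rates for the given currency code, or null if it is not listed.
    static Kurs find(Document doc, String currency) {
        NodeList entries = doc.getElementsByTagName("pozycja"); // one entry per currency
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String code = entry.getElementsByTagName("kod_waluty").item(0).getTextContent().trim();
            if (code.equals(currency)) {
                Kurs kurs = new Kurs();
                kurs.kurs_sprzedazy = parseCommaFloat(entry.getElementsByTagName("kurs_sprzedazy").item(0).getTextContent());
                kurs.kurs_kupna = parseCommaFloat(entry.getElementsByTagName("kurs_kupna").item(0).getTextContent());
                return kurs;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Invented sample; the values are not real exchange rates.
        Document doc = parse("<tabela_kursow><pozycja><kod_waluty>USD</kod_waluty>"
                + "<kurs_kupna>3,70</kurs_kupna><kurs_sprzedazy>3,80</kurs_sprzedazy></pozycja></tabela_kursow>");
        Kurs usd = find(doc, "USD");
        System.out.println(usd.kurs_kupna + " / " + usd.kurs_sprzedazy);
    }
}
```

A single shared DocumentBuilder is not safe to use from several threads at once; create one per thread if the lookup runs concurrently.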
I get exactly the output I need, but I have to use a POJO class in my program. I searched a lot but did not find a clear answer. Please help me solve this; thanks in advance. My Java code for parsing the XML is below.
Code for ReadAndPrintXMLFile:
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.net.URL;
import java.io.InputStream;
public class ReadAndPrintXMLFile{
public static void main (String argv []){
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
URL url = new URL("http://xxxxxxxxxxxxxxxx");
InputStream stream = url.openStream();
Document doc = docBuilder.parse(stream);
// normalize text representation
doc.getDocumentElement ().normalize ();
System.out.println ("Root element of the doc is " +
doc.getDocumentElement().getNodeName());
NodeList listOfPersons = doc.getElementsByTagName("head");
int totalPersons = listOfPersons.getLength();
System.out.println("Total no of head : " + totalPersons);
for(int s=0; s<listOfPersons.getLength() ; s++){
Node firstPersonNode = listOfPersons.item(s);
if(firstPersonNode.getNodeType() == Node.ELEMENT_NODE){
Element firstPersonElement = (Element)firstPersonNode;
//-------
NodeList firstNameList = firstPersonElement.getElementsByTagName("heading");
Element firstNameElement = (Element)firstNameList.item(0);
NodeList textFNList = firstNameElement.getChildNodes();
System.out.println("Heading : " +
((Node)textFNList.item(0)).getNodeValue().trim());
}//end of if clause
}//end of for loop with s var
}catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line "
+ err.getLineNumber () + ", uri " + err.getSystemId ());
System.out.println(" " + err.getMessage ());
}catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace ();
}catch (Throwable t) {
t.printStackTrace ();
}
//System.exit (0);
}//end of main
}
For this XML parsing program I have to use a POJO class, so I created a class like this:
public class POJOurl {
private String heading;
public String getHeading() {
return heading;
}
public void setHeading(String heading) {
this.heading = heading;
}
}
I don't know how to use these get and set methods in my program. I have to run the program using public String getHeading() and public void setHeading(String heading) and still get the output I am getting now; the only difference is that the program has to use the POJO class.
Output:
Root element of the doc is root1
Total no of head : 4
Heading : Appliance Repairs
Heading : Air conditioning and refrigeration services
Heading : Accountants
Heading : Accident Management
I would create a class Person with the attributes that you need, e.g. firstName.
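Applied to the heading question, the idea could look like this sketch: fill one POJO per `<head>` element with `setHeading`, collect the objects in a list, and print them with `getHeading`. `POJOurl` is the class from the question (made package-private here so the example compiles in one file); `HeadingReader`, `readHeadings` and the sample XML are hypothetical, and the sample assumes `<heading>` is nested inside each `<head>`, as the question's code implies.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// The POJO from the question: one object per heading.
class POJOurl {
    private String heading;
    public String getHeading() { return heading; }
    public void setHeading(String heading) { this.heading = heading; }
}

class HeadingReader {
    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Builds one POJOurl per <head> that contains a <heading>; heads without one are skipped.
    static List<POJOurl> readHeadings(Document doc) {
        NodeList heads = doc.getElementsByTagName("head");
        List<POJOurl> result = new ArrayList<>();
        for (int i = 0; i < heads.getLength(); i++) {
            Element head = (Element) heads.item(i);
            NodeList headingList = head.getElementsByTagName("heading");
            if (headingList.getLength() == 0) {
                continue;
            }
            POJOurl pojo = new POJOurl();
            pojo.setHeading(headingList.item(0).getTextContent().trim());
            result.add(pojo);
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<root1><head><heading>Appliance Repairs</heading></head>"
                + "<head><heading>Accountants</heading></head></root1>");
        List<POJOurl> headings = readHeadings(doc);
        System.out.println("Total no of head : " + headings.size());
        for (POJOurl p : headings) {
            System.out.println("Heading : " + p.getHeading());
        }
    }
}
```

Separating the parsing (which fills the POJOs) from the printing (which only reads them) is what makes the POJO useful: the list can be passed to other code without any DOM types.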
I have been working on this problem for quite some time and can't figure it out. There is a given "xml" file that needs to be parsed and displayed on the screen:
<office>
<name>joe</name>
<surname>smith</surname>
<name>bob</name>
<surname>black</surname>
.....
</office>
I've found some great code samples online, but they don't seem to work with an XML file that isn't structured properly, like this one. If I add a tag, I can get my code to work, but the problem is that I can't make any changes to the "xml" file.
It is someone else's code that I found here and modified. Here is my code with the modifications:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ReadAndPrintXMLFile{
public static void main (String argv []) throws ParserConfigurationException, SAXException, IOException{
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("office.xml"));
// normalize text representation
doc.getDocumentElement ().normalize ();
System.out.println ("Root element of the doc is " +
doc.getDocumentElement().getNodeName() + "\n");
//counts how many times <name> is found in the file
//then the number is used in the for loop below
NodeList listOfTerms = doc.getElementsByTagName("name");
int totalTerms = listOfTerms.getLength();
System.out.println("Total no of terms : " + totalTerms + "\n");
for(int s= 0; s<listOfTerms.getLength() ; s++){
Node firstTermNode = listOfTerms.item(s);
if(firstTermNode.getNodeType() == Node.ELEMENT_NODE){
Element firstTermElement = (Element)firstTermNode;
//-------
NodeList firstWordList = firstTermElement.getElementsByTagName("name");
Element firstWordElement = (Element)firstWordList.item(0);
NodeList textWordList = firstWordElement.getChildNodes();
System.out.println("Name : " +
((Node)textWordList.item(0)).getNodeValue().trim());
//-------
NodeList defList = firstTermElement.getElementsByTagName("surname");
Element defElement = (Element)defList.item(0);
NodeList textDefList = defElement.getChildNodes();
System.out.println("Surname : " +
((Node)textDefList.item(0)).getNodeValue().trim());
}//end of if clause
}//end of for loop with s var
}catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line "
+ err.getLineNumber () + ", uri " + err.getSystemId ());
System.out.println(" " + err.getMessage ());
}catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace ();
}catch (Throwable t) {
t.printStackTrace ();
}
//System.exit (0);
}//end of main
}
The error message I get is this:
java.lang.NullPointerException
at Data.main(Data.java:45) //maybe a different line in the code for you.
If I use the root of the document for the counter, it prints the result only once; for some reason getChildNodes() does not seem to work correctly.
I notice that you call .getElementsByTagName("name") twice. Are you expecting <name> tags within <name>? If not, that is most likely the cause of your error: the second call returns an empty list, so firstWordElement is null and you get a NullPointerException when you call getChildNodes() on it.
You can't obtain the 'surname' elements from the 'name' list, which is what your for loop tries to do. Get them in separate steps. To fetch the 'name' elements:
NodeList listOfTerms = doc.getElementsByTagName("name");
int totalTerms = listOfTerms.getLength();
System.out.println("Total no of terms : " + totalTerms + "\n");
for(int s= 0; s<listOfTerms.getLength() ; s++){
Node firstTermNode = listOfTerms.item(s);
if(firstTermNode.getNodeType() == Node.ELEMENT_NODE){
Element firstTermElement = (Element)firstTermNode;
System.out.println(firstTermElement.getTextContent());
}//end of if clause
}//end of for loop with s var
Then, to fetch the surnames, just change the tag name:
listOfTerms = doc.getElementsByTagName("surname");
totalTerms = listOfTerms.getLength();
System.out.println("Total no of terms : " + totalTerms + "\n");
for(int s= 0; s<listOfTerms.getLength() ; s++){
Node firstTermNode = listOfTerms.item(s);
if(firstTermNode.getNodeType() == Node.ELEMENT_NODE){
Element firstTermElement = (Element)firstTermNode;
System.out.println(firstTermElement.getTextContent());
}//end of if clause
}//end of for loop with s var
Hope that helps.
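If you also need each surname printed next to its name, one option (a different approach from the two separate loops above) is to walk the child elements of `<office>` in document order and pair each `<name>` with the `<surname>` that follows it. This sketch assumes every `<name>` is immediately followed by its `<surname>`, as in the sample; `OfficeReader` and `fullNames` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class OfficeReader {
    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks the root's children in document order and pairs each <name> with the next <surname>.
    static List<String> fullNames(Document doc) {
        List<String> result = new ArrayList<>();
        String pendingName = null;
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between the tags
            }
            String text = n.getTextContent().trim();
            if (n.getNodeName().equals("name")) {
                pendingName = text;
            } else if (n.getNodeName().equals("surname") && pendingName != null) {
                result.add(pendingName + " " + text);
                pendingName = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<office>\n<name>joe</name>\n<surname>smith</surname>\n"
                + "<name>bob</name>\n<surname>black</surname>\n</office>");
        for (String fullName : fullNames(doc)) {
            System.out.println(fullName);
        }
    }
}
```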