Parsing fake-xml file in Java

I have been working on this problem for quite some time and can't figure it out. There is a given "xml" file that needs to be parsed and displayed on the screen:
<office>
<name>joe</name>
<surname>smith</surname>
<name>bob</name>
<surname>black</surname>
.....
</office>
I've found some great code samples online, but they don't seem to work with an XML file that isn't structured properly, like this one. If I add a tag I can get my code to work, but the problem is that I can't make any changes to the "xml" file.
It is someone else's code I found here that's been modified.
Here is my code with mods:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ReadAndPrintXMLFile{
public static void main (String argv []) throws ParserConfigurationException, SAXException, IOException{
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("office.xml"));
// normalize text representation
doc.getDocumentElement ().normalize ();
System.out.println ("Root element of the doc is " +
doc.getDocumentElement().getNodeName() + "\n");
//counts how many times <name> is found in the file
//then the number is used in the for loop below
NodeList listOfTerms = doc.getElementsByTagName("name");
int totalTerms = listOfTerms.getLength();
System.out.println("Total no of terms : " + totalTerms + "\n");
for(int s= 0; s<listOfTerms.getLength() ; s++){
Node firstTermNode = listOfTerms.item(s);
if(firstTermNode.getNodeType() == Node.ELEMENT_NODE){
Element firstTermElement = (Element)firstTermNode;
//-------
NodeList firstWordList = firstTermElement.getElementsByTagName("name");
Element firstWordElement = (Element)firstWordList.item(0);
NodeList textWordList = firstWordElement.getChildNodes();
System.out.println("Name : " +
((Node)textWordList.item(0)).getNodeValue().trim());
//-------
NodeList defList = firstTermElement.getElementsByTagName("surname");
Element defElement = (Element)defList.item(0);
NodeList textDefList = defElement.getChildNodes();
System.out.println("Surname : " +
((Node)textDefList.item(0)).getNodeValue().trim());
}//end of if clause
}//end of for loop with s var
}catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line "
+ err.getLineNumber () + ", uri " + err.getSystemId ());
System.out.println(" " + err.getMessage ());
}catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace ();
}catch (Throwable t) {
t.printStackTrace ();
}
//System.exit (0);
}//end of main
}
The error message I get is this:
java.lang.NullPointerException
at Data.main(Data.java:45) //maybe a different line in the code for you.
If I use the root of the document for the counter, it prints the result only once; for some reason getChildNodes() is not working the way I expect.

I notice that you call .getElementsByTagName("name") twice. Are you expecting <name> tags within <name>? If not, that is most likely the cause of your error: the second call returns an empty list, so item(0) returns null and you get a NullPointerException as soon as you dereference firstWordElement.
You can't obtain the 'surname' from the 'name' list, which is what you are doing in the for loop: each <surname> is a sibling of <name>, not a child of it. Get them in separate steps. To fetch the 'name' elements:
NodeList listOfTerms = doc.getElementsByTagName("name");
int totalTerms = listOfTerms.getLength();
System.out.println("Total no of terms : " + totalTerms + "\n");
for(int s= 0; s<listOfTerms.getLength() ; s++){
Node firstTermNode = listOfTerms.item(s);
if(firstTermNode.getNodeType() == Node.ELEMENT_NODE){
Element firstTermElement = (Element)firstTermNode;
System.out.println(firstTermElement.getTextContent());
}//end of if clause
}//end of for loop with s var
Then, to fetch the surnames, just change the tag name:
listOfTerms = doc.getElementsByTagName("surname");
totalTerms = listOfTerms.getLength();
System.out.println("Total no of terms : " + totalTerms + "\n");
for(int s= 0; s<listOfTerms.getLength() ; s++){
Node firstTermNode = listOfTerms.item(s);
if(firstTermNode.getNodeType() == Node.ELEMENT_NODE){
Element firstTermElement = (Element)firstTermNode;
System.out.println(firstTermElement.getTextContent());
}//end of if clause
}//end of for loop with s var
Hope that helps.
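If you need each name printed next to its surname rather than two separate lists, one option is to walk the children of <office> in document order and pair each <name> with the <surname> that follows it. This is only a sketch: the class name OfficePairs and the method readPeople are made up for illustration, and it assumes the file always alternates name/surname as in the sample.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original question.
class OfficePairs {

    // Walks the direct children of the root element in document order and
    // joins each <name> with the <surname> that follows it.
    static List<String> readPeople(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }

        List<String> people = new ArrayList<>();
        String pendingName = null;
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between the tags
            }
            String tag = child.getNodeName();
            String text = child.getTextContent().trim();
            if (tag.equals("name")) {
                pendingName = text;
            } else if (tag.equals("surname") && pendingName != null) {
                people.add(pendingName + " " + text);
                pendingName = null;
            }
        }
        return people;
    }

    public static void main(String[] args) {
        // Same shape as the sample in the question, inlined so the sketch runs on its own.
        String xml = "<office>\n"
                + "<name>joe</name>\n<surname>smith</surname>\n"
                + "<name>bob</name>\n<surname>black</surname>\n"
                + "</office>";
        for (String person : readPeople(xml)) {
            System.out.println(person);
        }
    }
}
```

With the inlined sample this prints "joe smith" and then "bob black". To read the real file, parse new File("office.xml") instead of the string.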

Related

Reading an XML file in Java

I am reading specific information from an XML File. I am having issues trying to read some elements like the DataDate. I am getting a NullPointerException. I think it happens because in the XML file there are two nodes with the word "Project" and the first one does not have a DataDate.
I do not know how to fix this error.
This is a part of the XML File that I am reading:
package testReadXML;
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
public class TestReadXML {
public static void main(String[] args) {
try {
File xmlFile = new File("C:/Users/diani/Downloads/XML Files/CS01.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element:" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("Project");
for (int i = 0; i < nList.getLength(); i ++) {
Node nNode = nList.item(i);
System.out.println("\n" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Object Id : " + eElement.getAttribute("ObjectId"));
System.out.println("Id : " + eElement.getElementsByTagName("Id").item(0).getTextContent());
System.out.println("Name : " + eElement.getElementsByTagName("Name").item(0).getTextContent());
System.out.println("Data Date : " + eElement.getElementsByTagName("DataDate").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Just add this if condition when you are fetching the elements, so that the fields are only read for a Project that actually has a DataDate:
if(eElement.getElementsByTagName("DataDate").getLength() > 0) {
System.out.println("Object Id : " + eElement.getAttribute("ObjectId"));
System.out.println("Id : " + eElement.getElementsByTagName("Id").item(0).getTextContent());
System.out.println("Name : " + eElement.getElementsByTagName("Name").item(0).getTextContent());
System.out.println("Data Date : " + eElement.getElementsByTagName("DataDate").item(0).getTextContent());
}
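Note that the condition above skips the whole Project when DataDate is missing. If you would rather still print Id and Name for such a Project, a small null-safe lookup is one way to do it. The sketch below is an assumption about what you want, not the original answer's code: textOrDefault, ProjectFields and the inline sample XML are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
class ProjectFields {

    // Returns the text of the first descendant <tag> of parent,
    // or fallback when no such element exists.
    static String textOrDefault(Element parent, String tag, String fallback) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : fallback;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: the first Project has no DataDate.
        String xml = "<Root>"
                + "<Project ObjectId=\"1\"><Id>P1</Id><Name>Alpha</Name></Project>"
                + "<Project ObjectId=\"2\"><Id>P2</Id><Name>Beta</Name>"
                + "<DataDate>2017-01-01</DataDate></Project>"
                + "</Root>";
        NodeList projects = parse(xml).getElementsByTagName("Project");
        for (int i = 0; i < projects.getLength(); i++) {
            // getElementsByTagName only returns elements, so the cast is safe.
            Element project = (Element) projects.item(i);
            System.out.println("Object Id : " + project.getAttribute("ObjectId"));
            System.out.println("Id : " + textOrDefault(project, "Id", "(none)"));
            System.out.println("Name : " + textOrDefault(project, "Name", "(none)"));
            System.out.println("Data Date : " + textOrDefault(project, "DataDate", "(none)"));
        }
    }
}
```

Keep in mind that getElementsByTagName searches all descendants, not just direct children, so if one Project is nested inside another the outer one will also see the inner one's DataDate.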

Finding a rare attribute

I am trying to read in data from an XML file with DocumentBuilderFactory, and am struggling with finding an attribute that is not always there ("image"): it appears in step_5 but not in step_6.
Here is some data from the file:
<screen>
<screenID>step_5</screenID>
<video>/video/Task 5 - Open Word.mp4</video>
<vid_caption>Task 5 - Open Word</vid_caption>
<image>/shared_images/word_icon.png</image>
</screen>
<screen>
<screenID>step_6</screenID>
<video>/video/Task 6 - How to open MS Word.mp4</video>
<vid_caption>Task 6 - How to open MS Word</vid_caption>
</screen>
I have tried with streams, and this is my latest attempt. I feel that I am missing something simple. Below is the program that I have created for it:
//imports for XML readers
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
public class ReadXML
{
public static void main(String args[])
{
//try to read in the file
try
{
//create the file to read in
//File XmlFile = new File("Documents\\University\\2017\\Courses\\Second Semester\\CSC3003S\\Capstone\\Program\\Capstone-master\\elearnerselfstudy.xml");
File XmlFile = new File("elearnerselfstudy.txt");
//Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document document = dBuilder.parse(XmlFile);
//need to normalize - read the stack overflow
document.getDocumentElement().normalize();
//print out root for testing purposes
System.out.println("The root element is :" + document.getDocumentElement().getNodeName() + "\n");
//list of lessons - it is reading the elements into the list fine
NodeList nLessonList = document.getElementsByTagName("lesson");
System.out.println("I have the lesson list ready");
System.out.println("The length of the lessonList is: " + nLessonList.getLength()+"\n");
//list for the screens - it is reading the elements into the list fine, the error is somewhere else
NodeList nScreenList = document.getElementsByTagName("screen");
System.out.println("I have the screen list ready");
System.out.println("The length of the screenList is: " + nScreenList.getLength()+ "\n");
//lesson list iteration
for (int temp = 0; temp < nLessonList.getLength(); temp++)
{
Node nNode = nLessonList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE)
{
//System.out.println("Are we even insdie the if bro - we definitely penetrate the if");
//System.out.println("Still not a good enough reason to use the word penetrate" + "\n");
Element eElement = (Element) nNode;
//System.out.println("Lesson : " + eElement.getAttribute("lesson"));
System.out.println("Lesson : " + eElement.getAttribute("lesson_title"));
System.out.println("Lesson ID : " + eElement.getAttribute("lesson_id"));
System.out.println("Lesson Type : " + eElement.getAttribute("lesson_type"));
}//end if
}//end for loop through tree
//screen list iteration
for (int temp = 0; temp < nScreenList.getLength(); temp++)
{
Node nNode2 = nScreenList.item(temp);
System.out.println("\nCurrent Element :" + nNode2.getNodeName());
if (nNode2.getNodeType() == Node.ELEMENT_NODE)
{
Element eElement = (Element) nNode2;
//return elements
//System.out.println("Screen : " + eElement.getAttribute("screen"));
System.out.println("ScreenId is : " + eElement.getElementsByTagName("screenID").item(0).getTextContent());
System.out.println("Video is : " + eElement.getElementsByTagName("video").item(0).getTextContent());
System.out.println("Video Caption is : " + eElement.getElementsByTagName("vid_caption").item(0).getTextContent());
if (eElement.getAttributeNode("image")!=null)//(eElement.hasAttribute("image")==true)
{
System.out.println("Image is : " + eElement.getElementsByTagName("image").item(0).getTextContent());
}
}//end if
}//end for list iteration
}//end try
//catch
catch (Exception e)
{
e.printStackTrace();
}//end catch
}//end main
}//end class
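In your data, image is a child element of screen, not an attribute, so test for the element rather than the attribute:
//get the image path if it is there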
if (eElement.getElementsByTagName("image").item(0)!=null)
{
System.out.println("Image path is : " +eElement.getElementsByTagName("image").item(0).getTextContent());
}
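To see the two kinds of check side by side on data shaped like the sample, here is a small sketch. The <screens> wrapper element, the class name ImageCheck and the helper hasChildElement are assumptions added so that the snippet runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical demo class for illustration only.
class ImageCheck {

    // True when parent has at least one descendant element named tag.
    static boolean hasChildElement(Element parent, String tag) {
        return parent.getElementsByTagName(tag).getLength() > 0;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // <screens> is an assumed wrapper; the two screens mirror step_5 and step_6.
        String xml = "<screens>"
                + "<screen><screenID>step_5</screenID>"
                + "<image>/shared_images/word_icon.png</image></screen>"
                + "<screen><screenID>step_6</screenID></screen>"
                + "</screens>";
        NodeList screens = parse(xml).getElementsByTagName("screen");
        for (int i = 0; i < screens.getLength(); i++) {
            Element screen = (Element) screens.item(i);
            String id = screen.getElementsByTagName("screenID").item(0).getTextContent();
            // hasAttribute looks for <screen image="...">, which never occurs in this data.
            boolean asAttribute = screen.hasAttribute("image");
            boolean asElement = hasChildElement(screen, "image");
            System.out.println(id + ": attribute=" + asAttribute + ", element=" + asElement);
        }
    }
}
```

This prints "step_5: attribute=false, element=true" followed by "step_6: attribute=false, element=false", which is why the attribute test in the question never entered the if block.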

parsing Xml with NodeList and DocumentBuilder and getting values

I've followed almost all of the SO questions and answers; some make sense while others don't. I'm able to get some of my XML working some of the time. At this point I have a hobbled gob of nothing.
Below is the XML that I am trying to work with:
<GET_GUESS_CHART>
<sort_by_letter>
<letter_row>
<letter>A</letter>
<guess>16</guess>
<rank>3</rank>
</letter_row>
<letter_row>
<letter>B</letter>
<guess>5</guess>
<rank>1</rank>
</letter_row>
</sort_by_letter>
<sort_by_rank>
<letter_row>
<letter>A</letter>
<guess>16</guess>
<rank>1</rank>
</letter_row>
<letter_row>
<letter>E</letter>
<guess>15</guess>
<rank>2</rank>
</letter_row>
</sort_by_rank>
</GET_GUESS_CHART>
I want to loop through the document, go through 'sort_by_letter' and 'sort_by_rank', and get the values for each 'letter_row'.
Here is how i get the document:
URL url = new URL(Url[0]);
DocumentBuilderFactory dbf = DocumentBuilderFactory
.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
// Download the XML file
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
I'm able to actually get the document, but for the life of me I cannot figure out how to work with it to get what I need.
All you need to do is to walk the DOM tree...
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ReadXML {
private static final String XML = "<?xml version=\"1.0\"?>\n"
+ "<GET_GUESS_CHART>"
+ " <sort_by_letter>"
+ " <letter_row>"
+ " <letter>A</letter>"
+ " <guess>16</guess>"
+ " <rank>3</rank>"
+ " </letter_row>"
+ " <letter_row>"
+ " <letter>B</letter>"
+ " <guess>5</guess>"
+ " <rank>1</rank>"
+ " </letter_row>"
+ " </sort_by_letter>"
+ " <sort_by_rank>"
+ " <letter_row>"
+ " <letter>A</letter>"
+ " <guess>16</guess>"
+ " <rank>1</rank>"
+ " </letter_row>"
+ " <letter_row>"
+ " <letter>E</letter>"
+ " <guess>15</guess>"
+ " <rank>2</rank>"
+ " </letter_row>"
+ " </sort_by_rank>"
+ "</GET_GUESS_CHART>";
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(new ByteArrayInputStream(XML.getBytes()));
NodeList rootElement = doc.getChildNodes();
NodeList sortByNodes = rootElement.item(0).getChildNodes();
for (int k = 0; k < sortByNodes.getLength(); k++) {
Node sortBy = sortByNodes.item(k);
System.out.println("SORT BY: " + sortBy.getNodeName());
NodeList letterRows = sortBy.getChildNodes();
for (int j = 0; j < letterRows.getLength(); j++) {
Node letterRow = letterRows.item(j);
NodeList letterRowDetails = letterRow.getChildNodes();
if (letterRowDetails.getLength() > 0) {
String letter = null;
String guess = null;
String rank = null;
for (int i = 0; i < letterRowDetails.getLength(); i++) {
Node detail = letterRowDetails.item(i);
if (detail.getNodeName().equals("letter")) {
letter = detail.getTextContent();
} else if (detail.getNodeName().equals("guess")) {
guess = detail.getTextContent();
} else if (detail.getNodeName().equals("rank")) {
rank = detail.getTextContent();
}
}
System.out.println("Letter=" + letter + ", guess=" + guess + ", rank=" + rank);
}
}
}
}
}
(You'll probably build an object and add it to some result list instead of the System.out line...)
OUTPUT:
SORT BY: #text
SORT BY: sort_by_letter
Letter=A, guess=16, rank=3
Letter=B, guess=5, rank=1
SORT BY: #text
SORT BY: sort_by_rank
Letter=A, guess=16, rank=1
Letter=E, guess=15, rank=2
To answer the comment: if you wanted to JUST get the "sort_by_letter" XML elements, you can add an extra if clause here...
...
for (int k = 0; k < sortByNodes.getLength(); k++) {
Node sortBy = sortByNodes.item(k);
if(sortBy.getNodeName().equals("sort_by_letter")) {
System.out.println("SORT BY: " + sortBy.getNodeName());
...
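The "SORT BY: #text" lines in the output come from the whitespace text nodes between the elements. If you would rather not filter those by hand, XPath (javax.xml.xpath, part of the JDK) can select just the letter_row elements of one section. This is a sketch of that alternative, not the code from the answer above; the class name GuessChartXPath and the method rows are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical class for illustration only.
class GuessChartXPath {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns one formatted line per <letter_row> under the given section
    // ("sort_by_letter" or "sort_by_rank").
    static List<String> rows(Document doc, String section) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> result = new ArrayList<>();
        try {
            NodeList rows = (NodeList) xpath.evaluate(
                    "/GET_GUESS_CHART/" + section + "/letter_row",
                    doc, XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                // Relative expressions are evaluated against the current row.
                String letter = xpath.evaluate("letter", rows.item(i));
                String guess = xpath.evaluate("guess", rows.item(i));
                String rank = xpath.evaluate("rank", rows.item(i));
                result.add("Letter=" + letter + ", guess=" + guess + ", rank=" + rank);
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath expression", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<GET_GUESS_CHART>"
                + " <sort_by_letter>"
                + "  <letter_row><letter>A</letter><guess>16</guess><rank>3</rank></letter_row>"
                + "  <letter_row><letter>B</letter><guess>5</guess><rank>1</rank></letter_row>"
                + " </sort_by_letter>"
                + " <sort_by_rank>"
                + "  <letter_row><letter>A</letter><guess>16</guess><rank>1</rank></letter_row>"
                + "  <letter_row><letter>E</letter><guess>15</guess><rank>2</rank></letter_row>"
                + " </sort_by_rank>"
                + "</GET_GUESS_CHART>";
        Document doc = parse(xml);
        for (String section : List.of("sort_by_letter", "sort_by_rank")) {
            System.out.println("SORT BY: " + section);
            rows(doc, section).forEach(System.out::println);
        }
    }
}
```

The trade-off is that XPath is slower than walking the tree directly, which rarely matters for a document this small.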

How to create POJO class for xml parse in java?

I get exactly the output I need, but I have to use a POJO class in my program. I have searched a lot about this but did not get a clear idea of how to do it. Please help me solve this; thanks in advance. My code for parsing the XML in Java is given below.
Code for ReadAndPrintXMLFile:
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.net.URL;
import java.io.InputStream;
public class ReadAndPrintXMLFile{
public static void main (String argv []){
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
URL url = new URL("http://xxxxxxxxxxxxxxxx");
InputStream stream = url.openStream();
Document doc = docBuilder.parse(stream);
// normalize text representation
doc.getDocumentElement ().normalize ();
System.out.println ("Root element of the doc is " +
doc.getDocumentElement().getNodeName());
NodeList listOfPersons = doc.getElementsByTagName("head");
int totalPersons = listOfPersons.getLength();
System.out.println("Total no of head : " + totalPersons);
for(int s=0; s<listOfPersons.getLength() ; s++){
Node firstPersonNode = listOfPersons.item(s);
if(firstPersonNode.getNodeType() == Node.ELEMENT_NODE){
Element firstPersonElement = (Element)firstPersonNode;
//-------
NodeList firstNameList = firstPersonElement.getElementsByTagName("heading");
Element firstNameElement = (Element)firstNameList.item(0);
NodeList textFNList = firstNameElement.getChildNodes();
System.out.println("Heading : " +
((Node)textFNList.item(0)).getNodeValue().trim());
}//end of if clause
}//end of for loop with s var
}catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line "
+ err.getLineNumber () + ", uri " + err.getSystemId ());
System.out.println(" " + err.getMessage ());
}catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace ();
}catch (Throwable t) {
t.printStackTrace ();
}
//System.exit (0);
}//end of main
}
For this XML parsing program I have to use a POJO class, so I created a class like this:
public class POJOurl {
private String heading;
public String getHeading() {
return heading;
}
public void setHeading(String heading) {
this.heading = heading;
}
}
I don't know how to use these get and set methods in my program. I have to run the program using public String getHeading() and public void setHeading(String heading) and still get the same output I am getting now; the only requirement is that I use the POJO class.
Output:
Root element of the doc is root1
Total no of head : 4
Heading : Appliance Repairs
Heading : Air conditioning and refrigeration services
Heading : Accountants
Heading : Accident Management
I would create a class Person with the attributes that you need, e.g. firstName.
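Applied to the POJOurl class from the question, the idea would be to fill one object per <head> element with setHeading and print it later with getHeading. The following is a rough sketch: the class HeadingReader, the method readHeadings and the inline XML layout (root1 > head > heading) are assumptions based on the output shown, since the real document is not included in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// The POJO from the question, unchanged apart from formatting.
class POJOurl {
    private String heading;

    public String getHeading() {
        return heading;
    }

    public void setHeading(String heading) {
        this.heading = heading;
    }
}

// Hypothetical reader class for illustration only.
class HeadingReader {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Builds one POJOurl per <head> element that contains a <heading>.
    static List<POJOurl> readHeadings(Document doc) {
        List<POJOurl> result = new ArrayList<>();
        NodeList heads = doc.getElementsByTagName("head");
        for (int i = 0; i < heads.getLength(); i++) {
            Element head = (Element) heads.item(i);
            NodeList headings = head.getElementsByTagName("heading");
            if (headings.getLength() == 0) {
                continue; // a <head> without a <heading> is skipped
            }
            POJOurl item = new POJOurl();
            item.setHeading(headings.item(0).getTextContent().trim());
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed layout; replace with the stream from your URL.
        String xml = "<root1>"
                + "<head><heading>Appliance Repairs</heading></head>"
                + "<head><heading>Accountants</heading></head>"
                + "</root1>";
        for (POJOurl item : readHeadings(parse(xml))) {
            System.out.println("Heading : " + item.getHeading());
        }
    }
}
```

Keeping the parsing in readHeadings and the printing in main means the rest of the program only ever deals with POJOurl objects, not DOM nodes.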

creating RDF triple and RDF store using jena from xml file

I use the following java program to extract information from an xml file.
import java.io.File;
import java.net.URL;
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ExtractInfo {
public static void main(String argv []) {
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
File file = new File("page.xml");
Document doc = docBuilder.parse(file);
// normalize text representation
doc.getDocumentElement().normalize();
System.out.println ("Root element of the doc is " +
doc.getDocumentElement().getNodeName());
NodeList listOfPersons = doc.getElementsByTagName("person");
int totalPersons = listOfPersons.getLength();
System.out.println("Total no of people : " + totalPersons);
for (int s=0; s<listOfPersons.getLength(); s++) {
Node firstPersonNode = listOfPersons.item(s);
if (firstPersonNode.getNodeType() == Node.ELEMENT_NODE) {
Element firstPersonElement = (Element)firstPersonNode;
//-------
NodeList firstNameList = firstPersonElement.getElementsByTagName("first");
Element firstNameElement = (Element)firstNameList.item(0);
NodeList textFNList = firstNameElement.getChildNodes();
System.out.println("First Name : " +
((Node)textFNList.item(0)).getNodeValue().trim());
//-------
NodeList lastNameList = firstPersonElement.getElementsByTagName("last");
Element lastNameElement = (Element)lastNameList.item(0);
NodeList textLNList = lastNameElement.getChildNodes();
System.out.println("Last Name : " +
((Node)textLNList.item(0)).getNodeValue().trim());
//----
NodeList ageList = firstPersonElement.getElementsByTagName("age");
Element ageElement = (Element)ageList.item(0);
NodeList textAgeList = ageElement.getChildNodes();
System.out.println("Age : " +
((Node)textAgeList.item(0)).getNodeValue().trim());
}
}
} catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line "
+ err.getLineNumber () + ", uri " + err.getSystemId());
System.out.println(" " + err.getMessage());
} catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace();
} catch (Throwable t) {
t.printStackTrace();
}
}
}
Could someone please help me generate RDF triples from the extracted information and create a triple store with Jena containing all the triples? I am quite new to RDF and Jena, so I do need your help.
Thanks in advance.
// ontModel is an OntModel instance created beforehand; NameSpace is your ontology's base URI.
Resource resource = ontModel.createResource(NameSpace + "Doutorado_em_Engenharia_de_Sistemas_e_Computacao");
Property prop = ontModel.createProperty("http://www.owl-ontologies.com/OntologyBase.owl#program_Provided_By");
Resource obj = ontModel.createResource(NameSpace + "Universidade_X");
ontModel.add(resource, prop, obj);
Before applying it, you should first create an instance of OntModel for your ontology.
http://answers.semanticweb.com/questions/11084/add-triples-in-an-ontology-using-jena-api
