Export certain details of XML file with Java

I have an XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp-2019-11-22.dtd">
<dblp>
<phdthesis mdate="2016-05-04" key="phd/dk/Heine2010">
<author>Carmen Heine</author>
<title>Modell zur Produktion von Online-Hilfen.</title>
<year>2010</year>
<school>Aarhus University</school>
<pages>1-315</pages>
<isbn>978-3-86596-263-8</isbn>
<ee>http://d-nb.info/996064095</ee>
</phdthesis><phdthesis mdate="2020-02-12" key="phd/Hoff2002">
.
. (continues with the same tags for a lot of other books)
From that XML file I'm trying to extract the contents of the "year" tag in order to count how many books were published each year. I have tried several implementations, but none of them works.
The code I've written so far:
public class Publications {
public static void main(String[] args) {
try
{
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler()
{
boolean year = false;
//parser starts parsing a specific element inside the document
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
{
System.out.println("Start Element :" + qName);
if(qName.equalsIgnoreCase("Year"))
{
year=true;
}
}
//parser ends parsing the specific element inside the document
public void endElement(String uri, String localName, String qName) throws SAXException
{
System.out.println("End Element:" + qName);
}
//reads the text value of the currently parsed element
public void characters(char ch[], int start, int length) throws SAXException
{
if (year)
{
System.out.println("Year : " + new String(ch, start, length));
year = false;
}
}
};
saxParser.parse("dblp-2020-04-01.xml", handler);
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
I'm also adding the exception I get:
java.io.FileNotFoundException: C:\Users\Deray\DataAnalysis\dblp-2020-04-01.dtd
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
at java.base/sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:86)
at java.base/sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:184)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:654)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:150)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:860)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:276)
at Publications.main(Publications.java:44)
Do you have any other suggestions about the implementation?
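The FileNotFoundException in the trace points at the DTD rather than at the handler: the parser tries to load the DTD named in the file's DOCTYPE line from the same directory as the XML file. Placing that DTD next to the XML file should let parsing proceed. For very large DBLP dumps the JDK's entity expansion limit may also need raising, which is worth checking if a limit error appears. For the counting itself, here is a minimal sketch. The class name YearCounter and the inline sample are made up for illustration, and the text is buffered because characters() can be called more than once per element:

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name YearCounter is not from the question.
class YearCounter {

    // Parses the source and returns a map of year -> number of <year> elements seen.
    static Map<String, Integer> countYears(InputSource source) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(source, new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <year>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("year".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // may be called several times for one element, so append
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("year".equals(qName) && text != null) {
                    counts.merge(text.toString().trim(), 1, Integer::sum);
                    text = null;
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Small inline sample without a DOCTYPE, for illustration only.
        String xml = "<dblp>"
                + "<phdthesis><title>A</title><year>2010</year></phdthesis>"
                + "<phdthesis><title>B</title><year>2002</year></phdthesis>"
                + "<phdthesis><title>C</title><year>2010</year></phdthesis>"
                + "</dblp>";
        countYears(new InputSource(new StringReader(xml)))
                .forEach((year, n) -> System.out.println(year + " : " + n));
    }
}
```

For the real file, an InputSource built from the file's URI (for example new InputSource(new File("dblp-2020-04-01.xml").toURI().toString())) gives the parser a base location from which it can resolve the relative DTD name.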

Related

FLAT XML of any type using SAX Parser in Java

I am new to Java and have written code that fails to read the attribute value inside a tag. For example, in the XML below, id = bk001 does not appear in the output:
<book id="bk001">
<author>Hightower, Kim</author>
<title>The First Book</title>
<genre>Fiction</genre>
<price>44.95</price>
<pub_date>2000-10-01</pub_date>
<date>
<auth_date>
2000-10-01
</auth_date>
<auth_date>
2000-10-05
</auth_date>
</date>
<review>An amazing story of nothing.</review>
</book>
We can expect XML of any type and have to convert it into a flat structure, e.g. CSV.
The code I have written:
public class SAX
{
Map<String, String> list = new HashMap<String,String>();
public static void main(String[] args) throws IOException {
new SAX().printElementNames("input/books_1.xml");
}
public void printElementNames(String fileName) throws IOException
{
try {
SAXParserFactory parserFact = SAXParserFactory.newInstance();
SAXParser parser = parserFact.newSAXParser();
DefaultHandler handler = new DefaultHandler()
{
public void startElement(String uri, String lName, String ele, Attributes attributes) throws SAXException {
System.out.print(ele + " ");
if((attributes.getValue("TagValue"))==null)
{
return;
}
else
{
System.out.println(attributes.getValue("TagValue"));
}
}
public void characters(char ch[], int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if(value.length() == 0) return;
System.out.println(value);
}
};
parser.parse(new File(fileName), handler);
}catch(Exception e){
e.printStackTrace();
}
}
}
Please help me with this. I searched Stack Overflow but couldn't find anything concrete.
The goal is that the code should work for any valid XML.
Note: we are not allowed to use external libraries such as gson.

The only attribute that your code is attempting to read is "TagValue", so why would you expect your code to display the value of an "id" attribute?

Replace your startElement with:
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
System.out.print(qName + " ");
for(int i=0; i<attributes.getLength();i++) {
System.out.println(attributes.getQName(i) + " " + attributes.getValue(i));
}
}

Read XML File with condition age >30 and output in Java Console

I'm reading an XML file in Eclipse and printing the output to the console.
So far I have managed to print all my entries.
But I need to print only the entries for employees who are over 30 years old.
This is my XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<company>
<name>CompanyName</name>
<employees id="0">
<name>employee name0</name>
<age>33</age>
<role>tester</role>
<gen>male</gen>
</employees>
<employees id="1">
<name>employee name1</name>
<age>18</age>
<role>tester</role>
<gen>female</gen>
</employees>
<employees id="2">
<name>employee name2</name>
<age>38</age>
<role>developer</role>
<gen>male</gen>
</employees>
</company>
And this is what I have been trying :
if (qName.equals("age"))
{
int age2;
String age=attributes.getValue("age");
age2=Integer.ParseInt(age)
if (age2>30){
System.out.println("\tAge="+age2);
}
So I want to print the employees with id=0 and id=2, because their age is over 30.

Considering that you're using SAXParser and that this snippet of code is inside the overridden method startElement, you'll need to override the characters and endElement methods too. Something like this:
class Handler extends DefaultHandler {
String currentElement;
String currentAgeValue;
String currentNameValue;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
super.startElement(uri, localName, qName, attributes);
currentElement = qName;
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
super.characters(ch, start, length);
switch(currentElement) {
case "age":
currentAgeValue = new String(ch, start, length);
break;
case "name":
currentNameValue = new String(ch, start, length);
break;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
super.endElement(uri, localName, qName);
// stop capturing text so whitespace between elements cannot overwrite a value
currentElement = "";
if(qName.equals("employees")) {
int age = Integer.parseInt(currentAgeValue);
if(age > 30) {
System.out.println("Name:" + currentNameValue+", Age:" + age);
}
}
}
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
String xml = "<company><name>CompanyName</name><employees id=\"0\"><name>employee name0</name><age>33</age><role>tester</role><gen>male</gen></employees><employees id=\"1\"><name>employee name1</name><age>18</age><role>tester</role><gen>female</gen></employees><employees id=\"2\"><name>employee name2</name><age>38</age><role>developer</role><gen>male</gen></employees></company>";
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse(new InputSource(new StringReader(xml)), new Handler());
}
}
The output will be:
Name:employee name0, Age:33
Name:employee name2, Age:38
The characters method is called when the parser reads the text content of an element; the Attributes parameter of startElement holds the values of XML attributes such as id in <employees id="2">.
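Since the question asks for the employee ids, the two mechanisms can be combined: read the id attribute in startElement and the age text in characters/endElement. This is a minimal sketch, not the answerer's code. The class name OlderThanHandler and the helper idsOlderThan are made up for illustration, and the text is buffered in a StringBuilder because characters() may deliver an element's text in several pieces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler name; collects the ids of employees older than a threshold.
class OlderThanHandler extends DefaultHandler {
    private final int minAge;
    private final List<String> ids = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentId; // id attribute of the <employees> element being read
    private int currentAge;

    OlderThanHandler(int minAge) { this.minAge = minAge; }

    List<String> ids() { return ids; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        if ("employees".equals(qName)) {
            currentId = attributes.getValue("id"); // attributes arrive here
            currentAge = -1;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // element text arrives here, possibly in pieces
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("age".equals(qName)) {
            currentAge = Integer.parseInt(text.toString().trim());
        } else if ("employees".equals(qName) && currentAge > minAge) {
            ids.add(currentId);
        }
        text.setLength(0); // ignore whitespace that follows the closing tag
    }

    // Convenience wrapper that parses an XML string and returns the matching ids.
    static List<String> idsOlderThan(String xml, int minAge) throws Exception {
        OlderThanHandler handler = new OlderThanHandler(minAge);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.ids();
    }
}
```

Calling OlderThanHandler.idsOlderThan(xml, 30) on the company XML from the question should return the ids 0 and 2.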

Join two strings retrieved by SAX

I have an XML file like this one:
<?xml version="1.0" encoding="UTF-8"?>
<Article>
<ArticleTitle>Java-SAX Tutorial</ArticleTitle>
<Author>
<FamilyName>Yong</FamilyName>
<GivenName>Mook</GivenName>
<GivenName>Kim</GivenName>
<nickname>mkyong</nickname>
<salary>100000</salary>
</Author>
<Author>
<FamilyName>Low</FamilyName>
<GivenName>Yin</GivenName>
<GivenName>Fong</GivenName>
<nickname>fong fong</nickname>
<salary>200000</salary>
</Author>
</Article>
I have tried the example in mkyong's tutorial here and I can retrieve data perfectly from it using SAX, it gives me:
Article Title : Java-SAX Tutorial
Given Name : Kim
Given Name : Mook
Family Name : Yong
Given Name : Yin
Given Name : Fong
Family Name : Low
But I want it to give me something like this:
Article Title : Java-SAX Tutorial
Author : Kim Mook Yong
Author : Yin Fong Low
In other terms, I would like to retrieve some of the child nodes of the node Author, not all of them, put them in a string variable and display them.
This is the class I use in order to parse the Authors with the modification I have tried to do:
public class ReadAuthors {
public void parse(String filePath) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bFamilyName = false;
boolean bGivenName = false;
@Override
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("FamilyName")) {
bFamilyName = true;
}
if (qName.equalsIgnoreCase("GivenName")) {
bGivenName = true;
}
}
@Override
public void endElement(String uri, String localName,
String qName) throws SAXException {
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
String fullName = "";
String familyName = "";
String givenName ="";
if (bFamilyName) {
familyName = new String(ch, start, length);
fullName += familyName;
bFamilyName = false;
}
if (bGivenName) {
givenName = new String(ch, start, length);
fullName += " " + givenName;
bGivenName = false;
}
System.out.println("Full Name : " + fullName);
}
};
saxParser.parse(filePath, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
With this modification, it only gives me the ArticleTitle value and doesn't return anything for the authors' full names.
I have another class for parsing the ArticleTitle node and they are both called in a Main class.
What did I do wrong? And how can I fix it?

The fullName variable is reset every time the characters method is called. I think you should move that variable out into the handler: initialize it with an empty string when Author starts and print it when Author ends. The concatenation should work as you wrote it. I haven't tried this out, but something similar should work:
public class ReadAuthors {
public void parse(String filePath) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bName = false;
String fullName = "";
@Override
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("FamilyName")) {
bName = true;
}
if (qName.equalsIgnoreCase("GivenName")) {
bName = true;
}
if (qName.equalsIgnoreCase("Author")) {
fullName = "";
}
}
@Override
public void endElement(String uri, String localName,
String qName) throws SAXException {
if (qName.equalsIgnoreCase("Author")) {
System.out.println("Full Name : " + fullName);
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
String name = "";
if (bName) {
name = new String(ch, start, length);
// separate the name parts with a space
fullName += (fullName.isEmpty() ? "" : " ") + name;
bName = false;
}
}
};
saxParser.parse(filePath, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}

How to handle namespaces with SAX Parser?

I'm trying to learn to parse XML documents. I have an XML document that uses namespaces, so I'm sure I need to do something to parse it correctly.
This is what I have:
DefaultHandler handler = new DefaultHandler() {
boolean bfname = false;
boolean blname = false;
boolean bnname = false;
boolean bsalary = false;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
System.out.println("Salary : " + new String(ch, start, length));
bsalary = false;
}
}
};
saxParser.parse(file, handler);
My question is: how can I handle the namespace in this example?

To elaborate on Blaise's point with sample code, consider this contrived example:
<?xml version="1.0" encoding="UTF-8"?>
<!-- ns.xml -->
<root xmlns:foo="http://data" xmlns="http://data">
<foo:record>ONE</foo:record>
<bar:record xmlns:bar="http://data">TWO</bar:record>
<record>THREE</record>
<record xmlns="http://metadata">meta 1</record>
<foo:record xmlns:foo="http://metadata">meta 2</foo:record>
</root>
There are two different types of record element. One in the http://data namespace; the other in http://metadata namespace. There are three data records and two metadata records.
The document could be normalized to this:
<?xml version="1.0" encoding="UTF-8"?>
<ns0:root xmlns:ns0="http://data" xmlns:ns1="http://metadata">
<ns0:record>ONE</ns0:record>
<ns0:record>TWO</ns0:record>
<ns0:record>THREE</ns0:record>
<ns1:record>meta 1</ns1:record>
<ns1:record>meta 2</ns1:record>
</ns0:root>
But the code must handle the general case.
Here is some code for printing the metadata records:
class MetadataPrinter extends DefaultHandler {
private boolean isMeta = false;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
isMeta = "http://metadata".equals(uri) && "record".equals(localName);
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (isMeta) {
System.out.println();
isMeta = false;
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
if (isMeta) {
System.out.print(new String(ch, start, length));
}
}
}
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
parser.parse(new File("ns.xml"), new MetadataPrinter());
Note: namespace awareness must be enabled explicitly in some of the older Java XML APIs (SAX and DOM among them.)

In a namespace-qualified XML document there are two components to a node's name: the namespace URI and the local name (these are passed in as parameters to the startElement and endElement events). When you check for the presence of an element, you should match on both of these parameters. Currently your code would work for both documents below even though they are namespace qualified differently.
<foo xmlns="FOO">
<bar>Hello World</bar>
</foo>
And
<foo xmlns="BAR">
<bar>Hello World</bar>
</foo>
You are currently (and incorrectly) matching on the qName parameter. The problem with what you are doing is that the qName might change based on the prefix used to represent a namespace. The two documents below have the exact same namespace qualification. The local names and namespaces are the same, but their QNames are different.
<foo xmlns="FOO">
<bar>Hello World</bar>
</foo>
And
<ns:foo xmlns:ns="FOO">
<ns:bar>Hello World</ns:bar>
</ns:foo>
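The point about prefixes can be checked with a small experiment. This is a minimal sketch: the class name ExpandedNames is made up for illustration, and each element is recorded as "{namespaceURI}localName" using a namespace-aware parser. The default-namespace document and the prefixed document should then produce identical lists even though their qNames differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each element as "{namespaceURI}localName".
class ExpandedNames {
    static List<String> of(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that uri and localName are filled in
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add("{" + uri + "}" + localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String defaultNs = "<foo xmlns=\"FOO\"><bar>Hello World</bar></foo>";
        String prefixed = "<ns:foo xmlns:ns=\"FOO\"><ns:bar>Hello World</ns:bar></ns:foo>";
        System.out.println(of(defaultNs));
        System.out.println(of(prefixed));
        // Same namespace URI and local names, regardless of the prefix used.
        System.out.println(of(defaultNs).equals(of(prefixed)));
    }
}
```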

Sax parser issues in android

I'm trying to parse an XML document using a SAX parser. The code works fine on a PC, but on Android the elements don't get added to the list.
In the code I'm trying to add the data within the sunrise and sunset tags to the list.
In public void endElement(..) {}:
System.out.println("size of list " + timeLst.size()); //Always shows 0 in android
Below is the code:
TimeServiceParser tsp = new TimeServiceParser();
tsp.parseDocument(new URL("http://www.earthtools.org/sun/47.566667/-52.716667/14/3/99/1"));
tsp.printData();
public class TimeService extends DefaultHandler {
public void parseDocument(URL sourceUrl) {
SAXParserFactory spf = SAXParserFactory.newInstance();
try{
SAXParser sp = spf.newSAXParser();
InputStream is = sourceUrl.openStream();
sp.parse(is, this);
}catch(SAXException se) {
...
}
}
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
tempVal = "";
if(qName.equalsIgnoreCase("sunrise")) {
tempTimeData = new TimeData();
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equalsIgnoreCase("sunrise")) {
tempTimeData.setSunriseTime(tempVal);
timeLst.add(tempTimeData);
}else if(qName.equalsIgnoreCase("sunset")) {
if(tempTimeData!=null) {
TimeData t = (TimeData)(timeLst.get(0));
t.setSunsetTime(tempVal);
}
}
System.out.println("size of list " + timeLst.size()); //Always shows 0 in android
}
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal = new String(ch , start , length);
}
public void printData() {
Iterator<TimeData> it = timeLst.listIterator();
while(it.hasNext()) {
TimeData td = (TimeData)(it.next());
System.out.println(td.getSunriseTime());
System.out.println(td.getSunsetTime());
}
}
}
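No answer is recorded for this question. One commonly reported difference between SAX implementations is that, depending on the parser and its namespace settings, the element name may arrive in localName while qName is empty, or the other way round. If that is the cause here (which cannot be confirmed from the question alone), comparing against whichever name is non-empty would make the handler behave the same on both platforms. This is a minimal sketch under that assumption. The class name SunTimesHandler and the helper nameOf are made up for illustration, and the text is buffered because characters() may be called several times per element.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; stores the text of <sunrise> and <sunset> keyed by element name.
class SunTimesHandler extends DefaultHandler {
    final Map<String, String> times = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    // Use whichever of the two names the parser actually filled in.
    private static String nameOf(String localName, String qName) {
        return (localName != null && !localName.isEmpty()) ? localName : qName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // begin collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // append instead of overwriting
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String name = nameOf(localName, qName);
        if ("sunrise".equalsIgnoreCase(name) || "sunset".equalsIgnoreCase(name)) {
            times.put(name.toLowerCase(Locale.ROOT), text.toString().trim());
        }
    }

    // Convenience wrapper that parses an XML string and returns the collected times.
    static Map<String, String> parse(String xml) throws Exception {
        SunTimesHandler handler = new SunTimesHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.times;
    }
}
```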
