I'm trying to learn to parse XML documents. I have an XML document that uses namespaces, so I'm sure I need to do something extra to parse it correctly.
This is what I have:
DefaultHandler handler = new DefaultHandler() {
boolean bfname = false;
boolean blname = false;
boolean bnname = false;
boolean bsalary = false;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
System.out.println("Salary : " + new String(ch, start, length));
bsalary = false;
}
}
};
saxParser.parse(file, handler);
My question is: how can I handle the namespace in this example?
To elaborate on Blaise's point with sample code, consider this contrived example:
<?xml version="1.0" encoding="UTF-8"?>
<!-- ns.xml -->
<root xmlns:foo="http://data" xmlns="http://data">
<foo:record>ONE</foo:record>
<bar:record xmlns:bar="http://data">TWO</bar:record>
<record>THREE</record>
<record xmlns="http://metadata">meta 1</record>
<foo:record xmlns:foo="http://metadata">meta 2</foo:record>
</root>
There are two different types of record element. One in the http://data namespace; the other in http://metadata namespace. There are three data records and two metadata records.
The document could be normalized to this:
<?xml version="1.0" encoding="UTF-8"?>
<ns0:root xmlns:ns0="http://data" xmlns:ns1="http://metadata">
<ns0:record>ONE</ns0:record>
<ns0:record>TWO</ns0:record>
<ns0:record>THREE</ns0:record>
<ns1:record>meta 1</ns1:record>
<ns1:record>meta 2</ns1:record>
</ns0:root>
But the code must handle the general case.
Here is some code for printing the metadata records:
class MetadataPrinter extends DefaultHandler {
private boolean isMeta = false;
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
isMeta = "http://metadata".equals(uri) && "record".equals(localName);
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (isMeta) {
System.out.println();
isMeta = false;
}
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
if (isMeta) {
System.out.print(new String(ch, start, length));
}
}
}
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
parser.parse(new File("ns.xml"), new MetadataPrinter());
Note: namespace awareness must be enabled explicitly in some of the older Java XML APIs (SAX and DOM among them.)
In a namespace-qualified XML document there are two components to a node's name: the namespace URI and the local name (these are passed in as parameters to the startElement and endElement events). When you are checking for the presence of an element you should be matching on both of these parameters. Currently your code would work for both documents below even though they are namespace qualified differently.
<foo xmlns="FOO">
<bar>Hello World</bar>
</foo>
And
<foo xmlns="BAR">
<bar>Hello World</bar>
</foo>
You are currently (and incorrectly) matching on the qName parameter. The problem with what you are doing is that the qName might change based on the prefix used to represent a namespace. The two documents below have the exact same namespace qualification. The local names and namespaces are the same, but their QNames are different.
<foo xmlns="FOO">
<bar>Hello World</bar>
</foo>
And
<ns:foo xmlns:ns="FOO">
<ns:bar>Hello World</ns:bar>
</ns:foo>
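As a rough sketch of matching on both parameters: the class and method names below are made up for illustration, and the documents are passed in as strings only to keep the example self-contained. It collects the text of every bar element in a given namespace, whatever prefix the document happens to use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class NamespaceMatchDemo {

    // Returns the text of every <bar> element that belongs to namespaceUri,
    // regardless of the prefix (or default namespace) used in the document.
    static List<String> barTexts(String xml, String namespaceUri) {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // Match on namespace URI + local name, never on qName.
                if (namespaceUri.equals(uri) && "bar".equals(localName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && namespaceUri.equals(uri) && "bar".equals(localName)) {
                    result.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that uri and localName are populated
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }
}
```

Called with the namespace "FOO", this should treat the default-namespace document and the ns-prefixed document identically, and find nothing in the document whose namespace is "BAR".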
I am using SAX Parser to parse some XML content. Please check my code below.
public void parse(InputSource is, AppDataBean appDataBean) throws RuntimeException {
int limitCheck;
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
Log.d("SAX",appDataBean.getUrl());
DefaultHandler handler = new DefaultHandler() {
boolean title = false;
boolean link = false;
boolean author = false;
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
if (qName.equalsIgnoreCase(TITLE)) {
title = true;
}
if (qName.equalsIgnoreCase(LINK)) {
link = true;
}
if (qName.equalsIgnoreCase(AUTHOR)) {
author = true;
}
//Log.d("SAX","Start Element :" + qName);
}
public void endElement(String uri, String localName,
String qName)
throws SAXException {
}
public void characters(char ch[], int start, int length)
throws SAXException {
System.out.println(new String(ch, start, length));
if (title) {
Log.d("SAX","End Element :" + "First Name : "
+ new String(ch, start, length));
title = false;
}
if (link) {
Log.d("SAX","End Element :" + "Last Name : "
+ new String(ch, start, length));
link = false;
}
if (author) {
Log.d("SAX","End Element :" + "Nick Name : "
+ new String(ch, start, length));
author = false;
}
}
};
saxParser.parse(is, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
Below is what my XML looks like.
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<channel>
<title>MyRSS</title>
<atom:link href="http://www.example.com/rss.php" rel="self" type="application/rss+xml" />
<link>http://www.example.com/rss.php</link>
<description>MyRSS</description>
<language>en-us</language>
<pubDate>Tue, 22 May 2018 13:15:15 +0530</pubDate>
<item>
<title>Title 1</title>
<pubDate>Tue, 22 May 2018 13:14:40 +0530</pubDate>
<link>http://www.example.com/news.php?nid=47610</link>
<guid>http://www.example.com/news.php?nid=47610</guid>
<description>bla bla bla</description>
</item>
</channel>
</rss>
However, I need to skip the channel tag and only read an element if its enclosing tag is item, because only then do I get the real content. How can I do this?
Update
As suggested by an answer, I tried using the SAX parser with a stack. Below is the code, but it is still no good: now it prints nothing for the First Name.
public void parse(InputSource is, AppDataBean appDataBean) throws RuntimeException {
int limitCheck;
stack = new Stack<>();
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
Log.d("SAX", appDataBean.getUrl());
DefaultHandler handler = new DefaultHandler() {
boolean title = false;
boolean link = false;
boolean author = false;
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
Log.d("SAX", "localName: " + localName);
if(localName.equalsIgnoreCase("item"))
{
stack = new Stack<>();
stack.push(qName);
}
if (qName.equalsIgnoreCase(TITLE)) {
if(stack.peek().equalsIgnoreCase("item"))
{
title = true;
}
}
if (qName.equalsIgnoreCase(LINK)) {
link = true;
}
if (qName.equalsIgnoreCase(AUTHOR)) {
author = true;
}
//Log.d("SAX","Start Element :" + qName);
}
public void endElement(String uri, String localName,
String qName)
throws SAXException {
stack.pop();
}
public void characters(char ch[], int start, int length)
throws SAXException {
System.out.println(new String(ch, start, length));
if (title) {
Log.d("SAX", "End Element :" + "First Name : "
+ new String(ch, start, length));
title = false;
}
if (link) {
Log.d("SAX", "End Element :" + "Last Name : "
+ new String(ch, start, length));
link = false;
}
if (author) {
Log.d("SAX", "End Element :" + "Nick Name : "
+ new String(ch, start, length));
author = false;
}
}
};
saxParser.parse(is, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
Typically a SAX application will maintain a stack to hold context. On a startElement event, push the element name to the stack; on endElement pop it off the stack. Then when you get a startElement event for a title element, you can do stack.peek() to see what the parent of the title is.
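A minimal sketch of that idea, assuming a parser that is not namespace aware (so qName carries the element name) and using made-up class and method names. Every element is pushed in startElement and popped in endElement, so the stack always mirrors the current path; pushing only some elements while popping on every endElement would leave the stack out of step.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the <title> of each <item>, ignoring the channel's own <title>.
final class ItemTitleReader {

    static List<String> itemTitles(String xml) {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
            private StringBuilder text;                            // non-null only inside item/title

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // The current element is not pushed yet, so peek() is its parent.
                if ("title".equals(qName) && "item".equals(path.peek())) {
                    text = new StringBuilder();
                }
                path.push(qName); // push every element, not only <item>
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                path.pop(); // matches the push done in startElement
                if (text != null && "title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return titles;
    }
}
```

The same parent check can be applied to link or any other child of item.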
I got an error while trying to read this xml file:
<?xml version="1.0" encoding="UTF-8"?>
<job>
<id>B002</id>
<name>Name</name>
<time>every day 1:00</time>
</job>
It said:
org.xml.sax.SAXParseException; systemId: file:///D:/JobManagement.xml; lineNumber: 1; columnNumber: 20; A pseudo attribute name is expected. .
I searched on Google and found some ways to solve this problem, but they did not work. I'm using the SAX parser code from Mkyong.com at the following link:
https://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
I have to solve this as quickly as I can, so I do not have enough time to learn it. Please help me.
The XML above is just part of my file.
public class JobManagementService {
public void ReadXMLFile() {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bfname = false;
boolean blname = false;
boolean bnname = false;
boolean bsalary = false;
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
System.out.println("Salary : " + new String(ch, start, length));
bsalary = false;
}
}
};
saxParser.parse("D:\\JobManagement.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I call the ReadXMLFile method here:
@GetMapping("/jobManagement")
public String home() {
jobManagementService.ReadXMLFile();
return "/jobManagement";
}
I just want to test this function before applying it
The file you have shown us is well-formed and parseable. Because the error message is very precise about what is wrong and where (just after the version="1.0") my suspicion would be that one of the characters isn't what it appears to be. Perhaps the quotation marks are actually "smart quotes", or perhaps the space after version="1.0" is actually a non-breaking space.
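One way to check this suspicion is to list every character in the first line of the file that is not plain printable ASCII. The helper below is only a sketch with made-up names; it works on a string, so the first line of the file would have to be read (as UTF-8, say) and passed in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper, not part of any library.
final class DeclarationInspector {

    // Describes each character that is not printable ASCII (tab, CR and LF are allowed).
    static List<String> suspiciousChars(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean allowedControl = c == '\t' || c == '\n' || c == '\r';
            if (c > 0x7E || (c < 0x20 && !allowedControl)) {
                // Report the zero-based index and the Unicode code unit in hex.
                found.add(String.format(Locale.ROOT, "index %d: U+%04X", i, (int) c));
            }
        }
        return found;
    }
}
```

A clean XML declaration should produce an empty list; a non-breaking space would be reported as U+00A0 and curly quotes as U+201C or U+201D.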
(By the way, telling us you are in a hurry and don't want to investigate the problem thoroughly will put many people off from answering your question. We're problem-solvers by nature, we like to get to the bottom of things, and working with someone who says they don't want to put any effort in from their side is not usually very rewarding.)
I have an XML file like this one:
<?xml version="1.0" encoding="UTF-8"?>
<Article>
<ArticleTitle>Java-SAX Tutorial</ArticleTitle>
<Author>
<FamilyName>Yong</FamilyName>
<GivenName>Mook</GivenName>
<GivenName>Kim</GivenName>
<nickname>mkyong</nickname>
<salary>100000</salary>
</Author>
<Author>
<FamilyName>Low</FamilyName>
<GivenName>Yin</GivenName>
<GivenName>Fong</GivenName>
<nickname>fong fong</nickname>
<salary>200000</salary>
</Author>
</Article>
I have tried the example in mkyong's tutorial here and I can retrieve data perfectly from it using SAX, it gives me:
Article Title : Java-SAX Tutorial
Given Name : Kim
Given Name : Mook
Family Name : Yong
Given Name : Yin
Given Name : Fong
Family Name : Low
But I want it to give me something like this:
Article Title : Java-SAX Tutorial
Author : Kim Mook Yong
Author : Yin Fong Low
In other words, I would like to retrieve some of the child nodes of the Author node, not all of them, put them in a string variable and display them.
This is the class I use in order to parse the Authors with the modification I have tried to do:
public class ReadAuthors {
public void parse(String filePath) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bFamilyName = false;
boolean bGivenName = false;
@Override
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("FamilyName")) {
bFamilyName = true;
}
if (qName.equalsIgnoreCase("GivenName")) {
bGivenName = true;
}
}
@Override
public void endElement(String uri, String localName,
String qName) throws SAXException {
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
String fullName = "";
String familyName = "";
String givenName ="";
if (bFamilyName) {
familyName = new String(ch, start, length);
fullName += familyName;
bFamilyName = false;
}
if (bGivenName) {
givenName = new String(ch, start, length);
fullName += " " + givenName;
bGivenName = false;
}
System.out.println("Full Name : " + fullName);
}
};
saxParser.parse(filePath, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
With this modification, it only gives me the ArticleTitle value and it doesn't return anything regarding the authors' full names.
I have another class for parsing the ArticleTitle node and they are both called in a Main class.
What did I do wrong? And how can I fix it?
The fullName variable is reset every time the characters method is called, because it is declared inside that method. I think you should move that variable into the handler: initialize it with an empty string when Author starts and print it when Author ends. The concatenation should work as you wrote it. I haven't tried this out, but something similar should work:
public class ReadAuthors {
public void parse(String filePath) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bName = false;
String fullName = "";
@Override
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("FamilyName")) {
bName = true;
}
if (qName.equalsIgnoreCase("GivenName")) {
bName = true;
}
if (qName.equalsIgnoreCase("Author")) {
fullName = "";
}
}
@Override
public void endElement(String uri, String localName,
String qName) throws SAXException {
if (qName.equalsIgnoreCase("Author")) {
System.out.println("Full Name : " + fullName);
}
}
@Override
public void characters(char ch[], int start, int length) throws SAXException {
String name = "";
if (bName) {
name = new String(ch, start, length);
fullName += name;
bName = false;
}
}
};
saxParser.parse(filePath, handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
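A variation on the same idea, offered only as a sketch with made-up names: text is buffered in a StringBuilder and handled in endElement, since a SAX parser may deliver one text node through several characters calls. It assumes the wanted order is the given names in document order followed by the family name, separated by spaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: builds one full name per <Author>.
final class AuthorNameReader {

    static List<String> authorNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();     // text of the current element
            private final List<String> givenNames = new ArrayList<>();  // given names of the current author
            private String familyName = "";

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0); // start collecting text for the new element
                if ("Author".equals(qName)) {
                    givenNames.clear();
                    familyName = "";
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if ("GivenName".equals(qName)) {
                    givenNames.add(value);
                } else if ("FamilyName".equals(qName)) {
                    familyName = value;
                } else if ("Author".equals(qName)) {
                    // Assumed order: given names first, then the family name.
                    names.add((String.join(" ", givenNames) + " " + familyName).trim());
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }
}
```

Elements such as nickname and salary are simply ignored because endElement has no branch for them.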
I need to parse a document using a SAX parser in Java. I was able to print all the node values by using the DefaultHandler class in the usual way, implementing the startElement, endElement and characters methods. How can I access the value of a previous node when I am at a later node?
My Sample XML is:
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff>
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
Based on the salary node value, I also want to access the first name. I am confused. How can I do it? My sample code:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bfname = false;
boolean blname = false;
boolean bnname = false;
boolean bsalary = false;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("FIRSTNAME")) {
bfname = true;
}
if (qName.equalsIgnoreCase("LASTNAME")) {
blname = true;
}
if (qName.equalsIgnoreCase("NICKNAME")) {
bnname = true;
}
if (qName.equalsIgnoreCase("SALARY")) {
bsalary = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bfname) {
System.out.println("First Name : " + new String(ch, start, length));
bfname = false;
}
if (blname) {
System.out.println("Last Name : " + new String(ch, start, length));
blname = false;
}
if (bnname) {
System.out.println("Nick Name : " + new String(ch, start, length));
bnname = false;
}
if (bsalary) {
//System.out.println("Salary : " + new String(ch, start, length));
String nodeValue=new String(ch, start, length);
if(nodeValue.compareTo("100000")==0)
{
// ???? I need to store the respective first name in an ArrayList here
}
bsalary = false;
}
}
};
You can't navigate back and forth when using SAX. You should try using DOM. If you have to use SAX then you can use Stack to hold the previous data and pop them as required.
You can use a String field in the handler to store the name:
public void characters(char ch[], int start, int length) throws SAXException {
... Code Here ...
if (bfname) {
employeeName = new String(ch, start, length);
bfname = false;
}
... Code Here ...
}
and use this variable later, when the salary arrives:
public void characters(char ch[], int start, int length) throws SAXException {
... Code Here ...
if (bsalary) {
String nodeValue=new String(ch, start, length);
if(nodeValue.compareTo("100000")==0)
{
//Use employeeName Here...
}
bsalary = false;
}
... Code Here ...
}
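Put together, the approach might look like the sketch below. The class and method names are made up, the matching first names are collected in an ArrayList, and the document is assumed to have a single root element around the staff elements, because XML does not allow two top-level elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: first names of all staff whose salary equals wantedSalary.
final class SalaryFilter {

    static List<String> firstNamesWithSalary(String xml, String wantedSalary) {
        List<String> matches = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder(); // text of the current element
            private String firstName;                               // remembered until </staff>

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0);
                if ("staff".equals(qName)) {
                    firstName = null; // forget the previous staff member
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if ("firstname".equals(qName)) {
                    firstName = value; // keep it for when the salary arrives
                } else if ("salary".equals(qName)
                        && wantedSalary.equals(value) && firstName != null) {
                    matches.add(firstName);
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return matches;
    }
}
```

This relies on firstname appearing before salary inside each staff element, as it does in the sample.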
I am parsing an XML document. I have done this thousands of times before, but I can't see why I am getting the following issue:
Here is the relevant part of the XML document that I am parsing:
XML: <?xml version="1.0" standalone="yes"?>
<ratings>
<url_template>http://api.netflix.com/users/T1BlCJtdcWMuF6gJEfue96_W.kZ_gW81h59KqLEfT1AzE-/ratings/title?{-join|&|title_refs}</url_template>
<ratings_item>
<user_rating value="not_interested"></user_rating>
<predicted_rating>4.8</predicted_rating>
<id>http://api.netflix.com/users/T1BlCJtdcWMuF6gJEfue96_W.kZ_gW81h59KqLEfT1AzE-/ratings/title/70112530</id>
<link href="http://api.netflix.com/catalog/titles/series/70112530/seasons/70112530" rel="http://schemas.netflix.com/catalog/title" title="Castle: Season 1">
</link>
.
.
.
So, I am trying to parse out the user_rating, the predicted_rating, and the id. I am doing this successfully. However, I am noticing that when user_rating contains no value, then the predicted_rating will automatically take the value of , rather than its own value of 4.8. When user_rating does have a value, however, then the predicted_rating will have the correct value. Here is my parsing code:
public class RatingsHandler extends DefaultHandler {
Vector vector;
Ratings ratings;
boolean inUserRating;
boolean inPredictedRating;
boolean inAverageRating;
boolean inID;
public void startDocument() throws SAXException {
vector = new Vector();
ratings = new Ratings();
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (localName.equals("user_rating")) {
inUserRating = true;
} else if (localName.equals("predicted_rating")) {
inPredictedRating = true;
} else if (localName.equals("average_rating")) {
inAverageRating = true;
} else if (localName.equals("id")) {
inID = true;
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (inUserRating) {
ratings.setUserRating(new String(ch, start, length));
inUserRating = false;
} else if (inPredictedRating) {
ratings.setPredRating(new String(ch, start, length));
inPredictedRating = false;
} else if (inAverageRating) {
ratings.setAvgRating(new String(ch, start, length));
inAverageRating = false;
} else if (inID) {
Const.rating_id = new String(ch, start, length);
inID = false;
}
}
public void endDocument() throws SAXException {
if (ratings != null) {
vector.addElement(ratings);
}
}
public Vector getRatings() {
return vector;
}
}
Does it have something to do with the fact that user_rating has an attribute "value"? I would appreciate any help. Thanks!
I would suggest waiting for
endElement(String uri, String localName, String qName)
before you mark the element as finished with:
inSomething = false
I can imagine that when the element is empty,
public void characters(char[] ch, int start, int length)
won't be called, so your flag won't be cleared and you will end up in an inconsistent state with two inSomething flags set to true.
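A sketch of that suggestion, with made-up class and method names and a plain Map standing in for the Ratings object: the "current element" marker is set in startElement and cleared only in endElement, so an empty element cannot leave it set. It assumes a parser that is not namespace aware, which is why it compares qName rather than localName.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: maps each wanted element name to its text ("" if the element is empty).
final class RatingReader {

    private static final List<String> WANTED =
            Arrays.asList("user_rating", "predicted_rating", "average_rating", "id");

    static Map<String, String> read(String xml) {
        Map<String, String> values = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String current;                                 // wanted element being read, or null
            private final StringBuilder text = new StringBuilder(); // its text so far

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if (WANTED.contains(qName)) {
                    current = qName;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    text.append(ch, start, length); // not called at all for an empty element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(current)) {
                    values.put(current, text.toString());
                    current = null; // cleared here, whether or not characters() ran
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return values;
    }
}
```

With an empty user_rating element, the map should hold an empty string for user_rating and the real text for predicted_rating.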