I am new to Java and the XML DOM parser. I have a requirement to read XML data and store it in the form of columns and rows.
Example: sample.xml
<staff>
<firstname>Swetha</firstname>
<lastname>EUnis</lastname>
<nickname>Swetha</nickname>
<salary>10000</salary>
</staff>
<staff>
<firstname>John</firstname>
<lastname>MAdiv</lastname>
<nickname>Jo</nickname>
<salary>200000</salary>
</staff>
I need to read this XML file and store it in the following format:
firstName,lastName,nickName,Salary
swetha,Eunis,swetha,10000
john,MAdiv,Jo,200000
Java code:
NodeList nl = doc.getElementsByTagName("*");
for (int i = 0; i < nl.getLength(); i++) {
    Element section = (Element) nl.item(i);
    Node title = section.getFirstChild();
    while (title != null && title.getNodeType() != Node.ELEMENT_NODE) {
        title = title.getNextSibling();
        if (title != null) {
            String first = title.getFirstChild().getNodeValue().trim();
            if (first != null) {
                title = title.getNextSibling();
            }
            System.out.print(first + ",");
        }
    }
    System.out.println("");
} // for
I wrote the code above, but I can't find a way to get the data into the column-and-row format shown above. Can anyone please help me solve this? I have been working on it for many days.
Since this looks like homework, I'm going to give you some hints:
The chances are that your lecturer has given you some lecture notes and/or examples on processing an XML DOM. Read them all again.
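The getElementsByTagName method takes an element name as a parameter. The only special value it accepts is "*", which matches every element, so your call returns all elements in document order (the staff elements and their field elements alike) flattened into a single list. Your loop then has no way to tell rows from columns.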
Your code needs to mirror the structure of the XML. The XML structure in this case consists of N staff elements, each of which contains elements named firstname, lastname, nickname and salary.
It is also possible that your lecturer expects you to use something like XSLT or an XML binding mechanism to simplify this. (Or maybe this was intended to be XMI rather than XML, in which case there are other ways to handle it.)
I used "*" as the getElementsByTagName parameter so that I could read the data dynamically.
That doesn't give you what you need. "*" is a special value meaning "every element", not a pattern; getElementsByTagName does NOT accept patterns of any kind. With "*" you get every element in one flat list, with no record of which fields belong to which staff element.
If you want to make your code generic, you can't use getElementsByTagName. You will need to walk the tree from the top, starting with the DOM's root node.
Can you please provide me with sample code?
No. Your lecturer would not approve of me giving you code to copy from. However, I will point out that there are lots of XML DOM tutorials on the web which should help you figure out what you need to do. The best thing is for you to do the work yourself. You will learn more that way ... and that is the whole point of your homework!
1. The DOM parser parses the entire XML file to build the DOM object in memory.
2. You always need to be aware of the type of output and the structure of the XML returned when a request is sent to a web service.
3. It is not the structure of the XML reply returned by the web service that is dynamic; it is the values and attributes of the child elements that can be dynamic.
4. You will need to handle this dynamic behavior, for example with try/catch blocks.
For further details on the DOM parser, see this site:
http://tutorials.jenkov.com/java-xml/dom.html
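For readers who want to see the tree walk described in the first answer (start at the root, treat each child of the root as a row and each of that row's child elements as a column), here is a minimal sketch using the JDK's org.w3c.dom API. It assumes the staff elements are wrapped in a single root element, since the sample as posted has two top-level elements and is not well-formed; the root name "company" and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class StaffToCsv {

    // Parses an XML string, rethrowing any parser failure as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Returns only the element children of a node, skipping whitespace text nodes.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Each child of the root is a row; each child element of a row is a column.
    // The header comes from the tag names of the first row.
    static List<String> toCsvLines(Document doc) {
        List<String> lines = new ArrayList<>();
        List<Element> rows = childElements(doc.getDocumentElement());
        if (rows.isEmpty()) {
            return lines;
        }
        List<String> header = new ArrayList<>();
        for (Element column : childElements(rows.get(0))) {
            header.add(column.getTagName());
        }
        lines.add(String.join(",", header));
        for (Element row : rows) {
            List<String> values = new ArrayList<>();
            for (Element column : childElements(row)) {
                values.add(column.getTextContent().trim());
            }
            lines.add(String.join(",", values));
        }
        return lines;
    }

    public static void main(String[] args) {
        // "company" is a hypothetical root element added to make the sample well-formed.
        String xml = "<company>\n"
            + "<staff><firstname>Swetha</firstname><lastname>EUnis</lastname>"
            + "<nickname>Swetha</nickname><salary>10000</salary></staff>\n"
            + "<staff><firstname>John</firstname><lastname>MAdiv</lastname>"
            + "<nickname>Jo</nickname><salary>200000</salary></staff>\n"
            + "</company>";
        for (String line : toCsvLines(parse(xml))) {
            System.out.println(line);
        }
    }
}
```

Nothing in the walk mentions firstname, salary and so on, so it handles any two-level structure. It assumes every row has the same fields in the same order, and it does not escape commas or quotes inside values, which real CSV output would need.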
Related
I'm relatively new to XML parsers and am trying to understand some Java code that uses the DOM API to parse an XML document.
I need to know what '#text' means in the following code, and what this line of code does:
if(!ChildNode.getNodeName().equals("#text"))
{
//do something
}
According to the JavaDoc, #text is the value of the nodeName attribute for nodes implementing the Text interface.
That is, if a node in the document is a text node (as opposed to, for example, an element), its nodeName will be #text.
The code in question appears to be checking whether the node referenced by ChildNode is a text node before performing some action. Presumably, the action is something that can't be performed upon a text node, like querying or adding to its children.
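To make this concrete, the following sketch (class and method names are hypothetical) parses a small indented document and lists the node names of the root's children. The whitespace between elements shows up as #text nodes. It also shows the check from the question written with getNodeType() and Node.TEXT_NODE, which is the more common way to express the same test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TextNodeDemo {

    // Parses an XML string, rethrowing any parser failure as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Lists getNodeName() for every direct child, including text nodes ("#text").
    static List<String> childNodeNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // Equivalent to !getNodeName().equals("#text"), but tests the node type instead.
    static List<String> nonTextNodeNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.TEXT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <a>1</a>\n  <b>2</b>\n</root>");
        System.out.println(childNodeNames(doc.getDocumentElement()));   // [#text, a, #text, b, #text]
        System.out.println(nonTextNodeNames(doc.getDocumentElement())); // [a, b]
    }
}
```

Note that comments ("#comment") and CDATA sections ("#cdata-section") still pass this filter, just as they pass the original name comparison. If you only want elements, test for Node.ELEMENT_NODE instead.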
I have some already-generated XML files, and the application that is causing problems now needs to add elements to them. These elements must be at specific positions to be valid with respect to the applied schemas.
Now there are two problems. The first is that I have to hard-code the positions, which is not nice but acceptable.
The much bigger one is JDOM. I printed the content list and it looks like:
element1
text
element2
element4
text
element5
The text nodes are just whitespace, and every element I add makes it less predictable how many text nodes there are (sometimes new ones are added, sometimes not). They are counted just as if they were elements, but I want to ignore them: when I add element3 at index 2, it is not placed between element2 and element4; it lands after this annoying text node instead.
Any suggestions? The best solution, in my opinion, would be something that automatically puts the element where it belongs according to the schema, but I assume that's not possible?
Thanks for advice :)
The JDOM model of the XML is very literal; it has to be. On the other hand, JDOM offers ways to filter and process the XML that should make your task easier.
In your case, you want to add Element content to the document, and all the text content is whitespace, so just ignore the text content and work with the Element content only.
getChildren() returns only the Element children, and changes to that list are applied to the element's content. For example, to insert a new element nemt at index 3 of that list (before the fourth child element, since indexes are zero-based), you can:
rootemt.getChildren().add(3, new Element("nemt"));
That sorts out the elements. What about the text?
A really simple solution is to just pretty-print the output:
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(System.out, mydoc);
That way all the whitespace will be reformatted to make the XML 'pretty'.
EDIT: and no, there is no way in JDOM to automatically insert the element in the right place according to the schema.
Rolf
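For comparison, the same "count only elements" idea can be written against the JDK's standard org.w3c.dom API if you are not using JDOM. This is a minimal sketch with a hypothetical helper, insertAtElementIndex, that walks the child nodes, skips anything that is not an element, and inserts before the element at the requested zero-based index. Using the element list from the question, element3 goes in at index 2.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class InsertByElementIndex {

    // Parses an XML string, rethrowing any parser failure as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Inserts newChild before the element child at elementIndex (0-based), counting
    // only element children; appends when elementIndex equals the element count.
    static void insertAtElementIndex(Element parent, int elementIndex, Element newChild) {
        NodeList children = parent.getChildNodes();
        int seen = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                if (seen == elementIndex) {
                    parent.insertBefore(newChild, child);
                    return; // stop before the live NodeList shifts under us
                }
                seen++;
            }
        }
        if (seen == elementIndex) {
            parent.appendChild(newChild);
            return;
        }
        throw new IndexOutOfBoundsException("element index " + elementIndex + " > " + seen);
    }

    // Names of the element children only, ignoring whitespace text nodes.
    static List<String> elementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                names.add(children.item(i).getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n <element1/>\n <element2/>\n <element4/>\n <element5/>\n</root>");
        Element root = doc.getDocumentElement();
        // element1 = 0, element2 = 1, element4 = 2, so element3 goes in at index 2.
        insertAtElementIndex(root, 2, doc.createElement("element3"));
        System.out.println(elementNames(root)); // [element1, element2, element3, element4, element5]
    }
}
```

As with JDOM, the whitespace around the new element will not line up with the original indentation; re-indenting on output (for example with a Transformer and OutputKeys.INDENT) plays the same role as the pretty-print suggestion above.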
I have the following XML structure, with the child nodes title, desc, symp, diag, treat, addinfo, etc. I want to check whether addinfo contains anything, and also append a string to it such as "This is additional info". There are many diseases, and I need to check the addinfo tag of each disease according to its title tag.
<chapter>
<disease type="Name">
<title>Name</title>
<desc>--------------</desc>
<symp>--------------</symp>
<diag>-------------</diag>
<treat>-------------</treat>
<addinfo></addinfo>
</disease>
</chapter>
I am using an XPath query to search the content of the tags according to the disease name.
Thanks
XPath can only retrieve information from your document, it cannot modify it.
XQuery can produce a modified copy of your document, but it's not particularly easy, because you need to explicitly copy all the parts that you don't want to change.
XSLT is a better bet for producing a modified copy of the document; or XQuery Update Facility if you prefer.
You can use an XPath like this to see if addinfo is empty or not: //addinfo[not(node())]
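As a sketch of how that expression can be evaluated from Java, the following uses the JDK's javax.xml.xpath API (class and method names are hypothetical). countEmptyAddInfo evaluates the answer's expression as is; isAddInfoEmpty narrows it to one disease by its title, which is what the question asks for. This only reads the document; changing it still needs DOM calls or XSLT.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class EmptyAddInfoCheck {

    // Parses an XML string, rethrowing any parser failure as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Number of addinfo elements with no child nodes at all.
    static int countEmptyAddInfo(Document doc) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Double count = (Double) xpath.evaluate("count(//addinfo[not(node())])", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if a disease with this title exists and its addinfo has no child nodes.
    // The title is pasted into the expression, so it must not contain a double quote.
    static boolean isAddInfoEmpty(Document doc, String title) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "boolean(/chapter/disease[title=\"" + title + "\"]/addinfo[not(node())])";
        try {
            return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<chapter>"
            + "<disease type=\"A\"><title>A</title><addinfo></addinfo></disease>"
            + "<disease type=\"B\"><title>B</title><addinfo>known</addinfo></disease>"
            + "</chapter>");
        System.out.println(countEmptyAddInfo(doc));
        System.out.println(isAddInfoEmpty(doc, "A"));
    }
}
```

not(node()) treats <addinfo> </addinfo> (a single space) as non-empty, because the space is a text node. If whitespace-only content should count as empty, a predicate such as [normalize-space(.)=''] is closer to what you want.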
UPDATE:
OK, now I am totally lost as to what you want. It sounds like a really straightforward problem. You would write some sort of getText() method that, given an XML node, returns the text associated with it (or null or an empty string if there is none). Then you would traverse all the diseases and, for each disease, perform your logic: String info = getText(disease, "addinfo"); if (info == null || info.isEmpty()) { /* do something */ }. That works whether you want to check whether addinfo is empty, add content to it, or both.
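A minimal sketch of that traversal with the JDK's DOM API, assuming the chapter/disease/addinfo structure from the question. getText and firstChildElement are hypothetical helpers in the spirit of the answer, and fillEmptyAddInfo appends the given string to every addinfo that is empty or whitespace-only. It modifies the in-memory document; writing it back to a file is a separate step (for example with a Transformer).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class AppendAddInfo {

    // Parses an XML string, rethrowing any parser failure as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // First direct child element with the given tag name, or null if there is none.
    static Element firstChildElement(Element parent, String tagName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tagName.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Text of the named child element, or null if the element is missing.
    static String getText(Element parent, String tagName) {
        Element child = firstChildElement(parent, tagName);
        return child == null ? null : child.getTextContent();
    }

    // Appends note to every addinfo that exists and is empty or whitespace-only.
    // Returns how many addinfo elements were filled.
    static int fillEmptyAddInfo(Document doc, String note) {
        int filled = 0;
        NodeList diseases = doc.getElementsByTagName("disease");
        for (int i = 0; i < diseases.getLength(); i++) {
            Element disease = (Element) diseases.item(i);
            String info = getText(disease, "addinfo");
            if (info != null && info.trim().isEmpty()) {
                firstChildElement(disease, "addinfo").appendChild(doc.createTextNode(note));
                filled++;
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        Document doc = parse("<chapter>"
            + "<disease><title>A</title><addinfo></addinfo></disease>"
            + "<disease><title>B</title><addinfo>known</addinfo></disease>"
            + "</chapter>");
        System.out.println(fillEmptyAddInfo(doc, "This is additional info"));
    }
}
```

To restrict the change to one disease, compare getText(disease, "title") with the wanted name inside the loop before touching addinfo.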
<item>
<RelatedPersons>
<RelatedPerson>
<Name>xy</Name>
<Title>asd</Title>
<Address>abc</Address>
</RelatedPerson>
<RelatedPerson>
<Name>xy</Name>
<Title>asd</Title>
<Address>abc</Address>
</RelatedPerson>
</RelatedPersons>
</item>
I'd like to parse this data with a SAX parser. How can I do this?
I know the SAX tutorials and can parse any normal RSS feed, but I can't parse this data.
Define your problem: what you can probably do is create a value object (POJO) called Person with the properties name, title and address. The aim of parsing this XML is then to build an ArrayList<Person>. Defining a concrete data structure helps you build logic around it.
Choose a parser: you can then use a SAX parser or an XML pull parser to walk through the tags; see this link for a tutorial on DOM, SAX and XML pull parsers in Android.
Data population logic: while parsing, whenever you encounter a <RelatedPerson> start tag, instantiate a new Person object. When you encounter one of its property tags, read the value and set it on this object. When you encounter the closing </RelatedPerson> tag, add this Person object to the ArrayList. Depending on the parser you use, you will need the appropriate methods to navigate to child and nested nodes (see the link for details).
By the time you are done parsing the last item node you will have all the values in your ArrayList.
Note that this is more of a theoretical answer; I hope it helps.
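Here is one way the steps above could look with the JDK's SAX API: a minimal sketch with a hypothetical Person class and handler. A new Person starts at each <RelatedPerson>, the Name, Title and Address values are collected from characters(), and the Person is added to the list at </RelatedPerson>. It uses qName because the default SAXParserFactory is not namespace-aware, so localName may be empty.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class RelatedPersonParser {

    // Value object suggested in the answer.
    static class Person {
        String name;
        String title;
        String address;
    }

    static class Handler extends DefaultHandler {
        final List<Person> persons = new ArrayList<>();
        private Person current;                              // null outside <RelatedPerson>
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0);
            if ("RelatedPerson".equals(qName)) {
                current = new Person();
            }
        }

        // characters() may be called several times per element, so accumulate.
        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current != null) {
                String value = text.toString().trim();
                switch (qName) {
                    case "Name": current.name = value; break;
                    case "Title": current.title = value; break;
                    case "Address": current.address = value; break;
                    case "RelatedPerson": persons.add(current); current = null; break;
                    default: break;
                }
            }
            text.setLength(0);
        }
    }

    // Parses the XML and returns one Person per <RelatedPerson> element.
    static List<Person> parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.persons;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<item><RelatedPersons>"
            + "<RelatedPerson><Name>xy</Name><Title>asd</Title><Address>abc</Address></RelatedPerson>"
            + "<RelatedPerson><Name>xy</Name><Title>asd</Title><Address>abc</Address></RelatedPerson>"
            + "</RelatedPersons></item>";
        for (Person p : parse(xml)) {
            System.out.println(p.name + " / " + p.title + " / " + p.address);
        }
    }
}
```

The same handler works on Android, which also ships javax.xml.parsers.SAXParserFactory; with an XML pull parser the structure is the same, only driven by next() events instead of callbacks.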
I'm writing a Java program that scrapes a web page for links and then stores them in a database. I'm having problems though. Using HTMLUnit, I wrote the following:
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]");
It returns the correct anchor elements, but I only want the actual path contained in the href attribute, not the entire element. How can I do this? And further, how can I get the text contained between the <a> and </a> tags?
I need this data, too.
Thanks in advance!
The first (getting the href)
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/@href");
The second (getting the text)
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/text()");
I assume that getByXPath is a utility function you wrote on top of XPath.evaluate? If so, to get the string value you could use either xpath.evaluate(expression, item) or xpath.evaluate(expression, item, XPathConstants.STRING).
Alternatively, you could call getNodeValue() on each attribute node returned by evaluating "//a[starts-with(@href, \"showdetails.aspx\")]/@href".
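The following sketch checks the two expressions with the JDK's javax.xml.xpath API on a small, well-formed XHTML string rather than with HtmlUnit, so it shows the XPath itself; with HtmlUnit the returned objects will be its own node classes. The page content and the class and method names are made up for illustration. evaluateStrings calls getNodeValue() on each matched node, and firstString uses xpath.evaluate(expression, item), which returns the string value of the first match in document order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class AnchorXPath {

    // Parses an XML string, rethrowing any parser failure as an unchecked exception.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // getNodeValue() of every matched node: the value for attributes, the text for text nodes.
    static List<String> evaluateStrings(Document doc, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // String value of the first match only (same as XPathConstants.STRING).
    static String firstString(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body>"
            + "<a href=\"showdetails.aspx?id=1\">First</a>"
            + "<a href=\"other.aspx\">Skip</a>"
            + "<a href=\"showdetails.aspx?id=2\">Second</a>"
            + "</body></html>");
        String anchors = "//a[starts-with(@href, \"showdetails.aspx\")]";
        System.out.println(evaluateStrings(doc, anchors + "/@href"));
        System.out.println(evaluateStrings(doc, anchors + "/text()"));
        System.out.println(firstString(doc, anchors + "/@href"));
    }
}
```

If the markup declares the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), //a will not match under a namespace-aware parser without a NamespaceContext, and real-world HTML is usually not well-formed XML at all, which is why a tool like HtmlUnit is used for scraping in the first place.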