I am developing an XML editor using JSP and servlets, with a DOM parser.
I have one problem in the XML editor:
how can I edit the following XML file without losing elements?
For example:
<book id="b1">
<bookbegin id="bb1">
<para id="p1">This is<b>first</b>line</para>
<para id="p2">This is<b>second</b>line</para>
<para id="p3">This is<b>third</b>line</para>
</bookbegin>
</book>
I am trying to edit the above XML file (which uses a DTD) from JSP and servlets, but when I read the text value from the XML, it returns only first, second, third. How do I read 'This is' and 'line'? And how do I then store the changes back to the XML file using XPath?
Thanks in advance.
The <b> tag inside the <para> tag is another element, not a formatting tag (in XML). Therefore, you need to traverse down to it.
As JRL says, the <b> tags are regular XML elements and, as a consequence, your DOM parser splits the content of each <para> into separate nodes.
I think you fail to read the other text nodes because you only read text from nodes that have no child elements, which is not the case here.
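To illustrate the point about traversing the children, here is a minimal sketch using the standard JAXP DOM API. The class and method names (MixedContentDemo, describeChildren) are made up for this example, and it parses a single <para> from a string rather than the asker's file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not taken from the asker's project.
public class MixedContentDemo {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Lists every direct child of the element: text nodes and child elements alike.
    static String describeChildren(Element para) {
        StringBuilder out = new StringBuilder();
        NodeList children = para.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append("text[").append(child.getNodeValue()).append("]");
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.append("element<").append(child.getNodeName()).append(">[")
                   .append(child.getTextContent()).append("]");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<para id=\"p1\">This is<b>first</b>line</para>");
        Element para = doc.getDocumentElement();

        // prints: text[This is]element<b>[first]text[line]
        System.out.println(describeChildren(para));

        // Edit only the leading text node; the <b> element is left untouched.
        para.getFirstChild().setNodeValue("That was");

        // prints: That wasfirstline
        System.out.println(para.getTextContent());
    }
}
```

Because the edit is applied to the individual text node, the <b> element and the trailing text survive. Writing the document back to a file is a separate step (for example with a javax.xml.transform.Transformer) and is not shown here.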
Related
How do I use the Java DOM XML parser with an XML structure like this? Can anybody give me an example? I have to do it in 3 hours.
The document looks something like this:
<mainNode>
<header>
<a></a>
<b></b>
</header>
<part1>
<inside>
<1>text1</1>
<2>text2</2>
</inside>
</part1>
<part2>
<inside>
<1>text1</1>
<2>tex2</2>
</inside>
</part2>
</mainNode>
I need to get the 1 and 2 text values from a big file.
I am working in Java. I am parsing an XML file and getting the tag values, and that works. I have an XML file as follows:
<DOC>
<STUDENT>
<ID>1</ID>
<NAME>DAN</NAME>
<ADDRESS>U.K</ADDRESS>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>JACK</NAME>
<ADDRESS>U.S</ADDRESS>
</STUDENT>
</DOC>
My question is: I want to fetch the data inside <DOC>...</DOC> together with the tag names as well as the values. That is, I want the data as follows:
"<STUDENT>
<ID>1</ID>
<NAME>DAN</NAME>
<ADDRESS>U.K</ADDRESS>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>JACK</NAME>
<ADDRESS>U.S</ADDRESS>
</STUDENT>"
Please guide me on how to do it.
The most common approaches in Java are to use either a SAX or a DOM parsing library.
If you look them up, you should find plenty of documentation and tutorials about them.
DOM is normally the easiest to use, as it stores the entire XML document in memory and you can then access any tag. However, this uses more memory and can be problematic with very large XML files. SAX requires more work, but it reads the XML as a stream and processes each tag as it gets to it.
Both are able to do what you need, though.
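As a rough sketch of the DOM route, the child elements of the root can be serialized back to markup with the standard javax.xml.transform API. The class and method names (InnerXmlDemo, innerXml) are invented for illustration, and the sketch only emits element children, so whitespace between them is dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are assumptions made for this example.
public class InnerXmlDemo {

    // Returns the markup of all element children of the root, tags included.
    static String innerXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // An identity transformer copies DOM nodes to text unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Without this, an <?xml ...?> declaration would precede each fragment.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                transformer.transform(new DOMSource(child), new StreamResult(out));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DOC><STUDENT><ID>1</ID><NAME>DAN</NAME></STUDENT>"
                   + "<STUDENT><ID>2</ID><NAME>JACK</NAME></STUDENT></DOC>";
        // Prints the <STUDENT> elements without the surrounding <DOC> tags.
        System.out.println(innerXml(xml));
    }
}
```

A SAX handler could produce the same result by echoing start tags, characters, and end tags as they arrive, at the cost of more bookkeeping.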
Take a look at SAX Parser.
This link might be helpful too: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
Dot in XML tag
I have a problem with tags in an XML file.
I have a lot of tags with dots, for example <tag.state> example text </tag.state>
JavaScript (ExtJS) does not parse tags with dots successfully.
The XML file was generated automatically, and I cannot influence the generated tags, so is it possible to avoid this issue?
I cannot read these tags in ExtJS.
I tried with single quotes ' and with double quotes ", but that fails as well:
fields: [ 'tag.state']
or
fields: [ "tag.state"]
I had a similar problem in Java, where my XML file could only have <string-array>, <item>, <name>, etc. I just wrote a Java program that ran through my XML (copied into a .txt file) and rewrote everything with the correct tags. I can share it with you if you would like.
I have an XML file with HTML tags like:
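For reference, a rewriting pass of that kind could look roughly like the sketch below. This is not the program mentioned above; the class name, the regular expression, and the choice of an underscore as the replacement character are all assumptions. It works on plain text, so it would also touch anything that looks like a tag inside comments or CDATA sections.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a text-level sketch, not an XML-aware rewriter.
public class TagRenameDemo {

    // Matches "<" or "</" followed by a tag name, and captures both parts.
    private static final Pattern TAG_NAME =
            Pattern.compile("(</?)([A-Za-z_][\\w.-]*)");

    // Replaces every dot in a tag name with an underscore; text content is kept.
    static String replaceDotsInTagNames(String xml) {
        Matcher m = TAG_NAME.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String renamed = m.group(1) + m.group(2).replace('.', '_');
            m.appendReplacement(out, Matcher.quoteReplacement(renamed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <tag_state> example text </tag_state>
        System.out.println(replaceDotsInTagNames("<tag.state> example text </tag.state>"));
    }
}
```

The field list on the ExtJS side would then refer to tag_state instead of tag.state.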
<?xml version="1.0" encoding="utf-8" ?>
<blog>
<blogid>49</blogid>
<title>[FIXED] Job requests page broken</title>
<fulltext>
<img title="page broken" src="images/west/blog/site-broken.jpg" alt="page broken" />
<p><span style="background-color: #ccffcc;">Update 28/05/2011</span>: Job requests page seems to be working OK now. If you find any issues please use the contact page to notify us. Thank you for your patience!</p>
<p>&#160;</p>
<p>Well, what can I say? Why does it always have to be that way? You are trying to create something new and something else gets broken on the way...</p>
</fulltext>
Now I want the whole HTML part between the <fulltext> tags as it is.
What I get right now is blank, as I think the DOM parser is parsing the HTML tags as well.
I tried XPath, but it is not working on Android.
I don't think you can get this XML into a DOM as-is if it is not well-formed. (EDIT: or is it well-formed?)
You would need to either a) escape the characters, making the XML well-formed and parseable (but probably not into the DOM you want; I guess you want to display the HTML in a different system), b) parse it using a stream processor, or c) fix it using string manipulation (wrap the content in <![CDATA[ ... ]]>) and then parse it into a DOM.
HTH
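Option (c) could be sketched as follows. The class and method names are invented for this example, and it assumes that <fulltext> occurs once, carries no attributes, and that its content never contains the sequence ]]>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class illustrating the CDATA workaround.
public class CdataWrapDemo {

    // Wraps the content of <fulltext> in a CDATA section by plain string replacement.
    static String wrapFulltextInCdata(String xml) {
        return xml.replace("<fulltext>", "<fulltext><![CDATA[")
                  .replace("</fulltext>", "]]></fulltext>");
    }

    // Parses the patched XML and returns the raw markup that was inside <fulltext>.
    static String fulltext(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapFulltextInCdata(xml))));
        // The CDATA content is exposed as ordinary text, tags and all.
        return doc.getElementsByTagName("fulltext").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<blog><blogid>49</blogid>"
                   + "<fulltext><p>Hello <b>world</b></p></fulltext></blog>";
        // prints: <p>Hello <b>world</b></p>
        System.out.println(fulltext(xml));
    }
}
```

Since the parser never interprets the markup inside the CDATA section, this also works when the embedded HTML is not well-formed.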
The HTML inside <fulltext> is written as well-formed XML (XHTML-style, with closed and self-closing tags). Therefore, there is no reason for the DOM parser not to treat those inner tags as XML elements.
Maybe what you're looking for is a way to flatten what's inside <fulltext>?
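Use a library like Jsoup for this purpose.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
public class JsoupXmlExample {
    public static void main(String[] args) {
        // Quotes inside a Java string literal must be escaped.
        String html = "<?xml version=\"1.0\"?><foo>" +
                "<bar>Some text — invalid!</bar></foo>";
        // Use jsoup's XML parser rather than its default HTML parser.
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        for (Element e : doc.select("bar")) {
            System.out.println(e);
        }
    }
}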
I'm using SAX to read/parse XML documents, and it works fine except for one particular site, where Eclipse tells me "junk after document element" and I get no data returned:
http://www.zachblume.com/apis/rhyme.php?format=xml&word=example
The site is not mine; I'm just trying to get some data from it.
Yes, that's not an XML document. It's trying to include more than one root element:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
The parser regards everything after <word>ampal</word> as junk, because by that point it has read a complete document; hence the complaint about "junk after document element".
An XML document can only have one root element, but that root can have several children. For example:
<?xml version="1.0"?>
<words>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
</words>
The page does not contain XML. It contains an XML snippet at best:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
This is incorrect since there is no document element. SAX interprets the first <word> as the document element, and correctly reports "junk after document element" since for all it knows, the document element ends on line 1.
To get around the error, do not treat this document as XML. Download it as text, remove the XML declaration (<?xml version="1.0"?>) and then wrap it in a fake document element before you try to process it.
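The workaround described above might look like this in Java. It is only a sketch: the class and method names are made up, the wrapper element name "words" is arbitrary, and it uses a DOM parser for brevity (the wrapped string could be fed to a SAX parser in the same way). Downloading the page is left out; the raw text is passed in as a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for the "fake document element" workaround.
public class WrapRootDemo {

    // Strips the XML declaration, wraps the rest in <words>, and returns the <word> values.
    static List<String> words(String raw) throws Exception {
        // Remove a leading <?xml ... ?> declaration, if there is one.
        String body = raw.replaceFirst("^\\s*<\\?xml[^>]*\\?>", "");
        // Wrap the fragment so that it has exactly one root element.
        String wrapped = "<words>" + body + "</words>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));

        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("word");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String raw = "<?xml version=\"1.0\"?>\n<word>ampal</word>\n<word>ample</word>\n";
        // prints: [ampal, ample]
        System.out.println(words(raw));
    }
}
```

The declaration has to be removed before wrapping because an XML declaration is only allowed at the very start of a document.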