Java - Print any detail of HTML element


I am fairly new to Java, at least when it comes to interacting with the web. I am making an app that has to grab the HTML of a web page and parse it.
By parsing I mean finding out what an element has in its class="" attribute, or in any other attribute of the element, and also finding out what is inside the element. This is what I have found so far: http://www.java2s.com/Code/Java/Development-Class/HTMLDocumentElementIteratorExample.htm
I found very little else on this.
I know there are lots of Java parsers out there. I have tried JTidy and the default Swing parser. I would prefer to use the parser that is built into Java.
Here is what I have so far. This is just a method for testing how it works; proper code will come once I know what to do and how. To clarify, connection is a URLConnection variable, and the connection has been established before this method gets called:
public void parse() {
try {
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
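// Note: printing the stream line by line here would read the BufferedReader
// to the end and leave nothing for the parser below, so the debug loop is disabled.
// String line;
// while ((line = br.readLine()) != null) {
// System.out.println(line);
// }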
// copied from http://www.java2s.com/Code/Java/Development-Class/HTMLDocumentElementIteratorExample.htm
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
HTMLEditorKit.Parser parser = new ParserDelegator();
HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
parser.parse(br, callback, true);
// Parse
ElementIterator iterator = new ElementIterator(htmlDoc);
Element element;
while ((element = iterator.next()) != null) {
AttributeSet attributes = element.getAttributes();
Object name = attributes.getAttribute(StyleConstants.NameAttribute);
System.out.println ("All attrs of " + name + ": " + attributes.getAttributeNames().toString());
Enumeration e = attributes.getAttributeNames();
Object obj;
while (e.hasMoreElements()) {
obj = e.nextElement();
System.out.println (obj.toString());
System.out.println ("attribute of class = " + attributes.containsAttribute("class", "login"));
}
if ((name instanceof HTML.Tag)
&& ((name == HTML.Tag.H1) || (name == HTML.Tag.H2) || (name == HTML.Tag.H3))) {
// Build up content text as it may be within multiple elements
StringBuffer text = new StringBuffer();
int count = element.getElementCount();
for (int i = 0; i < count; i++) {
Element child = element.getElement(i);
AttributeSet childAttributes = child.getAttributes();
if (childAttributes.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.CONTENT) {
int startOffset = child.getStartOffset();
int endOffset = child.getEndOffset();
int length = endOffset - startOffset;
text.append(htmlDoc.getText(startOffset, length));
}
}
System.out.println(name + ": " + text.toString());
}
}
} catch (IOException e) {
System.out.println ("Exception?1 " + e.getMessage() );
} catch (Exception e) {
System.out.println ("Exception? " + e.getMessage());
}
}
The question is: How do I get any element's attributes and print them out?

This code is needlessly verbose. I would suggest using a better library like Jsoup. Here's some code to find out all the attributes of all divs on this page.
String url = "http://stackoverflow.com/questions/7311269"
+ "/java-print-any-detail-of-html-element";
Document doc = Jsoup.connect(url).get();
Elements divs = doc.select("div");
int i = 0;
for (Element div : divs) {
System.out.format("Div #%d:\n", ++i);
for(Attribute attr : div.attributes()) {
System.out.format("%s = %s\n", attr.getKey(), attr.getValue());
}
}
Follow the Jsoup Cookbook for a gentle introduction to this powerful library.
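For readers who, like the asker, would rather stay with the parser that ships with the JDK, here is a minimal sketch of listing every attribute of every tag with the Swing HTML parser. The class name AttributeDump, the method listAttributes and the sample HTML are made up for illustration; it parses from a string rather than a URLConnection to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class AttributeDump {

    /** Parses the HTML and returns one "tag name=value" entry per attribute found. */
    static List<String> listAttributes(String html) throws IOException {
        final List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs); // tags without content, such as <br> or <img>
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                Enumeration<?> names = attrs.getAttributeNames();
                while (names.hasMoreElements()) {
                    Object name = names.nextElement();
                    // Skip the marker the parser puts on tags it inserted itself.
                    if (HTMLEditorKit.ParserCallback.IMPLIED.equals(name)) {
                        continue;
                    }
                    out.add(tag + " " + name + "=" + attrs.getAttribute(name));
                }
            }
        };
        // The last argument tells the parser to ignore any charset directive.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out;
    }

    public static void main(String[] args) throws IOException {
        String html = "<div class=\"login\" id=\"box\"><a href=\"/x\">go</a></div>";
        for (String line : listAttributes(html)) {
            System.out.println(line);
        }
    }
}
```

As far as I can tell, the attribute keys for known HTML attributes are HTML.Attribute constants rather than strings, so a single attribute is read with attrs.getAttribute(HTML.Attribute.CLASS); a lookup with the plain string "class", as in the question's containsAttribute("class", "login"), would not match.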

Related

Bad characters when replacing text in pdf using pdfbox

I'm trying to replace text in a PDF, and it only partly works. This is my code:
PDDocument doc = null;
int occurrences = 0;
try {
doc = PDDocument.load("test.pdf"); //Input PDF File Name
List pages = doc.getDocumentCatalog().getAllPages();
for (int i = 0; i < pages.size(); i++) {
PDPage page = (PDPage) pages.get(i);
PDStream contents = page.getContents();
PDFStreamParser parser = new PDFStreamParser(contents.getStream());
parser.parse();
List tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++) {
Object next = tokens.get(j);
if (next instanceof PDFOperator) {
PDFOperator op = (PDFOperator) next;
// Tj and TJ are the two operators that display strings in a PDF
if (op.getOperation().equals("Tj")) {
// Tj takes one operand and that is the string
// to display, so let's update that operand
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
if (string.contains("Good")) {
string = string.replace("Good", "Bad");
occurrences++;
}
//Word you want to change. Currently this code changes word "Good" to "Bad"
previous.reset();
previous.append(string.getBytes("ISO-8859-1"));
} else if (op.getOperation().equals("TJ")) {
COSArray previous = (COSArray) tokens.get(j - 1);
COSString temp = new COSString();
String tempString = "";
for (int t = 0; t < previous.size(); t++) {
if (previous.get(t) instanceof COSString) {
tempString += ((COSString) previous.get(t)).getString();
}
}
temp.append(tempString.getBytes("ISO-8859-1"));
tempString = "";
tempString = temp.getString();
if (tempString.contains("Good")) {
tempString = tempString.replace("Good", "Bad");
occurrences++;
}
previous.clear();
String[] stringArray = tempString.split(" ");
for (String string : stringArray) {
COSString cosString = new COSString();
string = string + " ";
cosString.append(string.getBytes("ISO-8859-1"));
previous.add(cosString);
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(doc);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
page.setContents(updatedStream);
}
System.out.println("number of matches found: " + occurrences);
doc.save("a.pdf"); //Output file name
} catch (IOException ex) {
Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
} catch (COSVisitorException ex) {
Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
} finally {
if (doc != null) {
try {
doc.close();
} catch (IOException ex) {
Logger.getLogger(ReplaceTextInPDF.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
The issue is that the replaced text shows up as bad or hidden characters (for example, the word "Bad" is displayed as only the character "d"), but if I copy it and paste it somewhere else, the expected word is pasted correctly.
Also, when I search the generated PDF for the new word, it is not found, but when I search for the old word, it is found in the replaced places.

I found Aspose; the link below shows how to use it to replace text in PDFs. It is easy and works perfectly, except that it is not free: the free version prints a copyright line at the top of each page of the PDF file.
http://www.aspose.com/docs/display/pdfjava/Replace+Text+in+Pages+of+a+PDF+Document

Get text from xml - Android

I have this XML online: http://64.182.231.116/~spencerf/test.xml
I am trying to get the two text values, Assorted Cereal and Yogurt Parfait (2). Here is how I am currently parsing it. I get the values I want, but also all the values under them (all the numbers and such). I just want the names, and I am struggling to get only those. Any help or guidance would be great. Here is my code:
String currentDay = "";
String currentMeal = "";
String counter = "";
String icon1 = "";
LinkedHashMap<String, List<String>> itemsByCounter = new LinkedHashMap<String , List<String>>();
List<String> items = new ArrayList<String>();
while (eventType != XmlResourceParser.END_DOCUMENT) {
String tagName = xmlData.getName();
switch (eventType) {
case XmlResourceParser.START_TAG:
if (tagName.equalsIgnoreCase("day")) {
currentDay = xmlData.getAttributeValue(null, "name");
}
if (tagName.equalsIgnoreCase("meal")) {
currentMeal = xmlData.getAttributeValue(null, "name");
}
if (tagName.equalsIgnoreCase("counter") && currentDay.equalsIgnoreCase(day) && currentMeal.equalsIgnoreCase(meal)) {
counter = xmlData.getAttributeValue(null, "name");
}
if (tagName.equalsIgnoreCase("name") && counter != null && currentDay.equalsIgnoreCase(day) && currentMeal.equalsIgnoreCase(meal)) {
icon1 = xmlData.getAttributeValue(null, "icon1");
Log.i(TAG, "icon1: " + icon1);
}
break;
case XmlResourceParser.TEXT:
if (currentDay.equalsIgnoreCase(day) && currentMeal.equalsIgnoreCase(meal) && counter !=(null)) {
if (xmlData.getText().trim().length() > 0) {
//Here gets everything but I just want 2 names
Log.i(TAG, "data: " + xmlData.getText());
items.add(xmlData.getText().trim().replaceAll(" +", " "));
}
}
break;
case XmlPullParser.END_TAG:
if (tagName.equalsIgnoreCase("counter")) {
if (items.size() > 0) {
itemsByCounter.put(counter, items);
items = new ArrayList<String>();
recordsFound++;
}
}
break;
}
eventType = xmlData.next();
}
So, as you can see from the comment in my code, I am getting back everything under the name tag, but I just want the value of the name and not all the other stuff.

You will need to store the name in its own child element (meaning put an end tag before the nutritional facts). Under each dish, you could have this:
<name>Assorted Cereal</name>
<nutrition_facts> ... </nutrition_facts>
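Once the name sits in its own element, reading only the names becomes straightforward. The sketch below uses the standard DOM API and assumes a restructured document along the lines suggested above; the menu and dish wrapper elements, the class name DishNames and the sample data are hypothetical, not taken from the original file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DishNames {

    /** Returns the trimmed text of every <name> element in the document. */
    static List<String> names(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("name");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // <name> now holds only text, so its text content is exactly the dish name.
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical layout: each dish has a <name> and a separate <nutrition_facts>.
        String xml = "<menu>"
                + "<dish><name>Assorted Cereal</name><nutrition_facts>120</nutrition_facts></dish>"
                + "<dish><name>Yogurt Parfait (2)</name><nutrition_facts>210</nutrition_facts></dish>"
                + "</menu>";
        System.out.println(names(xml));
    }
}
```

With the pull parser from the question, the same restructuring would let you collect TEXT events only while the current tag is name.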

Not tested but could do it along these lines:
List<Nutrition_Facts> nutrition_facts = new ArrayList<Nutrition_Facts>();
XMLDOMParser parser = new XMLDOMParser();
AssetManager manager = context.getAssets();
InputStream stream;
Document doc = null; // declared outside the try block so it is still in scope below
try {
stream = manager.open("test.xml"); //need full path to your file here - mine is stored in assets folder
doc = parser.getDocument(stream);
}catch(IOException ex){
System.out.printf("Error reading XML %s\n", ex.getMessage());
}
NodeList nodeList = doc.getElementsByTagName("nutrition_facts");
for (int i = 0; i < nodeList.getLength(); i++) {
Element e = (Element) nodeList.item(i);
String name;
if (elementName.equals(e.getAttribute("Assorted Cereal"))){
name = e.getAttribute("name");
//do some stuff
}
}
//XMLDOMParser Class
public class XMLDOMParser {
//Returns the entire XML document
public Document getDocument(InputStream inputStream) {
Document document = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder db = factory.newDocumentBuilder();
InputSource inputSource = new InputSource(inputStream);
document = db.parse(inputSource);
} catch (ParserConfigurationException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (SAXException e) {
Log.e("Error: ", e.getMessage());
return null;
} catch (IOException e) {
Log.e("Error: ", e.getMessage());
return null;
}
return document;
}
/*
* I take a XML element and the tag name, look for the tag and get
* the text content i.e for <employee><name>Kumar</name></employee>
* XML snippet if the Element points to employee node and tagName
* is name I will return Kumar. Calls the private method
* getTextNodeValue(node) which returns the text value, say in our
* example Kumar. */
public String getValue(Element item, String name) {
NodeList nodes = item.getElementsByTagName(name);
return this.getTextNodeValue(nodes.item(0));
}
private final String getTextNodeValue(Node node) {
Node child;
if (node != null) {
if (node.hasChildNodes()) {
child = node.getFirstChild();
while(child != null) {
if (child.getNodeType() == Node.TEXT_NODE) {
return child.getNodeValue();
}
child = child.getNextSibling();
}
}
}
return "";
}
}

Java editor HTML kit class listener for check boxes in public page

I am using the Java HTMLEditorKit class to show a public HTML page that contains checkboxes. I want to attach a listener to the checkboxes that appear in the page shown in my panel. Does anyone have any idea what kind of thing I should look for? I would really appreciate any help!

Here is some really old code that might help get you started.
Note how this code is checking for a DefaultComboBoxModel? Well, I'm guessing that for a check box a ButtonModel will also be found in one of your elements. Once you have access to the model you should be able to add a listener to it.
No guarantee this will work. I suggest you create some simple test HTML to see what kind of output you get.
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
class GetHTML
{
public static void main(String[] args)
{
EditorKit kit = new HTMLEditorKit();
Document doc = kit.createDefaultDocument();
// The Document class does not yet handle charset's properly.
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
try
{
// Create a reader on the HTML content.
Reader rd = getReader(args[0]);
// Parse the HTML.
kit.read(rd, doc, 0);
System.out.println( doc.getText(0, doc.getLength()) );
System.out.println("----");
// Iterate through the elements of the HTML document.
ElementIterator it = new ElementIterator(doc);
Element elem = null;
while ( (elem = it.next()) != null )
{
AttributeSet as = elem.getAttributes();
System.out.println( "\n" + elem.getName() + " : " + as.getAttributeCount() );
/*
if ( elem.getName().equals( HTML.Tag.IMG.toString() ) )
{
Object o = elem.getAttributes().getAttribute( HTML.Attribute.SRC );
System.out.println( o );
}
*/
Enumeration enum1 = as.getAttributeNames();
while( enum1.hasMoreElements() )
{
Object name = enum1.nextElement();
Object value = as.getAttribute( name );
System.out.println( "\t" + name + " : " + value );
if (value instanceof DefaultComboBoxModel)
{
DefaultComboBoxModel model = (DefaultComboBoxModel)value;
for (int j = 0; j < model.getSize(); j++)
{
Object o = model.getElementAt(j);
Object selected = model.getSelectedItem();
if ( o.equals( selected ) )
System.out.println( o + " : selected" );
else
System.out.println( o );
}
}
}
//
if ( elem.getName().equals( HTML.Tag.SELECT.toString() ) )
{
Object o = as.getAttribute( HTML.Attribute.ID );
System.out.println( o );
}
// Weird, the text for each tag is stored in a 'content' element
if (elem.getElementCount() == 0)
{
int start = elem.getStartOffset();
int end = elem.getEndOffset();
System.out.println( "\t" + doc.getText(start, end - start) );
}
}
}
catch (Exception e)
{
e.printStackTrace();
}
System.exit(0); // exit status 0: the run completed normally
}
// Returns a reader on the HTML data. If 'uri' begins
// with "http:", it's treated as a URL; otherwise,
// it's assumed to be a local filename.
static Reader getReader(String uri)
throws IOException
{
// Retrieve from Internet.
if (uri.startsWith("http:"))
{
URLConnection conn = new URL(uri).openConnection();
return new InputStreamReader(conn.getInputStream());
}
// Retrieve from file.
else
{
return new FileReader(uri);
}
}
}
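Building on the guess above, here is a minimal sketch of how the checkbox models might be located and given a listener. It assumes that the Swing HTML document stores a ButtonModel for each checkbox input under StyleConstants.ModelAttribute, which is what I would expect from HTMLDocument but have not verified against every JDK version. The class name CheckboxModels and the sample HTML are made up for illustration.

```java
import java.awt.event.ItemEvent;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.ButtonModel;
import javax.swing.text.AttributeSet;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class CheckboxModels {

    /** Collects the model of every <input type="checkbox"> found in the HTML. */
    static List<ButtonModel> checkboxModels(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        List<ButtonModel> models = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        Element elem;
        while ((elem = it.next()) != null) {
            AttributeSet as = elem.getAttributes();
            Object type = as.getAttribute(HTML.Attribute.TYPE);
            // Assumption: the form control's model is stored under ModelAttribute.
            Object model = as.getAttribute(StyleConstants.ModelAttribute);
            if ("checkbox".equals(type) && model instanceof ButtonModel) {
                models.add((ButtonModel) model);
            }
        }
        return models;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><form>"
                + "<input type=\"checkbox\" name=\"news\"> News"
                + "</form></body></html>";
        for (ButtonModel model : checkboxModels(html)) {
            model.addItemListener(e -> System.out.println(
                    "selected: " + (e.getStateChange() == ItemEvent.SELECTED)));
            model.setSelected(true); // stands in for the user clicking the box
        }
    }
}
```

In a real application the document would be the one already shown in the JEditorPane (editorPane.getDocument()), and the models should be collected only after the page has finished loading.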

How to read bookmarks in PDF using itext at multi level?

I am using iText-Java to split PDFs at bookmark level.
Does anybody know or have any examples for splitting a PDF at bookmarks that exist at a level 2 or 3?
For example, I have bookmarks at the following levels:
Father
|-Son
|-Son
|-Daughter
|-|-Grand son
|-|-Grand daughter
Right now I have the code below, which reads only the base bookmark (Father). Basically, the SimpleBookmark.getBookmark(reader) line does all the work.
But I want to read the level 2 and level 3 bookmarks to split the content present between those inner level bookmarks.
public static void splitPDFByBookmarks(String pdf, String outputFolder){
try
{
PdfReader reader = new PdfReader(pdf);
//List of bookmarks: each bookmark is a map with values for title, page, etc
List<HashMap> bookmarks = SimpleBookmark.getBookmark(reader);
for(int i=0; i<bookmarks.size(); i++){
HashMap bm = bookmarks.get(i);
HashMap nextBM = i==bookmarks.size()-1 ? null : bookmarks.get(i+1);
//In my case I needed to split the title string
String title = ((String)bm.get("Title")).split(" ")[2];
log.debug("Titel: " + title);
String startPage = ((String)bm.get("Page")).split(" ")[0];
String startPageNextBM = nextBM==null ? "" + (reader.getNumberOfPages() + 1) : ((String)nextBM.get("Page")).split(" ")[0];
log.debug("Page: " + startPage);
log.debug("------------------");
extractBookmarkToPDF(reader, Integer.valueOf(startPage), Integer.valueOf(startPageNextBM), title + ".pdf",outputFolder);
}
}
catch (IOException e)
{
log.error(e.getMessage());
}
}
private static void extractBookmarkToPDF(PdfReader reader, int pageFrom, int pageTo, String outputName, String outputFolder){
Document document = new Document();
OutputStream os = null;
try{
os = new FileOutputStream(outputFolder + outputName);
// Create a writer for the outputstream
PdfWriter writer = PdfWriter.getInstance(document, os);
document.open();
PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
PdfImportedPage page;
while(pageFrom < pageTo) {
document.newPage();
page = writer.getImportedPage(reader, pageFrom);
cb.addTemplate(page, 0, 0);
pageFrom++;
}
os.flush();
document.close();
os.close();
}catch(Exception ex){
log.error(ex.getMessage());
}finally {
if (document.isOpen())
document.close();
try {
if (os != null)
os.close();
} catch (IOException ioe) {
log.error(ioe.getMessage());
}
}
}
Your help is much appreciated.
Thanks in advance! :)

You get an ArrayList<HashMap> when you call SimpleBookmark.getBookmark(reader) (do the cast if you need it). Try iterating through that ArrayList to see its structure. If a bookmark has children ("sons", as you call them), it will contain another list with the same structure.
A recursive method could be the solution.
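A minimal sketch of such a recursive method follows. To stay runnable without iText, it walks plain maps built by hand in the shape SimpleBookmark returns: each bookmark is a map with "Title" and "Page" entries, and nested bookmarks sit in a list under the "Kids" key. The class name BookmarkWalker, the bookmark helper and the sample page values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BookmarkWalker {

    /** Recursively flattens a bookmark tree into "depth:title@page" entries. */
    @SuppressWarnings("unchecked")
    static void walk(List<? extends Map<String, Object>> bookmarks, int depth, List<String> out) {
        if (bookmarks == null) {
            return; // a bookmark without children has no "Kids" entry
        }
        for (Map<String, Object> bm : bookmarks) {
            out.add(depth + ":" + bm.get("Title") + "@" + bm.get("Page"));
            walk((List<Map<String, Object>>) bm.get("Kids"), depth + 1, out);
        }
    }

    /** Hypothetical helper that builds one bookmark map for the demo data. */
    static Map<String, Object> bookmark(String title, String page, List<Map<String, Object>> kids) {
        Map<String, Object> m = new HashMap<>();
        m.put("Title", title);
        m.put("Page", page);
        if (kids != null) {
            m.put("Kids", kids);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> grandKids = Arrays.asList(
                bookmark("Grand son", "5 Fit", null),
                bookmark("Grand daughter", "6 Fit", null));
        List<Map<String, Object>> kids = Arrays.asList(
                bookmark("Son", "2 Fit", null),
                bookmark("Son", "3 Fit", null),
                bookmark("Daughter", "4 Fit", grandKids));
        List<Map<String, Object>> roots = Arrays.asList(bookmark("Father", "1 Fit", kids));

        List<String> flat = new ArrayList<>();
        walk(roots, 0, flat);
        for (String line : flat) {
            System.out.println(line);
        }
    }
}
```

With the real library, the list passed to walk would be the result of SimpleBookmark.getBookmark(reader), and the flattened entries (or only those at the depth you care about) can then drive the page-range splitting from the question.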

For reference, here is an approach for those who are doing this with iText 7:
public void walkOutlines(PdfOutline outline, Map<String, PdfObject> names, PdfDocument pdfDocument,List<String>titles,List<Integer>pageNum) { //----------loop traversing all paths
for (PdfOutline child : outline.getAllChildren()){
if(child.getDestination() != null) {
prepareIndexFile(child,names,pdfDocument,titles,pageNum);
}
}
}
//-----Getting pageNumbers from outlines
public void prepareIndexFile(PdfOutline outline, Map<String, PdfObject> names, PdfDocument pdfDocument,List<String>titles,List<Integer>pageNum) {
String title = outline.getTitle();
PdfDestination pdfDestination = outline.getDestination();
String pdfStr = ((PdfString)pdfDestination.getPdfObject()).toUnicodeString();
PdfArray array = (PdfArray) names.get(pdfStr);
PdfObject pdfObj = array != null ? array.get(0) : null;
Integer pageNumber = pdfDocument.getPageNumber((PdfDictionary)pdfObj);
titles.add(title);
pageNum.add(pageNumber);
if(outline.getAllChildren().size() > 0) {
for (PdfOutline child : outline.getAllChildren()){
prepareIndexFile(child,names,pdfDocument,titles,pageNum);
}
}
}
public boolean splitPdf(String inputFile, final String outputFolder) {
boolean splitSuccess = true;
PdfDocument pdfDoc = null;
try {
PdfReader pdfReaderNew = new PdfReader(inputFile);
pdfDoc = new PdfDocument(pdfReaderNew);
final List<String> titles = new ArrayList<String>();
List<Integer> pageNum = new ArrayList<Integer>();
PdfNameTree destsTree = pdfDoc.getCatalog().getNameTree(PdfName.Dests);
Map<String, PdfObject> names = destsTree.getNames();//--------------------------------------Core logic for getting names
PdfOutline root = pdfDoc.getOutlines(false);//--------------------------------------Core logic for getting outlines
walkOutlines(root,names, pdfDoc, titles, pageNum); //------Logic to get bookmarks and pageNumbers
if (titles == null || titles.size()==0) {
splitSuccess = false;
}else { //------Proceed if it has bookmarks
for(int i=0;i<titles.size();i++) {
String title = titles.get(i);
String startPageNmStr =""+pageNum.get(i);
int startPage = Integer.parseInt(startPageNmStr);
int endPage = startPage;
if(i == titles.size() - 1) {
endPage = pdfDoc.getNumberOfPages();
}else {
int nextPage = pageNum.get(i+1);
if(nextPage > startPage) {
endPage = nextPage - 1;
}else {
endPage = nextPage;
}
}
String outFileName = outputFolder + File.separator + getFileName(title) + ".pdf";
PdfWriter pdfWriter = new PdfWriter(outFileName);
PdfDocument newDocument = new PdfDocument(pdfWriter, new DocumentProperties().setEventCountingMetaInfo(null));
pdfDoc.copyPagesTo(startPage, endPage, newDocument);
newDocument.close();
pdfWriter.close();
}
}
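}catch(Exception e){
//---log
splitSuccess = false;
}finally {
if (pdfDoc != null) {
pdfDoc.close(); // release the source document whether or not the split succeeded
}
}
return splitSuccess;
}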

XML parsing fails on Blackberry

I am using the following code to parse an XML file, but I don't get any response. Can anyone help?
I am also getting this warning when I open a connection:
"Warning!: Invocation of questionable method: java.lang.String.<init>() found"
public static void main(String arg[]){
XML_Parsing_Sample application = new XML_Parsing_Sample();
//create a new instance of the application
//and start the application on the event thread
application.enterEventDispatcher();
}
public XML_Parsing_Sample() {
_screen.setTitle("XML Parsing");//setting title
_screen.add(new RichTextField("Requesting....."));
_screen.add(new SeparatorField());
pushScreen(_screen); // creating a screen
//creating a connection thread to run in the background
_connectionthread = new Connection();
_connectionthread.start();//starting the thread operation
}
public void updateField(String node, String element){
//receiving the parsed node and its value from the thread
//and updating it here
//so it can be displayed on the screen
String title="Title";
_screen.add(new RichTextField(node+" : "+element));
if(node.equals(title)){
_screen.add(new SeparatorField());
}
}
private class Connection extends Thread{
public Connection(){
super();
}
public void run(){
// define variables later used for parsing
Document doc;
StreamConnection conn;
try{
//providing the location of the XML file,
//your address might be different
conn=(StreamConnection)Connector.open
("http://www.w3schools.com/xml/note.xml",Connector.READ);
//next few lines creates variables to open a
//stream, parse it, collect XML data and
//extract the data which is required.
//In this case they are elements,
//node and the values of an element
DocumentBuilderFactory docBuilderFactory
= DocumentBuilderFactory. newInstance();
DocumentBuilder docBuilder
= docBuilderFactory.newDocumentBuilder();
docBuilder.isValidating();
doc = docBuilder.parse(conn.openInputStream());
doc.getDocumentElement ().normalize ();
NodeList list=doc.getElementsByTagName("*");
_node=new String();
_element = new String();
//this "for" loop is used to parse through the
//XML document and extract all elements and their
//value, so they can be displayed on the device
for (int i=0;i<list.getLength();i++){
Node value=list.item(i).
getChildNodes().item(0);
_node=list.item(i).getNodeName();
_element=value.getNodeValue();
updateField(_node,_element);
}//end for
}//end try
//will catch any exception thrown by the XML parser
catch (Exception e){
System.out.println(e.toString());
}
}//end connection function
}// end connection class

You are probably timing out.
Try opening the connection as an HTTP connection instead, and use the new ConnectionFactory class to get rid of the annoying suffixes.

public InputStream getResult(String url) {
System.out.println("in get result");
HttpConnection httpConn;
httpConn = (HttpConnection) getHTTPConnection(url);
try {
if (httpConn != null) {
final int iResponseCode = httpConn.getResponseCode();
if (iResponseCode == httpConn.HTTP_OK) {
_inputStream = httpConn.openInputStream();
byte[] data = new byte[20];
int len = 0;
int size = 0;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
while (-1 != (len = _inputStream.read(data))) {
baos.write(data, 0, len);
size += len;
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
InputStream is2 = new ByteArrayInputStream(
baos.toByteArray());
return is2;
} else {
return null;
}
}
} catch (IOException e) {
System.err.println("Caught IOException: " + e.getMessage());
}
try {
httpConn.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
The method above downloads the XML.
The following method parses the downloaded XML:
public Document XMLfromInputStream(InputStream xml) {
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setAllowUndefinedNamespaces(true);
dbf.setCoalescing(true);
dbf.setExpandEntityReferences(true);
try {
DocumentBuilder db;
db = dbf.newDocumentBuilder();
InputSource _source = new InputSource();
_source.setEncoding("UTF-8");
_source.setByteStream(xml);
db.setAllowUndefinedNamespaces(true);
doc = db.parse(_source);
} catch (SAXException e) {
System.out.println("Wrong XML file structure: " + e.getMessage());
return null;
} catch (IOException e) {
System.out.println("I/O exception: " + e.getMessage());
return null;
} catch (net.rim.device.api.xml.parsers.ParserConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return null;
} finally {
}
return doc;
}
Then get the root element of the parsed document and walk it:
Element rootElement = document.getDocumentElement();
rootElement.normalize();
displayNode( rootElement, 0 );
private void displayNode( Node node, int depth )
{
if ( node.getNodeType() == Node.ELEMENT_NODE )
{
StringBuffer buffer = new StringBuffer();
indentStringBuffer( buffer, depth );
NodeList childNodes = node.getChildNodes();
int numChildren = childNodes.getLength();
Node firstChild = childNodes.item( 0 );
// If the node has only one child and that child is a Text node, then it's of
// the form <Element>Text</Element>, so print 'Element = "Text"'.
if ( numChildren == 1 && firstChild.getNodeType() == Node.TEXT_NODE )
{
buffer.append( node.getNodeName() ).append( " = \"" ).append( firstChild.getNodeValue() ).append( '"' );
add( new RichTextField( buffer.toString() ) );
}
else
{
// The node either has > 1 children, or it has at least one Element node child.
// Either way, its children have to be visited. Print the name of the element
// and recurse.
buffer.append( node.getNodeName() );
add( new RichTextField( buffer.toString() ) );
// Recursively visit all this node's children.
for ( int i = 0; i < numChildren; ++i )
{
displayNode( childNodes.item( i ), depth + 1 );
}
}
}
else
{
// Node is not an Element node, so we know it is a Text node. Make sure it is
// not an "empty" Text node (normalize() doesn't consider a Text node consisting
// of only newlines and spaces to be "empty"). If it is not empty, print it.
String nodeValue = node.getNodeValue();
if ( nodeValue.trim().length() != 0 )
{
StringBuffer buffer = new StringBuffer();
indentStringBuffer( buffer, depth );
buffer.append( '"' ).append( nodeValue ).append( '"' );
add( new RichTextField( buffer.toString() ) );
}
}
}
/**
* Adds leading spaces to the provided string buffer according to the depth of
* the node it represents.
*
* @param buffer The string buffer to add leading spaces to.
* @param depth The depth of the node the string buffer represents.
*/
private static void indentStringBuffer( StringBuffer buffer, int depth )
{
int indent = depth * _tab;
for ( int i = 0; i < indent; ++i )
{
buffer.append( ' ' );
}
}
You can download and parse the XML using the sample above.

NamedNodeMap attributes = (NamedNodeMap)value.getAttributes();
for (int g = 0; g < attributes.getLength(); g++) {
Attr attribute = (Attr)attributes.item(g);
System.out.println(" Attribute: " + attribute.getName() +
" with value " +attribute.getValue());
}
The code above fetches each attribute and its value.
