Get Text Content of XML Element - Java

So, I have a question about XML documents in Java. Can I get all the text content (and only the text) of an element and of all its descendant elements, without iterating through all of them and calling Element.getText()? In other words, is there a function analogous to JavaScript's textContent, or do I have to iterate through all the nodes?

You'll need to iterate and append.
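For the iterate-and-append route, a recursive walk is all it takes. The sketch below uses the JDK's built-in org.w3c.dom API rather than JDOM, and the class and method names (TextContentDemo, collectText) are hypothetical. Before writing the loop yourself, note that DOM's Node.getTextContent() (DOM Level 3, part of the JDK) returns the same concatenation, and JDOM's Element.getValue() is documented to return the text of all descendant text nodes, so the manual walk may not be needed at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TextContentDemo {
    // Parses an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Iterate-and-append: depth-first walk that concatenates every text and
    // CDATA node below `node` in document order; comments and PIs are skipped.
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendText(node, sb);
        return sb.toString();
    }

    private static void appendText(Node node, StringBuilder sb) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                appendText(child, sb); // recurse into descendant elements
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p>Hello <b>big</b> world<![CDATA[!]]></p>");
        System.out.println(collectText(doc.getDocumentElement()));
        // Built-in DOM Level 3 equivalent, no manual loop:
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Both lines of main should print the same string; the manual version is mainly useful if you need to customize which nodes are included.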

Related

How to pass an ArrayList from Java to XSL and access its individual elements in XSL?

I am passing an ArrayList from Java to XSL using transformer.setParameter:
ArrayList<String> books = new ArrayList<String>();
transformer.setParameter("booksinXSL", books);
Now I need to access this list's elements in XSL:
<xsl:param name="booksinXSL"/>
If I use the following line in XSL, it throws the error "Invalid conversion of ArrayList to NodeSet":
<xsl:value-of select="$booksinXSL[0]"/>
but if I write it as below, it prints the entire list [book1,book2] without any error:
<xsl:value-of select="$booksinXSL"/>
XSL has no concept of arrays, but you can define a variable that contains a set of nodes and then iterate through those nodes. You can see a useful example at this page.
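The linked example is not reproduced here. As one alternative that works with the JDK's built-in XSLT 1.0 processor without building a node set at all, the list can be passed as a single delimited String and split with a recursive template. This is a swapped-in technique, not necessarily what the answer had in mind: it assumes no book title contains a comma, and the class name ListParamDemo, the template name split and the ";" output separator are hypothetical. Note also that XPath positions start at 1, so $booksinXSL[0] would select nothing even on a real node set.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ListParamDemo {
    // XSLT 1.0 stylesheet: walks the comma-separated $booksinXSL string item by item.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='booksinXSL'/>"
        + "<xsl:template match='/'>"
        + "<xsl:call-template name='split'>"
        + "<xsl:with-param name='list' select='$booksinXSL'/>"
        + "</xsl:call-template>"
        + "</xsl:template>"
        // Emits the text before the first comma, then recurses on the remainder
        + "<xsl:template name='split'>"
        + "<xsl:param name='list'/>"
        + "<xsl:if test='string-length($list) &gt; 0'>"
        + "<xsl:value-of select=\"substring-before(concat($list, ','), ',')\"/>"
        + "<xsl:text>;</xsl:text>"
        + "<xsl:call-template name='split'>"
        + "<xsl:with-param name='list' select=\"substring-after($list, ',')\"/>"
        + "</xsl:call-template>"
        + "</xsl:if>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Joins the list into one String parameter (a type XSLT 1.0 understands) and transforms.
    static String render(List<String> books) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("booksinXSL", String.join(",", books));
            StringWriter out = new StringWriter();
            // The input document is irrelevant here; only the parameter is used.
            t.transform(new StreamSource(new StringReader("<root/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList("book1", "book2")));
    }
}
```

Inside the split template each item is available on its own, so any per-item markup can replace the xsl:value-of/xsl:text pair.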

JDOM's annoying text nodes and addContent(index, Element) - schema solutions?

I have some already-generated XML files, and the application that is causing problems now needs to add elements to them. These elements must be at specific positions for the documents to stay valid against the applied schemas.
There are two problems. The first is that I have to hardcode the positions, which is not nice but "ok".
The much bigger one is JDOM. I printed the content list and it looks like this:
element1
text
element2
element4
text
element5
The text nodes are just whitespace, and every element I add makes it even less predictable how many text nodes there are (sometimes some are added, sometimes not). They are counted as if they were elements, but I want to ignore them: when I add element3 at index 2, it does not end up between element2 and element4, it comes after this annoying text node.
Any suggestions? The best solution, in my opinion, would be something that automatically puts the element where it has to be according to the schema, but I assume that is not possible?
Thanks for any advice :)
The JDOM model of the XML is very literal... it has to be. On the other hand, JDOM offers ways to filter and process the XML that should make your task easier.
In your case, you want to add Element content to the document, and all the text content is whitespace, so just ignore the text content and worry only about the Element content.
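For example, to insert a new element nemt before the 3rd Element, insert it at index 2 of the element-only list returned by getChildren() (the list is live and its indices are zero-based):
rootemt.getChildren().add(2, new Element("nemt"));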
The elements are now sorted out. What about the text?
A really simple solution is to just pretty-print the output:
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(System.out, mydoc);
That way all the whitespace will be reformatted to make the XML 'pretty'.
EDIT: and no, JDOM has no way to automatically insert the element in the right place according to the schema.
Rolf
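JDOM is a separate library, so the sketch below swaps in the JDK's built-in org.w3c.dom API to illustrate the same idea the answer relies on: count only element children, so whitespace text nodes never shift the index. The class and method names (ElementIndexInsertDemo, childElementAt, insertElementAt) are hypothetical helpers, not part of DOM or JDOM.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ElementIndexInsertDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the index-th (0-based) child element of parent, skipping text
    // nodes and other non-element children; null if there is no such element.
    static Element childElementAt(Element parent, int index) {
        int seen = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (seen == index) {
                    return (Element) n;
                }
                seen++;
            }
        }
        return null;
    }

    // Inserts newChild so it becomes the index-th child element of parent.
    // If index is at or past the end, insertBefore(x, null) appends instead.
    static void insertElementAt(Element parent, int index, Element newChild) {
        parent.insertBefore(newChild, childElementAt(parent, index));
    }

    // Names of parent's child elements in order, ignoring text nodes.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <element1/>\n  <element2/>\n  <element4/>\n  <element5/>\n</root>");
        Element root = doc.getDocumentElement();
        insertElementAt(root, 2, doc.createElement("element3"));
        System.out.println(childElementNames(root));
    }
}
```

As with JDOM, the new element lands directly before element4, after the whitespace text node that precedes it, so the indentation looks off until the output is re-serialized with pretty-printing.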

In Jsoup, is it possible to get an Element from a list of Elements without looping through it?

I'm new to Jsoup, but it appears to be a great tool. I'm trying to extract the robots meta tag.
I have the following code:
Document doc = Jsoup.parse(htmlContent);
Elements metatags = doc.select("meta");
Element robots = metatags.attr("name", "robots"); // is getting the first element of the list
The last line is wrong.
I want to know whether it is necessary to loop over the list of elements to find the one that matches the attribute, or whether there is a way to extract the matching element directly from the Elements list.
Edit 1: I solved this by changing it to doc.select("meta[name=robots]").
Edit 2: In other words, I want to know how to get all elements in an Elements list that match some attribute requirement.
Edit 3: I asked this question prematurely because I had not read the main documentation yet. Sorry.
You can put the attribute and value you want to match into the select() selector to filter more precisely.
Change the select to doc.select("meta[name=robots]"); it will return all meta elements whose name attribute equals robots. Call first() on the result if you need a single Element.
Have you read the Jsoup documentation? Here is the entry for the method you are using:
attr
public Elements attr(String attributeKey, String attributeValue)
Set an attribute on all matched elements.
Parameters:
    attributeKey - attribute key
    attributeValue - attribute value
Returns:
    this
It returns this, i.e. the same Elements object, which can't be assigned to an Element variable. Note also that attr(key, value) sets the attribute on every matched element; it does not filter the list.
I also think you want to use Document.getElementsByTag(String), instead of select.

StringTemplate: formatting the last item in a list

I am generating source code using StringTemplate and need to render a list of statements. I want all but the last to be separated with ";\n", but the last one should be wrapped as "return item;\n".
Can I achieve this in the template, or do I have to do some preprocessing manually?
$call.stmts:{$it$;} ;separator="\n"$
Currently I am using the above.
Try using the trunc() function to get everything in the list except the last element, and the last() function to get the last element, as described here.
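If the manual-preprocessing route is taken instead, trunc() and last() correspond directly to List.subList and List.get on the Java side. A minimal sketch, assuming the statements are plain strings; the class and method names (ReturnLastDemo, render) are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

final class ReturnLastDemo {
    // Every statement but the last becomes "stmt;\n"; the last becomes "return stmt;\n".
    static String render(List<String> stmts) {
        if (stmts.isEmpty()) {
            return "";
        }
        List<String> init = stmts.subList(0, stmts.size() - 1); // like trunc()
        String last = stmts.get(stmts.size() - 1);               // like last()
        StringBuilder sb = new StringBuilder();
        for (String s : init) {
            sb.append(s).append(";\n");
        }
        sb.append("return ").append(last).append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList("int a = 1", "int b = a + 1", "b")));
    }
}
```

The two pieces could also be passed to the template as separate attributes, keeping the formatting in the template itself.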

XPath - How to get the data contained between elements, not the elements themselves

I'm writing a Java program that scrapes a web page for links and then stores them in a database, but I'm having problems. Using HtmlUnit, I wrote the following:
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]");
It returns the correct anchor elements, but I only want the path contained in the href attribute, not the entire element. How can I do this? Further, how can I get the data contained between the tags:
I need this data, too.
Thanks in advance!
The first (getting the href):
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/@href");
The second (getting the text):
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/text()");
I assume that getByXPath is a utility function you wrote that uses XPath.evaluate? To get the string value, you could use either xpath.evaluate(expression, object) or xpath.evaluate(expression, object, XPathConstants.STRING).
Alternatively, you could call getNodeValue() on the attribute node returned by evaluating "//a[starts-with(@href, \"showdetails.aspx\")]/@href".
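HtmlUnit's getByXPath can be called on the page directly; the sketch below instead uses the JDK's javax.xml.xpath API on a small, well-formed XML stand-in for the page to show both answers' points. Selecting @href or text() yields attribute or text nodes whose getNodeValue() is the wanted string, while XPathConstants.STRING returns only the first match's string value. The sample markup and the class name XPathValuesDemo are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathValuesDemo {
    // Hypothetical sample page; real HTML usually needs an HTML parser first.
    static final String SAMPLE = "<html><body>"
            + "<a href='showdetails.aspx?id=1'>First</a>"
            + "<a href='other.aspx'>Skip</a>"
            + "<a href='showdetails.aspx?id=2'>Second</a>"
            + "</body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // String value of every matched node (works for attribute and text nodes).
    static List<String> values(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // XPathConstants.STRING: string value of the FIRST matched node only.
    static String firstValue(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(expr, doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        String links = "//a[starts-with(@href, 'showdetails.aspx')]";
        System.out.println(values(doc, links + "/@href"));
        System.out.println(values(doc, links + "/text()"));
        System.out.println(firstValue(doc, links + "/@href"));
    }
}
```

Use NODESET when every link is needed; STRING is convenient only when one value is expected.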
