Been trying to find a way of getting the String representation of an HTMLElement in HTTPUnit. I'm using HTTPUnit in some tests to get response HTML, and can get the text content of an element, however this does not include a text representation of its surrounding HTML, which I want to compare with a test value.
Any help appreciated.
There is more than one way to represent an HTML element. For instance, the attributes of an HTML element can be in any order, so you could produce a string, but it is not guaranteed to be identical to the original element.
Related
I have following part of XML code and I want to get string from specific tag.
Input XML:
<subject>
<title> this title has 5 <sup>+</sup> rating star </title>
</subject>
From the above XML I want a string using xpath like
Expected output:
"this title has 5+ rating star"
Note: I have used @XmlPath like
@XmlPath("subject/title/text()")
private String title;
But it returns only the second text node as the result in the title variable, i.e. "rating star"
This is precisely why people advise against using text() unless you have very specialized requirements. The XPath function string() applied to a node returns the string value of the node, which for an element is the concatenation of all the contained text, ignoring markup - which is exactly what you are looking for. So you want string(subject/title). If you're using a Java API note that the result will now be a string, not a node-set.
Perhaps you could try string(subject/title). I don't have access to Java for the time being but the string operation is common enough in XPath. It is supposed to concatenate text content without traces of any elements.
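To make the string() suggestion concrete, here is a minimal sketch using the standard javax.xml.xpath API rather than the @XmlPath annotation from the question. The class and method names are made up for illustration. Note that the space before <sup> is part of the text, so the result reads "5 +" rather than "5+".

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class TitleStringValue {

    // Evaluates an XPath 1.0 expression against an XML string and returns the string result.
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<subject><title> this title has 5 <sup>+</sup> rating star </title></subject>";

        // string() concatenates every text node under <title>, markup removed.
        System.out.println("[" + evaluate(xml, "string(subject/title)") + "]");

        // normalize-space() does the same and also trims and collapses whitespace.
        System.out.println("[" + evaluate(xml, "normalize-space(subject/title)") + "]");
    }
}
```

If the leading and trailing blanks matter for your comparison, normalize-space(subject/title) is probably the more convenient of the two.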
I have some XML files, and each file contains some information. It also contains a description of itself enclosed in the element <namespace:description></namespace:description>. This description will be inserted into an HTML web page and uploaded to the web.
The problem is that the description element contains other HTML elements and I want to keep them there, so that the text can be formatted, but XPath strips all those elements and returns only their text.
<namespace:descr>Some <i>nice</i> description</namespace:descr>
I tried variations on this XPath query: //*[local-name()='descr']
(I'm not really skilled with XPath)
Also tried something like //*[local-name()='descr']//*[not(descendant::*[self::p or self::i])] found in this answer, but it doesn't work for me.
So my question: is there some way to keep XML/HTML elements in text after using XPath query?
The return value of an XPath expression can either be a string, number, boolean or a node-set. Each of these types can be converted to one of the primitive types.
The expression //*[local-name()='descr'] returns a node-set but you then obviously convert it to a string which returns the concatenated text content of the first node in the node-set, stripping off all markup.
To print the content of the result node as markup you would need to do the following:
Retrieve the expression result as a node-set. The implementation type of the node-set depends on the XPath engine, and for instance could be a DOM NodeList.
Serialize the nodes as an XML fragment. How to do this depends on the node-set API and the XPath engine. XSLT could be used for that, but with some node implementations it may be as simple as calling toString().
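The two steps above could look roughly like this with the JDK's built-in DOM, XPath and identity Transformer. This is only a sketch: the class name, the method name and the urn:example namespace URI are assumptions made for the example, and other XPath engines will need different serialization code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class MarkupPreservingQuery {

    // Step 1: evaluate the expression as a node-set. Step 2: serialize each node as markup.
    static String serializeMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so that local-name() sees "descr"
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);

            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            for (int i = 0; i < matches.getLength(); i++) {
                serializer.transform(new DOMSource(matches.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not query or serialize the document", e);
        }
    }

    public static void main(String[] args) {
        // The namespace URI is invented for the example.
        String xml = "<namespace:descr xmlns:namespace='urn:example'>"
                + "Some <i>nice</i> description</namespace:descr>";
        System.out.println(serializeMatches(xml, "//*[local-name()='descr']"));
    }
}
```

The output includes the matched element itself, with the <i> markup intact. If you only want the inner content, you could serialize the element's child nodes instead.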
I would like to find any WebElement based on text using XPath.
WebElement that I am interested to find,
Its HTML,
Basically my WebElement that I am trying to retrieve by Text contains an input element.
I currently use,
driver.findElement(By.xpath("//*[normalize-space(text()) = 'Own Hotel']"));
which does not find the WebElement above, but it usually works to retrieve all other web elements.
Even,
By.xpath("//*[contains(text(),'Own Hotel')]")
did not give me any results. Although I am interested in exact text match.
I am looking for a way to find web element by text immaterial of the elements that are present inside the web element. If text matches, it should return the WebElement.
Thanks!
It seems text is wrapped inside a label and not input. Try this
driver.findElement(By.xpath(".//label[text()[normalize-space() = 'Own Hotel']]"));
There is nice explanation about this xpath pattern here
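To see why the two expressions from the question fail while the one above matches, you can try them outside Selenium with the JDK's XPath engine. The markup below is an assumption (a label wrapping an input followed by the text), since the original HTML is not shown here, and the class and method names are invented for the sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Selenium.
final class TextMatchDemo {

    // Assumed markup: the label's first text node is only whitespace,
    // the visible text comes after the nested <input>.
    static final String MARKUP =
            "<form><p>Booking</p><label>\n"
          + "  <input type='radio' name='ownHotel'/>\n"
          + "  Own Hotel\n"
          + "</label></form>";

    // Returns how many nodes the XPath 1.0 expression selects in the markup.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            // text() as a function argument uses only the FIRST text node (whitespace here).
            "//*[normalize-space(text()) = 'Own Hotel']",
            "//*[contains(text(),'Own Hotel')]",
            // The predicate on text() tests EACH text node separately.
            ".//label[text()[normalize-space() = 'Own Hotel']]",
            // normalize-space() without arguments uses the element's whole string value.
            "//*[normalize-space()='Own Hotel']"
        };
        for (String expression : expressions) {
            System.out.println(countMatches(MARKUP, expression) + "  " + expression);
        }
    }
}
```

With this markup the first two expressions select nothing and the last two select the label. Keep in mind that //*[normalize-space()='Own Hotel'] can also match an ancestor whose only text is that label, so narrowing it to a tag name is usually safer.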
In the HTML below:
The innerText Own Hotel within the <input> node contains a lot of white-space characters at the beginning as well as at the end. Due to the presence of these leading and trailing white-space characters you can't use the location path text() as:
text() selects all text node children of the context node
As an alternative, you need to use the String Function string normalize-space(string?) as follows:
driver.findElement(By.xpath("//*[normalize-space()='Own Hotel']"));
However, it would be a better idea to make your search a bit more granular by adding the tagName and preferably a unique attribute as follows:
Using tagName and normalize-space():
driver.findElement(By.xpath("//input[normalize-space()='Own Hotel']"));
Using tagName, the name attribute, and normalize-space():
driver.findElement(By.xpath("//input[@name='ownHotel' and normalize-space()='Own Hotel']"));
References
You can find a couple of relevant discussions using normalize-space() in:
How to click on a link with trailing white-space characters on a web page using Selenium?
How to locate and click the element when the innerText contains leading and trailing white-space characters using Selenium and Python
How to click on the button when the textContext contains leading and trailing white-space characters using Selenium and Python
Which data type should be used to store an HTML page in Java?
Recommendation: Store the page in a (Jsoup) Document.
pros:
you can parse those documents from string / file / website in a single line
all entities are escaped (and can be unescaped)
pretty printing
you get a string out of it with a single line of code - a html string as well as a text only one
you can easily select / modify your html
html is "cleaned"
...
see: http://jsoup.org/
But some more information about what you want to do would be helpful ...
Without knowing what you will do with it, I'd suggest a
java.lang.String
because that's what it actually is. A character string.
I'm using Java 6 on a Tomcat 6.0.33 application server. I'm getting XML that I must render as a form element. The XML I receive looks like
<pquotec type='input' label='Price Quote Currency' nwidth='200' vlength='10'>
XYZ
</pquotec>
and below is the desired output.
<label for="#valid_id_goes_here#">Price Quote Currency</label>
<input type="text" size="10" style="width:200px;" value="XYZ" name="#valid_name_goes_here#" id="#valid_id_goes_here#" />
My question is, what is a strategy for transforming the value stored in the XML element's label attribute to something I can replace "#valid_name_goes_here#" above with? Preferably the strategy would allow me to translate back again. Note that things that appear within "" may not necessarily be suitable for values for id and name.
Thanks for your help, - Dave
The name attribute of the input element is defined as having type CDATA, which basically means "any character data", so I think there shouldn't really be a problem.
If you do encounter a validity issue, you could convert any 'awkward' (or simply all) characters to their encoded form. E.g. é would become &eacute;.
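As a rough illustration of that encoding step, the sketch below turns every non-ASCII character, plus the few ASCII characters that are special in HTML, into a numeric character reference (so é becomes &#233;). The class and method names are invented, and which characters count as 'awkward' is an assumption you may want to adjust.

```java
// Hypothetical helper class, not part of any library.
final class CharacterReferences {

    // Replaces non-ASCII and HTML-special characters with numeric character references.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            boolean plainAscii = codePoint < 128
                    && codePoint != '&' && codePoint != '<' && codePoint != '>'
                    && codePoint != '"' && codePoint != '\'';
            if (plainAscii) {
                out.append((char) codePoint);
            } else {
                // Decimal reference, e.g. U+00E9 becomes &#233;
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint); // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("caf\u00e9 & co")); // prints caf&#233; &#38; co
    }
}
```

A browser decodes these references when it reads the attribute value, so the value is readable again on the way back.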
Use XSLT. Here's an example that converts XML to HTML, but it is trivial to convert XML to XML as well.
In Java, Xalan can do XSLT, and this thread might also help you.
In case you want to do the XML Parsing and render the target HTML using JSP refer to this thread for a list of XML Parsers
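For the XSLT route, a minimal sketch with the JDK's built-in javax.xml.transform API could look like this. The stylesheet is written for this one element shape and is only an assumption about what you need. It leaves out the id, name and for attributes on purpose, since how to derive them is the open part of the question. The class and method names are invented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of any library.
final class FormFieldTransform {

    // Maps the root element's attributes and text onto a label and a text input.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='/*'>"
          + "<label><xsl:value-of select='@label'/></label>"
          + "<input type='text' size='{@vlength}' style='width:{@nwidth}px;'"
          + " value='{normalize-space(.)}'/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet to the incoming XML and returns the generated markup.
    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<pquotec type='input' label='Price Quote Currency' nwidth='200' vlength='10'>\n"
                + "XYZ\n"
                + "</pquotec>";
        System.out.println(toHtml(xml));
    }
}
```

In a real application you would probably compile the stylesheet once (for example into a Templates object) instead of on every call.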
EDIT:
Hmmm, you could have written the question without XML & HTML fragments, and asked simply how to convert any string into a valid HTML Id, and back again.
Use HTML data- attributes to store the original incoming string. Then use a regex to keep the valid characters of the incoming string, replacing all invalid characters with an underscore, and use the result as the ID. There is a small chance that you may get duplicate IDs. In that case you can always go back and make the XML come in a way that does not produce duplicates.
This way you can get back the original string and have the Valid IDs
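A small sketch of that idea follows. The data-label attribute name, the "id_" prefix for strings that do not start with a letter, and the class and method names are all assumptions made for the example, and duplicate IDs are not handled.

```java
// Hypothetical helper class, not part of any library.
final class HtmlIds {

    // Replaces every character outside [A-Za-z0-9_-] with an underscore.
    static String toId(String original) {
        String id = original.replaceAll("[^A-Za-z0-9_-]", "_");
        // Older HTML versions require an id to start with a letter, so add a prefix if needed.
        boolean startsWithLetter = !id.isEmpty() && Character.isLetter(id.charAt(0));
        return startsWithLetter ? id : "id_" + id;
    }

    // Escapes the characters that would break a double-quoted attribute value.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Builds the input tag; the untouched original string travels along in data-label.
    static String inputTag(String label, String value) {
        String id = toId(label);
        return "<input type=\"text\" id=\"" + id + "\" name=\"" + id + "\""
                + " data-label=\"" + escapeAttribute(label) + "\""
                + " value=\"" + escapeAttribute(value) + "\" />";
    }

    public static void main(String[] args) {
        System.out.println(toId("Price Quote Currency")); // prints Price_Quote_Currency
        System.out.println(inputTag("Price Quote Currency", "XYZ"));
    }
}
```

The sanitized ID is lossy, so to translate back you read the data-label attribute rather than trying to reverse the ID.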