HTMLDocument: does Swing "optimize out" span elements? - java

I'm messing about with HTMLDocument in a JTextPane in Swing. If I have this situation:
<html>...
<p id='paragraph1'><span>something</span></p>
<span id='span1'><span>something else</span></span>
...</html>
(the extra <span> tags are to prevent Swing from complaining that I can't change the innerHTML of a leaf) or this situation
<html>...
<p id='paragraph1' />
<span id='span1' />
...</html>
I can call HTMLDocument.getElement() and find the element with ID 'paragraph1' but not the element with id 'span1'. If I change the tag for 'span1' from "span" to "p" then I'm fine. WTF is going on here? Is there another HTML element I can use instead that will allow me to access a particular portion of the document using the id attribute, that will not cause linebreaks? (span would have been perfect :( argh!)
edit: I think the solution is to re-examine what I'm trying to do, which was to leverage the fact that I know how to make GUIs + tables + displays in HTML a lot more than I do in Swing, so I'll ask a different question....

I don't know Swing, but
<p style="display: inline;"> does not line-break, the same as <span>

I have exactly this problem. My span elements disappear.
If I use div instead, I can see them. But of course I don't want div elements, because they cause a line break.
Damn it! Damn java.
Edit!
STOP THE PRESS!!
Found the answer. At least, an answer which fixes it for me.
I was still able to determine that I had my span element. I will describe what I am doing and provide the code showing how I did it.
I want to know what element the caret is in. So, this code exists within the caretUpdate function, which provides me with the caret position each time it moves.
@Override
public void caretUpdate(CaretEvent e)
{
    System.out.println("caret event: " + e.toString());
    Object source = e.getSource();
    if (source instanceof JEditorPane)
    {
        JEditorPane jep = (JEditorPane)source;
        Document doc = jep.getDocument();
        if (doc instanceof HTMLDocument)
        {
            HTMLDocument hdoc = (HTMLDocument)doc;
            int pos = e.getDot();
            // the leaf (character) element under the caret
            Element elem = hdoc.getCharacterElement(pos);
            AttributeSet a = elem.getAttributes();
            AttributeSet spanAttributeSet = (AttributeSet)a.getAttribute(HTML.Tag.SPAN);
            // if spanAttributeSet is not null, then we properly found ' a span '.
            // now we need to discover if it is one of OUR spans
            if (spanAttributeSet != null)
            {
                Object type = spanAttributeSet.getAttribute(HTML.Attribute.TYPE);
                if (type != null && type.equals("dragObject"))
                {
                    // for our logging, we get the ref, which holds the source
                    // of our value later
                    System.out.println("the value is: " + spanAttributeSet.getAttribute("ref"));
                }
            }
        }
    }
}
Edit!!!
Scratch that... This almost works... except the idiots at Sun decided the key was going to be of type HTML.Attribute. Not only that, but the constructor for HTML.Attribute is private, and it just so happens that the attribute type I wanted doesn't exist within their privileged set of attributes. Bastards!
So, all is not lost... I can still get it via the enumerator.. but it is a little more difficult than it needed to be.
LAST EDIT!
Ok, I get it now. If the attribute is of a known type, it is stored in the AttributeSet with an HTML.Attribute instance as the key (for example HTML.Attribute.TYPE for "type").
Otherwise, it is stored in the AttributeSet with a 'String' as the key.
Stupid. But i've got there.
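
A minimal sketch of that lookup rule, assuming the behaviour described above: try the HTML.Attribute key first and fall back to the plain String key. The class name SpanAttributes and the method lookup are made up for illustration; HTML.getAttributeKey is the standard Swing call that returns null for names Swing has no constant for.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical helper, not part of Swing.
class SpanAttributes {
    /**
     * Looks up an attribute by its HTML name. Known names are tried
     * under their HTML.Attribute key, everything else under a String key.
     */
    static Object lookup(AttributeSet attrs, String name) {
        // Returns null when Swing defines no constant for this name.
        HTML.Attribute known = HTML.getAttributeKey(name);
        Object value = (known != null) ? attrs.getAttribute(known) : null;
        return (value != null) ? value : attrs.getAttribute(name);
    }

    public static void main(String[] args) {
        // Stand-in for the attribute set stored under HTML.Tag.SPAN.
        SimpleAttributeSet span = new SimpleAttributeSet();
        span.addAttribute(HTML.Attribute.TYPE, "dragObject"); // known attribute
        span.addAttribute("ref", "source-42");                // custom attribute
        System.out.println(lookup(span, "type"));
        System.out.println(lookup(span, "ref"));
    }
}
```

With a helper like this, the caret listener would not need to care which kind of key a given attribute ended up under.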

I took a look at the javadoc for HTMLDocument, which pointed me to HTMLReader.
I don't see any mention of span in HTMLReader. Maybe it just doesn't know that element.
P probably isn't a good replacement for span. P is a block-level element and span is a text-level element (see description of those terms). Maybe try font (another text-level element) with no attributes?

Related

Find 'li' element

Could someone please help me find the 5th 'li' element? I get an error when I try to locate it. This is how it looks on the page:
<ul class="navigation-primary navigation-primary--right js-navigation-primary">
<li><a data-modal="login-modal" href="javascript:void(0);" data-modal-content-switch="login-options" class="is-button-group-right js-prevent-trigger modal-content-button"><span>Logga in</span></a></li>
<li></li>
<li>Svenska</li>
Actually, I need last one with following attribute
data-menu="language"
Since there are several languages, I thought an if statement would solve it:
if (driver.findElement(By.xpath("//*[@id='header']/div[2]/div/ul[2]/li[5]/a")) != null) {
    driver.findElement(By.xpath("//*[@id='header']/div[2]/div/ul[2]/li[5]/a")).click();
} else {
    System.out.println("element not present");
}
Every language has the same last li[5], so I thought matching on the language name could solve it, but I did not find a solution.
Thank you in advance
I know you accepted an answer but it would be a lot cleaner and clearer if you used a simple CSS selector like
driver.findElement(By.cssSelector("a[data-menu='language']")).click();
or you could be more specific and find the element by language using XPath
driver.findElement(By.xpath("//a[@data-menu='language'][.='Svenska']")).click();
Also, you can't check if an element is null. If it's not there, .findElement() will just throw an exception. If you want to check if an element exists, use .findElements() and check to see if the collection is empty
List<WebElement> links = driver.findElements(...);
if (links.isEmpty())
{
    // element doesn't exist
}
else
{
    // element exists
    links.get(0).click(); // or whatever
}
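
One way to sanity-check an XPath like the one in this answer without a browser is to run it through the JDK's own XPath engine. This is only a sketch: the markup below is hypothetical (the real page was not posted in full), the JDK parser needs well-formed XML, and a browser may evaluate real-world HTML differently. The class name XPathCheck and the method countMatches are made up for illustration. A count of zero plays the same role as an empty list from findElements().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for trying out XPath expressions offline.
class XPathCheck {
    /** Counts the nodes an XPath expression selects in a well-formed markup string. */
    static int countMatches(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumed structure: each language link is an <a> carrying data-menu.
        String markup = "<ul>"
                + "<li><a data-menu=\"login\">Logga in</a></li>"
                + "<li><a data-menu=\"language\">Svenska</a></li>"
                + "</ul>";
        // prints 1: the link exists
        System.out.println(countMatches(markup, "//a[@data-menu='language'][.='Svenska']"));
        // prints 0: no such link, the equivalent of an empty findElements() result
        System.out.println(countMatches(markup, "//a[@data-menu='language'][.='English']"));
    }
}
```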
As per the HTML you provided, the following XPath or CSS selector should work:
driver.findElement(By.xpath("//ul[@class='navigation-primary navigation-primary--right js-navigation-primary']/li[contains(.,'Svenska')]"));
OR
driver.findElement(By.cssSelector("ul.navigation-primary.navigation-primary--right.js-navigation-primary > li.navigation-primary.navigation-primary--right.js-navigation-primary"));
At the risk of redefining your question, I'd be tempted to add some id attributes to your HTML and call driver.findElement(By.id(name)).
The advantage is that if an extra item is added to the start of the list, your tests will not break.

Selenium: how to get the value of hidden element which has all div tags

I would like to get the text of all the div tags shown in the attached screenshot. I have tried every locator I could think of (class name, etc.), and each returns null. I also tried JavaScript, which returns null as well.
Please see the screenshot: I need the selected text highlighted in blue, which starts with "Enables enterprise IT to deploy networking services".
You need to research creating selectors as this isn't a difficult one. There are numerous approaches for this element, but here's one for you: $$("#offers-popover .description"). Obviously this is a CSS selector based on the $$ and you use getText from the Selenium API in order to scrape the element text, which is what I assume you are intending to do.
driver.findElement(By.cssSelector("#offers-popover .description")).getText();
Since your element is not visible you can try this:
String divText = driver.findElement(By.className("description")).getAttribute("textContent");
Or, if this is not the only element on the page with the class description:
WebElement popElement = driver.findElement(By.id("offers-popover"));
String divText = popElement.findElement(By.className("description")).getAttribute("textContent");

Webdriver: find element by html tag which contains an html tag which belongs to a certain class

to clarify a really messy title:
There is a piece of page code that looks like this (I have simplified the actual code):
<div class="CLASS1">
    <p>...
    <p>
        <strong>
            <i class="CLASS2"></i>
            THE DATA I WANT
        </strong>
    </p>
    <p>...
    <p>...
</div>
I am using Selenium WebDriver with Java and I've been trying to get the "DATA I WANT" line, but since it sits directly inside the <strong> tag (and not inside the CLASS2 <i> tag), I am having a hard time accomplishing this. Note that the rest of the paragraphs (marked as <p>...) contain analogous constructions.
Currently it seems like searching for a CLASS1 <strong> tag that contains CLASS2 element <i> might be the solution, but I have not been able to compose a correct search path so far. Now I am not sure if I am approaching this problem from a correct angle at all.
Any suggestions are greatly appreciated! I would like to have the shortest and the most reliable solution for this...
For the HTML you have posted, the following CSS selector should work:
div.CLASS1 > p:nth-child(2) > strong
If you need to get the strong tag that contains i.CLASS2, then you could do something like this:
// get all p tags
List<WebElement> pTags = driver.findElements(By.cssSelector("div.CLASS1 > p"));
WebElement myWebElement = null;
// iterate and find the p tag that contains i.CLASS2
for (WebElement element : pTags) {
    if (!element.findElements(By.cssSelector("i.CLASS2")).isEmpty()) {
        myWebElement = element.findElement(By.cssSelector("strong"));
        break;
    }
}
// the data you want (myWebElement stays null if no paragraph matched)
if (myWebElement != null) {
    System.out.println(myWebElement.getText());
}

Building Strings by scraping html with JSoup

I'm a novice Java programmer, and am just now beginning to expand into the world of libraries, APIs, and the like. I'm at the point where I have an idea that is relatively simple, and can be my pet project when I'm not working on homework.
I'm interested in scraping html from a few different sites, and building strings that look like " Artist - "Track Name" ". I've got one site working the way I want, but I feel it could be done a lot more smoothly... Here's the rundown on what I do for Site A:
I have JSoup create Elements for everything that is of the class plrow like so:
<p class="plrow"><b>Artist</b> “Title” (<span class="sn_ld">Label</span>) <SMALL><b>N </b></SMALL></p></td></tr><tr class="ev"><td><a name="98069"></a><p class="pltime">Time</p>
From there, I create a String array of lines that are split after the last </p>, then use the following code to process the array:
for (int i = 0; i < tracks.length; i++){
    tracks[i] = Jsoup.parse(tracks[i]).text();
    tracks[i] = tracks[i].split("”")[0];
    tracks[i] = tracks[i] + "”";
}
Which is a pretty hackish way to get Artist "Title" the way I want, but the result is fine for me.
Site B is a little bit different.
I've determined that the Artists and Titles are all contained like this:
<span class="artist" property="foaf:name">Artist Name</span> </a> </span> <span class="title" property="dc:title">Title</span>
along with more information, all inside of <li id="segmentevent-random" class="segment track" typeof="po:MusicSegment" about="/url"> song info </li>
I was trying to go through and snag all of the artists first, and then the titles and then merge the two, but I was having trouble with that because the "dc:title" property used to display the track title is used for other non music things, so I can't directly match up the artist with a track.
I have spent the lion's share of this weekend trying to get this working by viewing countless questions tagged with Jsoup, and spending a lot of time reading the Jsoup cookbook and API guide. I have a feeling that part of my trouble could also stem from my relatively limited knowledge of how web pages are coded, though that may mostly be my trouble with my understanding of how to plug these bits of code into Jsoup.
I appreciate any help or guidance, and I've got to say, it's really nice to ask a non-homework question here (though I find quite a few hints from what others have asked! ;) )
Common:
If you want to parse content from several different websites, it's a good idea to distinguish between them. For example, you can decide whether to parse Page A or Page B based on the URL.
Example:
if( urlToPage.contains("pagea.com") )
{
    // Call parsemethod for Page A or create parserclass
}
else if( urlToPage.contains("pageb.com") )
{
    // Call parsemethod for Page B or create parserclass
}
// ...
else
{
    // Eg. throw Exception because there's no parser available
}
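
As a variation on the same idea, the dispatch can key on the URL's host instead of a substring match, so that "pagea.com" appearing somewhere in a path or query string does not pick the wrong parser. This is only a sketch: the class name ParserRegistry, the host names, and the stub parsers (which just return a label) are placeholders, and Map.of needs Java 9 or later.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping a site's host to the parser for that site.
class ParserRegistry {
    // Stub parsers: each takes the page HTML and returns a result string.
    private static final Map<String, Function<String, String>> PARSERS = Map.of(
            "pagea.com", html -> "parsed by A",
            "pageb.com", html -> "parsed by B");

    /** Picks a parser by host, or throws if no parser is registered for it. */
    static Function<String, String> forUrl(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4); // treat www.pagea.com like pagea.com
        }
        Function<String, String> parser = PARSERS.get(host);
        if (parser == null) {
            throw new IllegalArgumentException("No parser available for " + host);
        }
        return parser;
    }

    public static void main(String[] args) {
        System.out.println(forUrl("http://www.pagea.com/playlist").apply("<html></html>"));
    }
}
```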
You can connect and parse each page into a document with a single line of code:
// Note: the protocol (http) is required here
Document doc = Jsoup.connect("http://pagewhaterver.com").get();
Without knowing the HTML or the structure of each page, here are some basic approaches:
Page A:
for( Element element : doc.select("p.plrow") )
{
    String title = element.ownText(); // Title - output: '“Title” ()' (you have to strip the quotes and the () here)
    String artist = element.select("b").first().text(); // Artist (the sample markup wraps it in <b>)
    String label = element.select("span.sn_ld").first().text(); // Label
    // etc.
}
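
The cleanup mentioned in the comment above could look roughly like this. It is a sketch that assumes the own text always has the shape '“Title” ()' with curly quotes, as in the sample row; the class name TrackFormatter and its methods are made up for illustration, and it uses only plain string handling, so it works on whatever ownText() returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for building strings like: Artist - "Title"
class TrackFormatter {
    // Captures whatever sits between a left and a right curly quote.
    private static final Pattern QUOTED = Pattern.compile("\u201C(.*?)\u201D");

    /** Extracts the title from text such as '“Title” ()'. */
    static String cleanTitle(String ownText) {
        Matcher m = QUOTED.matcher(ownText);
        // Fall back to the trimmed text if no curly quotes are present.
        return m.find() ? m.group(1).trim() : ownText.trim();
    }

    /** Builds the final string in the form the question asks for. */
    static String format(String artist, String ownText) {
        return artist.trim() + " - \"" + cleanTitle(ownText) + "\"";
    }

    public static void main(String[] args) {
        // prints: Artist - "Title"
        System.out.println(format("Artist", "\u201CTitle\u201D ()"));
    }
}
```

Matching on the quotes rather than deleting every parenthesis keeps titles such as "Song (Remix)" intact.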
Page B:
Similar to Page A, artist and title can be selected like this:
String artist = doc.select("span.artist").first().text();
String title = doc.select("span.title").first().text();
Here's a good overview of the Jsoup Selector API: http://jsoup.org/cookbook/extracting-data/selector-syntax

How to mix html with gwt widgets?

I am trying to generate a GWT UI dynamically. As a result I would like to get an HTML fragment like this:
<ol>
<li>MyLabel</li>
<li><input type="text"></li>
</ol>
The Label should be a GWT Label and the input should be a GWT TextBox.
How can I achieve this with GWT? I've tried to use the HTMLPanel class, but how can I inject the <li> tags?
I can't use UIBinder, since I would like to dynamically create such fragments as shown above.
You should create your own subclass of ComplexPanel. This will give you something that works much the same as a HorizontalPanel or VerticalPanel, only based on list elements rather than table elements. It should look something like this:
public class OListPanel extends ComplexPanel {
    final OListElement ol = Document.get().createOLElement();

    public OListPanel() {
        setElement(ol);
    }

    public void add(Widget w) {
        LIElement li = Document.get().createLIElement();
        ol.appendChild(li);
        add(w, (Element)li.cast());
    }

    public void insert(Widget w, int beforeIndex) {
        checkIndexBoundsForInsertion(beforeIndex);
        LIElement li = Document.get().createLIElement();
        ol.insertBefore(li, ol.getChild(beforeIndex));
        insert(w, (Element)li.cast(), beforeIndex, false);
    }

    public boolean remove(Widget w) {
        Element li = DOM.getParent(w.getElement());
        boolean removed = super.remove(w);
        if (removed) {
            ol.removeChild(li);
        }
        return removed;
    }
}
I haven't tested that but it's basically right. WRT the markup you posted in your question, this code will produce one slight difference, since GWT labels have their own <div> element:
<ol>
<li><div>MyLabel</div></li>
<li><input type="text"></li>
</ol>
You might prefer InlineLabel, which is based on the less intrusive <span> element.
You can always do something like:
Document doc = Document.get();
final OListElement ol = doc.createOLElement();
LIElement li = doc.createLIElement();
li.appendChild((new Label()).getElement());
ol.appendChild(li);
li = doc.createLIElement();
li.appendChild((new TextBox()).getElement());
ol.appendChild(li);
panel.add(new Widget() {{
    setElement(ol);
}});
Why not put that fragment in a standalone Widget created via UiBinder? (if you know how the structure will look beforehand and just want to insert MyLabel and a TextBox)
Don't be afraid to split your widgets like this - the GWT Compiler is great at optimizing and UiBinder templates are processed at compile time so there shouldn't be any performance penalty (benchmarking is still strongly recommended - YMMV). I'd even say that it'd be faster than trying to add this structure via the DOM package - with UiBinder, the compiler knows what it's dealing with, with DOM you are basically saying: "I know what I'm doing, don't touch my code!" (at least that's my view on this :)). HTMLPanel could be an alternative, but you'd have to assign an id to every element you want to modify/attach stuff to... :/
Bottom line: use UiBinder for this, that's what it was built for.
