I am using jsoup to recursively crawl a web page. I have links like these:
<a href="#">hash</a>
<a href="#top">hashtop</a>
<a href="http://www.google.com">google</a>
I don't have a problem with links like the third one. The first and second links only navigate within the same page. When I select the anchor tags from the document, I get the parent URL itself for # and parenturl#top for the second one. I don't want to fetch those kinds of links. Can someone tell me how to avoid fetching them in jsoup?
You should be able to use the following:
doc.select("a[href~=^[^#]]")
This uses the [attr~=regex] selector syntax with a regex that will only match strings that do not start with #.
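To see what the regex itself accepts, here is a minimal sketch that applies the same pattern to the three href values from the question using only java.util.regex. It does not call jsoup; the class and method names are made up for illustration, and it assumes the selector applies the regex as a partial match (find), which makes no difference here because the pattern is anchored with ^.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of jsoup.
final class HashLinkFilterDemo {
    // Same regex as in the selector: a first character must exist and must not be '#'.
    private static final Pattern NOT_FRAGMENT = Pattern.compile("^[^#]");

    /** Returns true when the href value matches the regex, i.e. would be kept. */
    static boolean keeps(String href) {
        // find() rather than matches(): the regex only has to match at the start of the value.
        return NOT_FRAGMENT.matcher(href).find();
    }

    /** Returns only the hrefs that do not start with '#'. */
    static List<String> filter(List<String> hrefs) {
        return hrefs.stream().filter(HashLinkFilterDemo::keeps).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList("#", "#top", "http://www.google.com");
        System.out.println(filter(hrefs)); // prints [http://www.google.com]
    }
}
```

Note that an empty href is rejected as well, because the pattern requires at least one character.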
Using this code below I get the href from a link element.
String url = null;
List<WebElement> wElements = DriverFactory.getWebDriver().findElements(By.className("link-class"));
if (wElements.size() > 0) {
    url = wElements.get(0).getAttribute("href");
}
The problem is that it finds the element but not the href attribute!!!
If I use "Firefox Inspector" the element appears as <span> with the right class name but without href attribute.
<div><span class="link-class">The Title</span></div>
If I use the "View Page Source" the same element appears as an <a> tag, it has href attribute, but different class name!!!
<a href="/href/attribute/here" class="other-link-class"><span>The Title</span></a>
So, is there any way to get the href of an <a> element as it is shown in "View Page Source"? Using Java, of course.
There can be minor differences between the elements shown through View Source and those shown through the Inspector tool.
Both are browser features that let users look into the HTML of the web page. The main difference is that View Source shows the HTML exactly as it was delivered from the AUT (Application under Test) to the browser, whereas Inspect Element is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. In short, View Source shows the original markup, including the scripts, but not the changes those scripts make to the HTML; the Inspector shows the HTML with errors corrected and script changes applied.
Since the Firefox Inspector shows no href attribute, the same holds when you execute your tests: Selenium works against the live DOM, where the <span> element does not contain an href attribute, so its value can't be collected.
Hi All,
I have a situation where a ::before pseudo-element in the HTML code renders the asterisk (mandatory field marker) on the page. Please see the attached screenshot.
HTML code:
<lightning-input-field class="customRequired abc">
::before
<lightning-picklist>
</lightning-picklist>
</lightning-input-field>
How do I write an XPath for ::before?
I think there is no straightforward solution: ::before is a pseudo-element, not a DOM node, so XPath cannot select it. Use JavaScript instead:
# querySelector takes a CSS selector
css_selector = 'lightning-input-field.customRequired'
browser.execute_script("return window.getComputedStyle(document.querySelector('{}'),':before').getPropertyValue('content')".format(css_selector))
I'm trying to find an element by the text it contains, then check that that element also has a link to a particular place. I'm using selenium/java.
I'm trying to find elements by text when I can to minimise how many changes I will need to make if the UI is updated (reduce test maintenance costs).
I've tried the following, but the assert fails as the getAttribute ends up being null.
WebElement newsHeadlineTemplate = driver.findElement(By.xpath("//*[contains(text(), 'News Headline')]"));
Assert.assertEquals("Template not clickable", "/news/create/new", newsHeadlineTemplate.getAttribute("href"));
HTML for element I'm trying to find/use:
<div class="columns">
<div class="column is-one-third">
<p>News Headline</p>
</div>
</div>
I'm still fairly new to selenium so any help is very much appreciated.
Your XPath selector is slightly off: you're matching the <p> tag, but you need to match the <a> tag, which is the following sibling of the <p> tag.
So you need to amend your expression to look like this:
//p[text()='News Headline']/following-sibling::a
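The effect of the following-sibling axis can be checked outside the browser with the XPath engine that ships with the JDK. This is only a sketch: the sample markup is hypothetical and assumes the <a> sits directly after the <p> inside the same <div>, and the class and method names are invented for the example. It is not Selenium code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class using the JDK's built-in XPath 1.0 engine.
final class FollowingSiblingDemo {
    // Assumed markup: the link is a sibling that follows the <p>.
    static final String SAMPLE =
          "<div class=\"columns\">"
        + "<div class=\"column is-one-third\">"
        + "<p>News Headline</p>"
        + "<a href=\"/news/create/new\">Create</a>"
        + "</div>"
        + "</div>";

    /** Returns the href of the first element matched by the expression, or null if it has none. */
    static String hrefOf(String xml, String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element match = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpression, doc, XPathConstants.NODE);
            if (match == null || !match.hasAttribute("href")) {
                return null; // no element found, or the element is not a link
            }
            return match.getAttribute("href");
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        // The original locator lands on the <p>, which has no href.
        System.out.println(hrefOf(SAMPLE, "//*[contains(text(), 'News Headline')]")); // prints null
        // The amended locator steps from the <p> to the <a> that follows it.
        System.out.println(hrefOf(SAMPLE, "//p[text()='News Headline']/following-sibling::a")); // prints /news/create/new
    }
}
```

In Selenium the same amended expression would be passed to By.xpath, and getAttribute("href") would then be called on the <a> element rather than on the <p>.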
I'm trying to select an element from an auto-suggestion field, but I always get an error saying that the element could not be found, even though I'm sure my XPath is correct.
Here's my code:
wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//*[@class=\"ui-menu-item-with-icon ui-menu-item\"][1]")));
driver.findElement(By.xpath("//*[@class=\"ui-menu-item-with-icon ui-menu-item\"][1]")).click();
It should find //*[@class="ui-menu-item-with-icon ui-menu-item"], which is the first suggestion, Albert Camus.
Here's the outerHTML:
<li class="ui-menu-item-with-icon ui-menu-item" role="menuitem">
<a class="ui-corner-all" tabindex="-1">
<span class="item-icon"></span>
Albert Camus (SARCELLES)</a>
</li>
Your XPath is more or less OK, apart from the wildcard, which may result in longer processing, so you can use li instead of *.
Another option is to target the <a> tag containing the text you would like to click, using the normalize-space() function, something like:
//a[normalize-space()="Albert Camus (SARCELLES)"]
Also, your popup may reside within an iframe, so you might have to switch the WebDriver context to the relevant iframe element.
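Why normalize-space() helps with this markup can be sketched with the JDK's own XPath engine. The sample below reproduces the <li> from the question, line breaks included; the class and method names are invented for the example, and the code does not involve Selenium or an iframe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class using the JDK's built-in XPath 1.0 engine.
final class NormalizeSpaceDemo {
    // The suggestion item from the question, keeping the line breaks inside <a>.
    static final String SAMPLE =
          "<li class=\"ui-menu-item-with-icon ui-menu-item\" role=\"menuitem\">\n"
        + "<a class=\"ui-corner-all\" tabindex=\"-1\">\n"
        + "<span class=\"item-icon\"></span>\n"
        + "Albert Camus (SARCELLES)</a>\n"
        + "</li>";

    /** Returns how many nodes the XPath expression selects in the given markup. */
    static int countMatches(String xml, String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        // An exact text() comparison fails because the text node starts with a line break.
        System.out.println(countMatches(SAMPLE, "//a[text()='Albert Camus (SARCELLES)']")); // prints 0
        // normalize-space() trims and collapses whitespace before comparing.
        System.out.println(countMatches(SAMPLE, "//a[normalize-space()='Albert Camus (SARCELLES)']")); // prints 1
        // The class-based locator with li instead of the wildcard.
        System.out.println(countMatches(SAMPLE, "//li[@class='ui-menu-item-with-icon ui-menu-item'][1]")); // prints 1
    }
}
```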
Why don't you try linkText instead of XPath?
In my experience, linkText is more stable than a structural XPath.
Code:
wait.until(ExpectedConditions.visibilityOfElementLocated(By.partialLinkText("Albert Camus (SARCELLES)")));
I'm not sure about the whitespace in your HTML, which is why I used partialLinkText.
I am filtering links out of an HTML body using jsoup.
For a web page such as https://en.wikipedia.org/wiki/Cloud_computing
I want to filter out links such as:
https://en.wikipedia.org/wiki/Light
For hash links such as en.wikipedia.org/wiki/Cloud_computing#cite_note-1,
I tried doc.select("a[href*=#]").remove(); and it works well where the hash links in the page's HTML source look like <a href="#cite_ref-1">.
But then I use doc.select("a[href]*=/]").remove(); for links that appear in the page's HTML source as:
CH
But there are still links that are not filtered. How is this possible?
You have a typo.
doc.select("a[href]*=/]").remove();
It should be like this:
doc.select("a[href*=/]").remove();
But this would remove every link containing a /.
Is this what you want, or do you want to remove every link that starts with /?
In that case you need this:
doc.select("a[href^=/]").remove();
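The difference between the two selectors comes down to substring versus prefix matching on the href value. The sketch below mimics that with plain String.contains and String.startsWith on a few sample hrefs; it does not call jsoup, and the class name, method names and the sample list are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class that mimics the two attribute selectors with plain string tests.
final class HrefSelectorSemanticsDemo {
    /** Hrefs that a substring selector such as a[href*=value] would match. */
    static List<String> matchedByContains(List<String> hrefs, String value) {
        return hrefs.stream().filter(h -> h.contains(value)).collect(Collectors.toList());
    }

    /** Hrefs that a prefix selector such as a[href^=value] would match. */
    static List<String> matchedByPrefix(List<String> hrefs, String value) {
        return hrefs.stream().filter(h -> h.startsWith(value)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "#cite_ref-1",
                "/wiki/Light",
                "https://en.wikipedia.org/wiki/Light");

        // Substring match: absolute URLs contain '/' too, so they are matched as well.
        System.out.println(matchedByContains(hrefs, "/"));
        // prints [/wiki/Light, https://en.wikipedia.org/wiki/Light]

        // Prefix match: only site-relative links that begin with '/'.
        System.out.println(matchedByPrefix(hrefs, "/"));
        // prints [/wiki/Light]
    }
}
```

So calling remove() on the *= selection would also drop absolute links, while the ^= selection only touches links that begin with a slash.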