Selenium chrome driver scrape dynamically add attribute in element


Hello, I am new to Selenium ChromeDriver. I am scraping an e-commerce website, collecting all product details from the home page, but on that page the product images are loaded dynamically (5-7 seconds after the products load).
The source code looks like this:
<img alt="product1" class="image" />
After 5-7 seconds it becomes:
<img alt="product1" class="image" src="product image url" />
So I want to scrape the value of that image's src attribute.
I tried the following:
driver.manage().timeouts().pageLoadTimeout(20, TimeUnit.SECONDS);
or
driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
or
Thread.sleep(20000);
but none of them worked.
Can anybody help me get the image's src attribute value?

Selenium's "FluentWait" is your friend
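import java.time.Duration;
import java.time.temporal.ChronoUnit;
import org.apache.commons.lang3.StringUtils; // assumes Apache Commons Lang 3 on the classpath
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException; // Selenium's, not java.util's
import org.openqa.selenium.ScriptTimeoutException;
import org.openqa.selenium.StaleElementReferenceException;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.FluentWait;

final WebElement imgWithSrc = new FluentWait<>(driver)
        .withTimeout(Duration.of(10_000, ChronoUnit.MILLIS))
        .pollingEvery(Duration.of(250, ChronoUnit.MILLIS))
        .ignoring(NoSuchElementException.class)
        .ignoring(StaleElementReferenceException.class)
        .ignoring(ScriptTimeoutException.class)
        .until(d -> {
            final WebElement imgElement = d.findElement(By.cssSelector("img.image"));
            if (StringUtils.isNotBlank(imgElement.getAttribute("src"))) {
                return imgElement;
            }
            return null; // src not set yet: FluentWait keeps polling
        });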
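withTimeout(...) sets the maximum wait of 10 s, and pollingEvery(...) re-checks the condition every 250 ms. Returning null from the lambda tells FluentWait the condition is not met yet, so it keeps polling.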
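To make the behaviour of that chain concrete, here is a minimal, Selenium-free sketch of the loop a FluentWait runs: call the condition, treat null (or Boolean false) as "not yet", swallow only the listed exception types, sleep for the polling interval, and give up at the timeout. PollingWaitSketch, until and WaitTimeoutException are hypothetical names for illustration only; in real code you would keep using Selenium's FluentWait, which throws org.openqa.selenium.TimeoutException when time runs out.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical, simplified model of FluentWait's polling loop (not Selenium API).
class PollingWaitSketch {

    /** Thrown when the condition is still unmet after the timeout. */
    static class WaitTimeoutException extends RuntimeException {
        WaitTimeoutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Calls condition until it returns something other than null or Boolean.FALSE.
     * Exceptions that are instances of one of the ignored types count as "not ready yet";
     * any other exception is rethrown immediately.
     */
    static <T> T until(Supplier<T> condition, Duration timeout, Duration interval,
                       Class<?>... ignored) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastError = null;
        while (true) {
            try {
                T value = condition.get();
                if (value != null && !Boolean.FALSE.equals(value)) {
                    return value; // condition satisfied
                }
            } catch (RuntimeException e) {
                if (Arrays.stream(ignored).noneMatch(type -> type.isInstance(e))) {
                    throw e; // not in the ignore list: fail fast
                }
                lastError = e; // kept as the cause if we time out
            }
            if (System.nanoTime() >= deadline) {
                throw new WaitTimeoutException("Condition not met within " + timeout, lastError);
            }
            try {
                Thread.sleep(interval.toMillis()); // polling interval
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting", ie);
            }
        }
    }
}
```

In a Selenium test the supplier would be the findElement-plus-src check from the answer above, and the ignored types would be Selenium's NoSuchElementException and StaleElementReferenceException.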

Try this:
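WebElement image = new FluentWait<WebDriver>(driver)
        .withTimeout(Duration.of(10, ChronoUnit.SECONDS))
        // a plain FluentWait ignores no exceptions by default, so keep polling while
        // the element is missing (org.openqa.selenium.NoSuchElementException)
        .ignoring(NoSuchElementException.class)
        .until(
                ExpectedConditions.presenceOfElementLocated(
                        By.xpath("//img[@alt='product1'][@src]")
                )
        );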
The code above makes the waiter poll the DOM for up to 10 seconds until the element described by the XPath appears. The [@src] part of the XPath matches only an element that has a src attribute, so nothing is returned until that attribute has been assigned.
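To see why the [@src] predicate waits for the attribute, the same XPath can be evaluated outside the browser with the JDK's built-in javax.xml.xpath (XPath 1.0) against the two snippets from the question. This is a sketch under the assumption that the snippets are well-formed XML; real pages are often not, and Selenium evaluates XPath against the browser's live DOM instead. SrcPredicateDemo and countMatches are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Evaluates the answer's XPath with the JDK's XPath 1.0 engine (no browser involved).
class SrcPredicateDemo {

    // img with alt='product1' AND an src attribute present (any value, even empty)
    static final String XPATH = "//img[@alt='product1'][@src]";

    /** Parses a well-formed XHTML snippet and returns how many nodes XPATH matches. */
    static int countMatches(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(XPATH, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath on: " + xhtml, e);
        }
    }
}
```

One design caveat: [@src] only checks that the attribute exists, so an img with src="" already matches. If the page sets an empty placeholder first, //img[@alt='product1'][normalize-space(@src)!=''] or the StringUtils.isNotBlank check from the FluentWait answer is the stricter test.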

Related

Unable to parse dynamic content with Selenium

I'm practicing by trying to parse behance.net to retrieve .jpg files.
First I tried JSOUP, but I only received JS code, nothing useful. Then I tried Selenium:
System.setProperty("webdriver.chrome.driver", "S:\\behance-id\\src\\main\\resources\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("https://www.behance.net/gallery/148589707/Hercules-and-Randy");
String str = driver.getPageSource();
And I got the same result. Using Google Chrome's Inspect option I found what I need:
But I cannot access that page source via Selenium, JSOUP, or other tools.
I only receive this, with <script> tags:
Is it possible?

That page loads its resources dynamically, after the original HTML, so you should use waits in Selenium. Here is the Java example from the documentation for waiting until an element has loaded on the page:
WebDriver driver = new ChromeDriver();
driver.get("https://google.com/ncr");
driver.findElement(By.name("q")).sendKeys("cheese" + Keys.ENTER);
// Initialize and wait till element(link) became clickable - timeout in 10 seconds
WebElement firstResult = new WebDriverWait(driver, Duration.ofSeconds(10))
.until(ExpectedConditions.elementToBeClickable(By.xpath("//a/h3")));
// Print the first result
System.out.println(firstResult.getText());
The documentation can be found at https://www.selenium.dev/documentation/webdriver/waits/

xPath Text Contains Selenium Web Driver

I'm trying to select an element based on its text contents. I am using XPath to achieve this.
I am puzzled, because this should work:
WebElement link = obj.driver.findElement(By.xpath("//div[contains(text(), 'Notifications')]"));
I'll even copy the HTML code:
<div class="linkWrap noCount">Notifications <span class="count _5wk0 hidden_elem uiSideNavCountText">(<span class="countValue fsm">0</span><span class="maxCountIndicator"></span>)</span></div>
The div element contains the word "Notifications", so why doesn't it work?
Go to this page on Facebook: https://www.facebook.com/settings
Use this Chrome extension to highlight any element via XPath.

You have a space before the word Notifications:
WebElement link = obj.driver.findElement(By.xpath("//div[contains(text(), 'Notifications')]"));
You should also wait for the element before trying to find it:
WebDriverWait wait = new WebDriverWait(obj.driver, Duration.ofSeconds(timeoutInSeconds));
WebElement link = wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//div[contains(text(), 'Notifications')]")));

I found the issue with the help of some amazing people in this community.
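OK, so my element was inside an iframe.
To access the element, I first have to switch to the iframe:
WebElement iframe = obj.driver.findElement(By.xpath("//iframe[@tabindex='-1']"));
obj.driver.switchTo().frame(iframe);
WebElement link = obj.driver.findElement(By.xpath("//div[contains(text(), 'Notifications')]")); // now searched inside the iframe
obj.driver.switchTo().defaultContent(); // back to the top-level page when done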
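Since the fix was switching into the iframe, the locator itself appears to have been fine. One way to confirm that before suspecting the XPath is to evaluate it against the copied HTML outside the browser, for example with the JDK's built-in javax.xml.xpath (XPath 1.0). XPathSnippetCheck is a hypothetical helper; it only works on well-formed snippets like the one in the question, and it knows nothing about iframes, which is exactly why a locator that matches here but fails in Selenium points at frame context.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Checks an XPath against a static snippet, outside the browser and outside any iframe.
class XPathSnippetCheck {

    /** Returns how many nodes of the given well-formed snippet match the XPath. */
    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }
}
```

Against the question's div, //div[contains(text(), 'Notifications')] matches one node; //div[normalize-space(text())='Notifications'] is an exact-match alternative that ignores the trailing space after the word.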

Selenium - iterate over elements

My goal is to:
Iterate over the WebElements on a web page
Click each element found and open its link in the same session
Parse the new page with some other logic
Go back to the previous page and continue the loop for all previously matched ids
I have this code:
List<WebElement> links = driver.findElements(By.cssSelector("div[data-sigil='touchable']"));
// this gets a list of all the matching elements on the page
for (WebElement ele : links)
{
    System.out.println("test->" + ele.getAttribute("id"));
    ele.click();
    Thread.sleep(500);
    System.out.println("URI->" + driver.getCurrentUrl());
    js.executeScript("window.history.go(-1)");
}
return "ok";
This works fine: it finds the correct element ids and ele.click() actually works, but it always fails once I execute js.executeScript("window.history.go(-1)").
This is my error message:
org.openqa.selenium.StaleElementReferenceException: stale element reference: element is not attached to the page document
(Session info: chrome=73.0.3683.103)
(Driver info: chromedriver=2.40.565498 (ea082db3280dd6843ebfb08a625e3eb905c4f5ab),platform=Windows NT 10.0.17134 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 0 milliseconds
So basically I'm not able to continue the loop.
Is there any useful technique to "click into a new tab" and manage different Selenium driver sessions?
Thanks a lot in advance for any suggestion.

What happens is that navigating to another page makes every element in the list stale: those elements are no longer attached to the page when you come back to it. You need to find the elements again every time you load the page.
Try this:
List<WebElement> links = driver.findElements(By.cssSelector("div[data-sigil='touchable']"));
// this gets a list of all the matching elements on the page
String address;
for (int i = 0; i < links.size(); i++) {
    address = driver.getCurrentUrl();
    links = driver.findElements(By.cssSelector("div[data-sigil='touchable']"));
    System.out.println("size: " + links.size());
    WebElement ele = links.get(i);
    System.out.println("test->" + ele.getAttribute("id"));
    ele.click();
    Thread.sleep(500);
    System.out.println("URI->" + driver.getCurrentUrl());
    //js.executeScript("window.history.go(-1)");
    //driver.navigate().back();
    driver.get(address);
}
Edit:
Try driver.get(), as it waits for the page to load. Alternatively, add another sleep like the one you used after the click.
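The fix above boils down to one rule: keep the index, not the element, and look the elements up again on every pass. The sketch below isolates that rule in plain Java so it can be reasoned about without a browser. RelocatingLoop and forEachFresh are hypothetical names; in Selenium the locator would be () -> driver.findElements(By.cssSelector("div[data-sigil='touchable']")) and the visitor would click, read the URL and navigate back.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: iterate by index and re-locate the items before every visit,
// so no element reference is reused after the page has been reloaded.
class RelocatingLoop {

    /** Visits each item once, calling locate again before each visit; returns how many were visited. */
    static <T> int forEachFresh(Supplier<List<T>> locate, Consumer<T> visit) {
        int expected = locate.get().size(); // number of items on the initial page
        int visited = 0;
        for (int i = 0; i < expected; i++) {
            List<T> fresh = locate.get(); // fresh lookup after the previous navigation
            if (i >= fresh.size()) {
                break; // the page now shows fewer items; stop instead of throwing
            }
            visit.accept(fresh.get(i));
            visited++;
        }
        return visited;
    }
}
```

The bounds check is a design choice: if the reloaded page shows fewer elements than the first one did, links.get(i) in the loop above would throw IndexOutOfBoundsException, whereas this version simply stops.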

I think you need to create the js object, like so.
Reason being that you "lost" the reference to the JavascriptExecutor
List<WebElement> links = driver.findElements(By.cssSelector("div[data-sigil='touchable']"));
// this gets a list of all the matching elements on the page
for (WebElement ele : links) {
    System.out.println("test->" + ele.getAttribute("id"));
    ele.click();
    Thread.sleep(500);
    System.out.println("URI->" + driver.getCurrentUrl());
    // Re-initialise the JavaScript executor
    JavascriptExecutor js = (JavascriptExecutor) driver;
    js.executeScript("window.history.go(-1)");
}
return "ok";

Selenium WebDriver- Java- It is not willing to click a link

I have tried quite a lot. I am not getting any errors, but it is not printing anything (I want it to print the title of the page):
WebDriver driver = new HtmlUnitDriver();
WebElement element = driver.findElement(By.cssSelector("a[href*='Alerts.htm']"));
element.click();
System.out.println(driver.getTitle());
Here is the HTML code (the part I wish to click). Both the page I want to open and the current page have a title.
<li title="Alerts"><span>Alerts</span></li>
I am not getting any errors, but it should print the title, which it does not.
I have followed many instructions found here and on the web.
Things I have tried so far:
By locator = By.xpath("//li[@title='Alerts']/a");
WebElement element = driver.findElement(locator);
element.click();
WebElement element = driver.findElement(By.partialLinkText("Alert"));
element.click();
Where am I going wrong?

The title of an HTML document is defined within a <title> tag, typically in the <head> section.
This is the title that the getTitle method returns.
See http://www.w3schools.com/tags/tag_title.asp.

I am not sure about this, but I think HtmlUnitDriver is a headless browser instance. Kindly try another browser, Firefox perhaps:
WebDriver driver = new FirefoxDriver();

You first need to open a page in the browser!
WebDriver driver = new ... // CREATING THE DRIVER STARTS THE BROWSER
driver.get(url); // THIS LOADS THE PAGE; WITHOUT IT THERE IS NOTHING TO FIND
// now you can actually do things
driver.findElement ...
driver.getTitle();
driver.quit(); // TO DISMISS THE BROWSER

How to use selenium to collect information from an element that has not been loaded yet

Hi, I am very new to Selenium and I want to collect all the elements that have a particular span id.
Problem: when Selenium opens the web page, it shows (by default) only the first three divs, each containing approximately 50 lines of data. I want to fetch the text from the spans contained in all divs. Is there a way to fetch information from the divs that have not loaded yet? If not, how can I load those divs by controlling the scroll bar?

You can use the code below to scroll the page down, but fetching information from a div that has not been loaded is not possible.
WebDriver driver = new FirefoxDriver();
JavascriptExecutor jse = (JavascriptExecutor)driver;
jse.executeScript("window.scrollBy(0,300)", "");
OR
JavascriptExecutor js = (JavascriptExecutor)driver;
js.executeScript("window.scrollTo(0,Math.max(document.documentElement.scrollHeight," +
"document.body.scrollHeight,document.documentElement.clientHeight));");

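The scripts above scroll once. For the question's case (divs that only load as you scroll), one common approach is to keep scrolling until the page height stops growing, then collect the spans. Below is a sketch in plain Java with the browser calls abstracted away; ScrollUntilStable, scroll, the pause and the round limit are hypothetical. With Selenium, scrollToBottom could be () -> js.executeScript("window.scrollTo(0, document.body.scrollHeight);") and pageHeight could be () -> ((Number) js.executeScript("return document.body.scrollHeight;")).longValue().

```java
import java.util.function.LongSupplier;

// Hypothetical helper: scroll repeatedly until the page stops getting taller.
class ScrollUntilStable {

    /**
     * Runs scrollToBottom until pageHeight no longer grows (or maxRounds is reached),
     * pausing between rounds so lazily loaded content can arrive.
     * Returns the number of scroll rounds performed.
     */
    static int scroll(Runnable scrollToBottom, LongSupplier pageHeight, long pauseMillis, int maxRounds) {
        long lastHeight = pageHeight.getAsLong();
        int rounds = 0;
        while (rounds < maxRounds) {
            scrollToBottom.run();
            rounds++;
            sleep(pauseMillis); // give the page time to append new divs
            long newHeight = pageHeight.getAsLong();
            if (newHeight <= lastHeight) {
                break; // nothing new loaded: assume we reached the end
            }
            lastHeight = newHeight;
        }
        return rounds;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while scrolling", e);
        }
    }
}
```

The maxRounds cap keeps a truly infinite feed from looping forever; once scroll returns, driver.findElements(...) sees every div that was loaded.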