Extract an image url from a LI in an UL - java

I am trying to extract the ng-src value of the <img> from the following HTML, where everything after //images.mycar.com... is dynamic.
It is a <ul> with many <li> items, and the image I want is in the first item of the list. I want to capture it so I can construct a URL and click on the image.
Sorry if my post is confusing; I will try to clarify.
In the example, //images.mycar.com/BJ/11/BJ11TFZ/BJ11TFZ-used-FORD-FIESTA-DIESEL-HATCHBACK-1-6-TDCi-95-Titanium-3dr-Diesel-Manual-RED-2011-HR-S-01.jpg
is the value I want to extract, so I can then click on the image.
<!-- Begin Results List -->
<ul id="resultsLists" class="o-media-list c-animate c-animate--show" ng-show="rc.results.vehicles.length" style="">
<li id="BJ11TFZ" class="c-animate c-animate--repeat u-pb ng-scope" ng-repeat="car in rc.results.vehicles track by car.registration" style="">
<div class="o-media c-card c-card--hover c-card__block u-p-0 u-shadowed u-shadowed--hover">
<div class="o-media__left o-grid__col-sm-5 o-grid__col-xs-12 u-p-0 u-no-float--sm">
<a ng-href="/used-car/FORD/FIESTA/BJ11TFZ" class="o-embed-responsive o-embed-responsive--16by9 o-media__object c-rollover" ng-click="rc.viewCar(car, $index)" href="/used-car/FORD/FIESTA/BJ11TFZ">
<img ng-src="//images.mycar.com/BJ/11/BJ11TFZ/BJ11TFZ-used-FORD-FIESTA-DIESEL-HATCHBACK-1-6-TDCi-95-Titanium-3dr-Diesel-Manual-RED-2011-HR-S-01.jpg" alt="FORD FIESTA" class="o-embed-responsive__item c-rollover__image" cs-src-responsive="[ [ 'small', '//images.mycar.com/BJ/11/BJ11TFZ/BJ11TFZ-used-FORD-FIESTA-DIESEL-HATCHBACK-1-6-TDCi-95-Titanium-3dr-Diesel-Manual-RED-2011-HR-M-01.jpg' ], [ 'retina', '//images.mycar.com/BJ/11/BJ11TFZ/BJ11TFZ-used-FORD-FIESTA-DIESEL-HATCHBACK-1-6-TDCi-95-Titanium-3dr-Diesel-Manual-RED-2011-HR-M-01.jpg' ] ]" src="//images.mycar.com/BJ/11/BJ11TFZ/BJ11TFZ-used-FORD-FIESTA-DIESEL-HATCHBACK-1-6-TDCi-95-Titanium-3dr-Diesel-Manual-RED-2011-HR-M-01.jpg">
</a>
</div>

If I understand your question correctly, you are trying to extract the link below from the code above.
//images.mycar.com/BJ/11/BJ11TFZ/BJ11TFZ-used-FORD-FIESTA-DIESEL-HATCHBACK-1-6-TDCi-95-Titanium-3dr-Diesel-Manual-RED-2011-HR-S-01.jpg
That's simple enough. A pure JavaScript solution would be:
var imgUrl = document.getElementById("resultsLists").getElementsByTagName('img')[0].getAttribute('ng-src');
The code starts by selecting the main UL of the document. Then it selects the image itself. Then it gets the "ng-src" attribute of the image and saves it to the variable imgUrl.
Let me know if this is what you were trying to do.

What exactly is the problem? If you are using Selenium WebDriver with Java 8, try this approach:
List<String> imageUrls = driver.findElements(By.xpath("//ul[@id='resultsLists']/li//img")).stream()
    .map(e -> e.getAttribute("ng-src"))
    .collect(Collectors.toList());

You could get it like this:
String href = driver.findElement(By.xpath("//ul/li/div/div/a")).getAttribute("href");
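For readers who want to check the XPath from the answers above without starting a browser, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath engine instead of Selenium. The XML string is a trimmed, well-formed stand-in for the page (the image file names and the second list item are placeholders), the names FirstImageUrl, firstNgSrc and toAbsolute are hypothetical, and expanding the protocol-relative URL with https is an assumption about the site.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FirstImageUrl {
    // Trimmed, well-formed stand-in for the results list; file names are placeholders.
    static final String SAMPLE =
        "<ul id='resultsLists'>"
      + "<li id='BJ11TFZ'><a href='/used-car/FORD/FIESTA/BJ11TFZ'>"
      + "<img ng-src='//images.mycar.com/BJ/11/BJ11TFZ/first-S-01.jpg'/></a></li>"
      + "<li id='XX00XXX'><a href='/used-car/FORD/FOCUS/XX00XXX'>"
      + "<img ng-src='//images.mycar.com/XX/00/XX00XXX/second-S-01.jpg'/></a></li>"
      + "</ul>";

    // Returns the ng-src of the first <img> under the list, or "" if nothing matches.
    static String firstNgSrc(String xml) {
        // Parentheses make [1] pick the first match in document order.
        String expr = "(//ul[@id='resultsLists']/li//img)[1]/@ng-src";
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Expands a protocol-relative URL ("//host/path"); the https scheme is an assumption.
    static String toAbsolute(String src) {
        return src.startsWith("//") ? "https:" + src : src;
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute(firstNgSrc(SAMPLE)));
    }
}
```

With Selenium itself, driver.findElement(...) already returns the first matching element, so the same XPath without the parenthesized [1] followed by getAttribute("ng-src") should give the same value.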

Related

Is there a way to parse an entire HTML tag in JSoup?

Hi I'm wondering if there's a way to parse an entire HTML tag using JSoup? In my example pictures below, the five elements (4 images and 1 string) are all inside the "li" container. However, when you open the "li" tag, there are multiple nested containers. Is there a way to parse it so that I have access to all 5 elements contained in this "li" tag? I'm thinking of using getElementsMatchingOwnText("Collins") but that seems to only get me "span class="text text_14 mix-text_color7">Panorama". Any help would be appreciated, thanks!
Yes, you can iterate over the children of your <li> tag using jsoup.
Here is a simplified version of the HTML in your screenshot, showing the 5 elements:
<li>
<span class="foo"><img src="bar" class="img"></span>
<span class="bar">Collins</span>
<i class="baz1"><img src="baz1" class="img"></i>
<i class="baz2"><img src="baz2" class="img"></i>
<i class="baz3"><img src="baz3" class="img"></i>
</li>
Assuming you have selected this specific <li> tag in your document, you can use the following approach:
String html = "<li><span class=\"foo\"><img src=\"bar\" class=\"img\"></span><span class=\"bar\">Collins</span><i class=\"baz1\"><img src=\"baz1\" class=\"img\"></i><i class=\"baz2\"><img src=\"baz2\" class=\"img\"></i><i class=\"baz3\"><img src=\"baz3\" class=\"img\"></i></li>";
Document document = Jsoup.parse(html);
Element element = document.selectFirst("li");
element.children().forEach(child -> {
    // do your processing here - this is just an example:
    if (child.hasText()) {
        System.out.println(child.text());
    } else {
        System.out.println(child.html());
    }
});
The above code prints the following output:
<img src="bar" class="img">
Collins
<img src="baz1" class="img">
<img src="baz2" class="img">
<img src="baz3" class="img">
UPDATE
If the starting point is a URL, then you would need to start with this:
Document document = Jsoup.connect("https://www...").get();
Then the exercise is about identifying a unique way to find your specific element. So, if we update my earlier example, let's assume your web page is like this:
<html>
<head>...</head>
<body>
<div>
<ul class="vList_4">
<li>
<span class="foo"><img src="bar" class="img"></span>
<span class="bar">Collins</span>
<i class="baz1"><img src="baz1" class="img"></i>
<i class="baz2"><img src="baz2" class="img"></i>
<i class="baz3"><img src="baz3" class="img"></i>
</li>
</ul>
</div>
</body>
</html>
Here we have a class in a <ul> tag called vList_4. If that is a unique class name, we can use it to jump to that section of the HTML page (IDs are better than class names because they are guaranteed to be unique - but I did not see any ID names in your screenshot).
Now, instead of my previous selector:
Element element = document.selectFirst("li");
We can use this more specific selector:
Element element = document.selectFirst("ul.vList_4 li");
The same results will be printed as before.
So, it's all about you looking at the page structure and figuring out how to jump to the relevant section of the page.
See here for technical details describing how selectors are constructed.

Extract Text from Span Element via XPath [Selenium]

I have the following HTML:
<div class="a-row a-spacing-small a-size-small">
<div class="a-row">
<a class="a-link-normal a-declarative g-visible-js reviewStarsPopoverLink" href="#" data-action="a-popover" data-a-popover="{"closeButton":"false","url":"/gp/customer-reviews/widgets/average-customer-review/popover/ref=wl_it_o_cm_cr_acr_img_hz?ie=UTF8&a=B05555JQP&contextId=wishi&link=1&seeall=1","name":"review-hist-pop.B075555RJQP","max-width":"700","position":"triggerBottom","data":{"itemId":"I2555555554GT","isGridViewInnerPopover":""},"header":"","cache":"true"}">
<i id="review_stars_I2J55555554GT" class="a-icon a-icon-star a-star-4-5">
<span class="a-icon-alt">4.5 out of 5 stars</span>
</i>
<i class="a-icon a-icon-popover"/>
</a>
<a class="a-link-normal g-visible-no-js" href="/product-reviews/B075555JQP/ref=wl_it_o_cm_cr_acr_txt_hz?ie=UTF8&colid=2K4U5555551D&coliid=I2J5555555T&showViewpoints=1">
<span class="a-letter-space"/>
<a id="review_count_I2J55555555GT" class="a-link-normal" href="/product-reviews/B05555555P/ref=wl_it_o_cm_cr_acr_txt_hz?ie=UTF8&colid=255555555D&coliid=I2555555GT&showViewpoints=1">(68)</a>
</div>
<div class="a-row">
<div class="a-row a-size-small itemAvailability">
<div class="a-row itemUsedAndNew">
</div>
I'm trying to extract the value 4.5 out of 5 stars via one of the following XPath:
.//*[contains(@id,'review_stars')]/span[@class='a-icon-alt']
.//*[contains(@id,'review_stars')]
However, everything that I've tried so far has failed (returns empty String)
The funny thing is that all of these XPaths actually work in Firebug so I'm not sure why it isn't working in my program (I suspect it has something to do with the fact that the rating isn't actually visible in browser unless you hover over a specific element but I'm not sure if/why/how this would cause the above mentioned problem and how to fix it)
Thanks!
You failed to include the image between the anchor and span. The span is inside the image, not a sibling of the anchor.
try:
.//*[contains(@id,'review_stars')]/i/span[@class='a-icon-alt']
I will attempt to answer my own question although I do not entirely understand why my previous code isn't working. If someone could provide me with an in depth explanation I will accept their answer as the final answer.
For now this is what works for me:
Instead of calling element.getText(); call element.getAttribute("innerHTML");
This returns the correct result but I would like to understand why getText() does not work in this case. Again, if someone could provide an XPath that works or could provide explanation to all this I will accept it as the final answer.
Thanks
To extract the value 4.5 out of 5 stars through XPath you can use :
//a[@class='a-link-normal a-declarative g-visible-js reviewStarsPopoverLink']/i[starts-with(@id,'review_stars_') and @class='a-icon a-icon-star a-star-4-5']/span[@class='a-icon-alt']
Update :
As you mentioned: "This does not work either. I just tried it." You must have missed part of the XPath I provided. My answer has been tested. See the snapshot below:
Note: Although your question was about XPath, your own answer is about the getText() and getAttribute("innerHTML") methods. However, my answer works with both getText() and getAttribute("innerHTML").
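To separate the two possible causes discussed in this thread (a wrong XPath versus getText() returning nothing), the sketch below checks only the XPath, using the JDK's javax.xml.xpath engine on a simplified, well-formed stand-in for the markup. The class name ReviewStarsXPath, the method ratingText and the shortened ids are hypothetical. If the XPath matches here but getText() is still empty in Selenium, a likely explanation is that WebElement.getText() returns only visible text, so a span hidden by CSS (as the asker suspects) yields an empty string, while getAttribute("innerHTML") or getAttribute("textContent") reads the content regardless of visibility.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ReviewStarsXPath {
    // Well-formed stand-in for the rating markup; ids and hrefs are simplified.
    static final String SAMPLE =
        "<div class='a-row'>"
      + "<a class='reviewStarsPopoverLink' href='#'>"
      + "<i id='review_stars_X' class='a-icon a-icon-star a-star-4-5'>"
      + "<span class='a-icon-alt'>4.5 out of 5 stars</span></i></a>"
      + "<a id='review_count_X' class='a-link-normal' href='#'>(68)</a>"
      + "</div>";

    // Returns the text of the rating span, or "" if the XPath matches nothing.
    static String ratingText(String xml) {
        // The <i> carries the id, and the span is its direct child.
        String expr = "//i[starts-with(@id,'review_stars_')]/span[@class='a-icon-alt']";
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ratingText(SAMPLE));
    }
}
```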

How do I click 1 of 2 links with same name, no id and same class. selenium java

I'm trying to get Selenium to click the Select button, but I can't use By.linkText() because there are two buttons with the same name.
I'm using this XPath, ".//*[contains(@id,'view-something_111111_2A22DF2_)']/div/a[text()='Select']";, to find the button, but it can't find it.
I've also tried ".//*[contains(@id,'view-something_111111_2A22DF2_)']/div/a";.
I've looked over the Selenium documentation and can't seem to find a solution.
Here is the section of website code:
<div id="view-something_111111_2A22DF2_0" class="coverage-wrap collapse" aria-expanded="false" style="height: 30px;">...</div>
<div class="btn-raplace">
<a class="btn-beer" data-toggle="collapse" data-target="#view-effectData_111111_2A22DF2_0">Select</a>
for reference, the second Select button has this code:
<div id="view-something_111111_2A3B5DF2_0" class="coverage-wrap collapse" aria-expanded="false" style="height: 30px;">...</div>
<div class="btn-raplace">
<a class="btn-beer" data-toggle="collapse" data-target="#view-effectData_111111_2A3B5DF2_0">Select</a>
which is why I am using the id in my xpath.
Thanks.
You can try this XPath: //*[@class="btn-raplace"]/a[@class="btn-beer"][1]. Here [1] is the position of the button you want to click.
I can see two mistakes in the Xpath you are using.
First Mistake:
.//*[contains(@id,'view-something_111111_2A22DF2_)'] is incorrect. You have placed the single quote in the wrong place. It should be
//div[contains(@id,'view-something_111111_2A22DF2')]
Second Mistake
The element div with the class="btn-raplace" is not the child of the above element. I can see in the HTML that the above element has the closing tags before this element.
Please replace your XPATH with:
//div[contains(@id,'view-something_111111_2A22DF2')]/following-sibling::div[1]/a
Here is the answer to your question.
Use this XPath:
//div[@class='btn-raplace']/a[@class='btn-beer']
Let me know if this answers your question.
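The difference between the answers above can be checked offline. The sketch below runs the XPaths through the JDK's javax.xml.xpath engine (not Selenium) against a well-formed stand-in for the two panels; the <root> wrapper, the class name SiblingSelect and its method names are assumptions made for illustration. It suggests that the class-only XPath matches both Select links, while the following-sibling form picks exactly the link after the chosen panel.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SiblingSelect {
    // Stand-in for the page: each panel div is followed by a sibling button div.
    static final String SAMPLE =
        "<root>"
      + "<div id='view-something_111111_2A22DF2_0' class='coverage-wrap collapse'/>"
      + "<div class='btn-raplace'><a class='btn-beer'"
      + " data-target='#view-effectData_111111_2A22DF2_0'>Select</a></div>"
      + "<div id='view-something_111111_2A3B5DF2_0' class='coverage-wrap collapse'/>"
      + "<div class='btn-raplace'><a class='btn-beer'"
      + " data-target='#view-effectData_111111_2A3B5DF2_0'>Select</a></div>"
      + "</root>";

    // Evaluates an XPath against SAMPLE and returns all matching nodes.
    static NodeList select(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // data-target of the link that follows the panel whose id contains idPart,
    // or null unless exactly one link matches.
    static String targetAfterPanel(String idPart) {
        NodeList hits = select(
                "//div[contains(@id,'" + idPart + "')]/following-sibling::div[1]/a");
        return hits.getLength() == 1
                ? hits.item(0).getAttributes().getNamedItem("data-target").getNodeValue()
                : null;
    }

    // Number of links matched when only the class names are used.
    static int classOnlyMatches() {
        return select("//div[@class='btn-raplace']/a[@class='btn-beer']").getLength();
    }

    public static void main(String[] args) {
        System.out.println(targetAfterPanel("view-something_111111_2A22DF2"));
        System.out.println(classOnlyMatches());
    }
}
```

In Selenium, the same following-sibling XPath can be passed to By.xpath(...) and the resulting element clicked.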

How fetch first element from json list in selenium java?

I have following list of ids in json format. I want to access first id in selenium using java. I tried using
String item = driver.findElement(By.xpath("//ul//li[1]")).getText();
but didn't help.
<body>
<div id="json">
<span class="collapser"></span>
{
<ul class="obj collapsible">
<li>
<span class="prop" title="<root>.hdps">
<span class="q">"</span>
hdps
<span class="q">"</span>
</span>
:
<span class="collapser"></span>
[
<ul class="array collapsible">
<li>
<span class="num">65085</span>
,
</li>
<li>
<span class="num">65089</span>
,
</li>
<li>
<span class="num">65711</span>
,
</li>
</ul>
]
</li>
</ul>
}
</div>
As I understand it, you are trying to read the value of the id attribute of an element. I am not sure about the intent of your question, but this is how you can get the value of the id.
You will need to get a reference to the element using one of the element locators. In this case, you have used By.xpath(). You can validate the XPath with a tool such as the Firefox XPath checker. Once you use the correct XPath, you will get a reference to the WebElement.
WebElement wElement = driver.findElement(By.xpath("//ul//li[1]"));
// validate the correct XPATH using available tools - ex : firefox xpath checker etc.
You will need to get the value of id attribute of the element.
String requiredID = wElement.getAttribute("id");
Let me know if this works.
As pendem answered, you can get the first ID with XPath. However, the XPath you used matches 2 elements, because there are two ul elements that contain li. An XPath that targets the specific ul by its class attribute will work: .//ul[@class="array collapsible"]//li[1]
If you have provided all the relevant HTML, it should be as simple as the below.
String id = driver.findElement(By.cssSelector("span.num")).getText();
This just returns the first instance of the IDs.
If that doesn't work, you'll have to do some more digging, e.g. is this in an IFRAME or is it a timing issue or ?
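The point about the broad XPath matching two elements can be illustrated offline. This is a sketch using the JDK's javax.xml.xpath engine on a simplified, well-formed version of the JSON viewer markup (the collapser spans, quotes and brackets are left out); FirstJsonId and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstJsonId {
    // Simplified stand-in: an outer object list containing an inner array list.
    static final String SAMPLE =
        "<div id='json'><ul class='obj collapsible'><li>"
      + "<span class='prop'>hdps</span>"
      + "<ul class='array collapsible'>"
      + "<li><span class='num'>65085</span>,</li>"
      + "<li><span class='num'>65089</span>,</li>"
      + "<li><span class='num'>65711</span>,</li>"
      + "</ul></li></ul></div>";

    // Evaluates an XPath against SAMPLE and returns all matching nodes.
    static NodeList select(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first number in the inner array list, or null if nothing matches.
    static String firstId() {
        NodeList hits = select("//ul[@class='array collapsible']/li[1]/span[@class='num']");
        return hits.getLength() > 0 ? hits.item(0).getTextContent() : null;
    }

    // How many nodes the question's broad XPath matches.
    static int broadMatches() {
        return select("//ul//li[1]").getLength();
    }

    public static void main(String[] args) {
        System.out.println(firstId());
        System.out.println(broadMatches());
    }
}
```

Because the broad XPath also matches the outer li, Selenium's findElement would return that outer element first, and its text covers the whole list rather than a single id.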

differentiate two html elements with same class

I have this html code below and I want to differentiate between these two PagePostsSectionPagelet as I only want to find web elements from the first PagePostsSectionPagelet. Is there any way I can do it without using <div id="PagePostsSectionPagelet-183102686112-0" as the value will not always be the same?
<div id="PagePostsSectionPagelet-183102686112-0" data-referrer="PagePostsSectionPagelet-183102686112-0">
<div class="_1k4h _5ay5">
<div class="_5sem">
</div>
</div>
<div id="PagePostsSectionPagelet-183102686112-1" class="" data-referrer="PagePostsSectionPagelet-183102686112-1" style="">
<div class="_1k4h _5ay5">
<div class="_5dro _5drq">
<div class="clearfix">
<span class="_5em9 lfloat _ohe _50f4 _50f7">Earlier in 2015</span>
<div id="u_jsonp_3_4e" class="_6a uiPopover rfloat _ohf">
</div>
</div>
<div id="u_jsonp_3_4j" class="_5sem">
<div id="u_jsonp_3_4g" class="_5t6j">
<div class="_1k4h _5ay5">
<div class="_5sem">
</div>
</div>
Tried using //div[@class='_1k4h _5ay5']//div[@class ='_5sem'] but it will return both.
Using //div[@class='_5dro _5drq']//span[contains(@class,'_5em9 lfloat _ohe _50f4 _50f7') and contains(text(), '')] will help me find the second PagePostsSectionPagelet instead.
You need to use the following XPath:
//div[contains(@class,'_1k4h') and contains(@class,'_5ay5')]
because Selenium does not handle searching for several classes in one attribute well.
I mean By.className("_1k4h _5ay5") will find nothing in any case, and By.xpath("//div[@class='_1k4h _5ay5']") can also find nothing if the class attribute is "_5ay5 _1k4h" or " _5ay5 _1k4h" (as the classes are possibly generated automatically, their order may differ on page reload).
But for the best result in both performance and correctness, I think the following XPaths are better:
".//div[contains(@id, 'PagePostsSectionPagelet')][1]" -- for first div
".//div[contains(@id, 'PagePostsSectionPagelet')][2]" -- for second div
I see that the only dynamic part of the div id is the number, so you can use something like:
WebElement element = driver.findElements(By.xpath("//div[contains(@id,'PagePostsSectionPagelet')]")).get(0);
This will take only the first web element.
Try using a css selector as below and refine further if required.
The code below returns a List of matching WebElements and then you grab the first one in the List.
List<WebElement> listOfElements = driver.findElements(By.cssSelector("div[data-referrer]"));
WebElement myElement = listOfElements.get(0);
Hint: use the Chrome console to test your css and xpath selectors directly. e.g. use
$$("div[data-referrer]") in the console to reveal what will get selected.
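The idea shared by these answers (pick the first pagelet by position, then search only inside it) can be sketched without a browser. The example below uses the JDK's javax.xml.xpath engine on a reduced, well-formed stand-in for the markup; the <root> wrapper, the class name FirstPagelet and its method names are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstPagelet {
    // Reduced stand-in: the first pagelet holds one _5sem div, the second holds two.
    static final String SAMPLE =
        "<root>"
      + "<div id='PagePostsSectionPagelet-183102686112-0'>"
      + "<div class='_1k4h _5ay5'><div class='_5sem'/></div></div>"
      + "<div id='PagePostsSectionPagelet-183102686112-1'>"
      + "<div class='_1k4h _5ay5'><div class='_5dro _5drq'/>"
      + "<div id='u_jsonp_3_4j' class='_5sem'><div class='_5sem'/></div></div></div>"
      + "</root>";

    // Counts the nodes an XPath matches in SAMPLE.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts _5sem divs inside the pagelet at the given 1-based position.
    static int semCountInPagelet(int position) {
        // The id prefix is stable; only the trailing number changes.
        return count("(//div[starts-with(@id,'PagePostsSectionPagelet-')])["
                + position + "]//div[@class='_5sem']");
    }

    public static void main(String[] args) {
        System.out.println(semCountInPagelet(1));
        System.out.println(count("//div[@class='_5sem']"));
    }
}
```

In Selenium the equivalent would be to locate the first pagelet with By.xpath("(//div[starts-with(@id,'PagePostsSectionPagelet-')])[1]") and then call findElements on that element with a relative XPath such as ".//div[@class='_5sem']".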
