What is the proper CSS selector to get what I need? - java

I'm using a Java library (JSoup) to fetch content from a website that my program can ingest and then process. Specifically, the content I'm looking for is inside the ontw div below:
<div class="ms5">
<div class="header">
<!-- ... -->
</div>
<div class="body">
<div class="ontw">
<!-- What I want is here -->
</div>
</div>
</div>
With JSoup, you download the page using Document doc = Jsoup.connect("http://www.example.com").get(), and then you select elements from that page using doc.select("Your CSS selector string here."). It's really that simple.
I tried:
doc.select("ms5 body ontw");
But that doesn't work. Judging by the HTML above, what should my CSS selector string be? Thanks in advance!

Classes are selected with a dot, so you have to select .ms5 .body .ontw
doc.select(".ms5 .body .ontw");

Element ontw = doc.select("div.ontw").first(); // first div with class=ontw
You can refer to the JSoup documentation:
http://jsoup.org/cookbook/extracting-data/selector-syntax

doc.select("div.ontw");
is what I would expect to work.

.ms5 .body .ontw
is what you want. Here is a demo: http://try.jsoup.org/~jAMCqcMjLMSA5FYJV7Cn3Aah4AE
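In JSoup the whole job is doc.select(".ms5 .body .ontw"). The sketch below shows what that selector means and why the dot-less version matches nothing. It expresses the same descendant-class selector as XPath and runs it with the JDK's built-in javax.xml.xpath, so no library is needed. The markup is a hypothetical well-formed copy of the question's HTML, with placeholder text where the comment was. The class name OntwSelectorSketch is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OntwSelectorSketch {

    // Hypothetical sample: the question's markup, with placeholder text instead of the comment.
    static final String HTML =
          "<div class=\"ms5\">"
        + "<div class=\"header\"><!-- ... --></div>"
        + "<div class=\"body\"><div class=\"ontw\">What I want is here</div></div>"
        + "</div>";

    // XPath predicate matching one whitespace-separated class token,
    // which is what CSS ".name" does.
    static String hasClass(String name) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    // XPath equivalent of a CSS descendant chain such as ".ms5 .body .ontw".
    static String xpathFor(String... classes) {
        StringBuilder sb = new StringBuilder();
        for (String c : classes) {
            sb.append("//*[").append(hasClass(c)).append(']');
        }
        return sb.toString();
    }

    // Parses the markup as XML and returns the nodes the expression selects.
    static NodeList select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        NodeList hits = select(HTML, xpathFor("ms5", "body", "ontw"));
        System.out.println(hits.getLength() + " match: " + hits.item(0).getTextContent());
        // Without dots, "ms5 body ontw" means elements *named* ms5, body and ontw.
        System.out.println(select(HTML, "//ms5//body//ontw").getLength() + " matches");
    }
}
```

The class-token predicate is used instead of @class='ontw' so that elements carrying extra classes still match, just as they would with the CSS selector.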

Related

How to identify a locator in the HTML code below and how to write an XPath for it in Selenium?

How do I identify a locator in the HTML below, and how do I write an XPath for it in Selenium?
<div>
<div class="slds-icon-waffle" data-aura-rendered-by="231:0;p">
<div class="slds-r1" data-aura-rendered-by="232:0;p"></div>
<div class="slds-r2" data-aura-rendered-by="233:0;p"></div>
<div class="slds-r3" data-aura-rendered-by="234:0;p"></div>
<div class="slds-r4" data-aura-rendered-by="235:0;p"></div>
<div class="slds-r5" data-aura-rendered-by="236:0;p"></div>
<div class="slds-r6" data-aura-rendered-by="237:0;p"></div>
<div class="slds-r7" data-aura-rendered-by="238:0;p"></div>
<div class="slds-r8" data-aura-rendered-by="239:0;p"></div>
<div class="slds-r9" data-aura-rendered-by="240:0;p"></div>
</div>
You can generate an XPath as below using the div class:
//div[contains(@class,'slds-icon-waffle')]
Hope this helps.
The class names here look unique, so you can use the following XPath to identify the first element:
//div[@class="slds-icon-waffle"]
For the second element:
//div[@class="slds-r1"] and so on.
If you want to locate all these elements with a single XPath, use the following:
//div[starts-with(@class,"slds")]
Writing XPath is a fairly basic Selenium skill, so I'd suggest learning how to write XPath first. This video might give you some ideas:
http://learn-automation.com/how-to-write-dynamic-xpath-in-selenium/
Once you have some idea, try writing the XPath yourself. If you run into any issues, feel free to comment. Thanks.
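In Selenium, each of the XPath strings above is passed to By.xpath(...), for example driver.findElements(By.xpath("//div[starts-with(@class,'slds')]")). The sketch below checks them offline without a browser. It evaluates them with the JDK's javax.xml.xpath against the question's markup, with the unclosed outer div closed so it parses as XML. WaffleXPathSketch is a hypothetical helper name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WaffleXPathSketch {

    // The question's markup, rebuilt with the outer <div> closed.
    static final String HTML = buildHtml();

    static String buildHtml() {
        StringBuilder sb = new StringBuilder("<div>");
        sb.append("<div class=\"slds-icon-waffle\" data-aura-rendered-by=\"231:0;p\">");
        for (int i = 1; i <= 9; i++) {
            // slds-r1 .. slds-r9, rendered-by 232 .. 240 as in the question
            sb.append("<div class=\"slds-r").append(i)
              .append("\" data-aura-rendered-by=\"").append(231 + i).append(":0;p\"></div>");
        }
        return sb.append("</div></div>").toString();
    }

    // Returns how many nodes the XPath expression selects in the markup.
    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "//div[contains(@class,'slds-icon-waffle')]",
            "//div[@class='slds-r1']",
            "//div[starts-with(@class,'slds')]"
        };
        for (String xp : expressions) {
            System.out.println(xp + " -> " + count(HTML, xp));
        }
    }
}
```

Note that starts-with(@class,"slds") also matches the slds-icon-waffle container, so it selects ten divs rather than the nine row divs.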

How to display a Java String with HTML tags appended, with the HTML rendered, in an AngularJS front end

I have a string in Java, and I need to append HTML tags to it dynamically so that, when it is displayed in the front end, the tags are rendered as HTML.
For example:
String content="Hello World,this is a test <em>content</em> to demonstrate the requirement";
In the string above, the word content is wrapped in an <em> tag. But when I try to display it in the AngularJS front end, the tag is not rendered, and the string is displayed literally as "Hello World,this is a test <em>content</em> to demonstrate the requirement".
Use angular-sanitize.js (the ngSanitize module) together with ng-bind-html for this. Example:
<div ng-controller="testCtrl">
<div ng-bind-html="stringTest"></div>
</div>
You can use ng-bind-html:
<div ng-controller="testCtrl">
<div ng-bind-html="stringTest"></div>
</div>
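However, if you find this directive too restrictive and you absolutely trust the source of the content you are binding to, you can also use ng-bind-html-unsafe. It exists only in AngularJS versions before 1.2. From 1.2 on, mark the value as trusted with $sce.trustAsHtml and bind it with ng-bind-html instead.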
<div ng-controller="testCtrl">
<div ng-bind-html-unsafe="stringTest"></div>
</div>

Differentiate two HTML elements with the same class

I have the HTML below, and I want to differentiate between these two PagePostsSectionPagelet divs because I only want to find web elements in the first one. Is there any way I can do this without using <div id="PagePostsSectionPagelet-183102686112-0", since the value will not always be the same?
<div id="PagePostsSectionPagelet-183102686112-0" data-referrer="PagePostsSectionPagelet-183102686112-0">
<div class="_1k4h _5ay5">
<div class="_5sem">
</div>
</div>
<div id="PagePostsSectionPagelet-183102686112-1" class="" data-referrer="PagePostsSectionPagelet-183102686112-1" style="">
<div class="_1k4h _5ay5">
<div class="_5dro _5drq">
<div class="clearfix">
<span class="_5em9 lfloat _ohe _50f4 _50f7">Earlier in 2015</span>
<div id="u_jsonp_3_4e" class="_6a uiPopover rfloat _ohf">
</div>
</div>
<div id="u_jsonp_3_4j" class="_5sem">
<div id="u_jsonp_3_4g" class="_5t6j">
<div class="_1k4h _5ay5">
<div class="_5sem">
</div>
</div>
I tried //div[@class='_1k4h _5ay5']//div[@class='_5sem'], but it returns both.
Using //div[@class='_5dro _5drq']//span[contains(@class,'_5em9 lfloat _ohe _50f4 _50f7') and contains(text(), '')] finds the second PagePostsSectionPagelet instead.
You need to use the following XPath:
//div[contains(@class,'_1k4h') and contains(@class,'_5ay5')]
This is because Selenium doesn't handle a search for several classes in one attribute well.
By.className("_1k4h _5ay5") will not work in any case, because compound class names are not supported. By.xpath("//div[@class='_1k4h _5ay5']") can also find nothing if the class attribute is "_5ay5 _1k4h" or " _5ay5 _1k4h". The class names are probably generated automatically, so their order may change when the page reloads.
But for the best performance and correctness, I think the following XPaths are best. The parentheses make the index count matches in document order rather than position among siblings:
"(.//div[contains(@id, 'PagePostsSectionPagelet')])[1]" -- for the first div
"(.//div[contains(@id, 'PagePostsSectionPagelet')])[2]" -- for the second div
I see that only the number in the div id is dynamic, so you can use something like:
WebElement element = driver.findElements(By.xpath("//div[contains(@id,'PagePostsSectionPagelet')]")).get(0);
This takes only the first matching web element.
Try using a CSS selector like the one below, and refine it further if required.
The code below returns a List of matching WebElements and then you grab the first one in the List.
List<WebElement> listOfElements = driver.findElements(By.cssSelector("div[data-referrer]"));
WebElement myElement = listOfElements.get(0);
Hint: use the Chrome console to test your CSS and XPath selectors directly. For example, run
$$("div[data-referrer]") in the console to see what will get selected. $x("//div[@data-referrer]") does the same for an XPath expression.
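Another way to restrict the search to the first pagelet is to locate that pagelet by the stable prefix of its id and search only inside it. The sketch below shows this with the JDK's javax.xml.xpath. It assumes a simplified, hypothetical version of the markup in which the two pagelets are siblings with one _5sem div each. In Selenium you would pass the same string to By.xpath(...). Alternatively, find the pagelet first and call findElements(By.xpath(".//div[@class='_5sem']")) on it; the leading dot keeps that search inside the element. FirstPageletSketch is a made-up name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstPageletSketch {

    // Hypothetical, simplified markup: two sibling pagelets with the same inner structure.
    static final String HTML =
          "<body>"
        + "<div id=\"PagePostsSectionPagelet-183102686112-0\" data-referrer=\"PagePostsSectionPagelet-183102686112-0\">"
        + "<div class=\"_1k4h _5ay5\"><div class=\"_5sem\">first</div></div>"
        + "</div>"
        + "<div id=\"PagePostsSectionPagelet-183102686112-1\" data-referrer=\"PagePostsSectionPagelet-183102686112-1\">"
        + "<div class=\"_1k4h _5ay5\"><div class=\"_5sem\">second</div></div>"
        + "</div>"
        + "</body>";

    // The question's attempt: matches _5sem divs in every pagelet.
    static final String UNSCOPED = "//div[@class='_1k4h _5ay5']//div[@class='_5sem']";

    // First pagelet in document order; only the numeric suffix of the id varies.
    static final String FIRST_PAGELET = "(//div[starts-with(@id, 'PagePostsSectionPagelet-')])[1]";

    // _5sem divs restricted to that first pagelet.
    static final String SCOPED = FIRST_PAGELET + "//div[@class='_5sem']";

    // Returns the text content of every node the expression selects.
    static List<String> texts(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unscoped: " + texts(HTML, UNSCOPED)); // posts from both pagelets
        System.out.println("scoped:   " + texts(HTML, SCOPED));   // posts from the first pagelet only
    }
}
```

If the real page nests the second pagelet inside the first, the scoped expression would match both again. In that case, check the structure in the browser console first.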

Selenium CSS selector syntax for checking class and text both

This question is about Java + Selenium:
My HTML is:
<section class="d-menu d-outclass-bootstrap unclickable d-apps d-app-list">
<section class="standard-component image-sequence-button" tabindex="0" role="link">
<div class="image-region">
<div class="core-component image">...
</div>
<div class="sequence-region">
<div class="core-component section">
<div>
<section class="standard-component text hide-section-separator-line">
<div class="text-region">
<div class="core-component text">
<span class="main-text">BART Times</span>
<span class="sub-text">Provider</span>
</div>
</div>
</section>
<section class="standard-component speech-bubble hide-section-separator-line">...
<section class="standard-component text">...
</div>
</div>
</div>
<div class="button-region">
<div class="core-component button" tabindex="0" role="link">...
</div>
</section>
<section class="standard-component image-sequence-button" tabindex="0" role="link">...
<section class="standard-component image-sequence-button" tabindex="0" role="link">...
<section class="standard-component image-sequence-button" tabindex="0" role="link">...</section>
EDIT:
All <section class="standard-component image-sequence-button"... elements have exactly the same structure and hierarchy (the same attributes on all tags). The only things that change are the text values of the tags (e.g. span).
Part 1:
I'm looking for various elements inside the second section tag. Because of a business requirement, I'm trying to get the <span class="main-text"> whose value is BART Times.
I already know how to get it via XPath:
My XPath (verified via Firebug):
"//section//div[@class = 'sequence-region']//section[@class = 'standard-component text hide-section-separator-line']//span[@class = 'main-text' and text() = '%s']"
I can get the span tag by substituting a value for %s (e.g. BART Times).
However, due to design considerations, we've been told to use CSS only. So I tried to come up with a CSS counterpart for the XPath above but could not find one.
The following CSS
"section div.sequence-region section.standard-component.text.hide-section-separator-line span[class=main-text]"
returns all the span tags under all the section tags.
Question1: How do I get the span tag which has a certain TEXT value (the %s part of xpath)?
Things I've tried for that last span tag that did not work (according to Firebug):
span.main-text[text='BART Times']
span[class=main-text][text='BART Times']
span.main-text:contains('BART Times')
span[class=main-text]:contains('BART Times')
span.main-text[text="BART Times"]
span[class=main-text][text="BART Times"]
span.main-text[text=\"BART Times\"]
span[class=main-text][text=\"BART Times\"]
span[text="BART Times"]
span[text=\"BART Times\"]
span:contains('BART Times')
span:contains("BART Times")
span:contains(\"BART Times\")
So, basically I want to put a check on BOTH class and TEXT value of the span tag in CSS selector.
Part 2:
Then I want to get the <section class="standard-component image-sequence-button"... element in which I found the <span class="main-text">, and then find other elements inside that specific section tag.
Question 2:
Assuming I found the span tag in Question 1 via CSS, how do I get the section tag (an ancestor several levels above the span tag)?
If CSS is not possible, please provide an XPath counterpart as a temporary workaround.
CSS selectors can't select based on text. The answers to Is there a CSS selector for elements containing certain text? go into detail on why.
To select based on class and text in XPath: //span[contains(@class, 'main-text') and text() = 'BART Times']
Regarding Question 1, it is not possible, as stated in the other answer here. This is another thread on the topic: CSS selector based on element text?
Regarding Question 2, once again there is no such parent selector in CSS: Is there a CSS parent selector? Now for the XPath counterpart, you can use the parent axis (parent::*) or its shortcut notation (..). You can also put the span selector as a predicate on the parent (the third example below):
....//span[@class = 'main-text' and text() = '%s']/parent::*
....//span[@class = 'main-text' and text() = '%s']/..
....//*[span[@class = 'main-text' and text() = '%s']]
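The parent axis (..) stops at the span's immediate parent, the div.core-component.text. To reach the image-sequence-button section that Question 2 asks about, one option is the ancestor axis. The sketch below shows both with the JDK's javax.xml.xpath. It uses a hypothetical, trimmed-down copy of the question's markup with two sections; the "Some Other App" section and its text are invented. With Selenium, the same ancestor step can be run from the found span, e.g. span.findElement(By.xpath("./ancestor::section[...][1]")). SectionBySpanTextSketch is a made-up name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SectionBySpanTextSketch {

    // Hypothetical markup: two sections with identical structure, different span text.
    static final String HTML =
          "<section class=\"d-menu d-app-list\">"
        + section("Some Other App", "Someone Else")
        + section("BART Times", "Provider")
        + "</section>";

    static String section(String main, String sub) {
        return "<section class=\"standard-component image-sequence-button\">"
             + "<div class=\"sequence-region\"><div class=\"core-component section\"><div>"
             + "<section class=\"standard-component text hide-section-separator-line\">"
             + "<div class=\"text-region\"><div class=\"core-component text\">"
             + "<span class=\"main-text\">" + main + "</span>"
             + "<span class=\"sub-text\">" + sub + "</span>"
             + "</div></div></section>"
             + "</div></div></div></section>";
    }

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Span matched on class *and* text; assumes the text contains no apostrophe.
    static String spanXPath(String text) {
        return "//span[@class = 'main-text' and text() = '" + text + "']";
    }

    // What "/.." returns: the class of the span's immediate parent.
    static String parentClass(String xml, String text) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(spanXPath(text) + "/../@class", parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Climbs to the nearest image-sequence-button section, then searches inside it.
    static String subTextFor(String xml, String text) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            Node section = (Node) xp.evaluate(
                spanXPath(text) + "/ancestor::section[contains(concat(' ', normalize-space(@class), ' '),"
                    + " ' image-sequence-button ')][1]",
                parse(xml), XPathConstants.NODE);
            if (section == null) {
                return null; // no span with that text
            }
            // The leading "." keeps the search inside the section that was found.
            return xp.evaluate(".//span[@class = 'sub-text']", section);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("parent class: " + parentClass(HTML, "BART Times"));
        System.out.println("sub-text in same section: " + subTextFor(HTML, "BART Times"));
    }
}
```

On the reverse ancestor axis, [1] picks the nearest matching ancestor. That is why it returns the enclosing image-sequence-button section and not some outer one.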
See the following thread for a better (but more complicated) alternative for matching an element by CSS class using XPath, in case you haven't come across it yet: How can I find an element by CSS class with XPath?
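The "better but more complicated" class match usually means padding the class attribute with spaces and searching for a space-delimited token. The sketch below contrasts it with exact @class equality and with plain contains(). It runs all three with the JDK's javax.xml.xpath against three hypothetical divs: one in the original class order, one reordered with a leading space, and one whose class merely starts with _1k4h. ClassTokenXPathSketch is a made-up name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassTokenXPathSketch {

    // Hypothetical sample divs illustrating reordering and a look-alike class name.
    static final String HTML = "<body>"
        + "<div class=\"_1k4h _5ay5\"/>"
        + "<div class=\" _5ay5 _1k4h\"/>"
        + "<div class=\"_1k4h_extra _5ay5\"/>"
        + "</body>";

    // True when @class contains the given whitespace-separated token.
    static String hasClass(String token) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')";
    }

    // Exact match: breaks when the order or spacing of class names changes.
    static final String EXACT = "//div[@class='_1k4h _5ay5']";
    // Substring match: survives reordering but also hits "_1k4h_extra".
    static final String SUBSTRING = "//div[contains(@class,'_1k4h') and contains(@class,'_5ay5')]";
    // Token match: survives reordering and ignores look-alike names.
    static final String TOKEN = "//div[" + hasClass("_1k4h") + " and " + hasClass("_5ay5") + "]";

    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("exact:     " + count(HTML, EXACT));
        System.out.println("substring: " + count(HTML, SUBSTRING));
        System.out.println("token:     " + count(HTML, TOKEN));
    }
}
```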

Get link from anchor tag using Java

I'm trying to get the value of an href attribute from an anchor (a) tag using Java, without a third-party API. I know the class of the label. The website looks like this:
<html>
<body>
<div id="uix_wrapper">
<div id="button">
<label class="downloadButton">
Button
</label>
</div>
</div>
</body>
</html>
You can use a regular expression if you want to avoid any third-party libraries. A very basic expression for your example is
<a href="(.*?)">
and it should work. You can try it out yourself at https://www.debuggex.com/
The URL will be in the first capturing group (group(1) in Java's Matcher; group 0 is the whole match).
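Here is a minimal sketch of applying that expression with java.util.regex. It assumes you have already fetched the page into a String, for example with java.net.URL or, on Java 11+, java.net.http.HttpClient. The sample anchor and URL are hypothetical, since the posted HTML shows no a tag, and HrefRegexSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRegexSketch {

    // The answer's pattern; the lazy (.*?) stops at the first closing quote + '>'.
    private static final Pattern HREF = Pattern.compile("<a href=\"(.*?)\">");

    // Collects the captured URL (group 1) of every match in the page source.
    static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup: the question's label with an anchor inside it.
        String html = "<label class=\"downloadButton\">"
                    + "<a href=\"https://www.example.com/file.zip\">Button</a></label>";
        System.out.println(hrefs(html));
    }
}
```

Because the pattern is literal, it only matches anchors whose tag is exactly <a href="...">. An anchor with another attribute before href, or with single quotes, is skipped, so loosen the expression if the real page differs.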
