I'm using a Java library (JSoup) to fetch content from a website so that my program can ingest and process it. Specifically, the content I'm looking for is inside the ontw div below:
<div class="ms5">
<div class="header">
<!-- ... -->
</div>
<div class="body">
<div class="ontw">
<!-- What I want is here -->
</div>
</div>
</div>
With JSoup, you download the page using Document doc = Jsoup.connect("http://www.example.com").get(), and then you query the contents of that page using doc.select("Your CSS selector string here"). It's really that simple.
I tried:
doc.select("ms5 body ontw");
But that doesn't work. Judging by the HTML above, what should my CSS selector string be? Thanks in advance!
Classes are selected with a dot, so you have to select .ms5 .body .ontw
doc.select(".ms5 .body .ontw");
Element masthead = doc.select("div.ontw").first(); // div with class=ontw
You can refer to the JSoup documentation:
http://jsoup.org/cookbook/extracting-data/selector-syntax
doc.select("div.ontw");
would be what I would expect.
.ms5 .body .ontw
is what you want. Here is a demo: http://try.jsoup.org/~jAMCqcMjLMSA5FYJV7Cn3Aah4AE
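For reference, here is a minimal sketch of that selector in a complete program. It assumes jsoup is on the classpath and, instead of calling Jsoup.connect against a live site, parses a hard-coded copy of the snippet with Jsoup.parse; the class name OntwExample and the helper ontwText are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OntwExample {

    // Parses the HTML and returns the text of the first .ontw element nested
    // in .body inside .ms5, or null when the selector matches nothing.
    static String ontwText(String html) {
        Document doc = Jsoup.parse(html);
        // Class selectors need a leading dot; the spaces are descendant combinators.
        Element ontw = doc.select(".ms5 .body .ontw").first();
        return ontw == null ? null : ontw.text();
    }

    public static void main(String[] args) {
        String html = "<div class=\"ms5\">"
                + "<div class=\"header\"></div>"
                + "<div class=\"body\"><div class=\"ontw\">What I want is here</div></div>"
                + "</div>";
        System.out.println(ontwText(html)); // prints "What I want is here"
    }
}
```

With a live page you would get the Document from Jsoup.connect(url).get() instead and run the same select call on it.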
How can I identify a locator in the HTML below, and how do I write an XPath for it in Selenium?
<div>
<div class="slds-icon-waffle" data-aura-rendered-by="231:0;p">
<div class="slds-r1" data-aura-rendered-by="232:0;p"></div>
<div class="slds-r2" data-aura-rendered-by="233:0;p"></div>
<div class="slds-r3" data-aura-rendered-by="234:0;p"></div>
<div class="slds-r4" data-aura-rendered-by="235:0;p"></div>
<div class="slds-r5" data-aura-rendered-by="236:0;p"></div>
<div class="slds-r6" data-aura-rendered-by="237:0;p"></div>
<div class="slds-r7" data-aura-rendered-by="238:0;p"></div>
<div class="slds-r8" data-aura-rendered-by="239:0;p"></div>
<div class="slds-r9" data-aura-rendered-by="240:0;p"></div>
</div>
You can write an XPath using the div's class, as below:
//div[contains(@class,'slds-icon-waffle')]
Hope this helps.
The class names here look unique, so you can use the following XPath to identify the first element:
//div[@class="slds-icon-waffle"]
For the second element:
//div[@class="slds-r1"] and so on.
If you want to locate all of these elements with a single XPath, use the following:
//div[starts-with(@class,"slds")]
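These expressions can be checked offline before running them in Selenium. The sketch below is an illustration under assumptions: it evaluates them with the JDK's javax.xml.xpath (XPath 1.0, the same version browsers implement) against a simplified, well-formed copy of the snippet, and the class WaffleXPath and its helpers are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WaffleXPath {

    // Well-formed copy of the snippet: the data-aura-rendered-by attributes are
    // dropped and the outer <div> is closed so that an XML parser accepts it.
    static String sampleMarkup() {
        StringBuilder sb = new StringBuilder("<div><div class=\"slds-icon-waffle\">");
        for (int i = 1; i <= 9; i++) {
            sb.append("<div class=\"slds-r").append(i).append("\"></div>");
        }
        return sb.append("</div></div>").toString();
    }

    // Evaluates an XPath 1.0 expression and returns how many nodes it selects.
    static int count(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String markup = sampleMarkup();
        System.out.println(count(markup, "//div[contains(@class,'slds-icon-waffle')]")); // 1
        System.out.println(count(markup, "//div[@class='slds-r1']"));                    // 1
        // The container plus the nine slds-r* divs all start with "slds"
        System.out.println(count(markup, "//div[starts-with(@class,'slds')]"));          // 10
    }
}
```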
Writing XPath is a pretty basic skill in Selenium, so I'd suggest learning how to write XPath first. This video might give you some ideas:
http://learn-automation.com/how-to-write-dynamic-xpath-in-selenium/
Once you have some idea, try writing the XPath yourself. If you run into any issues, feel free to comment. Thanks.
I have a string in Java, and I need to add HTML tags to it dynamically so that, when it is displayed in the front end, the tags are actually rendered as HTML.
For example:
String content="Hello World,this is a test <em>content</em> to demonstrate the requirement";
In the string above, the word "content" is wrapped in an <em> tag. But when I display it in my AngularJS front end, the tag is not rendered, and the string is displayed literally as "Hello World,this is a test <em>content</em> to demonstrate the requirement".
Use angular-sanitize.js (the ngSanitize module) for this. For example:
<div ng-controller="testCtrl">
<div ng-bind-html="stringTest"></div>
</div>
You can use ng-bind-html:
<div ng-controller="testCtrl">
<div ng-bind-html="stringTest"></div>
</div>
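However, if you find this directive too restrictive and you absolutely trust the source of the content you are binding to, you can also use ng-bind-html-unsafe. Note that ng-bind-html-unsafe was removed in AngularJS 1.2; in later versions, mark the value as trusted with $sce.trustAsHtml and bind it with ng-bind-html instead.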
<div ng-controller="testCtrl">
<div ng-bind-html-unsafe="stringTest"></div>
</div>
I have the HTML code below, and I want to differentiate between these two PagePostsSectionPagelet divs because I only want to find web elements inside the first one. Is there any way to do this without using <div id="PagePostsSectionPagelet-183102686112-0", since that value will not always be the same?
<div id="PagePostsSectionPagelet-183102686112-0" data-referrer="PagePostsSectionPagelet-183102686112-0">
<div class="_1k4h _5ay5">
<div class="_5sem">
</div>
</div>
<div id="PagePostsSectionPagelet-183102686112-1" class="" data-referrer="PagePostsSectionPagelet-183102686112-1" style="">
<div class="_1k4h _5ay5">
<div class="_5dro _5drq">
<div class="clearfix">
<span class="_5em9 lfloat _ohe _50f4 _50f7">Earlier in 2015</span>
<div id="u_jsonp_3_4e" class="_6a uiPopover rfloat _ohf">
</div>
</div>
<div id="u_jsonp_3_4j" class="_5sem">
<div id="u_jsonp_3_4g" class="_5t6j">
<div class="_1k4h _5ay5">
<div class="_5sem">
</div>
</div>
I tried //div[@class='_1k4h _5ay5']//div[@class='_5sem'], but it returns both.
Using //div[@class='_5dro _5drq']//span[contains(@class,'_5em9 lfloat _ohe _50f4 _50f7') and contains(text(), '')] finds the second PagePostsSectionPagelet instead.
You need to use the following XPath:
//div[contains(@class,'_1k4h') and contains(@class,'_5ay5')]
because searching for several classes in one class attribute doesn't work reliably in Selenium.
That is, By.className("_1k4h _5ay5") will not work in any case, and By.xpath("//div[@class='_1k4h _5ay5']") will also find nothing if the class attribute is "_5ay5 _1k4h" or " _5ay5 _1k4h" (the classes are probably generated automatically, so their order may change on page reload).
But for the best result in terms of both performance and correctness, I would use the following XPath:
"(.//div[contains(@id, 'PagePostsSectionPagelet')])[1]" -- for the first div
"(.//div[contains(@id, 'PagePostsSectionPagelet')])[2]" -- for the second div
I see that only the number in the div id is dynamic, so you can use something like:
WebElement element = driver.findElements(By.xpath("//div[contains(@id,'PagePostsSectionPagelet')]")).get(0);
This will take only the first web element.
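A sketch, under stated assumptions, of why the order-independent contains(...) form and a positional index help here. It evaluates the expressions with the JDK's javax.xml.xpath on a simplified, well-formed copy of the two pagelets in which the class order in the second pagelet is deliberately swapped (a hypothetical case, to mimic regenerated markup). The class PageletXPath and its helpers are hypothetical; in Selenium the same strings would be passed to By.xpath.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PageletXPath {

    // Simplified pagelets; the second one lists its classes in the other order.
    static final String MARKUP = "<body>"
            + "<div id=\"PagePostsSectionPagelet-183102686112-0\">"
            + "<div class=\"_1k4h _5ay5\"><div class=\"_5sem\"></div></div></div>"
            + "<div id=\"PagePostsSectionPagelet-183102686112-1\">"
            + "<div class=\"_5ay5 _1k4h\"><div class=\"_5sem\"></div></div></div>"
            + "</body>";

    // Parses the markup and evaluates an XPath 1.0 expression with the given result type.
    private static Object evaluate(String markup, String expression, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    static int count(String markup, String expression) {
        return ((NodeList) evaluate(markup, expression, XPathConstants.NODESET)).getLength();
    }

    static String stringValue(String markup, String expression) {
        return (String) evaluate(markup, expression, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        // Exact attribute match misses the reordered class list: prints 1
        System.out.println(count(MARKUP, "//div[@class='_1k4h _5ay5']"));
        // Order-independent match finds both: prints 2
        System.out.println(count(MARKUP,
                "//div[contains(@class,'_1k4h') and contains(@class,'_5ay5')]"));
        // Parentheses apply [1] to the whole result set in document order,
        // so this prints PagePostsSectionPagelet-183102686112-0
        System.out.println(stringValue(MARKUP,
                "(//div[contains(@id,'PagePostsSectionPagelet')])[1]/@id"));
        // Only the _5sem inside the first pagelet: prints 1
        System.out.println(count(MARKUP,
                "(//div[contains(@id,'PagePostsSectionPagelet')])[1]//div[@class='_5sem']"));
    }
}
```

Note the two indexing conventions: XPath positions such as [1] start at 1, while List.get(0) on the result of findElements starts at 0.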
Try using a CSS selector like the one below, and refine it further if required.
The code below returns a List of matching WebElements and then you grab the first one in the List.
List<WebElement> listOfElements = driver.findElements(By.cssSelector("div[data-referrer]"));
WebElement myElement = listOfElements.get(0);
Hint: use the Chrome console to test your CSS and XPath selectors directly. For example, run $$("div[data-referrer]") in the console to see what gets selected (or $x("//div[@data-referrer]") for an XPath expression).
This question is for Java + Selenium:
My HTML is:
<section class="d-menu d-outclass-bootstrap unclickable d-apps d-app-list">
<section class="standard-component image-sequence-button" tabindex="0" role="link">
<div class="image-region">
<div class="core-component image">...
</div>
<div class="sequence-region">
<div class="core-component section">
<div>
<section class="standard-component text hide-section-separator-line">
<div class="text-region">
<div class="core-component text">
<span class="main-text">BART Times</span>
<span class="sub-text">Provider</span>
</div>
</div>
</section>
<section class="standard-component speech-bubble hide-section-separator-line">...
<section class="standard-component text">...
</div>
</div>
</div>
<div class="button-region">
<div class="core-component button" tabindex="0" role="link">...
</div>
</section>
<section class="standard-component image-sequence-button" tabindex="0" role="link">...
<section class="standard-component image-sequence-button" tabindex="0" role="link">...
<section class="standard-component image-sequence-button" tabindex="0" role="link">...</section>
EDIT:
All <section class="standard-component image-sequence-button"... elements have exactly the same structure and hierarchy (the same attributes on all tags). The only thing that changes is the TEXT value of the tags (e.g. span).
Part 1:
I'm looking for various elements inside the second section tag. Because of a business requirement, I'm trying to get the <span class="main-text"> whose value is BART Times.
I already know how to get it via XPath. My XPath (verified with Firebug):
"//section//div[@class = 'sequence-region']//section[@class = 'standard-component text hide-section-separator-line']//span[@class = 'main-text' and text() = '%s']"
I can get the span tag by filling in the %s value (e.g. BART Times).
However, due to design considerations, we've been told to use CSS only, so I tried to come up with a CSS equivalent of the XPath above but couldn't find one.
The following CSS
"section div.sequence-region section.standard-component.text.hide-section-separator-line span[class=main-text]"
returns all the span tags under all the section tags.
Question 1: How do I get the span tag that has a certain TEXT value (the %s part of the XPath)?
Things I've tried for that last span tag that did not work (according to Firebug):
span.main-text[text='BART Times']
span[class=main-text][text='BART Times']
span.main-text:contains('BART Times')
span[class=main-text]:contains('BART Times')
span.main-text[text="BART Times"]
span[class=main-text][text="BART Times"]
span.main-text[text=\"BART Times\"]
span[class=main-text][text=\"BART Times\"]
span[text="BART Times"]
span[text=\"BART Times\"]
span:contains('BART Times')
span:contains("BART Times")
span:contains(\"BART Times\")
Basically, I want a CSS selector that checks BOTH the class and the TEXT value of the span tag.
Part 2:
Then I want to get the <section class="standard-component image-sequence-button"... element that contains the <span class="main-text"> I found, and then find other elements inside that specific section tag.
Question 2:
Assuming I found the span tag in Question 1 via CSS, how do I get the section tag (which is an ancestor several levels above the span tag)?
If this isn't possible with CSS, please provide an XPath equivalent as a temporary workaround.
CSS selectors can't select based on text. The answers to "Is there a CSS selector for elements containing certain text?" go into detail on why.
To select based on class and text in XPath: //span[contains(@class, 'main-text') and text() = 'BART Times']
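A minimal sketch of that XPath with the question's %s placeholder filled in via String.format. It evaluates the expression with the JDK's javax.xml.xpath on a simplified two-section copy of the markup, so no browser is needed; "Other App" is placeholder text, and the class SpanByText and its helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SpanByText {

    // One simplified image-sequence-button section with the given main text.
    static String section(String mainText) {
        return "<section class=\"standard-component image-sequence-button\">"
                + "<div class=\"core-component text\">"
                + "<span class=\"main-text\">" + mainText + "</span>"
                + "<span class=\"sub-text\">Provider</span>"
                + "</div></section>";
    }

    static final String MARKUP = "<div>" + section("Other App") + section("BART Times") + "</div>";

    // Checks both the class and the text; %s is filled in as in the question.
    static String spanXPath(String text) {
        return String.format("//span[contains(@class, 'main-text') and text() = '%s']", text);
    }

    // Evaluates an XPath 1.0 expression and returns how many nodes it selects.
    static int count(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Class alone matches the span in every section: prints 2
        System.out.println(count(MARKUP, "//span[contains(@class, 'main-text')]"));
        // Class plus text narrows it to one: prints 1
        System.out.println(count(MARKUP, spanXPath("BART Times")));
    }
}
```

Because the value is pasted between single quotes, a text containing an apostrophe would break the expression and would need different quoting.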
Regarding question 1, it is not possible, as stated in the other answer here. This is another thread on the topic: "CSS selector based on element text?"
Regarding question 2, once again there is no such parent selector in CSS: "Is there a CSS parent selector?". As for the XPath counterpart, you can use the parent axis (parent::*), its shortcut notation (..), or put the span selector as a predicate on the parent (the third example below):
....//span[@class = 'main-text' and text() = '%s']/parent::*
....//span[@class = 'main-text' and text() = '%s']/..
....//*[span[@class = 'main-text' and text() = '%s']]
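To check the three variants, and to reach the section itself (which is several levels above the span, so parent::* or .. alone only gives the immediate div), here is a sketch that also uses the ancestor axis. It assumes a simplified, well-formed copy of the question's markup evaluated with the JDK's javax.xml.xpath; the placeholder texts "Other App"/"Other Provider" and the class SpanAncestor are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SpanAncestor {

    // Simplified section keeping the question's nesting, including the inner
    // "standard-component text" section around the span.
    static String section(String mainText, String subText) {
        return "<section class=\"standard-component image-sequence-button\">"
                + "<div class=\"sequence-region\">"
                + "<section class=\"standard-component text hide-section-separator-line\">"
                + "<div class=\"core-component text\">"
                + "<span class=\"main-text\">" + mainText + "</span>"
                + "<span class=\"sub-text\">" + subText + "</span>"
                + "</div></section></div></section>";
    }

    static final String MARKUP = "<div class=\"d-menu\">"
            + section("Other App", "Other Provider")
            + section("BART Times", "Provider")
            + "</div>";

    // The span located by class and text, as in the question.
    static final String SPAN = "//span[@class = 'main-text' and text() = 'BART Times']";

    // String value of an XPath 1.0 expression (first node in document order).
    static String stringValue(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // All three variants select the immediate parent div: "core-component text"
        System.out.println(stringValue(MARKUP, SPAN + "/parent::*/@class"));
        System.out.println(stringValue(MARKUP, SPAN + "/../@class"));
        System.out.println(stringValue(MARKUP,
                "//*[span[@class = 'main-text' and text() = 'BART Times']]/@class"));
        // ancestor:: climbs to the outer section, then searches inside it: "Provider"
        System.out.println(stringValue(MARKUP, SPAN
                + "/ancestor::section[contains(@class, 'image-sequence-button')]"
                + "//span[@class = 'sub-text']"));
    }
}
```

The contains(@class, 'image-sequence-button') filter matters because the inner "standard-component text" section is also an ancestor of the span.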
See the following thread for a better (though more complicated) alternative for matching an element by CSS class using XPath, in case you haven't come across it yet: "How can I find an element by CSS class with XPath?"
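A common XPath 1.0 idiom for matching a class as a whole token (not necessarily the exact code from that thread) pads the normalized class attribute with spaces before calling contains, so that, for example, main-text no longer matches main-text-large. Below is a sketch with hypothetical markup, evaluated with the JDK's javax.xml.xpath; the class ClassTokenXPath is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassTokenXPath {

    // Hypothetical markup: "main-text-large" only contains the substring, and the
    // third span has a double space in its class attribute.
    static final String MARKUP = "<div>"
            + "<span class=\"main-text\">A</span>"
            + "<span class=\"main-text-large\">B</span>"
            + "<span class=\"highlight  main-text\">C</span>"
            + "</div>";

    // Substring test: also matches "main-text-large".
    static final String NAIVE = "//span[contains(@class, 'main-text')]";

    // Token test: normalize whitespace, pad with spaces, then look for " main-text ".
    static final String TOKEN =
            "//span[contains(concat(' ', normalize-space(@class), ' '), ' main-text ')]";

    // Evaluates an XPath 1.0 expression and returns how many nodes it selects.
    static int count(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(MARKUP, NAIVE)); // 3
        System.out.println(count(MARKUP, TOKEN)); // 2 (spans A and C)
    }
}
```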
I'm trying to get the value of an href attribute from an anchor tag (a tag) using Java, without a third-party API. I know the class of the label. The website looks like this:
<html>
<body>
<div id="uix_wrapper">
<div id="button">
<label class="downloadButton">
Button
</label>
</div>
</div>
</body>
</html>
If you want to avoid third-party libraries, you can use a regular expression. A very basic expression for your example is
<a href="(.*?)">
and it should work. You can try it out yourself at https://www.debuggex.com/
The result will be in the first capturing group (group(1) in Java; group 0 is the entire match).
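A minimal sketch of using that expression with java.util.regex, with no third-party libraries. The input string is hypothetical (the markup in the question contains a label, not an a tag), and the class HrefRegex is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegex {

    // The expression from the answer; the non-greedy (.*?) stops at the first "> after href=".
    private static final Pattern HREF = Pattern.compile("<a href=\"(.*?)\">");

    // Returns the captured href value of every match, in document order.
    static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1)); // group(1) is the capture; group(0) is the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div id=\"button\"><a href=\"https://www.example.com/file.zip\">Button</a></div>";
        System.out.println(hrefs(html)); // [https://www.example.com/file.zip]
    }
}
```

This pattern only matches anchors where href is the tag's only attribute, so an anchor that also carries, say, a class attribute before href would be missed; that fragility is the usual trade-off of parsing HTML with regular expressions.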