i am trying to display two "text text-pass" from html in chrome browser to my print console, apparently, it did not work, any advise please?
my browser html code
<a href="/abc/123" class="active">
<div class="sidebar-text">
<span class="text text-pass"> </span> </a>
<a href="/abc/1234" class="active">
<div class="sidebar-text">
<span class="text text-pass"> </span> </a>
My code
String 123= driver.findElement(By.xpath("//*[#id="js-app"]/div/div/div[2]/div[1]/div/div/ul/li[5]/a")).getText();
System.out.println(123);
String 1234= driver.findElement(By.xpath("//*[#id="js-app"]/div/div/div[2]/div[1]/div/div/ul/li[5]/a")).getText();
System.out.println(1234);
You can use .findElements to get multiple elements with the same pattern, it will return a list collection.
UPDATE
Refers to your comment, you need put the string into a list again and check with the Collection.contains() method:
List<String> results = new ArrayList<>();
List<WebElement> elements = driver.findElements(By.xpath("//div[#class='sidebar-text']//span"));
for(WebElement element: elements) {
String attr = element.getAttribute("class");
results.add(attr);
System.out.println(attr);
}
if(results.contains("text text-fail")) {
System.out.println("this is list contains 'text text-fail'");
}
Try this Code :
String pass = driver.findElement(By.xpath("//*[#class='sidebar-text']/span")).getAttribute("class");
System.out.println(pass);
Related
I use Jsoup get all data from website and save element if match some content when i get. I want when we get element. If it match some thing character , I save element from database(MYSQL,Postgress...). I code look like :
Connection conn = Jsoup.connect("https://viblo.asia");
Document doc = conn.userAgent("Mozilla").get();
Elements elements = doc.getElementsByClass("post-feed").get(0).children();
Elements list = new Elements();
Elements strings = new Elements();
for (Element element : elements) {
if (element.hasClass("post-feed-item")) {
list.add(element);
Element e = element.children().get(1).children().get(1).children().get(0);
if (e.text().matches("^.*?(Docker|docker|DOCKER).*$")) {
strings.add(e);
//save to element to DB
}
}
}
for (Element page : elements) {
if (links.add(URL)) {
//Remove the comment from the line below if you want to see it running on your editor
System.out.println(URL);
}
getPageLinks(page.attr("abs:href"));
}
I want if title from element contain : "Docker" it save my element to Database. But in element, It contain div and some thing link url, img , content. How to i save it to database. What if I want to save each element in a field in a database that is feasible? If not I can convert element to html and save it? Please help.
Example html i want save data base:
<div class="post-feed-item">
<img src="https://images.viblo.asia/avatar/1d0e5458-ad41-4d1c-89db-292dc198b4fa.png" srcset="https://images.viblo.asia/avatar/1d0e5458-ad41-4d1c-89db-292dc198b4fa.png 1x, https://images.viblo.asia/avatar-retina/1d0e5458-ad41-4d1c-89db-292dc198b4fa.png 2x" class="avatar avatar--md mr-05">
<div class="post-feed-item__info">
<div class="post-meta--inline">
<div class="user--inline d-inline-flex">
<!---->
Hoàn Kì
<!---->
</div>
<div class="post-meta d-inline-flex align-items-center flex-wrap">
<div class="text-muted mr-05">
<span class="mr-05">about 3 hours ago</span>
<button title="Copy URL" class="icon-btn _13z_mK0hRyRB3dPzawysKe_0"><i aria-hidden="true" class="fa fa-link"></i></button>
</div>
<!---->
<!---->
</div>
</div>
<div class="post-title--inline">
<h3 class="word-break mr-05">Docker: Chưa biết gì đến biết dùng (Phần 3 docker-compose )</h3>
<div class="tags" data-v-cbe11868>
<a href="/tags/docker" class="el-tag _3wKNDsArij9ZFjXe8k4ryR_0 el-tag--info el-tag--mini" data-v-cbe11868>Docker</a>
</div>
</div>
<!---->
<div class="d-flex justify-content-between">
<div class="d-flex">
<div class="stats">
<span title="Views" class="stats-item text-muted"><i aria-hidden="true" class="stats-item__icon fa fa-eye"></i> 62 </span>
<span title="Clips" class="stats-item text-muted"><i aria-hidden="true" class="stats-item__icon fa fa-paperclip"></i> 1 </span>
<span title="Comments" class="stats-item text-muted"><i aria-hidden="true" class="stats-item__icon fa fa-comments"></i> 0 </span>
</div>
<!---->
</div>
<div title="Score" class="points">
<div class="carets">
<i aria-hidden="true" class="fa fa-caret-up"></i>
<i aria-hidden="true" class="fa fa-caret-down"></i>
</div>
<span class="text-muted">4</span>
</div>
</div>
</div>
</div>
First, modify your logic for fetching post-feed-item like this-
Connection conn = Jsoup.connect("https://viblo.asia");
Document doc = conn.userAgent("Mozilla").get();
Elements elements = doc.getElementsByClass("post-feed-item"); //This will get the whole element.
for (Element element : elements) {
String postFeeds = "";
if (element.toString().contains("docker")) {
postFeeds = postFeeds.concat(element.toString());
//save postFeeds to DB
}
}
Extra
/**
* Your parsed element may contain single quote (').
* This will cause error while persisting.
* to avoid this you need to escape single quote (')
* with double single quote ('')
*/
if (element.toString().contains("docker")) {
postFeeds = postFeeds.concat(element.toString().replaceAll("'", "''"));
//save postFeeds to DB
}
Second, What if I want to save each element in a field in a database that is feasible?
You don't need separate columns to store each element at the database. However you can save but the feasibility depends on your use case. If you just want to store the post-feed-items only for writing it back to your web page then it is not feasible.
Third, How can I convert element to html and save?
You don't need to convert the element to html but you need to convert the element to String if you want to save it the database.
All you need is a column type of BLOB data type (you can also save it as VARCHAR but BLOB is safer).
Update
How can I traverse all pages?
By looking at the source code of that page I found this is how you can get the total page number -
Elements pagination = doc.getElementsByAttributeValueMatching("href", "page=\\d");
int totalPageNo = Integer.parseInt(pagination.get(pagination.size() - 2).text());
then loop through each page.
for(int page = 1; page <= totalPageNo; page++) {
Connection conn = Jsoup.connect("https://viblo.asia/?page=" + page);
//rest of your code
}
I properly know what's your mean.Here are some views:First you should clearify what`s your search for and make fields of tables in database. Such as according your ideas, you can make a table_docker table in db and there are field_id,field_content,field_start_time,field_links and so on in it. Second you should code some utils of classes such as JsoupUtils which is get HTML and parse it , HtmlUtils which is used to handle the html remarks and download these pictures,DBUtils which is used to connect db and save data,POIUtils which is used to show your data,DataUtils which is used to handle your data by your ways.
I am using the code below
WebElement inputele = driver.findElement(By.className("class_name"));
String inputeleval = inputele.getAttribute("value");
System.out.println(inputeleval);
but the value is empty. The HTML is below.
<div id="main">
<div id="hiddenresult">
<div class="tech-blog-list">
<label for="Question">1st Question</label>
<input id="txt60" class="form-control" type="text" value="sddf sd sdfsdf sdf sdfsdf sdfsdfsd fsd" />
</div>
</div>
<div class="pagination_main pull-left">
<div id="Pagination">
<div class="pagination">
<a class="previous" onclick="PreviousBtnClickEvent();" href="javascript:void(0)">Previous</a>
<a id="pg59" class="ep" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textbox">1</a>
<a id="pg41" class="ep" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textbox">2</a>
<a id="pg40" class="ep" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textarea">3</a>
<a id="pg60" class="ep current" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textbox">4</a>
</div>
</div>
</div>
</div>
Try using WebDriverWait to wait until element fully loaded on page and visible as below :-
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement inputele= wait.until(ExpectedConditions.visibilityOfElementLocated(By.className("class_name")));
String inputeleval = inputele.getAttribute("value");
System.out.println(inputeleval);
Note :-By.className("class_name") will give that element which class attribute equal to class_name. Make sure which element you want to locate is unique element with class attribute equal to class_name otherwise wise it will give first element with condition true.
Hope it will work..:)
Looks like your code is pretty close but you have the wrong class name? In your code above, you had "class_name" instead of "form-control". I'm assuming that was some sample code and not the actual code you are using? There is only one INPUT in the HTML and the code below should work. It also has an ID so that should be more specific in case there are more than one INPUTs on the page.
WebElement inputele= driver.findElement(By.className("form-control"));
String inputeleval = inputele.getAttribute("value");
System.out.println(inputeleval);
I have HTML code like :
<div class="ex1">
<div class="ex2">
<span>test1</span>
<span class="ex3">test2</span>
</div>
<div class="ex2">
<span>test3</span>
<span class="ex3">test2</span>
</div>
</div>
I'm using Selenium Webdriver.
And I need to create Java code which could:
If <span>test3 then select a <span class="ex3"> which located inside the same div class="ex2"
But since I have div's and spans with the same className inside one main I can't differ this spans..
Could you help me please with this issue?
So,something like this:
If <span>test3 then <span class=ex3>test2.
or
If <span>test1 then <span class=ex3>test2.
Thanks
1- Use this xpath to get to <span>test1 then <span class=ex3>test2:
//span[.='test1']/following-sibling::span[.='test2']
And, use in code like this:
WebElement ele = driver.findElement(By.xpath("//span[.='test1']/following-sibling::span[.='test2']"));
2-And, Use this xpath to get to <span>test3 then <span class=ex3>test2:
//span[.='test3']/following-sibling::span[.='test2']
Use in code like this:
WebElement ele = driver.findElement(By.xpath("//span[.='test3']/following-sibling::span[.='test2']"));
I have implemented the If statement as per your question in practice you can modify it as per your requirement.
WebElement test1_class1 =wait.until(ExpectedConditions.elementToBeClickable(By.xpath("//div[#class='ex1']//div[1]//span[1]")));
String test1= test1_class1.getText();
WebElement test2_class1 =wait.until(ExpectedConditions.elementToBeClickable(By.xpath("//div[#class='ex1']//div[1]//span[2]")));
String test2_1 =test2_class1.getText();
WebElement test3_class2 =wait.until(ExpectedConditions.elementToBeClickable(By.xpath("//div[#class='ex1']//div[2]//span[1]")));
String test3 =test3_class2.getText();
WebElement test2_class2 =wait.until(ExpectedConditions.elementToBeClickable(By.xpath("//div[#class='ex1']//div[2]//span[2]")));
String test2_2 =test3_class2.getText();
try
{
if(test1.equals("test1"))
{
System.out.println(test2_1);
}
if(test3.equals("test3"))
else
{
System.out.println(test2_2);
}
}
catch(Throwable e)
{
System.out.println("Exception in program"+e);
}
I want to parse the data out of this HTML (CompanyName, Location, jobDescription,...) using JSoup (java). I get stuck when trying to iterate the joblistings
The extract from the HTML is one of many "JOBLISTING" divs which I want to iterate and extract the Data out of it. I just can't handle how to iterate the specific div objects. Sorry for this noob question, but maybe someone can help me who already knows which function to use. Select?
<div class="between_listings"><!-- local.spacer --></div>
<div id="joblisting-2944914" class="joblisting listing-even listing-even company-98028 " itemscope itemtype="http://schema.org/JobPosting">
<div class="company_logo" itemprop="hiringOrganization" itemscope itemtype="http://schema.org/Organization">
<a href="/stellenangebote-des-unternehmens--Delivery-Hero-Holding-GmbH--98028.html" title="Jobs Delivery Hero Holding GmbH" itemprop="url">
<img src="/upload_de/logo/D/logoDelivery-Hero-Holding-GmbH-98028DE.gif" alt="Logo Delivery Hero Holding GmbH" itemprop="image" width="160" height="80" />
</a>
</div>
<div class="job_info">
<div class="h3 job_title">
<a id="jobtitle-2944914" href="/stellenangebote--Junior-Business-Intelligence-Analyst-CRM-m-f-Berlin-Delivery-Hero-Holding-GmbH--2944914-inline.html?ssaPOP=204&ssaPOR=203" title="Arbeiten bei Delivery Hero Holding GmbH" itemprop="url">
<span itemprop="title">Junior Business Intelligence Analyst / CRM (m/f)</span>
</a>
</div>
<div class="h3 company_name" itemprop="hiringOrganization" itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Delivery Hero Holding GmbH</span>
</div>
</div>
<div class="job_location_date">
<div class="job_location target-location">
<div class="job_location_info" itemprop="jobLocation" itemscope itemtype="http://schema.org/Place">
<div class="h3 locality" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality"> Berlin</span>
</div>
<span class="location_actions">
<a href="javaScript:PopUp('http://www.stepstone.de/5/standort.html?OfferId=2944914&ssaPOP=203&ssaPOR=203','resultList',800,520,1)" class="action_showlistingonmap showlabel" title="Google Maps" itemprop="maps">
<span class="location-icon"><!-- --></span>
<span class="location-label">Google Maps</span>
</a>
</span>
</div>
</div>
<div class="job_date_added" itemprop="datePosted"><time datetime="2014-07-04">04.07.14</time></div>
</div>
<div class="job_actions">
</div>
</div>
<div class="between_listings"><!-- local.spacer --></div>
File input = new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt"); // Load file into extraction1 Document ParseResult = Jsoup.parse(input, "UTF-8", "http://example.com/"); Elements jobListingElements = ParseResult.select(".joblisting"); for (Element jobListingElement: jobListingElements) { jobListingElement.select(".companyName span[itemprop=\"name\"]"); // other element properties System.out.println(jobListingElements);
Java code:
File input = new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt");
// Load file into extraction1
Document ParseResult = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements jobListingElements = ParseResult.select(".joblisting");
for (Element jobListingElement: jobListingElements) {
jobListingElement.select(".companyName span[itemprop=\"name\"]");
// other element properties
System.out.println(jobListingElements);
}
Thank you!
So you got your Jsoup document right? Than it seems pretty easy if the css class joblisting does not appear anywhere else.
Document document = Jsoup.parse(new File("d:/bla.html"), "utf-8");
Elements elements = document.select(".joblisting");
for (Element element : elements) {
Elements jobTitleElement = element.select(".job_title span");
Elements companyNameElement = element.select(".company_name spanspan[itemprop=name]");
String companyName = companyNameElement.text();
String jobTitle = jobTitleElement.text();
System.out.println(companyName);
System.out.println(jobTitle);
}
I don't know why the attribute [itemprop*=\"name\"] selector does not find the span (Further reading: http://jsoup.org/cookbook/extracting-data/selector-syntax )
Got it: span[itemprop=name] without any quotes or escapes. Other attributes or values also should work to get a more specific selection.
I'm relatively new to using jsoup, and I can't seem to find the correct query to parse out the value I'm looking for. The HTML is as follows.
<img src='http://rootzwiki.com/public/style_images/ginger/t_unread.png' alt='New Replies' /><br />
</a>
</td>
<td class='col_f_content '>
<h4><a id="tid-link-12251" href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/" title='View topic, started 17 December 2011 - 09:32 AM' class='topic_title'>[ROM][LTE] RootzBoat 4.0.3 V6.1</a></h4>
<br />
<span class='desc lighter blend_links'>
Started by <a hovercard-ref="member" hovercard-id="5" class="_hovertrigger url fn " href='http://rootzwiki.com/user/5-birdman/'>birdman</a>, 17 Dec 2011
</span>
<ul class='mini_pagination'>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/" title='Go to page 1'>1</a></li>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__st__10" title='Go to page 2'>2</a></li>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__st__20" title='Go to page 3'>3</a></li>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__st__1990" title='Go to page 200'>200 →</a></li>
</ul>
</td>
<td class='col_f_preview __topic_preview'>
<a href='http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/' class='expander closed' title='Preview this topic'> </a>
</td>
<td class='col_f_views desc blend_links'>
<ul>
<li>
<span class='ipsBadge ipsBadge_orange'>Hot</span>
1,999 replies
</li>
<li class='views desc'>180,213 views</li>
</ul>
</td>
<td class='col_f_post'>
<a href='http://rootzwiki.com/user/49940-jakeday/' class='ipsUserPhotoLink left'>
<img src='http://rootzwiki.com/uploads/profile/photo-thumb-49940.jpg' class='ipsUserPhoto ipsUserPhoto_mini' />
</a>
<ul class='last_post ipsType_small'>
<li><a hovercard-ref="member" hovercard-id="49940" class="_hovertrigger url fn " href='http://rootzwiki.com/user/49940-jakeday/'>jakeday</a></li>
<li>
<a href='http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__view__getlastpost' title='Go to last post'>Today, 04:20 AM</a>
</li>
</ul>
</td>
I need to parse out birdman from there. I know that once I've defined the element, I can get "birdman" out with author.text();, but I cant figure out how to define the author element. I thought perhaps the following block of code would work, but as I mentioned, I'm pretty new to jsoup and html and it obviously didnt work. Theres nothing wrong with the connection, and jsoup is working for the other values I parsed out.
TitleResults titleArray = new TitleResults();
Document doc = null;
try {
doc = Jsoup.connect(Constants.FORUM).get();
} catch (IOException e) {
e.printStackTrace();
}
Elements threads = doc.select(".topic_title");
for (Element thread : threads) {
titleArray = new TitleResults();
//Thread title
threadTitle = thread.text();
titleArray.setItemName(threadTitle);
//Thread link
String threadStr = thread.attr("abs:href");
String endTag = "/page__view__getnewpost"; //trim link
threadStr = new String(threadStr.replace(endTag, ""));
threadArray.add(threadStr);
titleArray.setAuthorDate("Author/Date");
results.add(titleArray);
}
Elements authors = doc.select("a[hovercard-ref]");
for (Element author : authors) {
if (author.attr("abs:href").contains("/user/")){
Log.d("POC", "SUCCESS " + author.attr("abs:href"));
} else {
Log.d("POC", "FAILURE " + author.text());
}
}
}
I think you're thinking too hard ;)
To get the birdman portion of the link, just use the following:
Elements authors = doc.select("a");
for (Element author : authors) {
Log.d("POC", author.text());
}
The "a" retrieves all links. After that you can just use the .text() like you said to retrieve the value.
Selvin answered it in the comments. I wasnt getting the source correctly and it was causing errors.
http://pastebin.com/xfUQkGw0