I have a tool that scans an API for changes. When it finds a change, it receives a string like:
word=\don\u2019t\ item-id=\"1086\">\n <span class=\
I want to extract the number from item-id, but there are multiple numbers in the response.
Is there a way to do this? (I also don't know whether the number will have 4 digits or just 1-2.)
So the regex should search for something like "NUMBERS\" and print it. (This is for Java.)
Based on your comment, it looks like you are receiving a JSON structure like
{
...
"data":{
"html":".. <a .. data-sku=\"XXX\"> ..",
...
}
...
}
and you are interested in the value of the data-sku attribute.
In that case, parse the JSON and traverse it to get the HTML. You can use org.json.JSONObject for that (or another parser, pick one you like):
String response = "{\"success\":1,\"data\":{\"html\":\"<div class=\\\"inner\\\">\\n <span class=\\\"title js-title-eligible\\\">Upgrade available<\\/span>\\n <span class=\\\"title js-title-warning\\\"><strong>WARNING :<\\/strong> You don\\u2019t own a <span class=\\\"js-from-ship\\\"><\\/span><\\/span>\\n <p class=\\\"explain js-title-eligible\\\">Buy this upgrade and it will be applicable to your <span class=\\\"js-from-ship\\\"><\\/span> from the My Hangar section.<\\/p>\\n <p class=\\\"explain js-title-warning\\\">You can buy this upgrade but it will only be applicable on a <span class=\\\"js-from-ship\\\"><\\/span>.<\\/p>\\n\\n <div class=\\\"price\\\"><strong class=\\\"final-price\\\">\\u20ac5<span class='super'>.41 <span class='currency'>EUR<\\/span><\\/span><\\/strong><div class=\\\"taxes js-taxes\\\">\\n <div class=\\\"taxes-details trans-02s\\\">\\n <div class=\\\"arrow\\\"><\\/div>\\n Tax Included: <br \\/>\\n <ul>\\n <li>VAT 19%<\\/li>\\n <\\/ul>\\n <\\/div>\\n<\\/div><\\/div>\\n\\n\\n <div>\\n <a href=\\\"\\/pledge\\/Upgrades\\/Mustang-Alpha-To-Aurora-LN-Upgrade\\\" class=\\\"add-to-cart holosmallbtn trans-03s js-add-to-cart-ship ty-js-add-to-cart\\\" data-sku=\\\"1086\\\">\\n <span class=\\\"holosmallbtn-top abs-overlay trans-02s\\\">BUY NOW<\\/span>\\n <span class=\\\"holosmallbtn-bottom abs-overlay trans-02s\\\"><\\/span>\\n <\\/a>\\n <a href=\\\"\\/pledge\\/Upgrades\\/Mustang-Alpha-To-Aurora-LN-Upgrade\\\" class=\\\"more-details\\\">View more details<\\/a>\\n <\\/div>\\n \\n <p class=\\\"explain info\\\">\\n Upgrades that you buy can be found in your <a href=\\\"\\/account\\/pledges\\\">Hangar section<\\/a>.<br \\/>\\n Click \\\"Apply Upgrade\\\" inside the Upgrade Pledge to pick where you want to apply it.\\n <\\/p>\\n <\\/div>\\n\\n\\n\\n\"},\"code\":\"OK\",\"msg\":\"OK\"}";
JSONObject jsonObject = new JSONObject(response);
String html = jsonObject.getJSONObject("data") //pick data:{...} object
.getString("html"); //from that object get value of html:"..."
Now that you have the HTML, you can parse it with an HTML parser (I am using jsoup):
Document doc = Jsoup.parse(html);
String dataSku = doc.select("a[data-sku]") //get "a" element with "data-sku" attribute
.attr("data-sku"); //value of that attribute
Output: 1086.
String string = "{\"success\":1,\"data\":{\"html\":\"<div class=\\\"inner\\\">\\n <span class=\\\"title js-title-eligible\\\">Upgrade available<\\/span>\\n <span class=\\\"title js-title-warning\\\"><strong>WARNING :<\\/strong> You don\\u2019t own a <span class=\\\"js-from-ship\\\"><\\/span><\\/span>\\n <p class=\\\"explain js-title-eligible\\\">Buy this upgrade and it will be applicable to your <span class=\\\"js-from-ship\\\"><\\/span> from the My Hangar section.<\\/p>\\n <p class=\\\"explain js-title-warning\\\">You can buy this upgrade but it will only be applicable on a <span class=\\\"js-from-ship\\\"><\\/span>.<\\/p>\\n\\n <div class=\\\"price\\\"><strong class=\\\"final-price\\\">\\u20ac5<span class='super'>.41 <span class='currency'>EUR<\\/span><\\/span><\\/strong><div class=\\\"taxes js-taxes\\\">\\n <div class=\\\"taxes-details trans-02s\\\">\\n <div class=\\\"arrow\\\"><\\/div>\\n Tax Included: <br \\/>\\n <ul>\\n <li>VAT 19%<\\/li>\\n <\\/ul>\\n <\\/div>\\n<\\/div><\\/div>\\n\\n\\n <div>\\n <a href=\\\"\\/pledge\\/Upgrades\\/Mustang-Alpha-To-Aurora-LN-Upgrade\\\" class=\\\"add-to-cart holosmallbtn trans-03s js-add-to-cart-ship ty-js-add-to-cart\\\" data-sku=\"1086\\\">\\n <span class=\\\"holosmallbtn-top abs-overlay trans-02s\\\">BUY NOW<\\/span>\\n <span class=\\\"holosmallbtn-bottom abs-overlay trans-02s\\\"><\\/span>\\n <\\/a>\\n <a href=\\\"\\/pledge\\/Upgrades\\/Mustang-Alpha-To-Aurora-LN-Upgrade\\\" class=\\\"more-details\\\">View more details<\\/a>\\n <\\/div>\\n \\n <p class=\\\"explain info\\\">\\n Upgrades that you buy can be found in your <a href=\\\"\\/account\\/pledges\\\">Hangar section<\\/a>.<br \\/>\\n Click \\\"Apply Upgrade\\\" inside the Upgrade Pledge to pick where you want to apply it.\\n <\\/p>\\n <\\/div>\\n\\n\\n\\n\"},\"code\":\"OK\",\"msg\":\"OK\"}";
String pattern="(?<=data-sku=)([\\\\]*\")(\\d+)";
Pattern p = Pattern.compile(pattern);
Matcher matcher = p.matcher(string);
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.println(" End index: " + matcher.end() + " ");
System.out.println("number="+matcher.group(2));
}
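The pattern above can also be wrapped in a small reusable method. This is only a sketch: the class name AttributeNumberExtractor and the method firstNumber are made up for illustration, and it assumes the attribute value is a run of digits inside quotes that may or may not be preceded by backslashes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class AttributeNumberExtractor {

    // Returns the first run of digits that follows name=" or name=\" in the input.
    static Optional<String> firstNumber(String input, String attributeName) {
        // Pattern.quote keeps characters such as '-' in the attribute name literal.
        // The regex part after the name is: '=', any number of backslashes, '"', digits.
        Pattern p = Pattern.compile(Pattern.quote(attributeName) + "=\\\\*\"(\\d+)");
        Matcher m = p.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Same fragment once with escaped quotes (as in the raw response) and once unescaped.
        String escaped = "class=\\\"add-to-cart\\\" data-sku=\\\"1086\\\">";
        String plain = "<a class=\"add-to-cart\" data-sku=\"7\">";
        System.out.println(firstNumber(escaped, "data-sku").orElse("not found"));
        System.out.println(firstNumber(plain, "data-sku").orElse("not found"));
    }
}
```

Because the number of digits is not fixed, the group uses \d+ so it works for 1-digit and 4-digit values alike.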
I am trying to print the two "text text-pass" values from the HTML shown in the Chrome browser to my console, but it does not work. Any advice, please?
My browser HTML code:
<a href="/abc/123" class="active">
<div class="sidebar-text">
<span class="text text-pass"> </span> </a>
<a href="/abc/1234" class="active">
<div class="sidebar-text">
<span class="text text-pass"> </span> </a>
My code:
String text123 = driver.findElement(By.xpath("//*[@id='js-app']/div/div/div[2]/div[1]/div/div/ul/li[5]/a")).getText();
System.out.println(text123);
String text1234 = driver.findElement(By.xpath("//*[@id='js-app']/div/div/div[2]/div[1]/div/div/ul/li[5]/a")).getText();
System.out.println(text1234);
You can use .findElements to get all elements that match the same locator; it returns a List<WebElement>.
UPDATE
As per your comment, you need to collect the strings into a list and then check it with the Collection.contains() method:
List<String> results = new ArrayList<>();
List<WebElement> elements = driver.findElements(By.xpath("//div[@class='sidebar-text']//span"));
for(WebElement element: elements) {
String attr = element.getAttribute("class");
results.add(attr);
System.out.println(attr);
}
if(results.contains("text text-fail")) {
System.out.println("the list contains 'text text-fail'");
}
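One caveat with the exact-match check above: contains("text text-fail") only succeeds when the whole class attribute is exactly that string. If the classes might appear in a different order or alongside extra classes, a token-based check is more forgiving. A minimal sketch, where ClassTokens and anyHasClass are hypothetical names and the list stands in for the results collected above:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it works on plain strings, so no WebDriver is needed.
class ClassTokens {

    // True if any class attribute value contains the given class name as a whole token.
    static boolean anyHasClass(List<String> classAttributes, String className) {
        for (String attr : classAttributes) {
            if (attr == null) {
                continue; // getAttribute("class") can return null when the attribute is absent
            }
            // Class names inside a class attribute are separated by whitespace.
            List<String> tokens = Arrays.asList(attr.trim().split("\\s+"));
            if (tokens.contains(className)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("text text-pass", "text-fail text");
        System.out.println(anyHasClass(results, "text-fail")); // true
        System.out.println(anyHasClass(results, "fail"));      // false
    }
}
```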
Try this code:
String pass = driver.findElement(By.xpath("//*[@class='sidebar-text']/span")).getAttribute("class");
System.out.println(pass);
I just started learning how to use JSoup. I think I've successfully selected this section of the html, and I successfully took "DARK SOULS III Deluxe Edition" out by doing .select("span.title").text but I was trying to get the prices, in this case $84.98 and $55.23. I tried doing .select("div.col search_price responsive_secondrow").text but it comes up as blank. I was wondering if someone could help me figure out how to extract that part, thanks in advance! Here's the html of the section of the page.
The full html is view-source:http://store.steampowered.com/search/?filter=topsellers
<a href="http://store.steampowered.com/sub/94174/?snr=1_7_7_topsellers_150_1" data-ds-packageid="94174" data-ds-appid="374320,442010"onmouseover="GameHover( this, event, 'global_hover', {"type":"sub","id":94174,"public":1,"v6":1} );" onmouseout="HideGameHover( this, event, 'global_hover' )" class="search_result_row ds_collapse_flag" >
<div class="col search_capsule"><img src="http://cdn.edgecast.steamstatic.com/steam/subs/94174/capsule_sm_120.jpg?t=1476893662"></div>
<div class="responsive_search_name_combined">
<div class="col search_name ellipsis">
<span class="title">DARK SOULS III Deluxe Edition</span>
<p>
<span class="platform_img win"></span> </p>
</div>
<div class="col search_released responsive_secondrow">12 Apr, 2016</div>
<div class="col search_reviewscore responsive_secondrow">
<span class="search_review_summary positive" data-store-tooltip="Very Positive<br>86% of the 29,204 user reviews for games in this bundle are positive.">
</span>
</div>
<div class="col search_price_discount_combined responsive_secondrow">
<div class="col search_discount responsive_secondrow">
<span>-35%</span>
</div>
<div class="col search_price discounted responsive_secondrow">
<span style="color: #888888;"><strike>$84.98</strike></span><br>$55.23 </div>
</div>
</div>
<div style="clear: left;"></div>
</a>
Use doc.select("a.search_result_row") instead:
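import java.io.IOException;
import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSteamTest {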
public static void main(String[] args) throws IOException {
Document doc = Jsoup.connect("http://store.steampowered.com/search/?filter=topsellers").userAgent("Mozilla")
.get();
Elements table = doc.select("a.search_result_row");
Iterator<Element> ite = table.iterator();
while (ite.hasNext()) {
Element element = ite.next();
System.out.println(element.text());
}
}
}
You will get a list like this:
PLAYERUNKNOWN'S BATTLEGROUNDS 23 Mar, 2017 29,99€
Steel Division: Normandy 44 Coming Soon 39,99€
DARK SOULS™ III 11 Apr, 2016 -50% 59,99€ 29,99€
Your particular problem comes from the div that has multiple classes.
To select an element that has multiple classes, use a dot instead of a space in your select:
doc.select("div.col.search_price.discounted.responsive_secondrow");
Take a look at this question: JSOUP get element with multiple classes
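With the multi-class selector, calling text() on the price div should give both prices in one string, something like "$84.98 $55.23" for the HTML in the question. A rough sketch of splitting such a string using only java.util.regex follows; PriceText and extractPrices are invented names, and the pattern assumes two decimal places, '.' or ',' as the decimal separator, and an optional $ or euro sign.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it takes the text of the price element as input.
class PriceText {

    // Optional currency sign, digits, '.' or ',', two digits, optional currency sign.
    // The euro sign is written as a Unicode escape to avoid source-encoding issues.
    private static final Pattern PRICE =
            Pattern.compile("[$\u20ac]?\\d+[.,]\\d{2}[$\u20ac]?");

    // Returns every price-looking token in the order it appears.
    static List<String> extractPrices(String text) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(text);
        while (m.find()) {
            prices.add(m.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        // For a discounted item the first entry is the original price, the second the current one.
        System.out.println(extractPrices("$84.98 $55.23")); // [$84.98, $55.23]
    }
}
```

Percentages such as -35% are skipped because they have no decimal part.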
I am using the code below
WebElement inputele = driver.findElement(By.className("class_name"));
String inputeleval = inputele.getAttribute("value");
System.out.println(inputeleval);
but the value is empty. The HTML is below.
<div id="main">
<div id="hiddenresult">
<div class="tech-blog-list">
<label for="Question">1st Question</label>
<input id="txt60" class="form-control" type="text" value="sddf sd sdfsdf sdf sdfsdf sdfsdfsd fsd" />
</div>
</div>
<div class="pagination_main pull-left">
<div id="Pagination">
<div class="pagination">
<a class="previous" onclick="PreviousBtnClickEvent();" href="javascript:void(0)">Previous</a>
<a id="pg59" class="ep" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textbox">1</a>
<a id="pg41" class="ep" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textbox">2</a>
<a id="pg40" class="ep" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textarea">3</a>
<a id="pg60" class="ep current" onclick="PaginationBtnClickEvent(this);" href="javascript:void(0)" name="Textbox">4</a>
</div>
</div>
</div>
</div>
Try using WebDriverWait to wait until the element is fully loaded and visible on the page, as below:
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement inputele= wait.until(ExpectedConditions.visibilityOfElementLocated(By.className("class_name")));
String inputeleval = inputele.getAttribute("value");
System.out.println(inputeleval);
Note: By.className("class_name") matches elements whose class attribute contains the class class_name. Make sure the element you want to locate is the only one with that class; otherwise you will get the first matching element.
Hope it will work..:)
Looks like your code is pretty close, but you have the wrong class name. In your code above, you had "class_name" instead of "form-control". I'm assuming that was some sample code and not the actual code you are using? There is only one INPUT in the HTML, so the code below should work. The INPUT also has an ID, which would be more specific in case there is more than one INPUT on the page.
WebElement inputele= driver.findElement(By.className("form-control"));
String inputeleval = inputele.getAttribute("value");
System.out.println(inputeleval);
I want to extract the text inside the "job title" and the text inside "summary" class. There are many with the same class names. So I want the job title of the first one and summary of it. And then the job title of the next one and the summary of it. In that order.
The following code works. But it first gives all the titles and then all the text inside all the summary classes. I want the first job title and the first summary. Then the second job title and the second summary and so on. How do I modify the code for this? Please help.
<div class=" row result" id="p_64c5268586001bd2" data-jk="64c5268586001bd2" itemscope="" itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
<h2 id="jl_64c5268586001bd2" class="jobtitle">
<a rel="nofollow" href="/rc/clk?jk=64c5268586001bd2" target="_blank" onmousedown="return rclk(this,jobmap[0],0);" onclick="return rclk(this,jobmap[0],true,0);" itemprop="title" title="Fashion Assistant" class="turnstileLink" data-tn-element="jobTitle"><b>Fashion</b> Assistant</a>
</h2>
<span class="company" itemprop="hiringOrganization" itemtype="http://schema.org/Organization">
<span itemprop="name">
<a href="/cmp/Itv?from=SERP&campaignid=serp-linkcompanyname&fromjk=64c5268586001bd2&jcid=3bf3e8a57da58ff5" target="_blank">
ITV Jobs</a></span>
</span>
<a data-tn-element="reviewStars" data-tn-variant="cmplinktst2" class="turnstileLink " href="/cmp/Itv/reviews?jcid=3bf3e8a57da58ff5" title="Itv Jobs reviews" onmousedown="this.href = appendParamsOnce(this.href, '?campaignid=cmplinktst2&from=SERP&jt=Fashion+Assistant&fromjk=64c5268586001bd2');" target="_blank">
<span class="ratings"><span class="rating" style="width:49.5px;"><!-- -> </span></span><span class="slNoUnderline">28 reviews</span></a>
<span itemprop="jobLocation" itemscope="" itemtype="http://schema.org/Place"> <span class="location" itemprop="address" itemscope="" itemtype="http://schema.org/Postaladdress"><span itemprop="addressLocality">London</span></span></span>
<table cellpadding="0" cellspacing="0" border="0">
<tbody><tr>
<td class="snip">
<div>
<span class="summary" itemprop="description">
Do you have a passion for <b>Fashion</b>? You will be responsible for running our <b>fashion</b> cupboard, managing a team of interns and liaising with press officers to...</span>
</div>
doc = Jsoup.connect("http://www.indeed.co.uk/jobs?q=fashion&l=England").timeout(5000).get();
Elements f = doc.select(".jobtitle");
Elements e = doc.select(".summary");
System.out.println("Title: " + f.text());
System.out.println("Details: "+ e.text());
Iterate over titles and then find the summary for each title:
for (Element title : doc.select(".jobtitle")) {
Element summary = title.parent().select(".summary").first();
System.out.format("Title: %s. Summary: %s%n", title.text(), summary.text());
}
I want to parse the data out of this HTML (CompanyName, Location, jobDescription, ...) using JSoup (Java). I get stuck when trying to iterate over the job listings.
The extract from the HTML is one of many "JOBLISTING" divs that I want to iterate over and extract the data from. I just can't figure out how to iterate over those specific div objects. Sorry for the beginner question, but maybe someone who already knows which function to use can help me. Select?
<div class="between_listings"><!-- local.spacer --></div>
<div id="joblisting-2944914" class="joblisting listing-even listing-even company-98028 " itemscope itemtype="http://schema.org/JobPosting">
<div class="company_logo" itemprop="hiringOrganization" itemscope itemtype="http://schema.org/Organization">
<a href="/stellenangebote-des-unternehmens--Delivery-Hero-Holding-GmbH--98028.html" title="Jobs Delivery Hero Holding GmbH" itemprop="url">
<img src="/upload_de/logo/D/logoDelivery-Hero-Holding-GmbH-98028DE.gif" alt="Logo Delivery Hero Holding GmbH" itemprop="image" width="160" height="80" />
</a>
</div>
<div class="job_info">
<div class="h3 job_title">
<a id="jobtitle-2944914" href="/stellenangebote--Junior-Business-Intelligence-Analyst-CRM-m-f-Berlin-Delivery-Hero-Holding-GmbH--2944914-inline.html?ssaPOP=204&ssaPOR=203" title="Arbeiten bei Delivery Hero Holding GmbH" itemprop="url">
<span itemprop="title">Junior Business Intelligence Analyst / CRM (m/f)</span>
</a>
</div>
<div class="h3 company_name" itemprop="hiringOrganization" itemscope itemtype="http://schema.org/Organization">
<span itemprop="name">Delivery Hero Holding GmbH</span>
</div>
</div>
<div class="job_location_date">
<div class="job_location target-location">
<div class="job_location_info" itemprop="jobLocation" itemscope itemtype="http://schema.org/Place">
<div class="h3 locality" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="addressLocality"> Berlin</span>
</div>
<span class="location_actions">
<a href="javaScript:PopUp('http://www.stepstone.de/5/standort.html?OfferId=2944914&ssaPOP=203&ssaPOR=203','resultList',800,520,1)" class="action_showlistingonmap showlabel" title="Google Maps" itemprop="maps">
<span class="location-icon"><!-- --></span>
<span class="location-label">Google Maps</span>
</a>
</span>
</div>
</div>
<div class="job_date_added" itemprop="datePosted"><time datetime="2014-07-04">04.07.14</time></div>
</div>
<div class="job_actions">
</div>
</div>
<div class="between_listings"><!-- local.spacer --></div>
Java code:
File input = new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_SOA.txt");
// Load file into extraction1
Document ParseResult = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements jobListingElements = ParseResult.select(".joblisting");
for (Element jobListingElement: jobListingElements) {
jobListingElement.select(".companyName span[itemprop=\"name\"]");
// other element properties
System.out.println(jobListingElements);
}
Thank you!
So you got your Jsoup document, right? Then it seems pretty easy, as long as the CSS class joblisting does not appear anywhere else.
Document document = Jsoup.parse(new File("d:/bla.html"), "utf-8");
Elements elements = document.select(".joblisting");
for (Element element : elements) {
Elements jobTitleElement = element.select(".job_title span");
Elements companyNameElement = element.select(".company_name span[itemprop=name]");
String companyName = companyNameElement.text();
String jobTitle = jobTitleElement.text();
System.out.println(companyName);
System.out.println(jobTitle);
}
I don't know why the attribute selector [itemprop*=\"name\"] does not find the span (further reading: http://jsoup.org/cookbook/extracting-data/selector-syntax ).
Got it: span[itemprop=name] without any quotes or escapes. Other attributes or values also should work to get a more specific selection.