Parsing a YouTube thumbnail in an iframe with Jsoup - java

I would like to display the default thumbnail image of this YouTube URL in my Android app:
<iframe width="560" height="315" src="https://www.youtube.com/embed/FXx_gbdIUKg" frameborder="0" allowfullscreen=""></iframe>
This is my method for doing so:
static String parseThumbnail(String youTubeURL){
org.jsoup.nodes.Document document = Jsoup.parse(youTubeURL);
Elements youtubeElements = document.select("FXx_gbdIUKg");
org.jsoup.nodes.Document iframeDoc = Jsoup.parse(youtubeElements.get(0).data());
Elements iframeElements = iframeDoc.select("iframe");
return iframeElements.attr("http://img.youtube.com/vi/"+youtubeElements+"/default.jpg");
The iframe is within the "content:encoded" node, so I'm calling this method here:
String itemYouTubeImage = null;
if (XML_TAG_CONTENT_ENCODED.equalsIgnoreCase(tag)) {
String contentEncoded = tagNode.getTextContent();
itemYouTubeImage = parseThumbnail(contentEncoded);
itemImageURL = parseImageFromHTML(contentEncoded);
itemContentEncodedText = parseTextFromHTML(contentEncoded);
How do I properly do this?
One problem I have is that the compiler tells me that the value parseThumbnail(contentEncoded) assigned to itemYouTubeImage is never used.

If you just want the default thumbnail, it is provided in the <head> of the YouTube HTML document. It is not encoded.
<link itemprop="thumbnailUrl"
href="https://i.ytimg.com/vi/2qhzsn3pZgk/maxresdefault.jpg">
To select on the attribute value and get the absolute URL:
String youtubeUrl = "https://www.youtube.com/watch?v=9wpqE8OSWrU";
Document doc = Jsoup.connect(youtubeUrl).get();
String thumbnailUrl = doc
.select("link[itemprop=thumbnailUrl]")
.first()
.absUrl("href");
System.out.println(thumbnailUrl);
Output
https://i.ytimg.com/vi/9wpqE8OSWrU/maxresdefault.jpg
Read more in the Jsoup cookbook.
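If fetching the watch page is not desirable, the thumbnail URL can probably also be built directly from the video id in the iframe's src, following the img.youtube.com/vi/<id>/default.jpg pattern that the question itself uses. The sketch below is only an illustration: the class and method names are hypothetical, the https scheme is an assumption, and it assumes the src has the /embed/<id> shape shown in the question. The src value itself could be read with Jsoup via select("iframe").attr("src").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Jsoup or any YouTube API.
class YouTubeThumbnails {
    // Matches the id segment of an embed URL such as
    // https://www.youtube.com/embed/FXx_gbdIUKg
    private static final Pattern EMBED_ID = Pattern.compile("/embed/([A-Za-z0-9_-]+)");

    /** Returns the video id from an embed URL, or null if none is found. */
    static String videoId(String embedUrl) {
        if (embedUrl == null) {
            return null;
        }
        Matcher m = EMBED_ID.matcher(embedUrl);
        return m.find() ? m.group(1) : null;
    }

    /** Builds the default thumbnail URL, or returns null when no id is present. */
    static String defaultThumbnail(String embedUrl) {
        String id = videoId(embedUrl);
        // Assumption: the URL pattern from the question is still served.
        return id == null ? null : "https://img.youtube.com/vi/" + id + "/default.jpg";
    }
}
```

Returning null for an unrecognised URL lets the caller decide whether to fall back to another image.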

Related

JSoup, how to return data from a dynamic <a href> tag

Very new to JSoup, trying to retrieve a changeable value that is stored within an <a> tag, specifically from the following website and HTML.
Snapshot of HTML
The part of the link after "constituency/" changes depending on the user's input. I am able to retrieve the h2 tags themselves but not the information within them. At the moment the best result I can get is just tags, using the method below.
The desired return would be something that I can substring down into
Dublin Bay South
The actual return is
<well.col-md-4.h2></well.col-md-4.h2>
private String jSoupTDRequest(String aLine1, String aLine3) throws IOException {
String constit = "";
String h2 = "h2";
String url = "https://www.whoismytd.com/search?utf8=✓&form-input="+aLine1+"%2C+"+aLine3+"+Ireland";
//Switch to try catch if time
Document doc = Jsoup.connect(url)
.timeout(6000).get();
//Scrape elements from relevant section
Elements body = doc.select("well.col-md-4.h2");
Element e = new Element("well.col-md-4.h2");
constit = e.toString();
return constit;
I am extremely new to JSoup and scraping in general. Would appreciate any input from someone who knows what they're doing or any alternate ways to try and get the desired result
Change the code under your "Scrape elements from relevant section" comment as follows:
Select the very first <div class="well"> element first.
Element tdsDiv = doc.select("div.well").first();
Select the very first <a> link element next. This link points to the constituency.
Element constLink = tdsDiv.select("a").first();
Get the constituency name by grabbing this link's text content.
constit = constLink.text();
Alternatively, as a JUnit 5 test that selects the link by its href:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import java.io.IOException;
@DisplayName("JSoup, how to return data from a dynamic <a href> tag")
class JsoupQuestionTest {
private static final String URL = "https://www.whoismytd.com/search?utf8=%E2%9C%93&form-input=Kildare%20Street%2C%20Dublin%2C%20Ireland";
@Test
void findSomeText() throws IOException {
String expected = "Dublin Bay South";
Document document = Jsoup.connect(URL).get();
// Select the link whose href points at the constituency page and read its text
String actual = document.getElementsByAttributeValue("href", "/constituency/dublin-bay-south").text();
Assertions.assertEquals(expected, actual);
}
}
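The question concatenates raw user input and a literal check mark into the search URL, while the test above uses an already percent-encoded URL. A minimal sketch of building the encoded URL from the two address lines with java.net.URLEncoder might look like this; the class and method names are hypothetical, and the query layout (utf8 and form-input parameters) is taken from the question as-is. Note that URLEncoder encodes spaces as + (as in the question's URL) rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the search URL used in the question.
class SearchUrlBuilder {
    private static final String BASE = "https://www.whoismytd.com/search";

    /** Form-encodes a single query value as UTF-8. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the search URL from the two address lines, as in the question. */
    static String searchUrl(String aLine1, String aLine3) {
        String address = aLine1 + ", " + aLine3 + " Ireland";
        // "\u2713" is the check mark passed in the utf8 parameter.
        return BASE + "?utf8=" + encode("\u2713") + "&form-input=" + encode(address);
    }
}
```

The result can be passed straight to Jsoup.connect(...), so user input containing spaces, commas, or ampersands does not break the query string.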

jsoup extract specific attribute from a hyperlink

I have some hyperlinks in a web page and I want to extract the title attribute from each of them.
I tried
select("a[href]").attr("title")
but I get nothing.
Edit
The complete div here
Trial code
Elements es = doc.select("div.mini-placard");
for(Element e:es)
{
System.out.println( e.select("span.align-image-vertically").select("a").attr("title"));
}
No output!
Extract the link element properly, then read the attributes of that element as shown below:
String html = "<p>An <a href='http://example.com/' title='hi'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkTitle = link.attr("title"); // 'hi'
Courtesy

Get class name Jsoup

I am trying to parse some HTML for an Android app, but I can't get the value of the data-id attribute.
Here's the html code
<div class="popup event-popup Predavanja" style="display: none;" data-id="246274" data-position="bottom" >
How can I parse the 246274 value?
If you have the Element object of the div tag, then this code will work:
String attr = element.attr("data-id"); // get the value of the 'data-id' attribute
int dataID = Integer.parseInt(attr); // convert it to an int
Optionally, if you want to check first if the attribute even exists, use this:
if (element.hasAttr("data-id")) // etc.
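Jsoup's attr() returns an empty string when the attribute is absent, and Integer.parseInt throws NumberFormatException on an empty or non-numeric string, so the conversion step may be worth guarding. The following is a small sketch of one way to do that; the class and method names are hypothetical, and the fallback value is whatever the caller chooses.

```java
// Hypothetical helper; pass it the string returned by element.attr("data-id").
class DataIds {
    /** Parses a data-id value; returns fallback for null, blank, or non-numeric input. */
    static int parseOrDefault(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return fallback; // attribute missing or empty
        }
        try {
            return Integer.parseInt(trimmed);
        } catch (NumberFormatException e) {
            return fallback; // attribute present but not an int
        }
    }
}
```

With the div from the question, DataIds.parseOrDefault(element.attr("data-id"), -1) would yield 246274, and -1 if the attribute were missing.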
I think you can do it like this:
Document doc = Jsoup.connect("Url").get(); // "Url" is a placeholder for the page address
Element divElement = doc.select("div.popup.event-popup.Predavanja").first(); // div carrying all three classes
String dataId = divElement.attr("data-id");
See the selector syntax reference: https://jsoup.org/cookbook/extracting-data/selector-syntax

How to extract those elements in Jsoup

I want to extract the "Abstract" and the "Title" as shown in the photo below. However I can't extract the title and I tried to extract the tag "Abstract" but it didn't work.
String html = "http://example.com/";
Document doc = Jsoup.parse(html);
Element link = doc.select("Abstract").first();
Try this:
Element title = doc.select("FONT[size=+1]").first();
Element abstractParagraph = doc.select("CENTER:has(b:containsOwn(Abstract)) + p").first();

How to access the subclass using jsoup

I want to access this webpage: https://www.google.com/trends/explore#q=ice%20cream and extract the data in the center line graph. The HTML is as follows (I have pasted only the part that I use):
<div class="center-col">
<div class="comparison-summary-title-line">...</div>
...
<div id="reportContent" class="report-content">
<!-- This tag handles the report titles component -->
...
<div id="report">
<div id="reportMain">
<div class="timeSection">
<div class = "primaryBand timeBand">...</div>
...
<div aria-lable = "one-chart" style = "position: absolute; ...">
<svg ....>
...
<script type="text/javascript">
var chartData = {...}
The data I need is stored in the script element (last line). My idea is to get the class "report-content" first and then select the script. My code is as follows:
String html = "https://www.google.com/trends/explore#q=ice%20cream";
Document doc = Jsoup.connect(html).get();
Elements center = doc.getElementsByClass("center-col");
Element report = doc.getElementsByClass("report-content");
System.out.println(center);
System.out.println(report);
When I print the "center" elements, I get the content of all the child elements except "report-content", and when I print "report-content", the result is only:
<div id="reportContent" Class="report-content"></div>
I also tried this:
Element report = doc.select(div.report-content).first();
but still does not work at all. How could I get the data in the script here? I appreciate your help!!!
Try this url instead:
https://www.google.com/trends/trendsReport?hl=en&q=${keywords}&tz=${timezone}&content=1
where
${keywords} is an encoded space separated keywords list
${timezone} is an encoded timezone in the Etc/GMT* form
DEMO
SAMPLE CODE
String myKeywords = "ice cream";
String myTimezone = "Etc/GMT+2";
String url = "https://www.google.com/trends/trendsReport?hl=en&q=" + URLEncoder.encode(myKeywords, "UTF-8") + "&tz=" + URLEncoder.encode(myTimezone, "UTF-8") + "&content=1";
Document doc = Jsoup.connect(url).timeout(10000).get();
Element scriptElement = doc.select("div#TIMESERIES_GRAPH_0-time-chart + script").first();
if (scriptElement==null) {
throw new RuntimeException("Unable to locate trends data.");
}
String jsCode = scriptElement.html();
// parse jsCode to extract chartData...
References:
How to extract the text of a <script> element with Jsoup?
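The sample code above leaves the "parse jsCode" step open. One possible approach, sketched below under the assumption that the script contains a declaration of the form var chartData = {...} as shown in the question, is to locate the declaration and copy the balanced object literal that follows it. The class and method names are hypothetical, and the scanner only understands quoted strings and braces; it does not handle JavaScript comments, regex literals, or template literals.

```java
// Hypothetical helper; works on the string returned by scriptElement.html().
class ChartDataExtractor {
    /** Returns the object literal assigned to varName, or null if absent or unbalanced. */
    static String extractObjectLiteral(String jsCode, String varName) {
        int decl = jsCode.indexOf("var " + varName);
        if (decl < 0) {
            return null;
        }
        int start = jsCode.indexOf('{', decl);
        if (start < 0) {
            return null;
        }
        int depth = 0;
        boolean inString = false;
        char quote = 0;
        for (int i = start; i < jsCode.length(); i++) {
            char c = jsCode.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == quote) {
                    inString = false;
                }
            } else if (c == '"' || c == '\'') {
                inString = true;
                quote = c;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return jsCode.substring(start, i + 1);
                }
            }
        }
        return null; // braces never balanced
    }
}
```

The returned text could then be handed to a JSON parser, provided the literal is valid JSON; that is not guaranteed for arbitrary JavaScript.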
Try getting the same element by id; you would get the complete tag.
