I need to get currency data from a website. Here is the HTML for one row of the table:
<tr>
<td class="currency-up"></td>
<td class="currency">
ABD Doları
</td>
<td class>8,2805</td>
<td class>8,2856</td>
</tr>
I wrote this code, but I could not get it to work:
String url = "https://uzmanpara.milliyet.com.tr/doviz-kurlari/";
Document doc = null;
try {
doc = Jsoup.connect(url).timeout(6000).get();
} catch (IOException ex) {
Logger.getLogger(den3.class.getName()).log(Level.SEVERE, null, ex);
}
Element link = doc.select("href").first();
String linkHref = link.attr("href"); // "http://example.com/"
System.out.println(linkHref);
But I get this exception:
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException:
Cannot invoke "org.jsoup.nodes.Element.attr(String)" because "link" is
null
How can I fix this, and how can I get the currency rates?
You can try this:
Element link = doc.select("a[href]").first();
If you select just href, jsoup looks for an element whose tag name is href. No such tag exists, so first() returns null and the call to attr() throws the NullPointerException. What you want is the href attribute of an a tag, which is written a[href].
Let's start with a simple example.
For example, to get the text of the nested span under the element whose href value is /dolar-kuru/, you can try:
// Select by id.
Element element2 = doc.select("#usd_header_son_data").first();
String usd2 = element2.text();
System.out.println(usd2);
// Select the nested span under the link with the given href, using child combinators. (1)
Element element1 = doc.select("a[href='/dolar-kuru/'] > span > span").first();
String usd1 = element1.text();
System.out.println(usd1);
// Select the nested element under the link with the given href, using :nth-child(2). (2)
Element element3 = doc.select("a[href='/dolar-kuru/'] > span :nth-child(2)").first();
String usd3 = element3.text();
System.out.println(usd3);
We can take the example one step further.
Let's take both the buy and sell prices from a table of exchange rates.
Elements elements = doc.select(".borsaMain > div:nth-child(2) > div:nth-child(1) > table td.currency");
for (Element element : elements) {
    // The name, buy and sell cells are the 2nd, 3rd and 4th td of the same row
    Elements currency = element.parent().select("td:nth-child(2)");
    Elements buy = element.parent().select("td:nth-child(3)");
    Elements sell = element.parent().select("td:nth-child(4)");
    System.out.println(String.format("%s [buy=%s, sell=%s]",
            currency.text(), buy.text(), sell.text()));
}
This gives output that looks like this:
ABD Doları [buy=8,2855, sell=8,2888]
Euro [buy=9,8389, sell=9,8645]
İngiliz Sterlini [buy=11,4203, sell=11,4775]
Kanada Doları [buy=6,5696, sell=6,6091]
İsviçre Frangı [buy=9,0128, sell=9,0671]
Suudi Riyali [buy=2,2025, sell=2,2135]
...
Many other selectors are available; see https://jsoup.org/cookbook/extracting-data/selector-syntax
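The scraped values use a comma as the decimal separator (8,2855), so they need converting before any arithmetic. A minimal sketch using only the JDK, assuming the site always writes rates with ',' for decimals and '.' for thousands. RateParser and parseRate are hypothetical names and are not part of jsoup:

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class RateParser {
    // Converts text such as "8,2855" into a BigDecimal.
    // Assumption: ',' is the decimal separator and '.' the grouping separator.
    static BigDecimal parseRate(String text) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(',');
        symbols.setGroupingSeparator('.');
        DecimalFormat format = new DecimalFormat("#,##0.####", symbols);
        format.setParseBigDecimal(true); // parse() then returns a BigDecimal
        try {
            return (BigDecimal) format.parse(text.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a rate: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Sample values copied from the output above
        BigDecimal buy = parseRate("8,2855");
        BigDecimal sell = parseRate("8,2888");
        System.out.println("spread = " + sell.subtract(buy));
    }
}
```

In the loop above, buy.text() and sell.text() could be passed to parseRate directly. BigDecimal is used rather than double so that the four decimal places survive exactly.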
For the provided HTML code, you can do the following:
Element link = doc.select("a[href]").first();
String linkHref = link.attr("href");
System.out.println(linkHref);
For the URL in your code sample, if you only want the first currency link, you can do:
Element link = doc.select("td.currency > a").first();
String linkHref = link.attr("href");
System.out.println(linkHref);
To explain the previous code: "td.currency" matches td elements with the class "currency", and " > a" then selects their direct children that are a elements.
And if you want all currencies, you can do:
Elements links = doc.select("td.currency > a");
links.forEach(link -> System.out.println(link.attr("href")));
Note that there are some duplicates in the last code sample.
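One way to drop those duplicates while keeping the order in which the links first appear is to pass the href values through a LinkedHashSet. A minimal sketch using only the JDK: UniqueLinks is a hypothetical name, and "/example-kuru/" is a made-up placeholder value, not a path taken from the site:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class UniqueLinks {
    // Removes repeated values but keeps first-seen order.
    static List<String> unique(List<String> hrefs) {
        return new ArrayList<>(new LinkedHashSet<>(hrefs));
    }

    public static void main(String[] args) {
        // Stand-in for the href values collected from the page
        List<String> hrefs = Arrays.asList(
                "/dolar-kuru/", "/example-kuru/", "/dolar-kuru/");
        for (String href : unique(hrefs)) {
            System.out.println(href);
        }
    }
}
```

With jsoup, the input list could be built by adding link.attr("href") for each element in links before calling unique.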
Related
I am scraping a medical website where I need to extract information about a drug header by header, e.g. Precautions, Contraindications, Dosage, Uses, etc. The HTML looks like the sample below. If I just select p.drug-content, I get the content under all the headers as one big paragraph. How do I get the content per header, so that the dosage paragraph comes under Dosage, the precautions under Precautions, and so on?
<a name="Warning"></a>
<div class="report-content drug-widget">
<div class="drug-header"><h2 style="color:#000000!important;">What are the warnings and precautions for Abacavir? </h2></div>
<p class="drug-content">
• Caution is advised when used in patients with history of depression or at risk for heart disease<br>• Avoid use with alcohol.<br>• Take along with other anti-HIV drugs and not alone, to prevent resistance.<br>• Continue other precautions to prevent spread of HIV infection.</p></div>
<a name="Prescription"></a>
<div class="report-content drug-widget">
<div class="drug-header"><h2 style="color:#000000!important;">Why is Abacavir Prescribed? (Indications) </h2></div>
<p class="drug-content">Abacavir is an antiviral drug that is effective against the HIV-1 virus. It acts on an enzyme of the virus called reverse transcriptase, which plays an important role in its multiplication. Though abacavir reduces viral load and may slow the progression of the disease, it does not cure the HIV infection. </p></div>
<a name="Dosage"></a>
<div class="report-content drug-widget">
<div class="drug-header"><h2 style="color:#000000!important;">What is the dosage of Abacavir?</h2></div>
<p class="drug-content"> Treatment of HIV-1/AIDS along with other medications. Dose in adults is 600 mg daily, as a single dose or divided into two doses.
</p></div>
Here is my code:
private static void ScrapingDrugInfo() throws IOException{
Connection.Response response = null;
Document doc = null;
List<SideEffectsObject> sideEffectsList = new ArrayList<>();
int i=0;
String[] keywords = {"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"};
for (String keyword : keywords){
final String url = "https://www.medindia.net/doctors/drug_information/home.asp?alpha=" + keyword;
response = Jsoup.connect(url)
.userAgent("Mozilla/5.0")
.execute();
doc = response.parse();
Element tds = doc.select("div.related-links.top-gray.col-list.clear-fix").first();
Elements links = tds.select("li[class=list-item]");
for (Element link : links){
final String newURL = "https://www.medindia.net/doctors/drug_information/".concat(link.select("a").attr("href")) ;
response = Jsoup.connect(newURL)
.userAgent("Mozilla/5.0")
.execute();
doc = response.parse();
Elements classification = doc.select("div.clear.b");
System.out.println("Classification::"+classification.text());
Elements drugBrands = doc.select("div.drug-content");
Elements drugBrandsIndian = drugBrands.select("div.links");
System.out.println("Drug Brand Links Indian::"+ drugBrandsIndian.select("a[href]"));
System.out.println("Drug Brand Names Indian::"+ drugBrandsIndian.text());
System.out.println("Drug Brand Names International::"+doc.select("div.drug-content.h3").text());
Elements prescritpionText = doc.select("a[name=Prescription]");
Elements prescriptionData = prescritpionText.select("p.drug-content");
System.out.println("Prescription Data::"+ prescriptionData.text());
Elements contraindications = doc.select("a[name=Contraindications]");
Elements contraindicationsText = contraindications.select("p[class=drug-content]");
System.out.println("Contrainidications Text::" + contraindicationsText.text());
Elements dosage = doc.select("a[name=Dosage]");
Elements dosageText = dosage.select("p[class=drug-content]");
System.out.println("Dosage Text::" + dosageText.text());
}
}
}
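If I understand the question correctly, you want to pair the value of each a tag's name attribute with the p content of the div that follows it. Note that the p is not inside the a element, so selecting p.drug-content from the a itself finds nothing. You should be able to do the pairing with the following code:
Elements aTags = doc.select("a[name]");
for (Element header : aTags) {
    // The div that follows the anchor holds that section's content
    Element section = header.nextElementSibling();
    if (section == null) continue;
    // Get the p content of that sibling div
    Element pTag = section.select("p.drug-content").first();
    if (pTag == null) continue;
    System.out.println(header.attr("name"));
    System.out.println(pTag.text());
}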
I'm parsing html of a website with JSoup. I want to parse this part:
<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>
I want to end up with this:
String text = "This is a text 1";
String textNo = "Website Page - 1";
String link = "post/13594";
How can I get the parts like this?
Your code would only get all of the text in the td elements you select, as one string. To store the pieces in separate variables, grab each part separately, as in the following code. The comments explain how and why each piece is retrieved.
// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element that is a child of the td
Element linkElement = lastPost.getElementsByTag("a").first();
// This text is the first child node of the td; get that node and trim the surrounding whitespace
String text = lastPost.childNode(0).toString().trim();
// This is the text within the a (link) element
String textNo = linkElement.text();
// This text is the href attribute value of the a (link) element
String link = linkElement.attr("href");
I have some hyperlinks in a web page, and I want to extract the title attribute from them.
I tried
select("a[href]").attr("title")
but I get nothing.
Edit
The complete div here
Trial code
Elements es = doc.select("div.mini-placard");
for(Element e:es)
{
System.out.println( e.select("span.align-image-vertically").select("a").attr("title"));
}
No output!
Select the link element first, then read its attributes, as below:
String html = "<p>An <a href='http://example.com/' title='hi'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkTitle = link.attr("title"); // 'hi'
I have been trying for the last 3 days to parse certain information with jsoup in Java. This is my code:
Document document = Jsoup.connect(urlofpage).get();
Elements links = document.select(".contentBox");
for (Element link : links) {
// String name = link.text();
String title = link.select("h2").text();
String content = link.select("p").text();
System.out.println(title);
System.out.println(content);
}
It fetches the data as written, with the h2 and p text separated. The problem is that I want the text of the <p> tags that come right after each <h2> tag, grouped per <h2>.
For example (HTML content):
<h2>main content</h2>
<div class="acx"><div>
<p>content</p>
<p>content 2</p>
<h2>content 2</h2>
<div class="acx"><div>
<p>new content od 2</p>
<p>new 2</p>
Now it should fetch like (in array):
array[0] = "content content 2",
array[1] = "new content od 2 new 2",
Any solutions?
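You can use the "~" (general sibling) selector, which matches every p that comes after an h2 under the same parent. For example, assuming the h2 and p elements in the sample are siblings:
link.select("h2 ~ p").get(0).text(); // returns "content"
link.select("h2 ~ p").get(1).text(); // returns "content 2"
Alternatively, keep your initial approach and iterate over the tags you need within each selected .contentBox element: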
Document document = Jsoup.connect(urlofpage).get();
Elements links = document.select(".contentBox");
for (Element link : links) {
for (Element h2Tag : link.select("h2"))
{
System.out.println(h2Tag.text());
}
for (Element pTag : link.select("p"))
{
System.out.println(pTag.text());
}
}
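To get one entry per h2, as the question asks, the paragraphs can be grouped while walking the elements in document order. A minimal sketch of that grouping step, written without jsoup so it runs on its own. It assumes the h2 and p elements have already been read in document order (for example from link.select("h2, p")) into {tagName, text} pairs. GroupByHeading and groupParagraphs are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupByHeading {
    // Each item is {tagName, text}, listed in document order.
    // Returns one string per h2, holding the joined text of the p items after it.
    static List<String> groupParagraphs(List<String[]> items) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = null; // null until the first h2 is seen
        for (String[] item : items) {
            String tag = item[0];
            String text = item[1];
            if (tag.equals("h2")) {
                if (current != null) groups.add(current.toString());
                current = new StringBuilder(); // start a new group
            } else if (tag.equals("p") && current != null) {
                if (current.length() > 0) current.append(' ');
                current.append(text);
            }
        }
        if (current != null) groups.add(current.toString());
        return groups;
    }

    public static void main(String[] args) {
        // The sample from the question, flattened to tag/text pairs
        List<String[]> items = Arrays.asList(
                new String[] {"h2", "main content"},
                new String[] {"p", "content"},
                new String[] {"p", "content 2"},
                new String[] {"h2", "content 2"},
                new String[] {"p", "new content od 2"},
                new String[] {"p", "new 2"});
        System.out.println(groupParagraphs(items));
        // prints: [content content 2, new content od 2 new 2]
    }
}
```

With jsoup, the pairs could be built from each element's tagName() and text(). Paragraphs that appear before the first h2 are skipped in this sketch.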
I am trying to parse some HTML for an Android app, but I can't get the value of the data-id attribute.
Here's the html code
<div class="popup event-popup Predavanja" style="display: none;" data-id="246274" data-position="bottom" >
How can I parse the 246274 value?
If you have the Element object of the div tag, then this code will work:
String attr = element.attr("data-id"); // get the value of the 'data-id' attribute
int dataID = Integer.parseInt(attr); // convert it to an int
Optionally, if you want to check first if the attribute even exists, use this:
if (element.hasAttr("data-id")) // etc.
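jsoup's attr() returns an empty string when the attribute is missing, and Integer.parseInt("") throws a NumberFormatException, so it can help to wrap the conversion. A minimal sketch using only the JDK: DataIdParser and parseDataId are hypothetical names, and the input is whatever element.attr("data-id") returned:

```java
import java.util.OptionalInt;

class DataIdParser {
    // Returns the id as an int, or an empty OptionalInt when the
    // text is missing, blank, or not a valid integer.
    static OptionalInt parseDataId(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt id = parseDataId("246274"); // value from the question's div
        if (id.isPresent()) {
            System.out.println("data-id = " + id.getAsInt());
        }
        // A missing attribute arrives as "" and yields an empty result
        System.out.println(parseDataId("").isPresent());
    }
}
```

Returning OptionalInt makes the caller handle the missing case explicitly instead of relying on a sentinel value such as -1.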
I think you can do it like this:
Document doc = Jsoup.connect(url).get(); // url is the address of the page
Element divElement = doc.select("div.popup.event-popup.Predavanja").first(); // div that has all three classes
String dataId = divElement.attr("data-id");
For more selectors, see https://jsoup.org/cookbook/extracting-data/selector-syntax