Parsing currency exchange data from https://uzmanpara.milliyet.com.tr/doviz-kurlari/ - java

I am preparing a program and wrote this code with some help. It works for the first 10 entries, but then it gives me null values.
String url = "https://uzmanpara.milliyet.com.tr/doviz-kurlari/";
//Document doc = Jsoup.parse(url);
Document doc = null;
try {
    doc = Jsoup.connect(url).timeout(6000).get();
} catch (IOException ex) {
    Logger.getLogger(den3.class.getName()).log(Level.SEVERE, null, ex);
}

int i = 0;
String[] currencyStr = new String[11];
String[] buyStr = new String[11];
String[] sellStr = new String[11];

Elements elements = doc.select(".borsaMain > div:nth-child(2) > div:nth-child(1) > table.table-markets");
for (Element element : elements) {
    Elements curreny = element.parent().select("td:nth-child(2)");
    Elements buy = element.parent().select("td:nth-child(3)");
    Elements sell = element.parent().select("td:nth-child(4)");
    System.out.println(i);
    currencyStr[i] = curreny.text();
    buyStr[i] = buy.text();
    sellStr[i] = sell.text();
    System.out.println(String.format("%s [buy=%s, sell=%s]",
            curreny.text(), buy.text(), sell.text()));
    i++;
}

for (i = 0; i < 11; i++) {
    System.out.println("currency: " + currencyStr[i]);
    System.out.println("buy: " + buyStr[i]);
    System.out.println("sell: " + sellStr[i]);
}
Here is the code. I guess it is a connection problem, but I could not solve it. I use NetBeans. Do I have to change the connection properties of NetBeans, or do I have to add something more to the code?
Can you help me?

There's nothing wrong with the connection. Your query simply doesn't match the page structure.
Somewhere on your page, there's an element with class borsaMain, that has a direct child with class detL. And then somewhere in the descendants tree of detL, there is your table. You can write this as the following CSS element selector query:
.borsaMain > .detL table
There will be two tables in the result, but I suspect you are looking for the first one.
So basically, you want something like:
Element table = doc.selectFirst(".borsaMain > .detL table");
for (Element row : table.select("tr:has(td)")) {
    // your existing loop code
}
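Combining that selector with the arrays from the question, a minimal sketch could look like the following. The td column positions (2 = currency, 3 = buy, 4 = sell) are carried over from the original code and are an assumption about the current page layout:

// Minimal sketch: fill the arrays from the first .detL table.
// Assumes the page still uses the .borsaMain > .detL structure and that
// columns 2-4 hold currency name, buy rate, and sell rate.
Element table = doc.selectFirst(".borsaMain > .detL table");
if (table != null) {
    Elements rows = table.select("tr:has(td)");
    int n = rows.size();
    String[] currencyStr = new String[n];
    String[] buyStr = new String[n];
    String[] sellStr = new String[n];
    for (int i = 0; i < n; i++) {
        Element row = rows.get(i);
        currencyStr[i] = row.select("td:nth-child(2)").text();
        buyStr[i] = row.select("td:nth-child(3)").text();
        sellStr[i] = row.select("td:nth-child(4)").text();
        System.out.printf("%s [buy=%s, sell=%s]%n", currencyStr[i], buyStr[i], sellStr[i]);
    }
}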

Related

Missing Table Elements When Scraping

URL: https://stats.nba.com/player/1628381/defense-dash/
Attempting to get:
<table>
  <tbody>
    <!---->
    <tr data-ng-repeat="(i, row) in page" index="0">
      <td class="player">Overall</td>
      <td>45</td>
      <td>45</td>
      <td>5.7</td>
      <td>12.3</td>
      <td>46.6</td>
      <td>100%</td>
      <td>46.7</td>
      <td>-0.1</td>
    </tr>
    <!---->
  </tbody>
</table>
My coding:
public static void getData(String url, String Name, int ID) throws IOException
{
    String html = Jsoup.connect(url).execute().body();
    html = html.replaceAll("<!---->", "");
    html = html.replaceAll("<!--", "");
    html = html.replaceAll("-->", "");
    Document doc = Jsoup.parse(html);
    Elements tableElements = doc.select("table");
    System.out.println("Elements " + tableElements);
    for (Element tableElement : tableElements)
    {
        String tableId = tableElement.id();
        if (tableId.isEmpty()) {
            continue;
        }
        String fileName = "table" + Name + tableId + ID + ".csv";
        System.out.println(fileName);
        FileWriter writer = new FileWriter(new File("C:\\Users\\noman\\eclipse-workspace\\Senior Project\\src\\", fileName));
        //System.out.println(doc);
        Elements tableRowElements = tableElement.select(":not(thead) tr td");
        for (int i = 0; i < tableRowElements.size(); i++) {
            Element row = tableRowElements.get(i);
            Elements rowItems = row.select("td");
            for (int j = 0; j < rowItems.size(); j++) {
                writer.append(rowItems.get(j).text());
                if (j != rowItems.size() - 1) {
                    writer.append(',');
                }
            }
            writer.append('\n');
        }
        writer.close();
    }
}
The problem is that no elements are being found. This same code works perfectly on another site that has (seemingly) no differences in how it stores data.
Is there something different about this website that prevents web scraping, or perhaps a subtle difference?
Please note the HTML code provided is a shortened version.
As said in the comments, the data you are looking for is loaded dynamically, but you can fetch it with a simple GET request from this link:
https://stats.nba.com/stats/playerdashptshotdefend?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&Period=0&PlayerID=1628381&Season=2018-19&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision=
EDIT
To find this link I used the browser's developer tools and checked the XHR requests.
You can see that the link includes several parameters, among them the PlayerID, which is identical to the number that appears in your initial link. By changing its value you can get stats for other players.
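For completeness, a hedged sketch of fetching that endpoint directly with Jsoup. The extra headers are assumptions; stats.nba.com tends to reject requests that don't look like a browser, and you would still need a JSON library to parse the response:

// Sketch: fetch the stats JSON directly instead of scraping the rendered page.
// The header values are assumptions; adjust them if the API rejects the request.
String api = "https://stats.nba.com/stats/playerdashptshotdefend?DateFrom=&DateTo=&GameSegment="
        + "&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0"
        + "&PerMode=PerGame&Period=0&PlayerID=1628381&Season=2018-19&SeasonSegment="
        + "&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision=";
String json = Jsoup.connect(api)
        .ignoreContentType(true)                 // the response is JSON, not HTML
        .userAgent("Mozilla/5.0")                // assumed: the API blocks default agents
        .header("Referer", "https://stats.nba.com/")
        .execute()
        .body();
System.out.println(json);                        // parse with your preferred JSON library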

Nesting dropdowns using for loop is giving stale element reference error

I want to know how to nest dropdowns using Selenium WebDriver in Java, i.e., I have 2 dropdowns; can these dropdowns be nested one after the other?
After looping 2 times over a dropdown it throws a stale element reference error.
I have written the following code:
Select drpdwns6 = new Select(driver.findElement(By.xpath("//*[@id=\"MainContent_ddlBillable\"]")));
List<WebElement> sels6 = drpdwns6.getOptions();
sels6.size();
for (int s6 = 0; s6 < sels6.size(); s6++) {
    drpdwns6.selectByIndex(s6);
    System.out.println("selected value" + s6);
    Select drpdwns7 = new Select(driver.findElement(By.xpath("//*[@id=\"MainContent_ddlofflinestatus\"]")));
    List<WebElement> sels7 = drpdwns7.getOptions();
    sels7.size();
    for (int s7 = 0; s7 < sels7.size(); s7++) {
        drpdwns7.selectByIndex(s7);
        System.out.println("selected value" + s7);
    }
}
My guess is that selecting an option from the dropdown refreshes the DOM, so the exception is thrown. You need to re-locate the dropdown in each iteration:
Select drpdwns6 = new Select(driver.findElement(By.id("MainContent_ddlBillable")));
int drpdwns6Size = drpdwns6.getOptions().size();
for (int s6 = 0; s6 < drpdwns6Size; s6++) {
    drpdwns6 = new Select(driver.findElement(By.id("MainContent_ddlBillable")));
    drpdwns6.selectByIndex(s6);
    System.out.println("selected value" + s6);
    Select drpdwns7 = new Select(driver.findElement(By.id("MainContent_ddlofflinestatus")));
    int drpdwns7Size = drpdwns7.getOptions().size();
    for (int s7 = 0; s7 < drpdwns7Size; s7++) {
        drpdwns7 = new Select(driver.findElement(By.id("MainContent_ddlofflinestatus")));
        drpdwns7.selectByIndex(s7);
        System.out.println("selected value" + s7);
    }
}
As a side note, if you have an id, use By.id instead of By.xpath.
You get the stale element exception whenever an element that was present in the DOM is deleted, replaced, or otherwise no longer available.
As in the answer above, re-locate the element once the DOM is refreshed, or use WebDriverWait. If an element is not attached to the DOM you can also wrap the loop in a try-catch block, like below:
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
int s6 = 0;
try {
    Select drpdwns6 = new Select(driver.findElement(By.xpath("//*[@id=\"MainContent_ddlBillable\"]")));
    List<WebElement> sels6AllOptions = drpdwns6.getOptions();
    int count1 = sels6AllOptions.size();
    for (s6 = 0; s6 < count1; s6++) {
        drpdwns6.selectByIndex(s6);
    }
} catch (StaleElementReferenceException e1) {
    System.out.println("selected value" + s6);
}
int s7 = 0;
try {
    Select drpdwns7 = new Select(driver.findElement(By.xpath("//*[@id=\"MainContent_ddlofflinestatus\"]")));
    List<WebElement> sels7AllOptions = drpdwns7.getOptions();
    int count2 = sels7AllOptions.size();
    for (s7 = 0; s7 < count2; s7++) {
        drpdwns7.selectByIndex(s7);
    }
} catch (StaleElementReferenceException e2) {
    System.out.println("selected value" + s7);
}
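If you prefer the WebDriverWait route mentioned above, a minimal sketch (Selenium 4 style; the 10-second timeout and the locator are illustrative assumptions) is:

// Sketch: re-locate the dropdown via an explicit wait before selecting.
// The timeout value and locator are assumptions for illustration.
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
WebElement billable = wait.until(
        ExpectedConditions.presenceOfElementLocated(By.id("MainContent_ddlBillable")));
Select drpdwns6 = new Select(billable);
drpdwns6.selectByIndex(0);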

How to generate XPath query matching a specific element in Jsoup?

Hi, this is my web page:
<html>
<head>
</head>
<body>
<div> text div 1</div>
<div>
<span>text of first span </span>
<span>text of second span </span>
</div>
<div> text div 3 </div>
</body>
</html>
I'm using Jsoup to parse it, and then I browse all elements inside the page and get their paths:
Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\index.html"), "UTF-8");
Elements elements = doc.body().select("*");
ArrayList all = new ArrayList();
for (Element element : elements) {
    if (!element.ownText().isEmpty()) {
        StringBuilder path = new StringBuilder(element.nodeName());
        String value = element.ownText();
        Elements p_el = element.parents();
        for (Element el : p_el) {
            path.insert(0, el.nodeName() + '/');
        }
        all.add(path + " = " + value + "\n");
        System.out.println(path + " = " + value);
    }
}
return all;
My code gives me this result:
html/body/div = text div 1
html/body/div/span = text of first span
html/body/div/span = text of second span
html/body/div = text div 3
In fact, I want to get a result like this:
html/body/div[1] = text div 1
html/body/div[2]/span[1] = text of first span
html/body/div[2]/span[2] = text of second span
html/body/div[3] = text div 3
Please, could anyone give me an idea of how to reach this result? :) Thanks in advance.
As asked, here is an idea, even though I'm quite sure there are better solutions for getting the XPath of a given node. For example, use XSLT as in the answer to "Generate/get xpath from XML node java".
Here is a possible solution based on your current attempt.
For each (parent) element, check whether there is more than one element with this name.
Pseudo code: if ( count( el.select("../" + el.nodeName()) ) > 1 )
If true, count the preceding-sibling:: elements with the same name and add 1:
count( el.select("preceding-sibling::" + el.nodeName()) ) + 1
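Jsoup selectors do not support XPath axes such as preceding-sibling::, so here is a rough Java sketch of the same counting idea (an assumption-level illustration, only checked against the sample page above):

// Sketch: build an indexed path such as html/body/div[2]/span[1] for an element.
// A 1-based index is added only when the parent has more than one child with the same tag.
static String indexedPath(Element element) {
    StringBuilder path = new StringBuilder();
    for (Element current = element; current != null; current = current.parent()) {
        String name = current.tagName();
        if (name.equals("#root")) {          // stop at Jsoup's synthetic document root
            break;
        }
        String step = name;
        Element parent = current.parent();
        if (parent != null) {
            int sameName = 0;
            int index = 0;
            for (Element sibling : parent.children()) {
                if (sibling.tagName().equals(name)) {
                    sameName++;
                    if (sibling == current) {
                        index = sameName;    // position among same-named siblings
                    }
                }
            }
            if (sameName > 1) {
                step = name + "[" + index + "]";
            }
        }
        path.insert(0, "/" + step);
    }
    return path.length() == 0 ? "" : path.substring(1);   // e.g. html/body/div[2]/span[1]
}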
This is my solution to this problem:
StringBuilder absPath = new StringBuilder();
Elements parents = htmlElement.parents();
for (int j = parents.size() - 1; j >= 0; j--) {
    Element element = parents.get(j);
    absPath.append("/");
    absPath.append(element.tagName());
    absPath.append("[");
    absPath.append(element.siblingIndex());
    absPath.append("]");
}
This would be easier if you traversed the document from the root to the leaves instead of the other way round. That way you can easily group the elements by tag name and handle multiple occurrences accordingly. Here is a recursive approach:
private final List<String> path = new ArrayList<>();
private final List<String> all = new ArrayList<>();

public List<String> getAll() {
    return Collections.unmodifiableList(all);
}

public void parse(Document doc) {
    path.clear();
    all.clear();
    parse(doc.children());
}

private void parse(List<Element> elements) {
    if (elements.isEmpty()) {
        return;
    }
    Map<String, List<Element>> grouped = elements.stream().collect(Collectors.groupingBy(Element::tagName));
    for (Map.Entry<String, List<Element>> entry : grouped.entrySet()) {
        List<Element> list = entry.getValue();
        String key = entry.getKey();
        if (list.size() > 1) {
            int index = 1;
            // use paths with index
            key += "[";
            for (Element e : list) {
                path.add(key + (index++) + "]");
                handleElement(e);
                path.remove(path.size() - 1);
            }
        } else {
            // use paths without index
            path.add(key);
            handleElement(list.get(0));
            path.remove(path.size() - 1);
        }
    }
}

private void handleElement(Element e) {
    String value = e.ownText();
    if (!value.isEmpty()) {
        // add entry
        all.add(path.stream().collect(Collectors.joining("/")) + " = " + value);
    }
    // process children of element
    parse(e.children());
}
Here is the solution in Kotlin. It's correct, and it works. The other answers are wrong and caused me hours of lost work.
fun Element.xpath(): String = buildString {
    val parents = parents()
    for (j in (parents.size - 1) downTo 0) {
        val parent = parents[j]
        append("/*[")
        append(parent.siblingIndex() + 1)
        append(']')
    }
    append("/*[")
    append(siblingIndex() + 1)
    append(']')
}

How to list items of a collection with Word Ole Automation

I'm using the SWT OLE api to edit a Word document in an Eclipse RCP. I read articles about how to read properties from the active document but now I'm facing a problem with collections like sections.
I would like to retrieve only the body section of my document but I don't know what to do with my sections object which is an IDispatch object. I read that the item method should be used but I don't understand how.
I found the solution so I'll share it with you :)
Here is sample code to list all paragraphs of the active document of the Word editor:
OleAutomation active = activeDocument.getAutomation();
if (active != null) {
    // look up the dispatch id of the "Paragraphs" property
    int[] paragraphsId = getId(active, "Paragraphs");
    if (paragraphsId.length > 0) {
        Variant vParagraphs = active.getProperty(paragraphsId[0]);
        if (vParagraphs != null) {
            OleAutomation paragraphs = vParagraphs.getAutomation();
            if (paragraphs != null) {
                // read the Count property of the Paragraphs collection
                int[] countId = getId(paragraphs, "Count");
                if (countId.length > 0) {
                    Variant count = paragraphs.getProperty(countId[0]);
                    if (count != null) {
                        int numberOfParagraphs = count.getInt();
                        // dispid 0 is the default member, which for collections is Item(i)
                        for (int i = 1; i <= numberOfParagraphs; i++) {
                            Variant paragraph = paragraphs.invoke(0, new Variant[]{new Variant(i)});
                            if (paragraph != null) {
                                System.out.println("paragraph " + i + " added to list!");
                                listOfParagraphs.add(paragraph);
                            }
                        }
                        return listOfParagraphs;
                    }
                }
            }
        }
    }
}
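The getId helper is not shown in the post; based on SWT's OleAutomation API, it presumably looks something like this sketch:

// Presumed helper: resolve a member name to its dispatch id(s) via OleAutomation.
// getIDsOfNames returns null when the name is unknown, so normalize to an empty array.
private static int[] getId(OleAutomation automation, String name) {
    int[] ids = automation.getIDsOfNames(new String[] { name });
    return ids != null ? ids : new int[0];
}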

Get All Result in Solr with Solrj

I want to get all results with SolrJ. When I add 10 documents to Solr I don't get any exception, but if I add more than 10 documents I get an exception. I searched for this and found that in http://localhost:8983/solr/browse the first 10 documents are on the first page and the 11th document goes to the second page. How can I get all results?
String qry = "*:*";
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp = server.query(new SolrQuery(qry));
SolrDocumentList docs = rsp.getResults();
for (int i = 0; i < docs.getNumFound(); i++) {
    System.out.println(docs.get(i));
}

Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
SolrQuery query = new SolrQuery("*:*");   // the query object is implied by the original snippet
Integer start = 0;
query.setStart(start);
QueryResponse response = server.query(query);
SolrDocumentList rs = response.getResults();
long numFound = rs.getNumFound();
int current = 0;
while (current < numFound) {
    ListIterator<SolrDocument> iter = rs.listIterator();
    while (iter.hasNext()) {
        current++;
        System.out.println("************************************************************** " + current + " " + numFound);
        SolrDocument doc = iter.next();
        Map<String, Collection<Object>> values = doc.getFieldValuesMap();
        Iterator<String> names = doc.getFieldNames().iterator();
        while (names.hasNext()) {
            String name = names.next();
            System.out.print(name);
            System.out.print(" = ");
            Collection<Object> vals = values.get(name);
            Iterator<Object> valsIter = vals.iterator();
            while (valsIter.hasNext()) {
                Object obj = valsIter.next();
                System.out.println(obj.toString());
            }
        }
    }
    // move the window forward and fetch the next page
    query.setStart(current);
    response = server.query(query);
    rs = response.getResults();
    numFound = rs.getNumFound();
}
An easier way:
CloudSolrServer server = new CloudSolrServer(solrZKServerUrl);
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE);
QueryResponse rsp;
rsp = server.query(query, METHOD.POST);
SolrDocumentList docs = rsp.getResults();
for (SolrDocument doc : docs) {
    Collection<String> fieldNames = doc.getFieldNames();
    for (String s : fieldNames) {
        System.out.println(doc.getFieldValue(s));
    }
}
numFound gives you the total number of results that matched the query.
However, by default Solr returns only the top 10 results, which is controlled by the rows parameter.
You are trying to iterate up to numFound, but since only 10 results were returned, it fails.
You should use the rows parameter for iteration.
To get the next set of results, you need to requery Solr with a different start parameter. This supports pagination, so that you don't have to pull all results in one go, which is a very heavy operation.
If you refactor your code like this it will work
String qry = "*:*";
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE); // add me to avoid the IndexOutOfBoundsException
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
for (int i = 0; i < docs.getNumFound(); i++) {
    System.out.println(docs.get(i));
}
The answer to why is quite simple.
The response is telling you that there are getNumFound() matching documents, but if you do not specify in your query how many of them the response must carry, this limit is automatically set to 10, so you end up fetching only the top 10 documents out of the getNumFound() documents found.
For this reason the docs list will have just 10 elements, and trying to get the i-th element with i > 9 (e.g. 10) will give you a
java.lang.IndexOutOfBoundsException
just like you are experiencing.
P.S. I suggest you use the for-each iterator just like @Chen Sheng-Lun did.
P.P.S. At first this drove me crazy too.
