Retrieve more than 19 results with AbderaClient? (pagination issue, I guess)

The manual query in the browser gives 1305 results:
http://www.gahetna.nl/beeldbank-api/opensearch/?q=Leiden
But my code with Abdera (below) returns just 19 results. This must be due to pagination. How do I access all the pages of results? Thanks!
List<NationaalArchiefDoc> docs = new ArrayList<>();
Abdera abdera = new Abdera();
AbderaClient client = new AbderaClient(abdera);
searchString = searchString.replaceAll(" ", "+");
ClientResponse resp = client.get("http://www.gahetna.nl/beeldbank-api/opensearch/?q=" + searchString.trim());
if (resp.getType() == ResponseType.SUCCESS) {
    Document docAbdera = resp.getDocument();
    Element elementAbdera = docAbdera.getRoot();
    Iterator<Element> elementIterator = elementAbdera.getElements().get(0).getElements().iterator();
    System.out.println("number of docs found: " + elementAbdera.getElements().get(0).getElements().size());
    // looping through the 19 items
    while (elementIterator.hasNext()) {
        NationaalArchiefDoc doc = new NationaalArchiefDoc();
        Element element1 = elementIterator.next();
        // more actions in the loop...
    }
}
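OpenSearch feeds normally advertise totalResults and itemsPerPage elements, and paging is done with the standard startIndex request parameter. Whether the gahetna endpoint honours startIndex is an assumption here; under that assumption, one request per page is needed, and the page URLs can be generated like this (a minimal stdlib-only sketch):

```java
import java.util.ArrayList;
import java.util.List;

public class OpenSearchPager {

    // Builds one URL per result page, assuming the endpoint accepts the
    // standard OpenSearch startIndex parameter (1-based). In practice,
    // totalResults and itemsPerPage would be read from the first response's
    // opensearch:totalResults / opensearch:itemsPerPage elements.
    static List<String> pagedUrls(String base, int totalResults, int itemsPerPage) {
        List<String> urls = new ArrayList<>();
        for (int start = 1; start <= totalResults; start += itemsPerPage) {
            urls.add(base + "&startIndex=" + start);
        }
        return urls;
    }

    public static void main(String[] args) {
        // 1305 hits in pages of 19 -> 69 requests
        List<String> urls = pagedUrls(
                "http://www.gahetna.nl/beeldbank-api/opensearch/?q=Leiden", 1305, 19);
        System.out.println(urls.size());  // 69
        System.out.println(urls.get(0));
    }
}
```

Each generated URL would then be fetched with client.get(...) and parsed exactly as in the snippet above, appending the items of every page to docs.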


Parsing currency exchange data from https://uzmanpara.milliyet.com.tr/doviz-kurlari/

I prepared the program and wrote this code with some help, but it works for the first 10 rows and then gives me null values:
String url = "https://uzmanpara.milliyet.com.tr/doviz-kurlari/";
//Document doc = Jsoup.parse(url);
Document doc = null;
try {
    doc = Jsoup.connect(url).timeout(6000).get();
} catch (IOException ex) {
    Logger.getLogger(den3.class.getName()).log(Level.SEVERE, null, ex);
}
int i = 0;
String[] currencyStr = new String[11];
String[] buyStr = new String[11];
String[] sellStr = new String[11];
Elements elements = doc.select(".borsaMain > div:nth-child(2) > div:nth-child(1) > table.table-markets");
for (Element element : elements) {
    Elements curreny = element.parent().select("td:nth-child(2)");
    Elements buy = element.parent().select("td:nth-child(3)");
    Elements sell = element.parent().select("td:nth-child(4)");
    System.out.println(i);
    currencyStr[i] = curreny.text();
    buyStr[i] = buy.text();
    sellStr[i] = sell.text();
    System.out.println(String.format("%s [buy=%s, sell=%s]",
            curreny.text(), buy.text(), sell.text()));
    i++;
}
for (i = 0; i < 11; i++) {
    System.out.println("currency: " + currencyStr[i]);
    System.out.println("buy: " + buyStr[i]);
    System.out.println("sell: " + sellStr[i]);
}
Here is the code. I guess it is a connection problem, but I could not solve it. I use NetBeans; do I have to change the connection properties of NetBeans, or should I add something more to the code? Can you help me?
There's nothing wrong with the connection. Your query simply doesn't match the page structure.
Somewhere on your page there is an element with class borsaMain, which has a direct child with class detL. And then somewhere in the descendant tree of detL, there is your table. You can write this as the following CSS element selector query:
.borsaMain > .detL table
There will be two tables in the result, but I suspect you are looking for the first one.
So basically, you want something like:
Element table = doc.selectFirst(".borsaMain > .detL table");
for (Element row : table.select("tr:has(td)")) {
    // your existing loop code
}

Scraping multiple pages with jsoup

I am trying to scrape the links in the pagination of GitHub repositories.
I have scraped them separately, but now I want to optimize it using a loop. Any idea how I can do it? Here is the code:
ComitUrl = "http://github.com/apple/turicreate/commits/master";
Document document2 = Jsoup.connect(ComitUrl).get();
Element pagination = document2.select("div.pagination a").get(0);
String Url1 = pagination.attr("href");
System.out.println("pagination-link1 = " + Url1);

Document document3 = Jsoup.connect(Url1).get();
Element pagination2 = document3.select("div.pagination a").get(1);
String Url2 = pagination2.attr("href");
System.out.println("pagination-link2 = " + Url2);

Document document4 = Jsoup.connect(Url2).get();
Element check = document4.select("span.disabled").first();
if (check.text().equals("Older")) {
    System.out.println("No pagination link more");
} else {
    Element pagination3 = document4.select("div.pagination a").get(1);
    String Url3 = pagination3.attr("href");
    System.out.println("pagination-link3 = " + Url3);
}
Try something like the following:
public static void main(String[] args) throws IOException {
    String url = "http://github.com/apple/turicreate/commits/master";
    // get the first pagination link
    String link = Jsoup.connect(url).get().select("div.pagination a").get(0).attr("href");
    // an int just to count up links
    int i = 1;
    System.out.println("pagination-link_" + i + "\t" + link);
    // follow the "Older" link until the pagination div has only one link left,
    // fetching each page once instead of twice
    Elements pagination = Jsoup.connect(link).get().select("div.pagination a");
    while (pagination.size() > 1) {
        link = pagination.get(1).attr("href");
        System.out.println("pagination-link_" + (++i) + "\t" + link);
        pagination = Jsoup.connect(link).get().select("div.pagination a");
    }
}
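The loop in the answer boils down to: keep asking each page for its next link until a page stops providing one. That pattern can be isolated from Jsoup entirely; in the sketch below, the Map simulates the site and nextOf stands in for "fetch the page and select its pagination link" (both are illustrative assumptions, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FollowPagination {

    // Follows next-links starting from `first` until a page has no next link.
    static List<String> collectLinks(String first, Function<String, String> nextOf) {
        List<String> links = new ArrayList<>();
        String link = first;
        while (link != null) {
            links.add(link);
            link = nextOf.apply(link);
        }
        return links;
    }

    public static void main(String[] args) {
        // Simulated site: each page maps to its "Older" link; the last page has none.
        Map<String, String> next = Map.of("page1", "page2", "page2", "page3");
        System.out.println(collectLinks("page1", next::get)); // [page1, page2, page3]
    }
}
```

In the real scraper, nextOf would be a lambda that runs Jsoup.connect(link).get().select("div.pagination a") and returns the second link's href, or null when fewer than two links remain.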

"Unable to locate element" error when reading data from a table, with 3/10 reproduction rate

I have written the code below for getting the data from a column:
String[] correct = new String[101];
String a = "//*[@id='mainData']/table/tbody/tr[";
String b = "]/td[2]";
int[] comparison = new int[101];
for (int i = 1; i <= 100; i++) {
    String c = a + i + b;
    WebElement element = driver.findElement(By.xpath(c));
    correct[i] = element.getText();
    if (correct[i].equals("Order Request Import")) {
        String[] orderButton = new String[101];
        String e = "//*[@id='mainData']/table/tbody/tr[";
        String f = "]/td[1]/div/button";
        String g = e + i + f;
        orderButton[i] = driver.findElement(By.xpath(g)).getText();
        comparison[i] = Integer.parseInt(orderButton[i].substring(4));
    }
}
The problem is that 3 out of 10 tests fail at the line WebElement element = driver.findElement(By.xpath(c));, while the rest work like a charm. Any idea why this occurs, or how to improve my code so that it reads from the table every time?
It may be that the element is not yet present in the DOM at the moment you try to locate it, which is why your test fails some of the time.
A better way is to use WebDriverWait, to wait until the element is present in the DOM before doing the next operation, as below:
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement element = wait.until(ExpectedConditions.presenceOfElementLocated(By.xpath(c)));
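Under the hood, an explicit wait like this is a poll-until-timeout loop around a locating attempt. The same idea in plain Java, detached from Selenium (Poll and its names are illustrative, not Selenium API):

```java
import java.util.Optional;
import java.util.function.Supplier;

public class Poll {

    // Calls `probe` repeatedly until it yields a non-null value or the timeout
    // elapses; this mirrors what an explicit wait does with a locator.
    static <T> Optional<T> until(Supplier<T> probe, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            T result = probe.get();
            if (result != null) {
                return Optional.of(result);
            }
            Thread.sleep(intervalMillis);
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated DOM probe that only "finds the element" on the third attempt.
        int[] calls = {0};
        Optional<String> found = until(() -> ++calls[0] >= 3 ? "row-42" : null, 1000, 10);
        System.out.println(found.orElse("timed out"));
    }
}
```

This is why the intermittent failures go away: attempts made before the row is rendered simply retry instead of throwing.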

Get Items in a PurchaseOrder using SuiteTalk

I am attempting to get the items and some of the related information from a Purchase Order with SuiteTalk. I am able to get the desired Purchase Orders with TransactionSearch using the following in Scala:
val transactionSearch = new TransactionSearch
val search = new TransactionSearchBasic
...
search.setLastModifiedDate(searchLastModified) //Gets POs modified in the last 10 minutes
transactionSearch.setBasic(search)
val result = port.search(transactionSearch)
I am able to cast each result to a record as an instance of the PurchaseOrder class.
if (result.getStatus().isIsSuccess()) {
    println("Transactions: " + result.getTotalRecords)
    for (i <- 0 until result.getTotalRecords) {
        try {
            val record = result.getRecordList.getRecord.get(i).asInstanceOf[PurchaseOrder]
            record.get<...>
        }
        catch {...}
    }
}
From here I am able to use the getters to access the individual fields, except for the ItemList.
I can see in the NetSuite web interface that there are items attached to the Purchase Orders. However using getItemList on the result record is always returning a null response.
Any thoughts?
I think you have not used search preferences, and that is why you are not able to fetch the purchase order line items. You will have to use the following search preferences in your code:
SearchPreferences preference = new SearchPreferences();
preference.bodyFieldsOnly = false;
_service.searchPreferences = preference;
The following is a working example using the above preferences:
private void SearchPurchaseOrderByID(string strPurchaseOrderId)
{
    TransactionSearch tranSearch = new TransactionSearch();
    TransactionSearchBasic tranSearchBasic = new TransactionSearchBasic();

    RecordRef poRef = new RecordRef();
    poRef.internalId = strPurchaseOrderId;
    poRef.type = RecordType.purchaseOrder;
    poRef.typeSpecified = true;

    RecordRef[] poRefs = new RecordRef[1];
    poRefs[0] = poRef;

    SearchMultiSelectField poID = new SearchMultiSelectField();
    poID.searchValue = poRefs;
    poID.@operator = SearchMultiSelectFieldOperator.anyOf;
    poID.operatorSpecified = true;

    tranSearchBasic.internalId = poID;
    tranSearch.basic = tranSearchBasic;

    InitService();
    SearchResult results = _service.search(tranSearch);
    if (results.status.isSuccess && results.status.isSuccessSpecified)
    {
        Record[] poRecords = results.recordList;
        PurchaseOrder purchaseOrder = (PurchaseOrder)poRecords[0];
        PurchaseOrderItemList poItemList = purchaseOrder.itemList;
        PurchaseOrderItem[] poItems = poItemList.item;
        if (poItems != null && poItems.Length > 0)
        {
            for (var i = 0; i < poItems.Length; i++)
            {
                Console.WriteLine("Item Line On PO = " + poItems[i].line);
                Console.WriteLine("Item Quantity = " + poItems[i].quantity);
                Console.WriteLine("Item Description = " + poItems[i].description);
            }
        }
    }
}

Get All Result in Solr with Solrj

I want to get all results with SolrJ. I add 10 documents to Solr and I don't get any exception, but if I add more than 10 documents I get an exception. From what I found, this is because at http://localhost:8983/solr/browse the first 10 documents are on the first page and the 11th document goes to the second page. How can I get all results?
String qry = "*:*";
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp = server.query(new SolrQuery(qry));
SolrDocumentList docs = rsp.getResults();
for (int i = 0; i < docs.getNumFound(); i++) {
    System.out.println(docs.get(i));
}
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 10, Size: 10
Integer start = 0;
query.setStart(start);
QueryResponse response = server.query(query);
SolrDocumentList rs = response.getResults();
long numFound = rs.getNumFound();
int current = 0;
while (current < numFound) {
    ListIterator<SolrDocument> iter = rs.listIterator();
    while (iter.hasNext()) {
        current++;
        System.out.println("************************************************************** " + current + " " + numFound);
        SolrDocument doc = iter.next();
        Map<String, Collection<Object>> values = doc.getFieldValuesMap();
        Iterator<String> names = doc.getFieldNames().iterator();
        while (names.hasNext()) {
            String name = names.next();
            System.out.print(name);
            System.out.print(" = ");
            Collection<Object> vals = values.get(name);
            Iterator<Object> valsIter = vals.iterator();
            while (valsIter.hasNext()) {
                Object obj = valsIter.next();
                System.out.println(obj.toString());
            }
        }
    }
    query.setStart(current);
    response = server.query(query);
    rs = response.getResults();
    numFound = rs.getNumFound();
}
An easier way:
CloudSolrServer server = new CloudSolrServer(solrZKServerUrl);
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE);
QueryResponse rsp = server.query(query, METHOD.POST);
SolrDocumentList docs = rsp.getResults();
for (SolrDocument doc : docs) {
    Collection<String> fieldNames = doc.getFieldNames();
    for (String s : fieldNames) {
        System.out.println(doc.getFieldValue(s));
    }
}
numFound gives you the total number of results that matched the query.
However, by default Solr returns only the top 10 results, which is controlled by the rows parameter.
You are trying to iterate up to numFound, but as only 10 results were returned, it fails.
You should use the rows parameter for iteration.
To get the next set of results, you need to re-query Solr with a different start parameter. This supports pagination, so that you don't have to pull all the results in one go, which is a very heavy operation.
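The start/rows windowing described above can be simulated with a plain list, to see why iterating up to numFound over a single 10-element response fails and how advancing start fixes it (the page method below stands in for a Solr query; it is an illustration, not SolrJ API):

```java
import java.util.ArrayList;
import java.util.List;

public class StartRowsDemo {

    // Simulates one Solr request: returns at most `rows` items beginning at
    // `start`, the way the start/rows parameters window the full result set.
    static List<Integer> page(List<Integer> all, int start, int rows) {
        if (start >= all.size()) {
            return List.of();
        }
        return all.subList(start, Math.min(start + rows, all.size()));
    }

    public static void main(String[] args) {
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < 23; i++) {  // numFound = 23
            index.add(i);
        }
        int rows = 10, fetched = 0;
        for (int start = 0; start < index.size(); start += rows) {
            fetched += page(index, start, rows).size();  // batches of 10, 10, 3
        }
        System.out.println(fetched + " of " + index.size()); // 23 of 23
    }
}
```

Fetching everything with rows = Integer.MAX_VALUE avoids the loop but pulls the whole index in one response, which is exactly the heavy operation the start-based paging is meant to avoid.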
If you refactor your code like this, it will work:
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE); // add me to avoid the IndexOutOfBoundsException
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
for (int i = 0; i < docs.getNumFound(); i++) {
    System.out.println(docs.get(i));
}
The answer to why is quite simple.
The response is telling you that there are getNumFound() matching documents, but if you do not specify in your query how many of them the response must carry, this limit is automatically set to 10, so you end up fetching only the top 10 documents out of the getNumFound() documents found.
For this reason the docs list will have just 10 elements, and trying to get the i-th element with i > 9 (e.g. 10) gives you a java.lang.IndexOutOfBoundsException, just like the one you are experiencing.
P.S. I suggest you use the for-each loop, just like @Chen Sheng-Lun did.
P.P.S. At first this drove me crazy too.
