Get All Result in Solr with Solrj - java

I want to get all result with solrj, I add 10 document to Solr, I don't get any exception, but if I add more than 10 document to SolrI get exception. I search that, I get this exception for this, in http://localhost:8983/solr/browse 10 document in first page,11th document go to second page. How I can get all result?
String qry="*:*";
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp=server.query(new SolrQuery(qry));
SolrDocumentList docs=rsp.getResults();
for(int i=0;i<docs.getNumFound();i++){
System.out.println(docs.get(i));
}
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 10, Size: 10

Integer start = 0;
query.setStart(start);
QueryResponse response = server.query(query);
SolrDocumentList rs = response.getResults();
long numFound = rs.getNumFound();
int current = 0;
while (current < numFound) {
ListIterator<SolrDocument> iter = rs.listIterator();
while (iter.hasNext()) {
current++;
System.out.println("************************************************************** " + current + " " + numFound);
SolrDocument doc = iter.next();
Map<String, Collection<Object>> values = doc.getFieldValuesMap();
Iterator<String> names = doc.getFieldNames().iterator();
while (names.hasNext()) {
String name = names.next();
System.out.print(name);
System.out.print(" = ");
Collection<Object> vals = values.get(name);
Iterator<Object> valsIter = vals.iterator();
while (valsIter.hasNext()) {
Object obj = valsIter.next();
System.out.println(obj.toString());
}
}
}
query.setStart(current);
response = server.query(query);
rs = response.getResults();
numFound = rs.getNumFound();
}
}

An easier way:
CloudSolrServer server = new CloudSolrServer(solrZKServerUrl);
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE);
QueryResponse rsp;
rsp = server.query(query, METHOD.POST);
SolrDocumentList docs = rsp.getResults();
for (SolrDocument doc : docs) {
Collection<String> fieldNames = doc.getFieldNames();
for (String s: fieldNames) {
System.out.println(doc.getFieldValue(s));
}
}

numFound gives you the total number of results that matched the Query.
However, by default Solr will return only top 10 results which is controlled by parameter rows.
You are trying to iterate over numFound, However as the results returned are only 10 it fails.
You should use the rows parameter for Iteration.
For getting the next set of results, you would need to requery Solr with a different start parameter. This is to support pagination so that you don't have to pull all the results at one go which is a very heavy operation.

If you refactor your code like this it will work
String qry="*:*";
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE); //Add me to avoid IndexOutOfBoundExc
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp=server.query(query);
SolrDocumentList docs=rsp.getResults();
for(int i=0;i<docs.getNumFound();i++){
System.out.println(docs.get(i));
}
The answer to why it's quite simple.
The response is telling you that there are getNumFound() matching documents,
but if you do not specify in your query how many of them the response must carry, this limit is automatically setted to 10,
ending up
fetching only the top 10 documents out of getNumFound() documents found
For this reason the docs list will have just 10 elements and trying to do the get of the i-th elementh with i > 9 (Eg 10) will take you to a
java.lang.IndexOutOfBoundsException
just like you are experimenting.
P.S i suggest you to use the for iterator just like #Chen Sheng-Lun did.
P.P.S at first this drove me crazy too.

Related

Parsing currency exchange data from https://uzmanpara.milliyet.com.tr/doviz-kurlari/

I prepare the program and I wrote this code with helping but the first 10 times it works then it gives me NULL values,
String url = "https://uzmanpara.milliyet.com.tr/doviz-kurlari/";
//Document doc = Jsoup.parse(url);
Document doc = null;
try {
doc = Jsoup.connect(url).timeout(6000).get();
} catch (IOException ex) {
Logger.getLogger(den3.class.getName()).log(Level.SEVERE, null, ex);
}
int i = 0;
String[] currencyStr = new String[11];
String[] buyStr = new String[11];
String[] sellStr = new String[11];
Elements elements = doc.select(".borsaMain > div:nth-child(2) > div:nth-child(1) > table.table-markets");
for (Element element : elements) {
Elements curreny = element.parent().select("td:nth-child(2)");
Elements buy = element.parent().select("td:nth-child(3)");
Elements sell = element.parent().select("td:nth-child(4)");
System.out.println(i);
currencyStr[i] = curreny.text();
buyStr[i] = buy.text();
sellStr[i] = sell.text();
System.out.println(String.format("%s [buy=%s, sell=%s]",
curreny.text(), buy.text(), sell.text()));
i++;
}
for(i = 0; i < 11; i++){
System.out.println("currency: " + currencyStr[i]);
System.out.println("buy: " + buyStr[i]);
System.out.println("sell: " + sellStr[i]);
}
here is the code, I guess it is a connection problem but I could not solve it I use Netbeans, Do I have to change the connection properties of Netbeans or should I have to add something more in the code
can you help me?
There's nothing wrong with the connection. Your query simply doesn't match the page structure.
Somewhere on your page, there's an element with class borsaMain, that has a direct child with class detL. And then somewhere in the descendants tree of detL, there is your table. You can write this as the following CSS element selector query:
.borsaMain > .detL table
There will be two tables in the result, but I suspect you are looking for the first one.
So basically, you want something like:
Element table = doc.selectFirst(".borsaMain > .detL table");
for (Element row : table.select("tr:has(td)")) {
// your existing loop code
}

SolrCloud Commit Slow for Indexing One Document

I am trying to use SolrJ to index one document to SolrCloud and query it. However, even if I am trying to index only one document, the commit is taking a really long time (10 minutes-ish to commit).
Here is my code and can anyone help me explain what is really going on here?
String zkHostString = "node1.datafireball.com:2181/solr";
SolrClient solr = new CloudSolrClient(zkHostString);
String mycollection = "mycollection";
java.util.Date date = new java.util.Date();
// index document
SolrInputDocument document = new SolrInputDocument();
String myid = "myweirdid" + Long.toString(date.getTime());
document.addField("id", myid);
document.addField("mfr", "datafireball_mfr");
document.addField("mpn", "datafireball_mpn");
UpdateResponse indexResponse = solr.add(mycollection, document);
System.out.println(indexResponse);
solr.commit(mycollection);
System.out.println("------------");
// query index
SolrQuery query1 = new SolrQuery();
query1.set("q", "id:" + myid);
QueryResponse response1 = solr.query(mycollection, query1);
SolrDocumentList list1 = response1.getResults();
System.out.println(list1);
System.out.println("------------");
System.out.println("Done");
And the output looks like:
{responseHeader={status=0,QTime=411}}
And took a really really long time to print the -------, basically means the commit take a super long time.

Convert jsonarray to string

I want to put value of first query (course) to other query (mapcategory) in my code.
The value in first query not convert to string in second query.
public String query() throws Exception {
JSONObject inputJsonObj = new JSONObject();
JSONArray inputArray = new JSONArray();
String array=null;
Database db = new Database();
db.my_Connection();
List<HashMap> course = db.Query("select NAME from file where visual_code='1'");
inputArray = new JSONArray();
if(course !=null)
for (int i=0; i<course.size(); i++){
HashMap data=(HashMap)course.get(i);
inputJsonObj = new JSONObject();
inputJsonObj.put("NAME",(String) data.get("NAME"));
inputArray.put(inputJsonObj);
}
arra=String.valueOf(course);
List<HashMap> mapCategory = db.Query("select PATH from c_"+arra+"_item item" +
" LEFT JOIN cr_"+arra+"_document doc" +
" ON item.ref = doc.id WHERE visibility = 1 AND tool = 'document'");
inputArray = new JSONArray();
if(mapCategory !=null)
for (int i=0; i<mapCategory.size(); i++){
HashMap data=(HashMap)mapCategory.get(i);
inputJsonObj = new JSONObject();
inputJsonObj.put("PATH",(String) data.get("PATH"));
inputArray.put(inputJsonObj);
}
db.Close_Connection();
return inputArray.toString();
}
i debuging this code and get an eror
HTTP Status 500 -
com.sun.jersey.api.container.MappableContainerException:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: You have an error
in your SQL syntax; check the manual that corresponds to your MySQL
server version for the right syntax to use near
'{NAME=TUTORIAL}_item_property item LEFT JOIN
crs_{NAME=TUTORIAL}_d' at line 1
I want get value only "TUTORIAL" in result.
your problem is in arra=String.valueOf(course); cuz you are save string representation of hashmap in string instead you need a value from hashmap.
replace
arra=String.valueOf(course);
to
arra=course.get("NAME");

Retrieve more than 19 results with AbderaClient? (pagination issue I guess)

The manual query in the browser gives 1305 results:
http://www.gahetna.nl/beeldbank-api/opensearch/?q=Leiden
But my code with Abdera (below) returns just 19 results. This must be due to pagination. How do I access all the pages of results? Thx!
List<NationaalArchiefDoc> docs = new ArrayList();
Abdera abdera = new Abdera();
AbderaClient client = new AbderaClient(abdera);
searchString = searchString.replaceAll(" ", "+");
ClientResponse resp = client.get("http://www.gahetna.nl/beeldbank-api/opensearch/?q=" + searchString.trim());
if (resp.getType() == ResponseType.SUCCESS) {
Document docAbdera = resp.getDocument();
Element elementAbdera = docAbdera.getRoot();
Iterator<Element> elementIterator = elementAbdera.getElements().get(0).getElements().iterator();
System.out.println("number of docs found: " + elementAbdera.getElements().get(0).getElements().size());
//looping through the 19 items
while (elementIterator.hasNext()) {
NationaalArchiefDoc doc = new NationaalArchiefDoc();
Element element1 = elementIterator.next();
// more actions in the loop...
}
}

Return all records in one query in Elasticsearch

I have a database in elastic search and want to get all records on my web site page. I wrote a bean, which connects to the elastic search node, searches records and returns some response. My simple java code, which does the searching, is
SearchResponse response = getClient().prepareSearch(indexName)
.setTypes(typeName)
.setQuery(queryString("\*:*"))
.setExplain(true)
.execute().actionGet();
But Elasticsearch set default size to 10 and I have 10 hits in response. There are more than 10 records in my database. If I set size to Integer.MAX_VALUE my search becomes very slow and this not what I want.
How can I get all the records in one action in an acceptable amount of time without setting size of response?
public List<Map<String, Object>> getAllDocs(){
int scrollSize = 1000;
List<Map<String,Object>> esData = new ArrayList<Map<String,Object>>();
SearchResponse response = null;
int i = 0;
while( response == null || response.getHits().hits().length != 0){
response = client.prepareSearch(indexName)
.setTypes(typeName)
.setQuery(QueryBuilders.matchAllQuery())
.setSize(scrollSize)
.setFrom(i * scrollSize)
.execute()
.actionGet();
for(SearchHit hit : response.getHits()){
esData.add(hit.getSource());
}
i++;
}
return esData;
}
The current highest-ranked answer works, but it requires loading the whole list of results in memory, which can cause memory issues for large result sets, and is in any case unnecessary.
I created a Java class that implements a nice Iterator over SearchHits, that allows to iterate through all results. Internally, it handles pagination by issuing queries that include the from: field, and it only keeps in memory one page of results.
Usage:
// build your query here -- no need for setFrom(int)
SearchRequestBuilder requestBuilder = client.prepareSearch(indexName)
.setTypes(typeName)
.setQuery(QueryBuilders.matchAllQuery())
SearchHitIterator hitIterator = new SearchHitIterator(requestBuilder);
while (hitIterator.hasNext()) {
SearchHit hit = hitIterator.next();
// process your hit
}
Note that, when creating your SearchRequestBuilder, you don't need to call setFrom(int), as this will be done interally by the SearchHitIterator. If you want to specify the size of a page (i.e. the number of search hits per page), you can call setSize(int), otherwise ElasticSearch's default value is used.
SearchHitIterator:
import java.util.Iterator;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;
public class SearchHitIterator implements Iterator<SearchHit> {
private final SearchRequestBuilder initialRequest;
private int searchHitCounter;
private SearchHit[] currentPageResults;
private int currentResultIndex;
public SearchHitIterator(SearchRequestBuilder initialRequest) {
this.initialRequest = initialRequest;
this.searchHitCounter = 0;
this.currentResultIndex = -1;
}
#Override
public boolean hasNext() {
if (currentPageResults == null || currentResultIndex + 1 >= currentPageResults.length) {
SearchRequestBuilder paginatedRequestBuilder = initialRequest.setFrom(searchHitCounter);
SearchResponse response = paginatedRequestBuilder.execute().actionGet();
currentPageResults = response.getHits().getHits();
if (currentPageResults.length < 1) return false;
currentResultIndex = -1;
}
return true;
}
#Override
public SearchHit next() {
if (!hasNext()) return null;
currentResultIndex++;
searchHitCounter++;
return currentPageResults[currentResultIndex];
}
}
In fact, realizing how convenient it is to have such a class, I wonder why ElasticSearch's Java client does not offer something similar.
You could use scrolling API.
The other suggestion of using a searchhit iterator would also work great,but only when you don't want to update those hits.
import static org.elasticsearch.index.query.QueryBuilders.*;
QueryBuilder qb = termQuery("multi", "test");
SearchResponse scrollResp = client.prepareSearch(test)
.addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
.setScroll(new TimeValue(60000))
.setQuery(qb)
.setSize(100).execute().actionGet(); //max of 100 hits will be returned for each scroll
//Scroll until no hits are returned
do {
for (SearchHit hit : scrollResp.getHits().getHits()) {
//Handle the hit...
}
scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.
It has been long since you asked this question, and I would like to post my answer for future readers.
As mentioned above answers, it is better to load documents with size and start when you have thousands of documents in the index. In my project, the search loads 50 results as default size and starting from zero index, if a user wants to load more data, then the next 50 results will be loaded. Here what I have done in the code:
public List<CourseDto> searchAllCourses(int startDocument) {
final int searchSize = 50;
final SearchRequest searchRequest = new SearchRequest("course_index");
final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
if (startDocument != 0) {
startDocument += searchSize;
}
searchSourceBuilder.from(startDocument);
searchSourceBuilder.size(searchSize);
// sort the document
searchSourceBuilder.sort(new FieldSortBuilder("publishedDate").order(SortOrder.ASC));
searchRequest.source(searchSourceBuilder);
List<CourseDto> courseList = new ArrayList<>();
try {
final SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
final SearchHits hits = searchResponse.getHits();
// Do you want to know how many documents (results) are returned? here is you get:
TotalHits totalHits = hits.getTotalHits();
long numHits = totalHits.value;
final SearchHit[] searchHits = hits.getHits();
final ObjectMapper mapper = new ObjectMapper();
for (SearchHit hit : searchHits) {
// convert json object to CourseDto
courseList.add(mapper.readValue(hit.getSourceAsString(), CourseDto.class));
}
} catch (IOException e) {
logger.error("Cannot search by all mach. " + e);
}
return courseList;
}
Info:
- Elasticsearch version 7.5.0
- Java High Level REST Client is used as client.
I Hope, this will be useful for someone.
You will have to trade off the number of returned results vs the time you want the user to wait and the amount of available server memory. If you've indexed 1,000,000 documents, there isn't a realistic way to retrieve all those results in one request. I'm assuming your results are for one user. You'll have to consider how the system will perform under load.
To query all, you should build a CountRequestBuilder to get the total number of records (by CountResponse) then set the number back to size of your seach request.
If your primary focus is on exporting all records you might want to go for a solution which does not require any kind of sorting, as sorting is an expensive operation.
You could use the scan and scroll approach with ElasticsearchCRUD as described here.
For version 6.3.2, the following worked:
public List<Map<String, Object>> getAllDocs(String indexName, String searchType) throws FileNotFoundException, UnsupportedEncodingException{
int scrollSize = 1000;
List<Map<String,Object>> esData = new ArrayList<>();
SearchResponse response = null;
int i=0;
response = client.prepareSearch(indexName)
.setScroll(new TimeValue(60000))
.setTypes(searchType) // The document types to execute the search against. Defaults to be executed against all types.
.setQuery(QueryBuilders.matchAllQuery())
.setSize(scrollSize).get(); //max of 100 hits will be returned for each scroll
//Scroll until no hits are returned
do {
for (SearchHit hit : response.getHits().getHits()) {
++i;
System.out.println (i + " " + hit.getId());
writer.println(i + " " + hit.getId());
}
System.out.println(i);
response = client.prepareSearchScroll(response.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(response.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.
return esData;
}
SearchResponse response = restHighLevelClient.search(new SearchRequest("Index_Name"), RequestOptions.DEFAULT);
SearchHit[] hits = response.getHits().getHits();
1.set the max size ,e.g: MAX_INT_VALUE;
private static final int MAXSIZE=1000000;
#Override
public List getAllSaleCityByCity(int cityId) throws Exception {
List<EsSaleCity> list=new ArrayList<EsSaleCity>();
Client client=EsFactory.getClient();
SearchResponse response= client.prepareSearch(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setSize(MAXSIZE)
.setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.boolFilter()
.must(FilterBuilders.termFilter("cityId", cityId)))).execute().actionGet();
SearchHits searchHits=response.getHits();
SearchHit[] hits=searchHits.getHits();
for(SearchHit hit:hits){
Map<String, Object> resultMap=hit.getSource();
EsSaleCity saleCity=setEntity(resultMap, EsSaleCity.class);
list.add(saleCity);
}
return list;
}
2.count the ES before you search
CountResponse countResponse = client.prepareCount(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setQuery(queryBuilder).execute().actionGet();
int size = (int)countResponse.getCount();//this is you want size;
then you can
SearchResponse response= client.prepareSearch(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setSize(size);

Categories