SolrCloud Commit Slow for Indexing One Document


I am trying to use SolrJ to index one document into SolrCloud and then query it. However, even though I am indexing only a single document, the commit takes a very long time (around 10 minutes).
Here is my code. Can anyone help explain what is going on here?
String zkHostString = "node1.datafireball.com:2181/solr";
SolrClient solr = new CloudSolrClient(zkHostString);
String mycollection = "mycollection";
java.util.Date date = new java.util.Date();
// index document
SolrInputDocument document = new SolrInputDocument();
String myid = "myweirdid" + Long.toString(date.getTime());
document.addField("id", myid);
document.addField("mfr", "datafireball_mfr");
document.addField("mpn", "datafireball_mpn");
UpdateResponse indexResponse = solr.add(mycollection, document);
System.out.println(indexResponse);
solr.commit(mycollection);
System.out.println("------------");
// query index
SolrQuery query1 = new SolrQuery();
query1.set("q", "id:" + myid);
QueryResponse response1 = solr.query(mycollection, query1);
SolrDocumentList list1 = response1.getResults();
System.out.println(list1);
System.out.println("------------");
System.out.println("Done");
The output looks like this:
{responseHeader={status=0,QTime=411}}
It then took a very long time to print the first ------------ line, which means the commit itself is what takes so long.

Related

How to read MongoDB array in java

I wrote a query in MongoDB that retrieves two fields: the id and an array. I have tried to read the array using Java, but I cannot.
try {
Bson filter = eq("_id", "1260718680159199238");
Bson project = eq("Tweets.Text", 1L);
MongoClient mongoClient = new MongoClient(
new MongoClientURI(
"mongodb://localhost:27017/?readPreference=primary&appname=MongoDB%20Compass%20Isolated%20Edition&ssl=false"
)
);
MongoDatabase database = mongoClient.getDatabase("Amazon-tweets");
MongoCollection<Document> collection = database.getCollection("tweets");
FindIterable<Document> result = collection.find(filter).projection(project);
for (Document doc : result) {
String s = doc.getString("Tweets.1");
System.out.println("orig " + s);
}
}//END try
catch (Exception e) {
}//

You can't use dot notation to retrieve nested values from a document after the query has returned.
Depending on the driver version you're using, you can use either of these solutions.
(I'm assuming each tweet is a separate document here, but the same should apply if it's something different.)
List<Document> tweets = (List<Document>) doc.get("Tweets");
List<Document> tweets = doc.getList("Tweets", Document.class); // since 3.10
tweets.forEach(...
You can find the documentation here: https://mongodb.github.io/mongo-java-driver/
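To illustrate why the dotted key fails once the result is in hand: a returned document behaves like a map whose "Tweets" value is itself a list, so there is no entry literally named "Tweets.1". The sketch below uses plain java.util maps and lists as a stand-in for the driver's Document (an assumption, to keep the example self-contained), and the tweet texts are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedArraySketch {

    // Builds a stand-in for one result document: { _id: ..., Tweets: [ {Text: ...}, ... ] }.
    static Map<String, Object> sampleDoc() {
        List<Map<String, Object>> tweets = new ArrayList<>();
        for (String text : new String[] {"first tweet", "second tweet"}) {
            Map<String, Object> tweet = new LinkedHashMap<>();
            tweet.put("Text", text);
            tweets.add(tweet);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "1260718680159199238");
        doc.put("Tweets", tweets);
        return doc;
    }

    // Reads the array stored under "Tweets", then the "Text" value of each element.
    @SuppressWarnings("unchecked")
    static List<String> tweetTexts(Map<String, Object> doc) {
        List<Map<String, Object>> tweets = (List<Map<String, Object>>) doc.get("Tweets");
        List<String> texts = new ArrayList<>();
        if (tweets != null) {
            for (Map<String, Object> tweet : tweets) {
                texts.add((String) tweet.get("Text"));
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = sampleDoc();
        System.out.println(doc.get("Tweets.1")); // null: no top-level key with that name
        System.out.println(tweetTexts(doc));
    }
}
```

With the real driver, the same two steps would be the doc.get("Tweets") or doc.getList(...) call shown above, followed by a getString("Text") on each element.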

Prefix search using lucene

I am trying to implement autocomplete using Lucene's search functionality. The following code searches by the query prefix, but it also returns every sentence that contains a word starting with that prefix, whereas I want only the sentences or words that start exactly with that prefix.
ex: m
--holiday mansion houseboat
--eye muscles
--movies of all time
--machine
I want it to show only the last 2 results. I am stuck on how to do this, and I am new to Lucene. Can anyone please help? Thanks in advance.
addDoc(IndexWriter w, String title, String isbn) throws IOException {
Document doc = new Document();
doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
// use a string field for isbn because we don't want it tokenized
doc.add(new Field("isbn", isbn, Field.Store.YES, Field.Index.ANALYZED));
w.addDocument(doc);
}
Main:
try {
// 0. Specify the analyzer for tokenizing text.
// The same analyzer should be used for indexing and searching
StandardAnalyzer analyzer = new StandardAnalyzer();
// 1. create the index
Directory index = FSDirectory.open(new File(indexDir));
IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED); //3
for (int i = 0; i < source.size(); i++) {
addDoc(writer, source.get(i), + (i + 1) + "z");
}
writer.close();
// 2. query
Term term = new Term("title", querystr);
//create the term query object
PrefixQuery query = new PrefixQuery(term);
// 3. search
int hitsPerPage = 20;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// 4. Get results
for (int i = 0; i < hits.length; ++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println(d.get("title"));
}
reader.close();
} catch (Exception e) {
System.out.println("Exception (LuceneAlgo.getSimilarString()) : " + e);
}
}
}

I see two solutions:
As suggested by Yahnoosh, save the title field twice: once as a TextField (analyzed) and once as a StringField (not analyzed).
Save it just as a TextField, but use a SpanFirstQuery when querying:
// 2. query
Term term = new Term("title", querystr);
// create the prefix query, then wrap it so it can be used as a span query
PrefixQuery pq = new PrefixQuery(term);
SpanQuery wrapper = new SpanMultiTermQueryWrapper<PrefixQuery>(pq);
// "final" is a reserved word in Java, so the variable needs another name;
// an end position of 1 restricts matches to the first term of the field
Query firstTermQuery = new SpanFirstQuery(wrapper, 1);

If I understand your scenario correctly, you want to autocomplete on the title field.
The solution is to have two fields: one analyzed, to enable querying over it, one non-analyzed to have titles indexed without breaking them into individual terms.
Your autocomplete logic should issue prefix queries against the non-analyzed field to match only on the first word. Your term queries should be issued against the analyzed field for matches within the title.
I hope that makes sense.
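A rough, Lucene-free sketch of why the two fields behave differently: a prefix query over an analyzed field is checked against every term of the title, while a prefix query over a non-analyzed field is checked against the whole title as a single term. The whitespace splitting and lowercasing below are simplifying assumptions rather than what StandardAnalyzer does exactly, and the method names are hypothetical; the titles are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class PrefixFieldSketch {

    // Analyzed field: the title is split into terms, and any term may match the prefix.
    static List<String> matchAnalyzed(List<String> titles, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            for (String token : title.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.startsWith(p)) {
                    hits.add(title);
                    break; // one matching term is enough for the title to be a hit
                }
            }
        }
        return hits;
    }

    // Non-analyzed field: the whole title is a single term, so only its start can match.
    static List<String> matchNotAnalyzed(List<String> titles, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (title.toLowerCase(Locale.ROOT).startsWith(p)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "holiday mansion houseboat", "eye muscles", "movies of all time", "machine");
        System.out.println(matchAnalyzed(titles, "m"));    // all four titles
        System.out.println(matchNotAnalyzed(titles, "m")); // only the last two
    }
}
```

Note that a real non-analyzed field is not lowercased at index time, so the autocomplete field would typically need its case normalized before indexing and querying.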

Complex Mongo DB query - Sub documents

I have the document structured as shown
I would like to get a collection of moduleDataItems which has version say more than 0.
This is my attempt :
Query qu = Query.query(Criteria.where("appKey")
.is("MOCK_APP").and("modules._id")
.is("APP_1_MOD_1")
.and("modules.moduleDataItems.version")
.gt(0));
List<DataItem> dList = mongoTemplate.find(qu,
DataItem.class,
ApplicationConstants.MONGO_APPLICATION_COLLECTION_NAME);
I'm pretty sure I'm not doing this right: I do not get any DataItem in the result.
The classes represent the json structure.
Any help is appreciated.
Thanks

Finally solved it using this:
DBObject unwindParam = new BasicDBObject("$unwind","$dataItems");
DBObject matchParam = new BasicDBObject("$match",
new BasicDBObject("dataItems.version",
new BasicDBObject("$gt",requestedModule.getVersion())));
DBObject fields = new BasicDBObject("dataItems", 1);
DBObject projectParam = new BasicDBObject("$project", fields);
AggregationOutput output = mongoTemplate.getCollection(
"appModules").aggregate(
unwindParam, matchParam,projectParam);
CommandResult updatedData = output.getCommandResult();
BasicDBList resList = (BasicDBList) updatedData.get("result");

Get All Result in Solr with Solrj

I want to get all results with SolrJ. When I add 10 documents to Solr, I don't get any exception, but if I add more than 10 documents, I get an exception. From what I found, http://localhost:8983/solr/browse shows 10 documents on the first page and the 11th document goes to the second page. How can I get all results?
String qry="*:*";
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp=server.query(new SolrQuery(qry));
SolrDocumentList docs=rsp.getResults();
for(int i=0;i<docs.getNumFound();i++){
System.out.println(docs.get(i));
}
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 10, Size: 10

// assumes the same match-all query as in the question
SolrQuery query = new SolrQuery("*:*");
int start = 0;
query.setStart(start);
QueryResponse response = server.query(query);
SolrDocumentList rs = response.getResults();
long numFound = rs.getNumFound();
int current = 0;
while (current < numFound) {
ListIterator<SolrDocument> iter = rs.listIterator();
while (iter.hasNext()) {
current++;
System.out.println("************************************************************** " + current + " " + numFound);
SolrDocument doc = iter.next();
Map<String, Collection<Object>> values = doc.getFieldValuesMap();
Iterator<String> names = doc.getFieldNames().iterator();
while (names.hasNext()) {
String name = names.next();
System.out.print(name);
System.out.print(" = ");
Collection<Object> vals = values.get(name);
Iterator<Object> valsIter = vals.iterator();
while (valsIter.hasNext()) {
Object obj = valsIter.next();
System.out.println(obj.toString());
}
}
}
query.setStart(current);
response = server.query(query);
rs = response.getResults();
numFound = rs.getNumFound();
}

An easier way:
CloudSolrServer server = new CloudSolrServer(solrZKServerUrl);
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE);
QueryResponse rsp;
rsp = server.query(query, METHOD.POST);
SolrDocumentList docs = rsp.getResults();
for (SolrDocument doc : docs) {
Collection<String> fieldNames = doc.getFieldNames();
for (String s: fieldNames) {
System.out.println(doc.getFieldValue(s));
}
}

numFound gives you the total number of results that matched the query.
However, by default Solr returns only the top 10 results; this is controlled by the rows parameter.
You are iterating up to numFound, but since only 10 results are returned, the loop fails at index 10.
You should iterate over the documents actually returned (at most rows of them) instead.
For getting the next set of results, you would need to requery Solr with a different start parameter. This is to support pagination so that you don't have to pull all the results at one go which is a very heavy operation.
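The relationship between numFound, rows and start can be sketched without a Solr server. In this illustration the fetchPage helper and the in-memory list are hypothetical stand-ins for a query issued with setStart/setRows; the point is that the loop is driven by the size of each returned page and never indexes into a page by numFound.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingSketch {

    // Stand-in for one query: returns at most `rows` items beginning at offset `start`.
    static List<String> fetchPage(List<String> index, int start, int rows) {
        int from = Math.min(start, index.size());
        int to = (int) Math.min((long) from + rows, index.size()); // long math avoids overflow
        return new ArrayList<>(index.subList(from, to));
    }

    // Collects every match by re-querying with an advancing start offset.
    static List<String> fetchAll(List<String> index, int rows) {
        List<String> all = new ArrayList<>();
        long numFound = index.size(); // total matches, as numFound would report
        int start = 0;
        while (start < numFound) {
            List<String> page = fetchPage(index, start, rows);
            if (page.isEmpty()) {
                break; // nothing more came back, so stop instead of looping forever
            }
            all.addAll(page); // iterate over what was actually returned
            start += page.size();
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        for (int i = 1; i <= 23; i++) {
            index.add("doc" + i);
        }
        System.out.println(fetchPage(index, 0, 10).size()); // first page holds 10 of the 23
        System.out.println(fetchAll(index, 10).size());
    }
}
```

Keeping rows moderate and advancing start trades more round trips for smaller responses, which is the pagination trade-off described above.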

If you refactor your code like this it will work
String qry="*:*";
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.setRows(Integer.MAX_VALUE); //Add me to avoid IndexOutOfBoundExc
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
QueryResponse rsp=server.query(query);
SolrDocumentList docs=rsp.getResults();
for(int i=0;i<docs.getNumFound();i++){
System.out.println(docs.get(i));
}
The reason is quite simple.
The response tells you that there are getNumFound() matching documents, but if you do not specify in your query how many of them the response must carry, the limit defaults to 10, so only the top 10 of the getNumFound() matching documents are fetched.
For this reason the docs list has just 10 elements, and trying to get the i-th element with i > 9 (e.g. 10) leads to a java.lang.IndexOutOfBoundsException, just like the one you are experiencing.
P.S. I suggest you use the for-each iteration just like @Chen Sheng-Lun did.
P.P.S. At first this drove me crazy too.

Indexing and Searching Date in Lucene

I tried indexing dates with the DateTools.dateToString() method. It works properly for both indexing and searching.
But my already indexed data, which other data references, has its dates indexed as new Date().getTime().
So my problem is how to perform a range search query on this data.
Is there any solution to this?
Thanks in advance.

You need to use a TermRangeQuery on your date field. That field always needs to be indexed with DateTools.dateToString() for it to work properly. Here's a full example of indexing and searching on a date range with Lucene 3.0:
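import java.util.Date;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.DateTools.Resolution;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class LuceneDateRange {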
public static void main(String[] args) throws Exception {
// setup Lucene to use an in-memory index
Directory directory = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
MaxFieldLength mlf = MaxFieldLength.UNLIMITED;
IndexWriter writer = new IndexWriter(directory, analyzer, true, mlf);
// use the current time as the base of dates for this example
long baseTime = System.currentTimeMillis();
// index 10 documents with 1 second between dates
for (int i = 0; i < 10; i++) {
Document doc = new Document();
String id = String.valueOf(i);
String date = buildDate(baseTime + i * 1000);
doc.add(new Field("id", id, Store.YES, Index.NOT_ANALYZED));
doc.add(new Field("date", date, Store.YES, Index.NOT_ANALYZED));
writer.addDocument(doc);
}
writer.close();
// search for documents from 5 to 8 seconds after base, inclusive
IndexSearcher searcher = new IndexSearcher(directory);
String lowerDate = buildDate(baseTime + 5000);
String upperDate = buildDate(baseTime + 8000);
boolean includeLower = true;
boolean includeUpper = true;
TermRangeQuery query = new TermRangeQuery("date",
lowerDate, upperDate, includeLower, includeUpper);
// display search results
TopDocs topDocs = searcher.search(query, 10);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
System.out.println(doc);
}
}
public static String buildDate(long time) {
return DateTools.dateToString(new Date(time), Resolution.SECOND);
}
}
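A term range is evaluated by comparing terms as strings, which is why the fixed-width output of DateTools.dateToString() works for date ranges. The sketch below illustrates that idea with the JDK only: toTerm is an assumed stand-in that formats a timestamp as yyyyMMddHHmmss in GMT, and inRange is a hypothetical helper doing the inclusive string comparison.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class DateTermSketch {

    // Assumed stand-in for DateTools.dateToString(date, Resolution.SECOND): fixed width, GMT.
    static String toTerm(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }

    // Inclusive range check done purely on strings, the way a term range comparison works.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        long base = 1_000_000_000_000L; // arbitrary example instant
        String lower = toTerm(base + 5000);
        String upper = toTerm(base + 8000);
        for (int i = 0; i < 10; i++) {
            String term = toTerm(base + i * 1000L);
            System.out.println(term + " in range: " + inRange(term, lower, upper));
        }
        // Raw numbers written as strings only compare correctly when they have equal length.
        System.out.println("999".compareTo("1000") > 0); // true: "999" sorts after "1000"
    }
}
```

The last line hints at the caveat for values stored as plain getTime() strings: string comparison matches numeric order only while every value has the same number of digits.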

You'll get much better search performance if you use a NumericField for your date, and then NumericRangeFilter/Query to do the range search.
You just have to encode your date as a long or int. One simple way is to call the .getTime() method of your Date, but this may be far more resolution (milliseconds) than you need. If you only need down to the day, you can encode it as a YYYYMMDD integer.
Then, at search time, do the same conversion on your start/end Dates and run NumericRangeQuery/Filter.
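A minimal sketch of the day-resolution encoding, using only java.time. The class and method names are hypothetical, UTC is an assumed time zone, and the numeric comparison in inRange merely stands in for what a numeric range query does against the indexed value.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

final class DateEncodingSketch {

    // Encodes a calendar day as an int such as 20240131; numeric order equals date order.
    static int toDayNumber(LocalDate day) {
        return day.getYear() * 10000 + day.getMonthValue() * 100 + day.getDayOfMonth();
    }

    // Converts epoch milliseconds (what Date.getTime() returns) to the same encoding, in UTC.
    static int toDayNumber(long epochMillis) {
        return toDayNumber(Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate());
    }

    // Inclusive numeric range test, standing in for the range query on the encoded field.
    static boolean inRange(int value, int min, int max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        int start = toDayNumber(LocalDate.of(2024, 1, 15));
        int end = toDayNumber(LocalDate.of(2024, 2, 15));
        int indexed = toDayNumber(LocalDate.of(2024, 1, 31));
        System.out.println(indexed);                      // 20240131
        System.out.println(inRange(indexed, start, end)); // true
    }
}
```

The same toDayNumber conversion has to be applied at index time and to both ends of the range at search time, otherwise the values are not comparable.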
