Cloudant With Lucene Search Fails To Sort As Expected


I am pretty new to Cloudant but have developed in SQL on DB2 for some time. I am running into an issue where I *think* I am using the Lucene query engine and Cloudant indexes to return results from my query. The query returns all the results I want; however, they are not sorted correctly. I want to sort the results alphabetically on the "officialName" field. Because we only return the first 21 of n results (a JS handler then requests more results via paging), we cannot sort on the Java side and must do so in Cloudant. Our application is written in Java and runs on IBM Bluemix with the WebSphere Liberty Profile. I have packaged the cloudant-client-2.8.0.jar and cloudant-HTTP-2.8.0.jar files to access the Cloudant database. We have many queries that work, so the connection itself is fine.
Here is the code that builds the Cloudant Client search object:
Search search = getCloudantDbForOurApp().search("bySearchPP-ddoc/bySearchPP-indx").includeDocs(true);
SearchResult<DeliverableDetails> result = search.sort(getSortJsonString(searchString)).querySearchResult(getSearchQuery(searchString), DeliverableDetails.class);
Here is the method getSortJsonString. Note that the search string is typically NOT null. Leaving in or taking out the -score attribute does affect the results, but it never produces alphabetically sorted output.
private String getSortJsonString(String searchString) {
    String sortJson;
    if (searchString != null && !searchString.isEmpty()) {
        sortJson = "[\"-<score>\",\"officialName<string>\"]";
    } else {
        sortJson = "\"officialName<string>\"";
    }
    return sortJson;
}
Here is the getSearchQuery method's relevant code for reference:
...
query += "(";
query += "officialName:" + searchString + "^3";
query += " OR " + "deliverableName:" + searchString + "^3";
query += " OR " + "alias:" + searchString + "^3";
query += " OR " + "contact:" + searchString;
query += ")";
....
// The query will look like below, where <search_string> is some user-entered value
// (officialName:<search_string>*^3 OR deliverableName:<search_string>*^3 OR alias:<search_string>*^3 OR contact:<search_string>*)
I have setup a design doc and index using the Cloudant dashboard as follows:
{
"_id": "_design/bySearchPP-ddoc",
"_rev": "4-a91fc4ddeccc998c58adb487a121c168",
"views": {},
"language": "javascript",
"indexes": {
"bySearchPP-indx": {
"analyzer": {
"name": "perfield",
"default": "standard",
"fields": {
"alias": "simple",
"contact": "simple",
"deploymentTarget": "keyword",
"businessUnit": "keyword",
"division": "keyword",
"officialName": "simple",
"deliverableName": "simple",
"pid": "keyword"
}
},
"index": "function(doc) {
if (doc.docType === \"Page\") {
index(\"officialName\", doc.officialName, {\"store\":true, \"boost\":4.0});
index(\"deliverableName\", doc.deliverableName, {\"store\":true, \"boost\":3.0});
if (doc.aliases) {
for (var i in doc.aliases) {
index(\"alias\", doc.aliases[i], {\"store\":true, \"boost\":2.0});
}
}
if (doc.allContacts) {
for (var j in doc.allContacts) {
index(\"contact\", doc.allContacts[j], {\"store\":true, \"boost\":0.5});
}
}
index(\"deploymentTarget\", doc.deploymentTarget, {\"store\":true});
index(\"businessUnit\", doc.businessUnit, {\"store\":true});
index(\"division\", doc.division, {\"store\":true});
index(\"pid\", doc.pid.toLowerCase(), {\"store\":true});
}
}"
}
}
}
I am not sure whether the sort is working, just not the way I want it to, or whether I have misconfigured something. Either way, any help would be greatly appreciated. -Doug

I solved my own issue with help from the comments above. Everything was set up correctly, but once I debugged as suggested by @markwatsonatx I could see that the field I wanted wasn't being returned. After some digging online, it turns out that for sorting the field must be both indexed and NOT tokenized. I checked my index and noticed that the field was being analyzed by the Simple analyzer. I changed it to the Keyword analyzer and the sort works as expected. Hope this helps someone.
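To see why a tokenized field makes a poor sort key, here is a small plain-Java sketch. It does not call Lucene or Cloudant: the class AnalyzerSketch and both helper methods are hypothetical and only imitate the effect of a simple-style analyzer (lowercase, split on non-letters) versus a keyword analyzer (the whole value kept as one term).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: these helpers imitate the *effect* of the analyzers.
// They are not Lucene or Cloudant code.
final class AnalyzerSketch {

    // Roughly what a "simple"-style analyzer does: lowercase, split on non-letters.
    static List<String> simpleTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Roughly what a "keyword" analyzer does: the whole value is a single term.
    static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String officialName = "Bimini Bike Tour";
        System.out.println(simpleTokens(officialName));  // several terms for one document
        System.out.println(keywordTokens(officialName)); // exactly one term to sort on
    }
}
```

With the simple-style split one document yields three terms and there is no single value to order by, whereas the keyword form leaves one term per document. Keeping the whole value as one term may also change how searches on that field match, so it is worth re-testing the queries after switching analyzers.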

Related

Query to search value in side array of object

I want to apply criteria to the objects inside an array and return the documents that match, but I cannot find any documentation or example showing how to do this with the spring-data-cosmosdb library. I am using version 2.3.0 of the library.
Example of Json
{
"id" : 1,
"address" : [
{
"street" : "abc"
...
},
{
"street" : "efg"
...
}
]
}
I want to find all documents in which address contains an entry whose street equals "abc". Below is the Spring Boot code I am using to search in Cosmos DB, but it does not return the expected results.
List<Criteria> criteriaList = new ArrayList<>();
criteriaList.add(Criteria.getInstance(CriteriaType.IN, "addresses.street", Collections.singletonList("abc")));
List<User> users = cosmosTemplate.find(new DocumentQuery(criteriaList.get(0), CriteriaType.AND), User.class, COLLECTION_NAME);
I also tried address[0].street, but that throws an "operation not supported" exception.

Strongly recommend upgrading to spring-data-cosmosdb v3 (at least version 3.22.0). The v2 connector has been legacy for some time. Using the latest connector, the below would accomplish your goal.
Criteria filterCriteria = Criteria.getInstance(CriteriaType.ARRAY_CONTAINS, "address",
Collections.singletonList(new ObjectMapper().readTree("{\"street\":\"abc\"}")),
Part.IgnoreCaseType.NEVER);
CosmosQuery cosmosQuery = new CosmosQuery(filterCriteria);
Iterable<User> results = cosmosTemplate.find(cosmosQuery, User.class, COLLECTION_NAME);
for (User user : results)
System.out.println("doc id: " + user.getId());

Spring Boot Pagination

I'm facing a problem with paginating data. The page data is calculated from the outer array dXPRecommendationResponses, whereas I want it to be calculated from the nested array recommendations. The response data is shown in the result section below.
I tried to change the page data calculation, but it is always calculated from the data passed to the PageImpl() constructor.
I use this approach to paginate data that we get from a third-party API.
But the page data is calculated from dxpRecommendationslist.
This is the code that generates the response:
List<DXPRecommendationResponse> dxpRecommendationslist = new ArrayList<>();
List<DXPActivity> dxpActivities = getThirdResponse(pageable, correlationId,
dxpRecommendationslist, reservationGuestId, reservationId, nbxRecommendationRequest);
return new PageImpl<>(dxpRecommendationslist, pageable, dxpActivities.size());
This method fetches the data from the third party:
public List<DXPActivity> getThirdResponse(final Pageable pageable, final String correlationId,
List<DXPRecommendationResponse> list, String reservationGuestId, String reservationId,
NBXRecommendationRequest nbxRecommendationRequest) {
List<DXPActivity> dxpActivities = new ArrayList<>();
NBXRecommendationResponse nbxRecommendationResponse = vVNBXRecommendationService
.getCalendarRecommendation(nbxRecommendationRequest, correlationId);
if (nbxRecommendationResponse != null) {
DXPRecommendationResponse dxpRecommendationResponse = new DXPRecommendationResponse();
dxpRecommendationResponse.setReservationGuestId(reservationGuestId);
dxpRecommendationResponse.setReservationNumber(reservationId);
dxpRecommendationResponse.setRecommendationType("CalendarType");
dxpRecommendationResponse.setDateTime(new Date());
populateActivities(dxpActivities, nbxRecommendationResponse);
List<DXPActivity> filteredList;
if (!CollectionUtils.isEmpty(dxpActivities)
&& dxpActivities.size() >= (pageable.getPageSize() * pageable.getPageNumber())) {
filteredList = Lists.partition(dxpActivities, pageable.getPageSize()).get(pageable.getPageNumber());
} else {
filteredList = new ArrayList<DXPActivity>();
}
dxpRecommendationResponse.setRecommendations(filteredList);
list.add(dxpRecommendationResponse);
}
return dxpActivities;
}
This method populates the data:
private void populateActivities(List<DXPActivity> dxpActivities,
NBXRecommendationResponse nbxRecommendationResponse) {
for (Activity activity :nbxRecommendationResponse.getCalendarRecommendation().getRecommendations().getActivities()) {
DXPActivity dxpActivity = new DXPActivity();
orikaMapper.map(activity, dxpActivity);
dxpActivities.add(dxpActivity);
}
}
The response data:
{
"_embedded": {
"dXPRecommendationResponses": [
{
"recommendationType": "CalendarType",
"reservationGuestId": "525dab66-1492-4908-a3bf-b5de558368e5",
"reservationNumber": "3a39f9ad-7e34-4bdb-91eb-b907fd6986c7",
"dateTime": "2019-08-19T14:38:18.413",
"recommendations": [
{
"productCode": "BIKE2006111000",
"activityName": "Bimini Bike Tour",
"recommendationId": "1565948843492_410387839_BIKE2006111000_cal",
"categoryCode": "DARING",
"activityStartTime": "2020-06-11T08:30:00.000",
"activityEndTime": "2020-06-11T11:30:00.000",
"activityDescription": "Bimini Bike Tour",
"sequence": 34.0,
"packageId": 103806,
"sourceId": "BIMINI BIKE",
"levelOfActivity": "EASY"
},
{
"productCode": "CUL2006110900",
"activityName": "Bimini Culinary Tour",
"recommendationId":"156594884349,
"categoryCode": "CULTURED",
"activityStartTime": "2020-06-11T07:30:00.000",
"activityEndTime": "2020-06-11T12:30:00.000",
"activityDescription": "Bimini Culinary Tour",
"sequence": 29.0,
"packageId": 103940,
"sourceId": "BIMINI CUL",
"levelOfActivity": "MODERATE"
}
]
}
]
},
"page": {
"size": 10,
"totalElements": 1,
"totalPages": 1,
"number": 0
}
}

The problem here is the implementation of the PageImpl constructor and how it computes the resulting total. Below is the code taken from the constructor:
this.total = pageable.toOptional().filter(it -> !content.isEmpty())//
.filter(it -> it.getOffset() + it.getPageSize() > total)//
.map(it -> it.getOffset() + content.size())//
.orElse(total);
The important line is the second one; as described in the doc comment, it is insurance against inconsistencies. It checks whether you are on the last page and, if so, uses the third line to compute the resulting total, completely ignoring the total you passed in. If you want to override this behavior, you should implement your own Page that uses only the given total.
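The effect of that filter chain can be reproduced in plain Java without Spring. The sketch below is an assumption-labeled restatement of the quoted lines for a paged request: the class PageTotalSketch and the method effectiveTotal are hypothetical names, and the total of 2 stands in for dxpActivities.size() in the question.

```java
// Plain-Java restatement of the quoted PageImpl logic (paged request assumed).
final class PageTotalSketch {

    // offset, pageSize: from the Pageable; contentSize: content.size(); total: the caller's total.
    static long effectiveTotal(long offset, int pageSize, int contentSize, long total) {
        boolean hasContent = contentSize > 0;                  // first filter
        boolean looksLikeLastPage = offset + pageSize > total; // second filter
        return (hasContent && looksLikeLastPage)
                ? offset + contentSize                         // map step: caller's total is ignored
                : total;                                       // orElse(total)
    }

    public static void main(String[] args) {
        // One wrapper object as content, page 0 of size 10, caller passes total = 2.
        System.out.println(effectiveTotal(0, 10, 1, 2));
        // Same content, but the caller's total is larger than the page window.
        System.out.println(effectiveTotal(0, 10, 1, 25));
    }
}
```

The first call yields 1, which matches the totalElements of 1 in the response above even though two activities were returned; the second call keeps the caller's total of 25 because the page window does not reach past it.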
BUT, I don't think that is semantically correct. You are mixing things that probably should not be mixed (a page object of one type, with the actual pagination done on another). The list passed in as page content always has one element and you are paging its sub-elements, which does not seem right. Is there even a need to paginate the sub-elements?
One possible resolution would be to make only the sub-elements pageable, by declaring a Page<DXPActivity> recommendations member directly inside DXPRecommendationResponse and returning a non-pageable DXPRecommendationResponse. But again, it seems a bit off. It really depends on what you are trying to build here and the logic behind it.
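If the sub-elements are paged by hand, the slice itself can be computed with a bounds-checked subList. This is only a sketch in plain Java: PageSliceSketch and pageSlice are hypothetical names, not part of Spring Data or Guava. It also avoids an edge case in the question's guard, where size >= pageSize * pageNumber still lets through a list whose size equals pageSize * pageNumber, for which Lists.partition(...).get(pageNumber) has no such chunk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: returns the items of one zero-based page, or an empty list past the end.
final class PageSliceSketch {

    static <T> List<T> pageSlice(List<T> items, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        long from = (long) pageNumber * pageSize; // long avoids int overflow
        if (from >= items.size()) {
            return Collections.emptyList();       // requested page is past the last element
        }
        int to = (int) Math.min(from + pageSize, items.size());
        return new ArrayList<>(items.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> activities = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            activities.add(i);
        }
        System.out.println(pageSlice(activities, 2, 10)); // last, partial page
        System.out.println(pageSlice(activities, 3, 10)); // past the end
    }
}
```

The full list size (here 25) is then the natural value to report as the total number of sub-elements.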

Mongo DB Aggregate Query returns in Batches

I have the following code:
CommandResult cr = db.doEval("db." + collectionName + ".aggregate("
+ query + ")");
The command result is returned in batches, whereas I need it as a single value.
Batch Result: { "serverUsed" : "/servername" , "retval" : { "**_firstBatch**" : [ { "visitor_localdate" : 1367260200} , { "visitor_localdate"
Expected Result:
{ "serverUsed" : "/servername" , "retval" : { "**result**" : [ { "visitor_localdate" : 1367260200} , { "visitor_localdate"
The MongoDB version we are using is 2.6.4, 64-bit.
Can anyone help with this? I am guessing there is some configuration issue.

You're doing this all wrong. You don't need to jump through hoops like this just to get a dynamic collection name. Just use this syntax instead:
var collectionName = "collection";
var cursor = db[collectionName].aggregate( pipeline )
Where pipeline is also just the array of pipeline stage documents, i.e.:
var pipeline = [{ "$match": { } }, { "$group": { "_id": "$field" } }];
At any rate, the .aggregate() method returns a cursor, which you can iterate with standard methods:
while ( cursor.hasNext() ) {
var doc = cursor.next();
// do something with doc
}
But you are actually doing this in Java and not JavaScript, so from the base driver, with a connection on the object db, you just do this:
DBObject match = new BasicDBObject("$match", new BasicDBObject());
DBObject group = new BasicDBObject("$group", new BasicDBObject());
List<DBObject> pipeline = new ArrayList<DBObject>();
pipeline.add(match);
pipeline.add(group);
AggregationOutput output = db.getCollection("collectionName").aggregate(pipeline);
The pipeline is basically a List of DBObject instances in which you construct the BSON documents representing the required operations.
The result here is an AggregationOutput, but cursor-like results are obtainable by supplying AggregationOptions as an additional argument after the pipeline.

Something related to batches was added in MongoDB 2.6; more details here: http://docs.mongodb.org/manual/reference/method/db.collection.aggregate/#example-aggregate-method-initial-batch-size
From the link:
db.orders.aggregate(
[
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $limit: 2 }
],
{
cursor: { batchSize: 0 }
}
)
You might have a cursor batch size set in your aggregate query.

The answer from Neil Lunn is not wrong but I want to add that the result you were expecting is a result for mongodb versions earlier than v2.6.
Before v2.6, the aggregate function returned just one document containing a result field, which holds an array of documents returned by the pipeline, and an ok field, which holds the value 1, indicating success.
However, from mongodb v2.6 on, the aggregate function returns a cursor (if $out option was not used).
See examples in mongodb v2.6 documentation and compare how it worked before v2.6 (i.e. in v2.4):

How to bind an hstore[] value in Java

I'm trying to store people's telephone numbers and addresses in a database table. I would like to support multiple phone numbers and addresses, and I expect the format to differ between countries. I've decided to use hstore to allow that flexibility and to allow efficient querying by specific fields. As it stands I can read the values from the database, but I could not find a way to insert them from Java. The (simplified) table looks like this:
CREATE TABLE IF NOT EXISTS contacts ( "
+ "id uuid NOT NULL, "
+ "title character varying NOT NULL DEFAULT '', "
+ "first_name character varying NOT NULL, "
+ "last_name character varying NOT NULL, "
+ "phones hstore[] NOT NULL DEFAULT '{}', "
+ "addresses hstore[] NOT NULL DEFAULT '{}')"
I have created a custom JDBI Binder to bind the values, but whatever I try I can't get the statement to execute. Currently the Binder code snippet looks like this:
@Override
public void bind(SQLStatement<?> q, BindContactBean bind, ContactBean bean) {
q.bind("phones",
getHstoreArray(q, PhoneDetailMapper.toMapArray(bean.phones.get())));
q.bind("addresses",
getHstoreArray(q, AddressDetailMapper.toMapArray(bean.addresses.get())));
The getHstoreArray function is a helper that converts a Java array into an SQL array and looks like this:
private Array getHstoreArray(SQLStatement<?> q, Map<String, String>[] map) {
try {
return q.getContext().getConnection().createArrayOf("hstore", map);
} catch (SQLException e) {
throw new IllegalArgumentException(e);
}
}
I think the problem is in encoding of the data. For example, for data (in JSON notation for simplicity)
{
"firstName": "Maximum",
"lastName": "Details",
"status": "active",
"phones": [{
"type": "mobile",
"number": "0777 66 55 44"
}]
}
the query is expanded to:
INSERT INTO contacts (
id, first_name, last_name, status, phones )
VALUES ( '9be1a040-b408-11e3-bb43-00231832fa86', 'Maximum', 'Details', 4,
'{"{type=mobile, extracted_number=extracted, number=07777 66 55 44}"}'
)
and if I try to run it from PGAdmin's SQL editor the error returned is:
ERROR: Syntax error near 'm' at position 6
LINE 5: '{"{type=mobile, extracted_number=extracted, number=07777 66...
^
********** Error **********
ERROR: Syntax error near 'm' at position 6
SQL state: XX000
Character: 179
I have considered using JSON instead of hstore[], but that would make querying by specific fields slower and less accurate (essentially a text search), so I'd rather avoid it.
Another option I tried before hstore is an array of UDTs, but I couldn't even get it to read from the database without writing a parser for PGobject, which doesn't look like a simple task.
EDIT
I had a look at the data in the database and when escaped in the following way:
'{"\"type\"=>\"mobile\", \"number\"=>\"07777 66 55 44\", \"extracted_number\"=>\"777665544\""}'
I can run query manually from SQL editor, but still no luck in Java.

I have found a solution: the Postgres driver provides a class called HStoreConverter, which can convert a Map to a String literal. I am not sure this is the best approach, but it seems to work. The modified helper function is below.
private Array getHstoreArray(SQLStatement<?> q, Map<String, String>[] maps) {
    try {
        String[] hstores = new String[maps.length];
        for (int i = 0; i < maps.length; i++)
            hstores[i] = HStoreConverter.toString(maps[i]);
        return q.getContext().getConnection().createArrayOf("hstore", hstores);
    } catch (SQLException e) {
        throw new IllegalArgumentException(e);
    }
}
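The difference between the failing and the working version comes down to the text each element is rendered as. The sketch below shows this in plain Java with no driver on the classpath: HstoreLiteralSketch and toHstoreLiteral are hypothetical names and a simplified imitation of the "key"=>"value" form seen in the question's EDIT, not the driver's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified imitation of an hstore text literal; not the PostgreSQL driver's code.
final class HstoreLiteralSketch {

    // Wrap in double quotes, escaping backslashes and embedded double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Render a map as "key"=>"value" pairs separated by ", ".
    static String toHstoreLiteral(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(quote(e.getKey())).append("=>");
            sb.append(e.getValue() == null ? "NULL" : quote(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> phone = new LinkedHashMap<>();
        phone.put("type", "mobile");
        phone.put("number", "0777 66 55 44");
        System.out.println(phone);                  // Map.toString(): the form the failing INSERT contained
        System.out.println(toHstoreLiteral(phone)); // the form hstore can parse
    }
}
```

Passing the Map objects straight to createArrayOf ends up with the first form in the statement, which is what the syntax error in the question points at; converting each map to a string first, as the HStoreConverter-based helper does, sends the second form.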

Full Text Search in CouchDB

I am developing a web application with the GWT framework (Java). I am using CouchDB (a NoSQL database) for storing user profiles, user questions and answers. I am new to NoSQL databases and I need to implement full-text search in my application.
Example: "What is Java ?"
Desired result: it should find all the questions that contain all three words: What, is, Java.
Is there any way to achieve this in CouchDB?

Use couchdb-lucene. The integration with CouchDB is straightforward and it would be perfect for your use case. couchdb-lucene supports the entire query syntax of Lucene. For your problem the + operator could be used.
The "+" or required operator requires that the term after the "+" symbol exist somewhere in a field of a single document.
Here is a sample query
http://localhost:5984/_fti/local/database/_design/design_name/index_name?q=+"What is java"
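Note that in Lucene syntax a quoted string is a phrase, so to require each word on its own every word needs its own "+". A minimal Java sketch of building such a query value from user input follows; RequiredTermsSketch and its methods are hypothetical helpers, and the rule of splitting on anything that is not a letter or digit (which also drops the "?" wildcard character) is an assumption made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that turns free text into a Lucene query requiring every word.
final class RequiredTermsSketch {

    // "What is Java ?" becomes "+What +is +Java"
    static String requireAllTerms(String userInput) {
        StringBuilder query = new StringBuilder();
        for (String word : userInput.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // leading separators produce an empty first element
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('+').append(word);
        }
        return query.toString();
    }

    // URL-encode the value of the q parameter; "+" must be sent as %2B.
    static String encodeForUrl(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String query = requireAllTerms("What is Java ?");
        System.out.println(query);
        System.out.println("?q=" + encodeForUrl(query));
    }
}
```

The encoding step matters because an unencoded "+" in a URL query string is read as a space, which would silently drop the required operator.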

You can implement it using CouchDB list functions.
I have a document where I need to search for keywords in the name and description fields. So I created a view that emits the doc id as the key and doc.name, doc._id, doc.description as the value.
Then I created a list function that uses the JavaScript match function and gives me the list of matching doc ids.
Sample Query:
http://localhost:5984/dashboard/_design/testSearch/_list/results/ByName?searchQuery=What is Java
{
"_id": "_design/testSearch",
"lists": {
"results": "function(head, req) { var query= new RegExp(req.query.searchQuery,'i'); var arr=new Array(); var key; var row; while(row = getRow()) { if(row.value[0].match(query) || row.value[2].match(query)) { arr.push([row.value[0].toUpperCase(),row.value[1]]); key = row.key;}} arr.sort(); send('{\"'+key+'\":\"'+arr+'\"}');}"
},
"views": {
"ByName": {
"map": "function (doc) {\n if((doc.isdeleted==\"false\" || doc.isdeleted==false) && doc.userid && doc.name){\n emit(doc._id,[doc.name,doc._id,doc.description]);\n }\n}"
}
},
"language": "javascript"
}
