Elasticsearch: handling missing indices - Java

I would like to know if there is a way to tell Elasticsearch that I don't mind missing or erroneous indices in my search query. In other words, I have a query that targets 7 different indices, but one of them might be missing depending on the circumstances. What I want to know is whether there is a way to say: forget the broken one and give me the results of the other 6 indices.
SearchRequestBuilder builder = elasticsearchClient.getClient().prepareSearch(indices)
.setQuery(Query.buildQueryFrom(term1, term2))
.addAggregation(AggregationBuilders.terms("term")
.field("field")
.shardSize(shardSize)
.size(size)
.minDocCount(minCount));
The above is an example of such a query.

Take a look at the ignore_unavailable option, which is part of the multi index syntax. This has been available since at least version 1.3 and allows you to ignore missing or closed indexes when performing searches (among other multi index operations).
It is exposed in the Java API by IndicesOptions. Browsing through the source code, I found there is a setIndicesOptions() method on the SearchRequestBuilder used in the example. You need to pass it an instance of IndicesOptions.
There are various static factory methods on the IndicesOptions class for building an instance with your desired options. You would probably benefit from the convenient lenientExpandOpen() factory method (or its deprecated predecessor, lenient(), depending on your version), which sets ignore_unavailable=true, allow_no_indices=true, and expand_wildcards=open.
Here is a modified version of the example query which should provide the behavior you are looking for:
SearchRequestBuilder builder = elasticsearchClient.getClient().prepareSearch(indices)
.setQuery(Query.buildQueryFrom(term1, term2))
.addAggregation(AggregationBuilders.terms("term")
.field("field")
.shardSize(shardSize)
.size(size)
.minDocCount(minCount))
.setIndicesOptions(IndicesOptions.lenientExpandOpen());

Have you tried using Index Aliases?
Rather than referring to individual indexes, your query can target a single alias; behind the alias can sit several indexes.
Here I'm adding two indexes to the alias and removing the missing / broken one:
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "bad-index", "alias" : "alias-index" } },
    { "add" : { "index" : "good-index1", "alias" : "alias-index" } },
    { "add" : { "index" : "good-index2", "alias" : "alias-index" } }
  ]
}'
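Since the question uses the Java API: the same alias changes can be made from code via the indices admin client. This is only a rough sketch, reusing the elasticsearchClient from the question and the index/alias names from the curl example above:
// Sketch: repoint "alias-index" from the broken index to the good ones,
// then search the alias instead of listing individual indices.
elasticsearchClient.getClient().admin().indices().prepareAliases()
        .removeAlias("bad-index", "alias-index")
        .addAlias("good-index1", "alias-index")
        .addAlias("good-index2", "alias-index")
        .get();

SearchRequestBuilder builder = elasticsearchClient.getClient().prepareSearch("alias-index")
        .setQuery(Query.buildQueryFrom(term1, term2));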


Specifying keyword type on String field

I started using hibernate-search-elasticsearch (5.8.2) because it seemed easy to integrate: it keeps the Elasticsearch indices up to date without my writing any code. It's a cool lib, but I'm starting to think that it implements only a very small subset of Elasticsearch's functionality. I'm executing a query with a painless script filter which needs to access a String field whose type is 'text' in the index mapping, and this is not possible without enabling fielddata. But I'm not very keen on enabling it, as it consumes a lot of heap memory. Here's what the Elasticsearch team suggests doing in my case:
Fielddata documentation
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
Instead, you should have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Unfortunately I can't find a way to do this with the hibernate-search annotations. Can someone tell me if this is possible, or whether I have to migrate to the vanilla Elasticsearch lib and stop using any wrappers?
With the current version of Hibernate Search, you need to create a separate field for that (i.e. you can't have different flavors of the same field). Note that that's what Elasticsearch is doing under the hood anyway.
@Field(analyzer = "your-text-analyzer") // your default full text search field with the default name
@Field(name = "myPropertyAggregation", index = Index.NO, normalizer = "keyword")
@SortableField(forField = "myPropertyAggregation")
private String myProperty;
It should create an unanalyzed field with doc values. You then need to refer to the myPropertyAggregation field for your aggregations.
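If you then run the aggregation through the plain Elasticsearch Java client (Search 5 itself does not expose aggregations), a rough sketch could look like the following; client is assumed to be a TransportClient and my_entity_index is a placeholder for whatever index Hibernate Search generated for your entity:
// Sketch only: a terms aggregation on the extra unanalyzed field defined above.
SearchResponse response = client.prepareSearch("my_entity_index")
        .setSize(0)  // we only care about the aggregation buckets
        .addAggregation(AggregationBuilders.terms("by_property")
                .field("myPropertyAggregation"))
        .get();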
Note that we will expose many more Elasticsearch features in the API in the upcoming Search 6. In Search 5, the APIs were designed with Lucene in mind and we couldn't break them.

Query Elastic document field with and without characters

I have the following documents stored at my elasticsearch index (my_index):
{
"name": "111666"
},
{
"name": "111A666"
},
{
"name": "111B666"
}
and I want to be able to query these documents using both the exact value of the name field as well as a character-trimmed version of the value.
Examples
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "111666"
      }
    }
  }
}
should return all of the (3) documents mentioned above.
On the other hand:
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "111a666"
      }
    }
  }
}
should return just one document (the one whose name field exactly matches the provided value).
I couldn't find a way to configure the settings of my_index to support such functionality (custom search/index analyzers, etc.).
I should mention here that I am using ElasticSearch's Java API (QueryBuilders) in order to implement the above-mentioned queries, so I thought of doing it the Java-way.
Logic
1) Check if the provided query string contains a letter
2) If yes (e.g. 111A666), then search for 111A666 using a standard search analyzer
3) If not (e.g. 111666), then use a custom search analyzer that strips the letters from the `name` field
Questions
1) Is it possible to implement this by somehow configuring how the data are stored/indexed at Elastic Search?
2) If not, is it possible to conditionally change the analyzer of a field at Runtime? (using Java)
You can easily use any built-in analyzer or any custom analyzer to map your document in Elasticsearch. More information on analyzers is here.
The "term" query searches for an exact match. You can find more information about exact matching here (Finding Exact Values).
But you cannot change an index's analyzers once it has been created. If you want to change an index, you have to create a new index and migrate all your data to it.
Your question is about applying different analyzer logic at index time and at query time.
The solution for your Q1 is to generate two tokens at index time (111a666 -> [111a666, 111666]) but only one token at query time (111a666 -> 111a666 and 111666 -> 111666).
In my humble opinion, you would have to build a custom analyzer around something like
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern_replace-tokenfilter.html if it supported "preserve_original" the way https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-capture-tokenfilter.html does.
Or you could use two fields (one with the original value and one with the letters stripped) and search over both, or pick one of them conditionally, as sketched below.
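Connecting that back to the logic in the question, a rough Java-side sketch with QueryBuilders could look like this. The sub-field name.trimmed is an assumption: it stands for whichever second field you map with a letter-stripping analyzer.
// Sketch only: pick the field to query based on whether the input contains a letter.
// "name.trimmed" is a hypothetical sub-field mapped with a letter-stripping analyzer.
String input = userInput.toLowerCase();
boolean hasLetter = input.chars().anyMatch(Character::isLetter);

QueryBuilder query = hasLetter
        ? QueryBuilders.matchQuery("name", input)           // e.g. 111a666 -> exact document only
        : QueryBuilders.matchQuery("name.trimmed", input);   // e.g. 111666 -> all trimmed matches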

How to express advanced expressions between query parameters in a REST API?

The problem (or missing feature) is the lack of expressiveness between different query parameters. As I see it, you can only express AND between parameters, but how do you solve it if you want not-equal, OR, or XOR?
I would like to be able to express things like:
All users with age 22 or the name Bosse
/users?age=22|name=Bosse
All users except David and Lennart
/users?name!=David&name!=Lennart
My first idea is to use a query parameter called _filter and take a String with my expression like this:
All users with age 22 or a name that is not Bosse
/users?_filter=age eq 22 or name neq Bosse
What is the best solution for this problem?
I am writing my API with Java and Jersey, so if there is any special solution for Jersey, let me know.
I can see two solutions to achieve that:
Using a special query parameter containing the expression when executing a GET method. It's the way OData does it with its $filter parameter (see this link: https://msdn.microsoft.com/fr-fr/library/gg309461.aspx#BKMK_filter). Here is a sample:
/AccountSet?$filter=AccountCategoryCode/Value eq 2 or AccountRatingCode/Value eq 1
Parse.com also uses such an approach with its where parameter, but the query is described using a JSON structure (see this link: https://parse.com/docs/rest/guide#queries). Here is a sample:
curl -X GET \
-H "X-Parse-Application-Id: ${APPLICATION_ID}" \
-H "X-Parse-REST-API-Key: ${REST_API_KEY}" \
-G \
--data-urlencode 'where={"score":{"$gte":1000,"$lte":3000}}' \
https://api.parse.com/1/classes/GameScore
If the query is too complex to describe in a URL, you could also use a POST method and specify the query in the request payload. Elasticsearch uses such an approach for its query support (see this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html). Here is a sample:
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?routing=kimchy' -d '{
  "query": {
    "bool" : {
      "must" : {
        "query_string" : {
          "query" : "some query string here"
        }
      },
      "filter" : {
        "term" : { "user" : "kimchy" }
      }
    }
  }
}
'
Hope it helps you,
Thierry
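Since the question mentions Java and Jersey: whichever convention you choose, the whole expression usually arrives as one query parameter that your resource parses itself. Here is a minimal sketch, assuming the usual javax.ws.rs imports; User, UserService, and FilterExpression are hypothetical application classes, not Jersey features:
// Sketch: a JAX-RS/Jersey resource accepting an optional "_filter" expression.
@Path("/users")
@Produces(MediaType.APPLICATION_JSON)
public class UsersResource {

    @Inject
    private UserService userService; // hypothetical service

    @GET
    public List<User> list(@QueryParam("_filter") String filter) {
        if (filter == null || filter.isEmpty()) {
            return userService.findAll();
        }
        // e.g. "age eq 22 or name neq Bosse" -> parsed into your own predicate tree
        return userService.findMatching(FilterExpression.parse(filter));
    }
}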
OK so here it is
You could add + or - to include or exclude, plus an inclusive query parameter to choose between AND and OR.
For excluding
GET /users?name=-David,-Lennart
For including
GET /users?name=+Bossee
For OR
GET /users?name=+Bossee&age=22&inclusive=false
For AND
GET /users?name=+Bossee&age=22&inclusive=true
This way the API stays intuitive and readable, and it still does the work you want it to do.
EDIT - a very, very difficult case, however I would do it this way:
GET /users?name=+Bossee&age=22&place=NewYork&inclusive=false,true
which means the first relation is not inclusive (in other words, it is OR),
while the second relation is inclusive (in other words, it is AND).
This solution assumes that evaluation proceeds from left to right.
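A possible server-side reading of that convention, as a plain-Java sketch (nameParam and inclusiveParam are assumed to come from @QueryParam bindings):
// Sketch: split a value like "+Bossee,-David" into include/exclude sets.
Set<String> include = new HashSet<>();
Set<String> exclude = new HashSet<>();
for (String token : nameParam.split(",")) {
    if (token.startsWith("-")) {
        exclude.add(token.substring(1));
    } else if (token.startsWith("+")) {
        include.add(token.substring(1));
    }
}
// "inclusive" decides whether the remaining criteria combine with AND or OR
boolean inclusive = Boolean.parseBoolean(inclusiveParam);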
It seems impossible if you stick to query params.
If you really need advanced expressions, go for path params, where you can use regular expressions to filter.
But to allow only a particular name="Bosse" you would need a very stringent regex.
Instead of designing a REST endpoint just for the sake of one condition, accept any name value and implement the filtering logic manually within the program.

MongoDb - Update collection atomically if set does not exist

I have the following document in my collection:
{
  "_id": NumberLong(106379),
  "_class": "x.y.z.SomeObject",
  "name": "Some Name",
  "information": {
    "hotelId": NumberLong(106379),
    "names": [
      {
        "localeStr": "en_US",
        "name": "some Other Name"
      }
    ],
    "address": {
      "address1": "5405 Google Avenue",
      "city": "Mountain View",
      "cityIdInCitiesCodes": "123456",
      "stateId": "CA",
      "countryId": "US",
      "zipCode": "12345"
    },
    "descriptions": [
      {
        "localeStr": "en_US",
        "description": "Some Description"
      }
    ]
  },
  "providers": [],
  "some other set": {
    "a": "bla bla bla",
    "b": "bla,bla bla"
  },
  "another Property": "fdfdfdfdfdf"
}
I need to run through all the documents in the collection and, if "providers": [] is empty, create a new set based on the values of the information section.
I'm far from being a MongoDB expert, so I have a few questions:
Can I do it as an atomic operation?
Can I do this using the MongoDB console? As far as I understood, I can do it using the $addToSet and $each operators?
If not, is there any Java-based driver that provides such functionality?
Can I do it as atomic operation?
Every document will be updated in an atomic fashion. There is no "atomic" in MongoDB in the RDBMS sense, where all operations either succeed or fail together, but you can prevent other writes from interleaving by using the $isolated operator.
Can I do this using MongoDB console?
Sure you can. To find all documents with an empty providers array, you can issue a command like:
db.zz.find({providers : { $size : 0}})
To update all documents where the array is of zero length with a fixed set of strings, you can issue a query such as:
db.zz.update({providers : { $size : 0}}, {$addToSet : {providers : "zz"}})
If you want to add a portion to your document based on the document's own data, you can use the notorious $where query (do mind the warnings appearing in that link), or - as you mentioned - query for the empty providers array and use cursor.forEach().
If not is there any Java based driver that can provide such functionality?
Sure, there is a Java driver, as there is for every other major programming language. It can do practically everything described here, and basically everything you can do from the shell. I suggest you get started from the Java Language Center.
There are also several frameworks which facilitate working with MongoDB and bridge the object-document gap. I won't give a list here as I'm pretty biased, but I'm sure a quick Google search will do.
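As a concrete illustration (not the only way), here is a rough sketch with the 3.x Java driver's sync API, assuming a database variable, a made-up collection name, and that you want to copy information.hotelId into the empty providers array:
// Sketch: for each document whose "providers" array is empty, add a value
// derived from the "information" section. Each updateOne is atomic per document,
// but the loop as a whole is not a single atomic operation.
MongoCollection<Document> collection = database.getCollection("hotels"); // name assumed

for (Document doc : collection.find(Filters.size("providers", 0))) {
    Document information = doc.get("information", Document.class);
    collection.updateOne(
            Filters.eq("_id", doc.get("_id")),
            Updates.addToSet("providers", information.get("hotelId")));
}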
db.so.find({ providers: { $size: 0} }).forEach(function(doc) {
    doc.providers.push( doc.information.hotelId );
    db.so.save(doc);
});
This will push the information.hotelId of the corresponding document into an empty providers array. Replace that with whatever field you would rather insert into the providers array.

How to retrieve the Field that "hit" in Lucene

Maybe I'm really missing something.
I have indexed a bunch of key/value pairs in Lucene (v4.1 if it matters). Say I have
key1=value1 and key2=value2, e.g. as read from a properties file.
They get indexed both as specific fields and into a catchall "ALL" field, e.g.
new Field("key1", "value1", aFieldTypeMimickingKeywords);
new Field("key2", "value2", aFieldTypeMimickingKeywords);
new Field("ALL", "key1=value1", aFieldTypeMimickingKeywords);
new Field("ALL", "key2=value2", aFieldTypeMimickingKeywords);
// then get added to the Document of course...
I can then do a wildcard search, using
new WildcardQuery(new Term("ALL", "*alue1"));
and it will find the hit.
But it would be nice to get more info, like: what was the complete value (e.g. "key1=value1") that goes with that hit?
The best I can figure out is to get the Document, then get the list of IndexableFields, then loop over all of them and see if field.stringValue().contains("alue1"). (I can look at the data structures in the debugger and all the info is there.)
This seems completely insane, because isn't that exactly what Lucene just did? Shouldn't the hit information return some of the Fields?
Is Lucene missing what seems like "obvious" functionality? Googling and staring at the APIs hasn't revealed anything straightforward, but I feel like I must be searching for the wrong things.
You might want to try the IndexSearcher.explain() method. Once you have the ID of the matching document, prepare a query for each field (using the same search keywords) and invoke Explanation.isMatch() for each query: the ones that yield true will give you the matched field. Example:
for (String field : fields) {
    Query query = new WildcardQuery(new Term(field, "*alue1"));
    Explanation ex = searcher.explain(query, docID);
    if (ex.isMatch()) {
        // the query matched on this field
    }
}
