I am trying to query alphanumeric values from the index using a terms query, but it is not giving me any output.
Query:
{
    "size" : 10000,
    "query" : {
        "bool" : {
            "must" : {
                "terms" : {
                    "caid" : [ "A100945", "A100896" ]
                }
            }
        }
    },
    "fields" : [ "acco", "bOS", "aid", "TTl", "caid" ]
}
I want to get all the entries that have caid A100945 or A100896.
The same query works fine for numeric fields.
I am not planning to use a query_string or match query, because I am trying to build a general query builder that can build queries for all requests. Hence I am looking to get the entries using a terms query only.
Note: I am using Java API org.elasticsearch.index.query.QueryBuilders for building the Query.
e.g.: QueryBuilders.termsQuery("caid", "A10xxx", "A101xxx")
Please help.
Regards,
Mik
If you have not customized the mappings/analysis for the caid field, then your values are indexed as e.g. a100945, a100896 (note the lowercasing).
The terms query does not do query-time text analysis, so you'll be searching for A100945, which does not match a100945.
This is quite a common problem, and is explained a bit more in this article on Troubleshooting Elasticsearch searches, for Beginners.
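A minimal sketch of one workaround, assuming the default standard analyzer lowercased the values at index time (mapping caid as not_analyzed would be the more robust fix):
// Assumption: caid was indexed with the standard analyzer, so the stored
// terms are lowercase; lowercasing the inputs lets the terms query match.
QueryBuilder query = QueryBuilders.termsQuery("caid",
        "A100945".toLowerCase(Locale.ROOT),
        "A100896".toLowerCase(Locale.ROOT));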
You had better use a match query; match queries are analyzed (the default analyzer is applied at query time), e.g.:
QueryBuilders.matchQuery("caid", "A10xxx A101xxx");
Related
Can somebody guide me on the aggregation query in Java for the following Mongo query? I am trying to sum up the distance covered every day by the vehicle. There are some duplicate records (which I cannot eliminate), so I have to use a group stage to filter them out.
db.collection1.aggregate(
    { $match: { "vehicleId": "ABCDEFGH",
                $and: [
                    { "timestamp": { $gt: ISODate("2022-08-24T00:00:00.000+0000") } },
                    { "timestamp": { $lt: ISODate("2022-08-25T00:00:00.000+0000") } },
                    { "distanceMiles": { "$gt": 0 } }
                ] } },
    { $group: { "_id": { vehicleId: "$vehicleId", "distanceMiles": "$distanceMiles" } } },
    { $group: { _id: null, distance: { $sum: "$_id.distanceMiles" } } }
)
If possible can you also suggest some references? I am stuck at the last group by involving $_id part.
The Java code that I have except the last group by is:
Criteria criteria = new Criteria();
criteria.andOperator(Criteria.where("timestamp").gte(start).lte(end),
        Criteria.where("vehicleId").in(vehicleIdList));

Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(criteria),
        Aggregation.sort(Direction.DESC, "timestamp"),
        Aggregation.project("distanceMiles", "vehicleId", "timestamp")
                .and("timestamp").dateAsFormattedString("%Y-%m-%d").as("yearMonthDay"),
        Aggregation.group("vehicleId", "yearMonthDay")
                .first("vehicleId").as("vehicleId")
                .first("timestamp").as("lastReported")
                .sum("distanceMiles").as("distanceMiles"));
Note: there is a slight difference between the raw Mongo query and the Java query in the date parameter.
Generally if you are looking for advice on how to directly convert an aggregation pipeline into Java code (not necessarily using the builders), check out this answer.
I'm not really clear on what component you're currently stuck on though. Is it just the direct translation between the aggregation pipeline and the Java code? Is the aggregation pipeline not giving correct results? You haven't mentioned some information such as driver version that would help us advise further if needed.
A few other general things come to mind that might be worth mentioning:
The sample .aggregate() snippet you provided does not have the square brackets ([ and ]) wrapping the pipeline which would be needed in the shell.
When referencing existing field names, you probably need to prefix them with $ in the Java code similar to how you do in the shell.
You should be able to access the values nested inside the _id field after the first $group stage using dot notation (e.g. "$_id.distanceMiles") as you are in the sample aggregation; see the sketch after this list.
Depending on which specific driver you are using, documentation such as this may be helpful with respect to working with the builders.
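To make that last $group concrete, here is a hedged sketch with Spring Data's builders, continuing the code from the question. Spring Data usually re-exposes grouped fields, so the plain name "distanceMiles" should resolve to "$_id.distanceMiles" after the first group stage; verify against your Spring Data version:
Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(criteria),
        // First $group: collapse duplicate (vehicleId, distanceMiles) pairs.
        Aggregation.group("vehicleId", "distanceMiles"),
        // Second $group: _id is null; sum the deduplicated distances, where
        // "distanceMiles" is resolved against the previous stage's _id.
        Aggregation.group().sum("distanceMiles").as("distance"));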
I'm using the query below to get the max date (a field named extractionDate) in a collection called KPI, since I'm only interested in the field extractionDate:
@Override
public Mono<DBObject> getLastExtractionDate(MatchOperation matchOperation,
                                            ProjectionOperation projectionOperation) {
    return Mono.from(mongoTemplate.aggregate(
            newAggregation(
                    matchOperation,
                    projectionOperation,
                    group().max(EXTRACTION_DATE).as("result"),
                    project().andExclude("_id")
            ),
            "kpi",
            DBObject.class
    ));
}
As you can see above, I first filter the result using the match operation (matchOperation); after that, I run a projection operation to extract only the max of the field extractionDate, renamed as result.
But this query costs a lot of time (sometimes more than 20 seconds) because I have a huge amount of data. I already added an index on the field extractionDate but did not gain much, so I'm looking for a way to make it as fast as possible.
Update:
Number of documents in the collection kpi: 42.8 million.
The query that being executed:
Streaming aggregation: [
    { "$match" : { "type" : { "$in" : ["INACTIVE_SITE", "DEVICE_NOT_BILLED",
        "NOT_REPLYING_POLLING", "MISSING_KEY_TECH_INFO", "MISSING_SITE",
        "ACTIVE_CIRCUITS_INACTIVE_RESOURCES", "INCONSISTENT_STATUS_VALUES"] } } },
    { "$project" : { "extractionDate" : 1, "_id" : 0 } },
    { "$group" : { "_id" : null, "result" : { "$max" : "$extractionDate" } } },
    { "$project" : { "_id" : 0 } }
] in collection kpi
explain plan:
Example of a document in the collection KPI:
And finally, the indexes that already exist on this collection:
Index tuning will depend more on the properties in the $match expression. You should be able to run the query in mongosh and get an explain plan to determine whether your query is scanning the collection.
Another thing to consider is the size of the collection versus the working set of the server.
Perhaps update your question with the $match expression, the explain plan, and the current set of index definitions, and we can refine the indexing strategy.
Finally, "huge" is rather subjective. Are you querying millions or billions of documents, and what is the average document size?
Update:
Given that you're filtering on only one field and aggregating on one field, you'll find the best result will be an index
{ "type":1,"extractionDate":1}
That index should cover your query -- the $in means a scan will be selected, but a scan over a small index is significantly better than one over the whole collection of documents.
NB. The existing index extractionDate_1_customer.irType_1 will not be any help for this query.
I was able to optimize the request thanks to previous answers using this approach:
@Override
public Mono<DBObject> getLastExtractionDate(MatchOperation matchOperation,
                                            ProjectionOperation projectionOperation) {
    return Mono.from(mongoTemplate.aggregate(
            newAggregation(
                    matchOperation,
                    sort(Sort.Direction.DESC, EXTRACTION_DATE),
                    limit(1),
                    projectionOperation
            ),
            "kpi",
            DBObject.class
    ));
}
Also, I had to create a compound index on extractionDate and type (the field I had in matchOperation), like below:
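(The original screenshot of the index definition is missing here; as a hedged reconstruction using Spring Data and the field names above, it was presumably something like:)
// Hypothetical reconstruction: the $match field first, the sorted field second.
// With the reactive template, remember to subscribe to the returned Mono.
mongoTemplate.indexOps("kpi").ensureIndex(
        new Index().on("type", Sort.Direction.ASC)
                   .on("extractionDate", Sort.Direction.DESC));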
I have a document schema such as
{
    "_id" : 18,
    "name" : "Verdell Sowinski",
    "scores" : [
        {
            "type" : "exam",
            "score" : 62.12870233109035
        },
        {
            "type" : "quiz",
            "score" : 84.74586220889356
        },
        {
            "type" : "homework",
            "score" : 81.58947824932574
        },
        {
            "type" : "homework",
            "score" : 69.09840625499065
        }
    ]
}
I have a solution using $pull that copes with removing a single element at a time, but I want a general solution that copes with an irregular schema where there could be anywhere from one to many elements in the array, and I would like to remove all elements based on a condition.
I'm using MongoDB driver 3.2.2 and saw this pullByFilter, which sounded good:
Creates an update that removes from an array all elements that match the given filter.
I tried this
Bson filter = and(eq("type", "homework"), lt("score", highest));
Bson u = Updates.pullByFilter(filter);
UpdateResult ur = collection.updateOne(studentDoc, u);
Unsurprisingly, this did not have any effect, since I wasn't specifying the array scores.
I get an error
The positional operator did not find the match needed from the query. Unexpanded update: scores.$.type
when I change the filter to be
Bson filter = and(eq("scores.$.type", "homework"), lt("scores.$.score", highest));
Is there a one step solution to this problem?
There seems very little info on this particular method I can find. This question may relate to How to Update Multiple Array Elements in mongodb
After some more "thinking" (and a little trial and error), I found the correct Filters method to wrap my basic filter. I think I was focusing on array operators too much.
I'll not post it here in case of flaming.
Clue: think "matches..." (as in regex pattern matching) when dealing with Filters helper methods ;)
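For readers who land here, the hint appears to point at Filters.elemMatch. A hedged sketch of the presumed solution (my inference, not the poster's confirmed code):
// Inferred from the "matches..." clue: wrap the element-level conditions in
// Filters.elemMatch, naming the array, then hand the result to pullByFilter.
Bson condition = Filters.elemMatch("scores",
        Filters.and(Filters.eq("type", "homework"), Filters.lt("score", highest)));
UpdateResult ur = collection.updateOne(studentDoc, Updates.pullByFilter(condition));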
I am using Elasticsearch for filtering and searching from a JSON file, and I am a newbie in this technology. So I am a little bit confused about how to write a LIKE query in Elasticsearch.
select * from table_name where 'field_name' like 'a%'
This is a MySQL query. How do I write this query in Elasticsearch? I am using Elasticsearch version 0.90.7.
I would highly suggest updating your Elasticsearch version if possible; there have been significant changes since 0.9.x.
This question is not quite specific enough, as there are many ways Elasticsearch can fulfill this functionality, and they differ slightly depending on your overall goal. If you are looking to replicate that SQL query exactly, then in this case use the wildcard query or the prefix query.
Using a wildcard query:
Note: Be careful with wildcard searches, they are slow. Avoid using wildcards at the beginning of your strings.
GET /my_index/table_name/_search
{
    "query": {
        "wildcard": {
            "field_name": "a*"
        }
    }
}
Or a prefix query:
GET /my_index/table_name/_search
{
    "query": {
        "prefix": {
            "field_name": "a"
        }
    }
}
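If you are building these from Java (as elsewhere in this thread), the equivalent builders should be roughly as follows; a hedged sketch, so check them against your client version:
// Wildcard and prefix equivalents of the two JSON queries above.
QueryBuilder wildcard = QueryBuilders.wildcardQuery("field_name", "a*");
QueryBuilder prefix = QueryBuilders.prefixQuery("field_name", "a");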
Or partial matching:
Note: Do NOT blindly use partial matching. While there are corner cases for its use, correct use of analyzers is almost always better.
Also, this exact query is equivalent to LIKE '%a%', which again could be better handled with a correct mapping and a normal query!
GET /my_index/table_name/_search
{
    "query": {
        "match_phrase": {
            "field_name": "a"
        }
    }
}
If you are reading this wondering about querying ES similarly for search-as-you-type, I would suggest reading up on edge n-grams, which relate to proper use of mappings depending on what you are attempting to do =)
GET /indexName/table_name/_search
{
    "query": {
        "match_phrase": {
            "field_name": "your partial text"
        }
    }
}
You can use "type" : "phrase_prefix" to prefix or postfix your search.
Java code for the same:
AndFilterBuilder andFilterBuilder = FilterBuilders.andFilter();
andFilterBuilder.add(FilterBuilders.queryFilter(
        QueryBuilders.matchPhraseQuery("field_name", "your partial text")));
I gave an 'and filter' example so that you can append extra filters if you want to.
Check this for more detail:
https://www.elastic.co/guide/en/elasticsearch/guide/current/slop.html
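As an aside on the "phrase_prefix" option mentioned above, the Java API appears to expose a dedicated builder for it; a hedged one-liner, so verify it exists in your client version:
QueryBuilders.matchPhrasePrefixQuery("field_name", "your partial text");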
Below is a query I wrote; it is something like
SELECT * FROM TABLE WHERE api='payment' AND api_v='v1' AND status='200' AND response LIKE '%expired%' AND response LIKE '%token%'
Please note that the SQL table corresponds to the index here; the rows are documents.
Both GET and POST are accepted.
GET /transactions-d-2021.06.24/_search
{
    "query": {
        "bool": {
            "must": [
                { "match": { "api": "payment" } },
                { "match": { "api_v": "v1" } },
                { "match": { "status": "200" } },
                { "wildcard": { "response": "*expired*" } },
                { "wildcard": { "response": "*token*" } }
            ]
        }
    }
}
Writing a custom bool query worked for me:
@Query("{\"bool\":{\"should\":[{\"query_string\":{\"fields\":[\"field_name\"],\"query\":\"?0*\"}}]}}")
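For context, a hedged sketch of how that annotation might sit on a Spring Data Elasticsearch repository; the interface, document class, and method name here are hypothetical:
// Hypothetical repository; "?0" binds the method's first argument, so the
// query_string runs a wildcard search over field_name with the given prefix.
public interface MyDocumentRepository extends ElasticsearchRepository<MyDocument, String> {

    @Query("{\"bool\":{\"should\":[{\"query_string\":{\"fields\":[\"field_name\"],\"query\":\"?0*\"}}]}}")
    List<MyDocument> findByFieldNamePrefix(String prefix);
}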
I need to enforce a unique constraint on a nested document, for example:
urlEntities: [
    { "url" : "http://t.co/ujBNNRWb0y", "display_url" : "bit.ly/11JyiVp",
      "expanded_url" : "http://bit.ly/11JyiVp" },
    { "url" : "http://t.co/DeL6RiP8KR", "display_url" : "ow.ly/i/2HC9x",
      "expanded_url" : "http://ow.ly/i/2HC9x" }
]
url, display_url, and expanded_url need to be unique. How do I issue the ensureIndex command for this condition in MongoDB?
Also, is it a good design to have nested documents like this, or should I move them to a separate collection and reference them from inside urlEntities? I'm new to MongoDB; any best-practice suggestions would be very helpful.
Full Scenario:
Say I have a document as below in a db that holds millions of records:
{ "_id" : { "$oid" : "51f72afa3893686e0c406e19"} , "user" : "test" , "urlEntities" : [ { "url" : "http://t.co/64HBcYmn9g" , "display_url" : "ow.ly/nqlkP" , "expanded_url" : "http://ow.ly/nqlkP"}] , "count" : 0}
When I get another document with a similar urlEntities object, I need to update the user and count fields only. First I thought of enforcing a unique constraint on the urlEntities fields, handling the exception, and then going for an update; checking whether each entry exists before inserting would instead have a significant impact on performance. So, how can I enforce uniqueness in urlEntities? I tried
{"urlEntities.display_url":1,"urlEntities.expanded_url":1},{unique:true}
But still I'm able to insert the same document twice without exceptions.
Uniqueness is only enforced per document. You can not prevent the following (simplified from your example):
db.collection.ensureIndex( { 'urlEntities.url' : 1 }, { unique: true } );
db.collection.insert( {
    _id: 42,
    urlEntities: [
        { "url" : "http://t.co/ujBNNRWb0y" },
        { "url" : "http://t.co/ujBNNRWb0y" }
    ]
} );
Similarly, you will have the same problem with a compound unique key for nested documents.
What you can do is the following:
db.collection.insert( {
    _id: 43,
    title: "This is an example"
} );

db.collection.update(
    { _id: 43 },
    {
        '$addToSet': {
            urlEntities: {
                "url" : "http://t.co/ujBNNRWb0y",
                "display_url" : "bit.ly/11JyiVp",
                "expanded_url" : "http://bit.ly/11JyiVp"
            }
        }
    }
);
Now you have the document with _id 43 with one urlEntities document. If you run the same update query again, it will not add a new array element, because the full combination of url, display_url and expanded_url already exists.
Also, have a look at the $addToSet query operator's examples: http://docs.mongodb.org/manual/reference/operator/addToSet/
For indexes on nested documents, read this.
Regarding the second part (nested-document best practices): it really depends on your business logic and queries. If those nested documents don't make sense as first-class entities, meaning you won't be searching for them directly but only in the context of their parent document, then having them nested makes sense. Otherwise, you should consider extracting them out.
I think there isn't an absolute answer to your question. Read the chapter about indexing... it helped me a lot.