Like search in Elasticsearch - java

I am using Elasticsearch to filter and search data from a JSON file, and I am new to this technology. I am a little confused about how to write a LIKE query in Elasticsearch.
SELECT * FROM table_name WHERE field_name LIKE 'a%'
This is the MySQL query. How do I write this query in Elasticsearch? I am using Elasticsearch version 0.90.7.

I would strongly suggest upgrading your Elasticsearch version if possible; there have been significant changes since 0.90.x.
The question is not quite specific enough, because Elasticsearch can provide this functionality in several ways, and the right one depends on your overall goal. If you want to replicate that SQL query exactly, use the wildcard query or the prefix query.
Using a wildcard query:
Note: Be careful with wildcard searches; they are slow. Avoid putting wildcards at the beginning of your strings.
GET /my_index/table_name/_search
{
  "query": {
    "wildcard": {
      "field_name": "a*"
    }
  }
}
Or a prefix query:
GET /my_index/table_name/_search
{
  "query": {
    "prefix": {
      "field_name": "a"
    }
  }
}
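To make the mapping from LIKE patterns to the two queries above concrete, here is a minimal Java sketch that uses only the JDK. The class and method names (LikeToWildcard, toWildcard, isPurePrefix) are hypothetical helpers, not part of any Elasticsearch API. It assumes the LIKE pattern has no ESCAPE clause and that a backslash escapes a literal * or ? in the wildcard query.

```java
final class LikeToWildcard {
    private LikeToWildcard() {}

    /** Translates SQL LIKE wildcards (% and _) into wildcard-query syntax (* and ?). */
    static String toWildcard(String like) {
        StringBuilder out = new StringBuilder(like.length());
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            switch (c) {
                case '%': out.append('*'); break;   // any sequence of characters
                case '_': out.append('?'); break;   // exactly one character
                case '*':
                case '?':
                case '\\': out.append('\\').append(c); break; // keep literals literal (assumed escape rule)
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** True when the pattern is plain text followed by one trailing %, i.e. a prefix query would do. */
    static boolean isPurePrefix(String like) {
        int n = like.length();
        if (n == 0 || like.charAt(n - 1) != '%') {
            return false;
        }
        String head = like.substring(0, n - 1);
        return head.indexOf('%') < 0 && head.indexOf('_') < 0;
    }

    public static void main(String[] args) {
        System.out.println(toWildcard("a%"));        // prints a*
        System.out.println(isPurePrefix("a%"));      // prints true
        System.out.println(toWildcard("%expired%")); // prints *expired*
    }
}
```

When isPurePrefix returns true, the text before the % can go straight into a prefix query, which avoids the cost of a general wildcard search.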
Or partial matching:
Note: Do NOT blindly use partial matching. While there are corner cases for its use, correct use of analyzers is almost always better.
Also, this exact query is only roughly comparable to LIKE '%a%': it matches a as a whole analyzed term anywhere in the field, not as an arbitrary substring. Again, this is usually better handled with a correct mapping and a normal query search!
GET /my_index/table_name/_search
{
  "query": {
    "match_phrase": {
      "field_name": "a"
    }
  }
}
If you are reading this and wondering how to query ES for search-as-you-type, I would suggest reading up on edge n-grams, which comes down to choosing the right mapping for what you are attempting to do =)
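As a rough illustration of what edge n-grams are, the following Java sketch lists the prefixes that would be indexed for a single term. It is a plain-JDK approximation, not the Elasticsearch token filter itself; the name EdgeNGramSketch and the parameters minGram and maxGram are chosen here for illustration, and lowercasing is an assumption about the analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class EdgeNGramSketch {
    private EdgeNGramSketch() {}

    /** Returns the prefixes of term with lengths minGram..maxGram (capped at the term length). */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        String lower = term.toLowerCase(Locale.ROOT); // assumption: a lowercase filter runs first
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("Apple", 1, 3)); // prints [a, ap, app]
    }
}
```

Because every prefix is stored as its own term at index time, a search-as-you-type query becomes an ordinary exact term lookup on what the user has typed so far, instead of a wildcard scan.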

GET /indexName/table_name/_search
{
  "query": {
    "match_phrase": {
      "field_name": "your partial text"
    }
  }
}
You can use "type" : "phrase_prefix" to treat the last term of your search text as a prefix.
Java code for the same:
AndFilterBuilder andFilterBuilder = FilterBuilders.andFilter();
andFilterBuilder.add(FilterBuilders.queryFilter(
        QueryBuilders.matchPhraseQuery("field_name", "your partial text")));
I used an 'and' filter in the example so that you can append extra filters if you want to.
Check this for more detail:
https://www.elastic.co/guide/en/elasticsearch/guide/current/slop.html

The query I wrote below is roughly equivalent to:
SELECT * FROM TABLE WHERE api='payment' AND api_v='v1' AND status='200' AND response LIKE '%expired%' AND response LIKE '%token%'
Please note that the SQL table corresponds to the Elasticsearch index here, and each row is a document.
Both GET and POST are accepted.
GET /transactions-d-2021.06.24/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "api": "payment"
          }
        },
        {
          "match": {
            "api_v": "v1"
          }
        },
        {
          "match": {
            "status": "200"
          }
        },
        {
          "wildcard": {
            "response": "*expired*"
          }
        },
        {
          "wildcard": {
            "response": "*token*"
          }
        }
      ]
    }
  }
}

Writing a custom bool query worked for me:
@Query("{\"bool\":{\"should\":[{\"query_string\":{\"fields\":[\"field_name\"],\"query\":\"?0*\"}}]}}")

Related

Looking for Java code for Mongo aggregation query

Can somebody guide me on the aggregation query in Java for the following Mongo query. I am trying to sum up the distance covered every day by the vehicle. There are some duplicate records (which I cannot eliminate) so I have to use group by to filter them out.
db.collection1.aggregate(
  { $match: {
      "vehicleId": "ABCDEFGH",
      $and: [
        { "timestamp": { $gt: ISODate("2022-08-24T00:00:00.000+0000") } },
        { "timestamp": { $lt: ISODate("2022-08-25T00:00:00.000+0000") } },
        { "distanceMiles": { "$gt": 0 } }
      ]
  } },
  { $group: { "_id": { vehicleId: "$vehicleId", "distanceMiles": "$distanceMiles" } } },
  { $group: { _id: null, distance: { $sum: "$_id.distanceMiles" } } }
)
If possible can you also suggest some references? I am stuck at the last group by involving $_id part.
The Java code that I have so far, without the last group by, is:
Criteria criteria = new Criteria();
criteria.andOperator(
        Criteria.where("timestamp").gte(start).lte(end),
        Criteria.where("vehicleId").in(vehicleIdList));
Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(criteria),
        Aggregation.sort(Direction.DESC, "timestamp"),
        Aggregation.project("distanceMiles", "vehicleId", "timestamp")
                .and("timestamp").dateAsFormattedString("%Y-%m-%d").as("yearMonthDay"),
        Aggregation.group("vehicleId", "yearMonthDay")
                .first("vehicleId").as("vehicleId")
                .first("timestamp").as("lastReported")
                .sum("distanceMiles").as("distanceMiles"));
Note: there is a slight difference between the raw Mongo query and the Java query in how the date parameter is handled.
Generally if you are looking for advice on how to directly convert an aggregation pipeline into Java code (not necessarily using the builders), check out this answer.
I'm not really clear on what component you're currently stuck on though. Is it just the direct translation between the aggregation pipeline and the Java code? Is the aggregation pipeline not giving correct results? You haven't mentioned some information such as driver version that would help us advise further if needed.
A few other general things come to mind that might be worth mentioning:
The sample .aggregate() snippet you provided does not have the square brackets ([ and ]) wrapping the pipeline which would be needed in the shell.
When referencing existing field names, you probably need to prefix them with $ in the Java code similar to how you do in the shell.
You should be able to access the values nested inside of the _id field after the first $group stage using dot notation (eg "$_id.distanceMiles") as you are in the sample aggregation.
Depending on which specific driver you are using, documentation such as this may be helpful with respect to working with the builders.
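To make clear what the two $group stages in the question compute, here is a plain-JDK Java sketch that reproduces the same logic in memory: first collapse duplicate (vehicleId, distanceMiles) pairs, then sum the surviving distances. It does not use the MongoDB driver or Spring Data; the class DistinctDistanceSum and its helper reading are hypothetical names used only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class DistinctDistanceSum {
    private DistinctDistanceSum() {}

    /** Builds one reading as a (vehicleId, distanceMiles) pair. */
    static Map.Entry<String, Double> reading(String vehicleId, double distanceMiles) {
        return new SimpleEntry<String, Double>(vehicleId, distanceMiles);
    }

    /** Mirrors the pipeline: keep distanceMiles > 0, de-duplicate the pairs, sum the distances. */
    static double sumDistinct(List<Map.Entry<String, Double>> readings) {
        return readings.stream()
                .filter(r -> r.getValue() > 0)      // $match: distanceMiles > 0
                .distinct()                         // first $group: _id = {vehicleId, distanceMiles}
                .mapToDouble(Map.Entry::getValue)   // "$_id.distanceMiles"
                .sum();                             // second $group: $sum
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> readings = Arrays.asList(
                reading("ABCDEFGH", 1.5),
                reading("ABCDEFGH", 1.5),   // duplicate record, counted once
                reading("ABCDEFGH", 2.0),
                reading("ABCDEFGH", 0.0));  // filtered out by the match step
        System.out.println(sumDistinct(readings)); // prints 3.5
    }
}
```

One consequence worth noting: because de-duplication is keyed on the distance value itself, two genuinely separate trips with the same distanceMiles are also collapsed into one, exactly as in the shell pipeline.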

Query Elastic document field with and without characters

I have the following documents stored in my Elasticsearch index (my_index):
{
  "name": "111666"
},
{
  "name": "111A666"
},
{
  "name": "111B666"
}
and I want to be able to query these documents using both the exact value of the name field as well as a character-trimmed version of the value.
Examples
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "111666"
      }
    }
  }
}
should return all of the (3) documents mentioned above.
On the other hand:
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": {
        "query": "111a666"
      }
    }
  }
}
should return just one document (the one that matches exactly the provided value of the name field).
I didn't find a way to configure the settings of my_index in order to support such functionality (custom search/index analyzers etc..).
I should mention here that I am using ElasticSearch's Java API (QueryBuilders) in order to implement the above-mentioned queries, so I thought of doing it the Java-way.
Logic
1) Check if the provided query-string contains a letter
2) If yes (e.g 111A666), then search for 111A666 using a standard search analyzer
3) If not (e.g 111666), then use a custom search analyzer that trims the characters of the `name` field
Questions
1) Is it possible to implement this by somehow configuring how the data are stored/indexed at Elastic Search?
2) If not, is it possible to conditionally change the analyzer of a field at Runtime? (using Java)
You can use any built-in analyzer or a custom analyzer in the mapping of your documents in Elasticsearch. More information on analyzers is here
The "term" query searches for an exact match. You can find more information about exact matching here (Finding Exact Values)
But you cannot change the analyzer of an existing field once the index has been created. If you want to change it, you have to create a new index and migrate all your data to the new index.
Your question is about using different analyzer logic at index time and at query time.
The solution for your Q1 is to generate two tokens at index time (111a666 -> [111a666, 111666]) but only one token at query time (111a666 -> 111a666 and 111666 -> 111666).
In my opinion, you would have to write a new token filter like
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern_replace-tokenfilter.html that supports "preserve_original", as https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-capture-tokenfilter.html does.
Or you could use two fields (one with the original value and one without letters) and search over both.
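The two-tokens-at-index-time idea can be checked with a small plain-JDK Java sketch. It only simulates the analysis step in memory; NameTokens, indexTokens, queryToken and matches are hypothetical names, and it assumes names are lowercased and that only ASCII letters need stripping.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class NameTokens {
    private NameTokens() {}

    /** Index time: emit the lowercased original and a copy with the letters removed. */
    static Set<String> indexTokens(String name) {
        String original = name.toLowerCase(Locale.ROOT);
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(original);
        tokens.add(original.replaceAll("[a-z]", "")); // assumption: only ASCII letters occur
        return tokens;
    }

    /** Query time: a single token, lowercased but otherwise untouched. */
    static String queryToken(String query) {
        return query.toLowerCase(Locale.ROOT);
    }

    /** A document matches when the query token is one of its index tokens. */
    static boolean matches(String indexedName, String query) {
        return indexTokens(indexedName).contains(queryToken(query));
    }

    public static void main(String[] args) {
        System.out.println(indexTokens("111A666"));           // prints [111a666, 111666]
        System.out.println(matches("111B666", "111666"));     // prints true
        System.out.println(matches("111B666", "111a666"));    // prints false
    }
}
```

With this scheme a query for 111666 hits all three sample documents, while a query for 111a666 hits only the document whose original value is 111A666, without switching analyzers at runtime.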

Query optimization in MongoDB using Java

First, let me introduce my use case: I have a collection of documents in which I store my XML requests and the corresponding responses. Each document also has plenty of accompanying properties; some of them are indexed, but request and response aren't.
Whenever I search on an indexed field, the performance is sufficient. But there are situations where I have to search with a regular expression on the request or response value.
For now I do something like this:
db.traffic.find(
  { $or:
    [ { request: { $regex: "some.* code=\"123\"" } },
      { response: { $regex: "some.* code=\"123\"" } } ] })
and I translated this into Java code. But the query is too slow: it takes a significant amount of time compared to other queries.
I can see two solutions:
Indexing requests and responses. I suppose this is not a good idea, as they are really long and the index will most probably be huge.
Querying on some indexed field first, then applying the query above to the found records in descending order and picking the very first match. So I would like to do something like
db.traffic.find({"conversationID": { $regex: "vendorName" }}).sort({"counter": -1})
  .findOne(
    { $or:
      [ { request: { $regex: "some.* code=\"123\"" } },
        { response: { $regex: "some.* code=\"123\"" } } ] })
So, wrapping up, my question is: should I choose the simpler solution, which is indexing requests and responses? And what impact will it have on the size of my index?
Or should I choose the second way? But is my code correct, and does it do what I want?
Have you tried an explain on both methods?
Use the mongo shell to test queries, and add explain before the query, so:
db.traffic.explain()...
try both and you should get some information that indicates the direction.
In the end I tried the second solution, but I had to change it slightly because I can't run findOne on a query result. I found the equivalent syntax, though.
So now it looks something like this:
db.traffic.findOne({ $query: { $or:
    [ { request: { $regex: "some.* code=\"123\"" } },
      { response: { $regex: "some.* code=\"123\"" } } ] },
  $orderby: { "counter": -1 } })
and the performance is much better now.
I also used explain to check the real "speed".
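The idea behind the second approach (narrow by a cheap field first, order by counter descending, then stop at the first regex hit) can be sketched in plain Java over an in-memory list. This is only an illustration of the evaluation order, not MongoDB driver code; LatestMatchSketch and Traffic are hypothetical names, and the sample regexes are the ones from the question.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

final class LatestMatchSketch {
    private LatestMatchSketch() {}

    /** Minimal stand-in for one stored document. */
    static final class Traffic {
        final String conversationId;
        final long counter;
        final String request;
        final String response;

        Traffic(String conversationId, long counter, String request, String response) {
            this.conversationId = conversationId;
            this.counter = counter;
            this.request = request;
            this.response = response;
        }
    }

    /** Cheap filter first, newest first, then the expensive regex until the first hit. */
    static Optional<Traffic> latestMatch(List<Traffic> docs, String idRegex, String bodyRegex) {
        Pattern id = Pattern.compile(idRegex);
        Pattern body = Pattern.compile(bodyRegex);
        return docs.stream()
                .filter(d -> id.matcher(d.conversationId).find())
                .sorted(Comparator.comparingLong((Traffic d) -> d.counter).reversed())
                // Evaluated lazily in sorted order, so it stops at the first match.
                .filter(d -> body.matcher(d.request).find() || body.matcher(d.response).find())
                .findFirst();
    }
}
```

For example, latestMatch(docs, "vendorName", "some.* code=\"123\"") returns the matching document with the highest counter, or an empty Optional when nothing matches. The expensive pattern is only applied to as many documents as needed to find that first hit.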

Not able to Query alphanumeric fields from ELASTIC SEARCH using TERMS QUERY

I am trying to query alphanumeric values from the index using a terms query, but it is not giving me any output.
Query:
{
  "size" : 10000,
  "query" : {
    "bool" : {
      "must" : {
        "terms" : {
          "caid" : [ "A100945", "A100896" ]
        }
      }
    }
  },
  "fields" : [ "acco", "bOS", "aid", "TTl", "caid" ]
}
I want to get all the entries that have caid A100945 or A100896.
The same query works fine for numeric fields.
I am not planning to use QueryString/MatchQuery, as I am trying to build a general query builder that can build the query for every request. Hence I am looking to get the entries using a terms query only.
Note: I am using the Java API org.elasticsearch.index.query.QueryBuilders for building the query.
e.g.: QueryBuilders.termsQuery("caid", "A10xxx", "A101xxx")
Please help.
Regards,
Mik
If you have not customized the mappings/analysis for the caid field, then your values are indexed as e.g. a100945, a100896 (note the lowercasing).
The terms query does not do query-time text analysis, so you'll be searching for A100945, which does not match a100945.
This is quite a common problem, and it is explained a bit more in this article on Troubleshooting Elasticsearch searches, for Beginners.
You are better off using a match query. Match queries are analyzed (the field's analyzer is applied to the query text), like
QueryBuilders.matchQuery("caid", "A10xxx A101xxx");
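If the terms query has to stay, one option is to normalize the values on the client so they look like the indexed terms. The plain-JDK sketch below only shows that normalization step; TermNormalizer and lowercaseAll are hypothetical names, and it assumes the field uses the default analyzer, whose only effect on values like A100945 is lowercasing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class TermNormalizer {
    private TermNormalizer() {}

    /** Lowercases every value so it matches terms produced by a lowercasing analyzer. */
    static List<String> lowercaseAll(List<String> terms) {
        List<String> normalized = new ArrayList<>(terms.size());
        for (String term : terms) {
            normalized.add(term.toLowerCase(Locale.ROOT));
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(lowercaseAll(Arrays.asList("A100945", "A100896"))); // prints [a100945, a100896]
    }
}
```

The normalized list can then be passed to the terms query in place of the original values. If the field is mapped as not analyzed instead, no normalization is needed and the original casing must be kept.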

MongoDb - Update collection atomically if set does not exist

I have the following document in my collection:
{
  "_id": NumberLong(106379),
  "_class": "x.y.z.SomeObject",
  "name": "Some Name",
  "information": {
    "hotelId": NumberLong(106379),
    "names": [
      {
        "localeStr": "en_US",
        "name": "some Other Name"
      }
    ],
    "address": {
      "address1": "5405 Google Avenue",
      "city": "Mountain View",
      "cityIdInCitiesCodes": "123456",
      "stateId": "CA",
      "countryId": "US",
      "zipCode": "12345"
    },
    "descriptions": [
      {
        "localeStr": "en_US",
        "description": "Some Description"
      }
    ]
  },
  "providers": [
  ],
  "some other set": {
    "a": "bla bla bla",
    "b": "bla,bla bla"
  },
  "another Property": "fdfdfdfdfdf"
}
I need to run through all documents in the collection, and if "providers": [] is empty, I need to create a new set based on the values of the information section.
I'm far from being a MongoDB expert, so I have a few questions:
Can I do it as an atomic operation?
Can I do this using the MongoDB console? As far as I understand, I can do it using the $addToSet and $each commands?
If not, is there any Java-based driver that can provide such functionality?
Can I do it as atomic operation?
Each document is updated atomically. MongoDB has no atomicity in the RDBMS sense, where all operations succeed or fail together, but you can prevent other writes from interleaving by using the $isolated operator.
Can I do this using MongoDB console?
Sure you can. To find all documents with an empty providers array, you can issue a command like:
db.zz.find({providers : { $size : 0}})
To update all documents where the array is of zero length with a fixed set of strings, you can issue a query such as
db.zz.update({providers : { $size : 0}}, {$addToSet : {providers : "zz"}}, {multi : true})
If you want to add a portion to your document based on the document's own data, you can use the notorious $where query (do mind the warnings appearing in that link), or, as you mentioned, query for the empty providers array and use cursor.forEach()
If not is there any Java based driver that can provide such functionality?
Sure, there is a Java driver, as there is for every other major programming language. It can do practically everything described here, and basically everything you can do from the shell. I suggest you get started from the Java Language Center.
There are also several frameworks that facilitate working with MongoDB and bridge the object-document world. I will not give a list here as I'm pretty biased, but I'm sure a quick Google search will do.
db.so.find({ providers: { $size: 0 } }).forEach(function(doc) {
  doc.providers.push(doc.information.hotelId);
  db.so.save(doc);
});
This will push the information.hotelId of the corresponding document into an empty providers array. Replace that with whatever field you would rather insert into the providers array.
